MULTISTAGE HIERARCHICAL OPTIMIZATION PROBLEMS WITH MULTI-CRITERION OBJECTIVES

Abstract. A hierarchical optimization (or bilevel programming) problem consists of a decision maker, called the leader, who is interested in optimizing an objective function that involves the decisions of another decision maker, called the follower, whose decisions are based in part on the policies made by the leader. However, if the planning horizon extends over a long period of time, it may be unrealistic for either player to commit to the original decisions, so there is a desire to break the problem into stages, and the leader may wish to reevaluate the follower's response at each stage. In this article, we propose a multistage hierarchical optimization problem in which the leader's objective consists of multiple criteria, and we study the optimality conditions of such problems using an extremal principle of Mordukhovich.


1. Introduction. In a hierarchical optimization problem, two levels of decision making are involved in such a way that the upper-level decision maker, called the leader, makes decisions to optimize its objectives. However, the objectives and the constraints of the leader are subject to change depending on what the other decision maker, the follower, does. The follower's actions in making decisions are passive and responsive: the follower optimizes its own objectives based on whatever decisions the leader chooses. If the planning horizon covers an extended period of time, it is desirable to break the time into multiple stages. There are also applications in which the leader's decisions are involved in a design process that evolves over multiple stages, and the leader wishes to reevaluate the follower's reactions to improve its own decisions over a number of time periods. At any stage, the leader is assumed to know the behavior of the follower once a decision is announced. On the other hand, the follower only has knowledge of the past decisions of both levels as well as the current leader's policy, and thus makes decisions in response to the leader's decisions one stage after another.
The type of hierarchical optimization problem that we are interested in consists of n stages with upper-level objectives of multiple criteria. The decision variables of both decision makers, the leader and the follower, are defined over n stages. At the beginning of stage i, for i ∈ {1, 2, . . . , n}, the leader, at the upper level, announces its decision x_i, and the follower, at the lower level, based on this decision from the leader and the previous decisions x_{i−1} and y_{i−1} from both levels, takes a responsive action with a decision y_i by optimizing its own objective function for that stage. The leader's decisions over the entire period of n stages are made so as to optimize its overall objective with multiple criteria. At the same time, the leader is assumed to be able to establish an agreement with the follower whereby, if two decisions are indifferent to the follower's own objective (in a multi-valued solution set scenario) at any stage, the follower's decision should be chosen in favor of the leader's benefit. In fact, such an agreement is mutually beneficial to both decision makers, because the leader has the advantage of foreseeing the entire time horizon and thus is able to advise the follower to avoid certain decisions at some of the stages that may not lead to any optimal solutions at a later stage.
The hierarchical multistage optimization models described above often arise in industrial applications. In references [5] and [4], for instance, an energy absorbing process through delamination was considered and a hierarchical optimization model with evolutionary equilibrium constraints was studied. In their model, the amount of energy absorption is maximized at the upper level, while at the lower level, as the constraints, there is a sequence of parametric sub-optimization problems resulting from the time-discretization of the problem of minimizing the elastic stored energy and the dissipation potential. In their treatment of the time-discretization, each sub-problem depends on the solutions to the previous sub-problem as well as the upper-level design variable, yielding an n-stage parametric optimization problem.
Specifically, let X_i and Y_i be finite dimensional spaces for i = 1, . . . , n, and set X := X_1 × · · · × X_n and Y := Y_1 × · · · × Y_n. The upper- and lower-level decision variables at stage i, for i = 1, . . . , n, are denoted by x_i ∈ X_i and y_i ∈ Y_i respectively. We are given a set of real-valued lower-level objective functions g_i defined on X_{i−1} × Y_{i−1} × X_i × Y_i and a set of lower-level constraint set-valued mappings Ω_i defined from X_{i−1} × Y_{i−1} × X_i to Y_i for i = 1, 2, . . . , n. We denote by ℝ and ℝ_+ the sets of all real numbers and all nonnegative real numbers respectively, and by ℕ the set of all natural numbers.
Starting at the first stage, the follower minimizes its first objective function g_1(x_1, y_1) subject to y_1 ∈ Ω_1(x_1) for a given upper-level decision x_1 ∈ X_1. At stage i, for i = 2, 3, . . . , n, the follower takes a given set of decision variables x_{i−1} ∈ X_{i−1}, y_{i−1} ∈ Y_{i−1} from the previous stage and the current leader's decision x_i ∈ X_i, and wishes to solve the problem

(Q_i)    minimize g_i(x_{i−1}, y_{i−1}, x_i, y_i) subject to y_i ∈ Ω_i(x_{i−1}, y_{i−1}, x_i).

The reactions of the follower in optimizing its own objective at each stage are known to the leader and affect the leader's objectives. To reduce redundancy in notation, throughout this paper we will regard x_0 and y_0 as dummy variables and treat g_1(x_1, y_1) as a function of (x_0, y_0, x_1, y_1), namely g_1(x_0, y_0, x_1, y_1) := g_1(x_1, y_1). Suppose we denote the optimal solution set of the follower from the first stage by S_1(x_0, y_0, x_1), the solution set from the second stage by S_2(x_1, y_1, x_2), and so on, with the solution set from the last stage by S_n(x_{n−1}, y_{n−1}, x_n). Then for each given upper-level decision vector x := (x_1, x_2, . . . , x_n), the set of lower-level solutions for all stages R(x_1, x_2, . . . , x_n) ⊂ Y, called the rational response mapping, is defined to be the set of vectors (ȳ_1, ȳ_2, . . . , ȳ_n) such that

ȳ_1 ∈ S_1(x_0, y_0, x_1), ȳ_2 ∈ S_2(x_1, ȳ_1, x_2), . . . , ȳ_n ∈ S_n(x_{n−1}, ȳ_{n−1}, x_n).
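The stage-by-stage construction of the rational response can be sketched in a few lines of code; the stage solver and the finite feasible sets below are illustrative stand-ins, not part of the model:

```python
# Sketch: computing one rational-response trajectory (y_1, ..., y_n) for a
# given leader decision vector (x_1, ..., x_n). The stage solver below is a
# stand-in for any routine that minimizes the convex stage objective g_i over
# the feasible set Omega_i; here the feasible set is a finite candidate list,
# so the minimum can be taken directly.

def solve_stage(g_i, omega_i, x_prev, y_prev, x_i):
    """Minimize g_i(x_prev, y_prev, x_i, .) over y in omega_i(x_prev, y_prev, x_i)."""
    feasible = omega_i(x_prev, y_prev, x_i)
    return min(feasible, key=lambda y: g_i(x_prev, y_prev, x_i, y))

def rational_response(gs, omegas, xs):
    """Follower's stage-by-stage response: y_i depends on (x_{i-1}, y_{i-1}, x_i)."""
    x_prev, y_prev = None, None  # dummy variables x_0 and y_0
    ys = []
    for g_i, omega_i, x_i in zip(gs, omegas, xs):
        y_i = solve_stage(g_i, omega_i, x_prev, y_prev, x_i)
        ys.append(y_i)
        x_prev, y_prev = x_i, y_i  # current decisions feed the next stage
    return ys
```

With two stages, a grid of candidate responses, and quadratic tracking objectives, the loop reproduces the chain ȳ_1 ∈ S_1(x_0, y_0, x_1), ȳ_2 ∈ S_2(x_1, ȳ_1, x_2) directly.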
The leader, at the upper level, anticipates the rational responses from the lower level to its decisions at each stage, and tries to solve a multiobjective optimization problem. To state the upper-level problem clearly, we will need the following definition [6, 7].

Definition 1.1. Let ϕ : Z → H be a mapping between two finite dimensional vector spaces, let Θ ⊂ H be an ordering set with 0 ∈ Θ, and let Λ ⊂ Z be a constraint set. We say a point z̄ ∈ Λ is locally (ϕ, Θ)-optimal subject to the constraint z ∈ Λ if there are a neighborhood U of z̄ and a sequence of a_k ∈ H with a_k → 0 as k → ∞ such that

ϕ(z) − ϕ(z̄) ∉ Θ − a_k for all z ∈ Λ ∩ U and k ∈ ℕ;

in this case, we also say that z̄ is a Θ-minimizer of ϕ subject to z ∈ Λ.
In the sequel we will use the term "Θ-minimize ϕ" to represent the multiobjective problem of finding a Θ-minimizer of the mapping ϕ.
The notion of Θ-optimality presented in the above definition is a unified generalization of a variety of vector optimality concepts in the literature. In particular, the scalar optimality of minimizing a real-valued function becomes a special case with Θ := {τ ∈ ℝ | τ ≤ 0}. In general, by choosing different types of the ordering set Θ, the (ϕ, Θ)-optimality specializes to various specific notions of vector optimality or efficiency.
For example, given a convex cone K ⊂ H, consider the problem of minimizing ϕ(z) over z ∈ Λ, with z̄ and U as defined in Definition 1.1. We have:
a.) If Θ = K, the point z̄ is (ϕ, Θ)-optimal if and only if z̄ is a strong Pareto optimal solution in the sense that there is no z ∈ Λ ∩ U with z ≠ z̄ such that ϕ(z̄) − ϕ(z) ∈ K;
b.) If int K ≠ ∅ and Θ = int K ∪ {0}, the point z̄ is (ϕ, Θ)-optimal if and only if z̄ is a weak Pareto optimal solution in the sense that there is no z ∈ Λ ∩ U such that ϕ(z̄) − ϕ(z) ∈ int K;
c.) If ri K ≠ ∅ and Θ = ri K ∪ {0}, the point z̄ is (ϕ, Θ)-optimal if and only if z̄ is a Pareto-type efficient solution in the sense that there is no z ∈ Λ ∩ U such that ϕ(z̄) − ϕ(z) ∈ ri K.

Now, for X, Y and H as defined before, consider the leader's vector-valued objective function f : X × Y → H and a constraint set D ⊂ X × Y. The leader wishes to find a Θ-minimizer (x̄, ȳ) := (x̄_1, . . . , x̄_n, ȳ_1, . . . , ȳ_n) ∈ X × Y of the function f subject to the constraints that (x, y) = (x_1, . . . , x_n, y_1, . . . , y_n) ∈ D and (y_1, . . . , y_n) ∈ R(x), where R is the follower's rational response mapping defined above.
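The Pareto notions in parts a.) and b.) above can be illustrated for the special case K = ℝ^m_+ (an assumed choice; the text allows any convex cone), with domination tested coordinatewise over a finite candidate set:

```python
# Sketch of the Pareto notions in a.) and b.) for the special case
# K = R^m_+ (the nonnegative orthant); the text allows a general convex
# cone K. A candidate value is dominated in the strong sense by another
# value v when (candidate - v) lies in K \ {0}, and in the weak sense
# when (candidate - v) lies in int K (all coordinates strictly positive).

def is_strong_pareto(values, i):
    """values[i] is strong Pareto iff no other v has values[i] - v in K \\ {0}."""
    vi = values[i]
    for j, v in enumerate(values):
        if j == i:
            continue
        diff = [a - b for a, b in zip(vi, v)]
        if all(d >= 0 for d in diff) and any(d != 0 for d in diff):
            return False
    return True

def is_weak_pareto(values, i):
    """values[i] is weak Pareto iff no v has values[i] - v in int K."""
    vi = values[i]
    return not any(
        all(a - b > 0 for a, b in zip(vi, v))
        for j, v in enumerate(values) if j != i
    )
```

As expected, every strong Pareto value is also weak Pareto, while a value such as (2, 2) among {(1, 2), (2, 1), (2, 2)} is weak but not strong Pareto, since (2, 2) − (1, 2) = (1, 0) ∈ K \ {0}.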
To proceed with our optimality conditions, we assume in the sequel that, for each given (x_{i−1}, y_{i−1}, x_i), the function g_i(x_{i−1}, y_{i−1}, x_i, ·) is convex and the set Ω_i(x_{i−1}, y_{i−1}, x_i) is also convex for i = 1, 2, . . . , n.
Under these assumptions, it is clear that the rational response mapping R(x) can be described as follows: (y_1, y_2, . . . , y_n) ∈ R(x_1, x_2, . . . , x_n) if and only if

0 ∈ ∂_{y_i} g_i(x_{i−1}, y_{i−1}, x_i, y_i) + N(y_i | Ω_i(x_{i−1}, y_{i−1}, x_i)) for i = 1, . . . , n,

where ∂_{y_i} g_i is the subdifferential of g_i with respect to y_i and N(y_i | Ω_i) is the normal cone to the set Ω_i at the point y_i in the sense of convex analysis.
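For a concrete sense of this characterization, the following sketch (illustrative data only, not from the references) checks the one-dimensional smooth case, where Ω_i is an interval and the condition 0 ∈ ∂_{y_i} g_i + N(y_i | Ω_i) reduces to an elementary sign test on the gradient:

```python
# Sketch: checking the stage-wise stationarity condition
# 0 ∈ ∂_{y_i} g_i + N(y_i | Ω_i) for a smooth convex g_i in one variable
# with Ω_i = [lo, hi]. In that case the (convex-analysis) normal cone is
# {0} at interior points, (-inf, 0] at y = lo, and [0, inf) at y = hi.

def in_normal_cone(v, y, lo, hi, tol=1e-9):
    """Membership v ∈ N(y | [lo, hi]) in the sense of convex analysis."""
    if y < lo - tol or y > hi + tol:
        return False          # y infeasible: the normal cone is empty
    ok = abs(v) <= tol        # interior points only admit v = 0
    if abs(y - lo) <= tol:
        ok = ok or v <= tol   # left endpoint: v <= 0
    if abs(y - hi) <= tol:
        ok = ok or v >= -tol  # right endpoint: v >= 0
    return ok

def is_stationary(grad_g, y, lo, hi):
    """0 ∈ grad_g(y) + N(y | [lo, hi])  <=>  -grad_g(y) ∈ N(y | [lo, hi])."""
    return in_normal_cone(-grad_g(y), y, lo, hi)
```

For instance, with g(y) = (y − 2)² on Ω = [0, 1], the constrained minimizer y = 1 passes the test, while the interior point y = 0.5 does not.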

By using the notation

G_i(x_{i−1}, y_{i−1}, x_i, y_i) := ∂_{y_i} g_i(x_{i−1}, y_{i−1}, x_i, y_i),
Q_i(x_{i−1}, y_{i−1}, x_i, y_i) := N(y_i | Ω_i(x_{i−1}, y_{i−1}, x_i)),

we can restate the leader's multiobjective optimization problem (P) as

Θ-minimize f(x, y) subject to 0 ∈ G_i(x_{i−1}, y_{i−1}, x_i, y_i) + Q_i(x_{i−1}, y_{i−1}, x_i, y_i), i = 1, . . . , n, and (x, y) ∈ D. (4)

Note that if we have only one stage (namely n = 1), then the above problem reduces to the bilevel programming problem studied in [1]. The basic structure of the problem (4) is essentially an extension of the problem proposed in [11] to multiobjective optimization at the upper level, and here we are able to obtain much stronger results. Note also that if, in the problem (P), we remove the multiobjective component of the upper-level problem and replace the constraints by the system

0 ∈ G̃_i(x, y_{i−1}, y_i) + Q̃_i(x, y_{i−1}, y_i), i = 1, . . . , n, (5)

where x := (x_1, . . . , x_n) and G̃_i and Q̃_i are defined similarly to G_i and Q_i, we arrive at the application model of the delamination process treated in [5] and [4], and the system (5) is referred to as evolutionary equilibrium constraints.

2. Preliminaries. In this section, we present some of the tools from modern variational analysis that are needed to develop our optimality conditions. Let Z and H be two finite dimensional vector spaces, and consider a set-valued mapping F : Z ⇉ H and a point z̄ ∈ Z. We denote by gph F := {(z, v) | v ∈ F(z)} the graph of the mapping F. The Painlevé–Kuratowski outer limit of F as z → z̄ is defined by

Limsup_{z→z̄} F(z) := {v ∈ H | there exist z_k → z̄ and v_k → v with v_k ∈ F(z_k) for all k ∈ ℕ}.

Given a set C ⊂ Z that is locally closed around z̄ ∈ C, the prenormal (or Fréchet normal) cone to C at z̄ is

N̂(z̄ | C) := {z* ∈ Z | limsup_{z→z̄, z∈C} ⟨z*, z − z̄⟩/‖z − z̄‖ ≤ 0},

and the normal (or limiting normal) cone N(z̄ | C) is defined by

N(z̄ | C) := Limsup_{z→z̄, z∈C} N̂(z | C).

Note that in finite dimensional spaces the normal (limiting) cone defined above is equivalent to the Mordukhovich normal cone defined in [8].
Given an extended real-valued function ψ : Z → ℝ ∪ {∞} and a point z̄ ∈ Z with |ψ(z̄)| < ∞, the subdifferential of ψ at the point z̄ is defined by

∂ψ(z̄) := {z* ∈ Z | (z*, −1) ∈ N((z̄, ψ(z̄)) | epi ψ)},

where the set epi ψ := {(z, τ) ∈ Z × ℝ | τ ≥ ψ(z)}. The coderivative of the set-valued mapping F at (z̄, v̄) ∈ gph F is defined by

D*F(z̄, v̄)(v*) := {z* ∈ Z | (z*, −v*) ∈ N((z̄, v̄) | gph F)}, v* ∈ H.

For a single-valued mapping ϕ : Z → H, we write D*ϕ(z̄)(v*) := D*ϕ(z̄, ϕ(z̄))(v*). If, in addition, ϕ is continuously differentiable at z̄, we will have

D*ϕ(z̄)(v*) = {∇ϕ(z̄)*v*},

where ∇ϕ(z̄) and ∇ϕ(z̄)* are the Jacobian matrix of ϕ at z̄ and its transpose respectively. When ϕ is locally Lipschitz continuous around the point z̄, then D*ϕ(z̄)(v*) = ∂⟨v*, ϕ⟩(z̄) for all v* ∈ H. We refer to [6, 7] and [9] for complete discussions of these concepts.
Definition 2.1 (extremal system of two sets). Let Ω_1 and Ω_2 be nonempty subsets of the space Z, and let z̄ ∈ Ω_1 ∩ Ω_2. We say z̄ is a locally extremal point of the set system {Ω_1, Ω_2} if there are a sequence {a_k} ⊂ Z and a neighborhood U of z̄ such that a_k → 0 as k → ∞ and

(Ω_1 + a_k) ∩ Ω_2 ∩ U = ∅ for all k ∈ ℕ. (6)

In this case, we call {Ω_1, Ω_2, z̄} an extremal system in Z.
The following version of the Mordukhovich extremal principle will be the fundamental vehicle in deriving our optimality condition results.

Lemma 2.2 (extremal principle). Let {Ω_1, Ω_2, z̄} be an extremal system in Z in which both Ω_1 and Ω_2 are locally closed around z̄. Then there exists z* ∈ Z with z* ≠ 0 such that

z* ∈ N(z̄ | Ω_1) ∩ (−N(z̄ | Ω_2)).
Given an ordering set Θ and a set Ξ ⊂ Z, we next present a lemma that provides necessary conditions for multiobjective optimization problems with only the geometric constraint:

Θ-minimize ϕ(z) subject to z ∈ Ξ, (8)

where ϕ : Z → H is a vector-valued function. The result below is actually a specialization of Theorem 5.59 of [6, 7], which is stated for problems in infinite-dimensional Asplund spaces. Nevertheless, for completeness and the reader's convenience, we present a simplified proof of this lemma, which follows the approach in [1] based on the extremal principle for sets instead of that for multifunctions.

Lemma 2.3 (necessary conditions in vector-valued optimization with geometric constraints).
Let z̄ be a local (ϕ, Θ)-minimizer of the constrained vector-valued optimization problem (8), where the ordering set Θ is locally closed at the origin. Assume that ϕ is locally Lipschitz continuous at z̄ and that Ξ is locally closed around z̄. Then there exists α* ∈ N(0 | Θ) with α* ≠ 0 such that

0 ∈ D*ϕ(z̄)(α*) + N(z̄ | Ξ). (9)

Proof. We proceed by creating the extremal system of sets generated by the local (ϕ, Θ)-minimizer z̄ of the problem (8) and then applying the extremal principle, Lemma 2.2, to the system. We first define the sets

Λ_1 := Ξ × (ᾱ + Θ) and Λ_2 := gph ϕ (10)

in the product space Z × H, and then show that (z̄, ᾱ) with ᾱ := ϕ(z̄) is a local extremal point of the system {Λ_1, Λ_2}. In fact, it is obvious that (z̄, ᾱ) ∈ Λ_1 ∩ Λ_2 and that the sets Λ_1 and Λ_2 are locally closed around this point. To justify the condition (6) for the set system (10), by the local (ϕ, Θ)-minimality of z̄ for (8) we find a neighborhood U of z̄ and a sequence {α_k} ⊂ H with α_k → 0 such that

ϕ(z) − ϕ(z̄) ∉ Θ − α_k for all z ∈ Ξ ∩ U and k ∈ ℕ.

This condition can be equivalently written as

(Λ_1 + (0, −α_k)) ∩ Λ_2 ∩ (U × H) = ∅ for all k ∈ ℕ.

Therefore the local extremality of {Λ_1, Λ_2} at (z̄, ᾱ) is justified with the neighborhood U × H and the sequence {(0, −α_k)}. By now applying the extremal principle, Lemma 2.2, to the set system (10), we can find (z*, α*) ≠ 0 satisfying

(z*, α*) ∈ N((z̄, ᾱ) | Λ_1) and −(z*, α*) ∈ N((z̄, ᾱ) | Λ_2).

Taking into account the definition of the coderivative and the normals to Cartesian products, we have

z* ∈ N(z̄ | Ξ), α* ∈ N(0 | Θ) and −z* ∈ D*ϕ(z̄)(α*),

which clearly proves the necessary condition (9) provided that α* ≠ 0. This last nontriviality is true, for otherwise α* = 0 would imply that z* = 0, because D*ϕ(z̄)(0) = {0} due to the local Lipschitz continuity of ϕ around z̄. We would then arrive at the contradiction (z*, α*) = 0, which completes the proof of the lemma.
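To see what Lemma 2.3 gives in the classical scalar case, the following derivation (a sketch, assuming the coderivative form of the necessary condition with a nonzero multiplier α* ∈ N(0 | Θ)) reduces it to standard stationarity:

```latex
% Scalar specialization of the necessary condition of Lemma 2.3, assuming it
% reads: 0 \in D^*\varphi(\bar z)(\alpha^*) + N(\bar z \mid \Xi) with
% 0 \ne \alpha^* \in N(0 \mid \Theta).
\[
\Theta = \{\tau \in \mathbb{R} \mid \tau \le 0\}
\quad\Longrightarrow\quad
N(0 \mid \Theta) = \mathbb{R}_+ ,
\]
so that $\alpha^* = \lambda$ with $\lambda > 0$. If, moreover, $\varphi$ is
continuously differentiable at $\bar z$, then
$D^*\varphi(\bar z)(\lambda) = \{\lambda \nabla\varphi(\bar z)\}$, and the
necessary condition becomes the familiar stationarity rule
\[
0 \in \lambda \nabla\varphi(\bar z) + N(\bar z \mid \Xi),
\qquad \lambda > 0 ,
\]
which, after normalizing $\lambda = 1$, is the usual first-order condition
$-\nabla\varphi(\bar z) \in N(\bar z \mid \Xi)$ for scalar minimization.
```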
3. Optimality conditions. In this section, we first develop necessary conditions for our general multistage hierarchical optimization problem (P), and then apply the results to several problems with certain specific properties.
At the upper level, the objective function f : X × Y → H is a vector-valued mapping involving the decision variables x = (x_1, . . . , x_n) ∈ X and y = (y_1, . . . , y_n) ∈ Y of both levels, and we wish to find an (f, Θ)-optimal solution for a given ordering set Θ.
On the other hand, at stage i, for i = 1, . . . , n, for a given set of vectors x_{i−1}, y_{i−1} and x_i, the lower-level objective function and constraint set are g_i(x_{i−1}, y_{i−1}, x_i, y_i) and Ω_i(x_{i−1}, y_{i−1}, x_i) respectively, with the decision variable y_i ∈ Y_i. We consider the two set-valued mappings G_i and Q_i:

G_i(x_{i−1}, y_{i−1}, x_i, y_i) := ∂_{y_i} g_i(x_{i−1}, y_{i−1}, x_i, y_i),
Q_i(x_{i−1}, y_{i−1}, x_i, y_i) := N(y_i | Ω_i(x_{i−1}, y_{i−1}, x_i)),

and the subset D ⊂ X × Y. Recall that our underlying multiobjective hierarchical optimization problem takes the form

Θ-minimize f(x, y) subject to 0 ∈ G_i(x_{i−1}, y_{i−1}, x_i, y_i) + Q_i(x_{i−1}, y_{i−1}, x_i, y_i), i = 1, . . . , n, and (x, y) ∈ D. (13)

We wish to establish an optimality condition for the above problem through certain reformulations.
Note that all variables with 0 indices are dummy variables included for symmetry purposes only; we may simply regard all of them, such as x̄_0, ȳ_0, x*_0, y*_0, and later x*_{0,j}, y*_{0,j} for any j, as null vectors.
It is obvious that (x, y) = (x_1, . . . , x_n, y_1, . . . , y_n) is a feasible solution of (13) if and only if there exists v := (v_1, . . . , v_n) such that

v_i ∈ G_i(x_{i−1}, y_{i−1}, x_i, y_i) and −v_i ∈ Q_i(x_{i−1}, y_{i−1}, x_i, y_i)

for i = 1, . . . , n. Then the problem (13) is equivalent to a constrained multiobjective optimization problem with a finite set of geometric constraints

Θ-minimize f(x, y) subject to (x, y, v) ∈ Ξ_1 ∩ · · · ∩ Ξ_{2n+1}, (15)

where

Ξ_i := {(x, y, v) | v_i ∈ G_i(x_{i−1}, y_{i−1}, x_i, y_i)},
Ξ_{n+i} := {(x, y, v) | −v_i ∈ Q_i(x_{i−1}, y_{i−1}, x_i, y_i)} for i = 1, . . . , n, and
Ξ_{2n+1} := D × Y.

We can then apply Lemma 2.3 to the above problem (15), and subsequently obtain a set of necessary conditions for the multistage and multiobjective hierarchical optimization problem (13).
The next result provides a much simplified form of the necessary condition in Theorem 3.1 when the problem (13) possesses the separable structure

f(x_1, . . . , x_n, y_1, . . . , y_n) := f_1(x_1, y_1) + · · · + f_n(x_n, y_n),

together with the corresponding stage-wise structures of the lower-level data g_i and Ω_i. In the following proposition, for a given set C and a point z, δ(z | C) denotes the indicator function of C at z, namely δ(z | C) = 0 if z ∈ C and +∞ otherwise.

Proposition 1. Let (x̄, ȳ, v̄) solve the problem (15), in which case (x̄, ȳ) is an (f, Θ)-minimizer of the problem (13), where Θ is a locally closed ordering set containing the origin, and f is locally Lipschitz continuous at the point (x̄, ȳ). Assume further that a simplified form of the qualification condition of Theorem 3.1 is fulfilled. Then the necessary optimality conditions of Theorem 3.1 hold in a correspondingly reduced form.

Proof. This is a straightforward special case of Theorem 3.1, taking into account the special structures of the objective functions f and g_i and of the constraint sets Ω_i for all i ∈ {1, . . . , n}. In particular, the mappings G_i and Q_i take reduced forms for i = 1, . . . , n, and the qualification and optimality conditions of Theorem 3.1 simplify accordingly. The conclusion follows.
We now turn to the case where the lower-level constraint sets Ω_i are assumed to be polyhedral with respect to the lower-level decision variable y_i for all i, and obtain appropriate optimality conditions. We first state the coderivative representation of the normal cone to a polyhedral set obtained in [3].
Given a finite index set T ⊂ ℕ, a finite dimensional vector space Z, and a set of vectors {c_j ∈ Z | j ∈ T}, define

∆ := {z ∈ Z | ⟨c_j, z⟩ ≤ 0 for all j ∈ T}.

For any z̄ ∈ ∆, we may regard the normal cone N(z̄ | ∆) to ∆ at z̄ as a set-valued map and write F_∆(z̄) := N(z̄ | ∆). The coderivative of this mapping is denoted by D*F_∆. We use I(z̄) to represent the set of indices of all active constraints of ∆ at z̄, namely I(z̄) := {j ∈ T | ⟨c_j, z̄⟩ = 0}.

Lemma 3.2. Let z̄* ∈ N(z̄ | ∆) and assume the generating vectors {c_j} are linearly independent. Then for each u ∈ Dom D*F_∆(z̄, z̄*), one has

D*F_∆(z̄, z̄*)(u) = {Σ_{j ∈ I_0(u)} μ_j c_j + Σ_{j ∈ I_>(u)} η_j c_j | μ_j ∈ ℝ, η_j ∈ ℝ_+},

where I_0(u) := {j ∈ I(z̄) | ⟨c_j, u⟩ = 0} and I_>(u) := {j ∈ I(z̄) | ⟨c_j, u⟩ > 0}.
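The index sets appearing in Lemma 3.2 are straightforward to compute; the sketch below (assuming ∆ is described by inequalities ⟨c_j, z⟩ ≤ 0, with vectors represented as tuples) illustrates I(z̄), I_0(u) and I_>(u):

```python
# Sketch: the index sets of Lemma 3.2 for a polyhedral set described by
# inequalities <c_j, z> <= 0, j in T (assumed form). I(z) collects the
# active constraints at z, and I_0(u), I_>(u) split the active indices
# according to the sign of <c_j, u>.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def active_set(cs, z, tol=1e-9):
    """I(z) := {j in T | <c_j, z> = 0}; z is assumed feasible."""
    return {j for j, c in enumerate(cs) if abs(dot(c, z)) <= tol}

def split_active(cs, z, u, tol=1e-9):
    """Return (I_0(u), I_>(u)) over the active indices I(z)."""
    act = active_set(cs, z, tol)
    i0 = {j for j in act if abs(dot(cs[j], u)) <= tol}
    ipos = {j for j in act if dot(cs[j], u) > tol}
    return i0, ipos
```

Note that active indices j with ⟨c_j, u⟩ < 0 belong to neither I_0(u) nor I_>(u), so they contribute no generator c_j to the coderivative representation.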
In what follows, we consider the special case of the multistage hierarchical problem where the lower-level problem (Q_i) has a smooth objective function and a polyhedral constraint set at each stage. Assume, for each i ∈ {1, . . . , n}, that the objective function g_i is twice continuously differentiable, and that there are an index set T_i and a set of linearly independent vectors {a_j^i | j ∈ T_i} such that the lower-level constraint set Ω_i takes the form

Ω_i := {y_i ∈ Y_i | ⟨a_j^i, y_i⟩ ≤ 0 for all j ∈ T_i}.

Let F_i(y_i) := N(y_i | Ω_i). Then in the optimality conditions (17) of Theorem 3.1, for any i ∈ {1, . . . , n} the inclusion

(x*_{i−1,n+i}, y*_{i−1,n+i}, x*_{i,n+i}, y*_{i,n+i}) ∈ D*Q_i((x̄_{i−1}, ȳ_{i−1}, x̄_i, ȳ_i), −v̄_i)(v*_i) (25)

holds if and only if x*_{i−1,n+i} = 0, y*_{i−1,n+i} = 0, x*_{i,n+i} = 0 and

y*_{i,n+i} ∈ D*F_i(ȳ_i, −v̄_i)(v*_i),

due to the fact that N_{Ω_i} is independent of x_{i−1}, y_{i−1} and x_i, and consequently the coderivative of Q_i reduces to that of F_i. By applying Lemma 3.2 to D*F_i(ȳ_i, −v̄_i)(v*_i), we have that (25) holds if and only if there exist constants μ_j^i ∈ ℝ and η_j^i ∈ ℝ_+ such that

y*_{i,n+i} = Σ_{j ∈ I_0^i(v*_i)} μ_j^i a_j^i + Σ_{j ∈ I_>^i(v*_i)} η_j^i a_j^i,

where

I_0^i(v*_i) := {j ∈ I_i(ȳ_i) | ⟨a_j^i, v*_i⟩ = 0} and I_>^i(v*_i) := {j ∈ I_i(ȳ_i) | ⟨a_j^i, v*_i⟩ > 0},

with I_i(ȳ_i) the set of active indices of Ω_i at ȳ_i. On the other hand, since the functions g_i, i = 1, . . . , n, are twice continuously differentiable, the coderivative condition on G_i = {∇_{y_i} g_i} appearing in the optimality conditions of Theorem 3.1 is equivalent to

(x*_{i−1,i}, y*_{i−1,i}, x*_{i,i}) = (∇_{x_{i−1},y_{i−1},x_i} ∇_{y_i} g_i(x̄_{i−1}, ȳ_{i−1}, x̄_i, ȳ_i))* v*_i

and

y*_{i,i} = (∇²_{y_i} g_i(x̄_{i−1}, ȳ_{i−1}, x̄_i, ȳ_i))* v*_i,

where ∇_{x_{i−1},y_{i−1},x_i} ∇_{y_i} g_i and ∇²_{y_i} g_i are the Jacobian matrices of the mapping ∇_{y_i} g_i with respect to (x_{i−1}, y_{i−1}, x_i) and y_i respectively.
To conclude this section, let us compare the approach used in this paper with the value function approach to bilevel programming. In order to study nonsmooth bilevel/multilevel programming, we convert the given problem to an optimization problem with an equilibrium constraint. Since n = 1, we drop all indices in (13), and it thus reduces to

Θ-minimize f(x, y) subject to 0 ∈ ∂_y g(x, y) + N(y | Ω(x)), (x, y) ∈ D. (31)

Then an optimal solution (x̄, ȳ) (if it exists) ensures the existence of v̄ ∈ ∂_y g(x̄, ȳ) ∩ (−N(ȳ | Ω(x̄))). It has been shown that under very mild assumptions imposed on the initial data at the triple (x̄, ȳ, v̄), the necessary conditions in this paper and in the paper [1] hold. In contrast, the known value function approach requires the so-called calmness condition; see, for example, [2]. Such a condition is quite restrictive but unavoidable, since the latter approach concerns the following equivalent problem

Θ-minimize f(x, y) subject to g(x, y) − φ(x) ≤ 0, y ∈ Ω(x), (x, y) ∈ D,

where φ : X → ℝ ∪ {∞}, given by φ(x) := inf{g(x, y) | y ∈ Ω(x)}, is the value function of the lower-level optimization problem, and imposes assumptions at the optimal solution (x̄, ȳ). In other words, it allows some freedom in choosing v̄ at each (x̄, ȳ), due to the fact that ∂_y g(x, y) is in general not a singleton, and thus the calmness condition is needed to guarantee good behavior of v̄. Note finally that both approaches may yield the classical necessary conditions for bilevel programming with differentiable data.
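For a toy single-stage instance, the lower-level value function φ can simply be tabulated over a finite set of leader decisions; the sketch below uses illustrative data g and Ω, not the problem data of the references:

```python
# Sketch: tabulating the lower-level value function
# phi(x) = min{ g(x, y) : y in Omega(x) } for a toy bilevel instance
# with finitely many leader decisions and candidate responses.
# The data g and Omega below are illustrative only.

def value_function(g, omega, xs):
    """Return {x: phi(x)} over the finite set of leader decisions xs."""
    return {x: min(g(x, y) for y in omega(x)) for x in xs}
```

The value function constraint g(x, y) − φ(x) ≤ 0 then simply says that y attains the minimum of the lower-level problem at x, which is exactly where the calmness condition discussed above enters.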

4. Conclusions. A multistage hierarchical optimization problem with a multicriterion upper-level objective function is proposed and its distinctive structure is described in detail. Optimality conditions are obtained for the problem in which the lower-level solution sets are allowed to be multi-valued. The results are then applied to several special cases where the underlying problem possesses additional properties.