An Asymmetric Proximal Decomposition Method for Convex Programming with Linearly Coupling Constraints

. The problems studied are the separable variational inequalities with linearly coupling constraints. Some existing decomposition methods are very problem speciﬁc, and the computation load is quite costly. Combining the ideas of proximal point algorithm (cid:2) PPA (cid:3) and augmented Lagrangian method (cid:2) ALM (cid:3) , we propose an asymmetric proximal decomposition method (cid:2) AsPDM (cid:3) to solve a wide variety separable problems. By adding an auxiliary quadratic term to the general Lagrangian function, our method can take advantage of the separable feature. We also present an inexact version of AsPDM to reduce the computation load of each iteration. In the computation process, the inexact version only uses the function values. Moreover, the inexact criterion and the step size can be implemented in parallel. The convergence of the proposed method is proved, and numerical experiments are employed to show the advantage of AsPDM.


Introduction
The original model considered here is the convex minimization problem with linearly coupling constraints: where X i ⊂ R n i , A i are given m × n i matrixes, b i are given m-vector, and θ i : R n i → R are the ith block convex differentiable functions for each i 1, . . ., N. This special problem is called convex separable problem.Problems possessing such separable structure arise in discretetime deterministic optimal control and in the scheduling of hydroelectric power generation 1 .Note that θ i are differentiable, setting ∇θ i x i f i x i ; by the well-known minimum principle in nonlinear programming, it is easy to get an equivalent form of problem 1.1 : find x * x * 1 , . . ., x * N ∈ Ω such that where

1.3
Problems of this type are called separable variational inequalities VIs .We will utilize this equivalent formulation and provide method for solution of separable VI.One of the best-known algorithms for solving convex programming or equivalent VI is the proximal point algorithm PPA first proposed by Martinet see 2 and had been studied well by Rockafellar 3,4 .PPA and its dual version, the method of multipliers, draw on a large volume of prior work by various authors 5-9 .However, classical PPA and most of its subsequence papers cannot take advantage of the separability of the original problem, and this makes them inefficient in solving separable structure problems.One major direction of PPA's study is to develop decomposition methods for separable convex programming and VI.The motivations for decomposition techniques are splitting the problem into isolate smaller or easier subproblems and parallelizing computations on specific parallel computing device.Decomposition-type methods 10-14 for large-scale problems have been widely studied in optimization as well as in variational problems and are explicitly or implicitly derived from PPA.However, most of those methods only can solve separable problems with special equality constraints: Minimize θ x ψ y , Ax y 0, x ∈ X, y ∈ Y.

1.4
Two very well-known methods for solving equality constrained convex problems and VI are the augmented Lagrangian method 15, 16 ALM and the alternating direction method ADM 17 .The classic ALM has been deeply studied and has many advantages over the general Lagrange methods; see 18 for more detail.However, it can not preserve separability.ADM is a different method but closely related to ALM, which essentially can preserve separability for problems with two operators N 2 .Recently, separable augmented Lagrangian method SALM 19, 20 overcomes the nonseparability of ALM.For example, for solving problem 1.1 with equality constraints, Hamdi and Mahey 19 allocated a resource quantity y i to each block

1.5
It is worth mentioning that 1.5 is only equivalent to problem 1.1 with equality constraints.The expression of the augmented lagrangian function of 1.5 is: SALM finds a saddle point of problem 1.5 by the following stages: Note that the process in SALM for x k 1 allows one to solve N subproblems in parallel.This has great practical importance from the computation point of view.In fact, SALM belongs to the family of splitting algorithms and ADM for solving special convex problem 1.4 with ψ y 0 and SALM has to introduce an additive variable y to exploit the inner separable structure of the problem, which makes the problem larger.Moreover, SALM is suitable to solve equality constraints problems and fraught with difficulties in solving inequality constraints problems.
To our best knowledge, there are few dedicated methods for solving inequality constraints problems 1.1 or VI 1.2 -1.3 , except the decomposition method proposed by Tseng 21 and the PPA-based contraction method by He et al. 22 .The decomposition method in 21 decomposes the computation of x k 1 at a fine level without introducing additive variable y.But, in each iteration of this method, the minimization subproblems for x k 1 are dependent on the step size of multiplier, which greatly restricts the computation of subproblems.The PPA-based contraction method in 22 has a nice decomposable structure; however, it has to solve the subproblem exactly.To solve 1.1 or VI 1.2 -1.3 , motivated by PPA-based contraction method and SLAM, we propose an asymmetric proximal decomposition method AsPDM which can well conserve the separability feature of the problem.Besides, it does not need to introduce the resource variables y like SALM and the subproblems do not depend on the step size of multiplier.In the following, we briefly describe our method for 1.1 : we add an auxiliary quadratic term to the general Lagrangian function: L j x j , x k j , λ , 1.9 with

1.10
The general framework of AsPDM is as follows: Phase I Phase II

1.12
Here, β i > 0, μ > 0, α k > 0, and G are proper chosen which will be detailed in the later sections.Note that the first phase consists of N isolate subproblems, and each involves x i , i 1, . . ., N only, namely; it can be partitioned into N independent lower-dimension subproblems.Hence, this method can take advantage of operators' separability.Since we mainly focus on solving equivalent separable VI, hence, we present this method under VI framework and analyze its convergence in the following sections.

Structured VI
The separable VI 1.2 -1.3 consists of N partitioned sub-VIs.Introducing a Lagrange multiplier vector λ ∈ Λ Λ R m or R m associated with the linearly coupling constraint N j 1 A j x j ≥ b or N j 1 A j x j b , we equivalently formulate the separable VI 1.2 -1.3 as an enlarged VI: find w * x * , λ * ∈ W such that where 2 is referred as structured variational inequality SVI , denoted as SVI W, Q .Here,

Preliminaries
We summarize some basic properties and related definitions which will be used in the following discussions.
Definition 2.1.i The mapping f is said to be monotone if and only if ii A function f is said to be Lipschitz continuous if there is a constant L > 0 such that The projection onto a closed convex set is a basic concept in this paper.Let W ⊂ R n be any closed convex set.We use P W w to denote the projection of w onto W under the Euclidean norm; that is, The following lemmas are useful for the convergence analysis in this paper.

Advances in Operations Research
Lemma 2.2.Let W be a closed convex set in R n , then one has Proof.See 23 .
Hence, solving VI W, Q is equivalent to finding a zero point of the residue function e w, β w − P W w − βQ w , β > 0.

2.10
Generally, the term e w denotes e w, 1 is referred to as the error bound of VI W, Q , since it measures the distance of u from the solution set.

The Presentation of the Exact AsPDM
In each iteration, by our proper construction, our method solves N independent sub-VIs involving each individual variable x i only so that x i can be obtained in parallel.In what follows, to illustrate our method's practical significance, we interpret our algorithm process as a system which has a central authority and N local administrators; each administrator attempts to unilaterally solve a certain problem under the presumption that the instructions given by the authority are parametric inputs and the responses of other administrations' actions are not available; namely, the N local administrators acts synchronously and independently once they receive the information given by the central authority.We briefly describe our method which consists of two main phases.
Phase I: For arbitrary x k 1 , . . ., x k N , λ k given by the central authority, each local administrator uses his own way to offer the solution denoted as x k i of his individual problem: find x k i ∈ X i , such that

2.11
Phase II: After the N local administrators accomplish their tasks, the central authority collects the resulting x k 1 , . . ., x k N , moreover, corresponding which can be viewed as the feedback information from the N local administrators and sets: Here, μ is suitably chosen by the central authority.So the central authority aims to employ this feedback information effectively to provide x k 1 1 , . . ., x k 1 N , λ k 1 which will be beneficial for the next iteration loop.In this paper, our proposed methods will update the new iterate by the following two forms: or where α k > 0 is a specific step size and We make the standard assumptions to guarantee that the problem under consideration is solvable and the proposed methods are well defined.
Assumption A. f i x i is monotone and Lipschitz continuous, i 1, . . ., N.
By this assumption, it is easy to get that Q w is monotone.

Presentation of the Inexact AsPDM
In this section, the inexact version of the AsPDM method is present, and some remarks are briefly made.
For later analysis convenience, we denote

Advances in Operations Research
At each iteration, we solve N sub-VIs see 2.11 independently.No doubt, the computation load for an exact solution of 2.11 is usually expensive.Hence, it is desirable for us to consider solving 2.11 inexactly under a relative relaxed inexact criterion.We now describe and analyze our inexact method.Each iteration consists of two main phases, one of which provides an inexact solution of 2.11 and the other of which employs the inexact solution to offer a new iterate for the next iteration.
The first phase of our method works as follows.At the beginning of kth iteration, an iterate is the exact solution of ith sub-VI; there is nothing we need to do with the ith sub-VI.Otherwise, we should find Here, the obtained β i should satisfy following two inexact criteria: Once one of the above criteria fails to be satisfied, we will increase β i by β i β i * 1.8 and turn back to solve the ith sub-VI of 2.17 with this updated β i .It should be noted that both inexact criteria are quite easy to check since they do not contain any unknown variables.In addition, another favorable characterization of these criteria is that they are independent; namely, they only involve x k i , x k i , irrelevant to x k j , x k j j / i .In what follows, let us describe the second phase.We require where μ ∈ 0, N j 1 A j A T j /2β j η here, η > 0 is suitably chosen to satisfy

2.20
Now we use this w k x k , λ k or Q w k to construct the new iteration.Here, we provide two simple forms for the new iteration: where In fact, each iteration of the proposed method consists of two main phases.Using the point of view that the problem is a system with a central authority and N administrators, the first phase is accomplished by N administrators based on the instruction given by the authority.
That is, ith sub-VI only involves ith administrator's activities.On the other hand, the second phase is implemented by the central authority to give new instruction for the next iteration.
Remark 2.4.In the inexact AsPDM, the main task of Phase I is to find a solution for 2.17 .From 2.17 , it is easy to get that It seems that equality 2.22 is an implicit form since both sides of 2.22 contain x k i .In fact, we can transform equality 2.22 to an explicit form.Using the property of the projection, we have

2.23
Consequently, using the above formula, we can compute x k i quite easily.

2.24
If ξ k 0, it yields an exact version.In this special case, it is clear that We find that this formula is quite similar to the iterates produced by the classic PPA 3 , which employs as the new iterate; here, S is a positive symmetry definite matrix.For deeper insight, our method does not appear fit into any of the known PPA frameworks.It is virtually not equivalent to PPA even if G is positive definite.The reason why our method can not be viewed as PPA lies in the fact that G is asymmetry, moreover, may be not positive definite.This lack of symmetry makes it fail to introduce an inner product as S. Consequently, if one sets w k as the new iterate, one may fail to obtain the convergence.Due to the asymmetric feature of G, we call our method asymmetric proximal decomposition method.
Remark 2.6.Recalling that λ k is obtained by 2.19 , it is easy to get that

2.27
Combining 2.17 and 2.27 , we have

2.28
Since w k x k 1 , . . ., x k N , λ k ∈ W is generated by 2.17 -2.20 from a given x k 1 , . . ., x k N , λ k , we have that w k − w k 0 implies ξ k 0 and G w k − w k ξ k 0. According to 2.28 , we have

2.29
In other words, w k is a solution of Problem 2.1 -2.2 if x k i x k i i 1, . . ., N and λ k λ k .Hence, we use w k − w k ≤ ε as stopping criterion in the proposed method.
Remark 2.7.The update form 2.21 * a is based on the fact that G w k − w k ξ k is a descent direction of the unknown distance function 1/2 w − w * 2 at point w k .This property will be proved in Section 3.1.α * k in 2.21 is the "optimal" step length, which will be detailed in Section 3.2.We can also use 2.21 * b to update the new iterate.For fast convergence, the practical step length should be multiplied by a relaxed factor γ ∈ 1, 2 .Remark 2.8.Note that w k − w k 0 if and only if w k − w k D 0. In the case w k − w k D / 0, by choosing a suitable μ ∈ 0, N j 1 A j A T j /2β j η , 2.20 will be satisfied.We state this fact in the following lemma.Lemma 2.9.Let G and D be defined in 2.15 and 2.16 , respectively.If μ N j 1 A j A T j /2β j η, for all w x 1 , . . ., x N , λ ∈ R N j 1 n j m , one has

2.30
Proof.According to Definition 2.16 , we have and the assertion is obtained.
Set w w k − w k in the above lemma, we get

2.32
If one chooses μ N j 1 A j A T j /2β j η, then the Condition 2.20 is always satisfied; hence, N j 1 A j A T j /2β j η can be regarded as a safe upper bound for this condition.Note that we use an inequality in the proof of Lemma 2.9; it seems that there exists some relaxations.As a result, rather than fix μ N j 1 A j A T j /2β j η, let μ be a smaller value, and check if Condition 2.20 is satisfied.If not, increase μ by μ min{ N j 1 A j A T j /2β j η, 4μ} and try again.This process enables us to reach a suitable μ ∈ 0, Note that, in our proposed method, problems VI 2.17 produce x k i in a parallel wise.In addition, instead of taking the solution of the subproblems, the new iterate in the proposed methods is updated by a simple manipulation, for example, 2.18 * a -2.18 * b .

Convergence of AsPDM
In the proposed methods, the first phase accomplished by the local administrators offers a descent direction of the unknown distance function, and the second phase accomplished by the central authority determines the "optimal" step length along this direction.This section gives more theory analysis.

The Descent Direction in the Proposed AsPDM
For any w * ∈ W * , w k − w * is the gradient of the unknown distance function

3.1
It guarantees that G w k − w k ξ k is a descent direction of 1/2 w − w * 2 at point w k / ∈ W * .The above inequality plays an important role in the convergence analysis.

3.2
Proof.Since w * ∈ W, substituting w w * in 2.28 , we obtain Using the monotonicity of Q w and applying with w w k in 2.1 , it is easy to get Combining 3.3 and 3.4 , we then find Note that Criterion 2.18 * a holds; we have

3.6
The last inequality follows directly from the result of Lemma 2.9.Consequently, Advances in Operations Research 13 Using the preceding inequality and 2.32 in 3.5 yields completing the proof.Now, we state the main properties of Q w k in the lemma below.

Lemma 3.2. Let w k
x k 1 , . . ., x k N , λ k be generated by 2.17 -2.20 from given w k x k 1 , . . ., x k N , λ k .Then for any w ∈ W one has Proof.Recalling that By some manipulations, our assertion holds immediately.

The Step Size and the New Iterate
Since G w k − w k ξ k is a descent direction of 1/2 w − w * 2 at point w k , the new iterate will be determined along this direction by choosing a suitable step length.In order to explain why we have the "optimal" step α * k as defined in 2.21 , we let be the step-size-dependent new iterate, and let be the profit function of the kth iteration.Because Θ k,i α includes the unknown vector w * , it can not be maximized directly.The following lemma offers us a lower bound of Θ k,i α which is a quadratic function of α.

Lemma 3.3. Let w k
x k 1 , . . ., x k N , λ k be generated by 2.17 -2.20 from a given w k x k 1 , . . ., x k N , λ k .Then one has where Proof.It follows from Definition 3.12 and inequality 3.5 that ≥ q k α .

3.15
Let us deal with Θ k,2 α which seems more complicated:

3.16
Since q k α is a quadratic function of α, it reaches its maximum at 3.17 this is just the same defined in 2.21 .In practical computation, taking a relaxed factor γ is wise for fast convergence.Note that for any α k γα * k , it follows from 3.13 , 3.14 , and 2.21 that

3.18
In order to guarantee that the right-hand side of 3.18 is positive, we take γ ∈ 0, 2 .
In fact, α * k is bounded below by a positive amount which is the subject of the following lemma.
Lemma 3.4.Assume that w k x k 1 , . . ., x k N , λ k is generated by 2.17 -2.20 from a given w k x k 1 , . . ., x k N , λ k , then one has

3.19
Proof.Using the fact that square matrix D is positive symmetric definite, we have Moreover, Note that Criterion 2.18 * b implies

3.21
Hence, applying w k − w k 2 D D 1/2 w k − w k 2 in the above inequality, we get Combining 3.20 and 3.22 , we have Consequently, applying the above inequality, 3.7 , and 2.32 to

3.24
and thus the assertion is proved.Now, we are in the stage to prove the main convergence theorem of this paper.
Theorem 3.5.For any w * x * , λ * ∈ W * , the sequence {w k x k , λ k } generated by the proposed method satisfies

3.25
Thus we have and iteration of the proposed method will terminate in finite loops.
Proof.First, it follows from 3.12 and 3.18 that

3.28
Substituting 3.28 in 3.27 , Assertion 3.25 is proved.Therefore, we have and Assertion 3.26 follows immediately.Since we use as the stopping criterium, it follows from 3.26 that the iteration will terminate in finite loops for any given > 0.

Numerical Examples
This section describes experiments testifying to the good performance of proposed method.The algorithms were written in Matlab version 7.0 and complied on a notebook with CPU of Intel Core 2 Duo 2.01 GHz and RAM of 0.98 GB .
To evaluate the behavior of the proposed method, we construct examples about convex separable quadratic programming CSQP with linearly coupling constraints.The convex separable quadratic programming was generated as follows: where In this way, H i is positive definite and has prescribed eigenvalues between 0, 2 .If x * is the solution of Problem 4.1 , according to the KKT principle, there is a 0 ≤ y * ∈ R m such that

4.4
Let ξ i ∈ R n i and z ∈ R m be random vectors whose elements are between −1, 1 .We set where τ i , i 1, 2, 3, 4, are positive parameters.By setting we constructed a test problem of 4.1 which has the known solution point x * and the optimal Lagrangian multipliers y * .We tested such problems with τ 1 0.5, τ 2 10, τ 3 0.5, τ 4 10.Here, two example sets were considered.The problems in the first set have 3 separable operators N 3 , and the second have 2 N 2 .
In the first experiment, we employ AsPDM with update w k 1 2 to solve CSQP with 3 separable operators.The reason why we choose w k 1 2 here is that it usually performs better than w k 1 1 .The stopping criterion was chosen as e w ∞ < 10 −3 ; the parameters were set as ν 0.2, η 0.5, and γ 1.8.Table 1 reports the number iterations denoted as Its.,  1, the solutions are obtained in a moderate number of iterations; thus the proposed method is effectively applicable.In addition, the evaluations of f i per iteration are approximately equal to 2. AsPDM is well suited to solve separable problems.
Next, we compared the computational efficiency of AsPDM against the method in 7 denoted as PCM , regarded as a highly efficient PPA-based method that can be well suited to solve VI.Iterations were terminated when the criterion e w ∞ < 10 −5 was met.Table 2 reports the iterations, the total number of function evaluation for both methods.We observe that both methods are acceptable to for us to find a solution.Concerning computational efficiency, we can observe that AsPDM is comparable and clearly faster than PCM; moreover, function evaluations are also less, except in the case of m 500, n 1 500, n 2 500.In some cases, AsPDM can reduce about 20% computation cost than PCM.For m 100, n 1 100, n 2 100, we plot the error versus iteration number for both AsPDM and PCM in Figure 1.We have found that both methods converge quickly for the first hundred iterations but slow down as the exact solution is reached.The speed of AsPDM is better than PCM.
In addition to being fast, AsPDM can solve the problem separately; that is the most significant advantage over other methods.Hence, AsPDM is more suitable to solve the reallife separable problems.

Conclusions
We have proposed AsPDM for solving separable problems.It decomposes the original problem to independent low-dimension subproblems and solves those subproblems in parallel.Only the function values is required in the process, and the total computational cost is very small.AsPDM is easy to implement and does not appear to require applicationspecific tuning.The numerical results also evidenced the efficiency of our method.Thus, the new method is applicable and recommended in practice.

Figure 1 :
Figure 1: Error versus iteration number for the method and for with m 100, n 1 100, n 2 100.
1/2 w − w * 2 at point w k / ∈ W * .A direction d is called a descent direction of 1/2 w − w * 2 at point w k if and only if the inner product w k − w * , d < 0. Let w * ∈ W * , and c i ∈ R n i .We construct matrices A i and H i in the test examples as follows.The elements of A i are randomly given in −5, 5 , and the matrices H i are defined by setting

Table 2 :
Its. and function eval.for different problem sizes.
the total number of function evaluations denoted as Num f for different problem-sizes.Here, Num f 3 i 1 Num f i .Observed form Table