Modified Accelerated Bundle-Level Methods and Their Application in Two-Stage Stochastic Programming

Abstract: The accelerated prox-level (APL) and uniform smoothing level (USL) methods recently proposed by Lan (Math Program, 149: 1–45, 2015) achieve uniformly optimal complexity when solving black-box convex programming (CP) and structured non-smooth CP problems. In this paper, we propose two modified accelerated bundle-level type methods, namely, the modified APL (MAPL) and modified USL (MUSL) methods. Compared with the original APL and USL methods, the MAPL and MUSL methods reduce the number of subproblems by one in each iteration, thereby improving the efficiency of the algorithms. Optimal iteration-complexity bounds for the proposed algorithms are established. Furthermore, the modified methods are applied to two-stage stochastic programming, and numerical experiments are implemented to illustrate the advantages of our methods in terms of efficiency and accuracy.


Introduction
In the fields of production planning, financial risk, telecommunications, and electricity, decision makers need to take into consideration uncertainty about both the information and the model itself. Lack of data, measurement errors, and inherent unpredictability all lead to uncertainty in the information, while model uncertainty derives from the structure of the problem, the features of the constraints, and the risks and profiles of the decisions. Stochastic programming (SP) is an effective tool for dealing with optimization problems under uncertainty. The expectation model in stochastic programming, which maximizes the expected benefit or minimizes the expected loss under expected-value constraints, is widely used. In this paper, we are concerned with two-stage stochastic programming with recourse. Next, through a practical problem, network planning with random demand, we introduce the mathematical model of two-stage stochastic programming.
Due to the demand for higher bandwidth and dedicated lines, network capacity is becoming a scarce resource. Consider a situation where a network provider plans bandwidth allocation among network links, with a total network capacity b available for allocation. There are n different links whose capacity needs to be expanded. The extra capacity allocated to link j is x_j, where j = 1, 2, ..., n, and the vector x ∈ R^n consists of the elements x_j. In network planning, demands refer to the number of connections requested between point-to-point pairs served by the network at a certain time. Here, the demand D ∈ R^m related to the m point-to-point pairs is modeled as a random variable.
Suppose that the extra capacities x j , j = 1, 2, ..., n are given and the demand D is observed. Then the capacity planning model introduced in [1] is to minimize the total number of unoffered requests.
Let i = 1, ..., m index the point-to-point pairs, and let P(i) denote the set of paths that can be offered to connections related to pair i. c ∈ R^n is a vector whose element c_j denotes the current capacity of link j. f_ip and s_i denote the number of connections routed on path p and the number of unoffered requests related to pair i, respectively. For an observation d = (d_1, ..., d_m) of the random variable D, one can obtain the optimal decision by solving the following linear programming problem:

min ∑_{i=1}^m s_i
s.t. ∑_{p∈P(i)} f_ip + s_i = d_i, i = 1, ..., m,
     ∑_{i=1}^m ∑_{p∈P(i)} a_ipj f_ip ≤ c_j + x_j, j = 1, ..., n,
     f_ip ≥ 0, s_i ≥ 0, i = 1, ..., m, p ∈ P(i). (1)

Here, the 0-1 vector A_ip is an incidence vector whose j-th component, a_ipj, is 1 if link j lies in path p and 0 otherwise. Let Q(x, d) denote the optimal value of the above linear program (1), which depends on the observation d of D and the capacity x. It is obvious that Q(x, D) is also a random variable due to the randomness of D. Thus, the capacities x_j, j = 1, 2, ..., n can be obtained from the following programming problem:

min E[Q(x, D)] s.t. ∑_{j=1}^n x_j ≤ b, x ≥ 0. (2)

Here E[·] refers to the expectation with respect to the probability distribution of the random variable D. In a word, the purpose of problem (2) is to minimize the expected number of unoffered requests while respecting the total network capacity b. The optimization model (1)-(2) is an instance of two-stage stochastic programming with recourse, in which (1) is the second-stage problem and (2) is the first-stage problem.
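To make the model concrete, the second-stage problem (1) can be solved with any off-the-shelf LP solver. The sketch below uses SciPy's linprog on a tiny two-link, two-pair instance; the topology and all numbers are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance of the second-stage problem (1): 2 links, 2 point-to-point
# pairs, one candidate path per pair.  Path 1 uses link 1; path 2 uses
# links 1 and 2.  The symbols c, x, d, a_ipj follow the text; the numbers
# are made up for illustration.
c = np.array([1.0, 1.0])   # current link capacities c_j
x = np.array([1.0, 0.0])   # extra capacities x_j chosen in the first stage
d = np.array([2.0, 1.0])   # observed demands d_i

# Decision vector z = (f_1, f_2, s_1, s_2): flows on the two paths and
# unoffered requests for the two pairs.  Objective: minimize s_1 + s_2.
obj = np.array([0.0, 0.0, 1.0, 1.0])

# Demand constraints: flow on the pair's path plus unoffered requests
# equals the observed demand, f_i + s_i = d_i.
A_eq = np.array([[1.0, 0.0, 1.0, 0.0],
                 [0.0, 1.0, 0.0, 1.0]])

# Link-capacity constraints sum_{i,p} a_ipj f_ip <= c_j + x_j.
A_ub = np.array([[1.0, 1.0, 0.0, 0.0],   # link 1 carries both paths
                 [0.0, 1.0, 0.0, 0.0]])  # link 2 carries path 2 only
b_ub = c + x

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=d, method="highs")
Q = res.fun  # Q(x, d): minimal number of unoffered requests
```

With total routed flow limited to 2 by link 1 while total demand is 3, one request must go unoffered, so Q(x, d) = 1 here.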
From the above practical instance we can see that stochastic programming plays a key role in network planning with random demand. Besides stochastic programming, unit commitment problems in the field of energy systems usually involve mixed-integer quadratic programming, and recently there have been some studies on solving such problems [2][3][4]. For solving problems (1)-(2), Sen et al. [1] applied the stochastic decomposition method, which is one of the most efficient methods for stochastic programming. There are many further applications of stochastic programming in other fields, such as insurance and finance, valuation of electricity, telecommunications, hydrothermal power production planning and pollution control; stochastic models of these application problems can be found in [5]. In what follows, as an extension of the network planning problem, the standard mathematical model of two-stage stochastic programming with recourse is given, which is more convenient for generalization and theoretical analysis.
The two-stage stochastic programming problem with fixed recourse takes the form

min { c^T x + E[Q(x, D)] : x ∈ X }, (3)

where x is the decision variable, c is the cost vector, and Q(x, D) denotes the optimal objective value of the second-stage problem

Q(x, D) = min { q^T y : Wy = h − Tx, y ≥ 0 }. (4)

Here, D is the random data vector collecting the elements q, h, and T, and W is the fixed recourse matrix; D = (D_1, ..., D_d) is a random variable whose probability distribution has support Ξ ⊂ R^d. E[Q(x, D)] in the first-stage problem (3) is the expectation with respect to D.
The mathematical model of two-stage stochastic programming has been given; next we summarize the methods mainly used to solve it. Since two- and multi-stage stochastic programming problems have very large dimension and special structure, they can be solved by means of decomposition. In the two-stage model (3)-(4), suppose that the probability space Θ is finite, with basic events ω_s, s = 1, ..., S occurring with probabilities p_s, s = 1, ..., S. Then the first-stage problem can be rewritten in the form

min { c^T x + ∑_{s=1}^S p_s Q_s(x) : x ∈ X }, (5)

where Q_s(x) is the optimal value of the second-stage problem

Q_s(x) = min { q_s^T y : Wy = h_s − T_s x, y ≥ 0 }, (6)

where (q_s, h_s, T_s) is the realization of the random data under event ω_s. The main idea of basic dual decomposition methods is to establish certain approximations to the first-stage problem by solving subproblems of the form (6). As an original decomposition-type method, the cutting-plane method establishes a linear model of ∑_{s=1}^S p_s Q_s(x); its main scheme can be found in [6,7]. Simplicity and computational cheapness are advantages of the cutting-plane model. As cuts accumulate, however, the demand for storage grows, which is a typical difficulty of such methods. On the other hand, even from a good initial iterate, such a method may generate overly long steps. In order to tackle these difficulties, Ruszczynski [8] proposed the regularized decomposition method, which is considered an improvement of the basic one. Another way to avoid long steps is the trust-region method, which was extended to two-stage stochastic programming by Linderoth and Wright [9]. In addition, bundle methods, which can be viewed as stabilized variants of the original cutting-plane method, were developed in [10][11][12]. Algorithmic modifications based on bundle methods were introduced in [13,14]. The bundle level (BL) method, another kind of bundle method, was first proposed by Lemaréchal et al. [15].
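The scenario decomposition just described can be illustrated on a toy instance with closed-form second-stage values. The sketch below uses a simple-recourse model in which Q_s(x) = max(d_s − x, 0), so the scenario subproblems and their duals reduce to one-line formulas; in general the cut coefficient comes from the dual solution of (6), as −T_s^T π_s. All numbers are invented for illustration.

```python
import numpy as np

# Toy "simple recourse" instance: scenario s has demand d_s, and the
# second-stage problem (6) reduces to Q_s(x) = max(d_s - x, 0), the unmet
# demand.  In general Q_s(x) is evaluated by an LP, and a subgradient is
# obtained from its dual solution; here both have closed forms.
d = np.array([1.0, 2.0, 3.0])      # scenario demands d_s
p = np.array([1/3, 1/3, 1/3])      # scenario probabilities p_s

def expected_recourse_and_cut(x):
    """Return E[Q(x)] and one aggregated cut coefficient g (a subgradient)."""
    Q = np.maximum(d - x, 0.0)               # Q_s(x) per scenario
    g = np.where(d - x > 0.0, -1.0, 0.0)     # a subgradient of Q_s at x
    return p @ Q, p @ g

x_hat = 1.5
val, g = expected_recourse_and_cut(x_hat)

# The aggregated cut  E[Q(x)] >= val + g*(x - x_hat)  underestimates the
# expected recourse everywhere; this is what the master problem exploits.
def cut(x):
    return val + g * (x - x_hat)
```

Evaluating the cut at another point and comparing with the true expected recourse confirms that it is a valid underestimate.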
On this basis, the "restricted-memory" version of the BL method was developed in [16][17][18] and performs well in numerical experiments. In recent years, there has been tremendous development of "asynchronous" and "partial" versions of BL methods; see references [19][20][21][22][23]. Considering that research on BL methods had focused only on general non-smooth convex optimization (CP) problems, Lan [24] proposed an accelerated BL-type method, namely the accelerated bundle level (ABL) method, and its restricted-memory version, the accelerated prox-level (APL) method. Benefiting from the multi-step strategy introduced by Nesterov [25] and later applied in [26][27][28][29][30], both the ABL and APL methods are uniformly optimal for solving non-smooth, weakly smooth and smooth CP problems. In addition, by incorporating Nesterov's smoothing technique [16,17] into the APL method, Lan [24] presented the uniform smoothing level (USL) method for solving structured non-smooth CP problems with optimal iteration complexity. In particular, the USL method does not require any input of problem parameters. Moreover, Lan [24] illustrated that the APL and USL methods, when applied to semidefinite programming and two-stage stochastic programming, have obvious advantages in computation time and accuracy over related gradient-type algorithms and some existing methods.
Our main work in this paper includes several aspects. First of all, on the basis of Lan's work, we make a further improvement and present the modified accelerated prox-level (MAPL) method. By selecting a proper proximal function, the MAPL method needs to solve only one subproblem to update the prox-center and the lower bound simultaneously, which improves the computational efficiency. In addition, the MAPL method achieves uniformly optimal complexity for solving smooth, weakly smooth, and non-smooth CP problems. Furthermore, we extend the MAPL method to solve structured non-smooth CP problems and present the modified uniform smoothing level (MUSL) method. Finally, we apply the proposed methods to two-stage stochastic programming problems with recourse. The numerical results show that the MAPL and MUSL methods have certain advantages in iteration count and computation time.
The present paper is built up as follows. In Section 2, some related works about BL-type methods are reviewed. The MAPL method and its complexity analysis are presented in Section 3. The MUSL method and its complexity analysis are presented in Section 4. The application of MAPL and MUSL methods to two-stage stochastic programming as well as numerical experiments are shown in Section 5. Conclusions are presented in the final section.

Related Work
This section reviews some related work on BL methods. Specifically, the main ideas of BL methods and related notation are reviewed in Section 2.1. The APL method and its gap reduction procedure G_APL, introduced in Section 2.2, are the basis of the main work in this paper.

The Bundle Level (BL) Method
Consider the general CP problem

f* := min { f(x) : x ∈ X }, (7)

where the constraint set X ⊆ R^n is convex and compact, and the objective function f : X → R is closed and convex over X. The function f is known only through a first-order oracle which, at a query point x, returns the function value f(x) and a subgradient f′(x) ∈ ∂f(x). Given a sequence of iterates {x_k}, the first-order information {f(x_k)} and {f′(x_k)} is provided by the oracle. The cutting-plane approximation of f is generated by

f̌_k(x) := max_{1≤i≤k} { f(x_i) + ⟨f′(x_i), x − x_i⟩ }. (8)

In addition, ⟨x, y⟩ := ∑_i x_i y_i, x, y ∈ R^n refers to the inner product. The level set X_k of f̌_k with level parameter ℓ_k is defined by X_k := {x ∈ X : f̌_k(x) ≤ ℓ_k}. The BL method [15] generates the next iterate x_{k+1} by projecting the current one onto the level set:

x_{k+1} = argmin { ‖x − x_k‖² : x ∈ X_k }.

The main steps of the BL method are listed below.
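The level-set projection that distinguishes the BL method from the plain cutting-plane method can be sketched in one dimension. The toy loop below minimizes f(x) = x² on X = [−2, 2], evaluating the cutting-plane model, the bounds, and the projection on a dense grid instead of calling LP/QP solvers; the function, the grid, and the parameter λ are invented for illustration.

```python
import numpy as np

# One-dimensional illustration of the BL iteration for f(x) = x^2 on
# X = [-2, 2].  The cutting-plane model f_check, the lower bound, and the
# projection onto the level set are all evaluated on a dense grid rather
# than by an LP/QP solver; this is a sketch, not an implementation.
f  = lambda x: x**2
df = lambda x: 2*x                 # (sub)gradient oracle
grid = np.linspace(-2.0, 2.0, 4001)
lam = 0.5                          # level parameter lambda in (0, 1)

xs = [1.5]                         # iterates; the starting point is arbitrary
cuts = []                          # list of (x_i, f(x_i), f'(x_i))
for _ in range(200):
    xk = xs[-1]
    cuts.append((xk, f(xk), df(xk)))
    # cutting-plane model: max over the cuts collected so far
    model = np.max([fi + gi*(grid - xi) for xi, fi, gi in cuts], axis=0)
    lb = model.min()                       # lower bound on f*
    ub = min(fi for _, fi, _ in cuts)      # upper bound on f*
    if ub - lb <= 1e-2:
        break
    level = lam*lb + (1 - lam)*ub          # level l_k
    feas = grid[model <= level]            # level set X_k (on the grid)
    # projection of x_k onto X_k: closest feasible grid point
    xs.append(feas[np.argmin(np.abs(feas - xk))])

gap = ub - lb
```

Since the cuts underestimate f and the level lies strictly between lb and ub, the projection always moves the iterate, and the gap shrinks quickly for this instance.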

The Accelerated Prox-Level (APL) Method and Its Gap Reduction Procedure G APL
In this subsection we consider the CP problem (7), where f satisfies

f(y) − f(x) − ⟨f′(x), y − x⟩ ≤ M/(1+ρ) ‖y − x‖^{1+ρ}, ∀x, y ∈ X, (9)

for some M > 0, ρ ∈ [0, 1] and f′(x) ∈ ∂f(x). It is easy to show that non-smooth (ρ = 0), smooth (ρ = 1) and weakly smooth (ρ ∈ (0, 1)) CP problems are contained in this family of problems. Lan [24] generalized the BL method to the accelerated bundle level (ABL) and accelerated prox-level (APL) methods, so that they achieve uniformly optimal complexity bounds for functions satisfying (9). Here, we mainly introduce the APL method and its principle.
Firstly, Lan [24] introduced three related iteration sequences, {x_k^l}, {x_k^u} and {x_k}, to establish the cutting-plane approximation f̌_k(x), generate the upper bounds f̄_k, and update the prox-center x_k, respectively. Specifically, {x_k^l} and {x_k^u} are updated by x_k^l = (1 − α_k) x_{k−1}^u + α_k x_{k−1} and x_k^u = (1 − α_k) x_{k−1}^u + α_k x_k. Secondly, Lan introduced an internal procedure G_APL to reduce the gap between the upper and lower bounds of f*. The algorithmic framework of the APL method is as follows (Algorithm 1).
Next, we focus on describing the gap reduction procedure G_APL. Denote the level set X_f(ℓ) := {x ∈ X : f(x) ≤ ℓ}. For a given iterate z, denote h(z, x) := f(z) + ⟨f′(z), x − z⟩ and

f̲ := min { h(z, x) : x ∈ X_f(ℓ) }. (10)

Then it can be verified that min{ℓ, f̲} ≤ f*, i.e., min{ℓ, f̲} is a lower bound on f*. However, problem (10) is difficult to solve in general. To overcome the difficulty and obtain the lower bound in a convenient way, Lan used a compact convex set X′ to replace X_f(ℓ) in problem (10). Thus, one can solve the relaxation of (10), f̲ := min { h(z, x) : x ∈ X′ }. Here, the set X′, satisfying X_f(ℓ) ⊆ X′ ⊆ X, is called a localizer of the level set X_f(ℓ). Then we obtain a lower bound on f* as follows:

min{ℓ, f̲} ≤ f*. (11)

Indeed, as shown in [24, Lemma 4], if X′ ≠ ∅, then f̲ ≤ min{h(z, x) : x ∈ X_f(ℓ)} ≤ f(x) for all x ∈ X_f(ℓ). If X′ = ∅, we have f̲ = +∞ and X_f(ℓ) = ∅, and therefore f(x) ≥ ℓ for all x ∈ X. Thus, (11) holds.
Moreover, in order to make better use of the structure of the feasible set X, similar to the NERML algorithm in [16,17], Lan introduced a prox-function to replace the Euclidean distance function ‖·‖². Here, a function ω : X → R serves as a prox-function of a convex compact set X ⊆ R^n with modulus σ_ω if it is differentiable and strongly convex with modulus σ_ω, i.e., ⟨∇ω(x) − ∇ω(y), x − y⟩ ≥ σ_ω ‖x − y‖² for all x, y ∈ X. Furthermore, one can redefine the diameter of X in terms of d_ω by

D_ω,X := max_{x∈X} d_ω(x), (13)

which leads to the following relation:

‖x − x_0‖ ≤ Ω_ω,X := (2 D_ω,X / σ_ω)^{1/2}, ∀x ∈ X. (14)

Furthermore, let

d_ω(x) := ω(x) − ω(x_0) − ⟨∇ω(x_0), x − x_0⟩ (15)

be the prox-function on X and x_0 := argmin_{x∈X} d_ω(x) be the prox-center of d_ω(x). It follows that d_ω(x_0) = 0 and d_ω(x) ≥ (σ_ω/2) ‖x − x_0‖² for all x ∈ X, and that d_ω(x) is a strongly convex function with modulus σ_ω. The internal gap reduction procedure G_APL of the APL method is as follows.
The APL gap reduction procedure: choose the initial point x_0 ∈ X and the initial localizer X_0 = X. The prox-function d_ω(x) is defined in (15). Also let k = 1.
Step 1: (Update lower bound) Compute f̲_k; if the stopping criterion is met, stop the procedure and output p⁺ = x_{k−1}^u and lb⁺ = f̲_k.
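As a concrete instance of the prox-function machinery of (13)-(15), the entropy function on the standard simplex is a classical choice: its prox-center is the uniform vector, d_ω reduces to the KL divergence to that vector, the strong-convexity modulus is 1 with respect to the ℓ1-norm (by Pinsker's inequality), and D_ω,X = ln n. The sketch below checks these facts numerically; the dimension and sample count are arbitrary.

```python
import numpy as np

# Entropy prox-function on the standard simplex: omega(x) = sum_i x_i ln x_i,
# prox-center x0 = (1/n, ..., 1/n), and
# d_omega(x) = omega(x) - omega(x0) - <omega'(x0), x - x0> = KL(x || x0),
# so that D_{omega,X} = max d_omega = ln n (attained at a vertex).
n = 4
x0 = np.full(n, 1.0/n)

def d_omega(x):
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.where(x > 0, x*np.log(x/x0), 0.0)))

rng = np.random.default_rng(0)
for _ in range(100):
    x = rng.dirichlet(np.ones(n))          # random point of the simplex
    # strong-convexity lower bound (Pinsker): d(x) >= (1/2) ||x - x0||_1^2
    assert d_omega(x) >= 0.5*np.sum(np.abs(x - x0))**2 - 1e-12

D = np.log(n)   # the diameter D_{omega,X} for this choice
```

This choice of ω adapts the subproblem (and the constant Ω_ω,X) to the geometry of the simplex, which is the point of replacing the Euclidean distance.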

The Modified Accelerated Prox-Level (MAPL) Method
In this section, we propose the modified accelerated prox-level (MAPL) method, which requires only one subproblem to be solved per iteration while achieving the uniformly optimal iteration complexity for solving black-box CP problems. We first present the modified gap reduction procedure G_MAPL which, from the given input p and lb, generates a new search point p⁺ and a new lower bound lb⁺ of f* such that

f(p⁺) − lb⁺ ≤ q [ f(p) − lb ] (16)

for some q ∈ (0, 1). Here, the value of q depends on the parameters λ, θ ∈ (0, 1).
The MAPL gap reduction procedure: choose the initial point x_0 ∈ X and the initial localizer X_0 = X. The prox-function d_ω(x) is defined in (15). Let k = 1.
Step 1: (Update level set) Set Step 2: (Update prox-center and lower bound) Set If X_k = ∅, then stop the procedure and output p⁺ = x_{k−1}^u, lb⁺ = ℓ.

Step 3: (Update upper bound)
and then stop the procedure (there is a significant improvement on the upper bound) and output p + = x u k , lb + = lb.
Step 4: (Update localizer) Choose an arbitrary X_k such that X̲_k ⊆ X_k ⊆ X̄_k, where X̲_k := {x ∈ X_{k−1} : h(x_k^l, x) ≤ ℓ} and X̄_k := {x ∈ X : ⟨∇d_ω(x_k), x − x_k⟩ ≥ 0}. Step 5: (Loop) Set k = k + 1 and return to Step 1.
The following are a few remarks on the MAPL gap reduction procedure. Firstly, the upper bound f̄_0 and lower bound f̲_0 in Step 0 are obtained from the outer iteration of the MAPL method (described below), and they are fixed throughout the entire run of G_MAPL. Furthermore, the level parameter ℓ is also fixed, as a convex combination of f̲_0 and f̄_0, whereas in the original BL methods the level parameter changes at each iteration. Secondly, Steps 2 and 3 provide two exits. When the procedure stops at Step 2, ℓ is a lower bound on f*; when it stops at Step 3, significant progress has been made on f̄_k, as measured by the parameter θ. Compared with the gap reduction procedure G_APL of the APL method, G_MAPL updates the lower bound in an easier way. Indeed, in Step 1 of G_APL (see [24]), a linear programming subproblem must first be solved in order to determine whether the lower bound lb needs to be updated. In the G_MAPL procedure, however, the prox-center x_k and the lower bound lb are merged into one subproblem and updated together by selecting an appropriate prox-function d_ω. While solving the subproblem (20), it is automatically checked whether X_k = ∅. If X_k = ∅, the lower bound lb is directly updated to ℓ, which avoids solving the linear program required by G_APL and reduces the amount of computation. Thirdly, in Step 4, X_k can be selected arbitrarily subject to X̲_k ⊆ X_k ⊆ X̄_k. For convenience, one can directly choose X_k = X̲_k or X_k = X̄_k. However, the number of constraints in X̲_k increases with k, while X̄_k has only one more constraint than the feasible set X. In practice, we can choose X_k between these two sets, which controls the number of constraints in (19) and reduces the computational cost. Moreover, a proper selection of the stepsize sequence {α_k} is critical for G_MAPL to terminate after finitely many iterations and achieve the optimal iteration complexity. Lan proposed a general selection rule for the sequence {α_k} in [24]. Chen et al.
[31] proposed a more concise selection rule, i.e., the sequence {α_k} satisfies the following conditions: for any γ ∈ [0, 1] and some c > 0. Two examples of {α_k} are given in [31]. The following lemma shows the important properties of the procedure G_MAPL. These properties are similar to those of [24,31], and their proofs can be found in Appendix A. Lemma 1. The following properties hold for the procedure G_MAPL.
a. {X_k}_{k≥0} is a collection of localizers for the level set X_f(ℓ);
b. f̄_0 ≥ f̄_1 ≥ · · · ≥ f̄_k ≥ f* holds for any k ≥ 1;
c. X_f(ℓ) ≠ ∅, and hence X_f(ℓ) ⊆ X̲_k ⊆ X_k ⊆ X̄_k for any k ≥ 1;
d. when X_k ≠ ∅ holds, problem (20) has a unique solution; in addition, if the procedure G_MAPL stops at Step 2, we have ℓ ≤ f*;
e. when the procedure G_MAPL stops, the relation f(p⁺) − lb⁺ ≤ q [f(p) − lb] holds.
Referring to the theoretical analysis of [24], the following proposition shows that the gap between the upper bound f(x_k^u) and the level parameter ℓ decreases with k, and it proves that when the algorithm stops, the total number of iterations does not exceed an explicit upper bound. Proposition 1. Let {α_k} satisfy (24), take γ = ρ, let ℓ be the level parameter and let d_ω(x) denote the prox-function. Then the number of iterations of the procedure G_MAPL does not exceed the bound below, where ∆_0 = f̄_0 − f̲_0 and Ω_ω,X is defined in (14).
Proof. Assume that the gap reduction procedure G_MAPL does not stop at the K-th iteration. From (16) we have for It follows from (19) and (20) that x_{k+1} ∈ X_{k+1} ⊆ X_k for all 1 ≤ k ≤ K. Moreover, due to the strong convexity of d_ω(x) and the optimality condition of subproblem (20), we have It turns out that Summing up the above inequalities over k, we obtain that According to (22), (9) and (18), we have From (18), (21), the convexity of f, (19) and (22), we have By the definitions of x̃_k^u (21) and x_k^l (17), it is easy to show that It then follows that Subtracting ℓ from both sides of the above inequality and dividing both sides by α_k^{1+ρ}, from (24) we obtain that As a result, Summing up the above inequality over m and using the fact that α_k > 0, ∀k ≥ 1, we have Applying the Hölder inequality to the above inequality and using (27), we have Applying (24) and (14) to (28), as well as the above inequality, we have From the fact that G_MAPL does not stop at Step 3, we have This together with (29) shows that We finally conclude that After establishing the relevant properties and complexity analysis of the procedure G_MAPL, we are now in a position to introduce the modified accelerated prox-level (MAPL) method, which repeatedly calls the procedure G_MAPL until it finds an approximate solution of the given accuracy. The algorithmic framework of the MAPL method is as follows (Algorithm 2).
Since the procedure G_MAPL is called during the progress of the MAPL method, we count an iteration of G_MAPL as an iteration of the MAPL method as well. Taking this fact into consideration, the following theorem establishes the convergence and iteration complexity of the MAPL method. The principle of the proof comes from reference [24]. (2) The total number of iterations performed by the MAPL method does not exceed Proof. (1) Without loss of generality, we suppose that ∆_1 > ε, where ∆_s = ub_s − lb_s, s ≥ 1. According to Step 1 of the MAPL method, (9) and (14), we have From Lemma 1 and the fact that ub_s = f(p_s), we have Furthermore, suppose that the MAPL method finds an ε-solution after calling G_MAPL S(ε) times, i.e., the MAPL method stops at s := S(ε) + 1. Then we have and (2) Suppose that the procedure G_MAPL has been called S(ε) times in the MAPL method. Then by (30) and (31) we have ∆_s > ε q^{s−S(ε)}, s = 1, 2, ..., S(ε).
Since ∆_{S(ε)} > ε, this together with Lemma 1 and Proposition 1 shows that the total number of iterations performed by the MAPL method does not exceed the stated bound. Here, we denote by N_s the number of internal iterations performed by the s-th call of the procedure G_MAPL.
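The phase-counting argument behind Theorem 1 can be illustrated with a mock outer loop: if each call of the gap reduction procedure multiplies the gap by a factor q, then the number of calls needed to reach tolerance ε is ⌈log(∆_1/ε)/log(1/q)⌉. The gap reduction itself is mocked below, and q, ∆_1 and ε are invented for illustration.

```python
import math

# Mock outer loop of a level-type method: each "call" of the gap reduction
# procedure shrinks the optimality gap by the factor q, so the number of
# calls to reach tolerance eps is ceil(log(gap1/eps)/log(1/q)).
q, eps = 0.5, 1e-3
gap, calls = 1.0, 0          # initial gap Delta_1 = 1
while gap > eps:
    gap *= q                 # stand-in for one call of the gap reduction
    calls += 1

predicted = math.ceil(math.log(1.0/eps) / math.log(1.0/q))
```

The loop count matches the closed-form prediction, which is exactly the logarithmic factor appearing in the outer-loop complexity.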
We present a few remarks on the iteration complexity of the MAPL method.

Remark 1.
According to the classic complexity theory [32] for CP problem (7), the number of iterations required to find an ε-solution, i.e., an approximate solution x ∈ X such that f(x) − f* ≤ ε, does not exceed O(1/ε²) if f is a general non-smooth Lipschitz continuous convex function. For smooth convex optimization, the optimal iteration complexity bound is O(1/ε^{1/2}). Furthermore, in the case that f is weakly smooth and its gradient is Hölder continuous, the optimal iteration complexity is bounded by O((1/ε)^{2/(1+3ρ)}) for some ρ ∈ (0, 1). It follows from Theorem 1 that the iteration complexity bound of the MAPL method is O((1/ε)^{2/(1+3ρ)}) for ρ ∈ [0, 1]. In other words, the MAPL method achieves the uniformly optimal iteration complexity bounds for solving non-smooth (ρ = 0), smooth (ρ = 1) and weakly smooth (ρ ∈ (0, 1)) CP problems.
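The way the single exponent interpolates the three classical regimes can be checked directly; the snippet below evaluates (1/ε)^{2/(1+3ρ)} (constants suppressed) at the endpoints and at an interior ρ, with purely illustrative numbers.

```python
import math

# The uniform bound (1/eps)^(2/(1+3*rho)) from Remark 1 interpolates the
# classical optimal complexities: rho = 0 (non-smooth) gives 1/eps^2 and
# rho = 1 (smooth) gives 1/eps^(1/2).  Constants are suppressed.
def iteration_bound(rho, eps):
    return (1.0/eps)**(2.0/(1.0 + 3.0*rho))

eps = 1e-2
assert math.isclose(iteration_bound(0.0, eps), (1/eps)**2)
assert math.isclose(iteration_bound(1.0, eps), (1/eps)**0.5)
# weakly smooth cases lie strictly in between
assert (1/eps)**0.5 < iteration_bound(0.5, eps) < (1/eps)**2
```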

The Modified Uniform Smoothing Level (MUSL) Method
In this section we consider the objective function f in (7) of the form

f(x) := f̂(x) + F(x), (32)

where f̂ : X → R is Lipschitz continuous and simple, and F(x) has the special structure

F(x) := max_{y∈Y} { ⟨Ax, y⟩ − ĝ(y) }. (33)

Here, the compact convex set Y ⊂ R^m is nonempty, A : R^n → R^m is a linear operator and ĝ : Y → R is convex and continuous on Y.
Generally, the function F is convex and non-smooth. In this case, F can be approximated by a series of smooth convex functions [16,17]. Let v(y) denote the prox-function of the compact convex set Y with modulus σ_v, and let c_v := argmin_{y∈Y} v(y) denote the prox-center of v(y). Let V(y) := v(y) − v(c_v) − ⟨∇v(c_v), y − c_v⟩. The functions F(x) and f(x) are approximated by F_η(x) and f_η(x) (with some smoothing parameter η > 0), respectively, i.e.,

F_η(x) := max_{y∈Y} { ⟨Ax, y⟩ − ĝ(y) − η V(y) } and f_η(x) := f̂(x) + F_η(x). (34)

As described in [17], from the first-order optimality conditions, the convexity of ĝ(y) and the strong convexity of V(y), we know that the gradient of F_η(x) is Lipschitz continuous with Lipschitz constant

L_η := ‖A‖² / (η σ_v). (35)

Furthermore, we have

F_η(x) ≤ F(x) ≤ F_η(x) + η D_v,Y and f_η(x) ≤ f(x) ≤ f_η(x) + η D_v,Y (36)

for any x ∈ X. Inspired by the smoothing technique of [16,17], Lan [24] proposed the uniform smoothing level (USL) method for solving the structured non-smooth CP problem (7) with f defined by (32).
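A one-dimensional instance makes the construction (34)-(36) concrete: taking F(x) = |x| = max_{|y|≤1} xy (so ĝ = 0 and A = 1) with prox-function v(y) = y²/2 on Y = [−1, 1] (σ_v = 1, prox-center 0, D_v,Y = 1/2) yields the familiar Huber function as F_η. The sketch below verifies the sandwich bound and the 1/η gradient Lipschitz constant numerically; the grid and η are arbitrary.

```python
import numpy as np

# Nesterov smoothing of F(x) = |x| = max_{|y|<=1} x*y with V(y) = y^2/2:
# the maximizer has a closed form, and F_eta is the Huber function.
eta = 0.1
D_vY = 0.5     # max of V over Y = [-1, 1]

def F(x):
    return np.abs(x)

def F_eta(x):
    # closed form of max_{|y|<=1} x*y - (eta/2)*y^2
    return np.where(np.abs(x) <= eta, x**2/(2*eta), np.abs(x) - eta/2)

xs = np.linspace(-2, 2, 2001)
# sandwich bound (36): F_eta <= F <= F_eta + eta*D_vY
assert np.all(F_eta(xs) <= F(xs) + 1e-12)
assert np.all(F(xs) <= F_eta(xs) + eta*D_vY + 1e-12)

# the gradient of F_eta is Lipschitz with constant L_eta = |A|^2/(eta*sigma_v)
g = np.gradient(F_eta(xs), xs)
L_num = np.max(np.abs(np.diff(g)/np.diff(xs)))   # numerical estimate of L_eta
```

The numerical curvature estimate comes out at 1/η = 10, matching (35) for this instance.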
The advantage of the USL method is that the parameter η and the estimate of D_v,Y can be automatically adjusted and obtained during the gap reduction procedure G_USL, which makes the USL method free of problem parameters. However, similar to the APL method, each iteration of the USL method also involves two subproblems. Based on the USL method, and combining the analysis of the MAPL method in the previous section, we propose the modified uniform smoothing level (MUSL) method. Like the USL method, the MUSL method achieves the optimal iteration complexity when solving problem (7) with (32), but only one subproblem needs to be solved in each iteration.
We next describe the gap reduction procedure G MUSL of MUSL method. As the internal procedure of MUSL method, G MUSL is called to compress the gap between upper and lower bounds of f * .
Step 1: (Update level set) Let Step 2: (Update prox-center and lower bound) Let If X_k = ∅, then stop the procedure and output p⁺ = x_{k−1}^u, lb⁺ = ℓ, D̄⁺ = D̄.

Step 3: (Update upper bound) Let x̃_k^u and f̄_k := f(x_k^u). Check the following two possible stopping rules: Step 4: (Update localizer) Choose an arbitrary X_k such that X̲_k ⊆ X_k ⊆ X̄_k, where Step 5: (Loop) Set k = k + 1 and return to Step 1.
The following are a few remarks on the procedure G_MUSL, including its differences from G_MAPL and some properties. Firstly, compared with G_MAPL, the procedure G_MUSL needs one more input parameter D̄, which is used to calculate the smoothing parameter η so that the approximation f_η of f can be defined. Secondly, the two procedures approximate the objective function in different ways: G_MAPL approximates the objective f by the linearization of f in (18), while G_MUSL linearizes only the function F_η and approximates f with h_η (40). According to (40), the convexity of F_η, (36) and (32), we have which means that the function h_η(x_k^l, x) is a lower estimate of f(x). Thirdly, the procedure G_MUSL has one more exit than G_MAPL, with three possible exits: Step 2, Step 3a, and Step 3b. If the procedure stops at Step 2, the lower bound lb is updated to ℓ; if it stops at Step 3a, we say that a significant improvement has been made on the upper bound of f* and update the upper bound f̄_k. In addition, if the procedure stops at Step 3b, it is considered that there is no significant improvement on the upper bound of f*; thus the parameter D̄ needs to be adjusted to estimate D_v,Y, and we update D̄ to 2D̄.
Referring to the work of [24,31], the following lemma gives some simple observations of procedure G MUSL . Its proof can be found in Appendix A.

Lemma 2.
The following results hold for the internal procedure G_MUSL: a. If G_MUSL terminates at Step 2 or Step 3a, we have f(p⁺) − lb⁺ ≤ q [f(p) − lb]. Similarly to the previous section, we now build the convergence results for the procedure G_MUSL. Referring to the theoretical analysis of [24], the detailed derivations of the iteration complexity are given as follows.

Proposition 2.
Let {α_k} satisfy (24) and take γ = 1. Let (x_k^l, x_k, x_k^u) ∈ X × X × X, k ≥ 1, be the search points, ℓ be the level parameter, d_ω(·) be the prox-function and η be the smoothing parameter. Then the number of iterations of the internal procedure G_MUSL can be bounded by the expression below, in which Ω_ω,X is defined in (14).
Proof. Suppose the gap reduction procedure G_MUSL does not stop at the K-th iteration. From (34) and inequality (45), we have h_η(z, x) ≤ f_η(x), ∀z, x ∈ X. Because of (34), (40) and the Lipschitz continuity of the gradient of F_η, we have Hence, From (40), the definition of x̃_k^u (43), the convexity of f̂, (45) and (44), we have for all k ≥ 1, Moreover, by (39) and (43) we can easily obtain It then follows that Subtracting ℓ from both sides of the above inequality, then dividing both sides by α_k², we have Summing up the above inequalities over k, together with (24) and the Hölder inequality, we obtain Thus, from the relation d_ω(x_K) ≤ D_ω,X, (35) and (38), we have From the fact that the procedure does not stop at Step 3b at the K-th iteration, we obtain In conclusion, we obtain Based on the above convergence results for the procedure G_MUSL, we next give the algorithm of the MUSL method. Like the MAPL method, the MUSL method is implemented with an outer algorithmic framework and an internal gap reduction procedure G_MUSL. The outer algorithm of the MUSL method mainly determines whether the gap between the upper and lower bounds on f* has reached the given tolerance ε at the current iteration k. If the given tolerance ε is reached, the algorithm terminates and outputs an approximate optimal solution of f; otherwise, the outer algorithm calls the internal procedure G_MUSL repeatedly to compress the gap between the upper and lower bounds. The algorithmic framework of the MUSL method is as follows (Algorithm 3).
We now turn to analyze the optimal complexity bound for the MUSL method. Please note that the following results are modifications of those in [24].

Lemma 3.
Suppose that F is defined in (33) and v is a prox-function of Y with strong convexity modulus σ_v. Then we have the following relation, where F′(x_1) ∈ ∂F(x_1) denotes a subgradient of F at x_1 and D_v,Y is defined in (13).

Theorem 2.
Suppose that {α k } in procedure G MUSL is chosen to satisfy condition (24), and that γ = 1. Then for given ε > 0, the following statements hold for the MUSL method.
(1) the number of non-significant phases can be bounded by and the number of significant phases can be bounded by (2) the total number of iterations performed by the MUSL method does not exceed and Ω ω,X and Ω v,Y are defined in (14).
Furthermore, by (47) and (48) in Lemma 3 we have From (e) in Lemma 1 and the definition of ub_s and lb_s, we know that In conclusion, we obtain that (2) Let K_1 = {a_1, a_2, ..., a_m} and K_2 = {b_1, b_2, ..., b_n} denote the index sets of non-significant and significant phases, respectively. For any 1 ≤ i ≤ m in the non-significant phases, we have D̄_{a_{i+1}} = 2 D̄_{a_i}. Then we know that D̄_{a_i} = 2^{i−1} D̄_{a_1} = 2^{i−1} D̄_1 and ∆_{a_i} > ε. Due to Proposition 2, we know that the bound K decreases monotonically in its first argument and increases monotonically in its second. In addition, we conclude that the total number of iterations performed in the significant phases can be bounded by From the fact that ∆_{b_{j+1}} ≤ q ∆_{b_j}, ∀1 ≤ j ≤ n, we know that It follows from the monotonicity of K in its arguments and Proposition 2 that D̄_{b_j} ≤ 2 D_v,Y. In addition, we obtain that the total number of iterations performed in the non-significant phases can be bounded by Combining (49) and (50), we conclude that the total number of iterations of the MUSL method does not exceed In what follows, we apply the MUSL method to a special problem in which f̂ in (32) is a smooth convex function, and we establish the complexity results of the procedure G_MUSL and the MUSL method in this special case. That f̂ is a smooth convex function means that there exists a constant L_1 > 0 such that f̂ satisfies Since F_η has a Lipschitz continuous gradient with Lipschitz constant L_η > 0, we have Thus, we know that f_η := f̂(x) + F_η(x) is a smooth function on X and satisfies where L = L_1 + L_η. From the fact that F is non-smooth, we know that f is non-smooth as well.
We make the following changes to the outer algorithm of the MUSL method and the internal procedure G_MUSL. Replace (47) in the outer algorithm with and (40) in the procedure G_MUSL with The function f_η is fixed, since η is fixed within the internal procedure G_MUSL. Similar to Proposition 2, we have the following results.
Theorem 3. Assume that {α_k} satisfies condition (24) and take γ = 1. Let ℓ be the level parameter, d_ω(·) the prox-function and η the smoothing parameter in the procedure G_MUSL. If f̂ is a smooth function, then the number of iterations of the procedure G_MUSL can be bounded by the expression below, where Ω_ω,X is defined in (14).
Proof. Suppose that the procedure G_MUSL does not stop at iteration K. Because of the properties of f̂ and F_η, we know that f_η satisfies (52). Let L_η = L. From (14), we have Then, according to the stopping test in Step 3b of the procedure G_MUSL, we have Due to L = L_1 + L_η, (35) and (38), we obtain Thus, we have Here we also establish the complexity of the MUSL method in this special case where f̂ is smooth and convex. Theorem 4. Assume that {α_k} satisfies condition (24) and take γ = 1. If f̂ is a smooth convex function, then the following statements hold for the MUSL method.
(1) The number of non-significant phases can be bounded by and the number of significant phases can be bounded by (2) The total number of iterations performed by the MUSL method does not exceed Proof. Let ∆_s = ub_s − lb_s, s ≥ 1. Similar to Theorem 2, when a non-significant phase occurs we have S < log 2 . By (53), (48) and (51) we have From (e) in Lemma 1 and the definitions of ub_s and lb_s, we know that In conclusion, we obtain that (2) Let K_1 = {c_1, c_2, ..., c_m} and K_2 = {d_1, d_2, ..., d_n} denote the index sets of the non-significant and significant phases, respectively. For any 1 ≤ i ≤ m in the non-significant phases, we have Q_{c_{i+1}} = 2Q_{c_i}. Then we know that Q_{c_i} = 2^{i−1}Q_{c_1} = 2^{i−1}Q_1 and ∆_{c_i} > ε. By Proposition 2, K is monotonically decreasing in its first argument and monotonically increasing in its second. It follows that the total number of iterations performed in the non-significant phases can be bounded by From the fact that ∆_{d_{j+1}} ≤ q∆_{d_j}, ∀1 ≤ j ≤ n, we know that By the monotonicity of K and the fact that Q_{d_j} ≤ 2Q_{v,Y}, the total number of iterations performed in the significant phases can be bounded by This, together with (54) and (55), shows that the total number of iterations of the MUSL method does not exceed N_1 + N_2.

Two-Stage Stochastic Programming and Numerical Experiments
In this section, the MAPL and MUSL methods are applied to solve two-stage stochastic programming problems, and the two modified methods are compared with the APL and USL methods. All algorithms are implemented in MATLAB (R2014a), and Mosek (8.0) is called to solve the subproblems. The experiments are run on Windows 7 (64-bit) with an Intel(R) Core(TM) i7-6700 CPU at 3.40 GHz and 16 GB of memory.
Consider the following two-stage stochastic programming problem with recourse, where Q(x, ξ) = min{q^T y : Wy = h − Tx, y ≥ 0}. Here, x ∈ R^{n_1} and y ∈ R^{n_2} are the decision variables of the first-stage and second-stage problems, respectively, X ≠ ∅ is a compact convex set, and ξ = (q, h, T) is a random vector with a known probability distribution and support Ξ. In [33] it was pointed out that, by strong duality, one has Q(x, ξ) = max{π^T(h − Tx) : π ∈ Π(q)}, where Π(q) = {π ∈ R^{m_2} : W^T π ≤ q} is the feasible set of (58), and we assume that Π(q) ≠ ∅ for all q ∈ Ξ. It is easy to see that Q(x, ξ) is generally non-smooth; therefore, for general distributions of ξ, (56) is a non-smooth convex programming problem. In [33] Ahmed applied Nesterov's smoothing technique to the two-stage stochastic programming problem (56) and established a proper smooth approximation to Q(x, ξ) as follows: for a given smoothing parameter µ, consider the function f_µ(x) obtained by replacing Q(x, ξ) with Q_µ(x, ξ) = max{π^T(h − Tx) − µ d(π) : π ∈ Π(q)}, where d(·) is a strongly convex prox-function on Π(q). For ξ with a discrete distribution, it is shown in [34,35] that f_µ is differentiable and its gradient is Lipschitz continuous. Furthermore, when µ is sufficiently small, f_µ approximates f uniformly. Based on the above analysis, we next apply the MAPL and MUSL methods to problems (59) and (60) to illustrate their effectiveness and compare them with the APL and USL methods.
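To make the recourse structure concrete, the sketch below evaluates Q(x, ξ) by solving the second-stage LP and averages over a finite scenario set, as in the sample-average form of (56). This is a minimal illustration on synthetic data; the function names `recourse_value` and `saa_objective` and the toy instance are ours, not part of the paper's MATLAB/Mosek implementation, and SciPy's LP solver stands in for Mosek.

```python
import numpy as np
from scipy.optimize import linprog

def recourse_value(x, q, h, T, W):
    """Second-stage value Q(x, xi) = min { q^T y : W y = h - T x, y >= 0 }."""
    res = linprog(c=q, A_eq=W, b_eq=h - T @ x,
                  bounds=[(0, None)] * len(q), method="highs")
    return res.fun

def saa_objective(x, c, scenarios, W):
    """Sample-average objective c^T x + (1/N) * sum_i Q(x, xi_i)
    over equiprobable scenarios xi_i = (q_i, h_i, T_i)."""
    vals = [recourse_value(x, q, h, T, W) for (q, h, T) in scenarios]
    return float(c @ x + np.mean(vals))

# Toy one-dimensional instance (synthetic data, for illustration only).
W = np.array([[1.0]])
scenarios = [(np.array([2.0]), np.array([3.0]), np.array([[1.0]])),
             (np.array([1.0]), np.array([2.0]), np.array([[1.0]]))]
c = np.array([1.0])
x = np.array([1.0])
print(saa_objective(x, c, scenarios, W))  # -> 3.5
```

For the first scenario, W y = 3 − 1 forces y = 2 at cost 4; for the second, y = 1 at cost 1; the objective is 1 + (4 + 1)/2 = 3.5.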
We perform numerical experiments on existing SP instances from [1,36,37], including a telecommunication design problem (SSN) and a motor freight carrier routing problem (20Term). The SSN problem, studied by Sen, Doverspike, and Cosares [1], comes from the telecommunications industry: the first-stage problem allocates capacity among network links, and the second-stage problem routes the demands, i.e., the connections requested between point-to-point pairs. The 20Term problem, studied by Mak, Morton, and Wood [37], comes from a motor freight carrier's model: the first-stage problem determines a program of carriers, and the second-stage problem adjusts the program according to a multi-commodity network. The instances of SSN, 20Term and Storm are downloaded from http://pwp.gatech.edu/guanghui-lan/computer-codes/. The dimensions of these instances are shown in Table 1, where n_i denotes the number of constraints and m_i the number of variables in the i-th stage problem. In particular, we assume that the number of possible realizations of ξ is fixed, i.e., N = 50 or 100; in this case, a total of five instances are tested, and the integers in brackets are the numbers of possible realizations. For given parameters λ ∈ (0, 1), θ ∈ (0, 1), tolerance ε = 1.0e−6 and stepsize α_k = 2/(k + 1) satisfying (24), we compare the number of iterations and the CPU time of the different methods. The results are shown in Tables 2–4, on which we make some observations. When the initial gap ∆ is large (up to 10^2 or 10^7), the MAPL and APL algorithms are run for 400 iterations. The results in Table 2 show that the MAPL method has certain advantages over the APL method in terms of CPU time and number of iterations. The results in Table 3 show that, in addition to its advantage in CPU time, the MUSL algorithm can achieve higher accuracy.
From the results in Table 4 we see that, compared with the MUSL algorithm, the MAPL algorithm requires less CPU time and fewer iterations. This is because the MAPL method solves 2N linear programming subproblems in the course of the algorithm, while the MUSL method needs to solve N smooth quadratic programming subproblems, and the latter are more expensive.
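The quadratic subproblem at the heart of BL-type methods is a Euclidean projection onto the level set of the current cutting-plane model. A minimal sketch of that projection, using SciPy's general-purpose SLSQP solver in place of Mosek; `level_projection` and the toy cut data below are our own illustration, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import minimize

def level_projection(x_bar, cuts, level, bounds):
    """Project x_bar onto { x : f_i + g_i^T (x - x_i) <= level for every
    bundle cut (f_i, g_i, x_i) } within the box 'bounds' (stand-in for X)."""
    cons = [{"type": "ineq",
             # SLSQP expects constraints of the form fun(x) >= 0.
             "fun": lambda x, f=f, g=g, xi=xi: level - (f + g @ (x - xi))}
            for (f, g, xi) in cuts]
    res = minimize(lambda x: 0.5 * np.sum((x - x_bar) ** 2), x0=x_bar,
                   constraints=cons, bounds=bounds, method="SLSQP")
    return res.x

# One linearization f(x) ~ 0 + 1*(x - 0) = x, level l = 1, prox-center 3:
# the projection onto { x : x <= 1 } from 3 is x = 1.
x_proj = level_projection(np.array([3.0]),
                          [(0.0, np.array([1.0]), np.array([0.0]))],
                          level=1.0, bounds=[(-10.0, 10.0)])
print(x_proj)  # -> approximately [1.0]
```

In the MUSL method such a projection is solved once per scenario-coupled model, which is the source of the N quadratic programs mentioned above.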

Conclusions
In this paper, we presented two modified BL-type methods, the modified accelerated prox-level (MAPL) and modified uniform smoothing level (MUSL) methods, for uniformly solving black-box CP problems and a class of structured non-smooth problems, and showed that both methods achieve the optimal iteration complexity. To illustrate their effectiveness, the modified methods were applied to two-stage stochastic programming with recourse, and numerical experiments were carried out. The numerical results showed that the MAPL and MUSL methods have certain advantages in algorithm efficiency and solution time.