A Modified Three-Term Conjugate Gradient Algorithm for Large-Scale Nonsmooth Convex Optimization

It is well known that Newton and quasi-Newton algorithms are effective for small- and medium-scale smooth problems because they make full use of the gradient information of the objective function, but they fail on nonsmooth problems. Bundle methods, built on the concept of the 'bundle', successfully address both smooth and nonsmooth complex problems, but they are effective only for small- and medium-scale models since they must store and update the information collected in the bundle. The conjugate gradient algorithm is effective for both large-scale smooth and nonsmooth optimization models because of its simplicity: it uses only the objective function's gradient information, combined here with the technique of Moreau-Yosida regularization. Thus, a modified three-term conjugate gradient algorithm is proposed; its search direction possesses a sufficient descent property and a trust region character. The algorithm is globally convergent under mild assumptions, and numerical tests show that it is more efficient than similar optimization algorithms.


Introduction
Optimization models are among the most important problems in applied mathematics because they are widely used in many fields [Liu, Yong, Gao et al. (2018); Liu, Chen, Ji et al. (2017); Melro and Jensen (2017)], and nonsmooth models have become increasingly prominent because they arise in diverse applications and are both urgent and complex to solve [Bonettini, Loris and Porta (2015); Li, Li and Yang et al. (2017); Shi, Tuan, Su et al. (2017); Zhang, Wu and Nguyen (2014)]. Researchers have paid much attention to this class of problems and have obtained fruitful results. In Birge et al. [Birge, Qi and Wei (1998)], an algorithm based on the classical proximal point was proposed, and numerical tests showed it to be effective because the proximal point simplifies the evaluation of the objective function. Sagara et al. [Sagara and Fukushima (2017)] and Yuan et al. [Yuan, Wei and Wang (2013)] proposed trust region algorithms that mainly control the search region and its radius through relevant parameters. After the concept of the 'bundle' was proposed, scholars introduced bundle algorithms for solving optimization models. The bundle algorithm makes full use of the bundle's information about the limiting variable conditions and has wide applications [Albaali, Spedicato and Maggioni (2014); Hall, Zimbro, Maduro et al. (2017); Luo, Tang and Zhou (2008); Kim, Um, Suh et al. (2016); Xu, Zhang, Du et al. (2016)]. For small- and medium-scale smooth and nonsmooth problems, the bundle algorithm is effective, but it fails on large-scale problems. The conjugate gradient algorithm has become increasingly popular because of its simplicity and high efficiency [Dai, Liu, Zhang et al. (2017); Fasi, Langou and Robert (2016); Li, Zhang and Dong (2016); Wang and Zhu (2016); Yuan and Sheng (2017); Yuan, Meng and Li (2016); Yuan and Zhang (2015); Yuan, Wei and Li (2014); Yuan and Lu (2009)] and its wide applications (see Baggio et al. [Baggio, Franceschini, Spiezia et al. (2017); Bernaschi, Bisson, Fantozzi et al. (2016); Janna (2016); Liu and Weng (2016); Sarkar (2016); Tarzanagh, Nazari and Peyghami (2016); Zhang and Liu (2016)]). A general conjugate gradient algorithm has two ingredients: an exact or inexact line search technique and a search direction; different choices of these two ingredients yield different conjugate gradient algorithms. The conjugate gradient algorithm addresses not only small- and medium-scale smooth and nonsmooth problems but also large-scale optimization models, since it avoids computing and storing complex matrices and information about the limiting variable boundary.

On the basis of the above discussion, this paper proposes a new three-term conjugate gradient algorithm for large-scale nonsmooth optimization models. The proposed algorithm has the following properties:
1. It combines the steepest descent method with the conjugate gradient algorithm by introducing a parameter into the search direction.
2. The search direction possesses a sufficient descent character and a trust region trait.
3. The algorithm achieves global convergence under mild assumptions.
4. Numerical tests show that the algorithm outperforms similar optimization algorithms.

The rest of the paper is organized as follows. The next section presents the model of nonsmooth convex optimization. The proposed algorithm is introduced in Section 3 and its main properties are established in Section 4. Numerical results are reported in Section 5, and relevant references are listed in Section 6.

The model of nonsmooth convex functions
Consider the general nonsmooth optimization model
$$\min_{x \in \mathbb{R}^n} v(x), \qquad (1)$$
where $v:\mathbb{R}^n \to \mathbb{R}$ does not always satisfy the smoothness property. This means that Newton and quasi-Newton algorithms fail to solve it because of the nonsmoothness. Thus, the technique named 'Moreau-Yosida' regularization is introduced in this paper: its main idea is to replace the objective by a regularized envelope in an auxiliary variable, so that the objective model is translated into a smooth convex model. The Moreau-Yosida regularization of (1) is
$$v_M(x) = \min_{z \in \mathbb{R}^n} \Big\{ v(z) + \frac{1}{2\phi}\|z - x\|^2 \Big\}, \qquad (2)$$
where $\phi$ is a positive constant and $\|\cdot\|$ denotes the Euclidean norm. This paper denotes $v_M(x)$ as the Moreau-Yosida regularization of the objective model, whose good properties have been established in related papers [Wang and Zhu (2016)]. This paper assumes that the objective function of (1) is convex; then the minimization in (2) has a unique solution $p(x)$, and
$$\nabla v_M(x) = \frac{x - p(x)}{\phi}$$
is the gradient function of $v_M$. By virtue of relevant papers, $v_M$ not only satisfies the smoothness property but also has a Lipschitz continuous gradient $\nabla v_M$. It is easy to see that (1) and (2) are equivalent in the sense that they have identical solution sets; thus this paper mainly considers the model (2), since it is an optimization model with these good properties.
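As a concrete illustration of (2), the following minimal Python sketch assumes the simple choice $v(z) = \|z\|_1$, whose minimizer $p(x)$ in (2) is the well-known soft-thresholding operator; the names `prox_l1` and `moreau_yosida` are illustrative only and not part of the paper. The sketch evaluates $v_M(x)$ and its gradient $\nabla v_M(x) = (x - p(x))/\phi$.

```python
import numpy as np

def prox_l1(x, phi):
    # Soft-thresholding: argmin_z { ||z||_1 + (1/(2*phi)) * ||z - x||^2 }
    return np.sign(x) * np.maximum(np.abs(x) - phi, 0.0)

def moreau_yosida(x, phi):
    # Moreau-Yosida regularization v_M(x) and its gradient for v(z) = ||z||_1.
    p = prox_l1(x, phi)                       # unique minimizer p(x) of (2)
    v_M = np.abs(p).sum() + np.dot(p - x, p - x) / (2.0 * phi)
    grad = (x - p) / phi                      # grad v_M(x) = (x - p(x)) / phi
    return v_M, grad

x = np.array([1.5, -0.2, 0.0, 3.0])
val, g = moreau_yosida(x, phi=1.0)
print(val, g)
```

For this particular $v$, the envelope $v_M$ is the smooth Huber-type function, which illustrates how (2) turns a nonsmooth model into a smooth one with the same minimizers.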

Three-term conjugate gradient algorithm
It is well known that a good optimization algorithm requires not only a good search direction but also an appropriate exact or inexact line search technique. The three-term conjugate gradient algorithm has seen extensive study and has produced strong theoretical results. In view of the work in Al-Baali [Albaali (1985)], Gilbert and Nocedal [Gilbert and Nocedal (1990)], and Touati-Ahmed and Storey [Touati-Ahmed and Storey (1990)] on conjugate gradient methods, the sufficient descent condition and the trust region character are crucial to global convergence. The classical formula for the search direction $d_{k+1}$ is
$$d_{k+1} = -g_{k+1} + \beta_k d_k, \qquad d_0 = -g_0. \qquad (3)$$
Zhang et al. [Zhang, Zhou and Li (2006)] proposed the following three-term formula:
$$d_{k+1} = -g_{k+1} + \frac{g_{k+1}^T y_k}{\|g_k\|^2} d_k - \frac{g_{k+1}^T d_k}{\|g_k\|^2} y_k. \qquad (4)$$
In Nazareth [Nazareth (1977)], another three-term search direction was proposed, computed by
$$d_{k+1} = -y_k + \frac{y_k^T y_k}{d_k^T y_k} d_k + \frac{y_{k-1}^T y_k}{d_{k-1}^T y_{k-1}} d_{k-1}, \qquad (5)$$
where $y_k = g_{k+1} - g_k$, $g_k$ is the gradient value at the point $x_k$, and $d_0 = -g_0$.
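As a reference point for the comparison algorithm used later, the following minimal sketch implements the three-term direction (4); the array names `g_new`, `g_old`, and `d_old` are illustrative and stand for $g_{k+1}$, $g_k$, and $d_k$.

```python
import numpy as np

def zzl_direction(g_new, g_old, d_old):
    # Three-term direction of Zhang, Zhou and Li (2006), formula (4):
    # d_{k+1} = -g_{k+1} + beta_k * d_k - theta_k * y_k, with
    # y_k = g_{k+1} - g_k, beta_k = g_{k+1}^T y_k / ||g_k||^2,
    # theta_k = g_{k+1}^T d_k / ||g_k||^2.
    y = g_new - g_old
    denom = np.dot(g_old, g_old)
    beta = np.dot(g_new, y) / denom
    theta = np.dot(g_new, d_old) / denom
    return -g_new + beta * d_old - theta * y
```

A short calculation shows that this direction satisfies $d_{k+1}^T g_{k+1} = -\|g_{k+1}\|^2$, which is exactly the kind of sufficient descent behavior that the following sections require of the proposed direction.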

Objective algorithm
By virtue of the above discussion, this paper proposes a new three-term conjugate gradient algorithm. Its search direction is defined by formula (6), which introduces a parameter that combines the steepest descent direction with the conjugate gradient direction, and its step length $\alpha_k$ is determined by a modified Armijo-type inexact line search (7). The specific algorithm (denoted as Algorithm 2.1) is as follows.
Step 1: (Initiation) Choose an initial point $x_0 \in \mathbb{R}^n$ and a tolerance $\varepsilon > 0$, set $d_0 = -\nabla v_M(x_0)$ and $k := 0$.
Step 2: If $\|\nabla v_M(x_k)\| \le \varepsilon$ holds, the algorithm stops.
Step 3: Compute $\alpha_k$ by the line search (7).
Step 4: Set the new iteration point $x_{k+1} = x_k + \alpha_k d_k$.
Step 5: Update the search direction $d_{k+1}$ by (6).
Step 6: If $\|\nabla v_M(x_{k+1})\| \le \varepsilon$ holds, the algorithm stops; otherwise go to the next step.
Step 7: Let $k := k+1$ and go to Step 2.
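Since the precise formulas (6) and (7) are referenced rather than displayed in this extract, the following schematic Python outline of Algorithm 2.1 uses a generic direction-update callable in place of (6) and a standard Armijo backtracking rule in place of the modified line search (7); the function name `algorithm_2_1`, the arguments `vM`, `grad_vM`, `direction_update`, and the default parameter values are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def algorithm_2_1(vM, grad_vM, direction_update, x0, eps=1e-6,
                  max_iter=10000, rho=0.5, sigma=1e-4, max_inner=5):
    # vM, grad_vM      : the Moreau-Yosida regularization v_M and its gradient
    # direction_update : callable standing in for the search direction formula (6)
    x = np.asarray(x0, dtype=float)
    g = grad_vM(x)
    d = -g                                             # initial steepest-descent direction
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps:                   # Steps 2/6: stopping rule
            return x, k
        # Step 3: Armijo-type backtracking, standing in for the line search (7)
        alpha, inner = 1.0, 0
        while (vM(x + alpha * d) > vM(x) + sigma * alpha * np.dot(g, d)
               and inner < max_inner):
            alpha *= rho
            inner += 1
        x_new = x + alpha * d                          # Step 4: new iterate
        g_new = grad_vM(x_new)
        d = direction_update(g_new, g, d)              # Step 5: update direction via (6)
        x, g = x_new, g_new
    return x, max_iter
```

For instance, small wrappers around the `moreau_yosida` sketch of Section 2 and the `zzl_direction` sketch above could supply `vM`, `grad_vM`, and `direction_update`, giving a runnable stand-in for the method on the $\ell_1$ envelope.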

Important characteristics and global convergence
The search direction of Algorithm 2.1 possesses a sufficient descent character and a trust region trait without any additional conditions; this section states these properties and presents the relevant analysis.
Specifically, for the search direction $d_k$ generated by (6) there exist positive constants $c_1$ and $c_2$ such that
$$d_k^T \nabla v_M(x_k) \le -c_1 \|\nabla v_M(x_k)\|^2 \qquad (8)$$
and
$$\|d_k\| \le c_2 \|\nabla v_M(x_k)\| \qquad (9)$$
hold for all $k$. To save space, we merely state these relations and omit the proof. The formulas (8) and (9) express that the algorithm has a sufficient descent character and a trust region trait, respectively. In the rest of this section, we analyze the proposed algorithm (Algorithm 2.1) and prove the existence and necessity of the step length $\alpha_k$ in the inexact line search.
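To make the meaning of (8) and (9) concrete, the following small check uses the representative Zhang-Zhou-Li direction (4) rather than the paper's direction (6), together with random test vectors; it verifies the exact descent identity behind (8) and reports the ratio that the trust region property (9) bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    g_old, g_new, d_old = rng.standard_normal((3, 10))
    y = g_new - g_old
    denom = g_old @ g_old
    d_new = -g_new + (g_new @ y) / denom * d_old - (g_new @ d_old) / denom * y
    # Sufficient descent (8): for direction (4), d_{k+1}^T g_{k+1} = -||g_{k+1}||^2
    # exactly, so (8) holds with c_1 = 1 for this direction.
    print("descent identity:", np.isclose(d_new @ g_new, -(g_new @ g_new)))
    # Trust region (9) asserts ||d_{k+1}|| <= c_2 ||g_{k+1}||; print the observed ratio.
    print("ratio ||d||/||g||:", np.linalg.norm(d_new) / np.linalg.norm(g_new))
```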
Theorem 4.1. If Assumptions (i)-(ii) hold, then there exists a step length $\alpha_k$ satisfying the line search rule (7).

Proof: To prove this conclusion, we introduce an auxiliary function and show that the inequality required in (7) holds for all sufficiently small positive step lengths, which means that the modified Armijo line search technique is well defined.

From the above discussion, Algorithm 2.1 has a sufficient descent trait and a trust region character, and its line search technique is reasonable and well defined; we can therefore state the global convergence theorem.

Assumption: By virtue of the Moreau-Yosida regularization technique, the function $\nabla v_M(x)$ is Lipschitz continuous; that is, there exists a positive constant $\kappa$ such that
$$\|\nabla v_M(x) - \nabla v_M(y)\| \le \kappa \|x - y\| \quad \text{for all } x, y \in \mathbb{R}^n.$$

Theorem 4.2. If the relevant assumptions hold and the sequences $\{x_k\}$, $\{d_k\}$, $\{\alpha_k\}$ are generated by Algorithm 2.1, then
$$\liminf_{k \to \infty} \|\nabla v_M(x_k)\| = 0.$$

Proof: We argue by contradiction. Suppose the conclusion does not hold; then there exist a constant $\varepsilon_* > 0$ and an index $k_*$ such that
$$\|\nabla v_M(x_k)\| \ge \varepsilon_* \quad \text{for all } k \ge k_*. \qquad (12)$$
On the basis of the line search technique (7), we obtain the bound (13), which implies that $\alpha_k \|d_k\| \to 0$. By virtue of (8), we divide the argument into two cases. If $\alpha_k$ is bounded away from zero, then $\|d_k\| \to 0$, and (8) forces $\|\nabla v_M(x_k)\| \to 0$. Otherwise, by the algorithm's construction the trial step $\alpha_k/\gamma$ does not satisfy (7); considering the formulas (8) and (9), this yields a positive lower bound on the accepted step length. On the basis of the mean-value theorem, the descent property, and the continuity of the objective function, both cases contradict (12). Thus the original conclusion holds, and the proof is complete.

Numerical results of nonsmooth functions
This section consists of two parts: the test problems and the corresponding numerical results. To measure the algorithm's efficiency, we compare Algorithm 2.1 with Algorithm 1 in terms of NI, NF, and CPU on the test problems listed in Tab. 1, where NI, NF, and CPU denote the number of iterations, the number of objective function evaluations, and the computing time needed to solve the test problems (in seconds), respectively. Algorithm 1 differs from the proposed algorithm only in the formula for $d_{k+1}$, which is determined by (4); the remainder of Algorithm 1 is identical to Algorithm 2.1.
Stopping rule: The algorithm stops if $\|\nabla v_M(x_k)\| \le \varepsilon$ holds, if the total number of iterations exceeds 10000, or if the number of inner iterations for the step length $\alpha_k$ exceeds 5.    On the basis of the numerical tests, it is clear that the discussed algorithms successfully address large-scale nonsmooth problems and are valuable for solving complex problems. In Figs. 1-3, the red curve of the proposed algorithm (Algorithm 2.1) lies above the others, since its number of iterations (NI), number of function evaluations (NF), and computing time (CPU) are better than those of the other algorithms. The initial points of the curves in the three figures are close to 1.0, which indicates that the proposed algorithm is effective in solving challenging problems. It is also notable that the red curve rises rapidly and is very smooth, which is important for a good optimization algorithm. The proposed algorithm prevents the denominator of the search direction formula from being zero, so it can efficiently solve more complex problems.