Globally convergent three-term conjugate gradient projection methods for solving nonlinear monotone equations

In this paper, we propose two derivative-free conjugate gradient projection methods for solving large-scale systems of nonlinear monotone equations. The proposed methods are shown to satisfy the sufficient descent condition, and their global convergence is established. The methods are then tested on a number of benchmark problems from the literature, and preliminary numerical results indicate that they are efficient for solving large-scale problems and are therefore promising.

and conjugate gradient methods [8]. All these methods are iterative; that is, starting with $x_k$, the next iterate is found by
$$x_{k+1} = x_k + \alpha_k d_k,$$
where $d_k$ is the search direction and $\alpha_k$ is the step length. The gradient projection method [1,6,12-15,19,21,25] is among the most effective methods for solving systems of large-scale nonlinear monotone equations. The projection concept was first proposed by Goldstein [6] for convex programming in Hilbert spaces. It was then extended by Solodov and Svaiter [17]. In their paper, Solodov and Svaiter constructed a hyperplane
$$H_k = \left\{x \in \mathbb{R}^n : F(z_k)^T (x - z_k) = 0\right\}$$
which strictly separates the current iterate $x_k$ from the solution set of Problem (1.1), where $z_k = x_k + \alpha_k d_k$ is generated by performing some line search along the direction $d_k$ such that
$$F(z_k)^T (x_k - z_k) > 0.$$
The hyperplane $H_k$ strictly separates $x_k$ from the solutions of Problem (1.1), and from the monotonicity of $F$ we have, for any $x^*$ such that $F(x^*) = 0$,
$$F(z_k)^T (x^* - z_k) \leq 0.$$
Now, after constructing this hyperplane, Solodov and Svaiter's next iteration point $x_{k+1}$ is obtained by projecting $x_k$ onto $H_k$ as
$$x_{k+1} = x_k - \frac{F(z_k)^T (x_k - z_k)}{\|F(z_k)\|^2}\, F(z_k). \tag{1.3}$$
Recently, most research has focussed on conjugate gradient projection methods for solving Problem (1.1). For the conjugate gradient projection method, the search direction is found using
$$d_k = \begin{cases} -F_k, & k = 0,\\ -F_k + \beta_k d_{k-1}, & k \geq 1, \end{cases} \tag{1.4}$$
where $F_k = F(x_k)$ and $\beta_k$ is a parameter, and $x_{k+1}$ is obtained by (1.3). One such method is that of Sun and Liu [19], where the constrained nonlinear system (1.1) with a convex constraint set $\Omega$ is solved by
$$x_{k+1} = P_{\Omega}\!\left[x_k - \frac{F(z_k)^T (x_k - z_k)}{\|F(z_k)\|^2}\, F(z_k)\right], \tag{1.5}$$
where $z_k = x_k + \alpha_k d_k$ and $d_k$ is computed using the conjugate gradient scheme (1.4), with $\beta_k$ defined by their proposed formula in which $\lambda > 1$ is a constant. This method was shown to be globally convergent using the line search
$$-F(x_k + \alpha_k d_k)^T d_k \geq \mu \alpha_k \|F(x_k + \alpha_k d_k)\| \|d_k\|^2, \tag{1.6}$$
where $\mu > 0$, $\alpha_k = \rho^{m_k}$ and $\rho \in (0, 1)$, with $m_k$ being the smallest nonnegative integer $m$ such that (1.6) holds.
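To make the projection mechanism concrete, the following Python sketch performs one iteration of the generic scheme above: a backtracking line search of type (1.6) followed by the hyperplane projection (1.3). The mapping F, the direction d and the parameter values are illustrative assumptions, not the specific choices of any of the cited methods.

```python
import numpy as np

def projection_step(F, x, d, rho=0.7, mu=0.3, s=1.0, max_backtracks=50):
    """One iteration of a generic CG projection method:
    backtracking line search of type (1.6), then projection (1.3)."""
    alpha = s
    for _ in range(max_backtracks):
        z = x + alpha * d
        Fz = F(z)
        # Line search (1.6): -F(z)^T d >= mu * alpha * ||F(z)|| * ||d||^2
        if -Fz @ d >= mu * alpha * np.linalg.norm(Fz) * np.linalg.norm(d) ** 2:
            break
        alpha *= rho
    # Projection of x onto the hyperplane H_k = {x : F(z)^T (x - z) = 0}, Eq. (1.3)
    return x - (Fz @ (x - z)) / (Fz @ Fz) * Fz

# Illustrative use with the (assumed) monotone mapping F(x) = 2x - sin(x)
F = lambda x: 2 * x - np.sin(x)
x = np.ones(5)
x_new = projection_step(F, x, -F(x))
```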
In equation (1.5), $P_{\Omega}[\cdot]$ denotes the projection mapping from $\mathbb{R}^n$ onto the convex set $\Omega$, i.e.,
$$P_{\Omega}[x] = \arg\min\left\{\|x - y\| : y \in \Omega\right\}.$$
This is an optimization problem that minimizes the distance between $x$ and $y$, where $y \in \Omega$. Also, the projection operator is nonexpansive, i.e.,
$$\|P_{\Omega}[x] - P_{\Omega}[y]\| \leq \|x - y\| \quad \text{for all } x, y \in \mathbb{R}^n \tag{1.7}$$
(see the illustrative sketch below). Ahookhosh et al. [1] also extended the projection method of Solodov and Svaiter [17] to a three-term conjugate gradient method in which $x_{k+1}$ is given by (1.3) and $z_k$ is found using a three-term direction chosen so that the sufficient descent condition
$$F_k^T d_k \leq -c \|F_k\|^2, \quad c > 0, \tag{1.8}$$
holds. To satisfy Condition (1.8), the authors suggest two particular choices of the direction parameters, leading to two derivative-free algorithms, DFPB1 and DFPB2, respectively. Other conjugate gradient projection methods can be found in [9,12-15,18,25]. In this paper, we propose two globally convergent derivative-free conjugate gradient projection methods. The rest of the paper is structured as follows: In the next section, the motivation and the details of the proposed algorithm are given. The sufficient descent property and the global convergence of the proposed algorithm are presented in Sect. 3. Numerical results and the conclusion are presented in Sects. 4 and 5, respectively.
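Since a closed-form projection exists only for simple sets, the sketch below illustrates $P_{\Omega}$ and its nonexpansiveness (1.7) for the common special case of box constraints; the box bounds and data are assumed purely for illustration.

```python
import numpy as np

def project_box(x, lower, upper):
    """Projection onto the box {y : lower <= y <= upper}:
    the unique minimizer of ||x - y|| over the box, computed componentwise."""
    return np.minimum(np.maximum(x, lower), upper)

# Nonexpansiveness check (1.7): ||P[x] - P[y]|| <= ||x - y||
rng = np.random.default_rng(0)
lo, hi = -np.ones(4), np.ones(4)
x, y = rng.normal(size=4) * 3, rng.normal(size=4) * 3
assert np.linalg.norm(project_box(x, lo, hi) - project_box(y, lo, hi)) \
       <= np.linalg.norm(x - y)
```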

Motivation and algorithm
Before presenting the proposed methods, we first present some of the work that motivated us. We start with the work of Hager and Zhang [8], where the unconstrained minimization problem
$$\min_{x \in \mathbb{R}^n} f(x)$$
is solved, with $f : \mathbb{R}^n \to \mathbb{R}$ being a continuously differentiable function. The iterations $x_{k+1} = x_k + \alpha_k d_k$ are generated using the direction
$$d_{k+1} = -g_{k+1} + \bar{\beta}^N_k d_k, \quad d_0 = -g_0,$$
where $g_k = \nabla f(x_k)$ is the gradient of $f$ at $x_k$,
$$\bar{\beta}^N_k = \max\left\{\beta^N_k, \eta_k\right\}, \quad \beta^N_k = \frac{1}{d_k^T y_k}\left(y_k - 2 d_k \frac{\|y_k\|^2}{d_k^T y_k}\right)^T g_{k+1},$$
$$\eta_k = \frac{-1}{\|d_k\| \min\{\eta, \|g_k\|\}}, \tag{2.3}$$
$y_k = g_{k+1} - g_k$, and $\eta > 0$ is a constant. The parameter $\bar{\beta}^N_k$ satisfies the descent condition
$$g_k^T d_k \leq -c \|g_k\|^2 \tag{2.4}$$
with $c = \frac{7}{8}$, and its global convergence was established by means of the standard Wolfe line search technique. In Yuan [22], a modified $\beta^{PRP}_k$, given by (2.5), is presented, in which $\sigma > \frac{1}{4}$ is a constant and which refines an earlier modification of $\beta^{PRP}_k$ (see [22] and the references therein). Other modified formulas similar in pattern to (2.5) were proposed in [22] using $\beta^{CD}_k$, $\beta^{LS}_k$, $\beta^{DY}_k$ and $\beta^{HS}_k$. These methods satisfy the sufficient descent condition (2.4) and were also shown to converge globally under the standard Wolfe line search conditions.
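As a concrete illustration (an informal sketch of the formulas above, not Hager and Zhang's reference implementation), the truncated parameter $\bar{\beta}^N_k$ can be computed as follows:

```python
import numpy as np

def hager_zhang_beta(g_new, g_old, d, eta=0.01):
    """Truncated Hager-Zhang parameter: beta_bar = max(beta_N, eta_k),
    with eta_k = -1 / (||d|| * min(eta, ||g_old||))."""
    y = g_new - g_old
    dTy = d @ y  # assumed nonzero, as guaranteed by the Wolfe conditions
    beta_N = (y - 2.0 * d * (y @ y) / dTy) @ g_new / dTy
    eta_k = -1.0 / (np.linalg.norm(d) * min(eta, np.linalg.norm(g_old)))
    return max(beta_N, eta_k)
```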
In [2], Dai suggests that any $\beta_k$ of the form $\beta_k = g_k^T z_k$, where $z_k \in \mathbb{R}^n$ is any vector, can be modified as
$$\beta^{GSD}_k = g_k^T z_k - \sigma \|z_k\|^2\, g_k^T d_{k-1}, \tag{2.7}$$
with $\sigma > \frac{1}{4}$, and will then satisfy the sufficient descent condition (2.4) with $c = 1 - \frac{1}{4\sigma}$. In order to prove the global convergence of (2.7), the assumption that $\beta^{GSD}_k \geq \eta_k$, where $\eta_k$ is defined as in (2.3), is made in Nakamura et al. [16]. That is, they proposed
$$\bar{\beta}^{GSD}_k = \max\left\{\beta^{GSD}_k, \eta_k\right\}. \tag{2.8}$$
Motivated by the work of [1,8,16,22], we propose a three-term direction, given by (2.9) and (2.10), in which the term $\theta_k$ is determined such that Condition (1.8) holds. The two resulting methods, 3TCGPB1 and 3TCGPB2, are stated as Algorithm 1.
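The following sketch (illustrative only; the vectors are randomly generated assumptions, with $\sigma = 0.7$ as in our experiments) numerically verifies Dai's result that the modified parameter (2.7) yields $g_k^T d_k \leq -\left(1 - \frac{1}{4\sigma}\right)\|g_k\|^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.7  # any sigma > 1/4 works
for _ in range(1000):
    g, z, d_prev = rng.normal(size=5), rng.normal(size=5), rng.normal(size=5)
    # Dai's modification (2.7): beta = g^T z - sigma * ||z||^2 * g^T d_prev
    beta = g @ z - sigma * (z @ z) * (g @ d_prev)
    d = -g + beta * d_prev
    # Sufficient descent condition (2.4) with c = 1 - 1/(4*sigma)
    c = 1.0 - 1.0 / (4.0 * sigma)
    assert g @ d <= -c * (g @ g) + 1e-10
```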

Global convergence of the proposed method
In order to establish the global convergence of the proposed approach, the following assumption is necessary.
A3. The function $F(x)$ is Lipschitz continuous on $\mathbb{R}^n$, i.e., there exists a positive constant $L$ such that
$$\|F(x) - F(y)\| \leq L \|x - y\| \quad \text{for all } x, y \in \mathbb{R}^n. \tag{3.1}$$
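As a simple worked instance of A3 (our illustration, not an assumption made in the paper), the componentwise mapping $F_i(x) = 2x_i - \sin(x_i)$ is both Lipschitz continuous and monotone:

```latex
% Illustrative check that F_i(x) = 2x_i - \sin(x_i) satisfies A3.
% Since f(t) = 2t - \sin t has f'(t) = 2 - \cos t \in [1, 3], the mean
% value theorem gives |f(s) - f(t)| \le 3|s - t| componentwise, hence
\[
  \|F(x) - F(y)\| \le 3\,\|x - y\| \quad \text{(i.e., (3.1) holds with } L = 3\text{)},
\]
% and (f(s) - f(t))(s - t) = f'(\xi)(s - t)^2 \ge 0, so F is also monotone.
```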

Lemma 3.2 (3TCGPB1) Let $d_k$ be generated by Algorithm 1 with $\beta_k$ chosen as in 3TCGPB1. Then $d_k$ satisfies the sufficient descent condition (1.8).
Proof For $k = 0$, we have $d_0 = -F_0$ and hence $F_0^T d_0 = -\|F_0\|^2$, so (1.8) holds. For $k \geq 1$, we divide the rest of the proof into the following cases.

Lemma 3.3 (3TCGPB2) Consider the search direction $d_k$ generated by Algorithm 1 with $\beta_k$ chosen as in 3TCGPB2. Then $d_k$ satisfies the sufficient descent condition (1.8).
Proof For $k = 0$, we again have $F_0^T d_0 = -\|F_0\|^2$. For $k \geq 1$, we divide the rest of the proof into the following cases.
Substituting (3.7) into (3.10), we immediately obtain that the sufficient descent condition (1.8) holds. Hence, the direction given by (2.9) and (2.10) is a descent direction.

Lemma 3.4 The line search procedure (1.6) of Step 5 in Algorithm 1 is well-defined.
Proof We proceed by contradiction. Suppose that for some iteration index $\bar{k}$ the condition (1.6) does not hold for any nonnegative integer $m$. Then, setting $\alpha_{\bar{k}} = \rho^m s$, it can be concluded that
$$-F(x_{\bar{k}} + \rho^m s\, d_{\bar{k}})^T d_{\bar{k}} < \mu \rho^m s\, \|F(x_{\bar{k}} + \rho^m s\, d_{\bar{k}})\| \|d_{\bar{k}}\|^2.$$
Letting $m \to \infty$ and using the continuity of $F$ yields $-F(x_{\bar{k}})^T d_{\bar{k}} \leq 0$, which contradicts the sufficient descent property established in Lemmas 3.2 and 3.3. Hence, the line search procedure is well-defined.

Theorem 3.5 Suppose that Assumption A3 holds and let the sequence $\{x_k\}$ be generated by Algorithm 1. Then $\{x_k\}$ is bounded and converges to a solution of Problem (1.1).

Proof Since $x^*$ is such that $F(x^*) = 0$ and the mapping $F$ is monotone, then
$$F(z_k)^T (x^* - z_k) \leq 0. \tag{3.13}$$
For $x^* \in \Omega$, we have from (1.5) and (1.7) that
$$\|x_{k+1} - x^*\|^2 \leq \|x_k - \xi_k F(z_k) - x^*\|^2 = \|x_k - x^*\|^2 - 2\xi_k F(z_k)^T (x_k - x^*) + \xi_k^2 \|F(z_k)\|^2, \tag{3.14}$$
where
$$\xi_k = \frac{F(z_k)^T (x_k - z_k)}{\|F(z_k)\|^2}.$$
By the monotonicity of $F$, we have that
$$F(z_k)^T (x_k - x^*) = F(z_k)^T (x_k - z_k) + F(z_k)^T (z_k - x^*). \tag{3.15}$$
Using (3.13) and (3.15), we have from (3.14) that
$$\|x_{k+1} - x^*\|^2 \leq \|x_k - x^*\|^2 - \frac{\left(F(z_k)^T (x_k - z_k)\right)^2}{\|F(z_k)\|^2},$$
which means that $\|x_{k+1} - x^*\| \leq \|x_k - x^*\|$. This shows that $\{\|x_k - x^*\|\}$ is a decreasing sequence and hence $\{x_k\}$ is bounded. Also, summing up the above inequality, it follows that
$$\sum_{k=0}^{\infty} \frac{\left(F(z_k)^T (x_k - z_k)\right)^2}{\|F(z_k)\|^2} \leq \|x_0 - x^*\|^2 < \infty.$$
Then, by means of (3.3), we also have $\lim_{k \to \infty} \alpha_k \|d_k\| = 0$.

Numerical experiments
In this section, we present numerical results obtained with our two proposed methods, 3TCGPB1 and 3TCGPB2, and compare them with the methods proposed by Ahookhosh et al. [1], DFPB1 and DFPB2. All algorithms are coded in MATLAB R2016a and run on a computer with an Intel(R) Core(TM) i7-4770 CPU at 3.40 GHz and 8.00 GB of installed memory (RAM). The parameters used in all four methods are set as $\rho = 0.7$ and $\mu = 0.3$. Similar to [1], an adaptive initial step length with parameter $t = 10^{-6}$ is used. For our two methods, 3TCGPB1 and 3TCGPB2, we use the additional parameters $\sigma = 0.7$ and $\eta = 0.01$, and set $\xi_k = \eta_k$. We adopt the same termination condition for all four methods; that is, we stop an algorithm when the number of iterations exceeds 500 or when the inequality $\|F(x_k)\| \leq \epsilon = 10^{-5}$ is satisfied. The test problems used here are taken from Hu and Wei [9], Sun and Liu [18,19] and Zhang and Zhou [24]; they are outlined after the following illustrative sketch.
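As an informal illustration (not the authors' MATLAB code), a minimal Python driver combining the projection framework with the stated parameter settings might look as follows; the mapping F and the steepest-descent-like direction are placeholder assumptions, since the 3TCGPB directions are given by (2.9) and (2.10):

```python
import numpy as np

def solve_monotone(F, x0, rho=0.7, mu=0.3, tol=1e-5, max_iter=500):
    """Generic derivative-free projection framework with the paper's
    settings: rho = 0.7, mu = 0.3, stop when ||F(x_k)|| <= 1e-5 or
    the iteration count exceeds 500."""
    x = x0.copy()
    for k in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) <= tol:
            return x, k
        d = -Fx  # placeholder; 3TCGPB1/2 use the three-term directions (2.9)-(2.10)
        alpha = 1.0
        while True:  # backtracking line search (1.6)
            Fz = F(x + alpha * d)
            if -Fz @ d >= mu * alpha * np.linalg.norm(Fz) * np.linalg.norm(d) ** 2:
                break
            alpha *= rho
        z = x + alpha * d
        x = x - (Fz @ (x - z)) / (Fz @ Fz) * Fz  # projection step (1.3)
    return x, max_iter

# Example on the assumed test mapping F(x) = 2x - sin(x) with x0 = (1,...,1)^T
x_sol, n_iter = solve_monotone(lambda v: 2 * v - np.sin(v), np.ones(1000))
```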

Problem 4.3 The mapping F(·) is taken as
Initial guess: $x_0 = (1, 1, 1, \ldots, 1)^T$. We present the results in Tables 1, 2, 3, 4 and 5, where the dimension ($N$) of each problem is varied from 100 to 50 000. In each table, we present the results in terms of the number of iterations (NI), the number of function evaluations (FE), the value of $\|F(x_k)\|$ at termination, as well as the CPU time. In all the test runs, the methods were successful in solving all the problems. A comparison of the methods from Tables 1, 2, 3, 4 and 5 shows that the proposed methods are very competitive with the DFPB1 and DFPB2 methods. We further compare the methods using the performance profile tool suggested by Dolan and Moré [5]. We do this by plotting the performance profiles on NI, FE and CPU time. Figure 1 presents the performance profile on NI, Fig. 2 shows the performance profile on FE, and Fig. 3 shows the performance profile on CPU time. It is clear from the figures that 3TCGPB2 performs much better than the other methods. Overall, however, both proposed methods are very competitive and therefore promising.
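A Dolan–Moré performance profile plots, for each solver, the fraction of problems solved within a factor $\tau$ of the best solver's cost. A compact sketch (our illustration, with made-up cost data, not the paper's results) is given below:

```python
import numpy as np

def performance_profile(costs):
    """costs: (n_problems, n_solvers) array of, e.g., CPU times.
    Returns the tau grid and rho(tau), the Dolan-More profile per solver."""
    ratios = costs / costs.min(axis=1, keepdims=True)  # r_{p,s}
    taus = np.sort(np.unique(ratios))
    # rho_s(tau) = fraction of problems with ratio r_{p,s} <= tau
    rho = np.array([[np.mean(ratios[:, s] <= t) for t in taus]
                    for s in range(costs.shape[1])])
    return taus, rho

# Illustrative data: 5 problems, 2 solvers (CPU seconds)
taus, rho = performance_profile(np.array([[1.0, 1.2], [0.5, 0.4],
                                          [2.0, 1.0], [3.0, 3.0], [0.9, 1.1]]))
```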

Conclusion
In this work, two new derivative-free conjugate gradient projection methods for systems of large-scale nonlinear monotone equations were proposed. The proposed methods were motivated by the work of Ahookhosh et al. [1], Hager and Zhang [8], Nakamura et al. [16] and Yuan [22]. The proposed methods were shown to satisfy the sufficient descent condition, and their global convergence was established. Preliminary numerical results on benchmark problems from the literature indicate that the proposed methods are efficient for large-scale problems and compare well with the DFPB1 and DFPB2 methods.