A Family of Modified Spectral Projection Methods for Nonlinear Monotone Equations with Convex Constraint

In this paper, we propose a family of modified spectral projection methods for nonlinear monotone equations with convex constraints, where the spectral parameter is mainly determined by a convex combination of the modified long Barzilai–Borwein stepsize and the modified short Barzilai–Borwein stepsize. We obtain a trial point by the spectral method and then obtain the iteration point by the projection technique. The algorithm automatically generates a bounded iterative sequence, and we establish the global convergence of the proposed method in the sense that every limit point is a solution of the nonlinear equation. The proposed method can be used to solve large-scale nonlinear monotone equations with convex constraints, including both smooth and nonsmooth equations. Numerical results for nonlinear equation problems and the ℓ1-norm regularization problem in compressive sensing demonstrate the efficiency and efficacy of our method.


Introduction
In this paper, we consider the nonlinear monotone equations with constraints as follows:

F(x) = 0,  x ∈ Ω,  (1)

where F: Ω → R^n is continuous and monotone and Ω ⊆ R^n is a nonempty closed convex set. A mapping F is called monotone if

⟨F(x) − F(y), x − y⟩ ≥ 0 for all x, y ∈ Ω,

where ⟨x, y⟩ = x^T y denotes the inner product of x, y ∈ R^n. Problem (1) is derived from various practical problems such as subproblems in generalized proximal algorithms with Bregman distances [1], variational inequality problems [2], chemical equilibrium problems [3], power flow equations [4], and other engineering problems. During the last decades, many numerical approaches have been proposed to solve problem (1), such as Newton-based methods [1, 5–7], derivative-free methods [8–14], and gradient-based methods [15–24]. Due to their fast local convergence [25, 26], Newton-based methods and their variants are extremely popular for solving problem (1). However, Newton-based methods involve the calculation of a Jacobian matrix or an approximate Jacobian at each iteration, which reduces their efficiency, especially for large-scale problems. First-order methods have received much attention in recent years due to their low storage requirements. One of the most widely known first-order methods is the spectral gradient method, which was introduced by Barzilai and Borwein in [27] and extended further by Raydan and La Cruz [15, 16, 28, 29]. Based on the inexact Newton method introduced by Solodov and Svaiter in [6], Zhang and Zhou studied the spectral gradient projection method for unconstrained monotone equations in [30], and Yu et al. applied it to constrained monotone equations in [31]; the reported numerical results indicate the excellent performance of these methods.
Recently, Dai et al. [32] proposed a family of spectral gradient methods for unconstrained optimization, whose stepsize is determined by a convex combination of the long Barzilai–Borwein (BB) stepsize and the short BB stepsize, and the algorithm shows good numerical performance. The success of the combined method motivates us to extend the algorithm to constrained nonlinear equation problems. Since constrained nonlinear equations are very different from unconstrained optimization, our algorithm, which uses no merit function, is also distinct from the algorithm in [32] and thus can be used to solve nonsmooth problems. Moreover, our algorithm can be seen as a family of modified spectral gradient projection methods for nonlinear monotone equations with convex constraints, where the spectral parameter is mainly determined by a convex combination of the modified long BB stepsize and the modified short BB stepsize. We prove that the algorithm automatically generates a bounded iterative sequence and that it converges globally in the sense that every limit point of the iterative sequence is a solution of the original problem. The numerical tests show the efficiency of the proposed algorithm. We notice that modified spectral gradient algorithms were proposed in [33, 34] for solving problem (1), but our algorithm differs from theirs in the following aspects. Firstly, we give a changeable choice for the inexact parameter in the spectral method. Secondly, the relationship between the modified long and short BB stepsizes is studied. Moreover, it has been shown that a suitable alternation of the long and short stepsizes is more effective than the separate deployment of either of them, and we present several cyclic strategies for the convex combination, which contributes to the efficiency of our algorithm. The remainder of this paper is organized as follows. In Section 2, we propose the new algorithm and discuss some of its properties. In Section 3, we prove the global convergence of the proposed method under appropriate assumptions. The numerical results are given in Section 4. The application of the proposed algorithm to compressive sensing can be found in Section 5, and conclusions are provided in Section 6.

Algorithm
In this section, we will present a family of modified spectral gradient projection methods for solving (1).
Firstly, we introduce the projection operator, which is defined as a mapping from R^n to a closed convex set Ω:

P_Ω[x] = arg min{‖x − y‖ : y ∈ Ω},

and it enjoys the well-known nonexpansive property:

‖P_Ω[x] − P_Ω[y]‖ ≤ ‖x − y‖ for all x, y ∈ R^n.  (4)

Next, we recall the idea of the spectral gradient method developed by Barzilai and Borwein [27] for the unconstrained optimization problem min_{x∈R^n} f(x), where f is continuously differentiable with gradient g.
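For illustration, the following MATLAB sketch (the name projOmega is ours) shows the projection onto the non-negative orthant Ω = R^n_+, together with a numerical check of the nonexpansive property (4):

% Projection of v onto Omega = R^n_+; for a general closed convex Omega,
% P_Omega[x] is the nearest point of Omega to x, and for the non-negative
% orthant it reduces to componentwise clipping.
projOmega = @(v) max(v, 0);

% Nonexpansiveness check: ||P[u] - P[v]|| <= ||u - v||.
u = randn(5, 1);  v = randn(5, 1);
assert(norm(projOmega(u) - projOmega(v)) <= norm(u - v) + 1e-12);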
Given x_k ∈ R^n, the next iteration point generated by the BB method is defined as x_{k+1} = x_k − α_k g_k. Writing s_{k−1} = x_k − x_{k−1} and y_{k−1} = g_k − g_{k−1}, the stepsize α_k = α_k^{BB1} is obtained by minimizing ‖s_{k−1}/α − y_{k−1}‖, which gives

α_k^{BB1} = ⟨s_{k−1}, s_{k−1}⟩/⟨s_{k−1}, y_{k−1}⟩.

By symmetry, we can also take α_k = α_k^{BB2} by minimizing ‖s_{k−1} − α y_{k−1}‖, which gives

α_k^{BB2} = ⟨s_{k−1}, y_{k−1}⟩/⟨y_{k−1}, y_{k−1}⟩.

As described in [32], when ⟨s_{k−1}, y_{k−1}⟩ > 0, the Cauchy–Schwarz inequality yields α_k^{BB1} ≥ α_k^{BB2}, which means α_k^{BB1} is the long BB stepsize and α_k^{BB2} is the short one. Numerical experiments show that the long BB stepsize is often superior to the short one; see [35, 36]. In practice, however, a suitable alternation of the long and short BB stepsizes works better than using either of them separately, as can be seen in the numerical section.
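In MATLAB, the two classical stepsizes can be computed directly from the difference vectors; a minimal sketch with toy data (variable names ours):

% Classical BB stepsizes from s = x_k - x_{k-1} and y = g_k - g_{k-1}.
% alphaBB1 minimizes ||s/alpha - y|| (long stepsize); alphaBB2 minimizes
% ||s - alpha*y|| (short stepsize).  With <s, y> > 0, Cauchy-Schwarz
% guarantees alphaBB1 >= alphaBB2.
s = [1; 2];  y = [0.5; 1.5];            % toy data with <s, y> > 0
alphaBB1 = (s' * s) / (s' * y);         % long BB stepsize
alphaBB2 = (s' * y) / (y' * y);         % short BB stepsize
assert(alphaBB1 >= alphaBB2);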
Moreover, we discuss some properties of the modified long and short BB stepsizes and then give the algorithm. For k > 1, given r_{k−1} > 0, define

ỹ_{k−1} = y_{k−1} + r_{k−1} s_{k−1},

and the corresponding modified stepsizes are defined as follows:

α̃_k^{BB1} = ⟨s_{k−1}, s_{k−1}⟩/⟨s_{k−1}, ỹ_{k−1}⟩,  (8)

α̃_k^{BB2} = ⟨s_{k−1}, ỹ_{k−1}⟩/⟨ỹ_{k−1}, ỹ_{k−1}⟩.  (9)

If r_{k−1} = 0, this reduces to the classical spectral approach for nonlinear equations. Now, we discuss the relationship between these stepsizes. On the one hand, by the Cauchy–Schwarz inequality, α̃_k^{BB1} ≥ α̃_k^{BB2} whenever ⟨s_{k−1}, ỹ_{k−1}⟩ > 0. On the other hand, by the monotonicity of F, we have ⟨s_{k−1}, y_{k−1}⟩ ≥ 0, so that ⟨s_{k−1}, ỹ_{k−1}⟩ ≥ ⟨s_{k−1}, y_{k−1}⟩, which means α_k^{BB1} ≥ α̃_k^{BB1}. Next, we discuss the relationship between α̃_k^{BB1} and α̃_k^{BB2} by updating the parameter r_{k−1}. Denote by φ(r_{k−1}) the value of α̃_k^{BB2} viewed as a function of r_{k−1}. The maximum of α̃_k^{BB2} can be worked out by setting φ′(r_{k−1}) = 0, and we can deduce the extremum of φ(r_{k−1}) at some point r̂_{k−1}, so we choose the parameter r_{k−1} according to this rule. However, if the rule yields r_{k−1} ≤ 0, we set r_{k−1} = r_0 for a given positive constant r_0. With this choice we also get α̃_k^{BB2} ≥ α_k^{BB2}. In conclusion, we obtain

α_k^{BB1} ≥ α̃_k^{BB1} ≥ α̃_k^{BB2} ≥ α_k^{BB2}.

Finally, we end this section by describing the specific algorithm, which can be viewed as a modified spectral gradient projection method, named MSGP for short.
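A minimal sketch of the modified stepsizes and their convex combination, assuming the modification ỹ_{k−1} = y_{k−1} + r_{k−1}s_{k−1} of (8) and (9) (toy data; variable names ours):

% Modified BB stepsizes: adding r*s to y keeps <s, yTilde> > 0 even when
% F is merely monotone, so both stepsizes are well defined and positive.
s = [1; 2];  y = [0.5; 1.5];
r = 0.1;                                      % r_{k-1} > 0
yTilde = y + r * s;
aBB1 = (s' * s) / (s' * yTilde);              % modified long stepsize (8)
aBB2 = (s' * yTilde) / (yTilde' * yTilde);    % modified short stepsize (9)
lambda = 0.618;                               % lambda_k in [0, 1]
alpha = lambda * aBB1 + (1 - lambda) * aBB2;  % spectral parameter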

Remark 1
(1) In Step 5, ỹ_{k−1} is different from the standard definition.
(2) If Algorithm 1 terminates at a finite iteration k with F(x_k) = 0, then x_k is a solution of (1). However, if F(x_k) ≠ 0 for all k ≥ 1, Algorithm 1 generates an infinite sequence x_k. So, without loss of generality, we assume that the algorithm generates an infinite sequence.
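To fix ideas, here is a minimal MATLAB sketch of one pass of Algorithm 1 in the Solodov–Svaiter projection framework [6]; the toy map, the projection, the constants, and the particular line search test are our illustrative choices, not a verbatim transcription of the algorithm:

% One MSGP-style iteration: spectral direction, backtracking line search,
% then a hyperplane projection step back onto Omega.
F = @(x) exp(x) - 1;                  % toy monotone map on R^n
projOmega = @(v) max(v, 0);           % Omega = R^n_+
xk = [1; -0.5; 2];  alpha = 1;  sigma = 0.01;  beta = 0.5;

dk = -alpha * F(xk);                  % Step 2: search direction
m = 0;                                % Step 3: backtracking line search
while true
    zk = xk + beta^m * dk;            % trial point z_k
    if -F(zk)' * dk >= sigma * beta^m * norm(dk)^2, break; end
    m = m + 1;
end
% Step 4: project x_k onto Omega along the separating hyperplane
% {x : <F(z_k), x - z_k> = 0} (Solodov-Svaiter type step).
xk1 = projOmega(xk - ((F(zk)' * (xk - zk)) / norm(F(zk))^2) * F(zk));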

Convergence Analysis
To analyze the global convergence of Algorithm 1, we assume that F(x) is Lipschitz continuous on the set Ω, i.e., there exists a positive constant L such that

‖F(x) − F(y)‖ ≤ L‖x − y‖ for all x, y ∈ Ω.

The following lemma states that Algorithm 1 is well defined.

Lemma 1.
Under the assumption listed above, there exists a non-negative integer m_k satisfying the line search condition of Algorithm 1 for all k ≥ 1.
Proof. We first give an estimate of the stepsize α_k. For α̃_k^{BB1} = ⟨s_{k−1}, s_{k−1}⟩/⟨s_{k−1}, ỹ_{k−1}⟩, by the monotonicity of F(x), we have

⟨s_{k−1}, ỹ_{k−1}⟩ = ⟨s_{k−1}, y_{k−1}⟩ + r_{k−1}‖s_{k−1}‖² ≥ r_{k−1}‖s_{k−1}‖² > 0.

On the one hand, by the Lipschitz continuity of F(x), we have

⟨s_{k−1}, ỹ_{k−1}⟩ ≤ (L + r_{k−1})‖s_{k−1}‖².

Hence, α̃_k^{BB1} ≥ 1/(L + r_{k−1}). On the other hand, by the definition of ỹ_{k−1}, we have

‖ỹ_{k−1}‖ ≤ (L + r_{k−1})‖s_{k−1}‖.

Combining this together with (17), we can derive α̃_k^{BB2} ≥ r_{k−1}/(L + r_{k−1})². So, by (19) and α̃_k^{BB1} ≥ α̃_k^{BB2}, both modified stepsizes are bounded away from zero. From the definition of α_k as a convex combination of the two, we finally get α_k ≥ r_{k−1}/(L + r_{k−1})² > 0. Now we prove the conclusion by contradiction. Suppose that there exists k_0 ≥ 1 such that the line search condition of Algorithm 1 is not satisfied for any non-negative integer m, i.e.,

−⟨F(x_{k_0} + β^m d_{k_0}), d_{k_0}⟩ < σβ^m‖d_{k_0}‖².

Letting m → ∞ and using the continuity of F, we have −⟨F(x_{k_0}), d_{k_0}⟩ ≤ 0. Nonetheless, by the definition of d_k and (23), we have

−⟨F(x_{k_0}), d_{k_0}⟩ = α_{k_0}‖F(x_{k_0})‖² > 0.

This contradicts inequality (25), and hence we get the desired result.
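Written out, the two estimates used above combine as follows (our LaTeX rendering, with ỹ_{k−1} = y_{k−1} + r_{k−1}s_{k−1} as in Section 2):

\[
\langle s_{k-1},\tilde y_{k-1}\rangle \ge r_{k-1}\|s_{k-1}\|^{2},
\qquad
\|\tilde y_{k-1}\| \le (L+r_{k-1})\|s_{k-1}\|
\;\Longrightarrow\;
\tilde\alpha_k^{\mathrm{BB2}}
=\frac{\langle s_{k-1},\tilde y_{k-1}\rangle}{\|\tilde y_{k-1}\|^{2}}
\ge \frac{r_{k-1}}{(L+r_{k-1})^{2}} .
\]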
Based on the conclusion of Lemma 1, we can establish the global convergence of Algorithm 1.

Theorem 1.
Suppose that the sequences x_k and z_k are generated by Algorithm 1. Then, x_k and z_k are both bounded, and it holds that

lim_{k→∞} ‖x_k − z_k‖ = 0.  (27)

Furthermore, the sequence x_k converges to some point x̄ with F(x̄) = 0.  (28)

Proof. We complete this proof in two parts. First of all, we give the proof of (27). For this purpose, we assume that the solution set of (1), denoted by S, is nonempty. From Algorithm 1, we have

⟨F(z_k), x_k − z_k⟩ > 0.  (29)

Let x* ∈ S; from (4) and Algorithm 1, we can bound the distance of x_{k+1} to x* by that of x_k. By the monotonicity of the mapping F and x* ∈ S, it holds that

⟨F(z_k), z_k − x*⟩ ≥ 0.  (31)

From (29)–(31) and Algorithm 1, we have

‖x_{k+1} − x*‖ ≤ ‖x_k − x*‖.  (32)

Hence, the sequence ‖x_k − x*‖ is decreasing and convergent; moreover, the sequence x_k is bounded. Using the continuity of F, we know that there exists a constant M > 0 such that ‖F(x_k)‖ ≤ M for all k. By the Cauchy–Schwarz inequality, the monotonicity of F, and (29), one obtains a bound on ‖x_k − z_k‖; from (33) and (34), we conclude that the sequence z_k is also bounded.
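The backbone of this part of the argument is the standard hyperplane-projection estimate; one common form of the chain behind (29)–(32) reads (our LaTeX rendering):

\[
\|x_{k+1}-x^{*}\|^{2}
\le \|x_{k}-x^{*}\|^{2}
-\frac{\langle F(z_{k}),\,x_{k}-z_{k}\rangle^{2}}{\|F(z_{k})\|^{2}},
\qquad
\langle F(z_{k}),\,x_{k}-z_{k}\rangle > 0 ,
\]

so the distance to any solution x* cannot increase from one iteration to the next.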
(Algorithm 1 (MSGP). Step 2: compute the search direction d_k = −α_k F(x_k). Step 3: find the trial point z_k = x_k + θ_k d_k by the backtracking line search. Step 4: compute the next iterate x_{k+1} by the projection step. Step 5: choose r_{k−1} and compute α̃_k^{BB1} and α̃_k^{BB2}, which are defined by (8) and (9).)
Thus, (27) holds. The remaining part is the proof of (28). Since z_k = x_k + θ_k d_k, the following holds from (36) and (38):

lim_{k→∞} θ_k‖d_k‖ = 0.  (40)

From Algorithm 1 and (23), we have the corresponding relation for the direction, and by (33), it holds that d_k is bounded. We complete the remaining proof in two different cases.

Case 1: lim inf_{k→∞} ‖d_k‖ = 0. By (40) and the lower bound on α_k, we obtain lim inf_{k→∞} ‖F(x_k)‖ = 0. The continuity of F implies that the sequence x_k has some accumulation point x̄ such that F(x̄) = 0, i.e., x̄ ∈ S. From (32), it holds that ‖x_k − x̄‖ converges, and since x̄ is an accumulation point of x_k, it must hold that x_k converges to x̄.

Case 2: lim inf_{k→∞} ‖d_k‖ > 0. From (40), we have lim_{k→∞} θ_k = 0. By (39), the line search condition must then fail at the trial stepsize θ_k/β for all sufficiently large k; that is, from Algorithm 1, we have

−⟨F(x_k + (θ_k/β) d_k), d_k⟩ < σ(θ_k/β)‖d_k‖².  (46)

Since x_k and d_k are bounded, we can choose convergent subsequences; letting k → ∞ in (46), we obtain

−⟨F(x̄), d̄⟩ ≤ 0,  (47)

where x̄ and d̄ are the limit points of the corresponding subsequences. On the other hand, letting k → ∞ in (26), we have −⟨F(x̄), d̄⟩ > 0, which contradicts (47). Hence, Case 2 is impossible. This completes the proof.

Preliminary Numerical Results
In order to observe the performance of the new algorithm, we first discuss the selection of the convex combination coefficients λ_k and make preparations for the numerical experiments. The simplest scheme is to fix λ_k for all iterations, i.e., to select a constant within the interval [0, 1] as the value of λ_k. Another approach is to choose λ_k randomly in the interval [0, 1], known as the randomly relaxed Cauchy method [37], but then the information of the current and former iterations is not utilized. Moreover, numerous cyclic methods have shown that a suitable alternation of the short and long stepsizes works better than the separate use of either. For instance, an adaptive truncated cyclic (ATC) scheme is proposed by Dai et al. in [32], which yields a cyclically truncated stepsize. Three methods emerge along with different cyclic schemes. The first one is ATC1, where α_k is defined in (27) and α_k^{BB3} denotes the corresponding truncated stepsize. Dai proposed an alternate step (AS) method in [38], which uses the steepest descent step and the BB step alternately. This motivates us to consider updating α_k in a similar alternating fashion, which yields the AS-type scheme used below; a sketch of these λ_k selection rules follows.
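The following MATLAB sketch summarizes the λ_k selection rules compared below (fixed, random, and an m-cyclic alternation in the spirit of ATC/AS; the exact truncation rule of [32] is abbreviated, and the function names are ours):

% Three ways to pick lambda_k in alpha_k = lambda*aBB1 + (1-lambda)*aBB2;
% k is the iteration counter and m the cycle length (m = 5 below).
m = 5;
lambdaFixed  = @(k) 0.618;                 % constant scheme
lambdaRandom = @(k) rand();                % randomly relaxed scheme [37]
% m-cyclic alternation: long stepsize (lambda = 1) on the first step of
% each cycle, short stepsize (lambda = 0) on the remaining m-1 steps.
lambdaCyclic = @(k) double(mod(k, m) == 1);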

For an in-depth study of the effect of λ_k on the performance of Algorithm 1 in different situations, we use two classic examples to test these methods. For the fixed methods, we set λ = 0.1, 0.3, 0.618, and 0.9 for all k, respectively. For the ATC and AS methods, we set the parameter m = 5. In particular, we set σ = 0.01 and β = 0.5 in Algorithm 1 and stopped the iterative process when ‖F(x_k)‖ ≤ 10^{−6} or the number of iterations exceeded 1000. All methods described here were implemented in MATLAB (v.9.6.0-R2019a), and all codes were run on a PC with an Intel Core(TM) i5 2.39 GHz processor and 4 GB of RAM. We now list the test examples.

Example 1.
This example originates from [31], with Ω = R^n_+; the mapping F is defined componentwise for i = 1, 2, . . . , n as in [31].

Example 2.
This example comes from [39], with Ω = R^n_+; the mapping F is defined componentwise for i = 1, 2, . . . , n as in [39]. Obviously, Example 1 is nonsmooth, while Example 2 arises from a strictly convex function. For different initial points, Table 1 gives the preliminary numerical results with dimensions from 100 to 10000. Note that E_i denotes the experiment on Example i (i = 1, 2), and x_j (j = 1, 2, 3) represents the start point of the iteration. In addition, A/B stands for the number of iterations and the CPU time in seconds. We observe that the noncyclic methods are superior to the other methods in the number of iterations and CPU time when λ_k is very small. Among the cyclic schemes, ATC2 is superior to the other cyclic methods in both the number of iterations and the execution time. Furthermore, we conclude that ATC2 dominates all methods in both the number of iterations and the elapsed time. Thus, we only apply the ATC2 method in Algorithm 1 for the further experiments.
In the rest of this section, to evaluate the efficiency of Algorithm 1 in solving nonlinear monotone equations with convex constraints, we compare the MSGP method with the PDY method [11] and the SGP method [31] in two respects. First, we set different starting points for each problem. Second, we increase the dimension gradually to study the performance of the new method on large-scale problems.

Problem 1.
This problem is the logarithmic function with Ω = {x ∈ R^n | Σ_{i=1}^n x_i ≤ n, x_i > −1}, i.e., the mapping F is taken as

F_i(x) = ln(x_i + 1) − x_i/n,

where i = 1, 2, . . . , n.

Problem 2.
This problem is posed over Ω = {x ∈ R^n | Σ_{i=1}^n x_i ≥ n, x_i ≥ −2}, and the mapping F is defined componentwise for i = 1, 2, . . . , n.

Problem 4.
This problem is a modification of Problem 2 in [30].

We test the robustness of the methods in two ways. For Problems 1–3, we compare the performance of the algorithms at different initial points once the dimension of the test problem is fixed; the results are shown in Tables 2–4 for the start points listed there. As for Problems 4–6, we increase the dimension of the problem gradually to test the performance of Algorithm 1 on large-scale problems; the results are reported in Tables 5–7, where the initial points are set as x_1 = (…)^T and x_2 = (1, 1, . . . , 1)^T. The detailed results are listed in Tables 2–7, where the performance of the MSGP method, the SGP method, and the PDY method is compared with respect to the total number of iterations (Iter), the number of function evaluations (NF), the CPU time in seconds (CPUtime), and the norm of the objective function (Norm). Furthermore, P_i (i = 1, 2, . . . , 6) denotes the test problem, and the dimension of the test problem is signified by Dim. We declare that a method fails, using the symbol (–), when its CPU time is greater than or equal to 10 seconds. The performance of the three methods was evaluated using the performance profile presented by Dolan and Moré in [42]. We denote the set of methods by M and the set of experiments by E, and we regard comparing the three methods with the same dimension and start point as one experiment. For each experiment, l_{e,m} denotes the number of iterations required to execute experiment e by method m, and the performance ratio is defined as

r_{e,m} := l_{e,m} / min{l_{e,m} | m ∈ M}.
Obviously, r_{e,m} ≥ 1 for all e ∈ E and m ∈ M. Furthermore, r_{e,m} = R, where R is a large positive constant, when method m does not solve experiment e, and r_{e,m} < R holds whenever method m solves experiment e successfully. To obtain an overall assessment of the performance of these solvers, we define

ρ_m(τ) = (1/n_e) |{e ∈ E : r_{e,m} ≤ τ}|,

where n_e represents the number of elements in the set E. Thus, ρ_m(τ) can be viewed as the probability that the performance ratio r_{e,m} of method m is within a factor τ ∈ R of the best possible ratio. In the same way, we can obtain the performance profiles with respect to CPU time and the number of function evaluations. The performance profiles of the MSGP method, SGP method, and PDY method with respect to the number of iterations, CPU time, and number of function evaluations are shown in Figures 1–3, respectively. An observation that emerges from Figure 1 is that the MSGP method performs much better than the PDY and SGP methods with respect to the number of iterations. Figure 2 reveals that the MSGP method is faster than the SGP and PDY methods in terms of CPU execution time. Moreover, we perceive from Figure 3 that our method also precedes the other methods with respect to the number of function evaluations. Based on the analysis above and the experiment results, the MSGP method outperforms the SGP and PDY methods in terms of the total number of iterations, CPU time, and the number of function evaluations.
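For reference, a minimal MATLAB sketch of the Dolan–Moré profile computation (toy data; variable names ours):

% Performance profile from a table l(e, m) of iteration counts:
% rows = experiments, columns = methods; failures are encoded as Inf,
% so the corresponding ratio becomes the large constant R.
l = [12 15 9; 20 Inf 25; 7 8 8];          % toy data: 3 experiments, 3 methods
r = l ./ min(l, [], 2);                   % performance ratios r_{e,m}
r(isinf(r) | isnan(r)) = 1e6;             % R for unsolved experiments
tau = 2;                                  % factor of the best ratio
rho = mean(r <= tau, 1);                  % rho_m(tau), one value per method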

Application in Compressive Sensing
In this section, we consider the ℓ1-norm regularization problem in compressive sensing, which involves finding sparse solutions to underdetermined or ill-conditioned linear systems of equations. A popular approach is to minimize an objective function of the following form:

min_x (1/2)‖y − Ax‖₂² + τ‖x‖₁,  (72)

where x ∈ R^n, y ∈ R^k is an observation, and A ∈ R^{k×n} (k ≪ n) is a linear operator. Moreover, τ is a non-negative parameter.
‖x‖₂ and ‖x‖₁ denote the Euclidean norm and the ℓ1-norm of x, respectively. Obviously, problem (72) is a convex unconstrained optimization problem, and it has become a research hotspot in compressive sensing: if the original signal is sparse or approximately sparse in some orthogonal basis, an exact restoration can be obtained by solving (72). In recent decades, different approaches have been proposed to solve (72). In [43], these methods are divided into four categories: greedy strategy approximation, homotopy algorithms, proximal algorithms, and constrained optimization (see [44–47], in order). The gradient-based methods are very popular among them, and one of the earliest gradient projection methods for sparse reconstruction (GPSR) was presented in [47]. We now briefly review this approach. The first key step of the GPSR approach is to express (72) as a quadratic program. For any x ∈ R^n, split it into its positive and negative parts; then x can be formulated as

x = u − v,  u ≥ 0,  v ≥ 0,

where u_i = (x_i)_+ and v_i = (−x_i)_+ with (·)_+ = max(·, 0). Moreover, as in [47], (72) can be written in the more standard bound-constrained quadratic program (BCQP) form

min_{z≥0} (1/2)z^T Bz + c^T z.  (75)

Since B is a positive semidefinite matrix, (75) is obviously a convex quadratic program.
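For completeness, the quantities appearing in (75), following the construction in [47] (our rendering), are

\[
z=\begin{bmatrix}u\\ v\end{bmatrix}\in\mathbb R^{2n},\qquad
B=\begin{bmatrix}A^{T}A & -A^{T}A\\ -A^{T}A & A^{T}A\end{bmatrix},\qquad
c=\tau\mathbf 1_{2n}+\begin{bmatrix}-A^{T}y\\ A^{T}y\end{bmatrix},
\]

where 1_{2n} denotes the vector of ones of length 2n.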
Recently, Xiao et al. [48] translated (75) into a linear variational inequality problem (VIP) and pointed out that this VIP is equivalent to a linear complementarity problem, i.e., to finding a vector z such that

z ≥ 0,  Bz + c ≥ 0,  ⟨z, Bz + c⟩ = 0.  (76)

They also claimed that z is a solution of (76) if and only if it satisfies

F(z) := min{z, Bz + c} = 0,  (77)

where the minimum is taken componentwise. The mapping F is monotone and Lipschitz continuous; the relevant proofs are given in [48, 49], respectively. Therefore, we can utilize our MSGP method to solve (77).
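A minimal MATLAB sketch of the resulting map, assuming the componentwise-minimum form of (77); B is never formed explicitly, and all products with B are evaluated through A:

% F(z) = min(z, B*z + c) for the LCP reformulation of (75) (cf. [48]).
n = 8;  k = 4;  tau = 0.1;
A = randn(k, n);  y = randn(k, 1);            % toy problem data
AtA_times = @(w) A' * (A * w);                % avoids forming A'*A
applyB = @(z) [ AtA_times(z(1:n) - z(n+1:end)); ...
               -AtA_times(z(1:n) - z(n+1:end))];
c = tau * ones(2*n, 1) + [-A'*y; A'*y];
F = @(z) min(z, applyB(z) + c);               % F(z) = 0 characterizes (76)
z0 = [max(A'*y, 0); max(-A'*y, 0)];           % x0 = A'*y split into (u, v)
disp(norm(F(z0)))                             % initial residual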
In what follows, we design an experiment to reconstruct a sparse signal of length n from k observations. The quality of restoration is measured by the mean of squared error (MSE) with respect to the original signal x,

MSE = (1/n)‖x − x*‖²,

where x* is the recovered signal. In our experiment, the size of the signal is chosen as n = 2^12 and k = 2^10, and the original signal contains 2^7 random nonzero elements. Under the MATLAB R2019a environment, we use the command randn(m, n) to generate the Gaussian matrix A. In addition, the measurement y is contaminated by noise, i.e., y = Ax + ξ, where ξ is Gaussian noise distributed as N(0, 10^{−4}).
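A sketch of this test setup in MATLAB (the plain randn convention for A follows the text above; names are ours):

% Sparse-signal experiment: n = 2^12 samples, k = 2^10 observations,
% 2^7 nonzero entries, Gaussian sensing matrix, noisy measurements.
n = 2^12;  k = 2^10;  nNonzero = 2^7;
x = zeros(n, 1);
idx = randperm(n, nNonzero);
x(idx) = randn(nNonzero, 1);               % sparse original signal
A = randn(k, n);                           % Gaussian matrix, as in the text
xi = 1e-2 * randn(k, 1);                   % N(0, 1e-4): std = sqrt(1e-4)
y = A * x + xi;                            % noisy measurement
mse = @(xStar) norm(x - xStar)^2 / n;      % mean of squared error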
In order to illustrate the good performance of the proposed method, we compare it with two different gradient methods. The first one is the conjugate gradient descent (CGD) method [50], which is more efficient than the spectral gradient (SGCS) method [48] in the compressive sensing field. The second is a projection method (PCG) [51], which can be viewed as an extension of the CGD method. The parameters for the algorithms are taken as follows: (i) CGD method: ξ = 10, σ = 0.0001, and ρ = 0.5. (ii) PCG method: ξ = 10, σ = 0.0001, ρ = 0.5, and r = 0.2.
We use f(x) = (1/2)‖y − Ax‖₂² + τ‖x‖₁ as the merit function. For a fair comparison, we use the same continuation technique on the parameter τ, which is forced to decrease in the same manner as in [47]. In addition, each algorithm starts at the same initial point. The iterative process starts at the measurement image, i.e., x_0 = A^T y, and terminates when the relative change of the merit function between successive iterates falls below 10^{−5}, i.e.,

|f_k − f_{k−1}| / |f_{k−1}| < 10^{−5},

where f_k denotes the function value at x_k. Numerical results including the original signal, the measurement, and the signals recovered by the related approaches are reported in Figure 4. Moreover, we plot four figures to demonstrate the convergence behavior from the trends of the MSE and the objective function values along with the increasing number of iterations and CPU time (see Figures 5–8). In Figure 4, we can observe that the CGD, PCG, and MSGP methods all recover the original signal almost exactly. However, our method requires fewer iterations and less CPU time (in seconds) to achieve a restoration of similar quality. This can also be seen in Figures 5–8, where the objective function values and the MSE obtained by the MSGP method decrease faster than those of the other methods throughout the whole iteration process. Furthermore, the experiment was repeated for 10 different noise samples, and the average of the 10 results is also given (see Table 8). We can clearly see from Table 8 that the MSGP method outperforms the CGD and PCG methods in terms of MSE, the number of iterations, and CPU time in most cases.
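The stopping test can be coded directly from the merit function; a minimal sketch with toy data (names ours):

% Relative-change stopping rule: terminate when
% |f_k - f_{k-1}| / |f_{k-1}| < 1e-5, with f the merit function above.
tau = 0.1;  A = randn(4, 8);  y = randn(4, 1);        % toy problem data
f = @(x) 0.5 * norm(y - A*x)^2 + tau * norm(x, 1);
xPrev = A' * y;  xCur = 0.9 * xPrev;                  % two successive iterates
stop = abs(f(xCur) - f(xPrev)) / abs(f(xPrev)) < 1e-5;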

Conclusion
In this paper, we proposed a family of modified spectral projection methods for solving nonlinear monotone equations with convex constraints, where the spectral parameter is mainly determined by a convex combination of two modified BB stepsizes. We also introduced several schemes to update the spectral parameter. Under suitable assumptions, we proved the global convergence of the new algorithm.
Thanks to its low memory requirements, our method is suitable for solving large-scale nonlinear monotone equations as well as the ℓ1-norm regularization problem in compressive sensing. How to analyze the local convergence and the iteration complexity of the proposed method is worthy of further study.

Data Availability
The data used to support the findings of this study are included within the article.