Derivative-free method based on DFP updating formula for solving convex constrained nonlinear monotone equations and application

1 Center of Excellence in Theoretical and Computational Science (TaCS-CoE), & KMUTT Fixed Point Research Laboratory, Room SCL 802 Fixed Point Laboratory, Science Laboratory Building, Department of Mathematics, Faculty of Science, King Mongkut's University of Technology Thonburi (KMUTT), 126 Pracha-Uthit Road, Bang Mod, Thung Khru, Bangkok 10140, Thailand 2 Department of Mathematics, Faculty of Science, Gombe State University, Gombe 760214, Nigeria 3 Department of Medical Research, China Medical University Hospital, China Medical University, Taichung 40402, Taiwan 4 Intelligent and Nonlinear Dynamic Innovations Research Center, Department of Mathematics, Faculty of Applied Science, King Mongkut's University of Technology North Bangkok (KMUTNB), Bangkok, Thailand 5 Faculty of Natural Sciences II, Institute of Mathematics, Martin Luther University Halle–Wittenberg, 06099 Halle (Saale), Germany 6 Department of Mathematics, School of Chemical Engineering and Physical Sciences, Lovely Professional University, Phagwara 144411, India 7 Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin (UniSZA), Terengganu 22200, Malaysia


Introduction
Let U ⊆ R^n be a nonempty closed convex subset, where R^n is the n-dimensional Euclidean space equipped with the Euclidean norm ‖·‖. Recall that a mapping P : R^n → R^n is said to satisfy the monotonicity and Lipschitz continuity conditions if, for all u, v ∈ R^n, the inequalities

0 ≤ (u − v)^T (P(u) − P(v)), (1.1)

‖P(u) − P(v)‖ ≤ L ‖u − v‖, L > 0, (1.2)

hold, respectively. Consider the following nonlinear system of equations

P(u) = 0, u ∈ U, (1.3)

where P : U ⊆ R^n → R^n is a monotone mapping. Problem (1.3) is of great importance to many researchers as it appears in a wide range of applications such as robotic motion control problems, signal recovery problems, image deblurring problems, variational inequality problems and so on (see [1][2][3][4]).
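To make conditions (1.1) and (1.2) concrete, the sketch below numerically checks both for a simple illustrative mapping of our own choosing, P(u) = u + sin(u) (not one of the paper's test problems): each component is nondecreasing, so P is monotone, and the componentwise derivative 1 + cos(u_i) is bounded by 2, so P is Lipschitz continuous with L = 2.

```python
import math
import random

def P(u):
    # Toy mapping: componentwise u_i + sin(u_i); monotone, Lipschitz with L = 2
    return [x + math.sin(x) for x in u]

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

random.seed(0)
for _ in range(1000):
    u = [random.uniform(-5, 5) for _ in range(4)]
    v = [random.uniform(-5, 5) for _ in range(4)]
    du = [a - b for a, b in zip(u, v)]
    dP = [a - b for a, b in zip(P(u), P(v))]
    # monotonicity (1.1): (u - v)^T (P(u) - P(v)) >= 0
    assert inner(du, dP) >= 0
    # Lipschitz continuity (1.2) with L = 2
    assert math.sqrt(inner(dP, dP)) <= 2 * math.sqrt(inner(du, du)) + 1e-12
```

Any continuously differentiable mapping whose Jacobian is positive semi-definite would pass the same check.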
Quasi-Newton methods are among the most popular iterative methods for solving problem (1.3) as well as the general unconstrained optimization problem, min{p(u) : u ∈ R^n}. Note that P(u) can be viewed as ∇p(u), that is, the gradient of p at u. The quasi-Newton methods define their search directions as q_k := −W_k P(u_k), where the matrix W_k is either an approximation of the inverse Hessian of p at u_k or an approximation of the Jacobian of P at u_k. The matrix W_k is usually updated in every iteration via a suitable updating formula such as the DFP (Davidon-Fletcher-Powell), BFGS (Broyden-Fletcher-Goldfarb-Shanno), SR1 (symmetric rank-one) or diagonal updating formulas [5][6][7][8]. However, one of the shortcomings associated with these methods, i.e., DFP, BFGS and SR1, is the need to compute and store matrices in every iteration. Consequently, a number of researchers have modified some quasi-Newton search directions in such a way that they mimic the behaviour of the popular conjugate gradient methods. Conjugate gradient methods are known for their ability to handle large-scale problems efficiently, since they neither form nor store matrices throughout the entire iteration process.
For instance, based on the BFGS updating formula and the Polak-Ribière-Polyak conjugate gradient parameter [9,10], Zhang et al. [11] developed a three-term conjugate gradient method for solving general unconstrained optimization problems. With the aid of an Armijo-type line search, their method was shown to be globally convergent. Also, Andrei [12] proposed another simple three-term conjugate gradient algorithm for unconstrained optimization problems based on the BFGS updating formula. The search direction proposed in [12] satisfies both the descent and conjugacy conditions independent of the line search strategy used. In a similar approach, Awwal et al. [13] presented an interesting three-term derivative-free method for solving problem (1.3) with convex constraints. Their algorithm was developed by incorporating the modified Perry conjugate gradient parameter [14] into the modified BFGS updating formula, and was also applied to recover some disturbed signals. Very recently, based on a modified scaled SR1 updating formula and the projection technique of Solodov and Svaiter [15], Abubakar et al. [16] proposed another derivative-free method for solving problem (1.3) with convex constraints. The convergence analysis of their method was established under the monotonicity and Lipschitz continuity assumptions on the underlying mapping.
The DFP updating formula is among the earliest quasi-Newton updating formulas that preserve positive definiteness. However, it has not received the attention it deserves in recent times, especially with regard to systems of nonlinear equations. In this paper, based on a modified scaled DFP updating formula, we propose a new three-term iterative method for solving problem (1.3) with convex constraints. The new method mimics the conjugate gradient method and also satisfies the sufficient descent property. The following are some of the contributions of this paper:
• This paper presents a new iterative algorithm for solving convex constrained monotone nonlinear systems of equations.
• It also explores the efficiency as well as the numerical performance of the DFP-based iterative method.
• The new search direction is derived in such a way that it satisfies the sufficient descent condition.
• The global convergence of the proposed method is established under mild conditions.
• The proposed method is successfully applied to signal recovery problems.
The remaining part of this paper is organized as follows. The proposed method and its convergence analysis are described in Section 2. The numerical performance of the proposed method is discussed in Section 3 and the application is described in Section 4. Finally, some concluding remarks are given in Section 5.

Proposed DFDFP method and its global convergence results
Recall that quasi-Newton methods update their sequence of points {u_k} via the recursive formula

u_{k+1} := u_k + t_k q_k, k = 0, 1, 2, . . .,

where the step length t_k > 0 is usually obtained via a suitable line search technique. The search direction q_k is defined as

q_k := −W_k P(u_k), k = 0, 1, 2, . . ., (2.1)

where, for k = 0, W_0 = I and I is the identity matrix. For k ≥ 1, W_k is a matrix updated via a suitable formula such that the following secant equation is satisfied:

W_k γ_{k−1} = s_{k−1},

with γ_{k−1} := P(u_k) − P(u_{k−1}) and s_{k−1} := u_k − u_{k−1}.
Consider the DFP updating formula, which was originally proposed by Davidon and later developed by Fletcher and Powell [5], given as

W_k = W_{k−1} − (W_{k−1} γ_{k−1} γ_{k−1}^T W_{k−1}) / (γ_{k−1}^T W_{k−1} γ_{k−1}) + (s_{k−1} s_{k−1}^T) / (s_{k−1}^T γ_{k−1}). (2.2)

Motivated by the approach in [11,13], we consider the following DFP-like formula given as

W_k = µ_k W_{k−1} − (W_{k−1} γ_{k−1} γ_{k−1}^T W_{k−1}) / (γ_{k−1}^T W_{k−1} γ_{k−1}) + (s_{k−1} s_{k−1}^T) / (s_{k−1}^T γ_{k−1}), (2.3)

where µ_k is a positive parameter to be determined. Note that if µ_k = 1 for all k, then (2.3) reduces to the classical DFP updating formula (2.2). The aim here is to derive a DFP-like search direction q_k, where the parameter µ_k will be determined in such a way that the search direction q_k in (2.1) satisfies the following sufficient descent condition

P(u_k)^T q_k ≤ −α ‖P(u_k)‖², α > 0. (2.4)

Now, following the approach in [17], setting W_{k−1} ≡ τ_k I in (2.3), where I is the identity matrix, we have

W_k = µ_k τ_k I − τ_k (γ_{k−1} γ_{k−1}^T) / ‖γ_{k−1}‖² + (s_{k−1} s_{k−1}^T) / (s_{k−1}^T γ_{k−1}), (2.5)

where τ_k ∈ [z_1, z_2], z_1, z_2 > 0, is a scaling parameter. Substituting (2.5) into (2.1) gives

q_k = −µ_k τ_k P(u_k) + τ_k (γ_{k−1}^T P(u_k) / ‖γ_{k−1}‖²) γ_{k−1} − (s_{k−1}^T P(u_k) / (s_{k−1}^T γ_{k−1})) s_{k−1}. (2.6)

This means that the search direction (2.6) mimics the behaviour of the classical three-term conjugate gradient methods. For the search direction q_k in (2.6) to be well-defined, we must ensure that s_{k−1}^T γ_{k−1} ≠ 0 and γ_{k−1} ≠ 0. Now, redefining γ_{k−1} as γ_{k−1} := P(u_k) − P(u_{k−1}) + c s_{k−1}, c > 0, s_{k−1} := u_k − u_{k−1}, and assuming that the mapping P is monotone and Lipschitz continuous, we have

‖γ_{k−1}‖ ≤ (L + c) ‖s_{k−1}‖ (2.7)

and

s_{k−1}^T γ_{k−1} ≥ c ‖s_{k−1}‖² > 0. (2.8)

Therefore, it is justifiable to redefine the search direction q_k as follows

q_k := −µ_k τ_k P(u_k) + τ_k (γ_{k−1}^T P(u_k) / ‖γ_{k−1}‖²) γ_{k−1} − (s_{k−1}^T P(u_k) / (s_{k−1}^T γ_{k−1})) s_{k−1}, (2.9)

where τ_k := ‖s_{k−1}‖² / (γ_{k−1}^T s_{k−1}) for k ≥ 1.

Remark 2.1. The choice of the parameter τ_k is motivated by the work of Andrei [17], i.e., Eq (32) of [17], to be specific. The inequality (2.8) guarantees the well-definedness of τ_k for all k. Furthermore, we show that τ_k ∈ [z_1, z_2], z_1, z_2 > 0, holds as follows. By the Cauchy-Schwarz inequality and (2.7),

τ_k = ‖s_{k−1}‖² / (γ_{k−1}^T s_{k−1}) ≥ ‖s_{k−1}‖² / (‖γ_{k−1}‖ ‖s_{k−1}‖) ≥ 1/(c + L). (2.10)

This together with (2.8) gives

1/(c + L) ≤ τ_k ≤ 1/c, (2.11)

which means that taking z_1 = 1/(c + L) and z_2 = 1/c yields the desired result.
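As a quick sanity check, the classical DFP update (2.2) can be verified to satisfy the secant equation W_k γ_{k−1} = s_{k−1}. The sketch below does this in plain Python for a small 3×3 example with illustrative values of s and γ chosen so that s^T γ > 0.

```python
# Classical DFP update (2.2):
# W_k = W - (W g g^T W)/(g^T W g) + (s s^T)/(s^T g)
def dfp_update(W, s, g):
    n = len(s)
    Wg = [sum(W[i][j] * g[j] for j in range(n)) for i in range(n)]
    gWg = sum(g[i] * Wg[i] for i in range(n))
    sg = sum(s[i] * g[i] for i in range(n))
    return [[W[i][j] - Wg[i] * Wg[j] / gWg + s[i] * s[j] / sg
             for j in range(n)] for i in range(n)]

W0 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]  # W_0 = I
s = [1.0, 2.0, 0.5]
gamma = [0.8, 1.5, 0.3]      # illustrative; s^T gamma > 0
W1 = dfp_update(W0, s, gamma)
# The updated matrix satisfies the secant equation W1 gamma = s
resid = [sum(W1[i][j] * gamma[j] for j in range(3)) - s[i] for i in range(3)]
assert all(abs(r) < 1e-12 for r in resid)
```

The secant property follows algebraically: applying W_k to γ, the first and second terms cancel and the third returns s exactly.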
Next, we determine the value of the spectral parameter µ_k such that (2.4) is fulfilled for q_k given in (2.9). Let k ≥ 1 and P(u_k) ≠ 0. Then, from the definition of q_k in (2.9), we have

P(u_k)^T q_k = −µ_k τ_k ‖P(u_k)‖² + τ_k (γ_{k−1}^T P(u_k))² / ‖γ_{k−1}‖² − (s_{k−1}^T P(u_k))² / (s_{k−1}^T γ_{k−1}) ≤ −τ_k ( µ_k ‖P(u_k)‖² − (γ_{k−1}^T P(u_k))² / ‖γ_{k−1}‖² ). (2.12)

Since τ_k ≥ z_1, this means that by setting

µ_k := ᾱ + (γ_{k−1}^T P(u_k))² / (‖γ_{k−1}‖² ‖P(u_k)‖²), ᾱ > 0, (2.13)

we obtain P(u_k)^T q_k ≤ −z_1 ᾱ ‖P(u_k)‖². Now, taking α = z_1 ᾱ means that the sufficient descent condition (2.4) holds. Therefore, without loss of generality, it makes sense to set the value of µ_k as in (2.13). Next, we state the following definition.

Definition 2.2. Let u be an arbitrary point in R^n. Then its projection onto the feasible set U is given as

Γ_U(u) := arg min { ‖u − v‖ : v ∈ U }.

The projection operator Γ_U(·) satisfies the following inequality

‖Γ_U(u) − v‖ ≤ ‖u − v‖, for all v ∈ U. (2.14)

We now state the steps of the proposed algorithm for solving problem (1.3).
Algorithm 1 (DFDFP).
Step 0: Given an initial point u_0 ∈ U, a tolerance Tol > 0 and constants h > 0, κ > 0, ρ ∈ (0, 1), σ > 0, c > 0, ᾱ > 0 and ϱ ∈ (0, 2), set k := 0.
Step 1: Compute P(u_k). If ‖P(u_k)‖ ≤ Tol, stop. Else, go to Step 2.
Step 2: Compute the search direction q_0 := −P(u_0) and, for k ≥ 1,

q_k := −µ_k τ_k P(u_k) + τ_k (γ_{k−1}^T P(u_k) / ‖γ_{k−1}‖²) γ_{k−1} − (s_{k−1}^T P(u_k) / (s_{k−1}^T γ_{k−1})) s_{k−1}, (2.15)

where

τ_k := ‖s_{k−1}‖² / (s_{k−1}^T γ_{k−1}) and µ_k := ᾱ + (γ_{k−1}^T P(u_k))² / (‖γ_{k−1}‖² ‖P(u_k)‖²). (2.16)

Step 3: Set v_k := u_k + t_k q_k, where t_k := κρ^i such that i is the smallest non-negative integer for which

−P(u_k + κρ^i q_k)^T q_k ≥ σ κρ^i ‖P(u_k + κρ^i q_k)‖^{1/h} ‖q_k‖². (2.17)

Step 4: If ‖P(v_k)‖ = 0, stop. Otherwise, the next iterate is computed as

u_{k+1} := Γ_U[ u_k − ϱ λ_k P(v_k) ], where λ_k := P(v_k)^T (u_k − v_k) / ‖P(v_k)‖².

Step 5: Set k := k + 1 and go to Step 1.
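The steps above can be sketched compactly in code. The following plain-Python implementation follows Algorithm 1 under our reading of the reconstructed formulas (2.15)-(2.17), and applies it to a toy monotone mapping P(u) = u + sin(u) of our own choosing over the nonnegative orthant (whose unique solution there is the origin); it is an illustrative sketch, not the authors' MATLAB code, and the safeguard on s^T γ is ours.

```python
import math

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def P(u):
    # Toy monotone, Lipschitz mapping (our example, not a paper test problem)
    return [t + math.sin(t) for t in u]

def proj(u):
    # Gamma_U for U = nonnegative orthant: componentwise clipping at zero
    return [max(t, 0.0) for t in u]

def dfdfp(u, tol=1e-8, h=5, rho=0.5, abar=0.1, c=0.01, sigma=0.01,
          kappa=1.0, varrho=1.99, max_iter=2000):
    Pu = P(u)
    u_old = P_old = None
    for k in range(max_iter):
        if norm(Pu) <= tol:                      # Step 1: stopping test
            return u, k
        if k == 0:
            q = [-t for t in Pu]                 # Step 2: q_0 = -P(u_0)
        else:
            s = [a - b for a, b in zip(u, u_old)]
            g = [a - b + c * si for a, b, si in zip(Pu, P_old, s)]  # gamma
            sg = dot(s, g)
            if sg <= 1e-16:                      # safeguard; sg > 0 in theory
                q = [-t for t in Pu]
            else:
                tau = dot(s, s) / sg             # (2.16)
                mu = abar + dot(g, Pu) ** 2 / (dot(g, g) * dot(Pu, Pu))
                q = [-mu * tau * p               # (2.15)
                     + tau * dot(g, Pu) / dot(g, g) * gi
                     - dot(s, Pu) / sg * si
                     for p, gi, si in zip(Pu, g, s)]
        t = kappa                                # Step 3: backtracking (2.17)
        for _ in range(60):
            v = [a + t * b for a, b in zip(u, q)]
            Pv = P(v)
            if -dot(Pv, q) >= sigma * t * norm(Pv) ** (1.0 / h) * dot(q, q):
                break
            t *= rho
        if norm(Pv) == 0.0:                      # Step 4: projection step
            return v, k
        lam = dot(Pv, [a - b for a, b in zip(u, v)]) / dot(Pv, Pv)
        u_old, P_old = u, Pu
        u = proj([a - varrho * lam * b for a, b in zip(u, Pv)])
        Pu = P(u)
    return u, max_iter

u_star, iters = dfdfp([1.0, 0.5, 2.0, 0.1])
```

On this toy problem the iterates stay in U by construction of the projection step, and the residual ‖P(u_k)‖ is driven below the tolerance.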

Remark 2.3. The line search used in Step 3 of Algorithm 1 (DFDFP) is a modification of the popular line search of Solodov and Svaiter [15] and was adopted from [18,19]. It was shown in [18] that there exists a step length t_k > 0 satisfying the line search (2.17).
Lemma 2.4. Let P be a mapping that satisfies the monotonicity condition. If ū ∈ U is such that P(ū) = 0 and the sequence {u_k} is generated by Algorithm 1 (DFDFP), then lim_{k→∞} ‖u_k − ū‖ exists.
Proof. Let ū ∈ U be such that P(ū) = 0. Then it holds that

P(v_k)^T (u_k − ū) = P(v_k)^T (u_k − v_k) + P(v_k)^T (v_k − ū) ≥ P(v_k)^T (u_k − v_k), (2.18)

where the inequality follows from the monotonicity of P, i.e., P(v_k)^T (v_k − ū) ≥ P(ū)^T (v_k − ū) = 0. Moreover, from the line search (2.17),

P(v_k)^T (u_k − v_k) = t_k (−P(v_k)^T q_k) ≥ σ t_k² ‖P(v_k)‖^{1/h} ‖q_k‖² ≥ 0. (2.20)

Next, since 0 < ϱ < 2, then by (2.14) and (2.18), we have

‖u_{k+1} − ū‖² = ‖Γ_U[u_k − ϱ λ_k P(v_k)] − ū‖² ≤ ‖u_k − ϱ λ_k P(v_k) − ū‖² = ‖u_k − ū‖² − 2 ϱ λ_k P(v_k)^T (u_k − ū) + ϱ² λ_k² ‖P(v_k)‖² ≤ ‖u_k − ū‖² − ϱ(2 − ϱ) ( P(v_k)^T (u_k − v_k) )² / ‖P(v_k)‖² ≤ ‖u_k − ū‖². (2.21)

The inequality (2.21) means that the sequence {‖u_k − ū‖} is nonincreasing; since it is bounded below by zero, it converges, and the proof is complete.
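The nonexpansiveness property (2.14) used in this proof is easy to observe numerically. For a box feasible set, Γ_U reduces to componentwise clipping, and the sketch below (an illustrative example of ours) checks ‖Γ_U(u) − v‖ ≤ ‖u − v‖ for random u ∈ R^n and v ∈ U.

```python
import random

def proj_box(u, lo=-1.0, hi=1.0):
    # Gamma_U for the box U = [lo, hi]^n: componentwise clipping
    return [min(max(x, lo), hi) for x in u]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

random.seed(2)
for _ in range(1000):
    u = [random.uniform(-5, 5) for _ in range(3)]   # arbitrary point of R^n
    v = [random.uniform(-1, 1) for _ in range(3)]   # any point of U
    # inequality (2.14): ||Gamma_U(u) - v|| <= ||u - v||
    assert dist(proj_box(u), v) <= dist(u, v) + 1e-12
```

The same inequality holds for projection onto any nonempty closed convex set, which is what the convergence analysis actually uses.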
Lemma 2.5. Let P be a mapping that satisfies the monotonicity and Lipschitz continuity conditions. If the sequences {q_k} and {t_k} are generated by Algorithm 1 (DFDFP), then {‖q_k‖} is bounded and

lim_{k→∞} t_k ‖q_k‖ = 0. (2.22)

Proof. From Lemma 2.4, the sequence {‖u_k − ū‖} is convergent, hence there exists a positive constant a_1 such that

‖u_k − ū‖ ≤ a_1, for all k. (2.24)

Next, from (2.15), we have

‖q_k‖ ≤ µ_k τ_k ‖P(u_k)‖ + τ_k ‖P(u_k)‖ + (‖s_{k−1}‖² / (s_{k−1}^T γ_{k−1})) ‖P(u_k)‖ = (µ_k + 2) τ_k ‖P(u_k)‖ ≤ (ᾱ + 3) (1/c) ‖P(u_k)‖ ≤ (ᾱ + 3) (L a_1 / c) =: a_2. (2.27)

The first inequality follows from the Cauchy-Schwarz inequality together with (2.7) and (2.8); the second and the last inequalities follow from (2.11), (2.24) and the Lipschitz continuity of P, since ‖P(u_k)‖ = ‖P(u_k) − P(ū)‖ ≤ L ‖u_k − ū‖. Since P is Lipschitz continuous and t_k ≤ κ, it also holds that

‖P(v_k)‖ ≤ L ‖v_k − ū‖ ≤ L (a_1 + κ a_2) =: a_3. (2.28)

From the line search (2.17), the inequality (2.20) and (2.21), it follows that

ϱ(2 − ϱ) σ² t_k⁴ ‖q_k‖⁴ ‖P(v_k)‖^{2/h} ≤ ( ‖u_k − ū‖² − ‖u_{k+1} − ū‖² ) ‖P(v_k)‖². (2.29)

Multiplying both sides of the inequality (2.29) by ‖P(v_k)‖^{−2/h} and using (2.28) gives

ϱ(2 − ϱ) σ² t_k⁴ ‖q_k‖⁴ ≤ ( ‖u_k − ū‖² − ‖u_{k+1} − ū‖² ) a_3^{2−2/h}.

Using the fact that lim_{k→∞} ‖u_k − ū‖ exists (see Lemma 2.4) and the fact that 0 < ϱ < 2, the right-hand side tends to zero as k → ∞. Hence,

lim_{k→∞} t_k ‖q_k‖ = 0. (2.32)

Theorem 2.6. Suppose that the mapping P is monotone and Lipschitz continuous. Then the sequence {u_k} generated by Algorithm 1 converges to some ū ∈ U such that P(ū) = 0.
Proof. We begin with the claim that

lim inf_{k→∞} ‖P(u_k)‖ = 0. (2.33)

We prove this claim by contradiction. If (2.33) does not hold, then there exists ε > 0 such that

‖P(u_k)‖ ≥ ε, for all k. (2.34)

Applying the Cauchy-Schwarz inequality to the sufficient descent condition (2.4) yields

‖q_k‖ ≥ α ‖P(u_k)‖ ≥ α ε, for all k,

and hence, by (2.22), t_k → 0. Thus, for all k sufficiently large, t_k < κ, which implies that t'_k := ρ^{−1} t_k does not satisfy the line search (2.17). Therefore, defining v'_k := u_k + t'_k q_k, we have

−P(v'_k)^T q_k < σ t'_k ‖P(v'_k)‖^{1/h} ‖q_k‖². (2.36)

On the other hand, using the inequalities (2.4) and (2.36) together with the Lipschitz continuity of P, it is easy to see that

α ‖P(u_k)‖² ≤ −P(u_k)^T q_k = ( P(v'_k) − P(u_k) )^T q_k − P(v'_k)^T q_k ≤ L t'_k ‖q_k‖² + σ t'_k ‖P(v'_k)‖^{1/h} ‖q_k‖². (2.37)

Now, combining (2.34) with (2.37), recalling that {‖q_k‖} and {‖P(v'_k)‖} are bounded (Lemma 2.5 and (2.28)), and rearranging, we obtain

α ε² ≤ ( L + σ ‖P(v'_k)‖^{1/h} ) ρ^{−1} ( t_k ‖q_k‖ ) ‖q_k‖. (2.38)

Taking the limit as k → ∞ on both sides of (2.38) and using (2.22) shows a contradiction. Hence, (2.33) must hold. Now, the continuity of P together with (2.33) implies that {u_k} has some accumulation point ū such that P(ū) = 0. Since lim_{k→∞} ‖u_k − ū‖ exists (Lemma 2.4) and ū is an accumulation point of {u_k}, it follows that {u_k} converges to ū.

Numerical experiments on collection of some test problems
Attention now turns to numerical experiments, where the performance of the proposed Algorithm 1 (DFDFP) is demonstrated in comparison with some existing methods. The following two methods are considered for the numerical comparison: (i) the method proposed in the paper "Solving nonlinear monotone operator equations via modified SR1 update" by Abubakar et al. [16], referred to as MSR1; (ii) the method proposed in the paper "Modified Hager-Zhang conjugate gradient methods via singular value analysis for solving monotone nonlinear equations with convex constraint" by Sabi'u et al. [20], denoted by MHZ1.
Each of the three algorithms is applied to solve eleven (11) test problems (see Supplementary) by varying their dimensions and starting points (SP). Each test problem is solved using five dimensions, namely 1,000, 5,000, 10,000, 50,000 and 100,000, as well as six starting points u_1, . . ., u_6, where u_6 = rand(0, 1). Therefore, a total of 330 problems were solved in the course of this experiment. The parameters used in implementing MSR1 and MHZ1 are taken from [16] and [20], while Algorithm 1 (DFDFP) is implemented using the parameters h = 5, ρ = 0.5, ᾱ = 0.1, c = 0.01, σ = 0.01, κ = 1 and ϱ = 1.99. All three algorithms are coded in MATLAB R2019b and run on a PC with an Intel Core(TM) i5-8250U processor, 4 GB of RAM and a 1.60 GHz CPU. Each algorithm is terminated whenever ‖P(u_k)‖ ≤ 10^{−6}. If this condition is not satisfied after 1000 iterations, failure is declared, denoted by "-". The performance of each algorithm is assessed by #iter (number of iterations), #fval (number of function evaluations) and #time (CPU time). In addition, for each problem, ‖P(ū)‖ (denoted by Norm) is reported to indicate whether an algorithm successfully obtained an approximate solution of that problem. The results recorded by each algorithm are reported in Tables 1-11. The Norm values reveal that all three algorithms successfully obtained a solution of each of the test problems considered, except for Problem S6 with initial point u_2, where MHZ1 failed. Moreover, the Norm values reported in Tables 1, 2, 3, 6 and 11 show that DFDFP obtained the solutions of these problems with a very high degree of accuracy. Perusing Tables 1-11, it can be seen that DFDFP solves the test problems with the fewest #iter, except for Problem S3. The large gap in #iter between DFDFP and the two existing methods demonstrates the efficiency of the proposed method.
The corresponding #fval values also reveal the superior performance of the DFDFP method over the two existing methods. Lastly, the #time reported in the tables shows that DFDFP is faster than both MSR1 and MHZ1 in virtually all the cases considered in the experiments. On the other hand, Figures 1-3 present a graphical view of the performance of each algorithm. These figures summarize the #iter, #fval and #time reported in Tables 1-11 with the aid of the popular Dolan and Moré performance profile [21]. It can be seen from Figures 1-3 that DFDFP outperforms MSR1 and MHZ1 by a wide margin. This underscores the efficiency as well as the robustness of the proposed Algorithm 1 (DFDFP) over its main competitors, namely, MSR1 and MHZ1.

Signal reconstruction problem
Signal processing often involves finding sparse solutions to ill-conditioned systems of linear equations. Popular approaches for recovering disturbed signals minimize an objective function containing a quadratic (ℓ2) error term and a sparse ℓ1-regularization term. Now, consider the convex unconstrained minimization problem

min_u (1/2) ‖Qu − v‖²_2 + η ‖u‖_1, (4.1)

where η > 0 is a regularization parameter, u ∈ R^n, v ∈ R^k is an observation, Q ∈ R^{k×n} (k ≪ n) is a linear operator, and ‖·‖_2 and ‖·‖_1 denote the ℓ2-norm and the ℓ1-norm, respectively. The presence of the ℓ1 term in (4.1) makes iterative algorithms involving gradients unsuitable to handle it in its original form. Recently, different derivative-free iterative algorithms for dealing with the signal recovery problem have been proposed, thanks to the reformulation of problem (4.1) into a nonlinear system of equations by Xiao et al. [23], which was motivated by the work of Figueiredo et al. [22]. Since Algorithm 1 (DFDFP) is derivative-free, we briefly review the reformulation of problem (4.1) into the form of problem (1.3) and then apply the algorithm to solve it.
Define (·)_+ := max{0, ·} and let a and b be two vectors such that u = a − b with a_i = (u_i)_+ and b_i = (−u_i)_+ for all i = 1, 2, . . ., n. Then Figueiredo et al. [22] have shown that the ℓ1-regularization problem (4.1) can be transformed into the following problem

min_{w ≥ 0} (1/2) w^T Z w + r^T w, (4.2)

where w = (a, b)^T ∈ R^{2n}, r = η e_{2n} + (−Q^T v, Q^T v)^T, e_{2n} is the 2n-vector of ones, and

Z = [ Q^T Q, −Q^T Q ; −Q^T Q, Q^T Q ].

It is easy to see that Z is a positive semi-definite matrix, since w^T Z w = ‖Q(a − b)‖²_2 ≥ 0, which means that (4.2) is a convex quadratic problem.
Taking it further, Xiao et al. [23] showed that the above convex quadratic problem (4.2) is equivalent to the following system of nonlinear equations

P(w) := min{ w, Zw + r } = 0, (4.3)

where the "min" is interpreted componentwise. However, recall that the convergence analysis of Algorithm 1 (DFDFP) was established under the assumptions that the underlying mapping is monotone and Lipschitz continuous; therefore, the mapping P in (4.3) must satisfy both conditions. Fortunately, Pang [24] has shown that the mapping P in (4.3) is Lipschitz continuous and, in addition, Xiao et al. [23] proved that it is also monotone. Next, we describe the signal reconstruction experiment as follows. Consider the reconstruction of a sparse signal of length n from k observations. In the course of the experiment, the size of the signal is chosen as n = 2^11 and k = 2^9, where the original signal contains 2^7 randomly placed nonzero elements. Furthermore, the measurement v is contaminated with noise, v = Q ũ + ω, with Q a randomly generated Gaussian matrix and ω Gaussian noise distributed normally with mean 0 and variance 10^{−4}.
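The reformulation above is straightforward to assemble in code. The sketch below (plain Python with tiny illustrative sizes rather than the paper's n = 2^11, k = 2^9) builds Z and r from a given Q, v and η, evaluates P(w) = min{w, Zw + r}, and checks positive semi-definiteness of Z via the identity w^T Z w = ‖Q(a − b)‖²; the function names are ours.

```python
import random

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def build_P(Q, v, eta):
    k, n = len(Q), len(Q[0])
    QtQ = [[sum(Q[t][i] * Q[t][j] for t in range(k)) for j in range(n)]
           for i in range(n)]
    Qtv = [sum(Q[t][i] * v[t] for t in range(k)) for i in range(n)]
    # Z = [QtQ, -QtQ; -QtQ, QtQ] as a dense 2n x 2n block matrix
    Z = [[(1 if (i < n) == (j < n) else -1) * QtQ[i % n][j % n]
          for j in range(2 * n)] for i in range(2 * n)]
    # r = eta * e_{2n} + (-Q^T v; Q^T v)
    r = [eta - Qtv[i] for i in range(n)] + [eta + Qtv[i] for i in range(n)]
    def P(w):
        # mapping (4.3): componentwise min{w, Zw + r}
        Zw = matvec(Z, w)
        return [min(w[i], Zw[i] + r[i]) for i in range(2 * n)]
    return P, Z, r

random.seed(1)
kk, n = 3, 5                                     # tiny illustrative sizes
Q = [[random.gauss(0, 1) for _ in range(n)] for _ in range(kk)]
v = [random.gauss(0, 1) for _ in range(kk)]
P, Z, r = build_P(Q, v, eta=0.1)

# positive semi-definiteness of Z: w^T Z w = ||Q(a - b)||^2 >= 0
w = [random.gauss(0, 1) for _ in range(2 * n)]
quad = sum(wi * zi for wi, zi in zip(w, matvec(Z, w)))
assert quad >= -1e-10
```

In practice Z would never be formed explicitly at the paper's problem sizes; one would apply Q and Q^T as operators instead, but the dense version makes the block structure of (4.2) explicit.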
The performance of the proposed Algorithm 1 (DFDFP) on signal reconstruction is assessed by comparing it with the DPP algorithm [13]. The two algorithms are coded in MATLAB R2019b and run on a PC with an Intel Core(TM) i5-8250U processor, 4 GB of RAM and a 1.60 GHz CPU. The parameters used for Algorithm 1 (DFDFP) are the same as in Section 3, except that ᾱ = 1/τ_k − 1, whereas those of DPP are taken from [13]. Moreover, both algorithms were run from the same initial point u_0 = Q^T v and with the same continuation technique on the parameter η. The termination criterion

|p(u_k) − p(u_{k−1})| / |p(u_{k−1})| < 10^{−5},

where p is the objective function in (4.1), is used throughout the experiment. The numerical performance of each algorithm is assessed by Iter (number of iterations) and Time (CPU time) required to successfully recover the disturbed signal. In addition, the quality of the reconstruction is assessed by the MSE (mean squared error) with respect to the original signal ũ, that is, MSE = (1/n) ‖ũ − u*‖², where u* is the recovered signal. Figures 4-5 show that DFDFP and DPP both successfully reconstruct the disturbed signal. However, the quality of the reconstruction varies, as can be seen from the MSE recorded by each algorithm. The MSE recorded by DFDFP is 0.000926, which is less than the 0.00179 recorded by DPP. This means that the quality of the signal recovered by the proposed Algorithm 1 (DFDFP) is better than that of DPP. In addition, Algorithm 1 (DFDFP) successfully reconstructed the disturbed signal in less than 1 second with 89 iterations. Compared with Iter = 145 and Time = 1.77 seconds recorded by DPP, this shows that DFDFP is faster and very efficient on the signal reconstruction problem.

Conclusions
Based on a quasi-Newton updating formula, a derivative-free line search strategy and the projection technique, a Davidon-Fletcher-Powell (DFP)-like derivative-free method for solving nonlinear monotone systems of equations with convex constraints has been proposed. We have shown that the proposed method is globally convergent to a solution of problem (1.3). In addition, the search direction satisfies the sufficient descent condition independent of the line search used. The numerical results reported show that the proposed method outperformed two existing methods [16,20] on a collection of test problems and was able to recover some noisy signals in compressive sensing with better recovery quality than the method in [13]. Future work includes extending the new algorithm to robotic motion control problems.