An accelerated forward-backward algorithm with a new linesearch for convex minimization problems and its applications

Abstract: We investigate the problem of minimizing the sum of two convex functions in the setting of a Hilbert space. Many optimization methods for this problem assume Lipschitz continuity of the gradient of the differentiable term, and their stepsizes depend on the Lipschitz constant; however, finding such a Lipschitz constant is not an easy task in general practice. In this work, using a new modification of the linesearches of Cruz and Nghia [7] and Kankam et al. [14] together with an inertial technique, we introduce an accelerated algorithm that requires no Lipschitz continuity assumption on the gradient. Subsequently, a weak convergence result for the proposed method is established. As applications, we apply and analyze our method for solving an image restoration problem and a regression problem. Numerical experiments show that our method is more efficient than well-known methods in the literature.


Introduction
Throughout this article, we suppose that $H$ is a real Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and induced norm $\|\cdot\|$. We are interested in studying the following unconstrained convex minimization problem:
$$\text{minimize } f_1(x) + f_2(x) \quad \text{subject to } x \in H, \qquad (1.1)$$
where $f_1, f_2 : H \to \mathbb{R} \cup \{\infty\}$ are two proper, lower semi-continuous and convex functions such that $f_1$ is differentiable on an open set containing the domain of $f_2$. Problem (1.1) has been widely studied because of its many real-world applications, such as signal and image processing, regression problems and classification problems; see [3,5,8,10,11,13] and the references therein. An important topic in the study of Problem (1.1) is the design of efficient procedures for approximating minimizers of $f_1 + f_2$. Various optimization methods have been introduced and developed by many researchers; see [3, 5-8, 12, 14-16, 25-27, 32], for instance. If a minimizer $x^*$ of $f_1 + f_2$ exists, it is known that $x^*$ is characterized by the fixed-point equation of the forward-backward operator
$$x^* = \operatorname{prox}_{\alpha f_2}\big(x^* - \alpha \nabla f_1(x^*)\big),$$
where $\alpha > 0$, $\operatorname{prox}_{f_2}$ is the proximity operator of $f_2$ and $\nabla f_1$ stands for the gradient of $f_1$. This equation leads to the following iterative method (Method 1):
$$x_{k+1} = \operatorname{prox}_{\alpha_k f_2}\big(x_k - \alpha_k \nabla f_1(x_k)\big),$$
where $0 < \alpha_k < \frac{2}{L}$ and $L$ is a Lipschitz constant of $\nabla f_1$. This method is well known as the forward-backward splitting algorithm [8,15], which includes the proximal point algorithm [17,24], the gradient method [4,9] and the CQ algorithm [1] as special cases. It is observed from Method 1 that the gradient of $f_1$ must be Lipschitz continuous and the stepsize $\alpha_k$ depends on the Lipschitz constant $L$, which is not easy to find in general practice (see [3,5,8,12] for other relevant methods).
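To make the iteration concrete, the following is a minimal sketch of the forward-backward splitting step for the common instance $f_1(x) = \frac{1}{2}\|Ax - b\|_2^2$ and $f_2(x) = \lambda\|x\|_1$, in which the proximity operator is componentwise soft-thresholding; the function names, the fixed stepsize rule and the iteration count are our own illustration, not part of the paper.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximity operator of tau*||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def forward_backward(A, b, lam, iters=200):
    # Classical forward-backward splitting (Method 1) for
    #   min_x 0.5*||A x - b||_2^2 + lam*||x||_1,
    # with a constant stepsize alpha in (0, 2/L), where L = ||A||_2^2 is a
    # Lipschitz constant of the gradient of f1.
    L = np.linalg.norm(A, 2) ** 2
    alpha = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                            # forward (gradient) step
        x = soft_threshold(x - alpha * grad, alpha * lam)   # backward (proximal) step
    return x
```

Note that this classical scheme needs the constant $L$ explicitly, which is exactly the requirement that the linesearch-based methods discussed below avoid.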
In the sequel, we set the standing hypotheses on Problem (1.1) as follows:
(H1) $f_1, f_2 : H \to \mathbb{R} \cup \{\infty\}$ are two proper, lower semi-continuous and convex functions with $\operatorname{dom} f_2 \subseteq \operatorname{dom} f_1$, and $\operatorname{dom} f_2$ is nonempty, closed and convex;
(H2) $f_1$ is differentiable on an open set containing $\operatorname{dom} f_2$; the gradient $\nabla f_1$ is uniformly continuous on any bounded subset of $\operatorname{dom} f_2$ and maps any bounded subset of $\operatorname{dom} f_2$ to a bounded set in $H$.
We note that the second part of (H2) is a weaker assumption than the Lipschitz continuity assumption on ∇ f 1 .
Linesearch 1 is a particular case of the linesearch proposed in [29] for inclusion problems, and it was shown that this linesearch is well defined, that is, it stops after finitely many steps; see [7, Lemma 3.1] and [29, Theorem 3.4(a)]. Cruz and Nghia [7] employed the forward-backward iteration
$$x_{k+1} = \operatorname{prox}_{\alpha_k f_2}\big(x_k - \alpha_k \nabla f_1(x_k)\big),$$
where the stepsize $\alpha_k$ is generated by Linesearch 1.
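To illustrate the backtracking mechanism in code, here is a hedged sketch of a linesearch of this type: starting from a trial stepsize σ, it shrinks the stepsize by a factor θ until a stopping inequality of the form used in [7] holds. Since the displays for Linesearch 1 did not survive in this copy, the exact inequality below should be read as an assumption on its form, and all names are our own.

```python
import numpy as np

def fb_operator(x, alpha, grad_f1, prox_f2):
    # Forward-backward operator: prox_{alpha f2}(x - alpha * grad f1(x)).
    return prox_f2(x - alpha * grad_f1(x), alpha)

def linesearch_backtracking(x, grad_f1, prox_f2, sigma=1.0, theta=0.5, delta=0.4):
    # Backtracking in the spirit of Linesearch 1: accept alpha once
    #   alpha * ||grad f1(FB_alpha(x)) - grad f1(x)|| <= delta * ||FB_alpha(x) - x||.
    alpha = sigma
    while True:
        z = fb_operator(x, alpha, grad_f1, prox_f2)
        if alpha * np.linalg.norm(grad_f1(z) - grad_f1(x)) <= delta * np.linalg.norm(z - x):
            return alpha
        alpha *= theta    # shrink the trial stepsize and try again
```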
In optimization theory, to speed up the convergence of iterative methods, many mathematicians use inertial-type extrapolation [20,22] by adding the technical term $\beta_k(x_k - x_{k-1})$. The control parameter $\beta_k$ is called an inertial parameter, which controls the momentum $x_k - x_{k-1}$. Using Linesearch 1, Cruz and Nghia [7] also introduced an accelerated algorithm with the inertial technical term as follows.
The technique of choosing β k in Method 3 was first mentioned in the fast iterative shrinkage-thresholding algorithm (FISTA) by Beck and Teboulle [5]. Weak convergence results of Methods 2 and 3 were obtained for solving Problem (1.1) with (H1) and (H2).
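For reference, the FISTA rule of Beck and Teboulle [5] generates the inertial parameters from $t_1 = 1$, $t_{k+1} = \big(1 + \sqrt{1 + 4t_k^2}\big)/2$ and $\beta_k = (t_k - 1)/t_{k+1}$; the snippet below is a minimal illustration of this choice (the variable names are ours).

```python
import math

def fista_betas(n, t=1.0):
    # Inertial parameters beta_k = (t_k - 1)/t_{k+1} with t_1 = 1 and
    # t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2, as in FISTA.
    betas = []
    for _ in range(n):
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        betas.append((t - 1.0) / t_next)
        t = t_next
    return betas
```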
A weak convergence theorem of Method 4 was proved and an application in signal recovery was illustrated, see [14].
In this paper, inspired and motivated by the results of Cruz and Nghia [7], Kankam et al. [14] and other related research, we aim to improve Linesearches 1 and 2 and introduce a new accelerated algorithm using the proposed linesearch for the convex minimization problem of the sum of two convex functions. The paper is organized as follows. Basic definitions, notation and some useful tools for proving our convergence results are given in Section 2. Our main results are in Section 3: we introduce a new modification of Linesearches 1 and 2 and present a double forward-backward algorithm that uses an inertial technique for solving Problem (1.1) under Hypotheses (H1) and (H2); after that, a weak convergence theorem for the proposed method is proved, and the complexity of the reduced (non-inertial) algorithm is also discussed. In Section 4, we apply the convex minimization problem to an image restoration problem and a regression problem, analyze and illustrate the convergence behavior of our method, and compare its efficiency with well-known methods in the literature.

Notations and tools
The mathematical symbols used throughout this paper are as follows. $\mathbb{R}$, $\mathbb{R}_+$ and $\mathbb{N}$ are the set of real numbers, the set of nonnegative real numbers, and the set of positive integers, respectively. $\mathrm{Id}$ stands for the identity operator on $H$. Weak and strong convergence of a sequence $\{x_k\} \subset H$ to $x \in H$ are denoted by $x_k \rightharpoonup x$ and $x_k \to x$, respectively. The set of all weak cluster points of $\{x_k\}$ is denoted by $\omega_w(x_k)$. If $C$ is a nonempty closed convex subset of $H$, then $P_C$ stands for the metric projection from $H$ onto $C$, i.e., for each $x \in H$, $P_C x$ is the unique element in $C$ such that $\|x - P_C x\| = \operatorname{dist}(x, C) := \inf_{y \in C} \|x - y\|$.
Let us recall the concept of the proximity operator, which extends the notion of the metric projection. Let $f : H \to \mathbb{R} \cup \{\infty\}$ be a proper, lower semi-continuous and convex function. The proximity (or proximal) operator [2,18] of $f$, denoted by $\operatorname{prox}_f$, is defined as follows: for each $x \in H$, $\operatorname{prox}_f x$ is the unique solution of the minimization problem
$$\operatorname{prox}_f x := \operatorname*{argmin}_{y \in H} \Big\{ f(y) + \frac{1}{2}\|x - y\|^2 \Big\}.$$
The proximity operator can be formulated in the equivalent form $\operatorname{prox}_f = (\mathrm{Id} + \partial f)^{-1}$, where $\partial f$ is the subdifferential of $f$ defined by
$$\partial f(x) := \{ u \in H : f(y) \ge f(x) + \langle u, y - x \rangle \ \text{for all } y \in H \}.$$
Moreover, we have the following useful fact: for $\alpha > 0$ and $x, y \in H$,
$$y = \operatorname{prox}_{\alpha f} x \iff \frac{x - y}{\alpha} \in \partial f(y).$$
The following is a property of the subdifferential operator.
In particular, the graph of $\partial f$, $\operatorname{Gph}(\partial f) := \{(x, u) \in H \times H : u \in \partial f(x)\}$, is demiclosed: if a sequence $\{(x_k, y_k)\} \subset \operatorname{Gph}(\partial f)$ satisfies $x_k \rightharpoonup x$ and $y_k \to y$, then $(x, y) \in \operatorname{Gph}(\partial f)$.
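Two standard examples may help fix ideas: the proximity operator of $\lambda\|\cdot\|_1$ is componentwise soft-thresholding, and the proximity operator of the indicator function of a closed convex set $C$ reduces to the metric projection $P_C$. A small sketch of both (our own illustration, with a box taken as the set $C$):

```python
import numpy as np

def prox_l1(x, lam):
    # prox_{lam*||.||_1}(x): soft-thresholding applied componentwise.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_box_indicator(x, lo, hi):
    # For f = indicator of the box C = {z : lo <= z <= hi}, prox_f = P_C (projection onto C).
    return np.clip(x, lo, hi)
```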
We end this section by providing useful tools for proving our main results.
Fact. Let $x, y \in H$. Then $\|x + y\|^2 \le \|x\|^2 + 2\langle y, x + y \rangle$ and $\|x - y\|^2 = \|x\|^2 - 2\langle x, y \rangle + \|y\|^2$.

Lemma 2.2 ([12]). Let $\{a_k\}$ and $\{t_k\}$ be two sequences of nonnegative real numbers such that $a_{k+1} \le a_k + t_k(a_k - a_{k-1})$ for all $k \in \mathbb{N}$, where $\{t_k\} \subset [0, t]$ for some $t \in [0, 1)$. Then $\sum_{k=1}^{\infty} [a_{k+1} - a_k]_+ < \infty$ and $\lim_{k \to \infty} a_k$ exists.

Lemma 2.3. Let $\{a_k\}$ and $\{b_k\}$ be two sequences of nonnegative real numbers such that $a_{k+1} \le a_k + b_k$ for all $k \in \mathbb{N}$. If $\sum_{k=1}^{\infty} b_k < \infty$, then $\lim_{k \to \infty} a_k$ exists.

Lemma 2.4 (Opial [19]). Let $\{x_k\}$ be a sequence in $H$ such that there exists a nonempty set $\Omega \subset H$ satisfying:
(i) for every $p \in \Omega$, $\lim_{k \to \infty} \|x_k - p\|$ exists;
(ii) $\omega_w(x_k) \subset \Omega$.
Then $\{x_k\}$ converges weakly to a point in $\Omega$.

Methods and convergence analysis
In this section, using the idea of Linesearches 1 and 2, we introduce a new linesearch and present an inertial double forward-backward algorithm with the proposed linesearch for solving the convex minimization problem of the sum of two convex functions without any Lipschitz continuity assumption on the gradient. A weak convergence result of our proposed method is analyzed and established.
We now focus on Problem (1.1) and assume that $f_1$ and $f_2$ satisfy Hypotheses (H1) and (H2). The solution set of (1.1) is denoted by $\Omega$. Also, suppose that $\Omega \neq \emptyset$. For simplicity, we let $F := f_1 + f_2$ and denote by $FB_\alpha$ the forward-backward operator of $f_1$ and $f_2$ with respect to $\alpha$, that is, $FB_\alpha := \operatorname{prox}_{\alpha f_2}(\mathrm{Id} - \alpha \nabla f_1)$. Here, our linesearch is designed as follows.
Remark 3.1. The loop termination condition of Linesearch 3 is weaker than that of Linesearch 2. Hence, it follows from the well-definedness of Linesearch 2 that our linesearch also stops after finitely many steps; see [14, Lemma 3.2] for more details.
Using Linesearch 3, we propose the following iterative method with the inertial technical term.
Step 2. Compute the next iterate $x_{k+1}$, set $k := k + 1$ and return to Step 1.
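Because the display for Step 2 did not survive in this copy, the sketch below shows one generic inertial double forward-backward update of the kind the text describes: an inertial extrapolation, a linesearch stepsize, and two forward-backward evaluations. It is an illustration under these assumptions, not a verbatim transcription of Method 5; the names follow the earlier sketches.

```python
def inertial_double_fb_step(x, x_prev, beta, linesearch, grad_f1, prox_f2):
    # Generic sketch of one iteration: inertial step, then two forward-backward
    # evaluations with a stepsize from a backtracking linesearch (cf. Linesearch 3).
    y = x + beta * (x - x_prev)                      # inertial extrapolation
    alpha = linesearch(y, grad_f1, prox_f2)          # stepsize, no Lipschitz constant needed
    z = prox_f2(y - alpha * grad_f1(y), alpha)       # first forward-backward step
    x_next = prox_f2(z - alpha * grad_f1(z), alpha)  # second forward-backward step
    return x_next
```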
To verify the convergence of Method 5, the following result is needed.
Let $\{x_k\}$ be the sequence generated by Method 5. For each $k \ge 1$ and $p \in \operatorname{dom} f_2$, we have

Proof. Let $k \ge 1$ and $p \in \operatorname{dom} f_2$. By (2.3), (3.1) and (3.2), we get
Then, by the definition of the subdifferential of f 2 , we have and By the assumptions on f 1 and f 2 , we have the following fact:
The following method is obtained by reducing the inertial step in Method 5.
We also analyze the convergence and the complexity of Method 6.

It follows that
By the decreasing property of $\{F(x_k)\}$, we get

It follows from (3.26) that

We note that the stepsize condition on $\{\alpha_k\}$ in Theorems 3.3 and 3.4 requires the stepsizes to be bounded from below by a positive real number. Next, we show that this condition can be guaranteed by the Lipschitz continuity assumption on the gradient.
Let $C \subset H$ be a nonempty closed convex set. Recall that an operator $T : C \to H$ is said to be Lipschitz continuous if there exists $L > 0$ such that $\|Tx - Ty\| \le L\|x - y\|$ for all $x, y \in C$.

Proposition 3.5. Let $\{\alpha_k\}$ be the sequence generated by Linesearch 3 of Method 5 (or Method 6). If $\nabla f_1$ is Lipschitz continuous on $\operatorname{dom} f_2$ with a constant $L > 0$, then $\alpha_k \ge \min\{\sigma, \frac{\delta\theta}{L}\}$ for all $k \in \mathbb{N}$.

Proof. Let $\nabla f_1$ be $L$-Lipschitz continuous on $\operatorname{dom} f_2$. From Linesearch 3, we know that $\alpha_k \le \sigma$ for all $k \in \mathbb{N}$. If $\alpha_k < \sigma$, then $\alpha_k = \sigma\theta^{m_k}$, where $m_k$ is the smallest positive integer for which the stopping criterion of Linesearch 3 is satisfied. Let $\bar{\alpha}_k := \alpha_k/\theta > 0$; then the stopping criterion fails for $\bar{\alpha}_k$. By the Lipschitz continuity of $\nabla f_1$ and this observation, we obtain $\bar{\alpha}_k > \frac{\delta}{L}$, which implies that $\alpha_k > \frac{\delta\theta}{L}$. Consequently, $\alpha_k \ge \min\{\sigma, \frac{\delta\theta}{L}\}$ for all $k \in \mathbb{N}$.

Remark 3.6. It is worth mentioning again that the Lipschitz continuity assumption on the gradient of $f_1$ is sufficient for Hypothesis (H2). Moreover, even under this stronger assumption, the stepsize $\alpha_k$ generated by Linesearch 3 is still computed without knowledge of the Lipschitz constant.
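For a quick illustration of the bound in Proposition 3.5, take the hypothetical parameter values $\sigma = 1$, $\theta = 0.5$, $\delta = 0.4$ and Lipschitz constant $L = 20$ (these numbers are ours, chosen only to show the arithmetic):
$$\alpha_k \ \ge\ \min\Big\{\sigma,\ \frac{\delta\theta}{L}\Big\} \;=\; \min\Big\{1,\ \frac{0.4 \times 0.5}{20}\Big\} \;=\; 0.01,$$
so the stepsizes produced by the linesearch cannot collapse to zero.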

Applications
In this section, we apply the convex minimization problem, Problem (1.1), to an image restoration problem and a regression problem. To solve these problems, we analyze and illustrate the convergence behavior of our method (Method 5) and compare its efficiency with Methods 1-4 and 6. We also consider the following method with another linesearch strategy [21], where the stepsize is chosen as $\alpha_k := \text{Linesearch } 4(x_k, \sigma, \theta)$.
Note that Method 7 is well known as the FISTA with backtracking, see [5,25].
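For readers unfamiliar with that scheme, the sketch below implements the classical Beck-Teboulle version of FISTA with backtracking, in which a local Lipschitz estimate is increased until a quadratic upper-bound condition holds; Linesearch 4 of [21] may differ in detail, so this is an illustrative stand-in rather than a transcription of Method 7, and all names are ours.

```python
import numpy as np

def fista_backtracking(f1, grad_f1, prox_f2, x0, L0=1.0, eta=2.0, iters=200):
    # Classical FISTA with backtracking: increase L until
    #   f1(p) <= f1(y) + <grad f1(y), p - y> + (L/2)*||p - y||^2
    # holds at the candidate p = prox_{f2/L}(y - grad f1(y)/L).
    x, x_prev, t, L = x0.copy(), x0.copy(), 1.0, L0
    for _ in range(iters):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)    # inertial extrapolation
        g = grad_f1(y)
        while True:
            p = prox_f2(y - g / L, 1.0 / L)
            d = p - y
            if f1(p) <= f1(y) + g @ d + 0.5 * L * (d @ d):
                break
            L *= eta                                   # backtrack: shrink the stepsize 1/L
        x_prev, x, t = x, p, t_next
    return x
```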
All experiments and visualizations were carried out in MATLAB on a laptop computer (Intel Core i5, 4.00 GB RAM, Windows 8, 64-bit).

Image restoration problems
Many problems arising in image and signal processing, especially the image restoration problem, can be formulated as the equation
$$y = Ax + \varepsilon,$$
where $x \in \mathbb{R}^N$ is an original image, $y \in \mathbb{R}^M$ is the observed image, $\varepsilon$ is an additive noise and $A \in \mathbb{R}^{M \times N}$ is the blurring operator. To approximate the original image, we minimize the effect of $\varepsilon$ by using the LASSO model [28]:
$$\min_{x \in \mathbb{R}^N} \Big\{ \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1 \Big\},$$
where $\lambda > 0$ is a regularization parameter, $\|\cdot\|_1$ is the $l_1$-norm, and $\|\cdot\|_2$ is the Euclidean norm. In this situation, we apply Problem (1.1) to the LASSO model with $A = RW$, where $R$ is the matrix representing the blur operator and $W$ is the inverse of a three-stage Haar wavelet transform. We use the peak signal-to-noise ratio (PSNR) in decibels (dB) [30] and the structural similarity index metric (SSIM) [33] as the image quality measures. The PSNR is given by
$$\mathrm{PSNR}(x_k) = 10\log_{10}\Big(\frac{255^2}{\mathrm{MSE}}\Big), \qquad \mathrm{MSE} = \frac{1}{M}\|x_k - x\|_2^2,$$
where $M$ is the number of image samples and $x$ is the original image, and the SSIM is given by
$$\mathrm{SSIM}(x_k, x) = \frac{(2\nu_{x_k}\nu_x + C_1)(2\xi_{x_k x} + C_2)}{(\nu_{x_k}^2 + \nu_x^2 + C_1)(\xi_{x_k}^2 + \xi_x^2 + C_2)},$$
where $\nu_{x_k}, \xi_{x_k}$ and $\nu_x, \xi_x$ denote the mean intensity and standard deviation of the deblurred image $x_k$ and of the original image $x$, respectively, $\xi_{x_k x}$ denotes their cross-correlation, and $C_1, C_2$ are small constants that avoid instability when the denominator is close to zero.
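For concreteness, the two quality measures can be computed as in the sketch below. The SSIM here is a simplified global version built from the means, standard deviations and cross-covariance in the formula above (common implementations instead average SSIM over local windows), and the constants $C_1 = (0.01 \cdot 255)^2$, $C_2 = (0.03 \cdot 255)^2$ follow the usual convention; both are our assumptions rather than values taken from the paper.

```python
import numpy as np

def psnr(xk, x):
    # PSNR in dB for 8-bit images: 10*log10(255^2 / MSE).
    mse = np.mean((xk.astype(float) - x.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def ssim_global(xk, x, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    # Simplified global SSIM from means, standard deviations and cross-covariance
    # of the deblurred image xk and the original image x (no local windowing).
    xk, x = xk.astype(float), x.astype(float)
    mu1, mu2 = xk.mean(), x.mean()
    s1, s2 = xk.std(), x.std()
    s12 = ((xk - mu1) * (x - mu2)).mean()
    return ((2 * mu1 * mu2 + c1) * (2 * s12 + c2)) / ((mu1 ** 2 + mu2 ** 2 + c1) * (s1 ** 2 + s2 ** 2 + c2))
```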
Example 4.1. Consider two grayscale images, Cameraman (see Figure 1 (a)) and Boy (see Figure 2 (a)), of size 256 × 256, which are contaminated by a Gaussian blur of filter size 9 × 9 with standard deviation 4 and additive noise of level $10^{-5}$ (see Figure 1 (b) and Figure 2 (b)). Firstly, we test the efficiency of Method 5 for recovering the Cameraman image at the 300th iteration by choosing various linesearch parameters and different starting points, as shown in Table 1. It is observed from Table 1 that the linesearch parameter σ influences the image recovery performance of Method 5, while the choice of the starting point $x_1$ has no significant impact on the convergence behavior of our method.
Next, we test and analyze the image recovery performance of Methods 1-7 for recovering the Cameraman and Boy images by setting the parameters as in Table 2 and by choosing the blurred images as the starting points. The maximum number of iterations for all methods is fixed at 500. The comparative experiments for recovering the Cameraman and Boy images are as follows. The original images, the blurred images contaminated by Gaussian blur and the images restored by Methods 1-7 are shown in Figures 1 and 2. The PSNR and SSIM results are shown in Figures 3 and 4. It can be seen that Method 5 gives higher PSNR and SSIM values than the other methods, while Method 6 gives better results than the other methods without the inertial step. Therefore, our method has the highest image recovery efficiency among the studied methods.


Regression problems
We first introduce a brief concept of the extreme learning machine (ELM) for regression problems. Let $S = \{(x_k, t_k) : x_k \in \mathbb{R}^n, t_k \in \mathbb{R}^m, k = 1, 2, \ldots, N\}$ be a training set of $N$ distinct samples, where $x_k$ is an input datum and $t_k$ is a target. For a single hidden layer of ELM, the output of the $i$-th hidden node is $G(\langle a_i, x \rangle + b_i)$, where $G$ is an activation function and $a_i$ and $b_i$ are parameters of the $i$-th hidden node. The output function of ELM for single-hidden layer feedforward neural networks (SLFNs) with $M$ hidden nodes is
$$o_j = \sum_{i=1}^{M} \xi_i \, G(\langle a_i, x_j \rangle + b_i), \qquad j = 1, \ldots, N,$$
where $\xi_i$ is the output weight of the $i$-th hidden node. The hidden layer output matrix $H$ is defined by
$$H = \big[ G(\langle a_i, x_j \rangle + b_i) \big]_{j = 1, \ldots, N;\ i = 1, \ldots, M}.$$
The main goal of ELM is to find $\xi = [\xi_1, \ldots, \xi_M]^T$ such that $H\xi = T$, where $T = [t_1, \ldots, t_N]^T$ is the target matrix. In our experiment, we proceed as follows:
• Formulate the problem of finding $\xi$ as an instance of Problem (1.1) and find the optimal weight $\xi_k$ of this problem by employing Methods 1-5 with a certain number of iterations;
• Using the sigmoid activation function, generate the hidden layer output matrix $H_2$ of the testing matrix $V$ with $M = 100$ hidden nodes;
• Calculate the output $O_k = H_2 \xi_k$ and the mean squared error (MSE) at the $k$-th iteration by
$$\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N} \|(O_k)_j - t_j\|_2^2,$$
where $N$ is the number of distinct samples.
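As a rough sketch of this pipeline (random hidden-node parameters, sigmoid activation, $\ell_1$-regularized output weights computed by a forward-backward iteration, and the test MSE), the code below may be helpful; the regularization parameter, the fixed-stepsize solver and all names are our own assumptions, and the solver can be replaced by any of Methods 1-7.

```python
import numpy as np

def hidden_output(X, W, b):
    # Hidden layer output matrix: H[j, i] = G(<a_i, x_j> + b_i) with sigmoid G.
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def elm_train(X, T, M=100, lam=1e-3, iters=1000, seed=0):
    # Draw random hidden-node parameters (a_i, b_i), then solve
    #   min_xi 0.5*||H xi - T||_2^2 + lam*||xi||_1
    # by forward-backward splitting to obtain the output weights xi.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], M))
    b = rng.standard_normal(M)
    H = hidden_output(X, W, b)
    alpha = 1.0 / (np.linalg.norm(H, 2) ** 2)        # stepsize from ||H||_2^2
    xi = np.zeros(M)
    for _ in range(iters):
        v = xi - alpha * (H.T @ (H @ xi - T))        # forward (gradient) step
        xi = np.sign(v) * np.maximum(np.abs(v) - alpha * lam, 0.0)  # soft-thresholding
    return W, b, xi

def elm_mse(V, Tv, W, b, xi):
    # Mean squared error of the prediction O = H2 xi on the testing data V.
    O = hidden_output(V, W, b) @ xi
    return np.mean((O - Tv) ** 2)
```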
The parameters for each prediction method are set as in Table 3. A graph of the predictions of the sine function by Methods 1-7 at the 500th iteration is shown in Figure 5. We report a comparison of the numbers of iterations, mean squared error values and computation times in Table 4, where the stopping criterion is $\mathrm{MSE} \le 10^{-3}$ or a maximum of 10000 iterations. A graph of the mean squared error is shown in Figure 6. It can be observed that Method 5 performs better in predicting the sine function than the other tested methods.

Conclusions
In this paper, we have discussed the convex minimization problem of the sum of two convex functions in a Hilbert space. The challenge of removing the Lipschitz continuity assumption on the gradient motivated us to study linesearch procedures. We introduce a new linesearch and propose an inertial forward-backward algorithm whose stepsize does not depend on any Lipschitz constant for solving the considered problem without any Lipschitz continuity condition on the gradient. It is shown that the sequence generated by the proposed method converges weakly to a minimizer of the sum of the two convex functions under mild control conditions. As applications, we employ our method to recover blurred images in an image restoration problem and to predict the sine function in a regression problem. The experimental results show that our method is more efficient than the well-known methods in [5,7,14,15,25].