A Nesterov type algorithm with double Tikhonov regularization: fast convergence of the function values and strong convergence to the minimal norm solution

We investigate the strong convergence properties of a Nesterov type algorithm with two Tikhonov regularization terms in connection with the minimization of a smooth convex function $f$. We show that the generated sequences converge strongly to the minimal norm element of $\argmin f$. We also show that, from a practical point of view, the Tikhonov regularization does not affect Nesterov's optimal convergence rate of order $\mathcal{O}(n^{-2})$ for the potential energies $f(x_n)-\min f$ and $f(y_n)-\min f$, where $(x_n),\,(y_n)$ are the sequences generated by our algorithm. Further, we obtain fast convergence to zero of the discrete velocity, as well as estimates concerning the value of the gradient of the objective function along the generated sequences.


Introduction
Let H be a Hilbert space endowed with the scalar product ⟨·, ·⟩ and norm ∥·∥, and consider the optimization problem (1): inf_{x∈H} f(x), where f : H → R is a convex, continuously Fréchet differentiable function with L-Lipschitz continuous gradient, whose set of minimizers argmin f is nonempty. We associate to the optimization problem (1) the following inertial-gradient type algorithm: let x₀, x₁ ∈ H and, for all k ≥ 1, perform the updates of Algorithm (2). We assume that s ∈ (0, 1/L), that the sequence (c_k)_{k≥1} is nonnegative for k large enough and satisfies lim_{k→+∞} c_k = 0, and that (ε_k)_{k≥1} is a nonincreasing positive sequence with lim_{k→+∞} ε_k = 0. Observe that when the inertial parameter (b_k)_{k≥0} satisfies lim_{k→+∞} b_k = 1, Algorithm (2) has the form of the famous Nesterov algorithm (see [22,15] and also [11,18]) with two Tikhonov regularization terms. Indeed, the terms c_k x_k and ε_k y_k in Algorithm (2) play the role of Tikhonov regularization terms; consequently, our aim is to obtain strong convergence of the generated sequences to the element of minimal norm of argmin f (see [1,3,6,8,9,10,12,14,16,17,19,20,25,26]) and, at the same time, to preserve the optimality of Nesterov's algorithm concerning the convergence rate of order O(k⁻²) for the potential energy f(x_k) − min f (see [22,15]). Our analysis reveals that the inertial parameter and the Tikhonov regularization parameters are strongly correlated. This fact is in concordance with some recent results from the literature concerning the strong convergence of the trajectories of some continuous second order dynamical systems to a minimal norm minimizer of a convex function, or to the minimal norm zero of a maximally monotone operator [1,2,6,7,10,12,13,14,19]. Concerning the discrete case, that is, the case of inertial algorithms that converge strongly to the minimal norm solution of a convex optimization problem, there are only a few results in the literature, see [10,20], and even those refer to proximal inertial algorithms obtained via implicit discretizations of some second order continuous dynamical systems (see [24,19]).
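The displayed update rule of Algorithm (2) did not survive extraction. Based on the description above (a Nesterov-type extrapolation step carrying the term c_k x_k, and a gradient step on ∇f(y_k) + ε_k y_k), one plausible reconstruction of a single iteration, sketched in Python with hypothetical surrogate parameter sequences, is:

```python
def algo2_step(grad_f, x_prev, x_curr, b, c, eps, s):
    """One hypothetical iteration of a Nesterov-type scheme with two
    Tikhonov terms: c*x_k enters the extrapolation step, eps*y_k the
    gradient step (a reading of the text, not the authors' exact formula)."""
    y = [xc + b * (xc - xp) - c * xc for xc, xp in zip(x_curr, x_prev)]
    g = grad_f(y)
    return [yi - s * (gi + eps * yi) for yi, gi in zip(y, g)]

# Illustrative driver on f(x) = ||x||^2 (gradient 2x, L = 2, so s = 0.1 < 1/L),
# with simple surrogate sequences b_k -> 1 and c_k, eps_k -> 0.
x_prev, x_curr = [1.0], [1.0]
for k in range(1, 301):
    b_k = k / (k + 3.0)
    c_k = eps_k = 1.0 / (k + 1.0) ** 1.2
    x_prev, x_curr = x_curr, algo2_step(
        lambda v: [2.0 * vi for vi in v], x_prev, x_curr, b_k, c_k, eps_k, 0.1)
```

Since 0 is the unique minimizer of this toy objective, the iterates contract to it; the point of the sketch is only the structure of the two regularization terms.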
Indeed, in [10] the following inertial-proximal algorithm was considered in connection with the optimization problem (1), with parameters α > 3, c > 0, where prox_f : H → H, prox_f(x) = argmin_{y∈H} { f(y) + (1/2)∥y − x∥² }, denotes the proximal point operator of the convex function f. To the best of our knowledge, this is the first inertial algorithm in the literature for which both strong convergence results for the generated sequences and fast convergence of the potential energy f(x_k) − min f and the discrete velocity ∥x_k − x_{k−1}∥ were obtained. However, from a practical point of view it is not natural that the minimizers of a smooth function be approximated via proximal, i.e. backward, steps. Another drawback of this algorithm is that it does not ensure full strong convergence of the generated sequences to the minimum norm minimizer x*. Indeed, according to [10], only the strong convergence result lim inf_{k→+∞} ∥x_k − x*∥ = 0 is provided. In order to overcome these deficiencies, in [20] the author assumed only that the objective function in (1) is proper, convex and lower semicontinuous, and associated to this optimization problem another inertial-proximal algorithm, with parameters α, q, c, p > 0 and a sequence (λ_k) of positive real numbers. According to [20], in case the stepsize λ_k ≡ 1 and 0 < q < 1, 1 < p < q+1, the full convergence of the generated sequences to the minimum norm minimizer x* is obtained, i.e. lim_{k→+∞} ∥x_k − x*∥ = 0. Further, fast convergence of the potential energy f(x_k) − min f and of the discrete velocity ∥x_k − x_{k−1}∥ was shown.
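For intuition on the backward step used in [10] and [20]: for simple functions the proximal operator defined above has a closed form. A small sketch (function name ours), using f(y) = ½∥y∥², for which the optimality condition gives prox_f(x) = x/2:

```python
def prox_half_sq_norm(x):
    """prox_f(x) = argmin_y { 0.5*||y||^2 + 0.5*||y - x||^2 }.
    The optimality condition y + (y - x) = 0 gives y = x/2."""
    return [xi / 2.0 for xi in x]

p = prox_half_sq_norm([2.0, -4.0])  # -> [1.0, -2.0]
```

By contrast, the forward (gradient) step of Algorithm (2) only evaluates ∇f, which is the natural operation for a smooth objective.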
In line with the results emphasized above, the main goal of this paper is to obtain similar results for gradient type inertial algorithms. Unfortunately, our parameters in Algorithm (2) will not have such simple forms as the parameters in [10] or [20]; this is due to the fact that we cannot use a discrete Lyapunov function of a form similar to the ones considered in [10,20], and instead we have to construct a new discrete Lyapunov function suitable for our analysis. Therefore, the forms of the given sequences (b_k)_{k≥0}, (c_k)_{k≥1} are crucial in order to obtain our results. More precisely, for a given Tikhonov regularization parameter (ε_k)_{k≥1} and a fixed stepsize s, consider the sequence (q_k)_{k≥0} which, after an index k large enough, satisfies condition (Q). Then the inertial parameter (b_k)_{k≥0} and the regularization parameter (c_k)_{k≥1} from Algorithm (2) are defined via conditions (B) and (C). Note that, despite the complex form of these parameters, from a practical perspective Algorithm (2) can easily be implemented. A comprehensive analysis of the above conditions will be carried out in section 4. Here we just underline that if we specify the parameters as ε_k = c/k^p, c, p > 0, and take q_k = ak^q, a > 0, 0 < q < 1, then (Q) is satisfied for every fixed stepsize s ∈ (0, 1/L), and the main result of the paper can be summarized in the following theorem.
Theorem 1. For p < 2q, let (x_k)_{k≥0} and (y_k)_{k≥1} be the sequences generated by Algorithm (2). Then (x_k) and (y_k) converge strongly to x*, where {x*} = pr_{argmin f}(0) is the minimum norm minimizer of our objective function f. Further, fast convergence rates for the potential energy and the discrete velocity hold.
The paper is organized as follows. In the next section we present some preliminary results and notions needed to carry out our analysis. In section 3 we prove the main result of the paper: we obtain strong convergence of the sequences generated by Algorithm (2) and also fast convergence of the potential energy and the discrete velocity. In section 4 we consider the parameters in a simple form and discuss the conditions these parameters must satisfy in order to obtain the results presented in section 3. Further, in section 5, via some numerical experiments, we show that Algorithm (2) indeed ensures the convergence of the generated sequences to a minimal norm solution, and also that both Tikhonov regularization terms are indispensable for this result. Finally, we conclude our paper with some future research plans.

Preliminary results
In order to obtain strong convergence for the sequence (x_k) generated by Algorithm (2) we need some preliminary results. The first one is the Descent Lemma [21].
Further, we need the following property of smooth, convex functions, see [21].
Lemma 3. Let f : H → R be a convex, smooth function with an L-Lipschitz continuous gradient.
Our first original result is a modified descent lemma which, in particular, contains Lemma 1 from [5], but has a considerably simpler proof.
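The statements of Lemmas 2 and 3 were lost in extraction; their standard forms for an L-smooth convex function f (as given, e.g., in [21]) are presumably:

```latex
% Descent Lemma (Lemma 2): for all x, y \in H,
f(x) \le f(y) + \langle \nabla f(y), x - y \rangle + \frac{L}{2}\|x - y\|^2.
% Lower bound for smooth convex functions (Lemma 3): for all x, y \in H,
f(x) \ge f(y) + \langle \nabla f(y), x - y \rangle
       + \frac{1}{2L}\|\nabla f(x) - \nabla f(y)\|^2.
```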
Assume further that s ∈ (0, 1/L]. Then inequality (4) holds.
Proof. Indeed, by taking x = y − s∇f(y) in Lemma 2 we get (5). From Lemma 3 we have (6). Combining (5) and (6) leads to (4).
We continue the present section by emphasizing the main idea behind Tikhonov regularization, which will ensure strong convergence of the sequences generated by our algorithm (2) to the minimizer of minimal norm of the objective function f. By x̄_k we denote the unique solution of the strongly convex minimization problem min_{x∈H} f(x) + (ε_k/2)∥x∥². We know (see for instance [8]) that lim_{k→+∞} x̄_k = x*, the minimal norm element of the set argmin f. Obviously, {x*} = pr_{argmin f}(0), and we have the inequality ∥x̄_k∥ ≤ ∥x*∥ (see [12]). Since x̄_k is the unique minimizer of the strongly convex function f_k(x) = f(x) + (ε_k/2)∥x∥², we have ∇f_k(x̄_k) = 0. Further, from Lemma A.1 c) of [20] we have (9). Note that since f_k is strongly convex, the gradient inequality yields (10), and in particular (11). Moreover, observe that (12) holds for all x, y ∈ H. Hence, if we apply Lemma 4 to f_k, we get that for all s ∈ (0, 1/(L + ε_k)] the corresponding modified descent inequality holds. Now we can rewrite Algorithm (2) in a more convenient equivalent form by using the strongly convex function f_k: indeed, since ∇f_k(x) = ∇f(x) + ε_k x, Algorithm (2) can equivalently be written as (13).
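The regularization path x̄_ε = argmin_x { f(x) + (ε/2)∥x∥² } converging to the minimal norm minimizer can be checked by hand on a small example. The objective below, f(x₁, x₂) = (x₁ + 2x₂ − 5)², is our own illustration (not the paper's); its minimizers form the line x₁ + 2x₂ = 5, whose minimal norm element is (1, 2). Solving the first-order optimality conditions of f_ε in closed form:

```python
import math

def tikhonov_minimizer(eps):
    """Unique minimizer of f_eps(x) = (x1 + 2*x2 - 5)^2 + (eps/2)*(x1^2 + x2^2).
    Optimality: 2*r*(1, 2) + eps*(x1, x2) = 0 with r = x1 + 2*x2 - 5,
    hence x1 = -2*r/eps, x2 = -4*r/eps and r*(1 + 10/eps) = -5."""
    r = -5.0 * eps / (eps + 10.0)
    return (-2.0 * r / eps, -4.0 * r / eps)

for eps in (1.0, 0.1, 1e-3, 1e-6):
    x1, x2 = tikhonov_minimizer(eps)
    # the path stays inside the ball of radius ||x*|| = sqrt(5), and tends to (1, 2)
    print(eps, (x1, x2), math.hypot(x1, x2) <= math.sqrt(5.0))
```

This mirrors the inequality ∥x̄_k∥ ≤ ∥x*∥ and the limit lim x̄_k = x* used above.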

Strong convergence
In this section we provide sufficient conditions under which the sequences generated by (13) converge strongly to the minimum norm minimizer of f, while at the same time fast convergence of the function values along the generated sequences and fast convergence of the discrete velocity to zero are obtained. Moreover, we also show some pointwise estimates for the gradient of the objective function.
In order to obtain our general result concerning the strong convergence of the sequences generated by Algorithm (13) we need to use (12); hence we adjust the indices in Algorithm (13) as follows.
Assume that 0 < s < 1/L and let k₀ ∈ N be such that the following assumption (S) holds.
Note that such an index k₀ exists, since (ε_k) is nonincreasing and tends to 0. Consider the sequence (q_k)_{k≥0} which, after an index large enough, satisfies condition (Q). The following general result holds.
Theorem 5. For a sequence (q_k) satisfying (Q) and a stepsize s satisfying (S), consider the sequences (b_k)_{k≥0} and (c_k)_{k≥1} defined by (B) and (C), and let (x_k)_{k≥0}, (y_k)_{k≥1} be the sequences generated by Algorithm (2). Assume that the sequence (q_kε_k/(q_{k−1}ε_{k−1})) is bounded, that (q_k²ε_k) is increasing for k large enough with lim_{k→+∞} q_k²ε_k = +∞, and that lim_{k→+∞} q_k(ε_k − ε_{k+1})/ε_k = 0. Then (x_k) converges strongly to x*, where {x*} = pr_{argmin f}(0) is the minimum norm minimizer of our objective function f. Moreover, the following convergence rate estimates hold.
Proof. Assume that k ≥ k̄. Taking y = y_k, x = x_k in (12) we get (14). Now taking y = y_k, x = x* in (12), and using that ∇f(x*) = 0, we get (15). Consider the sequence (p_k)_{k≥k̄} defined by (16) for all k ≥ k̄. Note that, due to assumption (Q), one has p_k ≥ 0 for all k ≥ k̄.
We multiply (14) by p_k and (15) by q_k and add the results, obtaining (17) for all k ≥ k̄. Now, by neglecting the nonpositive terms on the right hand side of (17), we obtain (18) for all k ≥ k̄. Further, by using the fact that (ε_k) is nonincreasing, we have (19). On the one hand, according to (10), one has (20). On the other hand, by using the gradient inequality, we get (21). Hence, combining (18), (19), (20) and (21), we obtain (22). Now, according to the form of p_k and condition (Q), one has (23) for all k ≥ k̄; consequently, (22) can be estimated further. For all k ≥ k̄, consider now the sequence (E_k). By using Algorithm (13), one has (24). In what follows we show that (25) holds; indeed, this follows by using (24). Then (23) and (25) lead to the estimate displayed next.

Consequently, by denoting
for all k ≥ k̄.

Consider now the sequence (π_k)_{k≥k̄}. Note that (π_k)_{k≥k̄} is well defined, positive and increasing, since by the hypotheses we have q_{k−1} ≥ 2s for all k ≥ k̄. Now, by multiplying (26) by π_k, we obtain (27) for all k > k̄.
By summing up (27) we obtain (28), for some C > 0.
Next we show that E_n = o(q_n²ε_n) as n → +∞. Indeed, according to the hypotheses, q_n²ε_n → +∞ as n → +∞, and (π_n) is increasing, hence (π_n∥x*∥ + C)/π_n = o(q_n²ε_n) as n → +∞. Further, since (π_n) is increasing and lim_{n→+∞} q_n²ε_nπ_n = +∞, by using the fact that lim_{n→+∞} ⟨x*, x* − x̄_{n+1}⟩ = 0, the Cesàro-Stolz theorem applies. Finally, according to the hypotheses, lim_{n→+∞} q_n(ε_n − ε_{n+1})/ε_n = 0, hence for some M > 0 the remaining term is bounded accordingly. Consequently, from (28) we get E_n = o(q_n²ε_n) as n → +∞, and by using (11) we obtain the fast convergence of the function values. In order to show strong convergence, we use (10) and get lim_{n→+∞} ∥x_n − x̄_n∥ = 0, which, combined with lim_{n→+∞} x̄_n = x*, yields that (x_n) converges strongly to x*. Concerning the rates of convergence for the discrete velocity ∥x_n − x_{n−1}∥, we conclude the following. From the definition of E_n and the fact that E_n = o(q_n²ε_n) as n → +∞, we obtain a first estimate. Now we use the definition of η_n. Since (q_n²ε_n) is increasing and (ε_n) is nonincreasing, we deduce that (q_n) is increasing; further, since lim_{n→+∞} q_n²ε_n = +∞ and lim_{n→+∞} ε_n = 0, we obtain lim_{n→+∞} q_n = +∞. Consequently, for n big enough, by using the fact that (x_n) is bounded, combined with the fact that lim_{n→+∞} (η_n − x*)/q_n = 0, and since (b_n) is bounded and, according to our hypotheses, cannot go to 0 as n → +∞, we deduce at once the stated rate for the discrete velocity. From (24) we can express ∇f_n(y_n) in terms of the iterates; combining this with the previous facts and, finally, with Lemma 2, we obtain the gradient estimates.

Particular choice of the parameter sequences (q_k) and (ε_k)

Let us consider a specific, polynomial type choice of the sequences (q_k) and (ε_k), namely q_k = ak^q, ε_k = c/k^p, where 0 < q ≤ 1, p > 0 and a and c are positive real numbers. Let us fix 0 < s < 1/L. Then from condition (S) we have s ≤ 1/(L + ε_{k₀}) for some k₀ ∈ N, and an easy computation gives an explicit such index, where int(x) denotes the integer part of x. Next we compute the index after which condition (Q) is satisfied. Note that the second condition in (Q) is always fulfilled for k large enough, since q and p are positive. Now consider the case q < 1. Concerning the sequence (b_k)_{k≥0},
in this particular case condition (B) becomes explicit; note that b_k → 1 as k → +∞. Further, condition (C) gives c_k explicitly; note that c_k > 0 for k big enough, that (c_k) is nonincreasing, and that c_k → 0 as k → +∞. Hence, in this case the term c_k x_k in Algorithm (2) indeed plays the role of a Tikhonov regularization term. More precisely, Algorithm (2) reads as: x₀, x₁ ∈ H and, for all k ≥ 1, perform the updates of Algorithm (29). Note that from a numerical point of view Algorithm (29) can easily be implemented. In this particular case we have the following result.
Theorem 6. Let 0 < q < 1 and 0 < p < 2q, and for a fixed stepsize s ∈ (0, 1/L) consider the sequences (x_k)_{k∈N}, (y_k)_{k∈N} generated by Algorithm (29). Then (x_k) and (y_k) converge strongly to x*, where {x*} = pr_{argmin f}(0) is the minimum norm minimizer of our objective function f. Further, the estimates of Theorem 5 hold for this choice of parameters.
Proof. We only need to show that the following conditions from the hypotheses of Theorem 5 hold:
• the sequence q_kε_k/(q_{k−1}ε_{k−1}) = k^{q−p}/(k−1)^{q−p} is bounded, provided the starting index is big enough;
• the sequence q_k²ε_k = a²c k^{2q−p} is increasing after a starting index big enough;
• lim_{k→+∞} q_k²ε_k = lim_{k→+∞} a²c k^{2q−p} = +∞;
• lim_{k→+∞} q_k(ε_k − ε_{k+1})/ε_k = 0.
First of all, the sequence k^{q−p}/(k−1)^{q−p} is indeed bounded if we take the starting index big enough, since lim_{k→+∞} k^{q−p}/(k−1)^{q−p} = 1. Secondly, the sequence a²c k^{2q−p} is increasing when 2q > p, and lim_{k→+∞} a²c k^{2q−p} = +∞ when 2q > p. Finally, since ε_k − ε_{k+1} ≈ cp k^{−p−1}, one has q_k(ε_k − ε_{k+1})/ε_k ∼ ap k^{q−1} → 0 as k → +∞, because q < 1.
Remark 7.
We emphasize that Algorithm (29) can be seen as a Nesterov type algorithm with two Tikhonov regularization terms. Indeed, the extrapolation parameter (b_k) goes to 1 as k → +∞, just as in the Nesterov algorithm. Further, the terms c_k x_k and ε_k y_k can be thought of as Tikhonov regularization terms, since both (c_k) and (ε_k) are nonnegative and nonincreasing sequences (for k big enough) and go to 0 as k → +∞. Unfortunately, we could not allow the case q = 1 and p = 2 in our algorithm. Nevertheless, if p is close to 2 (and q is close to 1), then from a numerical perspective the convergence rates obtained for the potential energy f(x_k) − min f and the discrete velocity ∥x_k − x_{k−1}∥ are as good as the rates obtained for the famous Nesterov algorithm, see [11]. Moreover, our algorithm ensures the strong convergence of the generated sequences to the minimum norm minimizer, a feature that makes it unique in the literature.

Numerical experiments
In this section we present some numerical experiments that support the theoretical results obtained in Theorem 6. To this purpose, consider a quadratic objective function depending on parameters a, b ∈ R \ {0}. Then f is obviously smooth and convex, and its gradient is Lipschitz continuous. Observe that the minimal value of f is 0 and the set argmin f is {(x, −(a/b)x) : x ∈ R}; further, clearly (0, 0) is the minimizer of minimal norm. For simplicity, in the following experiments concerning Algorithm (29) we take everywhere q_k = k^{4/5} and s = 0.1 (which always satisfies s < 1/L), and fix the starting points x₀ = (1, −1) and x₁ = (−1, 1). In our first experiment we fix a = 0.1 and b = 100, and in Algorithm (29) we set ε_k = 1/k^p, where p ∈ {0.3, 0.6, 0.9, 1.2, 1.5}. Further, we consider the case without Tikhonov regularization, that is, a Nesterov type algorithm, by taking ε_k ≡ 0 and c_k ≡ 0. In order to illustrate the theoretical rates obtained in Theorem 6 for the discrete velocity ∥x_k − x_{k−1}∥ and the potential energy f(x_k) − min f, we also plot the values 1/k and f(x₁)/k², respectively.
We run Algorithm (29) for 20 iterations; the results are shown in Figure 1. Observe that our algorithm indeed has a behavior similar to (even better than) the Nesterov type algorithm: the convergence rates for the discrete velocity and the potential energy are of order o(1/k) and O(1/k²), respectively. Further, the Tikhonov regularization does not affect the optimal rates; even more, as we increase p these rates become better.
In our second experiment, for a = 1, b = 5, we show the influence of the Tikhonov regularization terms ε_k y_k and c_k x_k on the behavior of the iterates of the algorithm. In the next figures the first component of the iterates x_k is represented in red, while the second component is represented in blue.
First, we analyze what happens if we drop both Tikhonov regularization terms, so let us put both c_k ≡ 0 and ε_k ≡ 0 in Algorithm (29). According to Figure 2, in this case there is no convergence to the minimal norm element. Next, we show that the presence of both Tikhonov regularization terms is essential for convergence to the minimal norm element. To this purpose, we take ε_k = 1/k^{3/2} (and the corresponding c_k) in order to show convergence to the minimum norm minimizer, and also ε_k ≡ 0 to show that in this case our algorithm no longer converges to the minimum norm minimizer, see Figure 3(a). Note that in case ε_k ≡ 0 the parameter c_k in Algorithm (29) becomes c_k = 2s²(k−1)^q/k^q, hence the term c_k x_k in the formulation of y_k still has the role of a Tikhonov regularization term. Further, we consider the case c_k ≡ 0 but ε_k = 1/k^{3/2}. As we can see, in the absence of either of the Tikhonov regularization terms we do not obtain convergence to the element of minimal norm. Hence, according to the last two figures, the presence of both Tikhonov regularization terms in our algorithm is fully justified.
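The effect seen in Figures 2 and 3 can be reproduced in a few lines. Everything below is an illustrative stand-in, not the authors' exact setup: we use f(x₁, x₂) = (x₁ + 5x₂)², which has a solution line of the kind described above (for a = 1, b = 5), simple surrogate sequences b_k = k/(k+3), c_k = ε_k = 1/k^{1.2}, a hypothetical reading of the update rule, and equal starting points so that the comparison is clean:

```python
import math

def grad_f(x):
    # f(x1, x2) = (x1 + 5*x2)^2, gradient 2*r*(1, 5), so L = 2*(1 + 25) = 52
    r = x[0] + 5.0 * x[1]
    return [2.0 * r, 10.0 * r]

def run(n_iters, tikhonov, s=0.01):               # s = 0.01 < 1/L
    x_prev = [1.0, -1.0]
    x_curr = [1.0, -1.0]
    for k in range(1, n_iters + 1):
        b = k / (k + 3.0)                          # momentum -> 1
        c = 1.0 / k ** 1.2 if tikhonov else 0.0    # surrogate c_k (hypothetical)
        e = 1.0 / k ** 1.2 if tikhonov else 0.0    # eps_k = 1/k^p with p = 1.2
        y = [xc + b * (xc - xp) - c * xc for xc, xp in zip(x_curr, x_prev)]
        g = grad_f(y)
        x_prev, x_curr = x_curr, [yi - s * (gi + e * yi) for yi, gi in zip(y, g)]
    return x_curr

x_nest = run(3000, tikhonov=False)   # plain Nesterov-type run
x_tikh = run(3000, tikhonov=True)    # with both Tikhonov terms
```

In the unregularized run the iterates settle on the solution line at a point of norm about 1.18 (the component orthogonal to the gradient direction is never touched), while the doubly regularized run is pulled toward the minimal norm solution (0, 0), consistent with Figure 2.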

Conclusions, perspectives
To the best of our knowledge, Algorithm (2), and in particular Algorithm (29), are the first inertial gradient type algorithms in the literature that ensure strong convergence to the minimum norm minimizer of a smooth convex function together with fast convergence of the function values and of the discrete velocity. As we have emphasized in the paper, these algorithms can be seen as Nesterov type algorithms with two Tikhonov regularization terms. Despite the complex structure of the inertial parameter and of one of the Tikhonov regularization parameters, our algorithms can easily be implemented and are therefore suitable for practical problems arising in image processing and machine learning. As future research we mention forward-backward algorithms with Tikhonov regularization, associated to the minimization problem whose objective is the sum of a proper, convex, lower semicontinuous function and a smooth convex function with Lipschitz continuous gradient. In our opinion, results similar to those provided in Theorem 6 can be obtained. Indeed, the prospects of such research are promising, taking into account that in [20] strong convergence of an inertial-proximal algorithm to the minimal norm minimizer of a proper, convex and lower semicontinuous function is shown, while in the present paper we obtained similar results for an inertial gradient type algorithm in connection with a smooth convex optimization problem.

Lemma 4.
Let f : H → R be a convex, smooth function with an L-Lipschitz continuous gradient and let s > 0.

Figure 1: Different choices of ε_k.
The absence of the term c k x k

Figure 3: Dropping one of the Tikhonov regularization terms in Algorithm (29), we no longer obtain convergence to the minimum norm solution.