Convex optimization via inertial algorithms with vanishing Tikhonov regularization: fast convergence to the minimum norm solution

In a Hilbertian framework, for the minimization of a general convex differentiable function $f$, we introduce new inertial dynamics and algorithms that generate trajectories and iterates converging rapidly towards the minimizer of $f$ with minimum norm. Our study is based on the non-autonomous version of the Polyak heavy ball method which, at time $t$, is associated with the strongly convex function obtained by adding to $f$ a Tikhonov regularization term with vanishing coefficient $\epsilon(t)$. In this dynamic, the damping coefficient is proportional to the square root of the Tikhonov regularization parameter $\epsilon(t)$. By adjusting the speed of convergence of $\epsilon(t)$ towards zero, we obtain both rapid convergence towards the infimal value of $f$ and strong convergence of the trajectories towards the element of minimum norm of the set of minimizers of $f$. In particular, we obtain an improved version of the dynamic of Su-Boyd-Cand\`es for the accelerated gradient method of Nesterov. This study naturally leads to corresponding first-order algorithms obtained by temporal discretization. In the case of a proper, lower semicontinuous, convex function $f$, we study the associated proximal algorithms in detail, and show that they enjoy similar properties.


Introduction
Throughout the paper, H is a real Hilbert space endowed with the scalar product ⟨·, ·⟩ and associated norm, ‖x‖² = ⟨x, x⟩ for x ∈ H. We consider the convex minimization problem

min { f(x) : x ∈ H },

where f : H → R is a convex, continuously differentiable function whose solution set S = argmin f is nonempty. We aim at finding, by rapid methods, the element of minimum norm of S. As an original aspect of our approach, we start from the Polyak heavy ball with friction dynamic for strongly convex functions, and then adapt it to treat the case of general convex functions. Recall that a function f : H → R is said to be µ-strongly convex for some µ > 0 if f − (µ/2)‖·‖² is convex. In this setting, we have the following exponential convergence result.

Theorem 1 Suppose that f : H → R is a function of class C¹ which is µ-strongly convex for some µ > 0. Let x(·) : [t₀, +∞[ → H be a solution trajectory of

ẍ(t) + 2√µ ẋ(t) + ∇f(x(t)) = 0.  (2)

Then, the following property holds: f(x(t)) − min_H f = O(e^{−√µ t}) as t → +∞.
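For illustration purposes (this numerical experiment is not part of the paper's analysis; the integrator, the step size, and the test function f(x) = (µ/2)x² are choices made here), the exponential decay in Theorem 1 can be checked by integrating (2) with a simple semi-implicit Euler scheme:

```python
import math

def heavy_ball(grad, mu, x0, v0, t_end=20.0, h=1e-3):
    """Integrate x''(t) + 2*sqrt(mu)*x'(t) + grad(x(t)) = 0
    by semi-implicit Euler: update the velocity, then the position."""
    x, v, t = x0, v0, 0.0
    while t < t_end:
        v += h * (-2.0 * math.sqrt(mu) * v - grad(x))
        x += h * v
        t += h
    return x

mu = 1.0
f = lambda x: 0.5 * mu * x * x       # mu-strongly convex, min f = 0 at x = 0
grad = lambda x: mu * x

xT = heavy_ball(grad, mu, x0=5.0, v0=0.0)
# Theorem 1 predicts f(x(T)) - min f = O(exp(-sqrt(mu)*T)),
# so at T = 20 the value should be far below exp(-10).
print(f(xT))
```

With µ = 1 the system is critically damped, and the computed value of f at T = 20 is indeed many orders of magnitude below e^{−√µ T}.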
Let us see how to take advantage of this fast convergence result, and how to adapt it to the case of a general convex differentiable function f : H → R. The main idea is linked to Tikhonov's regularization method. It consists in considering the corresponding non-autonomous dynamic which, at time t, is governed by the gradient of the strongly convex function

f_t(x) := f(x) + (ǫ(t)/2)‖x‖².

Replacing f by f_t in (2), and noticing that f_t is ǫ(t)-strongly convex, we obtain the dynamic

(TRIGS)  ẍ(t) + δ√ǫ(t) ẋ(t) + ∇f(x(t)) + ǫ(t)x(t) = 0, with δ = 2.

(TRIGS) stands for Tikhonov regularization of inertial gradient systems. In order not to modify the equilibria asymptotically, we suppose that ǫ(t) → 0 as t → +∞. This condition implies that (TRIGS) falls within the framework of inertial gradient systems with asymptotically vanishing damping. The importance of this class of inertial dynamics has been highlighted by several recent studies [3], [5], [8], [10], [18], [28], [38], which make the link with the accelerated gradient method of Nesterov [35, 36].
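The selection effect of the vanishing Tikhonov term can also be observed numerically. The sketch below is again an illustration of our own (the choices ǫ(t) = c/t², δ = 2, the semi-implicit Euler integrator, and the test function are not from the paper): for f(x, y) = ½(x + y − 2)², whose minimizers form the line x + y = 2, the (TRIGS) trajectory approaches the minimum norm minimizer (1, 1), whereas without the Tikhonov term the limit would depend on the initialization.

```python
import math

def trigs(grad, x0, v0, c=9.0, delta=2.0, t0=1.0, t_end=500.0, h=1e-3):
    """Integrate x'' + delta*sqrt(eps(t))*x' + grad(x) + eps(t)*x = 0
    with eps(t) = c/t**2, by semi-implicit Euler."""
    x, v, t = list(x0), list(v0), t0
    while t < t_end:
        eps = c / (t * t)
        g = grad(x)
        for i in range(len(x)):
            v[i] += h * (-delta * math.sqrt(eps) * v[i] - g[i] - eps * x[i])
            x[i] += h * v[i]
        t += h
    return x

# f(x, y) = 0.5*(x + y - 2)**2: argmin f is the line x + y = 2,
# and its element of minimum norm is (1, 1).
grad = lambda x: [x[0] + x[1] - 2.0, x[0] + x[1] - 2.0]
xT = trigs(grad, x0=[4.0, -1.0], v0=[0.0, 0.0])
print(xT)   # close to [1.0, 1.0]
```

Here c = 9 makes the damping coefficient δ√ǫ(t) = 6/t, i.e. α = 6 > 3 in the notation used below.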

Historical facts and related results
In relation to optimization algorithms, a rich literature has been devoted to the coupling of dynamic gradient systems with Tikhonov regularization.

First-order gradient dynamics
For first-order gradient systems and subdifferential inclusions, the asymptotic hierarchical minimization property which results from the introduction of a vanishing viscosity term in the dynamic (in our context the Tikhonov approximation [39, 40]) has been highlighted in a series of papers [2], [4], [12], [14], [20], [30], [33]. In a parallel way, there is a vast literature on convex descent algorithms involving Tikhonov and more general penalty or regularization terms. The historical evolution can be traced back to Fiacco and McCormick [31] and the interpretation of interior point methods with the help of a vanishing logarithmic barrier. Some more specific references for the coupling of Prox and Tikhonov can be found in Cominetti [29]. The time discretization of first-order gradient systems and subdifferential inclusions involving multiscale (in time) features provides a natural link between the continuous and discrete dynamics. The resulting algorithms combine proximal-based methods (for example forward-backward algorithms) with the viscosity of penalization methods, see [15], [16], [22], [25, 26], [33].

Model results
To illustrate our results, let us consider the case ǫ(t) = c/t^r, where r is a positive parameter satisfying 0 < r ≤ 2. The case r = 2 is of particular interest: it is related to the continuous version of the accelerated gradient method of Nesterov, with the optimal convergence rate for a general convex differentiable function f.
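For the record, substituting ǫ(t) = c/t^r into (TRIGS) (with its general damping parameter δ) gives the following equation; this reconstruction is stated here only because the corresponding displayed formula did not survive extraction:

```latex
\ddot{x}(t) + \frac{\delta\sqrt{c}}{t^{r/2}}\,\dot{x}(t) + \nabla f(x(t)) + \frac{c}{t^{r}}\,x(t) = 0 .
```

In particular, for r = 2 the damping coefficient becomes α/t with α = δ√c, which is precisely the vanishing damping of the Su-Boyd-Candès dynamic, supplemented by the Tikhonov term (c/t²)x(t).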

Case r = 2
Let us consider the (TRIGS) dynamic in which the parameter α ≥ 3 plays a crucial role. As a consequence of Theorems 8 and 9, we have Theorem 2: for any solution x : [t₀, +∞[ → H of (7), i) the values f(x(t)) − min_H f converge rapidly to zero as t → +∞; ii) the trajectory x is bounded, and there is strong convergence to the minimum norm solution. As a consequence of Theorems 7 and 11, we then have fast convergence of the values, and strong convergence to the minimum norm solution. These results are completed by showing that, if there exists T ≥ t₀ such that the trajectory {x(t) : t ≥ T} stays either in the open ball B(0, ‖x*‖) or in its complement, then x(t) converges strongly to x* as t → +∞. Corresponding results for the associated proximal algorithms, obtained by temporal discretization, are given in Section 5.
A remarkable property of the above results is that the rate of convergence of the values is comparable to that of the Nesterov accelerated gradient method. In addition, we have a strong convergence property towards the minimum norm solution, with comparable numerical complexity. These results represent an important advance compared to previous works, by producing new dynamics for which we have both rapid convergence of the values and strong convergence towards the solution of minimum norm. Let us stress the fact that, in our approach, the fast convergence of the values and the strong convergence towards the solution of minimum norm are obtained for the same dynamic, whereas in the previous works [11], [13] they are obtained for different dynamics corresponding to different settings of the parameters. The results extend naturally to strong convergence towards the solution closest to a desired state x_d: it suffices to replace ‖x‖² by ‖x − x_d‖² in the Tikhonov approximation. This is important for inverse problems.

Contents
In Section 2, we show existence and uniqueness of a global solution for the Cauchy problem associated with (TRIGS). Then, based on Lyapunov analysis, we obtain convergence rates of the values which are valid for a general ǫ(·). Section 3 is devoted to an in-depth analysis of the critical case ǫ(t) = c/t². Section 4 is devoted to the study of the strong convergence of the trajectories towards the minimum norm solution, in the case of a general ǫ(·). Then, in Section 5, we obtain similar results for the associated proximal algorithms, obtained by temporal discretization.
2 Convergence analysis for general ǫ(t)

We are going to analyze, via Lyapunov analysis, the convergence properties as t → +∞ of the solution trajectories of the inertial dynamic (TRIGS), which we recall below:

ẍ(t) + δ√ǫ(t) ẋ(t) + ∇f(x(t)) + ǫ(t)x(t) = 0.

Throughout the paper, we assume that t₀ is the origin of time, δ is a positive parameter, and ǫ(·) satisfies the hypotheses (H₁)–(H₃).

Existence and uniqueness for the Cauchy problem
Let us first show that the Cauchy problem for (TRIGS) is well posed.
Proof The proof relies on the combination of the Cauchy-Lipschitz theorem with energy estimates. First consider the Hamiltonian formulation of (9) as a first-order system. According to the hypotheses (H₁), (H₂), (H₃), and by applying the Cauchy-Lipschitz theorem in the locally Lipschitz case, we obtain the existence and uniqueness of a local solution. Then, in order to pass from a local solution to a global solution, we rely on the energy estimate obtained by taking the scalar product of (TRIGS) with ẋ(t). It gives

(d/dt) W(t) = −δ√ǫ(t) ‖ẋ(t)‖² + (ǫ̇(t)/2)‖x(t)‖²,

where W(t) := (1/2)‖ẋ(t)‖² + f(x(t)) + (ǫ(t)/2)‖x(t)‖². From (H₃), ǫ(·) is nonincreasing. Therefore, the energy function t → W(t) is decreasing. The end of the proof follows a standard argument. Take a maximal solution defined on an interval [t₀, T[. If T is infinite, the proof is over. Otherwise, if T is finite, according to the above energy estimate, ẋ(t) remains bounded, just like x(t) and ẍ(t) (use (TRIGS)). Therefore, the limits of x(t) and ẋ(t) exist as t → T. Applying the local existence result at T with the initial conditions thus obtained gives a contradiction to the maximality of the solution.

General case
The control of the decay of ǫ(t) to zero as t → +∞ will play a key role in the Lyapunov analysis of (TRIGS). Precisely, we will use the following condition.
Definition 1 Given δ > 0, we say that t → ǫ(t) satisfies the controlled decay property (CD)_K if it is a nonincreasing function which satisfies: there exists t₁ ≥ t₀ such that the controlled decay inequality below holds for all t ≥ t₁, where K is a parameter such that δ/2 < K < δ for 0 < δ ≤ 2.

Theorem 5 Let x : [t₀, +∞[ → H be a solution trajectory of (TRIGS). Let δ be a positive parameter. Suppose that ǫ(·) satisfies the condition (CD)_K for some K > 0. Then, we have the following rate of convergence of the values: for all t ≥ t₁, f(x(t)) − min_H f satisfies the estimate (11).

Proof The energy function E(·) defined below will be the basis for our Lyapunov analysis. The function c : [t₀, +∞[ → R will be defined later, appropriately. Let us differentiate E(·) using the chain rule; according to the constitutive equation (8), and by combining (13) with (15), we obtain (16). Consider the function f_t. According to the strong convexity property of f_t, we have

f_t(y) ≥ f_t(x) + ⟨∇f_t(x), y − x⟩ + (ǫ(t)/2)‖x − y‖², for all x, y ∈ H.
Take y = x* and x = x(t) in the above inequality; we get (17). By multiplying (17) by c(t) and injecting it in (16), we get (18). On the other hand, for a positive function µ(t) we have (19). By adding (18) and (19), we get (20). Since we have no control on the sign of ⟨ẋ(t), x(t) − x*⟩, we take the coefficient in front of this term equal to zero, that is (21). Take c(t) = K√ǫ(t); it is here that the choice of c, and of the corresponding parameter K, comes into play. The relation (21) can be equivalently rewritten, and according to this choice of µ(t) and c(t), the inequality (20) becomes (22). Let us show that the condition (CD)_K provides the nonpositive sign of the coefficients in front of the terms on the right-hand side of (22). Recall that, according to the hypothesis (CD)_K, for all t ≥ t₁ we have the properties a) and b) stated in (23). Let us justify these inequalities.
The inequalities (23) can be equivalently written in the form (24), valid for all t ≥ t₁. The inequalities (24) give that the coefficients entering the right-hand side of (22) are nonpositive. Let us return to (22). Using (24) and the above results, we obtain (25). By multiplying (25) by M(t) = exp(∫_{t₁}^{t} µ(s) ds), we obtain (26). By integrating (26) on [t₁, t], we get an estimate of E(t). By the definition of E(t), we deduce the announced bound for all t ≥ t₁, and this gives the convergence rate of the values.
Remark 1 By integrating the relation defining µ(·), and denoting by C a generic positive constant, one sees that the Lyapunov analysis developed previously only provides information in the case where ǫ(t) is greater than or equal to C/t². Since the damping coefficient is γ(t) = δ√ǫ(t), this means that γ(t) must be greater than or equal to C/t. This is in accordance with the theory of inertial gradient systems with time-dependent viscosity coefficient, which states that the asymptotic optimization property is valid provided that the integral of γ(t) on [t₀, +∞[ is infinite, see [8].
As a consequence of Theorem 5 we have the following result.
Corollary 1 Under the hypotheses of Theorem 5, the conclusions below hold under the additional decay assumption on ǫ(·).

Proof By the definition of µ(t), since ǫ(·) is nonincreasing and δ ≥ K, we can bound µ(t) from below and integrate on [t₁, t]. Since lim_{t→∞} ǫ(t) = 0, and using the additional assumption on ǫ(·), combining these properties with the convergence rate (11) of Theorem 5, we obtain (31).

Particular cases
Since ǫ(t) → 0 as t → +∞, (TRIGS) falls within the setting of inertial dynamics with an asymptotically vanishing damping coefficient γ(t); here, γ(t) = δ√ǫ(t). We know from Cabot-Engler-Gadat [27] that for such systems the optimization property is satisfied asymptotically if ∫_{t₀}^{+∞} γ(t) dt = +∞ (i.e., γ(t) does not tend too rapidly towards zero). By taking ǫ(t) = c/t^p, we have γ(t) = δ√c/t^{p/2}, and the above integral condition is satisfied when p/2 ≤ 1, which is in accordance with the above property. Let us particularize Theorem 5 to situations where the integrals can be computed, or at least estimated; then (11) yields explicit convergence rates.
By assumption we have α = δM and β = CM, where M satisfies M < M₁ ≤ (1/3)δ, so that α ∈ ]3, +∞[. Indeed, we can obtain any α > 3. Note also that, by translating the time scale, the result in the general case β ≥ 0 follows from the particular case β = 0. Since we can take for δ any positive number, we obtain the following convergence rate of the values.

Remark 2 It is a natural question to compare our dynamic (c > 0) with the Su-Boyd-Candès dynamic [38] (c = 0), which was introduced as a continuous version of the Nesterov accelerated gradient method.
We obtain the optimal convergence rate of the values with an additional Tikhonov regularization term, which is a remarkable property. In fact, in the next sections we will prove that the Tikhonov term induces strong convergence of the trajectory to the minimum norm solution. Set m(t) as in (28). Note that, since r < 2, m(t) is an increasing function which has exponential growth as t → +∞. Accordingly, by the mean value theorem, we have the following majorization.
Let us summarize these results in the following statement.
Then, the following convergence rate of the values is satisfied.

Remark 3 When r → 2, the exponent (3r/2) − 1 tends to 2, so there is a continuous transition in the convergence rate. As in Remark 2, the additional Tikhonov regularization term is expected to have a regularization effect (even better than in the case r = 2). In addition, the above analysis reveals another critical value, namely r = 2/3.
3 In-depth analysis in the critical case ǫ(t) = c/t²

Let us refine our analysis in the case where the Tikhonov regularization coefficient and the damping coefficient are respectively of order 1/t² and 1/t. Our analysis will now take into account the coefficients α and c in front of these terms. The Cauchy problem for (TRIGS) is then written as (35), where t₀ > 0, c > 0, (x₀, v₀) ∈ H × H, and α ≥ 3. The starting time t₀ is taken strictly greater than zero to take into account the fact that the functions c/t² and α/t have singularities at 0. This is not a limitation of the generality of the proposed approach, since we focus on the asymptotic behaviour of the generated trajectories.

Convergence rate of the values
Theorem 8 Let t₀ > 0 and, for some initial data x₀, v₀ ∈ H, let x : [t₀, +∞[ → H be the unique global solution of (35). Then, the following results hold.

Strong convergence
Theorem 9 Let t₀ > 0 and, for some starting points x₀, v₀ ∈ H, let x : [t₀, +∞[ → H be the unique global solution of (35). Let x* be the element of minimal norm of S = argmin f, that is, x* = proj_S 0. Then, for all α > 3, we have lim inf_{t→+∞} ‖x(t) − x*‖ = 0. Further, if there exists T ≥ t₀ such that the trajectory {x(t) : t ≥ T} stays either in the open ball B(0, ‖x*‖) or in its complement, then x(t) converges strongly to x* as t → +∞.
Proof The proof combines energetic and geometric arguments, as initiated in [13]. We successively consider the three following configurations of the trajectory.
I. Assume that there exists T ≥ t₀ such that ‖x(t)‖ ≥ ‖x*‖ for all t ≥ T. Let us denote f_t(x) := f(x) + (c/2t²)‖x‖² and let x_t := argmin f_t. Let us recall some classical properties of the Tikhonov approximation: for all t > 0, ‖x_t‖ ≤ ‖x*‖, and lim_{t→+∞} x_t = x*. Using the gradient inequality for the strongly convex function f_t, and adding it to the corresponding inequality with the roles of the two points exchanged, we get (57). Therefore, according to (56), to obtain the strong convergence of the trajectory x(t) to x*, it is enough to show that f_t(x(t)) − f_t(x*) = o(1/t²) as t → +∞. For K > 0, consider now the energy functional E(·) defined in (58). Let us examine the different terms of (59). According to the constitutive equation (35) we have (60); further, from (41) we get (61). Injecting (60) and (61) in (59), we get (62). Consider now the function µ(t) = (α + 1 − K)/t. Then, (62) and (63) yield (64). Assume that (α+1)/2 < K < α − 1; since α > 3, such a K exists. As in the proof of Theorem 8, we deduce that α − 2K + 1 < 0 and K − α + 1 < 0, and, since c > 0, there exists K ∈ ](α+1)/2, α − 1[ such that (65) holds. So take K ∈ ](α+1)/2, α − 1[ such that (65) holds. Then (64) leads to (66). Let us integrate the differential inequality (66): after multiplication by t^{α+1−K} we get (67), and integrating the latter on [T, t], t > T, we obtain (68). On the one hand, the definition of E(t) provides a lower bound; on the other hand, (57) provides an upper bound. By assumption, ‖x(t)‖ ≥ ‖x*‖ for all t ≥ T, and α − 1 − 2K < 0.
Hence, for all t > T, (68) leads to (69). Now, taking the limit as t → +∞ and using the above estimates, and combining this property with x(t) ⇀ x* as t → +∞, we obtain the strong convergence, that is, lim_{t→+∞} x(t) = x*.

III. We suppose that for every T ≥ t₀ there exists t ≥ T such that ‖x*‖ > ‖x(t)‖, and also there exists s ≥ T such that ‖x*‖ ≤ ‖x(s)‖. From the continuity of x, we deduce that there exists a sequence (t_n)_{n∈N} ⊆ [t₀, +∞) such that t_n → +∞ as n → +∞ and, for all n ∈ N, ‖x(t_n)‖ = ‖x*‖. Consider x̄ ∈ H, a weak sequential cluster point of (x(t_n))_{n∈N}. We deduce, as in case II, that x̄ = x*. Hence x* is the only weak sequential cluster point of (x(t_n)), and consequently the sequence (x(t_n)) converges weakly to x*. Obviously ‖x(t_n)‖ → ‖x*‖ as n → +∞, so it follows that x(t_n) → x* strongly, that is, ‖x(t_n) − x*‖ → 0 as n → +∞. This leads to lim inf_{t→+∞} ‖x(t) − x*‖ = 0.
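The two classical properties of the Tikhonov approximation used above, ‖x_t‖ ≤ ‖x*‖ and x_t → x* as t → +∞, are easy to observe on an example where the Tikhonov path is available in closed form (the test function and its closed-form solution below are an illustration of ours, not taken from the paper):

```python
import math

def tikhonov_point(eps):
    """argmin of f(x, y) + (eps/2)*||(x, y)||^2 for
    f(x, y) = 0.5*(x + y - 2)**2.  Stationarity gives
    (x + y - 2) + eps*x = 0 and (x + y - 2) + eps*y = 0,
    hence x = y = 2/(2 + eps)."""
    s = 2.0 / (2.0 + eps)
    return (s, s)

# The minimum norm minimizer of f is x* = (1, 1), with ||x*|| = sqrt(2).
norms = [math.hypot(*tikhonov_point(10.0 ** (-k))) for k in range(7)]
print(norms)  # nondecreasing, approaching sqrt(2) from below
```

As eps decreases to 0, the norms ‖x_eps‖ increase monotonically towards ‖x*‖ while always staying below it, which is exactly the behaviour invoked in the proof.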

Strong convergence: general case
We are going to analyze, via Lyapunov analysis, the strong convergence properties as t → +∞ of the solution trajectories of the inertial dynamic (TRIGS), which we recall below:

ẍ(t) + δ√ǫ(t) ẋ(t) + ∇f(x(t)) + ǫ(t)x(t) = 0.
Theorem 10 Consider the dynamic system (TRIGS), where we assume that ǫ(·) satisfies the condition (CD)_K for some K > 0. Then, for any global solution trajectory x : [t₀, +∞[ → H of (TRIGS), lim inf_{t→+∞} ‖x(t) − x*‖ = 0, where x* is the element of minimal norm of argmin f, that is, x* = proj_{argmin f} 0. Further, if there exists T ≥ t₀ such that the trajectory {x(t) : t ≥ T} stays either in the open ball B(0, ‖x*‖) or in its complement, then x(t) converges strongly to x* as t → +∞.
Proof The proof is parallel to that of Theorem 9. We analyze the behavior of the trajectory x(·) depending on its position with respect to the ball B(0, ‖x*‖).
Using the gradient inequality for the strongly convex function f_t, and the fact that x_{ǫ(t)} minimizes f_t, we have

f_t(x) ≥ f_t(x_{ǫ(t)}) + (ǫ(t)/2)‖x − x_{ǫ(t)}‖², for all x ∈ H and t ≥ t₀.

On the other hand, we have (73). By adding the last two inequalities, (72) and (73) lead to (74) for all t ≥ T₁. Now, by taking the limit as t → +∞, using that x_{ǫ(t)} → x* as t → +∞ together with the assumption in the hypotheses of the theorem, we get lim_{t→+∞} ‖x(t) − x_{ǫ(t)}‖ = 0, and hence lim_{t→+∞} x(t) = x*.
II. Assume now that ‖x(t)‖ < ‖x*‖ for all t ≥ T. By Corollary 1, f(x(t)) → min f as t → +∞. Now, take x̄ ∈ H, a weak sequential cluster point of the trajectory x, which exists since the trajectory is bounded. This means that there exists a sequence (t_n)_{n∈N} ⊆ [T, +∞) such that t_n → +∞ and x(t_n) converges weakly to x̄ as n → +∞. We know that f is weakly lower semicontinuous, so one has f(x̄) ≤ lim inf_{n→+∞} f(x(t_n)) = min f, hence x̄ ∈ argmin f. Now, since the norm is weakly lower semicontinuous, one has ‖x̄‖ ≤ lim inf_{n→+∞} ‖x(t_n)‖ ≤ ‖x*‖, which, from the definition of x*, implies x̄ = x*. This shows that the trajectory x(·) converges weakly to x*. Moreover, lim_{t→+∞} ‖x(t)‖ = ‖x*‖. From this relation and the fact that x(t) ⇀ x* as t → +∞, we obtain the strong convergence, that is, lim_{t→+∞} x(t) = x*.

III. We suppose that for every T ≥ t₀ there exists t ≥ T such that ‖x*‖ > ‖x(t)‖, and also there exists s ≥ T such that ‖x*‖ ≤ ‖x(s)‖. From the continuity of x, we deduce that there exists a sequence (t_n)_{n∈N} ⊆ [t₀, +∞) such that t_n → +∞ as n → +∞ and, for all n ∈ N, ‖x(t_n)‖ = ‖x*‖. Consider x̄ ∈ H, a weak sequential cluster point of (x(t_n))_{n∈N}. We deduce, as in case II, that x̄ = x*. Hence x* is the only weak sequential cluster point of (x(t_n)), and consequently the sequence (x(t_n)) converges weakly to x*.

The case
Therefore, Theorem 10 can be applied. Let us summarize these results in the following statement. Then, lim inf_{t→+∞} ‖x(t) − x*‖ = 0; further, if there exists T ≥ t₀ such that the trajectory {x(t) : t ≥ T} stays either in the open ball B(0, ‖x*‖) or in its complement, then x(t) converges strongly to x* as t → +∞.

Fast inertial algorithms with Tikhonov regularization
On the basis of the convergence properties of the continuous dynamic (TRIGS), one would expect to obtain similar results for the algorithms resulting from its temporal discretization. To illustrate this, we will carry out a detailed study of the associated proximal algorithms, obtained by implicit discretization.
A full study of the associated first-order algorithms would be beyond the scope of this article, and will be the subject of further study. So, for k ≥ 1, consider the discrete dynamic (75) with time step size equal to one. We take ξ_k = x_k, which gives the algorithm (IPATRE), where (IPATRE) stands for Inertial Proximal Algorithm with Tikhonov REgularization. According to (75) we have (76).
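Since the displayed recursion (75)–(76) did not survive extraction, we illustrate the idea with a plausible sketch rather than the paper's exact scheme: an inertial extrapolation step with coefficient 1 − α/k (the coefficient α_k used later in the non-smooth section), followed by an implicit proximal step on f_k(x) = f(x) + (ǫ_k/2)‖x‖² with ǫ_k = c/k². The test function, the parameters, and the reduction of the Tikhonov prox to a prox of f are choices made for this illustration only.

```python
def prox_f(y, lam):
    """prox_{lam*f}(y) for f(x) = max(|x| - 1, 0); argmin f = [-1, 1]."""
    s = 1.0 if y >= 0.0 else -1.0
    a = abs(y)
    if a <= 1.0:
        return y
    return s * (1.0 if a <= 1.0 + lam else a - lam)

def inertial_prox_tikhonov(x0, alpha=4.0, c=4.0, n=20000):
    """y_k = x_k + (1 - alpha/k)(x_k - x_{k-1});
    x_{k+1} = argmin_x f(x) + (eps_k/2) x^2 + (1/2)(x - y_k)^2
            = prox_{f/(1+eps_k)}(y_k / (1 + eps_k)),   eps_k = c/k^2.
    Here alpha = 2*sqrt(c), mirroring alpha = delta*sqrt(c) with delta = 2."""
    x_prev = x = x0
    for k in range(1, n):
        eps = c / (k * k)
        y = x + (1.0 - alpha / k) * (x - x_prev)
        x_prev, x = x, prox_f(y / (1.0 + eps), 1.0 / (1.0 + eps))
    return x

x_final = inertial_prox_tikhonov(5.0)
print(x_final)  # drifts into argmin f = [-1, 1] and on towards 0,
                # the minimum norm minimizer
```

Without the Tikhonov term (c = 0), the iterates would settle at whatever point of [−1, 1] they reach; the vanishing ǫ_k is what selects the minimum norm element 0.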

Convergence of values
We have the following result.
Theorem 12 Let (x_k) be a sequence generated by (IPATRE), and assume that α > 3. Then, for all s ∈ [1/2, 1[, the following hold.

Proof For k ≥ k₀, consider the discrete energy E_k defined in (77), where the sequence (d_k) will be defined later. Consequently, (78) becomes (79). Let us proceed similarly with E_{k+1}. First observe that from (77), after development, we get (80); further, (80) yields (81). By combining (79) and (81), we obtain (82). By convexity of f, and according to the form of (a_k) and (b_k), there exists k₀ such that the above convexity inequalities apply for all k ≥ k₀. Set µ_k := 2b_k² − 2a_k b_k and observe that µ_k ≥ 0 for all k ≥ k₀, and µ_k ∼ Ck^{2r} (we use C as a generic positive constant). Let us also introduce the sequence (m_k), where the second inequality comes from 2r < a; in addition, m_k ∼ Ck^{2r−1}. Combining (82) and (83), we obtain that (84) holds for all k ≥ k₀. Let us now analyze the right-hand side of (84).
Remark 4 The convergence rate of the values is f(x_k) − min_H f = o(k^{−2s}) for any s ∈ [1/2, 1[. In practice this is as good as the rate f(x(t)) − min_H f = O(1/t²) obtained for the continuous dynamic.

Strong convergence to the minimum norm solution
Theorem 13 Take α > 3, let (x_k) be a sequence generated by (IPATRE), and let x* be the minimum norm element of argmin f. Then lim inf_{k→+∞} ‖x_k − x*‖ = 0. Further, (x_k) converges strongly to x* whenever (x_k) is in the interior of the ball B(0, ‖x*‖) for k large enough, or (x_k) is in the complement of the ball B(0, ‖x*‖) for k large enough.
Proof Case I. Assume that there exists k₀ ∈ N such that ‖x_k‖ ≥ ‖x*‖ for all k ≥ k₀, and define f_{c,k}(x) := f(x) + (c/2k²)‖x‖². Consider the energy function defined in (77) with r = 1, that is, a_k = a and b_k = k/2, where we assume that max(2, α − 2) < a < α − 1. Then E_k takes the announced form, where the sequence (d_k) will be defined later. Next, we introduce another energy functional. According to (90), adding the same quantity to both sides of (93), we get (94). An easy computation gives that the relevant coefficient in the right-hand side of (94) is nonnegative from some index k₂ on. Now, since by assumption ‖x_k‖ ≥ ‖x*‖ for k ≥ k₀, we get that the right-hand side of (94) is nonpositive for all k ≥ k₂. Hence, for all k ≥ k₂ we have (95). Note that ν_k ∼ C. Therefore, from (95), similarly as in the proof of Theorem 12, we deduce that ‖x_k − x*‖ is bounded, and therefore (x_k) is bounded. Further, lim_{k→+∞} ν_k ‖x_k − x*‖² = 0, and hence lim_{k→+∞} x_k = x*.
Case II. Assume that there exists k₀ ∈ N such that ‖x_k‖ < ‖x*‖ for all k ≥ k₀. From there we get that (x_k) is bounded. Now, take x̄ ∈ H, a weak sequential cluster point of (x_k), which exists since (x_k) is bounded. This means that there exists a sequence (k_n)_{n∈N} ⊆ [k₀, +∞) ∩ N such that k_n → +∞ and x_{k_n} converges weakly to x̄ as n → +∞. Since f is weakly lower semicontinuous, according to Theorem 12 we have f(x̄) ≤ lim inf_{n→+∞} f(x_{k_n}) = min f, hence x̄ ∈ argmin f. Since the norm is weakly lower semicontinuous, we deduce that ‖x̄‖ ≤ lim inf_{n→+∞} ‖x_{k_n}‖ ≤ ‖x*‖.
According to the definition of x*, we get x̄ = x*; therefore (x_k) converges weakly to x*. Moreover, lim_{k→+∞} ‖x_k‖ = ‖x*‖. From this relation and the fact that x_k ⇀ x* as k → +∞, we obtain the strong convergence, that is, lim_{k→+∞} x_k = x*.
Case III. Suppose that for every k ≥ k₀ there exists l ≥ k such that ‖x*‖ > ‖x_l‖, and also there exists m ≥ k such that ‖x*‖ ≤ ‖x_m‖. So, let k₁ ≥ k₀ and l₁ ≥ k₁ be such that ‖x*‖ > ‖x_{l₁}‖; let k₂ > l₁ and l₂ ≥ k₂ be such that ‖x*‖ > ‖x_{l₂}‖. Continuing the process, we obtain (x_{l_n}), a subsequence of (x_k) with the property that ‖x_{l_n}‖ < ‖x*‖ for all n ∈ N. Reasoning as in Case II, we obtain lim_{n→+∞} x_{l_n} = x*. Consequently, lim inf_{k→+∞} ‖x_k − x*‖ = 0.

Non-smooth case
Let us extend the results of the previous sections to the case of a proper, lower semicontinuous, convex function f : H → R ∪ {+∞}. We rely on the basic properties of the Moreau envelope f_λ : H → R (λ is a positive real parameter), which is defined by

f_λ(x) = min_{y∈H} { f(y) + (1/2λ)‖x − y‖² }.

Recall that f_λ is a convex differentiable function, whose gradient is λ⁻¹-Lipschitz continuous, and such that min_H f = min_H f_λ and argmin_H f_λ = argmin_H f. The interested reader may refer to [21, 24] for a comprehensive treatment of the Moreau envelope in a Hilbert setting. Since the set of minimizers is preserved by taking the Moreau envelope, the idea is to replace f by f_λ in the previous algorithm, and take advantage of the fact that f_λ is continuously differentiable. Algorithm (IPATRE) applied to f_λ then reads as (IPATRE) with f_λ in place of f (recall that α_k = 1 − α/k). By applying Theorems 12 and 13, we obtain fast convergence of the sequence (x_k) to the element of minimum norm of argmin f. Thus, we just need to formulate these results in terms of f and its proximal mapping. This is straightforward thanks to the following formulae from proximal calculus [21]:

1. f_λ(x) = f(prox_{λf}(x)) + (1/2λ)‖x − prox_{λf}(x)‖².
2. ∇f_λ(x) = (1/λ)(x − prox_{λf}(x)).
3. prox_{θf_λ}(x) = (λ/(λ+θ)) x + (θ/(λ+θ)) prox_{(λ+θ)f}(x).
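Formulas 1 and 2 are easy to sanity-check on f(x) = |x|, whose proximal mapping is the classical soft-thresholding operator (the test function and the finite-difference check below are an illustration of ours, not from the paper):

```python
def prox_abs(y, lam):
    """prox_{lam*f}(y) for f = |.|: soft-thresholding."""
    if y > lam:
        return y - lam
    if y < -lam:
        return y + lam
    return 0.0

def moreau_env(y, lam):
    """Formula 1: f_lam(y) = f(p) + (1/(2*lam))*(y - p)^2, p = prox_{lam f}(y)."""
    p = prox_abs(y, lam)
    return abs(p) + (y - p) ** 2 / (2.0 * lam)

def grad_env(y, lam):
    """Formula 2: grad f_lam(y) = (y - prox_{lam f}(y)) / lam."""
    return (y - prox_abs(y, lam)) / lam

lam, y, h = 0.5, 2.0, 1e-6
numeric = (moreau_env(y + h, lam) - moreau_env(y - h, lam)) / (2.0 * h)
print(grad_env(y, lam), numeric)   # the two derivatives agree
print(moreau_env(0.0, lam))        # min and argmin are preserved: f_lam(0) = 0
```

Here f_λ is the Huber function, so the agreement of the closed-form gradient with the finite difference, and the vanishing of f_λ at the common minimizer 0, can be verified directly.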
We obtain the following relaxed inertial proximal algorithm (NS stands for non-smooth), denoted (IPATRE-NS).

Theorem 14 Let f : H → R ∪ {+∞} be a convex, lower semicontinuous, proper function. Assume that α > 3, and let (x_k) be a sequence generated by (IPATRE-NS). Then, for all s ∈ [1/2, 1[, we have: (iii) lim inf_{k→+∞} ‖x_k − x*‖ = 0. Further, (x_k) converges strongly to x*, the element of minimum norm of argmin f, if (x_k) is in the interior of the ball B(0, ‖x*‖) for k large enough, or if (x_k) is in the complement of the ball B(0, ‖x*‖) for k large enough.

Conclusion, perspective
In the framework of convex optimization in general Hilbert spaces, we have introduced an inertial dynamic in which the damping coefficient and the Tikhonov regularization coefficient vanish as time tends to infinity. The judicious adjustment of these parameters makes it possible to obtain trajectories converging quickly (and strongly) towards the minimum norm solution. This seems to be the first time that these two properties have been obtained for the same dynamic: the Nesterov accelerated gradient method and the hierarchical minimization attached to Tikhonov regularization are fully effective within this dynamic. On the basis of Lyapunov analysis, we have developed an in-depth mathematical study of the dynamic, which is a valuable tool for the development of corresponding results for algorithms obtained by temporal discretization. We thus obtained similar results for the corresponding proximal algorithms. This study opens up a large field of promising research concerning first-order optimization algorithms. Many interesting questions, such as the introduction of Hessian-driven damping to attenuate oscillations [9], [19], [23], and the study of the impact of errors and perturbations, deserve further study. These results also adapt well to the numerical analysis of inverse problems, for which strong convergence and obtaining a solution close to a desired state are key properties.

(H₁) f : H → R is convex and differentiable, and ∇f is Lipschitz continuous on bounded sets.
(H₂) S := argmin f ≠ ∅. We denote by x* the element of minimum norm of S.
Let x̄ ∈ H be a weak sequential cluster point of the trajectory x, which exists since, by Theorem 8, the trajectory is bounded. So, there exists a sequence (t_n)_{n∈N} ⊆ [T, +∞) such that t_n → +∞ and x(t_n) converges weakly to x̄ as n → +∞. Since f is weakly lower semicontinuous, we deduce that x̄ ∈ argmin f. Now, since the norm is weakly lower semicontinuous, and since ‖x(t)‖ < ‖x*‖ for all t ≥ T, we have ‖x̄‖ ≤ lim inf_{n→+∞} ‖x(t_n)‖ ≤ ‖x*‖. Combining x̄ ∈ argmin f with the definition of x*, this implies that x̄ = x*. This shows that the trajectory x(·) converges weakly to x*. Since x_t → x* as t → +∞, we get lim_{t→+∞} ‖x(t) − x_t‖ = 0, and hence lim_{t→+∞} x(t) = x*.

II. Assume now that there exists T ≥ t₀ such that ‖x(t)‖ < ‖x*‖ for all t ≥ T. According to Theorem 8, we have lim_{t→+∞} f(x(t)) = min_H f. Moreover, ‖x*‖ ≤ lim inf_{t→+∞} ‖x(t)‖ ≤ lim sup_{t→+∞} ‖x(t)‖ ≤ ‖x*‖, hence lim_{t→+∞} ‖x(t)‖ = ‖x*‖.