Abstract
First-order optimization algorithms can be viewed as discretizations of ordinary differential equations (ODEs) (Su et al. in Adv Neural Inf Process Syst 27, 2014). From this perspective, studying the properties of the corresponding trajectories may lead to convergence results that can be transferred to the numerical scheme. In this paper we analyse the following ODE, introduced by Attouch et al. (J Differ Equ 261(10):5734–5783, 2016):
\[ \ddot{x}(t)+\frac{\alpha }{t}\dot{x}(t)+\beta H_F(x(t))\dot{x}(t)+\nabla F(x(t))=0, \qquad \text{(DIN-AVD)} \]
where \(\alpha >0\), \(\beta >0\) and \(H_F\) denotes the Hessian of F. This ODE can be discretized to build numerical schemes which do not require F to be twice differentiable, as shown in Attouch et al. (Math Program 1–43, 2020) and Attouch et al. (Optimization 72:1–40, 2021). We provide strong convergence results on the error \(F(x(t))-F^*\) and integrability properties of \(\Vert \nabla F(x(t))\Vert \) under geometry assumptions on F such as quadratic growth around the set of minimizers. In particular, we show that the decay rate of the error for a strongly convex function is \(O(t^{-\alpha -\varepsilon })\) for any \(\varepsilon >0\). These results are briefly illustrated at the end of the paper.
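As a rough illustration of these dynamics, the sketch below integrates a one-dimensional instance of the ODE for the strongly convex toy function \(F(x)=x^2/2\) (so \(H_F\equiv 1\) and \(\nabla F(x)=x\)). The parameter values and the explicit Euler discretization are illustrative assumptions only, not the schemes analysed in the cited papers.

```python
# Sketch: integrate x''(t) + (alpha/t) x'(t) + beta*H_F(x(t)) x'(t) + grad F(x(t)) = 0
# in 1D for F(x) = x^2/2, where H_F = 1 and grad F(x) = x.  Explicit Euler with a
# small step; illustrative only (parameters and scheme are assumptions).

def simulate(alpha=3.1, beta=1.0, t0=1.0, T=50.0, dt=1e-3):
    x, v, t = 1.0, 0.0, t0                       # initial position, velocity, time
    while t < T:
        a = -(alpha / t) * v - beta * v - x      # acceleration prescribed by the ODE
        x += dt * v
        v += dt * a
        t += dt
    return x

x_end = simulate()
F_err = 0.5 * x_end ** 2                         # F(x(T)) - F*, with F* = 0
```

With the persistent Hessian-driven damping term (here simply \(\beta \dot{x}\)), the trajectory is heavily damped and the error \(F(x(t))-F^*\) collapses by many orders of magnitude over the integration window.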
References
Adly, S., Attouch, H.: Finite convergence of proximal-gradient inertial algorithms combining dry friction with Hessian-driven damping. SIAM J. Optim. 30(3), 2134–2162 (2020)
Alvarez, F., Attouch, H., Bolte, J., Redont, P.: A second-order gradient-like dissipative dynamical system with Hessian-driven damping: application to optimization and mechanics. J. Math. Pures Appl. 81(8), 747–779 (2002)
Apidopoulos, V., Aujol, J.F., Dossal, C., Rondepierre, A.: Convergence rates of an inertial gradient descent algorithm under growth and flatness conditions. Math. Program. 187(1), 151–193 (2021)
Attouch, H., Balhag, A., Chbani, Z., Riahi, H.: Accelerated gradient methods combining Tikhonov regularization with geometric damping driven by the Hessian. arXiv preprint arXiv:2203.05457 (2022)
Attouch, H., Balhag, A., Chbani, Z., Riahi, H.: Fast convex optimization via inertial dynamics combining viscous and Hessian-driven damping with time rescaling. Evol. Equ. Control Theory 11(2), 487–514 (2022)
Attouch, H., Chbani, Z., Fadili, J., Riahi, H.: First-order optimization algorithms via inertial systems with Hessian-driven damping. Math. Program. 193(1), 113–155 (2020)
Attouch, H., Chbani, Z., Fadili, J., Riahi, H.: Convergence of iterates for first-order optimization algorithms with inertia and Hessian-driven damping. Optimization 72, 1–40 (2021)
Attouch, H., Chbani, Z., Peypouquet, J., Redont, P.: Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program. 168(1), 123–175 (2018)
Attouch, H., Chbani, Z., Riahi, H.: Rate of convergence of the Nesterov accelerated gradient method in the subcritical case \(\alpha \le 3\). ESAIM Control Optim. Calc. Var. 25, 2 (2019)
Attouch, H., Fadili, J., Kungurtsev, V.: On the effect of perturbations and errors in first-order optimization methods with inertia and Hessian-driven damping. arXiv preprint arXiv:2106.16159 (2021)
Attouch, H., Goudou, X., Redont, P.: The heavy ball with friction method, I. The continuous dynamical system: global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system. Commun. Contemp. Math. 2(1), 1–34 (2000)
Attouch, H., Maingé, P.E., Redont, P.: A second-order differential system with Hessian-driven damping; application to non-elastic shock laws. Differ. Equ. Appl. 4(1), 27–65 (2012)
Attouch, H., Peypouquet, J., Redont, P.: Fast convex optimization via inertial dynamics with Hessian-driven damping. J. Differ. Equ. 261(10), 5734–5783 (2016)
Aujol, J.F., Dossal, C., Rondepierre, A.: Optimal convergence rates for Nesterov acceleration. SIAM J. Optim. 29(4), 3131–3153 (2019)
Aujol, J.F., Dossal, C., Rondepierre, A.: Convergence rates of the heavy-ball method for quasi-strongly convex optimization. HAL preprint: hal-02545245v2 (2021)
Aujol, J.F., Dossal, C., Rondepierre, A.: FISTA is an automatic geometrically optimized algorithm for strongly convex functions. HAL preprint: hal-03491527 (2021). https://hal.archives-ouvertes.fr/hal-03491527
Aujol, J.F., Dossal, C., Rondepierre, A.: Convergence rates of the heavy-ball method under the Łojasiewicz property. Math. Program. 198, 1–60 (2022)
Balti, M., May, R.: Asymptotic for the perturbed heavy ball system with vanishing damping term. arXiv preprint arXiv:1609.00135 (2016)
Bolte, J., Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Am. Math. Soc. 362(6), 3319–3363 (2010)
Bolte, J., Nguyen, T.P., Peypouquet, J., Suter, B.W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165, 471–507 (2017)
Boţ, R.I., Csetnek, E.R., László, S.C.: Tikhonov regularization of a second order dynamical system with Hessian-driven damping. Math. Program. 189(1), 151–186 (2021)
Cabot, A., Engler, H., Gadat, S.: On the long time behavior of second order differential equations with asymptotically small dissipation. Trans. Am. Math. Soc. 361(11), 5983–6017 (2009)
Garrigos, G., Rosasco, L., Villa, S.: Convergence of the forward-backward algorithm: beyond the worst case with the help of geometry. arXiv preprint arXiv:1703.09477 (2017)
Jendoubi, M.A., May, R.: Asymptotics for a second-order differential equation with nonautonomous damping and an integrable source term. Appl. Anal. 94(2), 435–443 (2015)
Li, B., Shi, B., Yuan, Y.X.: Linear convergence of Nesterov-1983 with the strong convexity. arXiv preprint arXiv:2306.09694 (2023)
Maulen, J.J., Peypouquet, J.: A speed restart scheme for a dynamics with Hessian-driven damping. arXiv preprint arXiv:2301.12240 (2023)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Sov. Math. Dokl. 27(2), 372–376 (1983)
Sebbouh, O., Dossal, C., Rondepierre, A.: Nesterov’s acceleration and Polyak’s heavy ball method in continuous time: convergence rate analysis under geometric conditions and perturbations. arXiv preprint arXiv:1907.02710 (2019)
Shi, B., Du, S.S., Jordan, M.I., Su, W.J.: Understanding the acceleration phenomenon via high-resolution differential equations. Math. Program. 195(1), 79–148 (2021)
Su, W., Boyd, S., Candès, E.: A differential equation for modeling Nesterov’s accelerated gradient method: theory and insights. J. Mach. Learn. Res. 17(153), 1–43 (2016)
Funding
The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR) under reference ANR-PRC-CE23 MaSDOL, the support of the FMJH Program PGMO 2019-0024, and the support to this program from EDF-Thales-Orange.
Ethics declarations
Competing Interests
The authors have not disclosed any competing interests.
A Appendix
1.1 A.1 Supplementary Material for Remark 1
Let \(\alpha =1+\frac{2}{\gamma }\). We consider the same Lyapunov energy as in the case \(\alpha >1+\frac{2}{\gamma }\), i.e.,
where \(\lambda =\frac{2\alpha }{\gamma +2}\).
Observe that Lemmas 2 and 3 are both valid for this value of \(\alpha \). By noticing that \(K(\alpha )=0\) and \(\alpha -\lambda =1\), we then get that for all \(t> \max \{t_0,\beta \}\),
We can deduce that \(t\mapsto {\mathcal {E}}(t)e^{\frac{\beta }{t-\beta }}\) is decreasing on \([t_0+\beta ,+\infty )\). Consequently, for all \(t\geqslant t_0+\beta \),
Considering the expression of \({\mathcal {E}}\), this directly implies that:
The above inequality implies the first claim of Remark 1. Note that the growth condition \({\mathcal {G}}^2_\mu \) plays no role in this proof.
However, we can use this geometry condition to get an upper bound on \(F(x(t))-F^*\) depending on the mechanical energy \(E_m\) defined for all \(t\geqslant t_0+\beta \) by:
The assumption \({\mathcal {G}}^2_\mu \) and the decreasing behaviour of \(E_m\) ensure that
using inequality (62). Hence, for all \(t\geqslant t_0+\beta \),
Inequality (78) also guarantees that
is bounded on \((t_0+\beta ,+\infty )\). As \({\mathcal {E}}(t)e^{\frac{\beta }{t-\beta }}\) is positive for all \(t\geqslant t_0+\beta \), we can deduce that there exists \(M>0\) such that for all \(t\geqslant t_0+\beta \),
and thus,
By using the same arguments as in the proof of Theorem 1, we conclude that:
1.2 A.2 Proof of Corollary 1
The first claim is obtained by combining Theorem 2 with the following lemma, whose proof is given in Appendix A.5.
Lemma 10
Let \(F: \mathbb {R}^{n} \rightarrow \mathbb {R}\) be a convex function with a nonempty set of minimizers, and let \(F^*=\inf \limits _{x\in \mathbb {R}^n}F(x)\). Assume that for some \(t_1>0\) and \(\delta >0\), F satisfies:
Let \(z:t\mapsto \frac{\int _{t/2}^tu^\delta x(u)du}{\int _{t/2}^tu^\delta du}\). Then, as \(t\rightarrow +\infty \),
The second and third claims are proved by applying Lemma 11 to \(\phi :x\mapsto F(x)-F^*\). The proof of this lemma is given in Appendix A.6.
Lemma 11
Let \(\phi : \mathbb {R}^{n} \rightarrow \mathbb {R}^+\) be such that for some \(t_1>0\) and \(\delta >0\), \(\phi \) satisfies:
Then, as \(t\rightarrow +\infty \),
1.3 A.3 Proof of Corollary 2
Let \(F: \mathbb {R}^{n} \rightarrow \mathbb {R}\) be a convex \(C^2\) function having a unique minimizer \(x^*\). Assume that F satisfies \({\mathcal {H}}_{\gamma _1}\) and \({\mathcal {G}}_\mu ^{\gamma _2}\) for some \(\gamma _1>2\), \(\gamma _2>2\) such that \(\gamma _{1}\geqslant \gamma _{2}\) and \(\mu >0\). Let x be a solution of (DIN-AVD) for all \(t\geqslant t_0\) where \(t_0>0\), \(\alpha \geqslant \frac{\gamma _{1}+2}{\gamma _{1}-2}\) and \(\beta >0\). Theorem 3 ensures that:
Moreover, as F satisfies \({\mathcal {G}}_\mu ^{\gamma _2}\) for some \(\gamma _2>2\), Lemma 1 implies that:
By applying Lemma 11 to \(\phi :x\mapsto \left( F(x)-F^*\right) ^{\frac{2(\gamma _2-1)}{\gamma _2}}\), we get that as t tends to \(+\infty \),
Hence,
1.4 A.4 Proof of Lemma 6
Let \(F: \mathbb {R}^{n} \rightarrow \mathbb {R}\) be a convex \(C^2\) function with a nonempty set of minimizers \(X^*\). Let \(\delta \in (0,1]\) and \(x^*\in X^*\).
We introduce the following lemma which is proved in Appendix A.8.
Lemma 12
Let \(F: \mathbb {R}^{n} \rightarrow \mathbb {R}\) be a \(C^2\) function. Then, for all \(x\in \mathbb {R}^n\) and \(\varepsilon >0\), there exists \(\nu >0\) such that for all \(y\in B(x,\nu )\):
As F is a \(C^2\) function, Lemma 12 ensures that there exists \(\nu >0\) such that for all \(x\in B\left( x^*,\nu \right) \):
where \(K(x)=(x-x^*)^T H_F(x^*) (x-x^*)\).
Let \(\phi _{x,x^*}\) be defined as follows:
for some \(x\in B\left( x^*,\nu \right) \). The function \(\phi _{x,x^*}\) is twice differentiable and we have that for all \(t\in [0,1]\):
By rewriting (87) at the point \(tx+(1-t)x^*\) it follows that for all \(t\in [0,1]\):
By integrating the left-hand inequality of (88) and noticing that \(\phi _{x,x^*}^{\prime }(0)=0\) (since \(\nabla F(x^*)=0\)), we get that:
By integrating the right-hand inequality of (88), we get that:
and consequently,
By choosing \(t=1\) and rewriting \(\phi _{x,x^*}\) and \(\phi _{x,x^*}^\prime \) we deduce that
1.5 A.5 Proof of Lemma 10
Let \(F: \mathbb {R}^{n} \rightarrow \mathbb {R}\) be a convex function with a nonempty set of minimizers, and let \(F^*=\inf \limits _{x\in \mathbb {R}^n}F(x)\). Assume that for some \(t_1>0\) and \(\delta >0\), F satisfies:
Let \(\varepsilon >0\). Assumption (89) ensures that there exists \(t_2\geqslant 2t_1\) such that:
Let z be defined as follows:
Let \(t\geqslant t_2\). We define \(\nu \) as:
where \({\mathcal {B}}([t/2,t])\) is the Borel \(\sigma \)-algebra on [t/2, t]. Then, we can write that \(z(t)=\int _{t/2}^t x(u)d\nu (u)\). As \(\nu ([t/2,t])=1\) and F is a convex function, Jensen’s inequality ensures that:
Hence, as t tends towards \(+\infty \), \(F(z(t))-F^*=o\left( t^{-\delta -1}\right) .\)
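The key step above is Jensen's inequality applied to the probability measure \(\nu \): since \(z(t)\) is the \(u^\delta \)-weighted average of \(x(u)\) over \([t/2,t]\), convexity of F gives \(F(z(t))\leqslant \int _{t/2}^t F(x(u))d\nu (u)\). The sketch below checks this numerically on a hypothetical one-dimensional trajectory \(x(u)=1/u\) with the test function \(F(x)=x^2\) (both are illustrative choices, not objects from the paper).

```python
# Discretized Jensen step: z(t) is the u^delta-weighted average of x(u) on [t/2, t],
# so for convex F we must have F(z(t)) <= weighted average of F(x(u)).
# Hypothetical trajectory x(u) = 1/u and convex test function F(x) = x^2.

def jensen_check(t=10.0, delta=2.0, n=1000):
    us = [t / 2 + (t / 2) * k / n for k in range(n + 1)]   # grid on [t/2, t]
    w = [u ** delta for u in us]                           # weights u^delta
    W = sum(w)
    xs = [1.0 / u for u in us]                             # hypothetical x(u)
    z = sum(wi * xi for wi, xi in zip(w, xs)) / W          # discretized z(t)
    F = lambda x: x ** 2                                   # convex, F* = 0
    lhs = F(z)                                             # F(z(t))
    rhs = sum(wi * F(xi) for wi, xi in zip(w, xs)) / W     # weighted mean of F(x(u))
    return lhs, rhs

lhs, rhs = jensen_check()
```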
1.6 A.6 Proof of Lemma 11
Let \(\phi : \mathbb {R}^{n} \rightarrow \mathbb {R}^+\) be such that for some \(t_1>0\) and \(\delta >0\), \(\phi \) satisfies:
Let \(\varepsilon >0\). Assumption (90) guarantees that there exists \(t_2\geqslant 2t_1\) such that
Consequently, for all \(t\geqslant t_2\),
and
Hence, as \(t\rightarrow +\infty \),
We recall that \(\liminf \limits _{t\rightarrow +\infty } f(t)=\lim \limits _{t\rightarrow +\infty }\left[ \inf \limits _{\tau \geqslant t}f(\tau )\right] \). As \(\phi \) is a positive function, we get that:
Suppose that \(l>0\). Then there exists \({\hat{t}}>t_1\) such that:
and hence:
This inequality cannot hold since we assume that (90) is satisfied. We deduce that \(l=0\).
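The displayed assumption and conclusion of Lemma 11 did not survive extraction; the proof structure suggests a statement of the form "if \(t\mapsto t^\delta \phi (t)\) is integrable on \([t_1,+\infty )\), then \(\liminf _{t\rightarrow +\infty } t^{\delta +1}\phi (t)=0\)". Under that assumed shape, \(\phi (t)=t^{-(\delta +2)}\) is a concrete instance that can be checked numerically:

```python
# Assumed shape of Lemma 11, illustrated on phi(t) = t^{-(delta+2)}:
# the integrand t^delta * phi(t) = t^{-2} is integrable on [1, +inf),
# and t^{delta+1} * phi(t) = 1/t indeed tends to 0.

delta = 1.5
phi = lambda t: t ** (-(delta + 2))

def tail_integral(T, n=100_000, t1=1.0):
    # midpoint Riemann sum of t^delta * phi(t) over [t1, T]
    h = (T - t1) / n
    return sum((t1 + (k + 0.5) * h) ** delta * phi(t1 + (k + 0.5) * h)
               for k in range(n)) * h

I_100 = tail_integral(100.0)        # ~ 1 - 1/100  = 0.99
I_1000 = tail_integral(1000.0)      # ~ 1 - 1/1000 = 0.999  (integral stays bounded)
decay_at_1000 = 1000.0 ** (delta + 1) * phi(1000.0)   # equals 1/1000
```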
1.7 A.7 Proof of Lemma 5
Let \(u\in \mathbb {R}^n\), \(v\in \mathbb {R}^n\) and \(a>0\). The first inequality comes from the following inequalities:
and
The second inequality is proved by rewriting \(\Vert u\Vert ^2\) as follows:
and by applying the first inequality to \(\langle u+v,v\rangle \).
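The statement of Lemma 5 is not reproduced in this extract; the proof structure above suggests the classical weighted Young bound together with the polarization-type rewriting of \(\Vert u\Vert ^2\). Both can be verified numerically (the vectors and the weight \(a\) below are arbitrary illustrative choices):

```python
# Checks of the two elementary facts the proof of Lemma 5 appears to rely on:
#   (i)  2<u, v> <= a||u||^2 + (1/a)||v||^2  for any a > 0 (weighted Young),
#   (ii) ||u||^2 = ||u+v||^2 - 2<u+v, v> + ||v||^2.
import random

def check(n=5, a=0.7, seed=0):
    rng = random.Random(seed)
    u = [rng.uniform(-1, 1) for _ in range(n)]
    v = [rng.uniform(-1, 1) for _ in range(n)]
    dot = lambda p, q: sum(pi * qi for pi, qi in zip(p, q))
    # (i): follows from 0 <= ||sqrt(a)*u - v/sqrt(a)||^2
    young_ok = 2 * dot(u, v) <= a * dot(u, u) + dot(v, v) / a + 1e-12
    # (ii): expand ||(u+v) - v||^2
    w = [ui + vi for ui, vi in zip(u, v)]
    identity_gap = abs(dot(u, u) - (dot(w, w) - 2 * dot(w, v) + dot(v, v)))
    return young_ok, identity_gap

young_ok, identity_gap = check()
```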
1.8 A.8 Proof of Lemma 12
Let \(F: \mathbb {R}^{n} \rightarrow \mathbb {R}\) be a \(C^2\) function. We denote the second-order partial derivatives of F by \(\partial _{ij} F =\frac{\partial ^2 F}{\partial x_i\partial x_j}\) for all \((i,j)\in \llbracket 1,n\rrbracket ^2\).
Let \(x\in \mathbb {R}^n\) and \(\varepsilon >0\). For all \((i,j)\in \llbracket 1,n\rrbracket ^2\), \(\partial _{ij} F\) is continuous on \(\mathbb {R}^n\) and consequently,
Taking the minimum of \({\tilde{\nu }}\) over all \((i,j)\in \llbracket 1,n\rrbracket ^2\), we get that there exists \({\tilde{\nu }}>0\) such that:
Let \(\nu =\min \left\{ {\tilde{\nu }},\left( n\max \limits _{(i,j)\in \llbracket 1,n\rrbracket ^2}|\partial _{ij}F(x)|\right) ^{-\frac{1}{2}}\right\} \), \(y\in B(x,\nu )\) and \(h=y-x\). Equation (92) gives us that for all \((i,j)\in \llbracket 1,n\rrbracket ^2\):
We recall that for all \((i,j)\in \llbracket 1,n\rrbracket ^2\), \(\left( H_F(x)\right) _{i,j}=\partial _{ij}F(x)\) and therefore:
By summing (93) for all \((i,j)\in \llbracket 1,n\rrbracket ^2\), we get that:
Noticing that \(|h_ih_j|\leqslant \frac{1}{2}\left( h_i^2+h_j^2\right) \) for all \((i,j)\in \llbracket 1,n\rrbracket ^2\), we can deduce that:
Hence,
\(\square \)
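The displayed inequality of Lemma 12 did not render in this extract; the surrounding proofs suggest a uniform second-order Taylor bound of the form \(|F(y)-F(x)-\langle \nabla F(x),y-x\rangle -\frac{1}{2}(y-x)^T H_F(x)(y-x)|\leqslant \varepsilon \Vert y-x\Vert ^2\) on \(B(x,\nu )\). The sketch below checks that the Taylor remainder shrinks relative to \(\Vert y-x\Vert ^2\) for a hypothetical smooth test function (the function, base point and direction are illustrative assumptions):

```python
# Taylor-remainder check for F(x1, x2) = x1^4 + x1*x2 + x2^2 at x = (1, 1):
# the ratio |F(x+h) - second-order model| / ||h||^2 should vanish as h -> 0.

def remainder_ratio(s):
    x1, x2 = 1.0, 1.0
    h1, h2 = s, -s                                   # h = s * (1, -1)
    F = lambda a, b: a ** 4 + a * b + b ** 2
    g1, g2 = 4 * x1 ** 3 + x2, x1 + 2 * x2           # gradient at x
    H11, H12, H22 = 12 * x1 ** 2, 1.0, 2.0           # Hessian at x
    model = (F(x1, x2) + g1 * h1 + g2 * h2
             + 0.5 * (H11 * h1 * h1 + 2 * H12 * h1 * h2 + H22 * h2 * h2))
    return abs(F(x1 + h1, x2 + h2) - model) / (h1 * h1 + h2 * h2)

r_big = remainder_ratio(1e-1)      # exact value 2s + s^2/2 = 0.205
r_small = remainder_ratio(1e-3)    # ~ 0.002: the ratio shrinks with ||h||
```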
Aujol, JF., Dossal, C., Hoàng, V.H. et al. Fast Convergence of Inertial Dynamics with Hessian-Driven Damping Under Geometry Assumptions. Appl Math Optim 88, 81 (2023). https://doi.org/10.1007/s00245-023-10058-6