Abstract
We consider a continuous-time optimization method based on a dynamical system in which a massive particle, starting at rest, moves in the conservative force field generated by the objective function, without friction of any kind. We formulate a restart criterion based on the mean dissipation of the kinetic energy, and we prove a global convergence result for strongly convex functions. Using the symplectic Euler discretization scheme, we obtain an iterative optimization algorithm. We consider a discrete mean-dissipation restart scheme, and we also introduce a new restart procedure that ensures at each iteration a decrease of the objective function greater than the one achieved by a step of the classical gradient method. For the discrete conservative algorithm, this latter restart criterion guarantees a qualitative convergence result. We apply the same restart scheme to the Nesterov Accelerated Gradient method (NAG-C), and we use this restarted NAG-C as a benchmark in the numerical experiments. On the smooth convex problems considered, our method exhibits a faster convergence rate than the restarted NAG-C. Finally, we propose an extension of our discrete conservative algorithm to composite optimization: in numerical tests involving non-strongly convex functions with \(\ell ^1\)-regularization, it outperforms the well-known Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) accelerated with an adaptive restart scheme.
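To fix ideas, the following minimal sketch illustrates the two ingredients described above: a symplectic Euler integration of the frictionless dynamics \(\ddot{x} = -\nabla f(x)\), combined with a restart check that falls back to a plain gradient step (with step size \(1/L\)) whenever the conservative step does not decrease the objective at least as much. The function name, the step sizes, and the velocity reset are illustrative assumptions, not the authors' exact algorithm:

```python
import numpy as np

def conservative_restart(f, grad, x0, h, L, max_iter):
    """Sketch: symplectic Euler for the frictionless dynamics x'' = -grad f(x),
    with a gradient-decrease restart check (illustrative, not the paper's exact method)."""
    x, v = x0.astype(float).copy(), np.zeros_like(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        v_new = v - h * g          # symplectic Euler: update velocity first...
        x_new = x + h * v_new      # ...then position, using the new velocity
        x_grad = x - g / L         # candidate point of a classical gradient step
        if f(x_new) > f(x_grad):
            # restart: take the gradient step instead and reset the particle at rest
            x_new, v_new = x_grad, np.zeros_like(v)
        x, v = x_new, v_new
    return x
```

The restart guarantees per-iteration decrease at least that of the gradient method, while the conservative steps can travel much faster between restarts.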
Availability of data and material
Not applicable.
Code availability
The MATLAB code used to run the numerical experiments is available from the authors upon request.
Acknowledgements
The authors thank Prof. G. Savaré for his helpful suggestions and discussions. The authors are also grateful to two anonymous referees for their observations, which helped improve the quality of the paper.
Funding
Not applicable.
Contributions
Although the idea of conservative methods coupled with a restart criterion has already been proposed, no global convergence result was known under the sole hypothesis of strong convexity of the objective function. The proof of our result is based on the notion of Maximum Mean Dissipation, which is an original construction. We also achieve a qualitative global convergence result for a discrete method with a suitable restart criterion, which was not available before the present paper. The numerical tests show that restarted conservative methods can effectively compete with the best-performing existing algorithms. Finally, the discrete algorithm for composite optimization is entirely original.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix 1: Proof of Proposition 1
Proof
Let us prove that the function \(t\mapsto E_K(t)\) attains a local maximum in \([0,+\infty )\). By contradiction, if \(t\mapsto E_K(t)\) had no local maxima, then \(t\mapsto E_K(t)\) would be injective (otherwise, applying the Weierstrass theorem twice, we could find a local maximum). Since \(t\mapsto E_K(t)\) is continuous, it would have to be strictly increasing. This implies that \(t \mapsto \dot{x}(t)\) cannot change sign, and hence that it is monotone; consequently, \(t \mapsto x(t)\) is monotone as well. Since both \(x(t)\) and \(\dot{x}(t)\) remain bounded for every \(t \in [0, +\infty )\), there exist \(x_\infty , v_\infty \in \mathbb {R}\) such that
On the other hand, since \(x(t)\) converges to the finite limit \(x_\infty \), we must have \(v_{\infty }=0\); but then \(E_K(t)\rightarrow 0\), which contradicts the fact that \(t\mapsto E_K(t)\) is strictly increasing with \(E_K(0)=0\).
Let \(\bar{t}\) be a point of local maximum of the kinetic energy function \(t \mapsto E_K(t)\); in particular, \(|\dot{x}(\bar{t})| > 0\). The conservation of the total mechanical energy ensures that the function \(t\mapsto f(x(t))\) attains a local minimum at \(\bar{t}\). Moreover, since \(\dot{x}(\bar{t})\ne 0\), the Implicit Function Theorem implies that \(t \mapsto x(t)\) is a local homeomorphism around \(\bar{t}\). This implies that \(x(\bar{t})\) is a point of local minimum for \(f\). \(\square\)
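For completeness, the conservation of the total mechanical energy invoked above can be verified directly from the frictionless dynamics \(\ddot{x}(t) = -\nabla f(x(t))\):
\[ \frac{d}{dt}\left( \frac{1}{2}|\dot{x}(t)|^2 + f(x(t)) \right) = \dot{x}(t)\cdot \ddot{x}(t) + \nabla f(x(t))\cdot \dot{x}(t) = \dot{x}(t)\cdot \left( \ddot{x}(t) + \nabla f(x(t)) \right) = 0, \]
so that \(E_K(t) + f(x(t)) \equiv E_K(0) + f(x(0))\) for every \(t\ge 0\); in particular, local maxima of \(E_K\) correspond exactly to local minima of \(t\mapsto f(x(t))\).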
Appendix 2: Proof of Proposition 2
Proof
Without loss of generality, we can assume that \(x^*=0\) and that \(x_0>0\). We define a strongly convex function \(g:\mathbb {R}\rightarrow \mathbb {R}\) as follows:
We claim that, for every \(y \in [0,x_0]\), the following inequality is satisfied:
Indeed, we have that
since \(f'(u) - g'(u)\ge 0\) for every \(u\ge 0\). Combining (5) and (59) we obtain:
This completes the proof. \(\square\)
Appendix 3: Proof of Lemma 1
Proof
Up to an orthonormal linear change of coordinates, we can assume that the function \(f\) is of the form
Hence, the differential system (7) becomes
i.e., the components evolve independently of one another. If the Cauchy datum is
then we can compute the expression of the kinetic energy function \(E_K: t\mapsto \frac{1}{2} |\dot{x}(t)|^2\):
For every \(0 \le t\le \frac{\pi }{2\sqrt{{\lambda }_n}}\), we have that
and then we deduce that
for every \(t \in [ 0, {\pi }/{(2\sqrt{{\lambda }_n})} ]\). Evaluating the last inequality at \(t = \frac{\pi }{2\sqrt{{\lambda }_n}}\) and using the conservation of energy, we obtain the claim. \(\square\)
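As a sanity check on the decoupled dynamics used in this proof, the following sketch (assuming unit mass, a single eigenvalue \(\lambda \), and Cauchy datum \(x(0)=x_0\), \(\dot{x}(0)=0\)) compares a fine symplectic Euler integration of the scalar equation \(\ddot{x} = -\lambda x\) with the closed-form kinetic energy \(E_K(t) = \frac{1}{2}\lambda x_0^2 \sin ^2(\sqrt{\lambda }\, t)\) on \([0, \pi /(2\sqrt{\lambda })]\); the values of \(\lambda \), \(x_0\), and the step size \(h\) are arbitrary choices for illustration:

```python
import numpy as np

lam, x0, h = 2.0, 1.0, 1e-4          # illustrative eigenvalue, initial datum, step size
n_steps = int(np.pi / (2.0 * np.sqrt(lam)) / h)
x, v = x0, 0.0                        # particle starts at rest
max_err = 0.0
for k in range(n_steps):
    v -= h * lam * x                  # symplectic Euler step for x'' = -lam * x
    x += h * v
    t = (k + 1) * h
    ek_exact = 0.5 * lam * x0**2 * np.sin(np.sqrt(lam) * t)**2
    max_err = max(max_err, abs(0.5 * v**2 - ek_exact))
```

For small \(h\), the numerical kinetic energy stays uniformly close to the closed-form expression over the whole quarter-period, consistent with the structure-preserving behavior of symplectic integrators.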
Cite this article
Scagliotti, A., Colli Franzone, P. A piecewise conservative method for unconstrained convex optimization. Comput Optim Appl 81, 251–288 (2022). https://doi.org/10.1007/s10589-021-00332-0