
A piecewise conservative method for unconstrained convex optimization


Abstract

We consider a continuous-time optimization method based on a dynamical system in which a massive particle, starting at rest, moves in the conservative force field generated by the objective function, without friction of any kind. We formulate a restart criterion based on the mean dissipation of the kinetic energy, and we prove a global convergence result for strongly convex functions. Using the symplectic Euler discretization scheme, we obtain an iterative optimization algorithm. We consider a discrete mean-dissipation restart scheme, and we also introduce a new restart procedure that ensures at each iteration a decrease of the objective function greater than the one achieved by a step of the classical gradient method. For the discrete conservative algorithm, this last restart criterion guarantees a qualitative convergence result. We apply the same restart scheme to the Nesterov Accelerated Gradient method (NAG-C), and we use this restarted NAG-C as a benchmark in the numerical experiments. On the smooth convex problems considered, our method exhibits a faster convergence rate than the restarted NAG-C. We also propose an extension of our discrete conservative algorithm to composite optimization: in the numerical tests involving non-strongly convex functions with \(\ell ^1\)-regularization, it performs better than the well-known Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) accelerated with an adaptive restart scheme.
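The following MATLAB sketch illustrates one possible reading of the discrete conservative algorithm with the gradient-decrease restart described above; the function name, the step size h, the constant L, and the exact restart action are illustrative assumptions rather than the authors' implementation.

```matlab
% Minimal sketch (our reading of the abstract, not the authors' released code)
% of a symplectic-Euler conservative iteration with a restart triggered when
% the step does not decrease f at least as much as a classical gradient step.
function x = conservative_restart_sketch(f, gradf, x0, L, h, maxit)
    x = x0;
    v = zeros(size(x0));            % the particle starts at rest
    for k = 1:maxit
        g  = gradf(x);
        vn = v - h * g;             % symplectic Euler: update the velocity first...
        xn = x + h * vn;            % ...then the position with the new velocity
        xg = x - (1 / L) * g;       % reference step of the classical gradient method
        if f(xn) <= f(xg)           % conservative step decreases f at least as much
            x = xn;  v = vn;        % keep the conservative dynamics
        else
            x = xg;                 % restart: fall back to the gradient step
            v = zeros(size(x0));    % and reset the velocity to zero
        end
    end
end
```

On a quadratic \(f(x)=\frac{1}{2}x^\top A x\), for instance, one could take gradf = @(x) A*x, L equal to the largest eigenvalue of A, and a step size such as h = 1/sqrt(L).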


Availability of data and material

Not applicable.

Code availability

The MATLAB code used to run the numerical experiments is available from the authors upon request.


Acknowledgements

The authors thank Prof. G. Savaré for helpful suggestions and discussions. The authors are grateful to two anonymous referees for their observations, which contributed to improving the quality of the paper.

Funding

Not applicable.

Author information


Contributions

Although the idea of conservative methods coupled with a restart criterion had already been proposed, no global convergence result was known under the sole hypothesis of strong convexity of the objective function. The proof of our result is based on the notion of Maximum Mean Dissipation, which is an original object. We achieve a qualitative global convergence result for a discrete method with a suitable restart criterion, which was not available before the present paper. The numerical tests show that restarted conservative methods can effectively compete with the best-performing existing algorithms. Finally, the discrete algorithm for composite optimization is completely original.

Corresponding author

Correspondence to A. Scagliotti.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


Appendices

Appendix 1: Proof of Proposition 1

Proof

Let us prove that the function \(t\mapsto E_K(t)\) attains a local maximum on \([0,+\infty )\). By contradiction, if \(t\mapsto E_K(t)\) has no local maxima, then \(t\mapsto E_K(t)\) is injective (otherwise we could apply the Weierstrass theorem twice and find a local maximum). Since \(t\mapsto E_K(t)\) is continuous, it has to be strictly increasing. This implies that \(t \mapsto \dot{x}(t)\) cannot change sign, and hence that it is monotone as well; consequently, \(t \mapsto x(t)\) is also monotone. Since both x(t) and \(\dot{x}(t)\) remain bounded for every \(t \in [0, +\infty )\), there exist \(x_\infty , v_\infty \in \mathbb {R}\) such that

$$\begin{aligned} \lim _{t \rightarrow +\infty }x(t) = x_{\infty } \,\,\,\,\, \text{ and } \,\, \lim _{t \rightarrow +\infty }\dot{x}(t) = v_{\infty }. \end{aligned}$$

On the other hand, since \(x(t)\) converges to the finite limit \(x_\infty \), we must have \(v_{\infty }=0\); but then \(E_K(t)\rightarrow 0\) as \(t\rightarrow +\infty \), which is incompatible with \(t\mapsto E_K(t)\) being strictly increasing with \(E_K(0)=0\) (the particle starts at rest). This is a contradiction.

Let \(\bar{t}\) be a point of local maximum for the kinetic energy function \(t \mapsto E_K(t)\). This implies that \(|\dot{x}(\bar{t})| > 0\). The conservation of the total mechanical energy ensures that the function \(t\mapsto f(x(t))\) attains a local minimum at \(\bar{t}\). Using the Implicit Function Theorem, we deduce that \(t \mapsto x(t)\) is a local homeomorphism around \(\bar{t}\), and therefore \(x(\bar{t})\) is a local minimum point of f. \(\square\)
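As a one-dimensional illustration, consider \(f(x)=\frac{1}{2}x^2\) with \(x(0)=x_0>0\) and \(\dot{x}(0)=0\). Then

$$\begin{aligned} x(t) = x_0\cos t \qquad \text{ and } \qquad E_K(t) = \frac{1}{2}\,x_0^2\sin ^2 t, \end{aligned}$$

so the kinetic energy attains its first local maximum at \(\bar{t}=\pi /2\), where \(x(\bar{t})=0\) is the minimizer of f.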

Appendix 2: Proof of Proposition 2

Proof

Without loss of generality, we can assume that \(x^*=0\) and that \(x_0>0\). We define a strongly convex function \(g: \mathbb {R}\rightarrow \mathbb {R}\) as follows:

$$\begin{aligned} g(x) := \frac{1}{2} \mu |x-x^*|^2 = \frac{1}{2} \mu |x|^2. \end{aligned}$$

We claim that, for every \(y \in [0,x_0]\), the following inequality is satisfied:

$$\begin{aligned} f(x_0) - f(y) \ge g(x_0) - g(y). \end{aligned}$$
(59)

Indeed, we have that

$$\begin{aligned} f(x_0) - g(x_0)&= f(y) - g(y) + \int _y ^{x_0}(f'(u) - g'(u))\,du \ge f(y) - g(y), \end{aligned}$$

since \(f'(u) - g'(u)\ge 0\) for every \(u\ge 0\); indeed, by the \(\mu\)-strong convexity of f and the fact that \(f'(x^*)=f'(0)=0\), we have \(f'(u) \ge \mu \, u = g'(u)\) for every \(u\ge 0\). Combining (5) and (59) we obtain:

$$\begin{aligned} t_1 = \int _0^{x_0}\frac{1}{\sqrt{2(f(x_0)-f(y))}}dy \le&\int _0^{x_0}\frac{1}{\sqrt{2(g(x_0)-g(y))}}dy\\&= \int _0^{x_0}\frac{1}{\sqrt{\mu (x_0^2- y^2)}}dy = \frac{\pi }{2\sqrt{\mu }}. \end{aligned}$$
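The last equality follows from the substitution \(y = x_0\sin \theta\):

$$\begin{aligned} \int _0^{x_0}\frac{1}{\sqrt{\mu (x_0^2- y^2)}}dy = \frac{1}{\sqrt{\mu }}\int _0^{\pi /2}\frac{x_0\cos \theta }{x_0\cos \theta }\,d\theta = \frac{\pi }{2\sqrt{\mu }}. \end{aligned}$$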

This completes the proof. \(\square\)

Appendix 3: Proof of Lemma 1

Proof

Up to a linear orthonormal change of coordinates, we can assume that the function f is of the form

$$\begin{aligned} f(x) = \sum _{i=1}^n {\lambda }_i \frac{x_i^2}{2}. \end{aligned}$$

Hence, the differential system (7) becomes

$$\begin{aligned} {\left\{ \begin{array}{ll} \ddot{x}_1 + {\lambda }_1 x_1 = 0, \\ \vdots \\ \ddot{x}_n + {\lambda }_n x_n = 0, \end{array}\right. } \end{aligned}$$

i.e., the components evolve independently of one another. If the Cauchy datum is

$$\begin{aligned} x(0) = (x_{1,0}, \, \ldots , \, x_{n,0} ) \,\,\, \text{ and } \,\, \dot{x}(0)=0, \end{aligned}$$

then we can compute the expression of the kinetic energy function \(E_K: t\mapsto \frac{1}{2} |\dot{x}(t)|^2\):

$$\begin{aligned} E_K(t) = \sum _{i=1}^n {\lambda }_i \frac{x_{i,0}^2}{2} {\sin ^2( \sqrt{{\lambda }_i} t)}. \end{aligned}$$
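Indeed, each component is a harmonic oscillator started at rest, whose explicit solution is

$$\begin{aligned} x_i(t) = x_{i,0} \cos ( \sqrt{{\lambda }_i}\, t), \qquad \dot{x}_i(t) = -\sqrt{{\lambda }_i}\, x_{i,0} \sin ( \sqrt{{\lambda }_i}\, t), \qquad i=1,\ldots ,n. \end{aligned}$$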

For every \(0 \le t\le \frac{\pi }{2\sqrt{{\lambda }_n}}\), since the eigenvalues are ordered so that \({\lambda }_1 \le \cdots \le {\lambda }_n\) and \(\sin\) is increasing on \([0, \pi /2]\), we have that

$$\begin{aligned} 0 \le \sin ( \sqrt{{\lambda }_1} t )\le \cdots \le \sin ( \sqrt{{\lambda }_n} t ), \end{aligned}$$

and then we deduce that

$$\begin{aligned} E_K(t) \ge \left( \sum _{i=1}^n {\lambda }_i\frac{x_{i,0}^2}{2} \right) {\sin ^2(\sqrt{{\lambda }_1} t)}, \end{aligned}$$

for every \(t \in [ 0, {\pi }/{(2\sqrt{{\lambda }_n})} ]\). Evaluating the last inequality at \(t = \frac{\pi }{2\sqrt{{\lambda }_n}}\) and using the conservation of the total mechanical energy, we obtain the claim. \(\square\)


About this article


Cite this article

Scagliotti, A., Colli Franzone, P. A piecewise conservative method for unconstrained convex optimization. Comput Optim Appl 81, 251–288 (2022). https://doi.org/10.1007/s10589-021-00332-0

