Abstract
We consider a continuous-time optimization method based on a dynamical system in which a massive particle, starting at rest, moves in the conservative force field generated by the objective function, without friction of any kind. We formulate a restart criterion based on the mean dissipation of the kinetic energy, and we prove a global convergence result for strongly convex functions. Using the symplectic Euler discretization scheme, we obtain an iterative optimization algorithm. We consider a discrete mean-dissipation restart scheme, and we also introduce a new restart procedure that ensures at each iteration a decrease of the objective function greater than the one achieved by a step of the classical gradient method. For the discrete conservative algorithm, this latter restart criterion guarantees a qualitative convergence result. We apply the same restart scheme to the Nesterov Accelerated Gradient method (NAG-C), and we use this restarted NAG-C as a benchmark in the numerical experiments. On the smooth convex problems considered, our method exhibits a faster convergence rate than the restarted NAG-C. Finally, we propose an extension of our discrete conservative algorithm to composite optimization: in numerical tests involving non-strongly convex functions with \(\ell ^1\)-regularization, it outperforms the well-known Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) accelerated with an adaptive restart scheme.
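To fix ideas, the following minimal sketch illustrates the two ingredients described above: a symplectic Euler integration of the frictionless dynamics \(\ddot{x} = -\nabla f(x)\), combined with a restart check that falls back to a plain gradient step (with step size \(1/L\)) whenever the conservative step does not decrease the objective at least as much. The function name, the step sizes, and the velocity reset are illustrative assumptions, not the authors' exact algorithm:

```python
import numpy as np

def conservative_restart(f, grad, x0, h, L, max_iter):
    """Sketch: symplectic Euler for the frictionless dynamics x'' = -grad f(x),
    with a gradient-decrease restart check (illustrative, not the paper's exact method)."""
    x, v = x0.astype(float).copy(), np.zeros_like(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        v_new = v - h * g          # symplectic Euler: update velocity first...
        x_new = x + h * v_new      # ...then position, using the new velocity
        x_grad = x - g / L         # candidate point of a classical gradient step
        if f(x_new) > f(x_grad):
            # restart: take the gradient step instead and reset the particle at rest
            x_new, v_new = x_grad, np.zeros_like(v)
        x, v = x_new, v_new
    return x
```

The restart guarantees per-iteration decrease at least that of the gradient method, while the conservative steps can travel much faster between restarts.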
Availability of data and material
Not applicable.
Code availability
The MATLAB code used to run the numerical experiments is available from the authors upon request.
Acknowledgements
The authors thank Prof. G. Savaré for his helpful suggestions and discussions. The authors are also grateful to two anonymous referees for their observations, which helped improve the quality of the paper.
Funding
Not applicable.
Contributions
Although the idea of conservative methods coupled with a restart criterion has already been proposed, no global convergence result was known under the sole hypothesis of strong convexity of the objective function. The proof of our result is based on the notion of Maximum Mean Dissipation, which is an original construction. We also achieve a qualitative global convergence result for a discrete method with a suitable restart criterion, which was not available before the present paper. The numerical tests show that restarted conservative methods can effectively compete with the best-performing existing algorithms. Finally, the discrete algorithm for composite optimization is entirely original.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix 1: Proof of Proposition 1
Proof
Let us prove that the function \(t\mapsto E_K(t)\) attains a local maximum in \([0,+\infty )\). By contradiction, if \(t\mapsto E_K(t)\) had no local maxima, then \(t\mapsto E_K(t)\) would be injective (otherwise, applying the Weierstrass theorem twice, we could find a local maximum). Since \(t\mapsto E_K(t)\) is continuous, it would have to be strictly increasing. This implies that \(t \mapsto \dot{x}(t)\) cannot change sign, and hence that it is monotone; consequently, \(t \mapsto x(t)\) is monotone as well. Since both \(x(t)\) and \(\dot{x}(t)\) remain bounded for every \(t \in [0, +\infty )\), there exist \(x_\infty , v_\infty \in \mathbb {R}\) such that
On the other hand, since \(x(t)\) converges to the finite limit \(x_\infty \), we must have \(v_{\infty }=0\); but then \(E_K(t)\rightarrow 0\), which contradicts the fact that \(t\mapsto E_K(t)\) is strictly increasing with \(E_K(0)=0\).
Let \(\bar{t}\) be a point of local maximum of the kinetic energy function \(t \mapsto E_K(t)\); in particular, \(|\dot{x}(\bar{t})| > 0\). The conservation of the total mechanical energy ensures that the function \(t\mapsto f(x(t))\) attains a local minimum at \(\bar{t}\). Moreover, since \(\dot{x}(\bar{t})\ne 0\), the Implicit Function Theorem implies that \(t \mapsto x(t)\) is a local homeomorphism around \(\bar{t}\). This implies that \(x(\bar{t})\) is a point of local minimum for \(f\). \(\square\)
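For completeness, the conservation of the total mechanical energy invoked above can be verified directly from the frictionless dynamics \(\ddot{x}(t) = -\nabla f(x(t))\):
\[ \frac{d}{dt}\left( \frac{1}{2}|\dot{x}(t)|^2 + f(x(t)) \right) = \dot{x}(t)\cdot \ddot{x}(t) + \nabla f(x(t))\cdot \dot{x}(t) = \dot{x}(t)\cdot \left( \ddot{x}(t) + \nabla f(x(t)) \right) = 0, \]
so that \(E_K(t) + f(x(t)) \equiv E_K(0) + f(x(0))\) for every \(t\ge 0\); in particular, local maxima of \(E_K\) correspond exactly to local minima of \(t\mapsto f(x(t))\).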
Appendix 2: Proof of Proposition 2
Proof
Without loss of generality, we can assume that \(x^*=0\) and that \(x_0>0\). We define a strongly convex function \(g:\mathbb {R}\rightarrow \mathbb {R}\) as follows:
We claim that, for every \(y \in [0,x_0]\), the following inequality is satisfied:
Indeed, we have that
since \(f'(u) - g'(u)\ge 0\) for every \(u\ge 0\). Combining (5) and (59) we obtain:
This completes the proof. \(\square\)
Appendix 3: Proof of Lemma 1
Proof
Up to an orthonormal linear change of coordinates, we can assume that the function \(f\) is of the form
Hence, the differential system (7) becomes
i.e., the components evolve independently of one another. If the Cauchy datum is
then we can compute the expression of the kinetic energy function \(E_K: t\mapsto \frac{1}{2} |\dot{x}(t)|^2\):
For every \(0 \le t\le \frac{\pi }{2\sqrt{{\lambda }_n}}\), we have that
and then we deduce that
for every \(t \in [ 0, {\pi }/{(2\sqrt{{\lambda }_n})} ]\). Evaluating the last inequality at \(t = \frac{\pi }{2\sqrt{{\lambda }_n}}\) and using the conservation of energy, we obtain the claim. \(\square\)
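As a sanity check on the decoupled dynamics used in this proof, the following sketch (assuming unit mass, a single eigenvalue \(\lambda \), and Cauchy datum \(x(0)=x_0\), \(\dot{x}(0)=0\)) compares a fine symplectic Euler integration of the scalar equation \(\ddot{x} = -\lambda x\) with the closed-form kinetic energy \(E_K(t) = \frac{1}{2}\lambda x_0^2 \sin ^2(\sqrt{\lambda }\, t)\) on \([0, \pi /(2\sqrt{\lambda })]\); the values of \(\lambda \), \(x_0\), and the step size \(h\) are arbitrary choices for illustration:

```python
import numpy as np

lam, x0, h = 2.0, 1.0, 1e-4          # illustrative eigenvalue, initial datum, step size
n_steps = int(np.pi / (2.0 * np.sqrt(lam)) / h)
x, v = x0, 0.0                        # particle starts at rest
max_err = 0.0
for k in range(n_steps):
    v -= h * lam * x                  # symplectic Euler step for x'' = -lam * x
    x += h * v
    t = (k + 1) * h
    ek_exact = 0.5 * lam * x0**2 * np.sin(np.sqrt(lam) * t)**2
    max_err = max(max_err, abs(0.5 * v**2 - ek_exact))
```

For small \(h\), the numerical kinetic energy stays uniformly close to the closed-form expression over the whole quarter-period, consistent with the structure-preserving behavior of symplectic integrators.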
Cite this article
Scagliotti, A., Colli Franzone, P. A piecewise conservative method for unconstrained convex optimization. Comput Optim Appl 81, 251–288 (2022). https://doi.org/10.1007/s10589-021-00332-0