Open Access
2021 Stochastic optimization with momentum: Convergence, fluctuations, and traps avoidance
Anas Barakat, Pascal Bianchi, Walid Hachem, Sholom Schechtman
Electron. J. Statist. 15(2): 3892-3947 (2021). DOI: 10.1214/21-EJS1880

Abstract

In this paper, a general stochastic optimization procedure is studied, unifying several variants of stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm is seen as a noisy Euler discretization of a non-autonomous ordinary differential equation, recently introduced by Belotto da Silva and Gazeau, which is analyzed in depth. Assuming that the objective function is non-convex and differentiable, the stability and the almost sure convergence of the iterates to the set of critical points are established. A noteworthy special case is the convergence proof of S-NAG in a non-convex setting. Under some assumptions, the convergence rate is provided in the form of a Central Limit Theorem. Finally, the non-convergence of the algorithm to undesired critical points, such as local maxima or saddle points, is established. Here, the main ingredient is a new avoidance-of-traps result for non-autonomous settings, which is of independent interest.
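To make the setup concrete, the following is a minimal sketch, not the paper's exact recursion: an Adam-style momentum update whose iterates (x, m, v) can be read as a noisy Euler discretization, with step size lr, of an underlying non-autonomous ODE of the kind the abstract refers to. The function name adam_like_step, the toy objective, the noise level, and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

def adam_like_step(x, m, v, grad, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step (illustrative sketch, not the paper's exact scheme).

    The moving averages (m, v) and the iterate x can be viewed as a noisy
    Euler discretization, with step size `lr`, of a non-autonomous ODE.
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment (momentum) average
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # second-moment average
    x = x - lr * m / (np.sqrt(v) + eps)           # rescaled gradient step
    return x, m, v

# Usage on a toy non-convex objective f(x) = sum(x**2 * cos(x)) with noisy gradients.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
m, v = np.zeros_like(x), np.zeros_like(x)
for n in range(1, 2001):
    grad = 2 * x * np.cos(x) - x ** 2 * np.sin(x) + 0.1 * rng.normal(size=x.shape)
    x, m, v = adam_like_step(x, m, v, grad, lr=0.01)
```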

Funding Statement

A.B. was supported by the “Futur & Ruptures” research program which is jointly funded by the IMT, the Mines-Télécom Foundation and the Carnot TSN Institute. S.S. was supported by the “Région Ile-de-France”.

Acknowledgments

We would like to thank the anonymous reviewers for their outstanding refereeing work, especially for their comments on the avoidance-of-traps results and on the stability of the iterates of the stochastic algorithm, which helped us improve and clarify our manuscript.

Citation


Anas Barakat, Pascal Bianchi, Walid Hachem, Sholom Schechtman. "Stochastic optimization with momentum: Convergence, fluctuations, and traps avoidance." Electron. J. Statist. 15(2): 3892-3947, 2021. https://doi.org/10.1214/21-EJS1880

Information

Received: 1 December 2020; Published: 2021
First available in Project Euclid: 2 August 2021

Digital Object Identifier: 10.1214/21-EJS1880

Subjects:
Primary: 62L20
Secondary: 34A12 , 60F99 , 68T99

Keywords: ADAM , adaptive gradient methods with momentum , avoidance of traps , dynamical systems , Nesterov accelerated gradient , stochastic approximation
