Adaptive Variant of the Frank-Wolfe Algorithm for Convex Optimization Problems

Abstract

This paper investigates a variant of the Frank–Wolfe method for convex optimization problems with adaptive selection of the step parameter, based on information about the smoothness of the objective function (the Lipschitz constant of its gradient). Theoretical estimates of the quality of the approximate solution produced by the method with adaptively selected parameters L_k are obtained. On the class of problems with a convex objective function over a convex feasible set, the guaranteed convergence rate of the proposed method is sublinear. A special subclass of such problems is also considered (objective functions satisfying the gradient dominance condition), and an estimate of the convergence rate with adaptively selected parameters L_k is obtained for it. An important feature of this result is the analysis of the situation in which it is possible to guarantee, upon completion of an iteration, a reduction of the residual in the function value by at least a factor of 2. Moreover, the use of adaptively selected parameters in the theoretical estimates makes it possible to apply the method to both smooth and non-smooth problems, provided the exit criterion of the iteration is satisfied. For smooth problems, the theoretical estimates of the method are shown to be optimal up to a constant factor. Computational experiments were carried out, including a comparison with two other algorithms, demonstrating the efficiency of the algorithm on a number of both smooth and non-smooth problems.
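
To make the adaptive step-size rule concrete, the sketch below shows a Frank–Wolfe iteration with backtracking on the smoothness estimate L_k: a step is accepted only when the quadratic upper model with the current L_k holds (an exit criterion of the kind the abstract refers to), and L_k is doubled otherwise. This is a minimal illustration under assumptions of our own choosing, not the authors' exact procedure: the probability-simplex linear minimization oracle, the toy quadratic objective, and the constants L0 and tol are placeholders for the example. For reference, the gradient dominance (Polyak–Łojasiewicz) condition mentioned in the abstract is usually written f(x) − f* ≤ ‖∇f(x)‖²/(2μ) for some μ > 0.

```python
import numpy as np

def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex:
    returns the vertex e_i with the smallest gradient coordinate."""
    s = np.zeros_like(grad)
    s[np.argmin(grad)] = 1.0
    return s

def adaptive_frank_wolfe(f, grad_f, lmo, x0, L0=1.0, max_iter=500, tol=1e-8):
    """Frank-Wolfe with backtracking on the smoothness estimate L_k:
    accept a step only if the quadratic upper model holds, otherwise
    double L_k (a sketch of the general adaptive scheme, not the
    authors' exact procedure)."""
    x, L = x0.copy(), L0
    for _ in range(max_iter):
        g = grad_f(x)
        s = lmo(g)               # extreme point of the feasible set
        d = s - x                # Frank-Wolfe direction
        gap = -g @ d             # Frank-Wolfe (duality) gap, >= 0
        if gap <= tol:
            break
        L = max(L / 2.0, 1e-10)  # optimistic halving before the test
        while True:
            gamma = min(1.0, gap / (L * (d @ d)))
            x_new = x + gamma * d
            # exit criterion of the iteration: quadratic upper bound
            # f(x_new) <= f(x) + gamma*<g, d> + (L*gamma^2/2)*||d||^2
            if f(x_new) <= f(x) - gamma * gap + 0.5 * L * gamma**2 * (d @ d):
                break
            L *= 2.0             # model violated: increase L_k and retry
        x = x_new
    return x

# Toy usage: a convex quadratic over the probability simplex.
rng = np.random.default_rng(0)
M = rng.standard_normal((10, 10))
A = M.T @ M + np.eye(10)          # positive definite Hessian
b = rng.standard_normal(10)
f = lambda x: 0.5 * x @ A @ x - b @ x
grad_f = lambda x: A @ x - b
x = adaptive_frank_wolfe(f, grad_f, lmo_simplex, np.ones(10) / 10)
print("final objective:", f(x))
```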

About the authors

G. V. Aivazian

Moscow Institute of Physics and Technology

Author for correspondence.
Email: aivazian.grigory25@yandex.ru
Russia, 141701, Moscow region, Dolgoprudny, Institutskiy per., 9

F. S. Stonyakin

Moscow Institute of Physics and Technology; Vernadsky Crimean Federal University

Author for correspondence.
Email: fedyor@mail.ru
Russia, 141701, Moscow region, Dolgoprudny, Institutskiy per., 9; Russia, 295007, Simferopol, Academician Vernadsky Avenue, 4

D. A. Pasechnyuk

Moscow Institute of Physics and Technology; Trusted Artificial Intelligence Research Center of ISP RAS

Author for correspondence.
Email: dmivilensky1@gmail.com
Russia, 141701, Moscow region, Dolgoprudny, Institutskiy per., 9; Russia, 109004, Moscow, Alexander Solzhenitsyn st., 25

M. S. Alkousa

Moscow Institute of Physics and Technology; National Research University “Higher School of Economics”

Author for correspondence.
Email: mohammad.alkousa@phystech.edu
Russia, 141701, Moscow region, Dolgoprudny, Institutskiy per., 9; Russia, 101000, Moscow, Myasnitskaya st., 20

A. M. Raigorodsky

Moscow Institute of Physics and Technology; Lomonosov Moscow State University, Faculty of Mechanics and Mathematics; Caucasian Mathematical Center of the Adyghe State University

Author for correspondence.
Email: raigorodsky@yandex-team.ru
Russia, 141701, Moscow region, Dolgoprudny, Institutskiy per., 9; Russia, 119991, Moscow, Leninskiye Gory, 1; Republic of Adygea, 385016, Maykop, Pervomaiskaya st., 208

I. V. Baran

Vernadsky Crimean Federal University

Author for correspondence.
Email: matemain@mail.ru
Russia, 295007, Simferopol, Academician Vernadsky Avenue, 4

Copyright (c) 2023 G. V. Aivazian, F. S. Stonyakin, D. A. Pasechnyuk, M. S. Alkousa, A. M. Raigorodsky, I. V. Baran
