Abstract
Proximal operations are among the most common primitives appearing in both practical and theoretical (or high-level) optimization methods. This basic operation typically consists in solving an intermediary (hopefully simpler) optimization problem. In this work, we survey notions of inaccuracies that can be used when solving those intermediary optimization problems. Then, we show that worst-case guarantees for algorithms relying on such inexact proximal operations can be systematically obtained through a generic procedure based on semidefinite programming. This methodology is primarily based on the approach introduced by Drori and Teboulle (Math Program 145(1–2):451–482, 2014) and on convex interpolation results, and allows producing non-improvable worst-case analyses. In other words, for a given algorithm, the methodology generates both worst-case certificates (i.e., proofs) and problem instances on which they are achieved. Relying on this methodology, we study the numerical worst-case performance of a few basic methods relying on inexact proximal operations, including accelerated variants, and design a variant with optimized worst-case behavior. We further illustrate how to extend the approach to support strongly convex objectives by studying a simple relatively inexact proximal minimization method.
References
Ajalloeian, A., Simonetto, A., Dall’Anese, E.: Inexact online proximal-gradient method for time-varying convex optimization. In: 2020 American Control Conference (ACC), pp. 2850–2857. IEEE (2020)
Alves, M.M., Eckstein, J., Geremia, M., Melo, J.: Relative-error inertial-relaxed inexact versions of Douglas–Rachford and ADMM splitting algorithms. Preprint arXiv:1904.10502 (2019)
Alves, M.M., Marcavillaca, R.T.: On inexact relative-error hybrid proximal extragradient, forward-backward and Tseng’s modified forward-backward methods with inertial effects. Set-Valued Var. Anal. 28, 301–325 (2020)
Auslender, A.: Numerical methods for nondifferentiable convex optimization. In: Nonlinear Analysis and Optimization, pp. 102–126. Springer (1987)
Barré, M., Taylor, A., d’Aspremont, A.: Complexity guarantees for Polyak steps with momentum. In: Conference on Learning Theory, pp. 452–478. PMLR (2020)
Bastianello, N., Ajalloeian, A., Dall’Anese, E.: Distributed and inexact proximal gradient method for online convex optimization. Preprint arXiv:2001.00870 (2020)
Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces, vol. 408. Springer, Berlin (2011)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)
Bello-Cruz, Y., Gonçalves, M.L.N., Krislock, N.: On inexact accelerated proximal gradient methods with relative error rules. Preprint arXiv:2005.03766 (2020)
Boţ, R.I., Csetnek, E.R.: A hybrid proximal-extragradient algorithm with inertial effects. Numer. Funct. Anal. Optim. 36(8), 951–963 (2015)
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Brøndsted, A., Rockafellar, R.T.: On the subdifferentiability of convex functions. Proc. Am. Math. Soc. 16(4), 605–611 (1965)
Bruck, R.E., Jr.: An iterative solution of a variational inequality for certain monotone operators in Hilbert space. Bull. Am. Math. Soc. 81(5), 890–892 (1975)
Burachik, R.S., Iusem, A.N., Svaiter, B.F.: Enlargement of monotone operators with applications to variational inequalities. Set-Valued Anal. 5(2), 159–180 (1997)
Burachik, R.S., Martínez-Legaz, J.E., Rezaie, M., Théra, M.: An additive subfamily of enlargements of a maximally monotone operator. Set-Valued Var. Anal. 23(4), 643–665 (2015)
Burachik, R.S., Sagastizábal, C.A., Svaiter, B.F.: \(\varepsilon \)-enlargements of maximal monotone operators: Theory and applications. In: Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods, pp. 25–43. Springer (1998)
Burachik, R.S., Sagastizábal, C.A., Svaiter, B.F.: Bundle methods for maximal monotone operators. In: Ill-Posed Variational Problems and Regularization Techniques, pp. 49–64. Springer (1999)
Burke, J., Qian, M.: A variable metric proximal point algorithm for monotone operators. SIAM J. Control. Optim. 37(2), 353–375 (1999)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
Chierchia, G., Chouzenoux, E., Combettes, P.L., Pesquet, J.C.: The proximity operator repository. User’s guide (2020). http://proximity-operator.net/download/guide.pdf
Combettes, P.L., Pesquet, J.C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 185–212. Springer (2011)
Cominetti, R.: Coupling the proximal point algorithm with approximation methods. J. Optim. Theory Appl. 95(3), 581–600 (1997)
Correa, R., Lemaréchal, C.: Convergence of some algorithms for convex minimization. Math. Program. 62(1–3), 261–275 (1993)
Cyrus, S., Hu, B., Van Scoy, B., Lessard, L.: A robust accelerated optimization algorithm for strongly convex functions. In: 2018 Annual American Control Conference (ACC), pp. 1376–1381 (2018)
de Klerk, E., Glineur, F., Taylor, A.B.: On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optim. Lett. 11(7), 1185–1199 (2017)
de Klerk, E., Glineur, F., Taylor, A.B.: Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM J. Optim. 30(3), 2053–2082 (2020)
Devolder, O.: First-order methods with inexact oracle: the strongly convex case. CORE Discussion Papers (2013)
Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1–2), 37–75 (2014)
Dixit, R., Bedi, A.S., Tripathi, R., Rajawat, K.: Online learning with inexact proximal online gradient descent algorithms. IEEE Trans. Signal Process. 67(5), 1338–1352 (2019)
Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)
Dragomir, R.A., Taylor, A.B., d’Aspremont, A., Bolte, J.: Optimal complexity and certification of Bregman first-order methods. Math. Program. 194, 41–83 (2022)
Drori, Y.: Contributions to the complexity analysis of optimization algorithms. Ph.D. thesis, Tel-Aviv University (2014)
Drori, Y., Taylor, A.B.: Efficient first-order methods for convex minimization: a constructive approach. Math. Program. 184(1), 183–220 (2020)
Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. 145(1–2), 451–482 (2014)
Drori, Y., Teboulle, M.: An optimal variant of Kelley’s cutting-plane method. Math. Program. 160(1–2), 321–351 (2016)
Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. Ph.D. thesis, Massachusetts Institute of Technology (1989)
Eckstein, J.: Approximate iterations in Bregman-function-based proximal algorithms. Math. Program. 83(1–3), 113–123 (1998)
Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)
Eckstein, J., Silva, P.J.: A practical relative error criterion for augmented Lagrangians. Math. Program. 141(1–2), 319–348 (2013)
Eckstein, J., Yao, W.: Augmented Lagrangian and alternating direction methods for convex optimization: a tutorial and some illustrative computational results. RUTCOR Res. Rep. 32(3), 44 (2012)
Eckstein, J., Yao, W.: Approximate ADMM algorithms derived from Lagrangian splitting. Comput. Optim. Appl. 68(2), 363–405 (2017)
Eckstein, J., Yao, W.: Relative-error approximate versions of Douglas–Rachford splitting and special cases of the ADMM. Math. Program. 170(2), 417–444 (2018)
Fortin, M., Glowinski, R.: On decomposition-coordination methods using an Augmented Lagrangian. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems. North-Holland, Amsterdam (1983)
Fuentes, M., Malick, J., Lemaréchal, C.: Descentwise inexact proximal algorithms for smooth optimization. Comput. Optim. Appl. 53(3), 755–769 (2012)
Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems. North-Holland, Amsterdam (1983)
Golub, G.H., Van Loan, C.F.: Matrix Computations. JHU Press, Baltimore (1996)
Gu, G., Yang, J.: On the optimal ergodic sublinear convergence rate of the relaxed proximal point algorithm for variational inequalities. Preprint arXiv:1905.06030 (2019)
Gu, G., Yang, J.: Optimal nonergodic sublinear convergence rate of proximal point algorithm for maximal monotone inclusion problems. Preprint arXiv:1904.05495 (2019)
Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)
Hu, B., Lessard, L.: Dissipativity theory for Nesterov’s accelerated method. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1549–1557. JMLR (2017)
Iusem, A.N.: Augmented Lagrangian methods and proximal point methods for convex optimization. Investig. Oper. 8(11–49), 7 (1999)
Kim, D.: Accelerated proximal point method for maximally monotone operators. Math. Program. 190, 57–87 (2021)
Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Program. 159(1–2), 81–107 (2016)
Kim, D., Fessler, J.A.: Another look at the fast iterative shrinkage/thresholding algorithm (FISTA). SIAM J. Optim. 28(1), 223–250 (2018)
Kim, D., Fessler, J.A.: Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. J. Optim. Theory Appl. 188(1), 192–219 (2021)
Lemaire, B.: About the convergence of the proximal method. In: Advances in Optimization, pp. 39–51. Springer (1992)
Lessard, L., Recht, B., Packard, A.: Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26(1), 57–95 (2016)
Lieder, F.: On the convergence rate of the Halpern-iteration. Optim. Lett. 15, 405–418 (2021)
Lin, H., Mairal, J., Harchaoui, Z.: A universal catalyst for first-order optimization. In: Advances in Neural Information Processing Systems, pp. 3384–3392 (2015)
Lin, H., Mairal, J., Harchaoui, Z.: Catalyst acceleration for first-order convex optimization: from theory to practice. J. Mach. Learn. Res. 18(212), 1–54 (2018)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Löfberg, J.: YALMIP: a toolbox for modeling and optimization in MATLAB. In: Proceedings of the CACSD Conference (2004)
Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Revue Française d’Informatique et de Recherche Opérationnelle 4, 154–158 (1970)
Martinet, B.: Détermination approchée d’un point fixe d’une application pseudo-contractante. cas de l’application prox. Comptes rendus hebdomadaires des séances de l’Académie des sciences de Paris 274, 163–165 (1972)
Megretski, A., Rantzer, A.: System analysis via integral quadratic constraints. IEEE Trans. Autom. Control 42(6), 819–830 (1997)
Millán, R.D., Machado, M.P.: Inexact proximal \(\varepsilon \)-subgradient methods for composite convex optimization problems. J. Global Optim. 75(4), 1029–1060 (2019)
Monteiro, R.D., Svaiter, B.F.: On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean. SIAM J. Optim. 20(6), 2755–2787 (2010)
Monteiro, R.D., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)
Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences de Paris 255, 2897–2899 (1962)
Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)
Mosek, A.: The MOSEK optimization software. http://www.mosek.com (2010)
Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(\(1/k^2\)). Soviet Math. Doklady 27, 372–376 (1983)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)
Nesterov, Y.: Inexact accelerated high-order proximal-point methods. Technical report, CORE discussion paper (2020)
Nesterov, Y.: Inexact high-order proximal-point methods with auxiliary search procedure. Technical report, CORE discussion paper (2020)
Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 127–239 (2014)
Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)
Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)
Rockafellar, R.T.: A dual approach to solving nonlinear programming problems by unconstrained optimization. Math. Program. 5(1), 354–373 (1973)
Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)
Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control. Optim. 14(5), 877–898 (1976)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1996)
Ryu, E.K., Boyd, S.: Primer on monotone operator methods. Appl. Comput. Math. 15(1), 3–43 (2016)
Ryu, E.K., Taylor, A.B., Bergeling, C., Giselsson, P.: Operator splitting performance estimation: tight contraction factors and optimal parameter selection. SIAM J. Optim. 30(3), 2251–2271 (2020)
Ryu, E.K., Vũ, B.C.: Finding the forward-Douglas–Rachford-forward method. J. Optim. Theory Appl. 184, 858–876 (2019)
Salzo, S., Villa, S.: Inexact and accelerated proximal point algorithms. J. Convex Anal. 19(4), 1167–1192 (2012)
Schmidt, M., Le Roux, N., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in neural information processing systems (NIPS), pp. 1458–1466 (2011)
Simonetto, A., Jamali-Rad, H.: Primal recovery from consensus-based dual decomposition for distributed convex optimization. J. Optim. Theory Appl. 168(1), 172–197 (2016)
Solodov, M.V., Svaiter, B.F.: A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set-Valued Anal. 7(4), 323–345 (1999)
Solodov, M.V., Svaiter, B.F.: A hybrid projection-proximal point algorithm. J. Convex Anal. 6(1), 59–70 (1999)
Solodov, M.V., Svaiter, B.F.: A comparison of rates of convergence of two inexact proximal point algorithms. In: Nonlinear optimization and related topics, pp. 415–427. Springer (2000)
Solodov, M.V., Svaiter, B.F.: Error bounds for proximal point subproblems and associated inexact proximal point algorithms. Math. Program. 88(2), 371–389 (2000)
Solodov, M.V., Svaiter, B.F.: An inexact hybrid generalized proximal point algorithm and some new results on the theory of Bregman functions. Math. Oper. Res. 25(2), 214–230 (2000)
Solodov, M.V., Svaiter, B.F.: A unified framework for some inexact proximal point algorithms. Numer. Funct. Anal. Optim. 22(7–8), 1013–1035 (2001)
Sturm, J.F.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11–12, 625–653 (1999)
Svaiter, B.F.: A weakly convergent fully inexact Douglas–Rachford method with relative error tolerance. Preprint arXiv:1809.02312 (2018)
Taylor, A., Bach, F.: Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. In: Proceedings of the Thirty-Second Conference on Learning Theory (COLT), vol. 99, pp. 2934–2992. PMLR (2019)
Taylor, A.B., Hendrickx, J.M., Glineur, F.: Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. 27(3), 1283–1313 (2017)
Taylor, A.B., Hendrickx, J.M., Glineur, F.: Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In: IEEE 56th Annual Conference on Decision and Control (CDC), pp. 1278–1283 (2017)
Taylor, A.B., Hendrickx, J.M., Glineur, F.: Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program. 161(1–2), 307–345 (2017)
Toker, O., Ozbay, H.: On the NP-hardness of solving bilinear matrix inequalities and simultaneous stabilization with static output feedback. In: 1995 Annual American Control Conference (ACC), vol. 4, pp. 2525–2526 (1995)
Van Scoy, B., Freeman, R.A., Lynch, K.M.: The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Lett. 2(1), 49–54 (2018)
Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward-backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)
Zong, C., Tang, Y., Cho, Y.: Convergence analysis of an inexact three-operator splitting algorithm. Symmetry 10(11), 563 (2018)
Acknowledgements
The authors would like to thank Ernest Ryu for insightful feedback on a preliminary version of this manuscript. The authors also thank the two referees and the associate editor who helped improve this manuscript.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
MB acknowledges support from an AMX fellowship. The authors acknowledge support from the European Research Council (grant SEQUOIA 724063). This work was funded in part by the French government under the management of the Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute).
Appendices
A More examples of fixed-step inexact proximal methods
This section extends the list of examples from Sect. 3.1.2.
-
The hybrid approximate extragradient algorithm (see [90] or [67, Section 4]) can be described as
$$\begin{aligned} x_{k+1}=x_k-\eta _{k+1}g_{k+1},\end{aligned}$$such that \(\exists u_{k+1}, {{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(u_{k+1},g_{k+1};\,x_{k}) \leqslant \tfrac{\sigma ^2}{2}{\Vert u_{k+1}-x_{k}\Vert ^2}\) (see Lemma 1 for a link between \(\varepsilon \)-subgradient formulation and primal-dual gap). One iteration of this form can be artificially cast into three iterations of (2) as
$$\begin{aligned}\left\{ \begin{array}{rcl} w_{3k+1} &{}=&{} w_{3k}-e_{3k}\\ w_{3k+2} &{}=&{} w_{3k+1}-e_{3k+1}\\ w_{3k+3} &{}=&{} w_{3k+2} +e_{3k}+e_{3k+1} - \eta _{k+1}v_{3k+2} \\ \end{array}\right. \end{aligned}$$with \(v_{3k+2} \in \partial h(w_{3k+2})\) . This corresponds to setting \(\lambda _{3k+1} = \lambda _{3k+2} =\lambda _{3k+3} = 0\), \(\alpha _{3k+3,3k+2}=\eta _{k+1}\), \(\beta _{3k+1,3k} = \beta _{3k+2,3k+1}=1\), \(\beta _{3k+3,3k+1}=\beta _{3k+3,3k+2} = -1\) and the other parameters to zero. Notice that \(w_{3k+3} = w_{3k} - \eta _{k+1}v_{3k+2}\). By requiring \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{3k+1},v_{3k+2};\,w_{3k}) \leqslant \tfrac{\sigma ^2}{2}{\Vert w_{3k+1}-w_{3k}\Vert ^2}\) we can identify the primal-dual pair \((u_{k+1},g_{k+1})\) with \((w_{3k+1},v_{3k+2})\) and iterates \(x_{k+1}\) with \(w_{3k+3}\). In addition, we set
$$\begin{aligned} \text {EQ}_{3k+1}&=0,\\ \text {EQ}_{3k+2}&=0,\\ \text {EQ}_{3k+3}&= {{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{3k+1},v_{3k+2};\,w_{3k}) - \tfrac{\sigma ^2}{2}{\Vert w_{3k+1}-w_{3k}\Vert ^2}. \end{aligned}$$Using \(v_{3k+2}\in \partial h (w_{3k+2})\), we have \(h^*(v_{3k+2}) = {\langle v_{3k+2}; w_{3k+2}\rangle } - h(w_{3k+2})\) and thus
$$\begin{aligned} \text {EQ}_{3k+3} =&\,\tfrac{1}{2}{\Vert w_{3k+1}-w_{3k+3}\Vert ^2}+\eta _{k+1}(h(w_{3k+1})-h(w_{3k+2}) \\&- {\langle v_{3k+2}; w_{3k+1}-w_{3k+2}\rangle })-\tfrac{\sigma ^2}{2}{\Vert w_{3k+1}-w_{3k}\Vert ^2}, \end{aligned}$$which complies with (3) and is Gram-representable.
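As a small sanity check of this criterion (an illustration we add here, not code from the paper), the primal-dual gap can be evaluated in closed form for \(h=|\cdot |\) on \(\mathbb {R}\): its conjugate is the indicator of \([-1,1]\) and its exact proximal map is soft-thresholding. We use the decomposition \({{\,\mathrm{\textrm{PD}}\,}}_{\lambda h}(u,g;\,x) = \lambda (h(u)+h^*(g)-{\langle g; u\rangle }) + \tfrac{1}{2}\Vert u-(x-\lambda g)\Vert ^2\), which matches the expansion of \(\text {EQ}_{3k+3}\) above.

```python
import math

def pd_gap(h, h_conj, lam, u, g, x):
    # Primal-dual gap of min_u lam*h(u) + 0.5*(u - x)^2, written as a
    # Fenchel-Young residual plus a squared distance (both nonnegative).
    return lam * (h(u) + h_conj(g) - g * u) + 0.5 * (u - (x - lam * g)) ** 2

h = abs                                              # h(u) = |u|
h_conj = lambda g: 0.0 if abs(g) <= 1 else math.inf  # conjugate: indicator of [-1, 1]

lam, x, sigma = 2.0, 5.0, 0.5
u = math.copysign(max(abs(x) - lam, 0.0), x)         # exact prox: soft-thresholding
g = (x - u) / lam                                    # associated dual point

# An exact primal-dual pair has zero gap, so the relative-error
# criterion PD <= sigma^2/2 * ||u - x||^2 holds for any sigma.
assert pd_gap(h, h_conj, lam, u, g, x) <= 1e-12
assert pd_gap(h, h_conj, lam, u, g, x) <= sigma**2 / 2 * (u - x) ** 2

# A perturbed primal point yields a strictly positive gap.
assert pd_gap(h, h_conj, lam, u + 0.3, g, x) > 0.0
```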
-
The inexact accelerated proximal point algorithm IAPPA1 in its form from [87, Section 5] can be written as
$$\begin{aligned}\left\{ \begin{array}{rcl} t_{k+1}&{}=&{} \tfrac{1+\sqrt{1+4t_k^2\tfrac{\eta _{k+1}}{\eta _{k+2}}}}{2} \\ x_{k+1} &{}=&{} y_{k} - \eta _{k+1}(g_{k+1}+r_{k+1}) \\ y_{k+1} &{}=&{} x_{k+1} + \tfrac{t_k-1}{t_{k+1}}(x_{k+1}-x_k) \end{array}\right. \end{aligned}$$with \(t_0 = 1\), \(\{\eta _k\}_k\) a sequence of step sizes, \(y_0=x_0\in \mathbb {R}^d\) along with an inexactness criterion of the form \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(x_{k+1},g_{k+1};y_k) \leqslant \varepsilon _{k+1}\) given a nonnegative sequence \(\{\varepsilon _k\}_k\). Similarly to Güler’s method we get the recursive formulation
$$\begin{aligned} x_{k+2} = \left( 1+\tfrac{t_k-1}{t_{k+1}}\right) x_{k+1} - \tfrac{t_k-1}{t_{k+1}}x_k - \eta _{k+2}(g_{k+2}+r_{k+2}). \end{aligned}$$We consider particular iterations from (2) of the form
$$\begin{aligned}\left\{ \begin{array}{rcl} w_{2k+1} &{}=&{} w_{2k} - e_{2k} \\ w_{2k+2} &{}=&{} w_{2k+1} -\displaystyle \sum _{i=1}^{2k+1}\alpha _{2k+2,i}v_{i} -\sum _{i=0}^{2k+1}\beta _{2k+2,i}e_i, \end{array}\right. \end{aligned}$$with initial iterate \(w_0=x_0\). We aim at finding parameters \(\alpha _{i,j}\), \(\beta _{i,j}\) such that we can identify \(\{w_{2k}\}_k\) with \(\{x_k\}_k\) (i.e., any sequence \(\{x_k\}_k\) can be obtained as a sequence \(\{w_{2k}\}_k\)). We set \(\alpha _{2k+2,2k+1}=\beta _{2k+2,2k+1} = \eta _{k+1}\), \(\alpha _{2k+2,i} = \tfrac{t_{k-1}-1}{t_k}\alpha _{2k,i}\) for \(i=1,\ldots ,2k-1\) and \(\beta _{2k+2,i} = \tfrac{t_{k-1}-1}{t_k}\beta _{2k,i}\) for \(i\in \{0,\ldots ,2k-1\}\backslash \{2(k-1)\}\) as well as \(\beta _{2k+2,2k} = -1\) and \(\beta _{2k+2,2(k-1)} = \tfrac{t_{k-1}-1}{t_k}(1+\beta _{2k,2(k-1)})\).
This gives
$$\begin{aligned} w_{2(k+1)} =\,&w_{2k+1} + e_{2k} - \tfrac{t_{k-1}-1}{t_k}(e_{2(k-1)}) -\tfrac{t_{k-1}-1}{t_k}\displaystyle \sum _{i=1}^{2k-1}\alpha _{2k,i}v_{i} \\&-\tfrac{t_{k-1}-1}{t_k}\sum _{i=0}^{2k-1}\beta _{2k,i}e_i - \eta _{k+1}(v_{2k+1}+e_{2k+1})\\ =&\, (1+\tfrac{t_{k-1}-1}{t_k})w_{2k} -\tfrac{t_{k-1}-1}{t_k}w_{2(k-1)} - \eta _{k+1}(v_{2k+1}+e_{2k+1}), \end{aligned}$$which shows that \(\{w_{2k}\}_k\) follows the same recursive equation as \(\{x_{k}\}_k\). In addition, we have \(w_0 = x_0\) and \(w_2 = x_0 - \eta _{1}(v_1+e_1)\) similar to \(x_1 = x_0 -\eta _1(g_1+r_1)\). Requiring \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{2k+2},v_{2k+1};\,w_{2k+2}+\eta _{k+1}(v_{2k+1}+e_{2k+1})) \leqslant \varepsilon _{k+1}\) (with the convention \(w_{-1}=w_0\)) allows us to identify the primal-dual pair \((x_{k+1},g_{k+1})\) with \((w_{2k+2},v_{2k+1})\).
Finally, we can set \(\text {EQ}_{2k+2} = {{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{2k+2},v_{2k+1};\,w_{2k+2}+\eta _{k+1}(v_{2k+1}+e_{2k+1})) - \varepsilon _{k+1}\), which is Gram-representable (similar to the hybrid approximate extragradient algorithm).
Note that we can proceed similarly for IAPPA2 from [87, Section 5] with the sequence \(\{a_k\}_k\) constant equal to 1, by removing the sequence \(\{r_k\}_k\) of “type 2” errors.
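As an illustration (our own sketch, under the simplifying assumptions of exact proximal evaluations, i.e., \(\varepsilon _k=0\) and \(r_k=0\), and constant step sizes), the IAPPA1 recursion can be run on \(h(x)=x^2/2\), whose proximal map is \(\textrm{prox}_{\eta h}(y)=y/(1+\eta )\):

```python
import math

def iappa1_exact(x0, eta, N):
    # IAPPA1 with exact proximal steps on h(x) = x^2/2; with constant step
    # sizes, eta_{k+1}/eta_{k+2} = 1 in the t-recursion.
    t, x, y = 1.0, x0, x0
    for _ in range(N):
        t_next = (1 + math.sqrt(1 + 4 * t**2)) / 2
        x_next = y / (1 + eta)                  # exact prox: zero primal-dual gap
        y = x_next + (t - 1) / t_next * (x_next - x)
        t, x = t_next, x_next
    return x

# The iterates drive h(x_k) - h(0) to zero.
assert abs(iappa1_exact(10.0, 1.0, 60)) < 1e-5
```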
-
The accelerated hybrid proximal extragradient algorithm (A-HPE) [68, Section 3] can be written as
$$\begin{aligned}\left\{ \begin{array}{rcl} a_{k+1} &{}=&{} \tfrac{\eta _{k+1}+\sqrt{\eta _{k+1}^2+4\eta _{k+1}A_k}}{2}\\ A_{k+1} &{}=&{} A_k + a_{k+1}\\ \tilde{x}_k &{}=&{} y_k + \tfrac{a_{k+1}}{A_{k+1}}(x_k-y_k)\\ y_{k+1} &{}=&{} \tilde{x}_k - \eta _{k+1}(g_{k+1}+r_{k+1})\\ x_{k+1} &{}=&{} x_k - a_{k+1}g_{k+1}, \end{array}\right. \end{aligned}$$with \(A_0=0\), \(\{\eta _k\}_k\) a sequence of step sizes, and \(y_0=x_0\in \mathbb {R}^d\), along with an inexactness criterion of the form \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(y_{k+1},g_{k+1};\tilde{x}_k) \leqslant \tfrac{\sigma ^2}{2}{\Vert y_{k+1}-\tilde{x}_k\Vert ^2}\) given a parameter \(\sigma \in [0,1]\). As in the previous examples, we search for a recursive equation satisfied by the sequence \(\{y_k\}_k\). By performing multiple substitutions, we obtain
$$\begin{aligned} y_{k+2}=&\,\tilde{x}_{k+1} - \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\tfrac{A_{k+1}}{A_{k+2}}y_{k+1} + \tfrac{a_{k+2}}{A_{k+2}}x_{k+1}- \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\tfrac{A_{k+1}}{A_{k+2}}y_{k+1} + \tfrac{a_{k+2}}{A_{k+2}}\left( x_{k}- a_{k+1}g_{k+1}\right) - \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\tfrac{A_{k+1}}{A_{k+2}}y_{k+1} + \tfrac{a_{k+2}}{A_{k+2}}\left( \tfrac{A_{k+1}}{a_{k+1}}\tilde{x}_k - \tfrac{A_k}{a_{k+1}}y_{k} - a_{k+1}g_{k+1}\right) - \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\tfrac{A_{k+1}}{A_{k+2}}y_{k+1} + \tfrac{a_{k+2}}{A_{k+2}}\left( \tfrac{A_{k+1}}{a_{k+1}}\left( y_{k+1}+\eta _{k+1}(g_{k+1}+r_{k+1}) \right) \right. \\&\left. - \tfrac{A_k}{a_{k+1}}y_{k} - a_{k+1}g_{k+1}\right) - \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\left( \tfrac{A_{k+1}}{A_{k+2}}+\tfrac{a_{k+2}A_{k+1}}{A_{k+2}a_{k+1}}\right) y_{k+1}-\tfrac{a_{k+2}A_k}{A_{k+2}a_{k+1}}y_k + \tfrac{a_{k+2}}{A_{k+2}}\left( \tfrac{A_{k+1}}{a_{k+1}}\eta _{k+1} - a_{k+1}\right) g_{k+1}\\&+\tfrac{a_{k+2}A_{k+1}}{A_{k+2}a_{k+1}}\eta _{k+1}r_{k+1}- \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\left( 1+\tfrac{a_{k+2}A_{k}}{A_{k+2}a_{k+1}}\right) y_{k+1}-\tfrac{a_{k+2}A_k}{A_{k+2}a_{k+1}}y_k + \tfrac{a_{k+2}}{A_{k+2}}\left( \tfrac{A_{k+1}}{a_{k+1}}\eta _{k+1} - a_{k+1}\right) g_{k+1}\\&+\tfrac{a_{k+2}A_{k+1}}{A_{k+2}a_{k+1}}\eta _{k+1}r_{k+1}- \eta _{k+2}(g_{k+2}+r_{k+2}). \end{aligned}$$Similar to IAPPA1, we consider particular iterations from (2) of the form
$$\begin{aligned}\left\{ \begin{array}{rcl} w_{2k+1} &{}=&{} w_{2k} - e_{2k} \\ w_{2k+2} &{}=&{} w_{2k+1} -\displaystyle \sum _{i=1}^{2k+1}\alpha _{2k+2,i}v_{i} -\sum _{i=0}^{2k+1}\beta _{2k+2,i}e_i, \end{array}\right. \end{aligned}$$with initial iterate \(w_0=x_0\). We aim at finding parameters \(\alpha _{i,j}\), \(\beta _{i,j}\) such that we can identify \(\{w_{2k}\}_k\) with \(\{y_k\}_k\) (i.e., any sequence \(\{y_k\}_k\) can be obtained as a sequence \(\{w_{2k}\}_k\)). We set \(\alpha _{2(k+1),2k+1}=\beta _{2(k+1),2k+1} = \eta _{k+1}\), \(\alpha _{2(k+1),i} = \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\alpha _{2k,i}\) for \(i\in \{1,\ldots ,2(k-1)\}\) and \(\beta _{2(k+1),i} = \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\beta _{2k,i}\) for \(i\in \{0,\ldots ,2k-3\}\) as well as \(\beta _{2(k+1),2k} = -1\), \(\beta _{2(k+1),2k-1} = \tfrac{a_{k+1}}{A_{k+1}a_{k}}(A_{k-1}\beta _{2k,2k-1} -A_k\eta _{k} )\), \(\beta _{2(k+1),2(k-1)} = \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}(1+\beta _{2k,2(k-1)})\) and \(\alpha _{2(k+1),2k-1} = \tfrac{a_{k+1}}{A_{k+1}}\left( \tfrac{A_{k-1}}{a_k}\alpha _{2k,2k-1}-\tfrac{A_{k}}{a_{k}}\eta _{k} + a_{k}\right) \).
This gives
$$\begin{aligned} w_{2(k+1)} =&\,w_{2k+1} +e_{2k} +\tfrac{a_{k+1}A_k}{A_{k+1}a_{k}}\eta _{k}e_{2k-1} - \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_k}e_{2(k-1)}\\ {}&+ \tfrac{a_{k+1}}{A_{k+1}}\left( \tfrac{A_{k}}{a_{k}}\eta _{k} - a_{k}\right) v_{2k-1}-\tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\sum _{i=1}^{2k-1}\alpha _{2k,i}v_i \\ {}&- \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\sum _{i=0}^{2k-1}\beta _{2k,i}e_i - \eta _{k+1}(v_{2k+1}+e_{2k+1})\\ =&\,w_{2k} +\tfrac{a_{k+1}A_k}{A_{k+1}a_{k}}\eta _{k}e_{2k-1} - \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_k}e_{2(k-1)}+ \tfrac{a_{k+1}}{A_{k+1}}\left( \tfrac{A_{k}}{a_{k}}\eta _{k} - a_{k}\right) v_{2k-1}\\ {}&+\tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}(w_{2k}-w_{2(k-1)}+e_{2(k-1)}) - \eta _{k+1}(v_{2k+1}+e_{2k+1})\\ =&\,\left( 1+\tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\right) w_{2k}-\tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}w_{2(k-1)}+\tfrac{a_{k+1}}{A_{k+1}}\left( \tfrac{A_{k}}{a_{k}}\eta _{k} - a_{k}\right) v_{2k-1}\\&+\tfrac{a_{k+1}A_k}{A_{k+1}a_{k}}\eta _{k}e_{2k-1} - \eta _{k+1}(v_{2k+1}+e_{2k+1}), \end{aligned}$$which shows that \(\{w_{2k}\}_k\) follows the same recursive equation as \(\{y_{k}\}_k\). In addition, we have \(w_0 = x_0 = y_0\) and \(w_2 = y_0 - \eta _{1}(v_1+e_1)\) similar to \(y_1 = x_0 -\eta _1(g_1+r_1)\). Requiring \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{2(k+1)},v_{2k+1};\,w_{2k+2}+\eta _{k+1}(v_{2k+1}+e_{2k+1})) \leqslant \tfrac{\sigma ^2}{2}{\Vert w_{2(k+1)}-w_{2k}\Vert ^2}\) allows us to identify the primal-dual pair \((y_{k+1},g_{k+1})\) with \((w_{2(k+1)},v_{2k+1})\).
Finally, we set \(\text {EQ}_{2k+2} = {{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{2k+2},v_{2k+1};\,w_{2k+2}+\eta _{k+1}(v_{2k+1}+e_{2k+1})) - \tfrac{\sigma ^2}{2}{\Vert w_{2(k+1)}-w_{2k}\Vert ^2}\), which is Gram-representable (similar to the hybrid approximate extragradient algorithm).
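The coefficient sequences \(\{a_k\}_k,\{A_k\}_k\) of A-HPE are fully determined by the first two lines of the recursion above. The sketch below (our illustration, with constant step size \(\eta _k=\eta \), not code from the paper) computes them and checks numerically the classical growth estimate \(A_k\geqslant \eta k^2/4\), which underlies the accelerated \(O(1/k^2)\) behavior.

```python
import math

def ahpe_coefficients(eta, N):
    # Sequences a_k, A_k from the A-HPE recursion with constant step size eta.
    A, out = 0.0, []
    for _ in range(N):
        a = (eta + math.sqrt(eta**2 + 4 * eta * A)) / 2
        A += a
        out.append(A)
    return out

eta = 1.0
As = ahpe_coefficients(eta, 100)
for k, A in enumerate(As, start=1):
    assert A >= eta * k**2 / 4   # growth estimate behind the O(1/k^2) rate
```

(The estimate follows by induction: \(A_{k+1} \geqslant A_k + \eta /2 + \sqrt{\eta A_k}\).)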
B Interpolation with \(\varvec{\varepsilon }\)-subdifferentials
In this section, we provide the necessary interpolation result for working with \(\varepsilon \)-subdifferentials inside performance estimation problems.
Theorem B.1
Let I be a finite set of indices and \(S=\{(w_i,v_i,h_i,\varepsilon _i)\}_{i\in I}\) with \(w_i,v_i\in \mathbb {R}^d\), \(h_i,\varepsilon _i\in \mathbb {R}\) for all \(i\in I\). There exists \(h\in {{\,\mathrm{\mathcal {F}_{0,\infty }}\,}}(\mathbb {R}^d)\) satisfying
$$\begin{aligned} h(w_i)=h_i \quad \text {and}\quad v_i\in \partial _{\varepsilon _i}h(w_i)\quad \text {for all } i\in I, \end{aligned}$$(24)
if and only if
$$\begin{aligned} h_i \geqslant h_j + {\langle v_j; w_i-w_j\rangle } - \varepsilon _j \end{aligned}$$(25)
holds for all \(i,j\in I\).
Proof
\((\Rightarrow )\) Assuming \(h\in {{\,\mathrm{\mathcal {F}_{0,\infty }}\,}}\) and (24), the inequalities (25) hold by definition.
\((\Leftarrow )\) Assuming (25) hold, one can perform the following construction:
and one can easily check that \(h=\tilde{h}\in {{\,\mathrm{\mathcal {F}_{0,\infty }}\,}}\) satisfies (24). \(\square \)
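The forward direction of this theorem can also be checked numerically: by definition of the \(\varepsilon \)-subdifferential, data generated from any convex function must satisfy \(h_i \geqslant h_j + {\langle v_j; w_i-w_j\rangle } - \varepsilon _j\) for all pairs of indices. The sketch below (our illustrative setup, using \(h(w)=\Vert w\Vert ^2/2\), for which \(w+d\) is an \(\varepsilon \)-subgradient at w whenever \(\Vert d\Vert ^2/2\leqslant \varepsilon \)) verifies these inequalities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data {(w_i, v_i, h_i, eps_i)} generated from h(w) = ||w||^2 / 2.
n, d = 5, 3
W = rng.standard_normal((n, d))
H = 0.5 * np.sum(W**2, axis=1)
V = W + 0.1 * rng.standard_normal((n, d))        # perturbed gradients
# v_i = w_i + d_i is an eps_i-subgradient as soon as eps_i >= ||d_i||^2 / 2.
eps = np.maximum(0.1, 0.5 * np.sum((V - W) ** 2, axis=1))

# Necessary condition of the interpolation theorem, checked pairwise.
for i in range(n):
    for j in range(n):
        assert H[i] >= H[j] + V[j] @ (W[i] - W[j]) - eps[j] - 1e-9
```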
C Equivalence with Güler’s method
In this section, we show that the optimized algorithm (ORI-PPA) and Güler’s second method [49, Section 6] are equivalent (i.e., they produce the same iterates) in the case of exact proximal computations (i.e., \(\sigma =0\)).
We consider a constant sequence of step sizes \(\{\lambda _k\}_k\) with \(\lambda _k = \lambda >0\). In Güler’s second method, the sequence \(\{\beta _k\}_k\) is defined as \(\beta _1 = 1\) and
The sequence \(\{A_k\}_k\) generated by (ORI-PPA) satisfies \(A_0=0\) and
We can link together these two sequences through the following equality
Let us prove it recursively. First, observe that \(\beta _1 = 1\) and \(\tfrac{A_1-A_0}{\lambda } = 1\). Then assuming that the property is true for some \(k \geqslant 1\), we have
One might notice that
giving
and we finally arrive at (26). In the case \(\sigma =0\), the iterations of (ORI-PPA) can be written as
Therefore, we can write
Combining the last equality with (26) leads to
which is exactly the update in Güler’s second method [49, Section 6], modulo a shift in the indices of the \(\{y_k\}_k\) sequence (indeed, in Güler’s method \(y_1=x_0\), whereas in (ORI-PPA) \(y_0=x_0\)).
D Missing details in Theorem 1
The missing elements in the proof of Theorem 1 are presented below.
Proof
Let us rewrite the method in terms of a single sequence, by substitution of \(y_k\) and \(z_k\):
and let us state the following identity on the \(A_k\) coefficients
We prove the desired convergence result by induction. First, for \(N=1\)
with \(\nu _{\star ,1}= \tfrac{A_1-A_0}{1+\sigma }=\tfrac{A_1}{1+\sigma }\) as \(A_0=0\), \(\nu _{1,1} = \tfrac{(1-\sigma )A_1}{\sigma (1+\sigma )}\) and \(\nu _1 = \tfrac{A_1}{\sigma (1+\sigma )}\). This gives
where we used in the last line that \(A_1 = \lambda _1\).
Now, assuming the weighted sum can be reformulated as the desired inequality for \(N=k\), that is:
let us prove it also holds true for \(N=k+1\). Noticing that the weighted sum for \(k+1\) is exactly the weighted sum for k (which can be reformulated as desired, through our induction hypothesis) with 4 additional inequalities, we get the following valid inequality
By grouping all function values we get the following simplification:
where \(h(x_k)\) and \(h(u_{k+1})\) cancelled out. The remaining inequality is therefore
Then, by using (28), one can observe that
and by substituting this into the last line of (29), we get
We can then proceed as in the case \(k=1\) to factorize the quadratic terms,
since \(1-\tfrac{\sigma }{2}-\tfrac{1}{1+\sigma } - \tfrac{\sigma (1-\sigma )}{2(1+\sigma )} = 0\) and this concludes the proof. \(\square \)
E Tightness of Theorem 1
Proof
The guarantee for (ORI-PPA) provided by Theorem 1 is non-improvable. That is, for all \(\{\lambda _k\}_k\) with \(\lambda _k>0\), \(\sigma \in [0,1]\), \(d\in \mathbb {N}\), \(x_0\in \mathbb {R}^d\), and \(N\in \mathbb {N}\), there exists \(f\in \mathcal {F}_{\mu , \infty }(\mathbb {R}^d)\) such that this bound is achieved with equality. To prove this statement, it is sufficient to exhibit a one-dimensional function for which the bound is attained, which is what we do below. The bound is attained on the one-dimensional linear minimization problem (constrained to the nonnegative orthant)
with an appropriate choice of \(c> 0\), where \(i_{\mathbb {R}_+}\) denotes the convex indicator function of \(\mathbb {R}_+\). Indeed, one can check that the relative error criterion
is satisfied with equality when picking \(g_{k}=c\) (\(g_k\) is thus a subgradient at \(x_k\geqslant 0\)), \(u_k=x_k\), and \(e_{k}=-\tfrac{c\sigma }{1+\sigma }\); and hence \(x_{k}=y_{k-1}-\tfrac{c\lambda _k}{1+\sigma }\). The argument is then as follows: if for some \(x_0>0\) and \(0\leqslant h\leqslant x_0/c\) we manage to show that \(x_N=x_0-c h\), then \(f(x_N)-f(x_\star )=c(x_0-c h)\) and hence the value of c producing the worst possible (maximal) value of \(f(x_N)\) is \(c=\tfrac{x_0}{2h}\). In that case, the resulting value is \(f(x_N)-f(x_\star )=\tfrac{x_0^2}{4h}\). Therefore, in order to prove that the guarantee from Theorem 1 cannot be improved, we show that \(x_N=x_0-\tfrac{A_N}{1+\sigma }c\) on the linear problem (30). It is easy to show that \(x_1=x_0-\tfrac{A_1}{1+\sigma }c\) using \(A_1=\lambda _1\). The argument follows by induction: assuming \(x_{k}=x_0-\tfrac{A_k}{1+\sigma }c\), one can compute
where the second equality follows from simple substitutions, and the last equalities follow from basic algebra and \(\lambda _{k+1}A_{k+1} = (A_{k+1}-A_k)^2\). The desired statement is proved by picking \(c=\tfrac{(1+\sigma )x_0}{2A_N}\), reaching \(f(x_N)-f(x_\star ) = \tfrac{(1+\sigma )x_0^2}{4A_N}\). \(\square \)
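The final value can be sanity-checked numerically. The Python sketch below assumes only constant step sizes \(\lambda_k = \lambda\) and the relation \(\lambda A_{k+1} = (A_{k+1}-A_k)^2\) stated above; it takes the worst-case slope \(c = \tfrac{x_0}{2h}\) with \(h = \tfrac{A_N}{1+\sigma}\), computes \(x_N = x_0 - ch\), and confirms that the resulting gap \(c\,x_N\) matches \(\tfrac{x_0^2}{4h}\).

```python
import math

lam, sigma, x0, N = 0.5, 0.3, 1.0, 15  # constant steps, sigma in [0, 1]

# Coefficients A_k from A_0 = 0 and lambda * A_{k+1} = (A_{k+1} - A_k)^2.
A = [0.0]
for _ in range(N):
    A.append(A[-1] + (lam + math.sqrt(lam ** 2 + 4.0 * lam * A[-1])) / 2.0)

# Worst-case slope of f(x) = c * x + i_{R+}(x): c = x0 / (2h),
# with h = A_N / (1 + sigma).
h = A[N] / (1.0 + sigma)
c = x0 / (2.0 * h)
xN = x0 - c * h                  # x_N = x_0 - A_N c / (1 + sigma)

gap = c * xN                     # f(x_N) - f(x_star), since f(x_star) = 0
assert abs(gap - x0 ** 2 / (4.0 * h)) < 1e-12
assert abs(gap - (1.0 + sigma) * x0 ** 2 / (4.0 * A[N])) < 1e-12
print("worst-case gap:", gap)
```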
F Tightness of Theorem 2
Proof
We show that the guarantee provided in Theorem 2 is non-improvable. That is, for all \(\mu \geqslant 0\), \(\{\lambda _k\}_k\) with \(\lambda _k\geqslant 0\), \(\sigma \in [0,1]\), \(d\in \mathbb {N}\), \(w_0\in \mathbb {R}^d\), and \(N\in \mathbb {N}\), there exists \(h\in \mathcal {F}_{\mu , \infty }(\mathbb {R}^d)\) such that this bound is achieved with equality. Indeed, the bound is attained on the simple quadratic minimization problem
We can check that the relative error criterion
is satisfied with equality when picking \(v_k = \nabla h(w_{k+1}) = \mu w_{k+1}\) and \(e_k = -\tfrac{\sigma }{1+\sigma }v_k\). Under these choices, one can write
which leads to \(w_{k+1} = \tfrac{1+\sigma }{1+\sigma + \mu \lambda _{k+1}}w_k\), hence
and the desired result follows. \(\square \)
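The contraction factor can also be verified numerically. The sketch below assumes the inexact proximal step takes the implicit form \(w_{k+1} = w_k - \lambda_{k+1}(v_k + e_k)\) (an assumed but standard form) on \(h(w) = \tfrac{\mu}{2}w^2\), with \(v_k = \mu w_{k+1}\) and \(e_k = -\tfrac{\sigma}{1+\sigma}v_k\) as above, and checks that each step contracts \(w_k\) by \(\tfrac{1+\sigma}{1+\sigma+\mu\lambda_{k+1}}\) (with the \(\mu\)-dependence made explicit).

```python
mu, lam, sigma, w0, N = 2.0, 0.4, 0.25, 1.0, 10

w = w0
for _ in range(N):
    # Implicit step (assumed form): w_next = w - lam * (v + e), with
    # v = mu * w_next (= grad h(w_next) for h(w) = mu/2 * w^2) and
    # e = -sigma / (1 + sigma) * v, so v + e = mu * w_next / (1 + sigma).
    w_next = w / (1.0 + lam * mu / (1.0 + sigma))
    # Closed-form contraction for one step:
    assert abs(w_next - (1.0 + sigma) * w / (1.0 + sigma + mu * lam)) < 1e-12
    w = w_next

# After N steps: w_N = rho^N * w_0 with rho the per-step contraction factor.
rho = (1.0 + sigma) / (1.0 + sigma + mu * lam)
assert abs(w - rho ** N * w0) < 1e-12
print("per-step contraction factor:", rho)
```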
Barré, M., Taylor, A.B. & Bach, F. Principled analyses and design of first-order methods with inexact proximal operators. Math. Program. 201, 185–230 (2023). https://doi.org/10.1007/s10107-022-01903-7