Principled analyses and design of first-order methods with inexact proximal operators

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

Proximal operations are among the most common primitives appearing in both practical and theoretical (or high-level) optimization methods. This basic operation typically consists in solving an intermediary (hopefully simpler) optimization problem. In this work, we survey notions of inaccuracies that can be used when solving those intermediary optimization problems. Then, we show that worst-case guarantees for algorithms relying on such inexact proximal operations can be systematically obtained through a generic procedure based on semidefinite programming. This methodology is primarily based on the approach introduced by Drori and Teboulle (Math Program 145(1–2):451–482, 2014) and on convex interpolation results, and allows producing non-improvable worst-case analyses. In other words, for a given algorithm, the methodology generates both worst-case certificates (i.e., proofs) and problem instances on which they are achieved. Relying on this methodology, we study numerical worst-case performances of a few basic methods relying on inexact proximal operations including accelerated variants, and design a variant with optimized worst-case behavior. We further illustrate how to extend the approach to support strongly convex objectives by studying a simple relatively inexact proximal minimization method.

References

  1. Ajalloeian, A., Simonetto, A., Dall’Anese, E.: Inexact online proximal-gradient method for time-varying convex optimization. In: 2020 American Control Conference (ACC), pp. 2850–2857. IEEE (2020)

  2. Alves, M.M., Eckstein, J., Geremia, M., Melo, J.: Relative-error inertial-relaxed inexact versions of Douglas–Rachford and ADMM splitting algorithms. Preprint arXiv:1904.10502 (2019)

  3. Alves, M.M., Marcavillaca, R.T.: On inexact relative-error hybrid proximal extragradient, forward-backward and Tseng’s modified forward-backward methods with inertial effects. Set-Valued Var. Anal. 28, 301–325 (2020)

  4. Auslender, A.: Numerical methods for nondifferentiable convex optimization. In: Nonlinear Analysis and Optimization, pp. 102–126. Springer (1987)

  5. Barré, M., Taylor, A., d’Aspremont, A.: Complexity guarantees for Polyak steps with momentum. In: Conference on Learning Theory, pp. 452–478. PMLR (2020)

  6. Bastianello, N., Ajalloeian, A., Dall’Anese, E.: Distributed and inexact proximal gradient method for online convex optimization. Preprint arXiv:2001.00870 (2020)

  7. Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces, vol. 408. Springer, Berlin (2011)

  8. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)

  9. Bello-Cruz, Y., Gonçalves, M.L.N., Krislock, N.: On inexact accelerated proximal gradient methods with relative error rules. Preprint arXiv:2005.03766 (2020)

  10. Boţ, R.I., Csetnek, E.R.: A hybrid proximal-extragradient algorithm with inertial effects. Numer. Funct. Anal. Optim. 36(8), 951–963 (2015)

  11. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  12. Brøndsted, A., Rockafellar, R.T.: On the subdifferentiability of convex functions. Proc. Am. Math. Soc. 16(4), 605–611 (1965)

  13. Bruck, R.E., Jr.: An iterative solution of a variational inequality for certain monotone operators in Hilbert space. Bull. Am. Math. Soc. 81(5), 890–892 (1975)

  14. Burachik, R.S., Iusem, A.N., Svaiter, B.F.: Enlargement of monotone operators with applications to variational inequalities. Set-Valued Anal. 5(2), 159–180 (1997)

  15. Burachik, R.S., Martínez-Legaz, J.E., Rezaie, M., Théra, M.: An additive subfamily of enlargements of a maximally monotone operator. Set-Valued Var. Anal. 23(4), 643–665 (2015)

  16. Burachik, R.S., Sagastizábal, C.A., Svaiter, B.F.: \(\varepsilon \)-enlargements of maximal monotone operators: Theory and applications. In: Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods, pp. 25–43. Springer (1998)

  17. Burachik, R.S., Sagastizábal, C.A., Svaiter, B.F.: Bundle methods for maximal monotone operators. In: Ill-Posed Variational Problems and Regularization Techniques, pp. 49–64. Springer (1999)

  18. Burke, J., Qian, M.: A variable metric proximal point algorithm for monotone operators. SIAM J. Control. Optim. 37(2), 353–375 (1999)

  19. Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)

  20. Chierchia, G., Chouzenoux, E., Combettes, P.L., Pesquet, J.C.: The proximity operator repository. User’s guide (2020). http://proximity-operator.net/download/guide.pdf

  21. Combettes, P.L., Pesquet, J.C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 185–212. Springer (2011)

  22. Cominetti, R.: Coupling the proximal point algorithm with approximation methods. J. Optim. Theory Appl. 95(3), 581–600 (1997)

  23. Correa, R., Lemaréchal, C.: Convergence of some algorithms for convex minimization. Math. Program. 62(1–3), 261–275 (1993)

  24. Cyrus, S., Hu, B., Van Scoy, B., Lessard, L.: A robust accelerated optimization algorithm for strongly convex functions. In: 2018 Annual American Control Conference (ACC), pp. 1376–1381 (2018)

  25. de Klerk, E., Glineur, F., Taylor, A.B.: On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optim. Lett. 11(7), 1185–1199 (2017)

  26. de Klerk, E., Glineur, F., Taylor, A.B.: Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM J. Optim. 30(3), 2053–2082 (2020)

  27. Devolder, O.: First-order methods with inexact oracle: the strongly convex case. CORE Discussion Papers (2013)

  28. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1–2), 37–75 (2014)

  29. Dixit, R., Bedi, A.S., Tripathi, R., Rajawat, K.: Online learning with inexact proximal online gradient descent algorithms. IEEE Trans. Signal Process. 67(5), 1338–1352 (2019)

  30. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)

  31. Dragomir, R.A., Taylor, A.B., d’Aspremont, A., Bolte, J.: Optimal complexity and certification of Bregman first-order methods. Math. Program. 194, 41–83 (2022)

  32. Drori, Y.: Contributions to the complexity analysis of optimization algorithms. Ph.D. thesis, Tel-Aviv University (2014)

  33. Drori, Y., Taylor, A.B.: Efficient first-order methods for convex minimization: a constructive approach. Math. Program. 184(1), 183–220 (2020)

  34. Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. 145(1–2), 451–482 (2014)

  35. Drori, Y., Teboulle, M.: An optimal variant of Kelley’s cutting-plane method. Math. Program. 160(1–2), 321–351 (2016)

  36. Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. Ph.D. thesis, Massachusetts Institute of Technology (1989)

  37. Eckstein, J.: Approximate iterations in Bregman-function-based proximal algorithms. Math. Program. 83(1–3), 113–123 (1998)

  38. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)

  39. Eckstein, J., Silva, P.J.: A practical relative error criterion for augmented Lagrangians. Math. Program. 141(1–2), 319–348 (2013)

  40. Eckstein, J., Yao, W.: Augmented Lagrangian and alternating direction methods for convex optimization: a tutorial and some illustrative computational results. RUTCOR Res. Rep. 32(3), 44 (2012)

  41. Eckstein, J., Yao, W.: Approximate ADMM algorithms derived from Lagrangian splitting. Comput. Optim. Appl. 68(2), 363–405 (2017)

  42. Eckstein, J., Yao, W.: Relative-error approximate versions of Douglas–Rachford splitting and special cases of the ADMM. Math. Program. 170(2), 417–444 (2018)

  43. Fortin, M., Glowinski, R.: On decomposition-coordination methods using an Augmented Lagrangian. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems. North-Holland, Amsterdam (1983)

  44. Fuentes, M., Malick, J., Lemaréchal, C.: Descentwise inexact proximal algorithms for smooth optimization. Comput. Optim. Appl. 53(3), 755–769 (2012)

  45. Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems. North-Holland, Amsterdam (1983)

  46. Golub, G.H., Van Loan, C.F.: Matrix Computations. JHU Press, Baltimore (1996)

  47. Gu, G., Yang, J.: On the optimal ergodic sublinear convergence rate of the relaxed proximal point algorithm for variational inequalities. Preprint arXiv:1905.06030 (2019)

  48. Gu, G., Yang, J.: Optimal nonergodic sublinear convergence rate of proximal point algorithm for maximal monotone inclusion problems. Preprint arXiv:1904.05495 (2019)

  49. Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)

  50. Hu, B., Lessard, L.: Dissipativity theory for Nesterov’s accelerated method. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1549–1557. JMLR (2017)

  51. Iusem, A.N.: Augmented Lagrangian methods and proximal point methods for convex optimization. Investig. Oper. 8(11–49), 7 (1999)

  52. Kim, D.: Accelerated proximal point method for maximally monotone operators. Math. Program. 190, 57–87 (2021)

  53. Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Program. 159(1–2), 81–107 (2016)

  54. Kim, D., Fessler, J.A.: Another look at the fast iterative shrinkage/thresholding algorithm (FISTA). SIAM J. Optim. 28(1), 223–250 (2018)

  55. Kim, D., Fessler, J.A.: Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. J. Optim. Theory Appl. 188(1), 192–219 (2021)

  56. Lemaire, B.: About the convergence of the proximal method. In: Advances in Optimization, pp. 39–51. Springer (1992)

  57. Lessard, L., Recht, B., Packard, A.: Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26(1), 57–95 (2016)

  58. Lieder, F.: On the convergence rate of the Halpern-iteration. Optim. Lett. 15, 405–418 (2021)

  59. Lin, H., Mairal, J., Harchaoui, Z.: A universal catalyst for first-order optimization. In: Advances in Neural Information Processing Systems, pp. 3384–3392 (2015)

  60. Lin, H., Mairal, J., Harchaoui, Z.: Catalyst acceleration for first-order convex optimization: from theory to practice. J. Mach. Learn. Res. 18(212), 1–54 (2018)

  61. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)

  62. Löfberg, J.: YALMIP: a toolbox for modeling and optimization in MATLAB. In: Proceedings of the CACSD Conference (2004)

  63. Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Revue Française d’Informatique et de Recherche Opérationnelle 4, 154–158 (1970)

  64. Martinet, B.: Détermination approchée d’un point fixe d’une application pseudo-contractante. cas de l’application prox. Comptes rendus hebdomadaires des séances de l’Académie des sciences de Paris 274, 163–165 (1972)

  65. Megretski, A., Rantzer, A.: System analysis via integral quadratic constraints. IEEE Trans. Autom. Control 42(6), 819–830 (1997)

  66. Millán, R.D., Machado, M.P.: Inexact proximal \(\epsilon \)-subgradient methods for composite convex optimization problems. J. Global Optim. 75(4), 1029–1060 (2019)

  67. Monteiro, R.D., Svaiter, B.F.: On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean. SIAM J. Optim. 20(6), 2755–2787 (2010)

  68. Monteiro, R.D., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)

  69. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences de Paris 255, 2897–2899 (1962)

  70. Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)

  71. Mosek, A.: The MOSEK optimization software. http://www.mosek.com (2010)

  72. Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)

  73. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(\(1/k^2\)). Soviet Math. Doklady 27, 372–376 (1983)

  74. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)

  75. Nesterov, Y.: Inexact accelerated high-order proximal-point methods. Technical report, CORE discussion paper (2020)

  76. Nesterov, Y.: Inexact high-order proximal-point methods with auxiliary search procedure. Technical report, CORE discussion paper (2020)

  77. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 127–239 (2014)

  78. Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)

  79. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)

  80. Rockafellar, R.T.: A dual approach to solving nonlinear programming problems by unconstrained optimization. Math. Program. 5(1), 354–373 (1973)

  81. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)

  82. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control. Optim. 14(5), 877–898 (1976)

  83. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1996)

  84. Ryu, E.K., Boyd, S.: Primer on monotone operator methods. Appl. Comput. Math. 15(1), 3–43 (2016)

  85. Ryu, E.K., Taylor, A.B., Bergeling, C., Giselsson, P.: Operator splitting performance estimation: tight contraction factors and optimal parameter selection. SIAM J. Optim. 30(3), 2251–2271 (2020)

  86. Ryu, E.K., Vũ, B.C.: Finding the forward-Douglas–Rachford-forward method. J. Optim. Theory Appl. 184, 858–876 (2019)

  87. Salzo, S., Villa, S.: Inexact and accelerated proximal point algorithms. J. Convex Anal. 19(4), 1167–1192 (2012)

  88. Schmidt, M., Le Roux, N., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in neural information processing systems (NIPS), pp. 1458–1466 (2011)

  89. Simonetto, A., Jamali-Rad, H.: Primal recovery from consensus-based dual decomposition for distributed convex optimization. J. Optim. Theory Appl. 168(1), 172–197 (2016)

  90. Solodov, M.V., Svaiter, B.F.: A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set-Valued Anal. 7(4), 323–345 (1999)

  91. Solodov, M.V., Svaiter, B.F.: A hybrid projection-proximal point algorithm. J. Convex Anal. 6(1), 59–70 (1999)

  92. Solodov, M.V., Svaiter, B.F.: A comparison of rates of convergence of two inexact proximal point algorithms. In: Nonlinear optimization and related topics, pp. 415–427. Springer (2000)

  93. Solodov, M.V., Svaiter, B.F.: Error bounds for proximal point subproblems and associated inexact proximal point algorithms. Math. Program. 88(2), 371–389 (2000)

  94. Solodov, M.V., Svaiter, B.F.: An inexact hybrid generalized proximal point algorithm and some new results on the theory of Bregman functions. Math. Oper. Res. 25(2), 214–230 (2000)

  95. Solodov, M.V., Svaiter, B.F.: A unified framework for some inexact proximal point algorithms. Numer. Funct. Anal. Optim. 22(7–8), 1013–1035 (2001)

  96. Sturm, J.F.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11–12, 625–653 (1999)

  97. Svaiter, B.F.: A weakly convergent fully inexact Douglas–Rachford method with relative error tolerance. Preprint arXiv:1809.02312 (2018)

  98. Taylor, A., Bach, F.: Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. In: Proceedings of the Thirty-Second Conference on Learning Theory (COLT), vol. 99, pp. 2934–2992. PMLR (2019)

  99. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. 27(3), 1283–1313 (2017)

  100. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In: IEEE 56th Annual Conference on Decision and Control (CDC), pp. 1278–1283 (2017)

  101. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program. 161(1–2), 307–345 (2017)

  102. Toker, O., Ozbay, H.: On the NP-hardness of solving bilinear matrix inequalities and simultaneous stabilization with static output feedback. In: 1995 Annual American Control Conference (ACC), vol. 4, pp. 2525–2526 (1995)

  103. Van Scoy, B., Freeman, R.A., Lynch, K.M.: The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Lett. 2(1), 49–54 (2018)

  104. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward-backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)

  105. Zong, C., Tang, Y., Cho, Y.: Convergence analysis of an inexact three-operator splitting algorithm. Symmetry 10(11), 563 (2018)

Acknowledgements

The authors would like to thank Ernest Ryu for insightful feedback on a preliminary version of this manuscript. The authors also thank the two referees and the associate editor who helped improve this manuscript.

Author information

Corresponding author

Correspondence to Mathieu Barré.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MB acknowledges support from an AMX fellowship. The authors acknowledge support from the European Research Council (grant SEQUOIA 724063). This work was funded in part by the French government under the management of the Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute).

Appendices

A More examples of fixed-step inexact proximal methods

This section extends the list of examples from Sect. 3.1.2.

  • The hybrid approximate extragradient algorithm (see [90] or [67, Section 4]) can be described as

    $$\begin{aligned} x_{k+1}=x_k-\eta _{k+1}g_{k+1},\end{aligned}$$

    such that \(\exists u_{k+1}, {{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(u_{k+1},g_{k+1};\,x_{k}) \leqslant \tfrac{\sigma ^2}{2}{\Vert u_{k+1}-x_{k}\Vert ^2}\) (see Lemma 1 for a link between \(\varepsilon \)-subgradient formulation and primal-dual gap). One iteration of this form can be artificially cast into three iterations of (2) as

    $$\begin{aligned}\left\{ \begin{array}{rcl} w_{3k+1} &{}=&{} w_{3k}-e_{3k}\\ w_{3k+2} &{}=&{} w_{3k+1}-e_{3k+1}\\ w_{3k+3} &{}=&{} w_{3k+2} +e_{3k}+e_{3k+1} - \eta _{k+1}v_{3k+2} \\ \end{array}\right. \end{aligned}$$

    with \(v_{3k+2} \in \partial h(w_{3k+2})\) . This corresponds to setting \(\lambda _{3k+1} = \lambda _{3k+2} =\lambda _{3k+3} = 0\), \(\alpha _{3k+3,3k+2}=\eta _{k+1}\), \(\beta _{3k+1,3k} = \beta _{3k+2,3k+1}=1\), \(\beta _{3k+3,3k+1}=\beta _{3k+3,3k+2} = -1\) and the other parameters to zero. Notice that \(w_{3k+3} = w_{3k} - \eta _{k+1}v_{3k+2}\). By requiring \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{3k+1},v_{3k+2};\,w_{3k}) \leqslant \tfrac{\sigma ^2}{2}{\Vert w_{3k+1}-w_{3k}\Vert ^2}\) we can identify the primal-dual pair \((u_{k+1},g_{k+1})\) with \((w_{3k+1},v_{3k+2})\) and iterates \(x_{k+1}\) with \(w_{3k+3}\). In addition, we set

    $$\begin{aligned} \text {EQ}_{3k+1}&=0,\\ \text {EQ}_{3k+2}&=0,\\ \text {EQ}_{3k+3}&= {{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{3k+1},v_{3k+2};\,w_{3k}) - \tfrac{\sigma ^2}{2}{\Vert w_{3k+1}-w_{3k}\Vert ^2}. \end{aligned}$$

    Using \(v_{3k+2}\in \partial h (w_{3k+2})\), we have \(h^*(v_{3k+2}) = {\langle v_{3k+2}; w_{3k+2}\rangle } - h(w_{3k+2})\) and thus

    $$\begin{aligned} \text {EQ}_{3k+3} =&\,\tfrac{1}{2}{\Vert w_{3k+1}-w_{3k+3}\Vert ^2}+\eta _{k+1}(h(w_{3k+1})-h(w_{3k+2}) \\&- {\langle v_{3k+2}; w_{3k+1}-w_{3k+2}\rangle })-\tfrac{\sigma ^2}{2}{\Vert w_{3k+1}-w_{3k}\Vert ^2}, \end{aligned}$$

    which complies with (3) and is Gram-representable.

  • The inexact accelerated proximal point algorithm IAPPA1 in its form from [87, Section 5] can be written as

    $$\begin{aligned}\left\{ \begin{array}{rcl} t_{k+1}&{}=&{} \tfrac{1+\sqrt{1+4t_k^2\tfrac{\eta _{k+1}}{\eta _{k+2}}}}{2} \\ x_{k+1} &{}=&{} y_{k} - \eta _{k+1}(g_{k+1}+r_{k+1}) \\ y_{k+1} &{}=&{} x_{k+1} + \tfrac{t_k-1}{t_{k+1}}(x_{k+1}-x_k) \end{array}\right. \end{aligned}$$

    with \(t_0 = 1\), \(\{\eta _k\}_k\) a sequence of step sizes, \(y_0=x_0\in \mathbb {R}^d\) along with an inexactness criterion of the form \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(x_{k+1},g_{k+1};y_k) \leqslant \varepsilon _{k+1}\) given a nonnegative sequence \(\{\varepsilon _k\}_k\). Similarly to Güler’s method we get the recursive formulation

    $$\begin{aligned} x_{k+2} = \left( 1+\tfrac{t_k-1}{t_{k+1}}\right) x_{k+1} - \tfrac{t_k-1}{t_{k+1}}x_k - \eta _{k+2}(g_{k+2}+r_{k+2}). \end{aligned}$$

    We consider particular iterations from (2) of the form

    $$\begin{aligned}\left\{ \begin{array}{rcl} w_{2k+1} &{}=&{} w_{2k} - e_{2k} \\ w_{2k+2} &{}=&{} w_{2k+1} -\displaystyle \sum _{i=1}^{2k+1}\alpha _{2k+2,i}v_{i} -\sum _{i=0}^{2k+1}\beta _{2k+2,i}e_i, \end{array}\right. \end{aligned}$$

    with initial iterate \(w_0=x_0\). We aim at finding parameters \(\alpha _{i,j}\), \(\beta _{i,j}\) such that we can identify \(\{w_{2k}\}_k\) with \(\{x_k\}_k\) (i.e., any sequence \(\{x_k\}_k\) can be obtained as a sequence \(\{w_{2k}\}_k\)). We set \(\alpha _{2k+2,2k+1}=\beta _{2k+2,2k+1} = \eta _{k+1}\), \(\alpha _{2k+2,i} = \tfrac{t_{k-1}-1}{t_k}\alpha _{2k,i}\) for \(i=1,\ldots ,2k-1\) and \(\beta _{2k+2,i} = \tfrac{t_{k-1}-1}{t_k}\beta _{2k,i}\) for \(i\in \{0,\ldots ,2k-1\}\backslash \{2(k-1)\}\) as well as \(\beta _{2k+2,2k} = -1\) and \(\beta _{2k+2,2(k-1)} = \tfrac{t_{k-1}-1}{t_k}(1+\beta _{2k,2(k-1)})\).

    This gives

    $$\begin{aligned} w_{2(k+1)} =\,&w_{2k+1} + e_{2k} - \tfrac{t_{k-1}-1}{t_k}(e_{2(k-1)}) -\tfrac{t_{k-1}-1}{t_k}\displaystyle \sum _{i=1}^{2k-1}\alpha _{2k,i}v_{i} \\&-\tfrac{t_{k-1}-1}{t_k}\sum _{i=0}^{2k-1}\beta _{2k,i}e_i - \eta _{k+1}(v_{2k+1}+e_{2k+1})\\ =&\, (1+\tfrac{t_{k-1}-1}{t_k})w_{2k} -\tfrac{t_{k-1}-1}{t_k}w_{2(k-1)} - \eta _{k+1}(v_{2k+1}+e_{2k+1}), \end{aligned}$$

    which shows that \(\{w_{2k}\}_k\) follows the same recursive equation as \(\{x_{k}\}_k\). In addition, we have \(w_0 = x_0\) and \(w_2 = x_0 - \eta _{1}(v_1+e_1)\), which mirrors \(x_1 = x_0 -\eta _1(g_1+r_1)\). Requiring \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{2k+2},v_{2k+1};\,w_{2k+2}+\eta _{k+1}(v_{2k+1}+e_{2k+1})) \leqslant \varepsilon _{k+1}\) (with the convention \(w_{-1}=w_0\)) allows us to identify the primal-dual pair \((x_{k+1},g_{k+1})\) with \((w_{2k+2},v_{2k+1})\).

    Finally, we can set \(\text {EQ}_{2k+2} = {{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{2k+2},v_{2k+1};\,w_{2k+2}+\eta _{k+1}(v_{2k+1}+e_{2k+1})) - \varepsilon _{k+1}\) which is Gram-representable (similar to hybrid approximate extragradient algorithm).

    Note that we can proceed similarly for IAPPA2 from [87, Section 5] with the sequence \(\{a_k\}_k\) constant and equal to 1, by removing the “type 2” errors given by the sequence \(\{r_k\}_k\).

  • The accelerated hybrid proximal extragradient algorithm (A-HPE) [68, Section 3] can be written as

    $$\begin{aligned}\left\{ \begin{array}{rcl} a_{k+1} &{}=&{} \tfrac{\eta _{k+1}+\sqrt{\eta _{k+1}^2+4\eta _{k+1}A_k}}{2}\\ A_{k+1} &{}=&{} A_k + a_{k+1}\\ \tilde{x}_k &{}=&{} y_k + \tfrac{a_{k+1}}{A_{k+1}}(x_k-y_k)\\ y_{k+1} &{}=&{} \tilde{x}_k - \eta _{k+1}(g_{k+1}+r_{k+1})\\ x_{k+1} &{}=&{} x_k - a_{k+1}g_{k+1}, \end{array}\right. \end{aligned}$$

    with \(A_0=0\), \(\{\eta _k\}_k\) a sequence of step sizes, and \(y_0=x_0\in \mathbb {R}^d\), along with an inexactness criterion of the form \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(y_{k+1},g_{k+1};\tilde{x}_k) \leqslant \tfrac{\sigma ^2}{2}{\Vert y_{k+1}-\tilde{x}_k\Vert ^2}\) given a parameter \(\sigma \in [0,1]\). As in the previous examples, we search for a recursive equation satisfied by the sequence \(\{y_k\}_k\). By performing multiple substitutions, we obtain

    $$\begin{aligned} y_{k+2}=&\,\tilde{x}_{k+1} - \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\tfrac{A_{k+1}}{A_{k+2}}y_{k+1} + \tfrac{a_{k+2}}{A_{k+2}}x_{k+1}- \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\tfrac{A_{k+1}}{A_{k+2}}y_{k+1} + \tfrac{a_{k+2}}{A_{k+2}}\left( x_{k}- a_{k+1}g_{k+1}\right) - \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\tfrac{A_{k+1}}{A_{k+2}}y_{k+1} + \tfrac{a_{k+2}}{A_{k+2}}\left( \tfrac{A_{k+1}}{a_{k+1}}\tilde{x}_k - \tfrac{A_k}{a_{k+1}}y_{k} - a_{k+1}g_{k+1}\right) - \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\tfrac{A_{k+1}}{A_{k+2}}y_{k+1} + \tfrac{a_{k+2}}{A_{k+2}}\left( \tfrac{A_{k+1}}{a_{k+1}}\left( y_{k+1}+\eta _{k+1}(g_{k+1}+r_{k+1}) \right) \right. \\&\left. - \tfrac{A_k}{a_{k+1}}y_{k} - a_{k+1}g_{k+1}\right) - \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\left( \tfrac{A_{k+1}}{A_{k+2}}+\tfrac{a_{k+2}A_{k+1}}{A_{k+2}a_{k+1}}\right) y_{k+1}-\tfrac{a_{k+2}A_k}{A_{k+2}a_{k+1}}y_k + \tfrac{a_{k+2}}{A_{k+2}}\left( \tfrac{A_{k+1}}{a_{k+1}}\eta _{k+1} - a_{k+1}\right) g_{k+1}\\&+\tfrac{a_{k+2}A_{k+1}}{A_{k+2}a_{k+1}}\eta _{k+1}r_{k+1}- \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\left( 1+\tfrac{a_{k+2}A_{k}}{A_{k+2}a_{k+1}}\right) y_{k+1}-\tfrac{a_{k+2}A_k}{A_{k+2}a_{k+1}}y_k + \tfrac{a_{k+2}}{A_{k+2}}\left( \tfrac{A_{k+1}}{a_{k+1}}\eta _{k+1} - a_{k+1}\right) g_{k+1}\\&+\tfrac{a_{k+2}A_{k+1}}{A_{k+2}a_{k+1}}\eta _{k+1}r_{k+1}- \eta _{k+2}(g_{k+2}+r_{k+2}). \end{aligned}$$

    Similar to IAPPA1, we consider particular iterations from (2) of the form

    $$\begin{aligned}\left\{ \begin{array}{rcl} w_{2k+1} &{}=&{} w_{2k} - e_{2k} \\ w_{2k+2} &{}=&{} w_{2k+1} -\displaystyle \sum _{i=1}^{2k+1}\alpha _{2k+2,i}v_{i} -\sum _{i=0}^{2k+1}\beta _{2k+2,i}e_i, \end{array}\right. \end{aligned}$$

    with initial iterate \(w_0=x_0\). We aim at finding parameters \(\alpha _{i,j}\), \(\beta _{i,j}\) such that we can identify \(\{w_{2k}\}_k\) with \(\{y_k\}_k\) (i.e., any sequence \(\{y_k\}_k\) can be obtained as a sequence \(\{w_{2k}\}_k\)). We set \(\alpha _{2(k+1),2k+1}=\beta _{2(k+1),2k+1} = \eta _{k+1}\), \(\alpha _{2(k+1),i} = \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\alpha _{2k,i}\) for \(i\in \{1,\ldots ,2(k-1)\}\) and \(\beta _{2(k+1),i} = \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\beta _{2k,i}\) for \(i\in \{0,\ldots ,2k-3\}\) as well as \(\beta _{2(k+1),2k} = -1\), \(\beta _{2(k+1),2k-1} = \tfrac{a_{k+1}}{A_{k+1}a_{k}}(A_{k-1}\beta _{2k,2k-1} -A_k\eta _{k} )\), \(\beta _{2(k+1),2(k-1)} = \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}(1+\beta _{2k,2(k-1)})\) and \(\alpha _{2(k+1),2k-1} = \tfrac{a_{k+1}}{A_{k+1}}\left( \tfrac{A_{k-1}}{a_k}\alpha _{2k,2k-1}-\tfrac{A_{k}}{a_{k}}\eta _{k} + a_{k}\right) \).

    This gives

    $$\begin{aligned} w_{2(k+1)} =&\,w_{2k+1} +e_{2k} +\tfrac{a_{k+1}A_k}{A_{k+1}a_{k}}\eta _{k}e_{2k-1} - \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_k}e_{2(k-1)}\\ {}&+ \tfrac{a_{k+1}}{A_{k+1}}\left( \tfrac{A_{k}}{a_{k}}\eta _{k} - a_{k}\right) v_{2k-1}-\tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\sum _{i=1}^{2k-1}\alpha _{2k,i}v_i \\ {}&- \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\sum _{i=0}^{2k-1}\beta _{2k,i}e_i - \eta _{k+1}(v_{2k+1}+e_{2k+1})\\ =&\,w_{2k} +\tfrac{a_{k+1}A_k}{A_{k+1}a_{k}}\eta _{k}e_{2k-1} - \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_k}e_{2(k-1)}+ \tfrac{a_{k+1}}{A_{k+1}}\left( \tfrac{A_{k}}{a_{k}}\eta _{k} - a_{k}\right) v_{2k-1}\\ {}&+\tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}(w_{2k}-w_{2(k-1)}+e_{2(k-1)}) - \eta _{k+1}(v_{2k+1}+e_{2k+1})\\ =&\,\left( 1+\tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\right) w_{2k}-\tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}w_{2(k-1)}+\tfrac{a_{k+1}}{A_{k+1}}\left( \tfrac{A_{k}}{a_{k}}\eta _{k} - a_{k}\right) v_{2k-1}\\&+\tfrac{a_{k+1}A_k}{A_{k+1}a_{k}}\eta _{k}e_{2k-1} - \eta _{k+1}(v_{2k+1}+e_{2k+1}), \end{aligned}$$

    which shows that \(\{w_{2k}\}_k\) follows the same recursive equation as \(\{y_{k}\}_k\). In addition, we have \(w_0 = x_0 = y_0\) and \(w_2 = y_0 - \eta _{1}(v_1+e_1)\), which mirrors \(y_1 = x_0 -\eta _1(g_1+r_1)\). Requiring \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{2(k+1)},v_{2k+1};\,w_{2k+2}+\eta _{k+1}(v_{2k+1}+e_{2k+1})) \leqslant \tfrac{\sigma ^2}{2}{\Vert w_{2(k+1)}-w_{2k}\Vert ^2}\) allows us to identify the primal-dual pair \((y_{k+1},g_{k+1})\) with \((w_{2(k+1)},v_{2k+1})\).

    Finally, we set \(\text {EQ}_{2k+2} = {{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{2k+2},v_{2k+1};\,w_{2k+2}+\eta _{k+1}(v_{2k+1}+e_{2k+1})) - \tfrac{\sigma ^2}{2}{\Vert w_{2(k+1)}-w_{2k}\Vert ^2}\) which is Gram-representable (similar to hybrid approximate extragradient algorithm).
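As a sanity check on the relative-error criterion shared by the examples above, the following minimal numerical sketch (ours, not part of the original development; the quadratic test function and all parameter values are arbitrary choices) evaluates the primal-dual gap of the proximal subproblem \(\min _u\{\eta h(u)+\tfrac{1}{2}\Vert u-x\Vert ^2\}\) for \(h(w)=\tfrac{1}{2}\Vert w\Vert ^2\), taking \({{\,\mathrm{\textrm{PD}}\,}}_{\eta h}(u,g;x)\) to be the primal value at \(u\) minus the Fenchel dual value at \(g\), consistent with the Gram-representable expressions above, and then checks the condition \({{\,\mathrm{\textrm{PD}}\,}}_{\eta h}(u,g;x)\leqslant \tfrac{\sigma ^2}{2}\Vert u-x\Vert ^2\) for an approximate proximal point.

```python
import numpy as np

# Minimal sketch (our own toy setup): h(w) = 0.5*||w||^2, so that
# h*(g) = 0.5*||g||^2 and prox_{eta*h}(x) = x / (1 + eta).
rng = np.random.default_rng(0)
d, eta, sigma = 5, 0.7, 0.5
x = rng.standard_normal(d)

def pd_gap(u, g):
    """Primal value of the prox subproblem at u minus its Fenchel dual value at g."""
    h = lambda w: 0.5 * (w @ w)
    h_conj = lambda s: 0.5 * (s @ s)
    primal = eta * h(u) + 0.5 * ((u - x) @ (u - x))
    dual = eta * (g @ x) - 0.5 * eta**2 * (g @ g) - eta * h_conj(g)
    return primal - dual

# Exact proximal point: the gap vanishes (up to round-off).
u_exact = x / (1 + eta)
print(pd_gap(u_exact, (x - u_exact) / eta))  # ~0

# Approximate proximal point u, with g = grad h(u) a true (sub)gradient at u.
u = u_exact + 0.01 * rng.standard_normal(d)
g = u
lhs, rhs = pd_gap(u, g), 0.5 * sigma**2 * ((u - x) @ (u - x))
print(lhs, rhs, lhs <= rhs)  # relative-error criterion PD <= (sigma^2/2)*||u - x||^2
```

Any convex \(h\) with an explicitly computable conjugate could be used in place of the quadratic.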

B Interpolation with \(\varvec{\varepsilon }\)-subdifferentials

In this section, we provide the necessary interpolation result for working with \(\varepsilon \)-subdifferentials inside performance estimation problems.

Theorem B.1

Let I be a finite set of indices and \(S=\{(w_i,v_i,h_i,\varepsilon _i)\}_{i\in I}\) with \(w_i,v_i\in \mathbb {R}^d\), \(h_i,\varepsilon _i\in \mathbb {R}\) for all \(i\in I\). There exists \(h\in {{\,\mathrm{\mathcal {F}_{0,\infty }}\,}}(\mathbb {R}^d)\) satisfying

$$\begin{aligned} h_i = h(w_i),\text { and } v_i\in \partial _{\varepsilon _i}h(w_i) \text { for all } i\in I \end{aligned}$$
(24)

if and only if

$$\begin{aligned} \begin{aligned} h_i\geqslant h_j +{\langle v_j; w_i-w_j\rangle }-\varepsilon _j \end{aligned} \end{aligned}$$
(25)

holds for all \(i,j\in I\).

Proof

\((\Rightarrow )\) Assuming \(h\in {{\,\mathrm{\mathcal {F}_{0,\infty }}\,}}\) and (24), the inequalities (25) hold by definition.

\((\Leftarrow )\) Assuming (25) holds for all \(i,j\in I\), one can perform the following construction:

$$\begin{aligned} \begin{aligned} \tilde{h}(x)=\max _i\{h_i+{\langle v_i; x-w_i\rangle }-\varepsilon _i\}, \end{aligned} \end{aligned}$$

and one can easily check that \(h=\tilde{h}\in {{\,\mathrm{\mathcal {F}_{0,\infty }}\,}}\) satisfies (24). \(\square \)
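For a concrete illustration of the \((\Rightarrow )\) direction, the sketch below (our own toy data, not from the original text) generates a set \(S\) from the function \(h(x)=\tfrac{1}{2}\Vert x\Vert ^2\), for which \(v\in \partial _{\varepsilon }h(w)\) exactly when \(\tfrac{1}{2}\Vert v-w\Vert ^2\leqslant \varepsilon \) (by the Fenchel-Young characterization of \(\varepsilon \)-subgradients), and verifies that the inequalities (25) hold for every pair of indices, as guaranteed by Theorem B.1.

```python
import numpy as np

# Minimal sketch (our own toy data): h(x) = 0.5*||x||^2, for which
# v is an eps-subgradient of h at w exactly when 0.5*||v - w||^2 <= eps.
rng = np.random.default_rng(1)
d, n = 3, 6

w = rng.standard_normal((n, d))            # points w_i
delta = 0.3 * rng.standard_normal((n, d))  # perturbations of the exact gradients
v = w + delta                              # eps_i-subgradients at w_i
eps = 0.5 * np.sum(delta**2, axis=1)       # smallest admissible eps_i
h = 0.5 * np.sum(w**2, axis=1)             # exact values h_i = h(w_i)

# Interpolation inequalities (25): h_i >= h_j + <v_j; w_i - w_j> - eps_j for all i, j.
ok = all(
    h[i] >= h[j] + v[j] @ (w[i] - w[j]) - eps[j] - 1e-12
    for i in range(n) for j in range(n)
)
print(ok)  # True, by the necessity direction of Theorem B.1
```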

C Equivalence with Güler’s method

In this section, we show that the optimized algorithm (ORI-PPA) and Güler’s second method [49, Section 6] are equivalent (i.e., they produce the same iterates) in the case of exact proximal computations (i.e., \(\sigma =0\)).

We consider a constant sequence of step sizes \(\{\lambda _k\}_k\) with \(\lambda _k = \lambda >0\). In Güler’s second method, the sequence \(\{\beta _k\}_k\) is defined as \(\beta _1 = 1\) and

$$\begin{aligned} \beta _{k+1} = \tfrac{1+\sqrt{4\beta _k^2+1}}{2}.\end{aligned}$$

The sequence \(\{A_k\}_k\) generated by (ORI-PPA) satisfies \(A_0=0\) and

$$\begin{aligned}A_{k+1} = A_k+\tfrac{\lambda +\sqrt{4\lambda A_k + \lambda ^2}}{2},\quad k\geqslant 0. \end{aligned}$$

We can link together these two sequences through the following equality

$$\begin{aligned} \beta _{k} = \tfrac{A_{k}-A_{k-1}}{\lambda }, \quad k\geqslant 1. \end{aligned}$$
(26)

Let us prove it by induction. First, observe that \(\beta _1 = 1\) and \(\tfrac{A_1-A_0}{\lambda } = 1\). Then, assuming that the property holds for some \(k \geqslant 1\), we have

$$\begin{aligned} \beta _{k+1}&= \tfrac{1+\sqrt{4\beta _k^2 +1}}{2}\\&= \tfrac{1+\sqrt{4\tfrac{(A_{k+1}-A_k)^2}{\lambda ^2} +1}}{2}. \end{aligned}$$

One might notice that

$$\begin{aligned}(A_{k+1}-A_k)^2 = \tfrac{2\lambda ^2+4\lambda A_k+2\lambda \sqrt{4\lambda A_k + \lambda ^2}}{4}= \lambda A_{k+1},\end{aligned}$$

giving

$$\begin{aligned} \beta _{k+1}&= \tfrac{1+\sqrt{4\tfrac{A_{k+1}}{\lambda } +1}}{2}\\&= \tfrac{\lambda + \sqrt{4\lambda A_{k+1} + \lambda ^2}}{2\lambda }\\&= \tfrac{A_{k+2}-A_{k+1}}{\lambda }, \end{aligned}$$

and we finally arrive at (26). In the case \(\sigma =0\), the iterations of (ORI-PPA) can be written as

$$\begin{aligned}\left\{ \begin{array}{ccl} y_k &{}=&{} x_k + \tfrac{\lambda }{A_{k+1}-A_k}(z_k-x_k) \\ x_{k+1} &{}=&{} \textrm{prox}_{\lambda h}(y_k)\\ z_{k+1} &{}=&{} z_k +\tfrac{2(A_{k+1}-A_k)}{\lambda }(x_{k+1}-y_{k}). \end{array}\right. \end{aligned}$$

Therefore, we can write

$$\begin{aligned} y_{k+1}&= x_{k+1} + \tfrac{\lambda }{A_{k+2}-A_{k+1}}\bigg (z_k +\tfrac{2(A_{k+1}-A_k)}{\lambda }(x_{k+1}-y_{k})-x_{k+1}\bigg )\\&=x_{k+1} + \tfrac{\lambda }{A_{k+2}-A_{k+1}}\bigg (x_k - \tfrac{A_{k+1}-A_k}{\lambda }(x_k-y_k) \\ {}&\quad +\tfrac{2(A_{k+1}-A_k)}{\lambda }(x_{k+1}-y_{k})-x_{k+1}\bigg )\\&=x_{k+1} + \tfrac{\lambda }{A_{k+2}-A_{k+1}}\left( \left( \tfrac{A_{k+1}-A_k}{\lambda }-1\right) (x_{k+1}-x_k) +\tfrac{A_{k+1}-A_k}{\lambda }(x_{k+1}-y_{k})\right) . \end{aligned}$$

Combining the last equality with (26) leads to

$$\begin{aligned} y_{k+1} = x_{k+1} + \tfrac{\beta _{k+1}-1}{\beta _{k+2}}(x_{k+1}-x_k) + \tfrac{\beta _{k+1}}{\beta _{k+2}}(x_{k+1}-y_k) \end{aligned}$$

which is exactly the update in Güler’s second method [49, Section 6] modulo a translation in the indices of the \(\{y_k\}_k\) sequence (indeed in Güler’s method \(y_1=x_0\) whereas in (ORI-PPA) \(y_0=x_0\)).
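The identity (26) can also be checked numerically. The short sketch below (our own; the constant step size \(\lambda \) and the horizon \(N\) are arbitrary choices) generates \(\{\beta _k\}_k\) and \(\{A_k\}_k\) from their respective recursions and compares \(\beta _k\) with \((A_k-A_{k-1})/\lambda \).

```python
import numpy as np

lam, N = 0.3, 25  # arbitrary constant step size and horizon (our choice)

# Guler's second method: beta_1 = 1, beta_{k+1} = (1 + sqrt(4*beta_k^2 + 1)) / 2.
beta = [1.0]
for _ in range(N - 1):
    beta.append((1 + np.sqrt(4 * beta[-1] ** 2 + 1)) / 2)

# (ORI-PPA): A_0 = 0, A_{k+1} = A_k + (lam + sqrt(4*lam*A_k + lam^2)) / 2.
A = [0.0]
for _ in range(N):
    A.append(A[-1] + (lam + np.sqrt(4 * lam * A[-1] + lam**2)) / 2)

# Identity (26): beta_k = (A_k - A_{k-1}) / lam for k >= 1.
print(max(abs(beta[k - 1] - (A[k] - A[k - 1]) / lam) for k in range(1, N + 1)))  # ~1e-14
```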

D Missing details in Theorem 1

The missing elements in the proof of Theorem 1 are presented below.

Proof

Let us rewrite the method in terms of a single sequence, by substitution of \(y_k\) and \(z_k\):

$$\begin{aligned} \begin{aligned} e_k&{:}{=}\tfrac{1}{\lambda _k}\left( y_{k-1} -\lambda _kg_k - x_k\right) \\ x_{k}&=\tfrac{\lambda _k}{A_{k}-A_{k-1}}\left( x_0-\tfrac{2}{1+\sigma }\sum _{i=1}^{k-1}(A_{i}-A_{i-1}) g_i\right) \\ {}&\quad +\left( 1-\tfrac{\lambda _k}{A_{k}-A_{k-1}}\right) x_{k-1} -\lambda _{k} (g_{k}+e_{k}),\\ \end{aligned} \end{aligned}$$
(27)

and let us state the following identity on the \(A_k\) coefficients

$$\begin{aligned} \lambda _{k+1}A_{k+1}=(A_{k+1}-A_k)^2 \text { (for}\, k\geqslant 0\text {)}. \end{aligned}$$
(28)
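For completeness, (28) follows directly from the definition of the sequence \(\{A_k\}_k\), assuming (as for the constant-step-size recursion recalled in Appendix C) that \(A_0=0\) and \(A_{k+1}=A_k+\tfrac{\lambda _{k+1}+\sqrt{4\lambda _{k+1}A_k+\lambda _{k+1}^2}}{2}\): squaring the increment gives

$$\begin{aligned} (A_{k+1}-A_k)^2&=\tfrac{2\lambda _{k+1}^2+4\lambda _{k+1}A_k+2\lambda _{k+1}\sqrt{4\lambda _{k+1}A_k+\lambda _{k+1}^2}}{4}\\&=\lambda _{k+1}\left( A_k+\tfrac{\lambda _{k+1}+\sqrt{4\lambda _{k+1}A_k+\lambda _{k+1}^2}}{2}\right) =\lambda _{k+1}A_{k+1}. \end{aligned}$$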

We prove the desired convergence result by induction. First, for \(N=1\)

$$\begin{aligned} \begin{aligned} 0 \geqslant&\, \nu _{\star ,1}\Bigg [h(u_1)-h(x_\star ) + {\langle g_1; x_\star -u_1\rangle }\Bigg ] + \nu _{1,1}\Bigg [h(u_1)-h(x_1)+{\langle g_1; x_1-u_1\rangle }\Bigg ] \\&+ \nu _1\Bigg [\tfrac{\lambda _1}{2}{\Vert e_1\Vert ^2}-\tfrac{\lambda _1\sigma ^2}{2}{\Vert e_1+g_1\Vert ^2} + h(x_1) - h(u_1) -{\langle g_1; x_1-u_1\rangle }\Bigg ] \end{aligned} \end{aligned}$$

with \(\nu _{\star ,1}= \tfrac{A_1-A_0}{1+\sigma }=\tfrac{A_1}{1+\sigma }\) as \(A_0=0\), \(\nu _{1,1} = \tfrac{(1-\sigma )A_1}{\sigma (1+\sigma )}\) and \(\nu _1 = \tfrac{A_1}{\sigma (1+\sigma )}\). This gives

$$\begin{aligned} 0 \geqslant&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{A_1}{1+\sigma }{\langle g_1; x_\star -x_1\rangle } + \tfrac{A_1}{\sigma (1+\sigma )}\Bigg [\tfrac{\lambda _1}{2}{\Vert e_1\Vert ^2}-\tfrac{\lambda _1\sigma ^2}{2}{\Vert e_1+g_1\Vert ^2}\Bigg ]\\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{A_1}{1+\sigma }{\langle g_1; x_\star -x_0 + \lambda _1(g_1+e_1)\rangle }\\ {}&+ \tfrac{A_1}{\sigma (1+\sigma )}\Bigg [\tfrac{\lambda _1}{2}{\Vert e_1\Vert ^2}-\tfrac{\lambda _1\sigma ^2}{2}{\Vert e_1+g_1\Vert ^2}\Bigg ]\\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{1}{2}{\langle 2\tfrac{A_1}{1+\sigma }g_1; x_\star -x_0\rangle } + {\langle \tfrac{A_1}{1+\sigma }g_1; \lambda _1(g_1+e_1)\rangle }\\ {}&+ \tfrac{A_1}{\sigma (1+\sigma )}\left[ \tfrac{\lambda _1}{2}{\Vert e_1\Vert ^2}-\tfrac{\lambda _1\sigma ^2}{2}{\Vert e_1+g_1\Vert ^2}\right] \\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{1}{4}{\Vert x_\star -x_0+2\tfrac{A_1}{1+\sigma }g_1\Vert ^2} -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2} -{\Vert \tfrac{A_1}{1+\sigma }g_1\Vert ^2} \\ {}&+{\langle \tfrac{A_1}{1+\sigma }g_1; \lambda _1(g_1+e_1)\rangle }+ \tfrac{A_1}{\sigma (1+\sigma )}\left[ \tfrac{\lambda _1}{2}{\Vert e_1\Vert ^2}-\tfrac{\lambda _1\sigma ^2}{2}{\Vert e_1+g_1\Vert ^2}\right] \\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{1}{4}{\Vert x_\star -x_0+2\tfrac{A_1}{1+\sigma }g_1\Vert ^2} -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2} + \tfrac{A_1\lambda _1(1-\sigma )}{2\sigma }{\Vert e_1\Vert ^2} \\&+ \tfrac{A_1\lambda _1(1-\sigma )}{1+\sigma }{\langle g_1; e_1\rangle } + \tfrac{A_1}{1+\sigma }\left( -\tfrac{A_1}{1+\sigma } + \lambda _1 - \tfrac{\lambda _1\sigma }{2} \right) {\Vert g_1\Vert ^2} \\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{1}{4}{\Vert x_\star -x_0+2\tfrac{A_1}{1+\sigma }g_1\Vert ^2} -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2} \\&+ \tfrac{A_1\lambda _1(1-\sigma )}{2\sigma }{\Vert e_1+\tfrac{\sigma }{1+\sigma }g_1\Vert ^2} + \tfrac{A_1}{1+\sigma }\left( -\tfrac{A_1}{1+\sigma } + \lambda _1 - \tfrac{\lambda _1\sigma }{2} -\tfrac{\lambda _1(1-\sigma )\sigma }{2(1+\sigma )}\right) {\Vert g_1\Vert ^2} \\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{1}{4}{\Vert x_\star -x_0+2\tfrac{A_1}{1+\sigma }g_1\Vert ^2} -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2} \\&+ \tfrac{A_1\lambda _1(1-\sigma )}{2\sigma }{\Vert e_1+\tfrac{\sigma }{1+\sigma }g_1\Vert ^2} + \tfrac{A_1}{1+\sigma }\left( \tfrac{\lambda _1-A_1}{1+\sigma }\right) {\Vert g_1\Vert ^2} \\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{1}{4}{\Vert x_\star -x_0+2\tfrac{A_1}{1+\sigma }g_1\Vert ^2} -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2}\\ {}&+ \tfrac{A_1\lambda _1(1-\sigma )}{2\sigma }{\Vert e_1+\tfrac{\sigma }{1+\sigma }g_1\Vert ^2}, \end{aligned}$$

where we used in the last line that \(A_1 = \lambda _1\).

Now, assuming the weighted sum can be reformulated as the desired inequality for \(N=k\), that is:

$$\begin{aligned} 0\geqslant&\, \tfrac{A_k}{1+\sigma }(h(x_k)-h_\star ) -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^{k}(A_{i}-A_{i-1})g_i\Vert ^2} \\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}, \end{aligned}$$

let us prove that it also holds for \(N=k+1\). Noticing that the weighted sum for \(k+1\) is exactly the weighted sum for \(k\) (which can be reformulated as desired, by the induction hypothesis) with four additional inequalities, we get the following valid inequality

$$\begin{aligned} 0\geqslant&\,\tfrac{A_k}{1+\sigma }(h(x_k)-h_\star ) -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^{k}(A_{i}-A_{i-1})g_i\Vert ^2} \\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2} \\&+ \tfrac{A_{k+1}-A_k}{1+\sigma }\Bigg [h(u_{k+1})-h_\star + {\langle g_{k+1}; x_\star -u_{k+1}\rangle }\Bigg ]\\&+ \tfrac{(1-\sigma )A_{k+1}}{(1+\sigma )\sigma }\Bigg [h(u_{k+1})-h(x_{k+1}) + {\langle g_{k+1}; x_{k+1}-u_{k+1}\rangle }\Bigg ]\\&+ \tfrac{A_k}{1+\sigma }\Bigg [h(u_{k+1})-h(x_{k}) + {\langle g_{k+1}; x_k-u_{k+1}\rangle }\Bigg ]\\&+ \tfrac{A_{k+1}}{(1+\sigma )\sigma }\left[ \tfrac{\lambda _{k+1}}{2}{\Vert e_{k+1}\Vert ^2} - \tfrac{\lambda _{k+1}\sigma ^2}{2}{\Vert e_{k+1}+g_{k+1}\Vert ^2} + h(x_{k+1}) - h(u_{k+1})\right. \\&\left. -{\langle g_{k+1}; x_{k+1}-u_{k+1}\rangle }\right] . \end{aligned}$$

By grouping all function values we get the following simplification:

$$\begin{aligned}&\left[ \tfrac{A_k}{1+\sigma }-\tfrac{A_k}{1+\sigma }\right] h(x_k)+\tfrac{A_{k+1}}{1+\sigma } \left[ \tfrac{1}{\sigma }-\tfrac{1-\sigma }{\sigma }\right] (h(x_{k+1})-h_\star )\\&\quad +\tfrac{1}{1+\sigma }\left[ A_{k+1}-A_k + \tfrac{1-\sigma }{\sigma }A_{k+1} + A_k - \tfrac{1}{\sigma }A_{k+1}\right] h(u_{k+1}) =\tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ), \end{aligned}$$

where \(h(x_k)\) and \(h(u_{k+1})\) cancelled out. The remaining inequality is therefore

$$\begin{aligned} 0\geqslant{} & {} \,\tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^k (A_i-A_{i-1}) g_i\Vert ^2}\nonumber \\{} & {} + \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}+ \tfrac{A_{k+1}\lambda _{k+1}}{2(1+\sigma )\sigma }\Bigg [{\Vert e_{k+1}\Vert ^2} - \sigma ^2{\Vert e_{k+1}+g_{k+1}\Vert ^2}\Bigg ] \nonumber \\ {}{} & {} + \tfrac{1}{1+\sigma }{\langle g_{k+1}; (A_{k+1}-A_k)(x_\star -u_{k+1}) - A_{k+1}(x_{k+1}-u_{k+1})+A_k(x_k-u_{k+1})\rangle }\nonumber \\ ={} & {} \, \tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^k (A_i-A_{i-1}) g_i\Vert ^2}\nonumber \\{} & {} + \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}+ \tfrac{A_{k+1}\lambda _{k+1}}{2(1+\sigma )\sigma }\Bigg [{\Vert e_{k+1}\Vert ^2} - \sigma ^2{\Vert e_{k+1}+g_{k+1}\Vert ^2}\Bigg ] \nonumber \\ {}{} & {} + \tfrac{1}{1+\sigma }{\langle g_{k+1}; (A_{k+1}-A_k)x_\star - A_{k+1}x_{k+1} + A_kx_k\rangle }. \end{aligned}$$
(29)

Then, by using (28), one can observe that

$$\begin{aligned} A_{k+1}x_{k+1} =&\, \tfrac{A_{k+1}\lambda _{k+1}}{A_{k+1}-A_{k}}\left( x_0-\tfrac{2}{1+\sigma }\sum _{i=1}^{k}(A_{i}-A_{i-1}) g_i\right) +\left( A_{k+1}-\tfrac{A_{k+1}\lambda _{k+1}}{A_{k+1}-A_{k}}\right) x_{k} \\&-A_{k+1}\lambda _{k+1} (g_{k+1}+e_{k+1})\\ =&\, (A_{k+1}-A_k)\left( x_0-\tfrac{2}{1+\sigma }\sum _{i=1}^{k}(A_{i}-A_{i-1}) g_i\right) + A_kx_k\\&\quad -A_{k+1}\lambda _{k+1} (g_{k+1}+e_{k+1}), \end{aligned}$$

and by substituting this back into the last line of (29), we get

$$\begin{aligned} 0\geqslant&\, \tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^k (A_i-A_{i-1}) g_i\Vert ^2}\\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}+ \tfrac{A_{k+1}\lambda _{k+1}}{2(1+\sigma )\sigma }\Bigg [{\Vert e_{k+1}\Vert ^2} - \sigma ^2{\Vert e_{k+1}+g_{k+1}\Vert ^2}\Bigg ] \\ {}&+ \tfrac{1}{1+\sigma }{\langle (A_{k+1}-A_k)g_{k+1}; x_\star - x_0 +\tfrac{2}{1+\sigma }\sum _{i=1}^{k}(A_{i}-A_{i-1}) g_i \rangle }\\&+ \tfrac{A_{k+1}\lambda _{k+1}}{1+\sigma }{\langle g_{k+1}; (g_{k+1}+e_{k+1})\rangle }. \end{aligned}$$

We can then proceed as in the case \(N=1\) to factorize the quadratic terms,

$$\begin{aligned} 0\geqslant&\, \tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^{k+1} (A_i-A_{i-1}) g_i\Vert ^2}\\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}+ \tfrac{A_{k+1}\lambda _{k+1}}{2(1+\sigma )\sigma }\Bigg [{\Vert e_{k+1}\Vert ^2} - \sigma ^2{\Vert e_{k+1}+g_{k+1}\Vert ^2}\Bigg ] \\ {}&-\tfrac{(A_{k+1}-A_k)^2}{(1+\sigma )^2}{\Vert g_{k+1}\Vert ^2} + \tfrac{A_{k+1}\lambda _{k+1}}{1+\sigma }{\langle g_{k+1}; (g_{k+1}+e_{k+1})\rangle }\\ =&\, \tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^{k+1} (A_i-A_{i-1}) g_i\Vert ^2}\\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}+ \tfrac{A_{k+1}\lambda _{k+1}(1-\sigma )}{2\sigma }{\Vert e_{k+1}+\tfrac{\sigma }{1+\sigma }g_{k+1}\Vert ^2} \\ {}&+\left[ \tfrac{A_{k+1}\lambda _{k+1}}{(1+\sigma )}-\tfrac{A_{k+1}\lambda _{k+1}\sigma }{2(1+\sigma )}-\tfrac{(A_{k+1}-A_k)^2}{(1+\sigma )^2} - \tfrac{A_{k+1}\lambda _{k+1}\sigma (1-\sigma )}{2(1+\sigma )^2}\right] {\Vert g_{k+1}\Vert ^2} \\ =&\, \tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^{k+1} (A_i-A_{i-1}) g_i\Vert ^2}\\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^{k+1} A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}+\tfrac{A_{k+1}\lambda _{k+1}}{(1+\sigma )}\Bigg [1-\tfrac{\sigma }{2}-\tfrac{1}{(1+\sigma )} - \tfrac{\sigma (1-\sigma )}{2(1+\sigma )}\Bigg ]{\Vert g_{k+1}\Vert ^2}\\ =&\tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^{k+1} (A_i-A_{i-1}) g_i\Vert ^2}\\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^{k+1} A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}, \end{aligned}$$

since \(1-\tfrac{\sigma }{2}-\tfrac{1}{1+\sigma } - \tfrac{\sigma (1-\sigma )}{2(1+\sigma )} = 0\), which concludes the proof. \(\square \)

E Tightness of Theorem 1

Proof

The guarantee for (ORI-PPA) provided by Theorem 1 is non-improvable. That is, for all \(\{\lambda _k\}_k\) with \(\lambda _k>0\), \(\sigma \in [0,1]\), \(d\in \mathbb {N}\), \(x_0\in \mathbb {R}^d\), and \(N\in \mathbb {N}\), there exists \(f\in \mathcal {F}_{0, \infty }(\mathbb {R}^d)\) such that this bound is achieved with equality. To prove this statement, it is sufficient to exhibit a one-dimensional function for which the bound is attained, which is what we do below. The bound is attained on the one-dimensional (constrained to the nonnegative orthant) linear minimization problem

$$\begin{aligned} \min _x\, \{f(x)\triangleq c\, x + i_{\mathbb {R}_+}(x)\}, \end{aligned}$$
(30)

with an appropriate choice of \(c> 0\), where \(i_{\mathbb {R}_+}\) denotes the convex indicator function of \(\mathbb {R}_+\). Indeed, one can check that the relative error criterion

$$\begin{aligned} \exists u_{k}\in \mathbb {R}_+,\;\tfrac{\lambda _k}{2}{\Vert e_k\Vert ^2} + f(x_k)-f(u_k) -{\langle g_k; x_k-u_k\rangle }\leqslant \tfrac{\lambda _k\sigma ^2}{2}{\Vert e_k+g_{k}\Vert ^2} \end{aligned}$$

is satisfied with equality when picking \(g_{k}=c\) (\(g_k\) is thus a subgradient at \(x_k\geqslant 0\)), \(u_k=x_k\), and \(e_{k}=-\tfrac{c\sigma }{1+\sigma }\); and hence \(x_{k}=y_{k-1}-\tfrac{c\lambda _k}{1+\sigma }\). The argument is then as follows: if for some \(x_0>0\) and \(0\leqslant h\leqslant x_0/c\) we manage to show that \(x_N=x_0-c h\), then \(f(x_N)-f(x_\star )=c(x_0-c h)\) and hence the value of c producing the worst possible (maximal) value of \(f(x_N)\) is \(c=\tfrac{x_0}{2h}\). In that case, the resulting value is \(f(x_N)-f(x_\star )=\tfrac{x_0^2}{4h}\). Therefore, in order to prove that the guarantee from Theorem 1 cannot be improved, we show that \(x_N=x_0-\tfrac{A_N}{1+\sigma }c\) on the linear problem (30). It is easy to show that \(x_1=x_0-\tfrac{A_1}{1+\sigma }c\) using \(A_1=\lambda _1\). The argument follows by induction: assuming \(x_{k}=x_0-\tfrac{A_k}{1+\sigma }c\), one can compute

$$\begin{aligned} x_{k+1}= & {} \, \tfrac{\lambda _{k+1}}{A_{k+1}-A_{k}}\left( x_0-\tfrac{2}{1+\sigma }\sum _{i=1}^{k}(A_{i}-A_{i-1}) g_i\right) +\left( 1-\tfrac{\lambda _{k+1}}{A_{k+1}-A_{k}}\right) x_{k} \\ {}{} & {} -\lambda _{k+1} (g_{k+1}+e_{k+1})\\= & {} \, \tfrac{\lambda _{k+1}}{A_{k+1}-A_{k}}\left( x_0-\tfrac{2c}{1+\sigma }A_{k}\right) +\left( 1-\tfrac{\lambda _{k+1}}{A_{k+1}-A_{k}}\right) \left( x_0-\tfrac{A_k}{1+\sigma }c\right) -\lambda _{k+1} \tfrac{c}{1+\sigma }\\= & {} \,x_0-\tfrac{c}{1+\sigma }\tfrac{2\lambda _{k+1}A_k + (A_{k+1}-A_k)A_k - \lambda _{k+1}A_k + \lambda _{k+1}(A_{k+1}-A_k)}{A_{k+1}-A_k}\\= & {} \,x_0-\tfrac{c}{1+\sigma }\tfrac{ (A_{k+1}-A_k)A_k + \lambda _{k+1}A_{k+1}}{A_{k+1}-A_k}\\= & {} \,x_0-\tfrac{c}{1+\sigma }A_{k+1}, \end{aligned}$$

where the second equality follows from simple substitutions, and the last equalities follow from basic algebra and \(\lambda _{k+1}A_{k+1} = (A_{k+1}-A_k)^2\). The desired statement is proved by picking \(c=\tfrac{(1+\sigma )x_0}{2A_N}\), reaching \(f(x_N)-f(x_\star ) = \tfrac{(1+\sigma )x_0^2}{4A_N}\). \(\square \)
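The worst-case construction above can be replayed numerically. The sketch below (ours; the values of \(\sigma \), \(x_0\), \(N\) and the step sizes are arbitrary choices) runs the single-sequence form (27) on the linear problem (30) with \(g_k=c\) and \(e_k=-\tfrac{c\sigma }{1+\sigma }\), and checks that \(x_N=x_0-\tfrac{A_N}{1+\sigma }c\) and that \(f(x_N)-f(x_\star )\) attains \(\tfrac{(1+\sigma )x_0^2}{4A_N}\) for \(c=\tfrac{(1+\sigma )x_0}{2A_N}\).

```python
import numpy as np

# Minimal sketch (our own parameter values): run the single-sequence form (27) of (ORI-PPA)
# on the worst-case instance f(x) = c*x + i_{R+}(x) with g_k = c and e_k = -c*sigma/(1+sigma),
# and compare f(x_N) - f(x_*) with (1+sigma)*x0^2 / (4*A_N).
sigma, x0, N = 0.4, 1.0, 10
rng = np.random.default_rng(2)
lam = rng.uniform(0.1, 1.0, size=N + 1)  # step sizes lambda_1..lambda_N (lam[0] unused)

# A_0 = 0 and A_{k+1} taken as the larger root of (28): lambda_{k+1}*A_{k+1} = (A_{k+1}-A_k)^2.
A = [0.0]
for k in range(1, N + 1):
    A.append(A[-1] + (lam[k] + np.sqrt(4 * lam[k] * A[-1] + lam[k] ** 2)) / 2)

c = (1 + sigma) * x0 / (2 * A[N])  # worst-case slope
g, e = c, -c * sigma / (1 + sigma)

x = [x0]
for k in range(1, N + 1):
    past = sum((A[i] - A[i - 1]) * g for i in range(1, k))
    xk = (lam[k] / (A[k] - A[k - 1])) * (x0 - 2 / (1 + sigma) * past) \
        + (1 - lam[k] / (A[k] - A[k - 1])) * x[-1] - lam[k] * (g + e)
    x.append(xk)

print(x[N], x0 - A[N] * c / (1 + sigma))           # iterate matches the closed form (= x0/2 here)
print(c * x[N], (1 + sigma) * x0**2 / (4 * A[N]))  # f(x_N) - f(x_*) attains the bound
```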

F Tightness of Theorem 2

Proof

We show that the guarantee provided in Theorem 2 is non-improvable. That is, for all \(\mu \geqslant 0\), \(\{\lambda _k\}_k\) with \(\lambda _k\geqslant 0\), \(\sigma \in [0,1]\), \(d\in \mathbb {N}\), \(w_0\in \mathbb {R}^d\), and \(N\in \mathbb {N}\), there exists \(h\in \mathcal {F}_{\mu , \infty }(\mathbb {R}^d)\) such that this bound is achieved with equality. Indeed, the bound is attained on the simple quadratic minimization problem

$$\begin{aligned} \min _x \{h(x) \triangleq \tfrac{\mu }{2}{\Vert x\Vert ^2}\}. \end{aligned}$$
(31)

We can check that the relative error criterion

$$\begin{aligned} \tfrac{\lambda _k}{2}{\Vert e_k\Vert ^2} \leqslant \tfrac{\sigma ^2\lambda _k}{2}{\Vert e_k+v_k\Vert ^2}, \end{aligned}$$

is satisfied with equality when picking \(v_k = \nabla h(w_{k+1}) = \mu w_{k+1}\) and \(e_k = -\tfrac{\sigma }{1+\sigma }v_k\). Under these choices, one can write

$$\begin{aligned}w_{k+1} = w_k -\tfrac{\lambda _{k+1}\mu }{1+\sigma }w_{k+1}, \end{aligned}$$

which leads to \(w_{k+1} = \tfrac{1+\sigma }{1+\sigma + \lambda _{k+1}\mu }w_k\), hence

$$\begin{aligned} w_{N} = \prod _{i=1}^{N}\tfrac{1+\sigma }{1+\sigma +\lambda _{i}\mu }w_0,\end{aligned}$$

and the desired result follows. \(\square \)
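The same computation can be checked numerically. The sketch below (ours; the values of \(\mu \), \(\sigma \), \(N\) and the step sizes are arbitrary choices) performs the inexact proximal steps on (31) with \(v_k=\mu w_{k+1}\) and \(e_k=-\tfrac{\sigma }{1+\sigma }v_k\), verifies that the relative error criterion is tight at every iteration, and compares \(w_N\) with the closed-form product above.

```python
import numpy as np

# Minimal sketch (our own parameter values): on h(x) = (mu/2)*||x||^2, perform the inexact
# proximal steps with v_k = mu*w_{k+1} and e_k = -sigma/(1+sigma)*v_k, check that the relative
# error criterion is tight at each iteration, and compare w_N with the closed-form product.
mu, sigma, N = 2.0, 0.3, 8
rng = np.random.default_rng(3)
lam = rng.uniform(0.1, 1.0, size=N + 1)  # step sizes lambda_1..lambda_N (lam[0] unused)
w0 = rng.standard_normal(4)

w = w0.copy()
for k in range(1, N + 1):
    # The implicit step w_{k+1} = w_k - lam_k*(v_k + e_k) with the choices above reduces to
    # w_{k+1} = (1+sigma)/(1+sigma+lam_k*mu) * w_k.
    w_next = (1 + sigma) / (1 + sigma + lam[k] * mu) * w
    v, e = mu * w_next, -sigma / (1 + sigma) * mu * w_next
    assert np.isclose(lam[k] / 2 * (e @ e), sigma**2 * lam[k] / 2 * ((e + v) @ (e + v)))
    assert np.allclose(w_next, w - lam[k] * (v + e))
    w = w_next

w_closed = np.prod([(1 + sigma) / (1 + sigma + lam[i] * mu) for i in range(1, N + 1)]) * w0
print(np.allclose(w, w_closed))  # True
```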

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Barré, M., Taylor, A.B. & Bach, F. Principled analyses and design of first-order methods with inexact proximal operators. Math. Program. 201, 185–230 (2023). https://doi.org/10.1007/s10107-022-01903-7

  • DOI: https://doi.org/10.1007/s10107-022-01903-7
