Principled analyses and design of first-order methods with inexact proximal operators

  • Full Length Paper
  • Series A
  • Mathematical Programming

Abstract

Proximal operations are among the most common primitives appearing in both practical and theoretical (or high-level) optimization methods. This basic operation typically consists in solving an intermediary (hopefully simpler) optimization problem. In this work, we survey notions of inaccuracies that can be used when solving those intermediary optimization problems. Then, we show that worst-case guarantees for algorithms relying on such inexact proximal operations can be systematically obtained through a generic procedure based on semidefinite programming. This methodology is primarily based on the approach introduced by Drori and Teboulle (Math Program 145(1–2):451–482, 2014) and on convex interpolation results, and allows producing non-improvable worst-case analyses. In other words, for a given algorithm, the methodology generates both worst-case certificates (i.e., proofs) and problem instances on which they are achieved. Relying on this methodology, we study numerical worst-case performances of a few basic methods relying on inexact proximal operations including accelerated variants, and design a variant with optimized worst-case behavior. We further illustrate how to extend the approach to support strongly convex objectives by studying a simple relatively inexact proximal minimization method.

References

  1. Ajalloeian, A., Simonetto, A., Dall’Anese, E.: Inexact online proximal-gradient method for time-varying convex optimization. In: 2020 American Control Conference (ACC), pp. 2850–2857. IEEE (2020)

  2. Alves, M.M., Eckstein, J., Geremia, M., Melo, J.: Relative-error inertial-relaxed inexact versions of Douglas–Rachford and ADMM splitting algorithms. Preprint arXiv:1904.10502 (2019)

  3. Alves, M.M., Marcavillaca, R.T.: On inexact relative-error hybrid proximal extragradient, forward-backward and Tseng’s modified forward-backward methods with inertial effects. Set-Valued Var. Anal. 28, 301–325 (2020)

  4. Auslender, A.: Numerical methods for nondifferentiable convex optimization. In: Nonlinear Analysis and Optimization, pp. 102–126. Springer (1987)

  5. Barré, M., Taylor, A., d’Aspremont, A.: Complexity guarantees for Polyak steps with momentum. In: Conference on Learning Theory, pp. 452–478. PMLR (2020)

  6. Bastianello, N., Ajalloeian, A., Dall’Anese, E.: Distributed and inexact proximal gradient method for online convex optimization. Preprint arXiv:2001.00870 (2020)

  7. Bauschke, H.H., Combettes, P.L.: Convex analysis and monotone operator theory in Hilbert spaces, vol. 408. Springer, Berlin (2011)

  8. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)

  9. Bello-Cruz, Y., Gonçalves, M.L.N., Krislock, N.: On inexact accelerated proximal gradient methods with relative error rules. Preprint arXiv:2005.03766 (2020)

  10. Boţ, R.I., Csetnek, E.R.: A hybrid proximal-extragradient algorithm with inertial effects. Numer. Funct. Anal. Optim. 36(8), 951–963 (2015)

  11. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)

  12. Brøndsted, A., Rockafellar, R.T.: On the subdifferentiability of convex functions. Proc. Am. Math. Soc. 16(4), 605–611 (1965)

  13. Bruck, R.E., Jr.: An iterative solution of a variational inequality for certain monotone operators in Hilbert space. Bull. Am. Math. Soc. 81(5), 890–892 (1975)

  14. Burachik, R.S., Iusem, A.N., Svaiter, B.F.: Enlargement of monotone operators with applications to variational inequalities. Set-Valued Anal. 5(2), 159–180 (1997)

  15. Burachik, R.S., Martínez-Legaz, J.E., Rezaie, M., Théra, M.: An additive subfamily of enlargements of a maximally monotone operator. Set-Valued Var. Anal. 23(4), 643–665 (2015)

  16. Burachik, R.S., Sagastizábal, C.A., Svaiter, B.F.: \(\varepsilon \)-enlargements of maximal monotone operators: Theory and applications. In: Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods, pp. 25–43. Springer (1998)

  17. Burachik, R.S., Sagastizábal, C.A., Svaiter, B.F.: Bundle methods for maximal monotone operators. In: Ill-Posed Variational Problems and Regularization Techniques, pp. 49–64. Springer (1999)

  18. Burke, J., Qian, M.: A variable metric proximal point algorithm for monotone operators. SIAM J. Control. Optim. 37(2), 353–375 (1999)

  19. Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)

  20. Chierchia, G., Chouzenoux, E., Combettes, P.L., Pesquet, J.C.: The proximity operator repository. User’s guide (2020). http://proximity-operator.net/download/guide.pdf

  21. Combettes, P.L., Pesquet, J.C.: Proximal splitting methods in signal processing. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 185–212. Springer (2011)

  22. Cominetti, R.: Coupling the proximal point algorithm with approximation methods. J. Optim. Theory Appl. 95(3), 581–600 (1997)

  23. Correa, R., Lemaréchal, C.: Convergence of some algorithms for convex minimization. Math. Program. 62(1–3), 261–275 (1993)

  24. Cyrus, S., Hu, B., Van Scoy, B., Lessard, L.: A robust accelerated optimization algorithm for strongly convex functions. In: 2018 Annual American Control Conference (ACC), pp. 1376–1381 (2018)

  25. de Klerk, E., Glineur, F., Taylor, A.B.: On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optim. Lett. 11(7), 1185–1199 (2017)

  26. de Klerk, E., Glineur, F., Taylor, A.B.: Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM J. Optim. 30(3), 2053–2082 (2020)

  27. Devolder, O.: First-order methods with inexact oracle: the strongly convex case. CORE Discussion Papers (2013)

  28. Devolder, O., Glineur, F., Nesterov, Y.: First-order methods of smooth convex optimization with inexact oracle. Math. Program. 146(1–2), 37–75 (2014)

  29. Dixit, R., Bedi, A.S., Tripathi, R., Rajawat, K.: Online learning with inexact proximal online gradient descent algorithms. IEEE Trans. Signal Process. 67(5), 1338–1352 (2019)

  30. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82, 421–439 (1956)

  31. Dragomir, R.A., Taylor, A.B., d’Aspremont, A., Bolte, J.: Optimal complexity and certification of Bregman first-order methods. Math. Program. 194, 41–83 (2022)

  32. Drori, Y.: Contributions to the complexity analysis of optimization algorithms. Ph.D. thesis, Tel-Aviv University (2014)

  33. Drori, Y., Taylor, A.B.: Efficient first-order methods for convex minimization: a constructive approach. Math. Program. 184(1), 183–220 (2020)

  34. Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. 145(1–2), 451–482 (2014)

  35. Drori, Y., Teboulle, M.: An optimal variant of Kelley’s cutting-plane method. Math. Program. 160(1–2), 321–351 (2016)

  36. Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. Ph.D. thesis, Massachusetts Institute of Technology (1989)

  37. Eckstein, J.: Approximate iterations in Bregman-function-based proximal algorithms. Math. Program. 83(1–3), 113–123 (1998)

  38. Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55(1–3), 293–318 (1992)

  39. Eckstein, J., Silva, P.J.: A practical relative error criterion for augmented Lagrangians. Math. Program. 141(1–2), 319–348 (2013)

  40. Eckstein, J., Yao, W.: Augmented Lagrangian and alternating direction methods for convex optimization: a tutorial and some illustrative computational results. RUTCOR Res. Rep. 32(3), 44 (2012)

  41. Eckstein, J., Yao, W.: Approximate ADMM algorithms derived from Lagrangian splitting. Comput. Optim. Appl. 68(2), 363–405 (2017)

  42. Eckstein, J., Yao, W.: Relative-error approximate versions of Douglas–Rachford splitting and special cases of the ADMM. Math. Program. 170(2), 417–444 (2018)

  43. Fortin, M., Glowinski, R.: On decomposition-coordination methods using an Augmented Lagrangian. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems. North-Holland, Amsterdam (1983)

  44. Fuentes, M., Malick, J., Lemaréchal, C.: Descentwise inexact proximal algorithms for smooth optimization. Comput. Optim. Appl. 53(3), 755–769 (2012)

  45. Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems. North-Holland, Amsterdam (1983)

  46. Golub, G.H., Van Loan, C.F.: Matrix Computations. JHU Press, Baltimore (1996)

  47. Gu, G., Yang, J.: On the optimal ergodic sublinear convergence rate of the relaxed proximal point algorithm for variational inequalities. Preprint arXiv:1905.06030 (2019)

  48. Gu, G., Yang, J.: Optimal nonergodic sublinear convergence rate of proximal point algorithm for maximal monotone inclusion problems. Preprint arXiv:1904.05495 (2019)

  49. Güler, O.: New proximal point algorithms for convex minimization. SIAM J. Optim. 2(4), 649–664 (1992)

  50. Hu, B., Lessard, L.: Dissipativity theory for Nesterov’s accelerated method. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 1549–1557. JMLR (2017)

  51. Iusem, A.N.: Augmented Lagrangian methods and proximal point methods for convex optimization. Investig. Oper. 8(11–49), 7 (1999)

  52. Kim, D.: Accelerated proximal point method for maximally monotone operators. Math. Program. 190, 57–87 (2021)

  53. Kim, D., Fessler, J.A.: Optimized first-order methods for smooth convex minimization. Math. Program. 159(1–2), 81–107 (2016)

  54. Kim, D., Fessler, J.A.: Another look at the fast iterative shrinkage/thresholding algorithm (FISTA). SIAM J. Optim. 28(1), 223–250 (2018)

  55. Kim, D., Fessler, J.A.: Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. J. Optim. Theory Appl. 188(1), 192–219 (2021)

  56. Lemaire, B.: About the convergence of the proximal method. In: Advances in Optimization, pp. 39–51. Springer (1992)

  57. Lessard, L., Recht, B., Packard, A.: Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26(1), 57–95 (2016)

  58. Lieder, F.: On the convergence rate of the Halpern-iteration. Optim. Lett. 15, 405–418 (2021)

  59. Lin, H., Mairal, J., Harchaoui, Z.: A universal catalyst for first-order optimization. In: Advances in Neural Information Processing Systems, pp. 3384–3392 (2015)

  60. Lin, H., Mairal, J., Harchaoui, Z.: Catalyst acceleration for first-order convex optimization: from theory to practice. J. Mach. Learn. Res. 18(212), 1–54 (2018)

  61. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)

  62. Löfberg, J.: YALMIP: a toolbox for modeling and optimization in MATLAB. In: Proceedings of the CACSD Conference (2004)

  63. Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Revue Française d’Informatique et de Recherche Opérationnelle 4, 154–158 (1970)

  64. Martinet, B.: Détermination approchée d’un point fixe d’une application pseudo-contractante. cas de l’application prox. Comptes rendus hebdomadaires des séances de l’Académie des sciences de Paris 274, 163–165 (1972)

  65. Megretski, A., Rantzer, A.: System analysis via integral quadratic constraints. IEEE Trans. Autom. Control 42(6), 819–830 (1997)

  66. Millán, R.D., Machado, M.P.: Inexact proximal \(\epsilon \)-subgradient methods for composite convex optimization problems. J. Global Optim. 75(4), 1029–1060 (2019)

  67. Monteiro, R.D., Svaiter, B.F.: On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean. SIAM J. Optim. 20(6), 2755–2787 (2010)

  68. Monteiro, R.D., Svaiter, B.F.: An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM J. Optim. 23(2), 1092–1125 (2013)

  69. Moreau, J.J.: Fonctions convexes duales et points proximaux dans un espace hilbertien. Comptes rendus hebdomadaires des séances de l’Académie des sciences de Paris 255, 2897–2899 (1962)

  70. Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93, 273–299 (1965)

  71. Mosek, A.: The MOSEK optimization software. http://www.mosek.com (2010)

  72. Nemirovski, A.: Prox-method with rate of convergence \(O(1/t)\) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1), 229–251 (2004)

  73. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(\(1/k^2\)). Soviet Math. Doklady 27, 372–376 (1983)

  74. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)

  75. Nesterov, Y.: Inexact accelerated high-order proximal-point methods. Technical report, CORE discussion paper (2020)

  76. Nesterov, Y.: Inexact high-order proximal-point methods with auxiliary search procedure. Technical report, CORE discussion paper (2020)

  77. Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 127–239 (2014)

  78. Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)

  79. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)

  80. Rockafellar, R.T.: A dual approach to solving nonlinear programming problems by unconstrained optimization. Math. Program. 5(1), 354–373 (1973)

  81. Rockafellar, R.T.: Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2), 97–116 (1976)

  82. Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control. Optim. 14(5), 877–898 (1976)

  83. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1996)

  84. Ryu, E.K., Boyd, S.: Primer on monotone operator methods. Appl. Comput. Math. 15(1), 3–43 (2016)

  85. Ryu, E.K., Taylor, A.B., Bergeling, C., Giselsson, P.: Operator splitting performance estimation: tight contraction factors and optimal parameter selection. SIAM J. Optim. 30(3), 2251–2271 (2020)

  86. Ryu, E.K., Vũ, B.C.: Finding the forward-Douglas–Rachford-forward method. J. Optim. Theory Appl. 184, 858–876 (2019)

  87. Salzo, S., Villa, S.: Inexact and accelerated proximal point algorithms. J. Convex Anal. 19(4), 1167–1192 (2012)

  88. Schmidt, M., Le Roux, N., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in neural information processing systems (NIPS), pp. 1458–1466 (2011)

  89. Simonetto, A., Jamali-Rad, H.: Primal recovery from consensus-based dual decomposition for distributed convex optimization. J. Optim. Theory Appl. 168(1), 172–197 (2016)

  90. Solodov, M.V., Svaiter, B.F.: A hybrid approximate extragradient-proximal point algorithm using the enlargement of a maximal monotone operator. Set-Valued Anal. 7(4), 323–345 (1999)

  91. Solodov, M.V., Svaiter, B.F.: A hybrid projection-proximal point algorithm. J. Convex Anal. 6(1), 59–70 (1999)

  92. Solodov, M.V., Svaiter, B.F.: A comparison of rates of convergence of two inexact proximal point algorithms. In: Nonlinear optimization and related topics, pp. 415–427. Springer (2000)

  93. Solodov, M.V., Svaiter, B.F.: Error bounds for proximal point subproblems and associated inexact proximal point algorithms. Math. Program. 88(2), 371–389 (2000)

  94. Solodov, M.V., Svaiter, B.F.: An inexact hybrid generalized proximal point algorithm and some new results on the theory of Bregman functions. Math. Oper. Res. 25(2), 214–230 (2000)

  95. Solodov, M.V., Svaiter, B.F.: A unified framework for some inexact proximal point algorithms. Numer. Funct. Anal. Optim. 22(7–8), 1013–1035 (2001)

  96. Sturm, J.F.: Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Methods Softw. 11–12, 625–653 (1999)

  97. Svaiter, B.F.: A weakly convergent fully inexact Douglas–Rachford method with relative error tolerance. Preprint arXiv:1809.02312 (2018)

  98. Taylor, A., Bach, F.: Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. In: Proceedings of the Thirty-Second Conference on Learning Theory (COLT), vol. 99, pp. 2934–2992. PMLR (2019)

  99. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. 27(3), 1283–1313 (2017)

  100. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In: IEEE 56th Annual Conference on Decision and Control (CDC), pp. 1278–1283 (2017)

  101. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program. 161(1–2), 307–345 (2017)

  102. Toker, O., Ozbay, H.: On the NP-hardness of solving bilinear matrix inequalities and simultaneous stabilization with static output feedback. In: 1995 Annual American Control Conference (ACC), vol. 4, pp. 2525–2526 (1995)

  103. Van Scoy, B., Freeman, R.A., Lynch, K.M.: The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Lett. 2(1), 49–54 (2018)

  104. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward-backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)

  105. Zong, C., Tang, Y., Cho, Y.: Convergence analysis of an inexact three-operator splitting algorithm. Symmetry 10(11), 563 (2018)

Acknowledgements

The authors would like to thank Ernest Ryu for insightful feedback on a preliminary version of this manuscript. The authors also thank the two referees and the associate editor who helped improve this manuscript.

Author information

Corresponding author

Correspondence to Mathieu Barré.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MB acknowledges support from an AMX fellowship. The authors acknowledge support from the European Research Council (grant SEQUOIA 724063). This work was funded in part by the French government under the management of the Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute).

Appendices

A More examples of fixed-step inexact proximal methods

This section extends the list of examples from Sect. 3.1.2.

  • The hybrid approximate extragradient algorithm (see [90] or [67, Section 4]) can be described as

    $$\begin{aligned} x_{k+1}=x_k-\eta _{k+1}g_{k+1},\end{aligned}$$

    such that \(\exists u_{k+1}, {{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(u_{k+1},g_{k+1};\,x_{k}) \leqslant \tfrac{\sigma ^2}{2}{\Vert u_{k+1}-x_{k}\Vert ^2}\) (see Lemma 1 for a link between \(\varepsilon \)-subgradient formulation and primal-dual gap). One iteration of this form can be artificially cast into three iterations of (2) as

    $$\begin{aligned}\left\{ \begin{array}{rcl} w_{3k+1} &{}=&{} w_{3k}-e_{3k}\\ w_{3k+2} &{}=&{} w_{3k+1}-e_{3k+1}\\ w_{3k+3} &{}=&{} w_{3k+2} +e_{3k}+e_{3k+1} - \eta _{k+1}v_{3k+2} \\ \end{array}\right. \end{aligned}$$

    with \(v_{3k+2} \in \partial h(w_{3k+2})\) . This corresponds to setting \(\lambda _{3k+1} = \lambda _{3k+2} =\lambda _{3k+3} = 0\), \(\alpha _{3k+3,3k+2}=\eta _{k+1}\), \(\beta _{3k+1,3k} = \beta _{3k+2,3k+1}=1\), \(\beta _{3k+3,3k+1}=\beta _{3k+3,3k+2} = -1\) and the other parameters to zero. Notice that \(w_{3k+3} = w_{3k} - \eta _{k+1}v_{3k+2}\). By requiring \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{3k+1},v_{3k+2};\,w_{3k}) \leqslant \tfrac{\sigma ^2}{2}{\Vert w_{3k+1}-w_{3k}\Vert ^2}\) we can identify the primal-dual pair \((u_{k+1},g_{k+1})\) with \((w_{3k+1},v_{3k+2})\) and iterates \(x_{k+1}\) with \(w_{3k+3}\). In addition, we set

    $$\begin{aligned} \text {EQ}_{3k+1}&=0,\\ \text {EQ}_{3k+2}&=0,\\ \text {EQ}_{3k+3}&= {{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{3k+1},v_{3k+2};\,w_{3k}) - \tfrac{\sigma ^2}{2}{\Vert w_{3k+1}-w_{3k}\Vert ^2}. \end{aligned}$$

    Using \(v_{3k+2}\in \partial h (w_{3k+2})\), we have \(h^*(v_{3k+2}) = {\langle v_{3k+2}; w_{3k+2}\rangle } - h(w_{3k+2})\) and thus

    $$\begin{aligned} \text {EQ}_{3k+3} =&\,\tfrac{1}{2}{\Vert w_{3k+1}-w_{3k+3}\Vert ^2}+\eta _{k+1}(h(w_{3k+1})-h(w_{3k+2}) \\&- {\langle v_{3k+2}; w_{3k+1}-w_{3k+2}\rangle })-\tfrac{\sigma ^2}{2}{\Vert w_{3k+1}-w_{3k}\Vert ^2}, \end{aligned}$$

    which complies with (3) and is Gram-representable.

  • The inexact accelerated proximal point algorithm IAPPA1 in its form from [87, Section 5] can be written as

    $$\begin{aligned}\left\{ \begin{array}{rcl} t_{k+1}&{}=&{} \tfrac{1+\sqrt{1+4t_k^2\tfrac{\eta _{k+1}}{\eta _{k+2}}}}{2} \\ x_{k+1} &{}=&{} y_{k} - \eta _{k+1}(g_{k+1}+r_{k+1}) \\ y_{k+1} &{}=&{} x_{k+1} + \tfrac{t_k-1}{t_{k+1}}(x_{k+1}-x_k) \end{array}\right. \end{aligned}$$

    with \(t_0 = 1\), \(\{\eta _k\}_k\) a sequence of step sizes, \(y_0=x_0\in \mathbb {R}^d\) along with an inexactness criterion of the form \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(x_{k+1},g_{k+1};y_k) \leqslant \varepsilon _{k+1}\) given a nonnegative sequence \(\{\varepsilon _k\}_k\). Similarly to Güler’s method we get the recursive formulation

    $$\begin{aligned} x_{k+2} = \left( 1+\tfrac{t_k-1}{t_{k+1}}\right) x_{k+1} - \tfrac{t_k-1}{t_{k+1}}x_k - \eta _{k+2}(g_{k+2}+r_{k+2}). \end{aligned}$$

    We consider particular iterations from (2) of the form

    $$\begin{aligned}\left\{ \begin{array}{rcl} w_{2k+1} &{}=&{} w_{2k} - e_{2k} \\ w_{2k+2} &{}=&{} w_{2k+1} -\displaystyle \sum _{i=1}^{2k+1}\alpha _{2k+2,i}v_{i} -\sum _{i=0}^{2k+1}\beta _{2k+2,i}e_i, \end{array}\right. \end{aligned}$$

    with initial iterate \(w_0=x_0\). We aim at finding parameters \(\alpha _{i,j}\), \(\beta _{i,j}\) such that we can identify \(\{w_{2k}\}_k\) with \(\{x_k\}_k\) (i.e., any sequence \(\{x_k\}_k\) can be obtained as a sequence \(\{w_{2k}\}_k\)). We set \(\alpha _{2k+2,2k+1}=\beta _{2k+2,2k+1} = \eta _{k+1}\), \(\alpha _{2k+2,i} = \tfrac{t_{k-1}-1}{t_k}\alpha _{2k,i}\) for \(i=1,\ldots ,2k-1\) and \(\beta _{2k+2,i} = \tfrac{t_{k-1}-1}{t_k}\beta _{2k,i}\) for \(i\in \{0,\ldots ,2k-1\}\backslash \{2(k-1)\}\) as well as \(\beta _{2k+2,2k} = -1\) and \(\beta _{2k+2,2(k-1)} = \tfrac{t_{k-1}-1}{t_k}(1+\beta _{2k,2(k-1)})\).

    This gives

    $$\begin{aligned} w_{2(k+1)} =\,&w_{2k+1} + e_{2k} - \tfrac{t_{k-1}-1}{t_k}(e_{2(k-1)}) -\tfrac{t_{k-1}-1}{t_k}\displaystyle \sum _{i=1}^{2k-1}\alpha _{2k,i}v_{i} \\&-\tfrac{t_{k-1}-1}{t_k}\sum _{i=0}^{2k-1}\beta _{2k,i}e_i - \eta _{k+1}(v_{2k+1}+e_{2k+1})\\ =&\, (1+\tfrac{t_{k-1}-1}{t_k})w_{2k} -\tfrac{t_{k-1}-1}{t_k}w_{2(k-1)} - \eta _{k+1}(v_{2k+1}+e_{2k+1}), \end{aligned}$$

    which shows that \(\{w_{2k}\}_k\) follows the same recursive equation as \(\{x_{k}\}_k\). In addition, we have \(w_0 = x_0\) and \(w_2 = x_0 - \eta _{1}(v_1+e_1)\), which mirrors \(x_1 = x_0 -\eta _1(g_1+r_1)\). Requiring \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{2k+2},v_{2k+1};\,w_{2k+2}+\eta _{k+1}(v_{2k+1}+e_{2k+1})) \leqslant \varepsilon _{k+1}\) (with the convention \(w_{-1}=w_0\)) allows us to identify the primal-dual pair \((x_{k+1},g_{k+1})\) with \((w_{2k+2},v_{2k+1})\).

    Finally, we can set \(\text {EQ}_{2k+2} = {{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{2k+2},v_{2k+1};\,w_{2k+2}+\eta _{k+1}(v_{2k+1}+e_{2k+1})) - \varepsilon _{k+1}\) which is Gram-representable (similar to hybrid approximate extragradient algorithm).

    Note that we can proceed similarly for IAPPA2 from [87, Section 5] with the sequence \(\{a_k\}_k\) constant and equal to 1, by removing the “type 2” errors given by the sequence \(\{r_k\}_k\).

  • The accelerated hybrid proximal extragradient algorithm (A-HPE) [68, Section 3] can be written as

    $$\begin{aligned}\left\{ \begin{array}{rcl} a_{k+1} &{}=&{} \tfrac{\eta _{k+1}+\sqrt{\eta _{k+1}^2+4\eta _{k+1}A_k}}{2}\\ A_{k+1} &{}=&{} A_k + a_{k+1}\\ \tilde{x}_k &{}=&{} y_k + \tfrac{a_{k+1}}{A_{k+1}}(x_k-y_k)\\ y_{k+1} &{}=&{} \tilde{x}_k - \eta _{k+1}(g_{k+1}+r_{k+1})\\ x_{k+1} &{}=&{} x_k - a_{k+1}g_{k+1}, \end{array}\right. \end{aligned}$$

    with \(A_0=0\), \(\{\eta _k\}_k\) a sequence of step sizes, and \(y_0=x_0\in \mathbb {R}^d\), along with an inexactness criterion of the form \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(y_{k+1},g_{k+1};\tilde{x}_k) \leqslant \tfrac{\sigma ^2}{2}{\Vert y_{k+1}-\tilde{x}_k\Vert ^2}\) given a parameter \(\sigma \in [0,1]\). As in the previous examples, we search for a recursive equation satisfied by the sequence \(\{y_k\}_k\). By performing multiple substitutions, we obtain

    $$\begin{aligned} y_{k+2}=&\,\tilde{x}_{k+1} - \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\tfrac{A_{k+1}}{A_{k+2}}y_{k+1} + \tfrac{a_{k+2}}{A_{k+2}}x_{k+1}- \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\tfrac{A_{k+1}}{A_{k+2}}y_{k+1} + \tfrac{a_{k+2}}{A_{k+2}}\left( x_{k}- a_{k+1}g_{k+1}\right) - \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\tfrac{A_{k+1}}{A_{k+2}}y_{k+1} + \tfrac{a_{k+2}}{A_{k+2}}\left( \tfrac{A_{k+1}}{a_{k+1}}\tilde{x}_k - \tfrac{A_k}{a_{k+1}}y_{k} - a_{k+1}g_{k+1}\right) - \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\tfrac{A_{k+1}}{A_{k+2}}y_{k+1} + \tfrac{a_{k+2}}{A_{k+2}}\left( \tfrac{A_{k+1}}{a_{k+1}}\left( y_{k+1}+\eta _{k+1}(g_{k+1}+r_{k+1}) \right) \right. \\&\left. - \tfrac{A_k}{a_{k+1}}y_{k} - a_{k+1}g_{k+1}\right) - \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\left( \tfrac{A_{k+1}}{A_{k+2}}+\tfrac{a_{k+2}A_{k+1}}{A_{k+2}a_{k+1}}\right) y_{k+1}-\tfrac{a_{k+2}A_k}{A_{k+2}a_{k+1}}y_k + \tfrac{a_{k+2}}{A_{k+2}}\left( \tfrac{A_{k+1}}{a_{k+1}}\eta _{k+1} - a_{k+1}\right) g_{k+1}\\&+\tfrac{a_{k+2}A_{k+1}}{A_{k+2}a_{k+1}}\eta _{k+1}r_{k+1}- \eta _{k+2}(g_{k+2}+r_{k+2})\\ =&\,\left( 1+\tfrac{a_{k+2}A_{k}}{A_{k+2}a_{k+1}}\right) y_{k+1}-\tfrac{a_{k+2}A_k}{A_{k+2}a_{k+1}}y_k + \tfrac{a_{k+2}}{A_{k+2}}\left( \tfrac{A_{k+1}}{a_{k+1}}\eta _{k+1} - a_{k+1}\right) g_{k+1}\\&+\tfrac{a_{k+2}A_{k+1}}{A_{k+2}a_{k+1}}\eta _{k+1}r_{k+1}- \eta _{k+2}(g_{k+2}+r_{k+2}). \end{aligned}$$

    Similar to IAPPA1, we consider particular iterations from (2) of the form

    $$\begin{aligned}\left\{ \begin{array}{rcl} w_{2k+1} &{}=&{} w_{2k} - e_{2k} \\ w_{2k+2} &{}=&{} w_{2k+1} -\displaystyle \sum _{i=1}^{2k+1}\alpha _{2k+2,i}v_{i} -\sum _{i=0}^{2k+1}\beta _{2k+2,i}e_i, \end{array}\right. \end{aligned}$$

    with initial iterate \(w_0=x_0\). We aim at finding parameters \(\alpha _{i,j}\), \(\beta _{i,j}\) such that we can identify \(\{w_{2k}\}_k\) with \(\{y_k\}_k\) (i.e., any sequence \(\{y_k\}_k\) can be obtained as a sequence \(\{w_{2k}\}_k\)). We set \(\alpha _{2(k+1),2k+1}=\beta _{2(k+1),2k+1} = \eta _{k+1}\), \(\alpha _{2(k+1),i} = \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\alpha _{2k,i}\) for \(i\in \{1,\ldots ,2(k-1)\}\) and \(\beta _{2(k+1),i} = \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\beta _{2k,i}\) for \(i\in \{0,\ldots ,2k-3\}\) as well as \(\beta _{2(k+1),2k} = -1\), \(\beta _{2(k+1),2k-1} = \tfrac{a_{k+1}}{A_{k+1}a_{k}}(A_{k-1}\beta _{2k,2k-1} -A_k\eta _{k} )\), \(\beta _{2(k+1),2(k-1)} = \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}(1+\beta _{2k,2(k-1)})\) and \(\alpha _{2(k+1),2k-1} = \tfrac{a_{k+1}}{A_{k+1}}\left( \tfrac{A_{k-1}}{a_k}\alpha _{2k,2k-1}-\tfrac{A_{k}}{a_{k}}\eta _{k} + a_{k}\right) \).

    This gives

    $$\begin{aligned} w_{2(k+1)} =&\,w_{2k+1} +e_{2k} +\tfrac{a_{k+1}A_k}{A_{k+1}a_{k}}\eta _{k}e_{2k-1} - \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_k}e_{2(k-1)}\\ {}&+ \tfrac{a_{k+1}}{A_{k+1}}\left( \tfrac{A_{k}}{a_{k}}\eta _{k} - a_{k}\right) v_{2k-1}-\tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\sum _{i=1}^{2k-1}\alpha _{2k,i}v_i \\ {}&- \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\sum _{i=0}^{2k-1}\beta _{2k,i}e_i - \eta _{k+1}(v_{2k+1}+e_{2k+1})\\ =&\,w_{2k} +\tfrac{a_{k+1}A_k}{A_{k+1}a_{k}}\eta _{k}e_{2k-1} - \tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_k}e_{2(k-1)}+ \tfrac{a_{k+1}}{A_{k+1}}\left( \tfrac{A_{k}}{a_{k}}\eta _{k} - a_{k}\right) v_{2k-1}\\ {}&+\tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}(w_{2k}-w_{2(k-1)}+e_{2(k-1)}) - \eta _{k+1}(v_{2k+1}+e_{2k+1})\\ =&\,\left( 1+\tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}\right) w_{2k}-\tfrac{a_{k+1}A_{k-1}}{A_{k+1}a_{k}}w_{2(k-1)}+\tfrac{a_{k+1}}{A_{k+1}}\left( \tfrac{A_{k}}{a_{k}}\eta _{k} - a_{k}\right) v_{2k-1}\\&+\tfrac{a_{k+1}A_k}{A_{k+1}a_{k}}\eta _{k}e_{2k-1} - \eta _{k+1}(v_{2k+1}+e_{2k+1}), \end{aligned}$$

    which shows that \(\{w_{2k}\}_k\) follows the same recursive equation as \(\{y_{k}\}_k\). In addition, we have \(w_0 = x_0 = y_0\) and \(w_2 = y_0 - \eta _{1}(v_1+e_1)\), which mirrors \(y_1 = x_0 -\eta _1(g_1+r_1)\). Requiring \({{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{2(k+1)},v_{2k+1};\,w_{2k+2}+\eta _{k+1}(v_{2k+1}+e_{2k+1})) \leqslant \tfrac{\sigma ^2}{2}{\Vert w_{2(k+1)}-w_{2k}\Vert ^2}\) allows us to identify the primal-dual pair \((y_{k+1},g_{k+1})\) with \((w_{2(k+1)},v_{2k+1})\).

    Finally, we set \(\text {EQ}_{2k+2} = {{\,\mathrm{\textrm{PD}}\,}}_{\eta _{k+1} h}(w_{2k+2},v_{2k+1};\,w_{2k+2}+\eta _{k+1}(v_{2k+1}+e_{2k+1})) - \tfrac{\sigma ^2}{2}{\Vert w_{2(k+1)}-w_{2k}\Vert ^2}\) which is Gram-representable (similar to hybrid approximate extragradient algorithm).
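As a sanity check on the relative-error criterion shared by the examples above, the following minimal numerical sketch (ours, not part of the original development; the quadratic test function and all parameter values are arbitrary choices) evaluates the primal-dual gap of the proximal subproblem \(\min _u\{\eta h(u)+\tfrac{1}{2}\Vert u-x\Vert ^2\}\) for \(h(w)=\tfrac{1}{2}\Vert w\Vert ^2\), taking \({{\,\mathrm{\textrm{PD}}\,}}_{\eta h}(u,g;x)\) to be the primal value at \(u\) minus the Fenchel dual value at \(g\), consistent with the Gram-representable expressions above, and then checks the condition \({{\,\mathrm{\textrm{PD}}\,}}_{\eta h}(u,g;x)\leqslant \tfrac{\sigma ^2}{2}\Vert u-x\Vert ^2\) for an approximate proximal point.

```python
import numpy as np

# Minimal sketch (our own toy setup): h(w) = 0.5*||w||^2, so that
# h*(g) = 0.5*||g||^2 and prox_{eta*h}(x) = x / (1 + eta).
rng = np.random.default_rng(0)
d, eta, sigma = 5, 0.7, 0.5
x = rng.standard_normal(d)

def pd_gap(u, g):
    """Primal value of the prox subproblem at u minus its Fenchel dual value at g."""
    h = lambda w: 0.5 * (w @ w)
    h_conj = lambda s: 0.5 * (s @ s)
    primal = eta * h(u) + 0.5 * ((u - x) @ (u - x))
    dual = eta * (g @ x) - 0.5 * eta**2 * (g @ g) - eta * h_conj(g)
    return primal - dual

# Exact proximal point: the gap vanishes (up to round-off).
u_exact = x / (1 + eta)
print(pd_gap(u_exact, (x - u_exact) / eta))  # ~0

# Approximate proximal point u, with g = grad h(u) a true (sub)gradient at u.
u = u_exact + 0.01 * rng.standard_normal(d)
g = u
lhs, rhs = pd_gap(u, g), 0.5 * sigma**2 * ((u - x) @ (u - x))
print(lhs, rhs, lhs <= rhs)  # relative-error criterion PD <= (sigma^2/2)*||u - x||^2
```

Any convex \(h\) with an explicitly computable conjugate could be used in place of the quadratic.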

B Interpolation with \(\varvec{\varepsilon }\)-subdifferentials

In this section, we provide the necessary interpolation result for working with \(\varepsilon \)-subdifferentials inside performance estimation problems.

Theorem B.1

Let I be a finite set of indices and \(S=\{(w_i,v_i,h_i,\varepsilon _i)\}_{i\in I}\) with \(w_i,v_i\in \mathbb {R}^d\), \(h_i,\varepsilon _i\in \mathbb {R}\) for all \(i\in I\). There exists \(h\in {{\,\mathrm{\mathcal {F}_{0,\infty }}\,}}(\mathbb {R}^d)\) satisfying

$$\begin{aligned} h_i = h(w_i),\text { and } v_i\in \partial _{\varepsilon _i}h(w_i) \text { for all } i\in I \end{aligned}$$
(24)

if and only if

$$\begin{aligned} \begin{aligned} h_i\geqslant h_j +{\langle v_j; w_i-w_j\rangle }-\varepsilon _j \end{aligned} \end{aligned}$$
(25)

holds for all \(i,j\in I\).

Proof

\((\Rightarrow )\) Assuming \(h\in {{\,\mathrm{\mathcal {F}_{0,\infty }}\,}}\) and (24), the inequalities (25) hold by definition.

\((\Leftarrow )\) Assuming (25) holds for all \(i,j\in I\), one can perform the following construction:

$$\begin{aligned} \begin{aligned} \tilde{h}(x)=\max _i\{h_i+{\langle v_i; x-w_i\rangle }-\varepsilon _i\}, \end{aligned} \end{aligned}$$

and one can easily check that \(h=\tilde{h}\in {{\,\mathrm{\mathcal {F}_{0,\infty }}\,}}\) satisfies (24). \(\square \)
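For a concrete illustration of the \((\Rightarrow )\) direction, the sketch below (our own toy data, not from the original text) generates a set \(S\) from the function \(h(x)=\tfrac{1}{2}\Vert x\Vert ^2\), for which \(v\in \partial _{\varepsilon }h(w)\) exactly when \(\tfrac{1}{2}\Vert v-w\Vert ^2\leqslant \varepsilon \) (by the Fenchel-Young characterization of \(\varepsilon \)-subgradients), and verifies that the inequalities (25) hold for every pair of indices, as guaranteed by Theorem B.1.

```python
import numpy as np

# Minimal sketch (our own toy data): h(x) = 0.5*||x||^2, for which
# v is an eps-subgradient of h at w exactly when 0.5*||v - w||^2 <= eps.
rng = np.random.default_rng(1)
d, n = 3, 6

w = rng.standard_normal((n, d))            # points w_i
delta = 0.3 * rng.standard_normal((n, d))  # perturbations of the exact gradients
v = w + delta                              # eps_i-subgradients at w_i
eps = 0.5 * np.sum(delta**2, axis=1)       # smallest admissible eps_i
h = 0.5 * np.sum(w**2, axis=1)             # exact values h_i = h(w_i)

# Interpolation inequalities (25): h_i >= h_j + <v_j; w_i - w_j> - eps_j for all i, j.
ok = all(
    h[i] >= h[j] + v[j] @ (w[i] - w[j]) - eps[j] - 1e-12
    for i in range(n) for j in range(n)
)
print(ok)  # True, by the necessity direction of Theorem B.1
```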

C Equivalence with Güler’s method

In this section, we show that the optimized algorithm (ORI-PPA) and Güler’s second method [49, Section 6] are equivalent (i.e., they produce the same iterates) in the case of exact proximal computations (i.e., \(\sigma =0\)).

We consider a constant sequence of step sizes \(\{\lambda _k\}_k\) with \(\lambda _k = \lambda >0\). In Güler’s second method, the sequence \(\{\beta _k\}_k\) is defined as \(\beta _1 = 1\) and

$$\begin{aligned} \beta _{k+1} = \tfrac{1+\sqrt{4\beta _k^2+1}}{2}.\end{aligned}$$

The sequence \(\{A_k\}_k\) generated by (ORI-PPA) satisfies \(A_0=0\) and

$$\begin{aligned}A_{k+1} = A_k+\tfrac{\lambda +\sqrt{4\lambda A_k + \lambda ^2}}{2},\quad k\geqslant 0. \end{aligned}$$

We can link together these two sequences through the following equality

$$\begin{aligned} \beta _{k} = \tfrac{A_{k}-A_{k-1}}{\lambda }, \quad k\geqslant 1. \end{aligned}$$
(26)

Let us prove it by induction. First, observe that \(\beta _1 = 1\) and \(\tfrac{A_1-A_0}{\lambda } = 1\). Then, assuming that the property holds for some \(k \geqslant 1\), we have

$$\begin{aligned} \beta _{k+1}&= \tfrac{1+\sqrt{4\beta _k^2 +1}}{2}\\&= \tfrac{1+\sqrt{4\tfrac{(A_{k+1}-A_k)^2}{\lambda ^2} +1}}{2}. \end{aligned}$$

One might notice that

$$\begin{aligned}(A_{k+1}-A_k)^2 = \tfrac{2\lambda ^2+4\lambda A_k+2\lambda \sqrt{4\lambda A_k + \lambda ^2}}{4}= \lambda A_{k+1},\end{aligned}$$

giving

$$\begin{aligned} \beta _{k+1}&= \tfrac{1+\sqrt{4\tfrac{A_{k+1}}{\lambda } +1}}{2}\\&= \tfrac{\lambda + \sqrt{4\lambda A_{k+1} + \lambda ^2}}{2\lambda }\\&= \tfrac{A_{k+2}-A_{k+1}}{\lambda }, \end{aligned}$$

and we finally arrive at (26). In the case \(\sigma =0\), the iterations of (ORI-PPA) can be written as

$$\begin{aligned}\left\{ \begin{array}{ccl} y_k &{}=&{} x_k + \tfrac{\lambda }{A_{k+1}-A_k}(z_k-x_k) \\ x_{k+1} &{}=&{} \textrm{prox}_{\lambda h}(y_k)\\ z_{k+1} &{}=&{} z_k +\tfrac{2(A_{k+1}-A_k)}{\lambda }(x_{k+1}-y_{k}). \end{array}\right. \end{aligned}$$

Therefore, we can write

$$\begin{aligned} y_{k+1}&= x_{k+1} + \tfrac{\lambda }{A_{k+2}-A_{k+1}}\bigg (z_k +\tfrac{2(A_{k+1}-A_k)}{\lambda }(x_{k+1}-y_{k})-x_{k+1}\bigg )\\&=x_{k+1} + \tfrac{\lambda }{A_{k+2}-A_{k+1}}\bigg (x_k - \tfrac{A_{k+1}-A_k}{\lambda }(x_k-y_k) \\ {}&\quad +\tfrac{2(A_{k+1}-A_k)}{\lambda }(x_{k+1}-y_{k})-x_{k+1}\bigg )\\&=x_{k+1} + \tfrac{\lambda }{A_{k+2}-A_{k+1}}\left( \left( \tfrac{A_{k+1}-A_k}{\lambda }-1\right) (x_{k+1}-x_k) +\tfrac{A_{k+1}-A_k}{\lambda }(x_{k+1}-y_{k})\right) . \end{aligned}$$

Combining the last equality with (26) leads to

$$\begin{aligned} y_{k+1} = x_{k+1} + \tfrac{\beta _{k+1}-1}{\beta _{k+2}}(x_{k+1}-x_k) + \tfrac{\beta _{k+1}}{\beta _{k+2}}(x_{k+1}-y_k) \end{aligned}$$

which is exactly the update in Güler’s second method [49, Section 6] modulo a translation in the indices of the \(\{y_k\}_k\) sequence (indeed in Güler’s method \(y_1=x_0\) whereas in (ORI-PPA) \(y_0=x_0\)).
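The identity (26) can also be checked numerically. The short sketch below (our own; the constant step size \(\lambda \) and the horizon \(N\) are arbitrary choices) generates \(\{\beta _k\}_k\) and \(\{A_k\}_k\) from their respective recursions and compares \(\beta _k\) with \((A_k-A_{k-1})/\lambda \).

```python
import numpy as np

lam, N = 0.3, 25  # arbitrary constant step size and horizon (our choice)

# Guler's second method: beta_1 = 1, beta_{k+1} = (1 + sqrt(4*beta_k^2 + 1)) / 2.
beta = [1.0]
for _ in range(N - 1):
    beta.append((1 + np.sqrt(4 * beta[-1] ** 2 + 1)) / 2)

# (ORI-PPA): A_0 = 0, A_{k+1} = A_k + (lam + sqrt(4*lam*A_k + lam^2)) / 2.
A = [0.0]
for _ in range(N):
    A.append(A[-1] + (lam + np.sqrt(4 * lam * A[-1] + lam**2)) / 2)

# Identity (26): beta_k = (A_k - A_{k-1}) / lam for k >= 1.
print(max(abs(beta[k - 1] - (A[k] - A[k - 1]) / lam) for k in range(1, N + 1)))  # ~1e-14
```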

D Missing details in Theorem 1

The missing elements in the proof of Theorem 1 are presented below.

Proof

Let us rewrite the method in terms of a single sequence, by substitution of \(y_k\) and \(z_k\):

$$\begin{aligned} \begin{aligned} e_k&{:}{=}\tfrac{1}{\lambda _k}\left( y_{k-1} -\lambda _kg_k - x_k\right) \\ x_{k}&=\tfrac{\lambda _k}{A_{k}-A_{k-1}}\left( x_0-\tfrac{2}{1+\sigma }\sum _{i=1}^{k-1}(A_{i}-A_{i-1}) g_i\right) \\ {}&\quad +\left( 1-\tfrac{\lambda _k}{A_{k}-A_{k-1}}\right) x_{k-1} -\lambda _{k} (g_{k}+e_{k}),\\ \end{aligned} \end{aligned}$$
(27)

and let us state the following identity on the \(A_k\) coefficients

$$\begin{aligned} \lambda _{k+1}A_{k+1}=(A_{k+1}-A_k)^2 \text { (for}\, k\geqslant 0\text {)}. \end{aligned}$$
(28)
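For completeness, (28) follows directly from the definition of the sequence \(\{A_k\}_k\), assuming (as for the constant-step-size recursion recalled in Appendix C) that \(A_0=0\) and \(A_{k+1}=A_k+\tfrac{\lambda _{k+1}+\sqrt{4\lambda _{k+1}A_k+\lambda _{k+1}^2}}{2}\): squaring the increment gives

$$\begin{aligned} (A_{k+1}-A_k)^2&=\tfrac{2\lambda _{k+1}^2+4\lambda _{k+1}A_k+2\lambda _{k+1}\sqrt{4\lambda _{k+1}A_k+\lambda _{k+1}^2}}{4}\\&=\lambda _{k+1}\left( A_k+\tfrac{\lambda _{k+1}+\sqrt{4\lambda _{k+1}A_k+\lambda _{k+1}^2}}{2}\right) =\lambda _{k+1}A_{k+1}. \end{aligned}$$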

We prove the desired convergence result by induction. First, for \(N=1\)

$$\begin{aligned} \begin{aligned} 0 \geqslant&\, \nu _{\star ,1}\Bigg [h(u_1)-h(x_\star ) + {\langle g_1; x_\star -u_1\rangle }\Bigg ] + \nu _{1,1}\Bigg [h(u_1)-h(x_1)+{\langle g_1; x_1-u_1\rangle }\Bigg ] \\&+ \nu _1\Bigg [\tfrac{\lambda _1}{2}{\Vert e_1\Vert ^2}-\tfrac{\lambda _1\sigma ^2}{2}{\Vert e_1+g_1\Vert ^2} + h(x_1) - h(u_1) -{\langle g_1; x_1-u_1\rangle }\Bigg ] \end{aligned} \end{aligned}$$

with \(\nu _{\star ,1}= \tfrac{A_1-A_0}{1+\sigma }=\tfrac{A_1}{1+\sigma }\) as \(A_0=0\), \(\nu _{1,1} = \tfrac{(1-\sigma )A_1}{\sigma (1+\sigma )}\) and \(\nu _1 = \tfrac{A_1}{\sigma (1+\sigma )}\). This gives

$$\begin{aligned} 0 \geqslant&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{A_1}{1+\sigma }{\langle g_1; x_\star -x_1\rangle } + \tfrac{A_1}{\sigma (1+\sigma )}\Bigg [\tfrac{\lambda _1}{2}{\Vert e_1\Vert ^2}-\tfrac{\lambda _1\sigma ^2}{2}{\Vert e_1+g_1\Vert ^2}\Bigg ]\\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{A_1}{1+\sigma }{\langle g_1; x_\star -x_0 + \lambda _1(g_1+e_1)\rangle }\\ {}&+ \tfrac{A_1}{\sigma (1+\sigma )}\Bigg [\tfrac{\lambda _1}{2}{\Vert e_1\Vert ^2}-\tfrac{\lambda _1\sigma ^2}{2}{\Vert e_1+g_1\Vert ^2}\Bigg ]\\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{1}{2}{\langle 2\tfrac{A_1}{1+\sigma }g_1; x_\star -x_0\rangle } + {\langle \tfrac{A_1}{1+\sigma }g_1; \lambda _1(g_1+e_1)\rangle }\\ {}&+ \tfrac{A_1}{\sigma (1+\sigma )}\left[ \tfrac{\lambda _1}{2}{\Vert e_1\Vert ^2}-\tfrac{\lambda _1\sigma ^2}{2}{\Vert e_1+g_1\Vert ^2}\right] \\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{1}{4}{\Vert x_\star -x_0+2\tfrac{A_1}{1+\sigma }g_1\Vert ^2} -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2} -{\Vert \tfrac{A_1}{1+\sigma }g_1\Vert ^2} \\ {}&+{\langle \tfrac{A_1}{1+\sigma }g_1; \lambda _1(g_1+e_1)\rangle }+ \tfrac{A_1}{\sigma (1+\sigma )}\left[ \tfrac{\lambda _1}{2}{\Vert e_1\Vert ^2}-\tfrac{\lambda _1\sigma ^2}{2}{\Vert e_1+g_1\Vert ^2}\right] \\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{1}{4}{\Vert x_\star -x_0+2\tfrac{A_1}{1+\sigma }g_1\Vert ^2} -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2} + \tfrac{A_1\lambda _1(1-\sigma )}{2\sigma }{\Vert e_1\Vert ^2} \\&+ \tfrac{A_1\lambda _1(1-\sigma )}{1+\sigma }{\langle g_1; e_1\rangle } + \tfrac{A_1}{1+\sigma }\left( -\tfrac{A_1}{1+\sigma } + \lambda _1 - \tfrac{\lambda _1\sigma }{2} \right) {\Vert g_1\Vert ^2} \\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{1}{4}{\Vert x_\star -x_0+2\tfrac{A_1}{1+\sigma }g_1\Vert ^2} -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2} \\&+ \tfrac{A_1\lambda _1(1-\sigma )}{2\sigma }{\Vert e_1+\tfrac{\sigma }{1+\sigma }g_1\Vert ^2} + \tfrac{A_1}{1+\sigma }\left( -\tfrac{A_1}{1+\sigma } + \lambda _1 - \tfrac{\lambda _1\sigma }{2} -\tfrac{\lambda _1(1-\sigma )\sigma }{2(1+\sigma )}\right) {\Vert g_1\Vert ^2} \\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{1}{4}{\Vert x_\star -x_0+2\tfrac{A_1}{1+\sigma }g_1\Vert ^2} -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2} \\&+ \tfrac{A_1\lambda _1(1-\sigma )}{2\sigma }{\Vert e_1+\tfrac{\sigma }{1+\sigma }g_1\Vert ^2} + \tfrac{A_1}{1+\sigma }\left( \tfrac{\lambda _1-A_1}{1+\sigma }\right) {\Vert g_1\Vert ^2} \\ =&\,\tfrac{A_1}{1+\sigma }(h(x_1)-h_\star ) + \tfrac{1}{4}{\Vert x_\star -x_0+2\tfrac{A_1}{1+\sigma }g_1\Vert ^2} -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2}\\ {}&+ \tfrac{A_1\lambda _1(1-\sigma )}{2\sigma }{\Vert e_1+\tfrac{\sigma }{1+\sigma }g_1\Vert ^2}, \end{aligned}$$

where we used in the last line that \(A_1 = \lambda _1\).

Now, assuming the weighted sum can be reformulated as the desired inequality for \(N=k\), that is:

$$\begin{aligned} 0\geqslant&\, \tfrac{A_k}{1+\sigma }(h(x_k)-h_\star ) -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^{k}(A_{i}-A_{i-1})g_i\Vert ^2} \\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}, \end{aligned}$$

let us prove that it also holds for \(N=k+1\). Noticing that the weighted sum for \(k+1\) is exactly the weighted sum for \(k\) (which can be reformulated as desired, by the induction hypothesis) with four additional inequalities, we get the following valid inequality

$$\begin{aligned} 0\geqslant&\,\tfrac{A_k}{1+\sigma }(h(x_k)-h_\star ) -\tfrac{1}{4}{\Vert x_\star -x_0\Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^{k}(A_{i}-A_{i-1})g_i\Vert ^2} \\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2} \\&+ \tfrac{A_{k+1}-A_k}{1+\sigma }\Bigg [h(u_{k+1})-h_\star + {\langle g_{k+1}; x_\star -u_{k+1}\rangle }\Bigg ]\\&+ \tfrac{(1-\sigma )A_{k+1}}{(1+\sigma )\sigma }\Bigg [h(u_{k+1})-h(x_{k+1}) + {\langle g_{k+1}; x_{k+1}-u_{k+1}\rangle }\Bigg ]\\&+ \tfrac{A_k}{1+\sigma }\Bigg [h(u_{k+1})-h(x_{k}) + {\langle g_{k+1}; x_k-u_{k+1}\rangle }\Bigg ]\\&+ \tfrac{A_{k+1}}{(1+\sigma )\sigma }\left[ \tfrac{\lambda _{k+1}}{2}{\Vert e_{k+1}\Vert ^2} - \tfrac{\lambda _{k+1}\sigma ^2}{2}{\Vert e_{k+1}+g_{k+1}\Vert ^2} + h(x_{k+1}) - h(u_{k+1})\right. \\&\left. -{\langle g_{k+1}; x_{k+1}-u_{k+1}\rangle }\right] . \end{aligned}$$

By grouping all function values we get the following simplification:

$$\begin{aligned}&\left[ \tfrac{A_k}{1+\sigma }-\tfrac{A_k}{1+\sigma }\right] h(x_k)+\tfrac{A_{k+1}}{1+\sigma } \left[ \tfrac{1}{\sigma }-\tfrac{1-\sigma }{\sigma }\right] (h(x_{k+1})-h_\star )\\&\quad +\tfrac{1}{1+\sigma }\left[ A_{k+1}-A_k + \tfrac{1-\sigma }{\sigma }A_{k+1} + A_k - \tfrac{1}{\sigma }A_{k+1}\right] h(u_{k+1}) =\tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ), \end{aligned}$$

where \(h(x_k)\) and \(h(u_{k+1})\) cancelled out. The remaining inequality is therefore

$$\begin{aligned} 0\geqslant{} & {} \,\tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^k (A_i-A_{i-1}) g_i\Vert ^2}\nonumber \\{} & {} + \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}+ \tfrac{A_{k+1}\lambda _{k+1}}{2(1+\sigma )\sigma }\Bigg [{\Vert e_{k+1}\Vert ^2} - \sigma ^2{\Vert e_{k+1}+g_{k+1}\Vert ^2}\Bigg ] \nonumber \\ {}{} & {} + \tfrac{1}{1+\sigma }{\langle g_{k+1}; (A_{k+1}-A_k)(x_\star -u_{k+1}) - A_{k+1}(x_{k+1}-u_{k+1})+A_k(x_k-u_{k+1})\rangle }\nonumber \\ ={} & {} \, \tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^k (A_i-A_{i-1}) g_i\Vert ^2}\nonumber \\{} & {} + \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}+ \tfrac{A_{k+1}\lambda _{k+1}}{2(1+\sigma )\sigma }\Bigg [{\Vert e_{k+1}\Vert ^2} - \sigma ^2{\Vert e_{k+1}+g_{k+1}\Vert ^2}\Bigg ] \nonumber \\ {}{} & {} + \tfrac{1}{1+\sigma }{\langle g_{k+1}; (A_{k+1}-A_k)x_\star - A_{k+1}x_{k+1} + A_kx_k\rangle }. \end{aligned}$$
(29)

Then, by using (28), one can observe that

$$\begin{aligned} A_{k+1}x_{k+1} =&\, \tfrac{A_{k+1}\lambda _{k+1}}{A_{k+1}-A_{k}}\left( x_0-\tfrac{2}{1+\sigma }\sum _{i=1}^{k}(A_{i}-A_{i-1}) g_i\right) +\left( A_{k+1}-\tfrac{A_{k+1}\lambda _{k+1}}{A_{k+1}-A_{k}}\right) x_{k} \\&-A_{k+1}\lambda _{k+1} (g_{k+1}+e_{k+1})\\ =&\, (A_{k+1}-A_k)\left( x_0-\tfrac{2}{1+\sigma }\sum _{i=1}^{k}(A_{i}-A_{i-1}) g_i\right) + A_kx_k\\&\quad -A_{k+1}\lambda _{k+1} (g_{k+1}+e_{k+1}), \end{aligned}$$

and by substituting this back into the last line of (29), we get

$$\begin{aligned} 0\geqslant&\, \tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^k (A_i-A_{i-1}) g_i\Vert ^2}\\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}+ \tfrac{A_{k+1}\lambda _{k+1}}{2(1+\sigma )\sigma }\Bigg [{\Vert e_{k+1}\Vert ^2} - \sigma ^2{\Vert e_{k+1}+g_{k+1}\Vert ^2}\Bigg ] \\ {}&+ \tfrac{1}{1+\sigma }{\langle (A_{k+1}-A_k)g_{k+1}; x_\star - x_0 +\tfrac{2}{1+\sigma }\sum _{i=1}^{k}(A_{i}-A_{i-1}) g_i \rangle }\\&+ \tfrac{A_{k+1}\lambda _{k+1}}{1+\sigma }{\langle g_{k+1}; (g_{k+1}+e_{k+1})\rangle }. \end{aligned}$$

We can then proceed as in the case \(N=1\) to factorize the quadratic terms,

$$\begin{aligned} 0\geqslant&\, \tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^{k+1} (A_i-A_{i-1}) g_i\Vert ^2}\\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}+ \tfrac{A_{k+1}\lambda _{k+1}}{2(1+\sigma )\sigma }\Bigg [{\Vert e_{k+1}\Vert ^2} - \sigma ^2{\Vert e_{k+1}+g_{k+1}\Vert ^2}\Bigg ] \\ {}&-\tfrac{(A_{k+1}-A_k)^2}{(1+\sigma )^2}{\Vert g_{k+1}\Vert ^2} + \tfrac{A_{k+1}\lambda _{k+1}}{1+\sigma }{\langle g_{k+1}; (g_{k+1}+e_{k+1})\rangle }\\ =&\, \tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^{k+1} (A_i-A_{i-1}) g_i\Vert ^2}\\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^k A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}+ \tfrac{A_{k+1}\lambda _{k+1}(1-\sigma )}{2\sigma }{\Vert e_{k+1}+\tfrac{\sigma }{1+\sigma }g_{k+1}\Vert ^2} \\ {}&+\left[ \tfrac{A_{k+1}\lambda _{k+1}}{(1+\sigma )}-\tfrac{A_{k+1}\lambda _{k+1}\sigma }{2(1+\sigma )}-\tfrac{(A_{k+1}-A_k)^2}{(1+\sigma )^2} - \tfrac{A_{k+1}\lambda _{k+1}\sigma (1-\sigma )}{2(1+\sigma )^2}\right] {\Vert g_{k+1}\Vert ^2} \\ =&\, \tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^{k+1} (A_i-A_{i-1}) g_i\Vert ^2}\\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^{k+1} A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}+\tfrac{A_{k+1}\lambda _{k+1}}{(1+\sigma )}\Bigg [1-\tfrac{\sigma }{2}-\tfrac{1}{(1+\sigma )} - \tfrac{\sigma (1-\sigma )}{2(1+\sigma )}\Bigg ]{\Vert g_{k+1}\Vert ^2}\\ =&\tfrac{A_{k+1}}{1+\sigma }(h(x_{k+1})-h_\star ) -\tfrac{1}{4}{\Vert x_0-x_\star \Vert ^2} + \tfrac{1}{4}{\Vert x_\star -x_0+\tfrac{2}{1+\sigma }\sum _{i=1}^{k+1} (A_i-A_{i-1}) g_i\Vert ^2}\\&+ \tfrac{(1-\sigma )}{2\sigma }\sum _{i=1}^{k+1} A_i\lambda _i{\Vert e_i+\tfrac{\sigma }{1+\sigma }g_i\Vert ^2}, \end{aligned}$$

since \(1-\tfrac{\sigma }{2}-\tfrac{1}{1+\sigma } - \tfrac{\sigma (1-\sigma )}{2(1+\sigma )} = 0\), which concludes the proof. \(\square \)

E Tightness of Theorem 1

Proof

The guarantee for (ORI-PPA) provided by Theorem 1 is non-improvable. That is, for all \(\{\lambda _k\}_k\) with \(\lambda _k>0\), \(\sigma \in [0,1]\), \(d\in \mathbb {N}\), \(x_0\in \mathbb {R}^d\), and \(N\in \mathbb {N}\), there exists \(f\in \mathcal {F}_{0, \infty }(\mathbb {R}^d)\) such that this bound is achieved with equality. To prove this statement, it is sufficient to exhibit a one-dimensional function for which the bound is attained, which is what we do below. The bound is attained on the one-dimensional (constrained to the nonnegative orthant) linear minimization problem

$$\begin{aligned} \min _x\, \{f(x)\triangleq c\, x + i_{\mathbb {R}_+}(x)\}, \end{aligned}$$
(30)

with an appropriate choice of \(c> 0\), where \(i_{\mathbb {R}_+}\) denotes the convex indicator function of \(\mathbb {R}_+\). Indeed, one can check that the relative error criterion

$$\begin{aligned} \exists u_{k}\in \mathbb {R}_+,\;\tfrac{\lambda _k}{2}{\Vert e_k\Vert ^2} + f(x_k)-f(u_k) -{\langle g_k; x_k-u_k\rangle }\leqslant \tfrac{\lambda _k\sigma ^2}{2}{\Vert e_k+g_{k}\Vert ^2} \end{aligned}$$

is satisfied with equality when picking \(g_{k}=c\) (\(g_k\) is thus a subgradient at \(x_k\geqslant 0\)), \(u_k=x_k\), and \(e_{k}=-\tfrac{c\sigma }{1+\sigma }\); and hence \(x_{k}=y_{k-1}-\tfrac{c\lambda _k}{1+\sigma }\). The argument is then as follows: if for some \(x_0>0\) and \(0\leqslant h\leqslant x_0/c\) we manage to show that \(x_N=x_0-c h\), then \(f(x_N)-f(x_\star )=c(x_0-c h)\) and hence the value of c producing the worst possible (maximal) value of \(f(x_N)\) is \(c=\tfrac{x_0}{2h}\). In that case, the resulting value is \(f(x_N)-f(x_\star )=\tfrac{x_0^2}{4h}\). Therefore, in order to prove that the guarantee from Theorem 1 cannot be improved, we show that \(x_N=x_0-\tfrac{A_N}{1+\sigma }c\) on the linear problem (30). It is easy to show that \(x_1=x_0-\tfrac{A_1}{1+\sigma }c\) using \(A_1=\lambda _1\). The argument follows by induction: assuming \(x_{k}=x_0-\tfrac{A_k}{1+\sigma }c\), one can compute

$$\begin{aligned} x_{k+1}= & {} \, \tfrac{\lambda _{k+1}}{A_{k+1}-A_{k}}\left( x_0-\tfrac{2}{1+\sigma }\sum _{i=1}^{k}(A_{i}-A_{i-1}) g_i\right) +\left( 1-\tfrac{\lambda _{k+1}}{A_{k+1}-A_{k}}\right) x_{k} \\ {}{} & {} -\lambda _{k+1} (g_{k+1}+e_{k+1})\\= & {} \, \tfrac{\lambda _{k+1}}{A_{k+1}-A_{k}}\left( x_0-\tfrac{2c}{1+\sigma }A_{k}\right) +\left( 1-\tfrac{\lambda _{k+1}}{A_{k+1}-A_{k}}\right) \left( x_0-\tfrac{A_k}{1+\sigma }c\right) -\lambda _{k+1} \tfrac{c}{1+\sigma }\\= & {} \,x_0-\tfrac{c}{1+\sigma }\tfrac{2\lambda _{k+1}A_k + (A_{k+1}-A_k)A_k - \lambda _{k+1}A_k + \lambda _{k+1}(A_{k+1}-A_k)}{A_{k+1}-A_k}\\= & {} \,x_0-\tfrac{c}{1+\sigma }\tfrac{ (A_{k+1}-A_k)A_k + \lambda _{k+1}A_{k+1}}{A_{k+1}-A_k}\\= & {} \,x_0-\tfrac{c}{1+\sigma }A_{k+1}, \end{aligned}$$

where the second equality follows from simple substitutions, and the last equalities follow from basic algebra and \(\lambda _{k+1}A_{k+1} = (A_{k+1}-A_k)^2\). The desired statement is proved by picking \(c=\tfrac{(1+\sigma )x_0}{2A_N}\), reaching \(f(x_N)-f(x_\star ) = \tfrac{(1+\sigma )x_0^2}{4A_N}\). \(\square \)
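The worst-case construction above can be replayed numerically. The sketch below (ours; the values of \(\sigma \), \(x_0\), \(N\) and the step sizes are arbitrary choices) runs the single-sequence form (27) on the linear problem (30) with \(g_k=c\) and \(e_k=-\tfrac{c\sigma }{1+\sigma }\), and checks that \(x_N=x_0-\tfrac{A_N}{1+\sigma }c\) and that \(f(x_N)-f(x_\star )\) attains \(\tfrac{(1+\sigma )x_0^2}{4A_N}\) for \(c=\tfrac{(1+\sigma )x_0}{2A_N}\).

```python
import numpy as np

# Minimal sketch (our own parameter values): run the single-sequence form (27) of (ORI-PPA)
# on the worst-case instance f(x) = c*x + i_{R+}(x) with g_k = c and e_k = -c*sigma/(1+sigma),
# and compare f(x_N) - f(x_*) with (1+sigma)*x0^2 / (4*A_N).
sigma, x0, N = 0.4, 1.0, 10
rng = np.random.default_rng(2)
lam = rng.uniform(0.1, 1.0, size=N + 1)  # step sizes lambda_1..lambda_N (lam[0] unused)

# A_0 = 0 and A_{k+1} taken as the larger root of (28): lambda_{k+1}*A_{k+1} = (A_{k+1}-A_k)^2.
A = [0.0]
for k in range(1, N + 1):
    A.append(A[-1] + (lam[k] + np.sqrt(4 * lam[k] * A[-1] + lam[k] ** 2)) / 2)

c = (1 + sigma) * x0 / (2 * A[N])  # worst-case slope
g, e = c, -c * sigma / (1 + sigma)

x = [x0]
for k in range(1, N + 1):
    past = sum((A[i] - A[i - 1]) * g for i in range(1, k))
    xk = (lam[k] / (A[k] - A[k - 1])) * (x0 - 2 / (1 + sigma) * past) \
        + (1 - lam[k] / (A[k] - A[k - 1])) * x[-1] - lam[k] * (g + e)
    x.append(xk)

print(x[N], x0 - A[N] * c / (1 + sigma))           # iterate matches the closed form (= x0/2 here)
print(c * x[N], (1 + sigma) * x0**2 / (4 * A[N]))  # f(x_N) - f(x_*) attains the bound
```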

F Tightness of Theorem 2

Proof

We show that the guarantee provided in Theorem 2 is non-improvable. That is, for all \(\mu \geqslant 0\), \(\{\lambda _k\}_k\) with \(\lambda _k\geqslant 0\), \(\sigma \in [0,1]\), \(d\in \mathbb {N}\), \(w_0\in \mathbb {R}^d\), and \(N\in \mathbb {N}\), there exists \(h\in \mathcal {F}_{\mu , \infty }(\mathbb {R}^d)\) such that this bound is achieved with equality. Indeed, the bound is attained on the simple quadratic minimization problem

$$\begin{aligned} \min _x \{h(x) \triangleq \tfrac{\mu }{2}{\Vert x\Vert ^2}\}. \end{aligned}$$
(31)

We can check that the relative error criterion

$$\begin{aligned} \tfrac{\lambda _k}{2}{\Vert e_k\Vert ^2} \leqslant \tfrac{\sigma ^2\lambda _k}{2}{\Vert e_k+v_k\Vert ^2}, \end{aligned}$$

is satisfied with equality when picking \(v_k = \nabla h(w_{k+1}) = \mu w_{k+1}\) and \(e_k = -\tfrac{\sigma }{1+\sigma }v_k\). Under these choices, one can write

$$\begin{aligned}w_{k+1} = w_k -\tfrac{\lambda _{k+1}\mu }{1+\sigma }w_{k+1}, \end{aligned}$$

which leads to \(w_{k+1} = \tfrac{1+\sigma }{1+\sigma + \lambda _{k+1}\mu }w_k\), hence

$$\begin{aligned} w_{N} = \prod _{i=1}^{N}\tfrac{1+\sigma }{1+\sigma +\lambda _{i}\mu }w_0,\end{aligned}$$

and the desired result follows. \(\square \)
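The same computation can be checked numerically. The sketch below (ours; the values of \(\mu \), \(\sigma \), \(N\) and the step sizes are arbitrary choices) performs the inexact proximal steps on (31) with \(v_k=\mu w_{k+1}\) and \(e_k=-\tfrac{\sigma }{1+\sigma }v_k\), verifies that the relative error criterion is tight at every iteration, and compares \(w_N\) with the closed-form product above.

```python
import numpy as np

# Minimal sketch (our own parameter values): on h(x) = (mu/2)*||x||^2, perform the inexact
# proximal steps with v_k = mu*w_{k+1} and e_k = -sigma/(1+sigma)*v_k, check that the relative
# error criterion is tight at each iteration, and compare w_N with the closed-form product.
mu, sigma, N = 2.0, 0.3, 8
rng = np.random.default_rng(3)
lam = rng.uniform(0.1, 1.0, size=N + 1)  # step sizes lambda_1..lambda_N (lam[0] unused)
w0 = rng.standard_normal(4)

w = w0.copy()
for k in range(1, N + 1):
    # The implicit step w_{k+1} = w_k - lam_k*(v_k + e_k) with the choices above reduces to
    # w_{k+1} = (1+sigma)/(1+sigma+lam_k*mu) * w_k.
    w_next = (1 + sigma) / (1 + sigma + lam[k] * mu) * w
    v, e = mu * w_next, -sigma / (1 + sigma) * mu * w_next
    assert np.isclose(lam[k] / 2 * (e @ e), sigma**2 * lam[k] / 2 * ((e + v) @ (e + v)))
    assert np.allclose(w_next, w - lam[k] * (v + e))
    w = w_next

w_closed = np.prod([(1 + sigma) / (1 + sigma + lam[i] * mu) for i in range(1, N + 1)]) * w0
print(np.allclose(w, w_closed))  # True
```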

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Barré, M., Taylor, A.B. & Bach, F. Principled analyses and design of first-order methods with inexact proximal operators. Math. Program. 201, 185–230 (2023). https://doi.org/10.1007/s10107-022-01903-7

  • DOI: https://doi.org/10.1007/s10107-022-01903-7
