
Incremental proximal methods for large scale convex optimization

  • Full Length Paper
  • Series B
  • Published in: Mathematical Programming

Abstract

We consider the minimization of a sum \(\sum_{i=1}^m f_i(x)\) consisting of a large number of convex component functions \(f_i\). For this problem, incremental methods consisting of gradient or subgradient iterations applied to single components have proved very effective. We propose new incremental methods, consisting of proximal iterations applied to single components, as well as combinations of gradient, subgradient, and proximal iterations. We provide a convergence and rate of convergence analysis of a variety of such methods, including some that involve randomization in the selection of components. We also discuss applications in a few contexts, including signal processing and inference/machine learning.
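As a concrete illustration of the basic iteration, the sketch below applies a randomized incremental proximal step \(x_{k+1} = \arg\min_z \{ f_{i_k}(z) + \frac{1}{2\alpha_k}\|z - x_k\|^2 \}\) to least-squares components \(f_i(x) = \frac{1}{2}(a_i^T x - b_i)^2\), for which the proximal map has a closed form. It is a minimal sketch under those assumptions, not the paper's algorithm verbatim: the synthetic data, the \(1/k\) stepsize, and the helper name prox_component are illustrative choices.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact method):
# randomized incremental proximal iteration for
#     f(x) = sum_i 0.5 * (a_i @ x - b_i)**2.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: m quadratic components in dimension n (assumed for the example).
m, n = 200, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true + 0.01 * rng.standard_normal(m)

def prox_component(x, a_i, b_i, alpha):
    """Closed-form proximal map of alpha * 0.5*(a_i @ z - b_i)**2 at x, i.e.
    argmin_z 0.5*(a_i @ z - b_i)**2 + ||z - x||**2 / (2*alpha)."""
    residual = a_i @ x - b_i
    return x - alpha * residual / (1.0 + alpha * (a_i @ a_i)) * a_i

x = np.zeros(n)
for k in range(1, 50_001):
    i = rng.integers(m)        # random selection of a single component
    alpha_k = 1.0 / k          # diminishing stepsize (one common choice)
    x = prox_component(x, A[i], b[i], alpha_k)

print("final objective:", 0.5 * np.sum((A @ x - b) ** 2))
```

Replacing prox_component with a subgradient or gradient step on the selected component yields the combined gradient, subgradient, and proximal variants described above.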



Author information

Corresponding author: Dimitri P. Bertsekas.

Additional information

Laboratory for Information and Decision Systems Report LIDS-P-2847, August 2010 (revised March 2011); to appear in Math. Programming Journal, 2011. Research supported by AFOSR Grant FA9550-10-1-0412. Many thanks are due to Huizhen (Janey) Yu for extensive helpful discussions and suggestions.


About this article

Cite this article

Bertsekas, D.P. Incremental proximal methods for large scale convex optimization. Math. Program. 129, 163–195 (2011). https://doi.org/10.1007/s10107-011-0472-0

