Abstract
We consider the minimization of a sum \(\sum_{i=1}^m f_i(x)\) consisting of a large number of convex component functions \(f_i\). For this problem, incremental methods consisting of gradient or subgradient iterations applied to single components have proved very effective. We propose new incremental methods, consisting of proximal iterations applied to single components, as well as combinations of gradient, subgradient, and proximal iterations. We provide a convergence and rate of convergence analysis of a variety of such methods, including some that involve randomization in the selection of components. We also discuss applications in a few contexts, including signal processing and inference/machine learning.
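To make the basic iteration concrete, the following is a minimal sketch (not taken from the paper) of an incremental proximal method for quadratic components \(f_i(x) = \tfrac{1}{2}(a_i^\top x - b_i)^2\), for which the proximal step has a closed form. The function names and the fixed stepsize are illustrative assumptions; the paper analyzes far more general components and stepsize rules.

```python
import numpy as np

def prox_quadratic(z, a, b, alpha):
    # Closed-form proximal step for f(x) = 0.5*(a@x - b)**2:
    #   argmin_x  alpha*f(x) + 0.5*||x - z||^2
    return z - alpha * (a @ z - b) / (1.0 + alpha * (a @ a)) * a

def incremental_proximal(A, b, alpha=0.5, cycles=2000):
    # Cycle through the components f_i(x) = 0.5*(A[i]@x - b[i])**2,
    # applying one proximal step per component (fixed stepsize alpha).
    x = np.zeros(A.shape[1])
    for _ in range(cycles):
        for a_i, b_i in zip(A, b):
            x = prox_quadratic(x, a_i, b_i, alpha)
    return x

# Consistent 3x2 system with exact solution (1, 2).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
x = incremental_proximal(A, b)
```

Each inner step touches a single component, which is the point of the incremental approach: no pass ever forms the full gradient of the sum. For these quadratic components the step reduces to an under-relaxed Kaczmarz-type update, so on a consistent system it converges even with a constant stepsize.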
Additional information
Laboratory for Information and Decision Systems Report LIDS-P-2847, August 2010 (revised March 2011); to appear in Math. Programming Journal, 2011. Research supported by AFOSR Grant FA9550-10-1-0412. Many thanks are due to Huizhen (Janey) Yu for extensive helpful discussions and suggestions.
Cite this article
Bertsekas, D.P. Incremental proximal methods for large scale convex optimization. Math. Program. 129, 163–195 (2011). https://doi.org/10.1007/s10107-011-0472-0