Abstract
Several methods have recently been proposed for sparse optimization that make careful use of second-order information (Hsieh et al. in Sparse inverse covariance matrix estimation using quadratic approximation. In: NIPS, 2011; Yuan et al. in An improved GLMNET for l1-regularized logistic regression and support vector machines. National Taiwan University, Taipei City, 2011; Olsen et al. in Newton-like methods for sparse inverse covariance estimation. In: NIPS, 2012; Byrd et al. in A family of second-order methods for convex l1-regularized optimization. Technical report, 2012) to improve local convergence rates. These methods construct a composite quadratic approximation using Hessian information, optimize this approximation using a first-order method such as coordinate descent, and employ a line search to ensure sufficient descent. Here we propose a general framework that includes slightly modified versions of existing algorithms as well as a new algorithm based on limited-memory BFGS Hessian approximations, and we provide a novel global convergence rate analysis that covers methods solving subproblems via coordinate descent.
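As a schematic illustration of the class of methods the abstract describes (not the paper's exact algorithm), the following Python sketch builds the composite quadratic model, approximately minimizes it by cyclic coordinate descent, and applies a backtracking line search with a sufficient-decrease condition. All function names (`soft`, `cd_subproblem`, `prox_quasi_newton`) are illustrative, and for simplicity an explicit positive-definite Hessian is passed in place of a limited-memory BFGS approximation.

```python
import numpy as np

def soft(a, t):
    """Soft-thresholding operator S_t(a) = sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def cd_subproblem(x, g, H, lam, sweeps=20):
    """Approximately minimize the composite quadratic model
       q(d) = g^T d + 0.5 d^T H d + lam * ||x + d||_1
    by cyclic coordinate descent; H is assumed to have a positive diagonal."""
    n = len(x)
    d = np.zeros(n)
    for _ in range(sweeps):
        for j in range(n):
            a = H[j, j]
            # partial derivative of the smooth part of q w.r.t. d_j (excluding the a*d_j term)
            b = g[j] + H[j] @ d - a * d[j]
            # closed-form coordinate minimizer: new value of x_j + d_j
            u = soft(x[j] - b / a, lam / a)
            d[j] = u - x[j]
    return d

def prox_quasi_newton(f, grad, H_fn, x0, lam, iters=50, sigma=1e-4, beta=0.5):
    """Outer loop for F(x) = f(x) + lam * ||x||_1: model step from
    coordinate descent, then backtracking until sufficient descent holds."""
    x = x0.copy()
    F = lambda z: f(z) + lam * np.abs(z).sum()
    for _ in range(iters):
        g, H = grad(x), H_fn(x)
        d = cd_subproblem(x, g, H, lam)
        if np.linalg.norm(d) < 1e-12:
            break
        # predicted decrease of the composite objective (negative for a descent step)
        delta = g @ d + lam * (np.abs(x + d).sum() - np.abs(x).sum())
        t = 1.0
        while F(x + t * d) > F(x) + sigma * t * delta:
            t *= beta
            if t < 1e-10:
                break
        x = x + t * d
    return x
```

On a small lasso instance, where f is quadratic and the model is exact, a unit step is accepted and the iterates converge to the l1-regularized minimizer.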
References
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Becker, S., Fadili, J.: A Quasi-Newton Proximal Splitting Method. In: Pereira, F., Burges, C., Bottou, L., Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 2618–2626. Curran Associates, Inc., Red Hook (2012)
Byrd, R., Chin, G., Nocedal, J., Oztoprak, F.: A family of second-order methods for convex l1-regularized optimization. Technical report (2012)
Byrd, R., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for convex l-1 regularized optimization. Technical report (2013)
Byrd, R.H., Nocedal, J., Schnabel, R.B.: Representations of quasi-Newton matrices and their use in limited memory methods. Math. Program. 63, 129–156 (1994)
Cartis, C., Gould, N.I.M., Toint, P.L.: Evaluation complexity of adaptive cubic regularization methods for convex unconstrained optimization. Optim. Methods Softw. 27, 197–219 (2012)
Donoho, D.: De-noising by soft-thresholding. IEEE Trans. Inf. Theory 41, 613–627 (1995)
Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostat. Oxf. Engl. 9, 432–441 (2008)
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)
Hsieh, C.-J., Sustik, M., Dhillon, I., Ravikumar, P.: Sparse inverse covariance matrix estimation using quadratic approximation. In: NIPS (2011)
Jiang, K.F., Sun, D.F., Toh, K.C.: An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP. SIAM J. Optim. 3, 1042–1064 (2012)
Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for convex optimization. In: NIPS (2012)
Lewis, A.S., Wright, S.J.: Identifying activity. SIAM J. Optim. 21, 597–614 (2011)
Li, L., Toh, K.-C.: An inexact interior point method for L1-regularized sparse covariance selection. Math. Program. 2, 291–315 (2010)
Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Discussion Paper (2007)
Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, vol. 87. Kluwer Academic Publishers, Boston (2004)
Nesterov, Y.E., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108, 177–205 (2006)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research, 2nd edn. Springer, New York (2006)
Olsen, P.A., Oztoprak, F., Nocedal, J., Rennie, S.J.: Newton-like methods for sparse inverse covariance estimation. In: NIPS (2012)
Qin, Z., Scheinberg, K., Goldfarb, D.: Efficient block-coordinate descent algorithms for the group lasso. Math. Program. Comput. 5, 143–169 (2013)
Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math Program. 144(1–2), 1–38 (2014)
Scheinberg, K., Ma, S., Goldfarb, D.: Sparse inverse covariance selection via alternating linearization methods. In: NIPS (2010)
Scheinberg, K., Rish, I.: SINCO: a greedy coordinate ascent method for sparse inverse covariance selection problem. Technical report (2009)
Schmidt, M., Kim, D., Sra, S.: Projected Newton-type methods in machine learning. In: Optimization for Machine Learning, p. 305. MIT Press, Cambridge (2012)
Schmidt, M., Le Roux, N., Bach, F.: Supplementary material for the paper "Convergence rates of inexact proximal-gradient methods for convex optimization". In: NIPS (2011)
Shalev-Shwartz, S., Tewari, A.: Stochastic methods for l1 regularized loss minimization. In: ICML, pp. 929–936 (2009)
Tang, X.: Optimization in machine learning, Ph.D. thesis, Lehigh University (2015)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58, 267–288 (1996)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009)
Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)
Wytock, M., Kolter, Z.: Sparse Gaussian conditional random fields: algorithms, theory, and application to energy forecasting. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning (ICML-13), JMLR Workshop and Conference Proceedings, vol. 28, pp. 1265–1273 (2013)
Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: A comparison of optimization methods and software for large-scale l1-regularized linear classification. JMLR 11, 3183–3234 (2010)
Yuan, G.-X., Ho, C.-H., Lin, C.-J.: An improved GLMNET for l1-regularized logistic regression and support vector machines. National Taiwan University, Taipei City (2011)
Additional information
The work of Katya Scheinberg is partially supported by NSF Grants DMS 10-16571, DMS 13-19356, AFOSR Grant FA9550-11-1-0239, and DARPA Grant FA 9550-12-1-0406 negotiated by AFOSR. The work of Xiaocheng Tang is partially supported by DARPA Grant FA 9550-12-1-0406 negotiated by AFOSR.
Appendix
Proof of Lemma 2.
Proof
Let \(p_{\phi }(v)\) denote \(p_{H,\phi }(v)\) for brevity. From (3.6) and (2.1), we have
Also
by the definition of \(\phi \)-subgradient, and
due to the convexity of f. Here \(\gamma _g(\cdot )\) is any subgradient of \(g(\cdot )\), and \(\gamma _g(p_{\phi }(v))\) is a \(\phi \)-subgradient that satisfies the first-order optimality conditions for the \(\phi \)-approximate minimizer from Lemma 1 with \(z=v-H^{-1}\nabla f(v)\), i.e.,
Summing (9.2) and (9.3) yields
Therefore, from (9.1), (9.4) and (9.5) it follows that
\(\square \)
Scheinberg, K., Tang, X. Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. 160, 495–529 (2016). https://doi.org/10.1007/s10107-016-0997-3