Abstract
Several methods have recently been proposed for sparse optimization that make careful use of second-order information (Hsieh et al. in Sparse inverse covariance matrix estimation using quadratic approximation. In: NIPS, 2011; Yuan et al. in An improved GLMNET for l1-regularized logistic regression and support vector machines. National Taiwan University, Taipei City, 2011; Olsen et al. in Newton-like methods for sparse inverse covariance estimation. In: NIPS, 2012; Byrd et al. in A family of second-order methods for convex l1-regularized optimization. Technical report, 2012) to improve local convergence rates. These methods construct a composite quadratic approximation using Hessian information, optimize this approximation using a first-order method such as coordinate descent, and employ a line search to ensure sufficient descent. Here we propose a general framework that includes slightly modified versions of existing algorithms as well as a new algorithm based on limited-memory BFGS Hessian approximations, and we provide a novel global convergence rate analysis that covers methods solving subproblems via coordinate descent.
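As a schematic illustration of the class of methods the abstract describes (not the paper's exact algorithm), the following Python sketch builds the composite quadratic model, approximately minimizes it by cyclic coordinate descent, and applies a backtracking line search with a sufficient-decrease condition. All function names (`soft`, `cd_subproblem`, `prox_quasi_newton`) are illustrative, and for simplicity an explicit positive-definite Hessian is passed in place of a limited-memory BFGS approximation.

```python
import numpy as np

def soft(a, t):
    """Soft-thresholding operator S_t(a) = sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def cd_subproblem(x, g, H, lam, sweeps=20):
    """Approximately minimize the composite quadratic model
       q(d) = g^T d + 0.5 d^T H d + lam * ||x + d||_1
    by cyclic coordinate descent; H is assumed to have a positive diagonal."""
    n = len(x)
    d = np.zeros(n)
    for _ in range(sweeps):
        for j in range(n):
            a = H[j, j]
            # partial derivative of the smooth part of q w.r.t. d_j (excluding the a*d_j term)
            b = g[j] + H[j] @ d - a * d[j]
            # closed-form coordinate minimizer: new value of x_j + d_j
            u = soft(x[j] - b / a, lam / a)
            d[j] = u - x[j]
    return d

def prox_quasi_newton(f, grad, H_fn, x0, lam, iters=50, sigma=1e-4, beta=0.5):
    """Outer loop for F(x) = f(x) + lam * ||x||_1: model step from
    coordinate descent, then backtracking until sufficient descent holds."""
    x = x0.copy()
    F = lambda z: f(z) + lam * np.abs(z).sum()
    for _ in range(iters):
        g, H = grad(x), H_fn(x)
        d = cd_subproblem(x, g, H, lam)
        if np.linalg.norm(d) < 1e-12:
            break
        # predicted decrease of the composite objective (negative for a descent step)
        delta = g @ d + lam * (np.abs(x + d).sum() - np.abs(x).sum())
        t = 1.0
        while F(x + t * d) > F(x) + sigma * t * delta:
            t *= beta
            if t < 1e-10:
                break
        x = x + t * d
    return x
```

On a small lasso instance, where f is quadratic and the model is exact, a unit step is accepted and the iterates converge to the l1-regularized minimizer.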
References
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2, 183–202 (2009)
Becker, S., Fadili, J.: A Quasi-Newton Proximal Splitting Method. In: Pereira, F., Burges, C., Bottou, L., Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 2618–2626. Curran Associates, Inc., Red Hook (2012)
Byrd, R., Chin, G., Nocedal, J., Oztoprak, F.: A family of second-order methods for convex l1-regularized optimization. Technical report (2012)
Byrd, R., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for convex l-1 regularized optimization. Technical report (2013)
Byrd, R.H., Nocedal, J., Schnabel, R.B.: Representations of quasi-Newton matrices and their use in limited memory methods. Math. Program. 63, 129–156 (1994)
Cartis, C., Gould, N.I.M., Toint, P.L.: Evaluation complexity of adaptive cubic regularization methods for convex unconstrained optimization. Optim. Methods Softw. 27, 197–219 (2012)
Donoho, D.: De-noising by soft-thresholding. IEEE Trans. Inf. Theory 41, 613–627 (1995)
Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostat. Oxf. Engl. 9, 432–441 (2008)
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010)
Hsieh, C.-J., Sustik, M., Dhillon, I., Ravikumar, P.: Sparse inverse covariance matrix estimation using quadratic approximation. In: NIPS (2011)
Jiang, K.F., Sun, D.F., Toh, K.C.: An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP. SIAM J. Optim. 3, 1042–1064 (2012)
Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for convex optimization. In: NIPS (2012)
Lewis, A.S., Wright, S.J.: Identifying activity. SIAM J. Optim. 21, 597–614 (2011)
Li, L., Toh, K.-C.: An inexact interior point method for L1-regularized sparse covariance selection. Math. Program. 2, 291–315 (2010)
Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Discussion Paper (2007)
Nesterov, Y.E.: Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization, vol. 87. Kluwer Academic Publishers, Boston (2004)
Nesterov, Y.E., Polyak, B.T.: Cubic regularization of Newton method and its global performance. Math. Program. 108, 177–205 (2006)
Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research, 2nd edn. Springer, New York (2006)
Olsen, P.A., Oztoprak, F., Nocedal, J., Rennie, S.J.: Newton-like methods for sparse inverse covariance estimation. In: NIPS (2012)
Qin, Z., Scheinberg, K., Goldfarb, D.: Efficient block-coordinate descent algorithms for the group lasso. Math. Program. Comput. 5, 143–169 (2013)
Richtárik, P., Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math Program. 144(1–2), 1–38 (2014)
Scheinberg, K., Ma, S., Goldfarb, D.: Sparse inverse covariance selection via alternating linearization methods. In: NIPS (2010)
Scheinberg, K., Rish, I.: SINCO: a greedy coordinate ascent method for sparse inverse covariance selection problem. Technical report (2009)
Schmidt, M., Kim, D., Sra, S.: Projected Newton-type methods in machine learning. In: Optimization for Machine Learning, p. 305. MIT Press, Cambridge (2012)
Schmidt, M., Le Roux, N., Bach, F.: Supplementary material for the paper "Convergence rates of inexact proximal-gradient methods for convex optimization". In: NIPS (2011)
Shalev-Shwartz, S., Tewari, A.: Stochastic methods for l1 regularized loss minimization. In: ICML, pp. 929–936 (2009)
Tang, X.: Optimization in machine learning, Ph.D. thesis, Lehigh University (2015)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58, 267–288 (1996)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117, 387–423 (2009)
Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)
Wytock, M., Kolter, Z.: Sparse Gaussian conditional random fields: algorithms, theory, and application to energy forecasting. In: Dasgupta, S., McAllester, D. (eds.) Proceedings of the 30th International Conference on Machine Learning (ICML-13), JMLR Workshop and Conference Proceedings, vol. 28, pp. 1265–1273 (2013)
Yuan, G.-X., Chang, K.-W., Hsieh, C.-J., Lin, C.-J.: A comparison of optimization methods and software for large-scale l1-regularized linear classification. JMLR 11, 3183–3234 (2010)
Yuan, G.-X., Ho, C.-H., Lin, C.-J.: An improved GLMNET for l1-regularized logistic regression and support vector machines. National Taiwan University, Taipei City (2011)
Additional information
The work of Katya Scheinberg is partially supported by NSF Grants DMS 10-16571, DMS 13-19356, AFOSR Grant FA9550-11-1-0239, and DARPA Grant FA 9550-12-1-0406 negotiated by AFOSR. The work of Xiaocheng Tang is partially supported by DARPA Grant FA 9550-12-1-0406 negotiated by AFOSR.
Appendix
Proof of Lemma 2.
Proof
Let \(p_{\phi }(v)\) denote \(p_{H,\phi }(v)\) for brevity. From (3.6) and (2.1), we have
Also
by the definition of \(\phi \)-subgradient, and
due to the convexity of f. Here \(\gamma _g(\cdot )\) is any subgradient of \(g(\cdot )\), and \(\gamma _g(p_{\phi }(v))\) is a \(\phi \)-subgradient that satisfies the first-order optimality conditions for the \(\phi \)-approximate minimizer from Lemma 1 with \(z=v-H^{-1}\nabla f(v)\), i.e.,
Summing (9.2) and (9.3) yields
Therefore, from (9.1), (9.4) and (9.5) it follows that
\(\square \)
Scheinberg, K., Tang, X. Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. 160, 495–529 (2016). https://doi.org/10.1007/s10107-016-0997-3