
Online and Stochastic Universal Gradient Methods for Minimizing Regularized Hölder Continuous Finite Sums in Machine Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9077))

Abstract

Online and stochastic gradient methods have emerged as potent tools for large-scale optimization of both smooth and nonsmooth convex problems, drawn from the classes \(C^{1,1}(\mathbb{R}^p)\) and \(C^{1,0}(\mathbb{R}^p)\) respectively. To the best of our knowledge, however, little work has applied incremental gradient methods to the intermediate classes of convex problems with Hölder continuous gradients, \(C^{1,v}(\mathbb{R}^p)\). To bridge the gap between the methods for smooth and nonsmooth problems, we propose several online and stochastic universal gradient methods that do not require knowing the actual degree of smoothness of the objective function in advance. We thereby extend the scope of problems treated in machine learning to Hölder continuous functions and obtain a general family of first-order methods. Regret and convergence analyses show that our methods enjoy strong theoretical guarantees. For the first time, we establish algorithms with a linear convergence rate for convex functions that have Hölder continuous gradients.
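To make the "universal" mechanism in the abstract concrete: in Nesterov-style universal gradient methods the step size is chosen by backtracking against an inexact quadratic upper bound with slack \(\varepsilon/2\), so neither the Hölder exponent \(v\) nor the corresponding Hölder constant must be known in advance. The sketch below is a minimal, hypothetical illustration of that backtracking step on a deterministic toy problem; the function names and parameters are our own assumptions and this is not the paper's online or stochastic algorithm.

```python
# A minimal sketch of a universal (primal) gradient step in the spirit of
# Nesterov's universal gradient methods: the step size is found by backtracking
# against an inexact quadratic upper bound with slack eps/2, so neither the
# Hölder exponent v nor the Hölder constant of the gradient must be known.
# Names and the toy problem below are illustrative assumptions, not the
# paper's exact algorithm.

import numpy as np

def universal_gradient_step(f, grad_f, x, L, eps, max_backtracks=50):
    """One backtracking step: returns (x_next, L_next)."""
    g = grad_f(x)
    fx = f(x)
    for _ in range(max_backtracks):
        x_next = x - g / L                      # candidate gradient step
        lhs = f(x_next)
        rhs = fx + g.dot(x_next - x) + 0.5 * L * np.sum((x_next - x) ** 2) + 0.5 * eps
        if lhs <= rhs:                          # inexact descent condition holds
            return x_next, L / 2.0              # start the next search optimistically
        L *= 2.0                                # otherwise tighten the quadratic model
    return x_next, L

def universal_gradient_descent(f, grad_f, x0, eps=1e-3, L0=1.0, iters=200):
    """Run the universal gradient method from x0 for a fixed number of iterations."""
    x, L = x0.copy(), L0
    for _ in range(iters):
        x, L = universal_gradient_step(f, grad_f, x, L, eps)
    return x

if __name__ == "__main__":
    # Toy example: least squares, whose gradient is Lipschitz (v = 1).
    A = np.random.randn(50, 10)
    b = np.random.randn(50)
    f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
    grad_f = lambda x: A.T @ (A @ x - b)
    x_star = universal_gradient_descent(f, grad_f, np.zeros(10))
    print("final objective:", f(x_star))
```

The same backtracking condition applies for any \(v \in [0, 1]\), which is what lets one family of methods cover the smooth, nonsmooth, and intermediate Hölder classes without tuning to the unknown smoothness.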




Author information


Corresponding author

Correspondence to Ziqiang Shi.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Shi, Z., Liu, R. (2015). Online and Stochastic Universal Gradient Methods for Minimizing Regularized Hölder Continuous Finite Sums in Machine Learning. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9077. Springer, Cham. https://doi.org/10.1007/978-3-319-18038-0_29

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-18038-0_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18037-3

  • Online ISBN: 978-3-319-18038-0

  • eBook Packages: Computer Science, Computer Science (R0)
