DOI: 10.1145/2623330.2623710
Research article, KDD Conference Proceedings

Large margin distribution machine

Published: 24 August 2014

ABSTRACT

Support vector machine (SVM) has been one of the most popular learning algorithms, with the central idea of maximizing the minimum margin, i.e., the smallest distance from the instances to the classification boundary. Recent theoretical results, however, have revealed that maximizing the minimum margin does not necessarily lead to better generalization performance; instead, the margin distribution has been shown to be more crucial. In this paper, we propose the Large margin Distribution Machine (LDM), which tries to achieve better generalization performance by optimizing the margin distribution. We characterize the margin distribution by its first- and second-order statistics, i.e., the margin mean and the margin variance. The LDM is a general learning approach that can be used wherever SVM can be applied, and its superiority is verified both theoretically and empirically in this paper.
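The margin-distribution idea can be sketched in a few lines: for a linear classifier, the signed margin of instance i is y_i w^T x_i, and an LDM-style objective rewards a large margin mean and penalizes margin variance on top of the usual regularizer and hinge loss. The sketch below trains such a model by subgradient descent; the hyperparameter names (lam1, lam2, C, lr) and the plain subgradient solver are illustrative assumptions, not the paper's actual optimization algorithms.

```python
import numpy as np

def train_ldm(X, y, lam1=1.0, lam2=1.0, C=1.0, lr=0.01, epochs=500):
    """Subgradient descent on an LDM-style objective (sketch):
        0.5*||w||^2 + lam1*var(margins) - lam2*mean(margins) + C*hinge,
    where margins_i = y_i * w^T x_i."""
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        mu = margins.mean()
        yX = y[:, None] * X                      # row i is y_i * x_i
        grad_mean = yX.mean(axis=0)              # d(mean)/dw
        grad_var = 2.0 * ((margins - mu)[:, None] * yX).mean(axis=0)
        grad_hinge = -yX[margins < 1.0].sum(axis=0)  # subgradient of hinge sum
        w -= lr * (w + lam1 * grad_var - lam2 * grad_mean + C * grad_hinge)
    return w

# Toy usage: a linearly separable problem.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = train_ldm(X, y)
```
Unlike a plain soft-margin SVM, the -lam2*mean term keeps pushing all margins up (not just the smallest), while the lam1*var term discourages a few instances from having much smaller margins than the rest.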


• Published in

  KDD '14: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  August 2014, 2028 pages
  ISBN: 9781450329569
  DOI: 10.1145/2623330

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


Acceptance Rates

KDD '14 paper acceptance rate: 151 of 1,036 submissions (15%). Overall acceptance rate: 1,133 of 8,635 submissions (13%).
