ABSTRACT
Support vector machine (SVM) has been one of the most popular learning algorithms, with the central idea of maximizing the minimum margin, i.e., the smallest distance from the instances to the classification boundary. Recent theoretical results, however, have revealed that maximizing the minimum margin does not necessarily lead to better generalization performance; instead, the margin distribution has been proven to be more crucial. In this paper, we propose the Large margin Distribution Machine (LDM), which tries to achieve better generalization performance by optimizing the margin distribution. We characterize the margin distribution by its first- and second-order statistics, i.e., the margin mean and the margin variance. The LDM is a general learning approach which can be used wherever SVM can be applied, and its superiority is verified both theoretically and empirically in this paper.
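The abstract's central idea, characterizing the margin distribution by its first- and second-order statistics, can be sketched for the linear case as follows. This is a minimal illustration, not the paper's implementation: the margin definition gamma_i = y_i * w^T x_i and the objective form (regularizer plus weighted variance, minus weighted mean, plus hinge losses) follow the general LDM formulation, but the parameter names `lam1`, `lam2`, and `C` and the helper functions are assumptions introduced here.

```python
import numpy as np

def margin_stats(w, X, y):
    """Margin mean and variance for a linear classifier.

    Margins are defined as gamma_i = y_i * w^T x_i, so positive
    margins correspond to correctly classified instances.
    """
    margins = y * (X @ w)
    return margins.mean(), margins.var()

def ldm_objective(w, X, y, lam1=1.0, lam2=1.0, C=1.0):
    """Sketch of an LDM-style objective (illustrative names lam1/lam2/C):
    penalize margin variance, reward margin mean, keep hinge losses.
    """
    mean, var = margin_stats(w, X, y)
    hinge = np.maximum(0.0, 1.0 - y * (X @ w)).sum()
    return 0.5 * (w @ w) + lam1 * var - lam2 * mean + C * hinge

# Tiny example: three points, all with margin exactly 1.
w = np.array([1.0, 1.0])
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
m, v = margin_stats(w, X, y)  # mean 1.0, variance 0.0
```

Setting `lam1 = lam2 = 0` recovers the familiar SVM-style regularized hinge loss, which makes concrete the abstract's claim that LDM generalizes the setting where SVM applies.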
Index Terms
- Large margin distribution machine