ABSTRACT
Support vector machine (SVM) has been one of the most popular learning algorithms, with the central idea of maximizing the minimum margin, i.e., the smallest distance from the instances to the classification boundary. Recent theoretical results, however, have revealed that maximizing the minimum margin does not necessarily lead to better generalization performance; instead, the margin distribution has been proven to be more crucial. In this paper, we propose the Large margin Distribution Machine (LDM), which tries to achieve better generalization performance by optimizing the margin distribution. We characterize the margin distribution by its first- and second-order statistics, i.e., the margin mean and the margin variance. The LDM is a general learning approach which can be used wherever SVM can be applied, and its superiority is verified both theoretically and empirically in this paper.
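The abstract's central idea, characterizing the margin distribution by its first- and second-order statistics, can be sketched for the linear case as follows. This is a minimal illustration, not the paper's implementation: the margin definition gamma_i = y_i * w^T x_i and the objective form (regularizer plus weighted variance, minus weighted mean, plus hinge losses) follow the general LDM formulation, but the parameter names `lam1`, `lam2`, and `C` and the helper functions are assumptions introduced here.

```python
import numpy as np

def margin_stats(w, X, y):
    """Margin mean and variance for a linear classifier.

    Margins are defined as gamma_i = y_i * w^T x_i, so positive
    margins correspond to correctly classified instances.
    """
    margins = y * (X @ w)
    return margins.mean(), margins.var()

def ldm_objective(w, X, y, lam1=1.0, lam2=1.0, C=1.0):
    """Sketch of an LDM-style objective (illustrative names lam1/lam2/C):
    penalize margin variance, reward margin mean, keep hinge losses.
    """
    mean, var = margin_stats(w, X, y)
    hinge = np.maximum(0.0, 1.0 - y * (X @ w)).sum()
    return 0.5 * (w @ w) + lam1 * var - lam2 * mean + C * hinge

# Tiny example: three points, all with margin exactly 1.
w = np.array([1.0, 1.0])
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
m, v = margin_stats(w, X, y)  # mean 1.0, variance 0.0
```

Setting `lam1 = lam2 = 0` recovers the familiar SVM-style regularized hinge loss, which makes concrete the abstract's claim that LDM generalizes the setting where SVM applies.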
Index Terms
- Large margin distribution machine