DCA based algorithms for feature selection in multi-class support vector machine

Le Thi, Hoai An; Nguyen, Manh Cuong

doi:10.1007/s10479-016-2333-y

Pardalos60
Published: 04 October 2016

Volume 249, pages 273–300, (2017)

Annals of Operations Research

Hoai An Le Thi¹,² & Manh Cuong Nguyen²

Abstract

This paper addresses the problem of feature selection for multi-class support vector machines. Two models are considered, one with \(\ell _{0}\) (zero-norm) regularization and one with \(\ell _{2}\)–\(\ell _{0}\) regularization, and two continuous approaches based on DC (Difference of Convex functions) programming and DCA (DC Algorithms) are investigated for them. The first is a DC approximation via several sparsity-inducing functions; the second is an exact reformulation using penalty techniques. Twelve versions of DCA-based algorithms are developed and evaluated in computational experiments. Numerical results on real-world datasets show that the methods are efficient and outperform one of the best standard algorithms on both feature selection and classification.
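To make the generic DCA scheme concrete, here is a minimal sketch on a toy problem. It is not the paper's multi-class SVM model: the objective, the choice of the capped-\(\ell _{1}\) function as the sparsity-inducing approximation of \(\ell _{0}\), and all names (`dca_capped_l1`, `lam`, `alpha`) are assumptions made for illustration. The objective \(F(x) = \tfrac{1}{2}\Vert x-a\Vert ^2 + \lambda \sum _i \min (1, \alpha |x_i|)\) is split as \(g - h\) with \(g(x) = \tfrac{1}{2}\Vert x-a\Vert ^2 + \lambda \alpha \Vert x\Vert _1\) and \(h(x) = \lambda \sum _i \max (\alpha |x_i| - 1, 0)\), both convex. Each DCA iteration takes a subgradient of \(h\) at the current point and then minimizes the resulting convex subproblem, which here has a closed form.

```python
def soft_threshold(v, t):
    """Minimizer of 0.5*(x - v)**2 + t*|x| over scalar x."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(x, a, lam, alpha):
    """F(x) = 0.5*||x - a||^2 + lam * sum_i min(1, alpha*|x_i|)."""
    fit = 0.5 * sum((xi - ai) ** 2 for xi, ai in zip(x, a))
    penalty = lam * sum(min(1.0, alpha * abs(xi)) for xi in x)
    return fit + penalty


def dca_capped_l1(a, lam=1.0, alpha=2.0, max_iter=100, tol=1e-10):
    """Toy DCA (illustrative assumption, not the paper's algorithm).

    Decomposition F = g - h with
      g(x) = 0.5*||x - a||^2 + lam*alpha*||x||_1        (convex)
      h(x) = lam * sum_i max(alpha*|x_i| - 1, 0)        (convex)
    """
    x = list(a)  # start from the unregularized solution
    history = [objective(x, a, lam, alpha)]
    for _ in range(max_iter):
        # Step 1: pick a subgradient y of h at the current x.
        y = [
            lam * alpha * (1.0 if xi > 0 else -1.0) if alpha * abs(xi) > 1.0 else 0.0
            for xi in x
        ]
        # Step 2: solve the convex subproblem min_x g(x) - <y, x>,
        # whose solution is a componentwise soft-thresholding.
        x_new = [soft_threshold(ai + yi, lam * alpha) for ai, yi in zip(a, y)]
        history.append(objective(x_new, a, lam, alpha))
        converged = max(abs(u - v) for u, v in zip(x_new, x)) <= tol
        x = x_new
        if converged:
            break
    return x, history


if __name__ == "__main__":
    x, history = dca_capped_l1([3.0, 0.5, -0.2, -2.5])
    print(x)
    print(history)
```

In this sketch the small entries of `a` are driven to exactly zero while the large ones are left unshrunk, and the recorded objective values never increase from one iteration to the next, which is the standard descent property of DCA.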

Notes

The conjugate \(G^*\) of a convex function G is defined by \(G^*(Y):= \sup \limits _{X} \left\{ \langle X,Y \rangle - G(X) \right\} \).
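As a worked illustration of this definition (an example chosen here, not taken from the paper), take the squared Euclidean norm. The supremum is attained where the gradient of the bracketed expression vanishes, that is at \(X = Y\):

```latex
\begin{aligned}
G(X) &= \tfrac{1}{2}\Vert X\Vert ^2, \\
G^*(Y) &= \sup \limits _{X} \left\{ \langle X,Y \rangle - \tfrac{1}{2}\Vert X\Vert ^2 \right\}
        = \langle Y,Y \rangle - \tfrac{1}{2}\Vert Y\Vert ^2
        = \tfrac{1}{2}\Vert Y\Vert ^2 .
\end{aligned}
```

So this particular \(G\) is its own conjugate.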

Acknowledgments

This research is funded by the Foundation for Science and Technology Development of Ton Duc Thang University (FOSTECT), website: http://fostect.tdt.edu.vn, under Grant FOSTECT.2015.BR.15. The authors would like to thank the referees and the guest editor for their valuable comments, which helped to improve the manuscript.

Author information

Authors and Affiliations

Department for Management of Science and Technology Development and Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Hoai An Le Thi
Laboratory of Theoretical and Applied Computer Science LITA EA 3097, University of Lorraine, Ile du Saulcy, 57045, Metz, France
Hoai An Le Thi & Manh Cuong Nguyen

Corresponding author

Correspondence to Hoai An Le Thi.

About this article

Cite this article

Le Thi, H.A., Nguyen, M.C. DCA based algorithms for feature selection in multi-class support vector machine. Ann Oper Res 249, 273–300 (2017). https://doi.org/10.1007/s10479-016-2333-y

Published: 04 October 2016
Issue Date: February 2017
DOI: https://doi.org/10.1007/s10479-016-2333-y
