Abstract
We treat the feature selection problem in the support vector machine (SVM) framework by adopting an optimization model based on the \(\ell _0\) pseudo-norm. The objective is to control the number of nonzero components of the normal vector to the separating hyperplane while maintaining satisfactory classification accuracy. A key role in our model is played by the polyhedral norm \(\Vert .\Vert _{[k]}\), intermediate between \(\Vert .\Vert _1\) and \(\Vert .\Vert _{\infty }\), which allows us to arrive at a DC (difference of convex) optimization problem that we tackle by means of the DCA algorithm. The results of several numerical experiments on benchmark classification datasets are reported.
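As a minimal illustration (not the paper's algorithm), the polyhedral k-norm \(\Vert w\Vert _{[k]}\) of a vector is simply the sum of its \(k\) largest absolute components, so it interpolates between \(\Vert .\Vert _{\infty }\) (for \(k=1\)) and \(\Vert .\Vert _1\) (for \(k=n\)); a property typically exploited in k-norm-based sparse models is that \(\Vert w\Vert _0 \le k\) exactly when \(\Vert w\Vert _1 = \Vert w\Vert _{[k]}\). A short sketch:

```python
import numpy as np

def k_norm(w: np.ndarray, k: int) -> float:
    """Polyhedral k-norm ||w||_[k]: sum of the k largest absolute components.

    For k = 1 it reduces to the infinity norm; for k = n, to the l1 norm.
    """
    a = np.abs(np.asarray(w, dtype=float))
    # sort |w_i| ascending and sum the last k entries (the k largest)
    return float(np.sort(a)[-k:].sum())

w = np.array([3.0, -1.0, 0.5, -2.0])
print(k_norm(w, 1))  # 3.0  (infinity norm)
print(k_norm(w, 2))  # 5.0  (3 + 2)
print(k_norm(w, 4))  # 6.5  (l1 norm)
```

Here the gap \(\Vert w\Vert _1 - \Vert w\Vert _{[k]}\) is zero precisely when at most \(k\) components are nonzero, which is what makes the k-norm a natural convex building block for an \(\ell _0\)-type penalty in a DC decomposition.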
Notes
Symbol “*” in Dataset column of Table 5 indicates that parameter C has been set to 2.
References
Amaldi, E., Kann, V.: On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theor. Comput. Sci. 209(1–2), 237–260 (1998)
Bertolazzi, P., Felici, G., Festa, P., Fiscon, G., Weitschek, E.: Integer programming models for feature selection: new extensions and a randomized solution algorithm. Eur. J. Oper. Res. 250(2), 389–399 (2016)
Bradley, P.S., Mangasarian, O.L., Street, W.N.: Feature selection via mathematical programming. INFORMS J. Comput. 10(2), 209–217 (1998)
Bradley, P.S., Mangasarian, O.L.: Feature selection via concave minimization and support vector machines. In: Shavlik, J., (ed.) Machine Learning Proceedings of the Fifteenth International Conference (ICML ’98). Morgan Kaufmann, San Francisco, California, pp. 82–90 (1998)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge (2000)
Di Pillo, G., Grippo, L.: Exact penalty functions in constrained optimization. SIAM J. Control Optim. 27(6), 1333–1360 (1989)
Dy, J.G., Brodley, C.E., Wrobel, S.: Feature selection for unsupervised learning. J. Mach. Learn. Res. 5, 845–889 (2004)
Gasso, G., Rakotomamonjy, A., Canu, S.: Recovering sparse signals with a certain family of nonconvex penalties and DC programming. IEEE Trans. Signal Process. 57(12), 4686–4698 (2009)
Gaudioso, M., Gorgone, E., Labbé, M., Rodríguez-Chía, A.M.: Lagrangian relaxation for SVM feature selection. Comput. Oper. Res. 87, 137–145 (2017)
Gaudioso, M., Giallombardo, G., Miglionico, G.: Minimizing piecewise-concave functions over polytopes. Math. Oper. Res. 43(2), 580–597 (2018)
Gaudioso, M., Giallombardo, G., Miglionico, G., Bagirov, A.M.: Minimizing nonsmooth DC functions via successive DC piecewise-affine approximations. J. Glob. Optim. 71(1), 37–55 (2018)
Gotoh, J., Takeda, A., Tono, K.: DC formulations and algorithms for sparse optimization problems. Math. Program. Ser. B 169(1), 141–176 (2018)
Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
Hastie, T., Tibshirani, R., Wainwright, M.: Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press, Boca Raton (2015)
Hempel, A.B., Goulart, P.J.: A novel method for modelling cardinality and rank constraints. In: 53rd IEEE Conference on Decision and Control, Los Angeles, CA, USA December 15–17, pp. 4322–4327 (2014)
Hiriart-Urruty, J.-B.: Generalized Differentiability/Duality and Optimization for Problems Dealing with Differences of Convex Functions, Lecture Notes in Economics and Mathematical Systems, vol. 256, pp. 37–70. Springer, Berlin (1986)
Hiriart-Urruty, J.-B., Ye, D.: Sensitivity analysis of all eigenvalues of a symmetric matrix. Numer. Math. 70(1), 45–72 (1995)
Joki, K., Bagirov, A.M., Karmitsa, N., Mäkelä, M.M.: A proximal bundle method for nonsmooth DC optimization utilizing nonconvex cutting planes. J. Glob. Optim. 68, 501–535 (2017)
Le Thi, H.A., Dinh, T.P.: The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133, 23–46 (2005)
Le Thi, H.A., Le, H.M., Nguyen, V.V., Dinh, T.P.: A DC programming approach for feature selection in support vector machines learning. Adv. Data Anal. Classif. 2, 259–278 (2008)
Maldonado, S., Pérez, J., Weber, R., Labbé, M.: Feature selection for support vector machines via mixed integer linear programming. Inf. Sci. 279, 163–175 (2014)
Mangasarian, O.L.: Nonlinear Programming. McGraw-Hill, New York (1969)
Overton, M.L., Womersley, R.S.: Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices. Math. Program. 62(1–3), 321–357 (1993)
Pilanci, M., Wainwright, M.J., El Ghaoui, L.: Sparse learning via Boolean relaxations. Math. Program. Ser. B 151, 63–87 (2015)
Rinaldi, F., Schoen, F., Sciandrone, M.: Concave programming for minimizing the zero-norm over polyhedral sets. Comput. Optim. Appl. 46, 467–486 (2010)
Strekalovsky, A.S.: Global optimality conditions for nonconvex optimization. J. Glob. Optim. 12, 415–434 (1998)
Soubies, E., Blanc-Féraud, L., Aubert, G.: A unified view of exact continuous penalties for \(\ell _2\)-\(\ell _0\) minimization. SIAM J. Optim. 27(3), 2034–2060 (2017)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Berlin (1995)
Wang, L., Zhu, J., Zou, H.: The doubly regularized support vector machine. Stat. Sin. 16, 589–615 (2006)
Watson, G.A.: Linear best approximation using a class of polyhedral norms. Numer. Algorithms 2, 321–336 (1992)
Weston, J., Elisseeff, A., Schölkopf, B., Tipping, M.: Use of the zero-norm with linear models and kernel methods. J. Mach. Learn. Res. 3, 1439–1461 (2003)
Wright, S.J.: Accelerated block-coordinate relaxation for regularized optimization. SIAM J. Optim. 22(1), 159–186 (2012)
Wu, B., Ding, C., Sun, D., Toh, K.-C.: On the Moreau–Yosida regularization of the vector \(k\)-norm related functions. SIAM J. Optim. 24(2), 766–794 (2014)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B 67, 301–320 (2005)
Acknowledgements
We are grateful to Francesco Rinaldi for having provided us with the datasets Colon Cancer and Nova that we have used to compare our results with those in [25].
Cite this article
Gaudioso, M., Gorgone, E. & Hiriart-Urruty, JB. Feature selection in SVM via polyhedral k-norm. Optim Lett 14, 19–36 (2020). https://doi.org/10.1007/s11590-019-01482-1