Multi-label feature selection via feature manifold learning and sparsity regularization

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Multi-label learning deals with data associated with several labels simultaneously. Like traditional single-label learning, multi-label learning suffers from the curse of dimensionality. Feature selection is an effective technique for improving learning performance on high-dimensional data. Building on the least squares regression model, we incorporate feature manifold learning and sparse regularization into a joint framework for multi-label feature selection. Graph regularization exploits the geometric structure of the feature space to obtain a better regression coefficient matrix, one that reflects the importance of the individual features. In addition, an \(\ell _{2,1}\)-norm penalty is imposed to guarantee row sparsity of the regression coefficients. Furthermore, we design an iterative updating algorithm with proven convergence to solve the resulting optimization problem. The proposed method is validated on six publicly available data sets from real-world applications. Extensive experimental results demonstrate its superiority over the compared state-of-the-art multi-label feature selection methods.
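The framework described above can be illustrated with a minimal sketch. The code below is not the authors' implementation; it assumes the objective takes the common joint form \(\min _W \Vert XW-Y\Vert _F^2 + \alpha \,\mathrm{tr}(W^\top L W) + \beta \Vert W\Vert _{2,1}\), where L is the Laplacian of a k-nearest-neighbor similarity graph built over the features (the feature manifold), and uses the standard iterative reweighting trick for the \(\ell _{2,1}\) term. The function name `mfs_sketch` and all default parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def mfs_sketch(X, Y, alpha=1.0, beta=1.0, k=5, n_iter=50, eps=1e-8):
    """Hedged sketch of manifold-regularized sparse multi-label feature selection.

    Minimizes ||XW - Y||_F^2 + alpha*tr(W^T L W) + beta*||W||_{2,1},
    where L is the Laplacian of a kNN graph over the FEATURES (columns of X).
    Returns the coefficient matrix W and features ranked by row norm of W.
    """
    n, d = X.shape
    # --- feature manifold: RBF similarity between feature vectors, kNN-sparsified
    F = X.T                                            # d x n, one row per feature
    dist = np.sum((F[:, None, :] - F[None, :, :]) ** 2, axis=2)
    sigma = np.mean(dist) + eps
    S = np.exp(-dist / sigma)
    idx = np.argsort(-S, axis=1)[:, 1:k + 1]           # k nearest neighbors (skip self)
    A = np.zeros_like(S)
    rows = np.repeat(np.arange(d), k)
    A[rows, idx.ravel()] = S[rows, idx.ravel()]
    A = np.maximum(A, A.T)                             # symmetrize
    L = np.diag(A.sum(axis=1)) - A                     # graph Laplacian over features

    # --- iterative reweighted updates for the l2,1 penalty
    W = np.linalg.lstsq(X, Y, rcond=None)[0]           # warm start: plain least squares
    XtX, XtY = X.T @ X, X.T @ Y
    for _ in range(n_iter):
        row_norms = np.sqrt(np.sum(W ** 2, axis=1)) + eps
        D = np.diag(1.0 / (2.0 * row_norms))           # reweighting diagonal
        W = np.linalg.solve(XtX + alpha * L + beta * D, XtY)

    score = np.sqrt(np.sum(W ** 2, axis=1))            # row norm = feature importance
    return W, np.argsort(-score)
```

Under this formulation, each iteration solves a regularized linear system whose reweighting diagonal D shrinks rows of W with small norm toward zero, producing the row-sparse coefficient matrix used to rank features; the feature-graph Laplacian term encourages similar features to receive similar coefficient rows.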



Notes

  1. http://mulan.sourceforge.net/datasets-mlc.html.


Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 61379049 and 61379089.

Author information

Corresponding author: Correspondence to William Zhu.


About this article


Cite this article

Cai, Z., Zhu, W. Multi-label feature selection via feature manifold learning and sparsity regularization. Int. J. Mach. Learn. & Cyber. 9, 1321–1334 (2018). https://doi.org/10.1007/s13042-017-0647-y

