A Novel Feature Selection Technique for Text Classification

  • Conference paper
Emerging Technologies in Data Mining and Information Security

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 813)

Abstract

In this paper, a new feature selection technique called Term-Class Weight-Inverse-Class Frequency is proposed for text classification. The technique selects the most discriminating features with respect to each class. As a result, the number of features it selects is a multiple of the number of classes present in the collection. Document vectors are built from varying numbers of selected features. The effectiveness of the technique is demonstrated through a series of experiments on two benchmark text corpora, Reuters-21578 and TDT2, using a KNN classifier. In addition, a comparative analysis against state-of-the-art techniques on the same datasets indicates that the proposed technique outperforms several of them.
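
The abstract does not give the scoring formula, so the following is only a minimal sketch of class-wise feature selection under stated assumptions. The score used here (the share of a term's occurrences that fall in a class, multiplied by a log-scaled inverse class frequency) is an illustrative stand-in and not the paper's definition, and the function names `select_features_per_class`, `build_vocabulary`, and `to_vector` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def select_features_per_class(docs, labels, k):
    """Return the k highest-scoring terms for each class.

    ASSUMPTION: the score is a stand-in, not the formula from the paper:
    (term count in class / term count overall) * log(1 + C / classes with term).
    """
    classes = sorted(set(labels))
    # term -> class -> total occurrences of the term in that class
    tf = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in tokens:
            tf[term][label] += 1

    selected = {}
    for c in classes:
        scores = {}
        for term, per_class in tf.items():
            if per_class[c] == 0:
                continue  # the term never occurs in this class
            weight = per_class[c] / sum(per_class.values())
            icf = math.log(1 + len(classes) / len(per_class))
            scores[term] = weight * icf
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        selected[c] = ranked[:k]
    return selected


def build_vocabulary(selected):
    """Concatenate the per-class lists, dropping repeated terms."""
    vocabulary = []
    for terms in selected.values():
        for term in terms:
            if term not in vocabulary:
                vocabulary.append(term)
    return vocabulary


def to_vector(tokens, vocabulary):
    """Represent a document by raw counts of the selected terms."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


# Toy corpus (invented for illustration only).
docs = [
    ["oil", "price", "rise"],
    ["oil", "export", "price"],
    ["match", "goal", "win"],
    ["goal", "price", "win"],
]
labels = ["energy", "energy", "sport", "sport"]

selected = select_features_per_class(docs, labels, k=2)
vocabulary = build_vocabulary(selected)
vectors = [to_vector(tokens, vocabulary) for tokens in docs]
```

With k terms chosen per class, the vocabulary holds k times the number of classes whenever the per-class lists do not overlap, which matches the abstract's remark about multiples of the class count. The resulting vectors could then be passed to any nearest-neighbour classifier.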

Acknowledgements

The second author of this paper acknowledges the financial support rendered by the Indian Council for Cultural Relations (ICCR) and the Egyptian Cultural Affairs and Mission Sector.

Author information

Corresponding author

Correspondence to Mostafa Ali.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

Cite this paper

Guru, D.S., Ali, M., Suhil, M. (2019). A Novel Feature Selection Technique for Text Classification. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_63
