Abstract
This paper proposes a new feature selection technique for text classification, called Term-Class Weight-Inverse-Class Frequency. The technique selects the most discriminating features with respect to each class, so the number of selected features is a multiple of the number of classes present in the collection. Document vectors are built using varying numbers of selected features. The effectiveness of the technique is demonstrated through a series of experiments on two benchmark text corpora, Reuters-21578 and TDT2, using a KNN classifier. A comparative analysis against state-of-the-art techniques on the same datasets indicates that the proposed technique outperforms several of them.
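The class-wise selection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the Term-Class Weight-Inverse-Class Frequency formula, so `placeholder_score` below is an assumed stand-in (a class share multiplied by a log inverse class frequency) and not the authors' measure. The function names and the toy corpus are also hypothetical. The sketch only shows why picking the top k terms per class yields k times the number of classes features, provided every class has at least k candidate terms.

```python
import math
from collections import Counter, defaultdict


def class_term_counts(docs, labels):
    """Count how often each term occurs in each class."""
    counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    return counts


def placeholder_score(term, cls, counts):
    """Assumed stand-in score, NOT the paper's formula.

    Share of the term's occurrences that fall in `cls`, multiplied by a
    log inverse class frequency (fewer classes containing the term gives
    a higher score).
    """
    in_class = counts[cls][term]
    total = sum(c[term] for c in counts.values())
    classes_with_term = sum(1 for c in counts.values() if c[term] > 0)
    icf = math.log(1 + len(counts) / classes_with_term)
    return (in_class / total) * icf


def select_per_class(docs, labels, k):
    """Return the k highest-scoring terms for each class."""
    counts = class_term_counts(docs, labels)
    selected = {}
    for cls in sorted(counts):
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(
            counts[cls],
            key=lambda t: (-placeholder_score(t, cls, counts), t),
        )
        selected[cls] = ranked[:k]
    return selected


# Hypothetical toy corpus: tokenised documents and their class labels.
docs = [
    ["oil", "price", "market"],
    ["oil", "barrel", "price"],
    ["match", "goal", "team"],
    ["team", "goal", "market"],
]
labels = ["energy", "energy", "sport", "sport"]

selected = select_per_class(docs, labels, k=2)
total_selected = sum(len(terms) for terms in selected.values())
print(selected)
print(total_selected)  # k * number of classes when each class has >= k terms
```

In this sketch a term shared by both classes, such as "market", scores lower than class-specific terms and is not selected. If the per-class lists were merged into one vocabulary, overlapping terms could make the union smaller than k times the number of classes; how the paper handles that case is not stated in the abstract.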
Acknowledgements
The second author of this paper acknowledges the financial support rendered by the Indian Council for Cultural Relations (ICCR) and the Egyptian Cultural Affairs and Mission Sector.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Guru, D.S., Ali, M., Suhil, M. (2019). A Novel Feature Selection Technique for Text Classification. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds) Emerging Technologies in Data Mining and Information Security. Advances in Intelligent Systems and Computing, vol 813. Springer, Singapore. https://doi.org/10.1007/978-981-13-1498-8_63
DOI: https://doi.org/10.1007/978-981-13-1498-8_63
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1497-1
Online ISBN: 978-981-13-1498-8
eBook Packages: Intelligent Technologies and Robotics (R0)