A Comparative Study of Recent Feature Selection Techniques Used in Text Classification

  • Conference paper
IOT with Smart Systems

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 251)

Abstract

Handling large volumes of data remains a practical problem. Even with ample resources to store, process and train on data, datasets often still need to be reduced in order to lower computational complexity, save time and cost, and retrieve valuable information from large text documents. The performance of a machine learning algorithm depends on the dataset it is given. When the dataset is large, the learning algorithm tries to accommodate all of its features, which increases the dimensionality of the data. Such high-dimensional data is often unhelpful because it may contain irrelevant and redundant features, so it becomes important to remove them. Pre-processing is therefore required to compress and analyse the dataset for the purpose of text classification (TC), and this can be achieved with feature selection (FS) techniques. The fundamental goal of FS techniques is to identify relevant features and discard redundant ones in high-dimensional data (Shroff and Maheta [1]). Many of today's major FS methods use optimization algorithms (Brownlee [2]) to find an optimal feature subset of the high-dimensional feature space, which reduces computational cost and improves classifier accuracy. This paper discusses some recent feature selection techniques that may prove useful for text classification.
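
As a rough illustration of what a filter-style FS step does before classification, the sketch below scores each term of a toy corpus with the chi-square statistic of its term/class contingency table and keeps the top-scoring terms. Chi-square is used here only as a familiar baseline and is an assumption of this example, not one of the techniques surveyed in the paper; the function names `chi_square_scores` and `select_top_k` and the four sample documents are likewise hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive):
    """Score each term by the chi-square statistic of its 2x2 term/class table."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    df_all = Counter()  # number of documents containing each term
    df_pos = Counter()  # number of positive-class documents containing each term
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())  # presence/absence, not raw counts
        df_all.update(terms)
        if y == positive:
            df_pos.update(terms)

    scores = {}
    for term, df in df_all.items():
        a = df_pos[term]   # positive docs with the term
        b = df - a         # negative docs with the term
        c = n_pos - a      # positive docs without the term
        d = n - n_pos - b  # negative docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A term present in every document carries no class information.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy corpus (made up for illustration): two spam and two non-spam messages.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "project meeting notes for team",
]
labels = ["spam", "spam", "ham", "ham"]

scores = chi_square_scores(docs, labels, positive="spam")
selected = select_top_k(scores, k=5)
print(selected)
```

On this corpus the five retained terms are those that appear in both documents of one class and in none of the other, while terms seen in a single document (such as "pills") score lower and are dropped. The optimization-based methods mentioned in the abstract search over feature subsets rather than ranking terms one at a time, so this sketch should be read only as the simplest point of comparison.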

References

  1. Shroff, K.P., Maheta, H.H.: A comparative study of various feature selection techniques in high-dimensional data set to improve classification accuracy. In: 2015 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India (2015)

  2. Brownlee, J.: https://machinelearningmastery.com/, 23 Dec 2020. [Online]. Available: https://machinelearningmastery.com/tour-of-optimization-algorithms/. Accessed 3 Feb 2021

  3. Liu, Y., Ju, S., Wang, J., Su, C.: A new feature selection method for text classification based on independent feature space search. Mathematical Problems in Engineering (2020)

  4. Wang, H., Hong, M.: Supervised Hebb rule-based feature selection for text classification. Inf. Process. Manage. 56(1), 167–191 (2019)

  5. Thirumoorthy, K., Muneeswaran, K.: Optimal feature subset selection using hybrid binary Jaya optimization algorithm for text classification. Sadhana 45(1), 1–13 (2020)

  6. Minaee, S., Kalchbrenner, N., Cambria, E., Nikzad, N., Chenaghlu, M., Gao, J.: Deep learning-based text classification: A comprehensive review. arXiv preprint arXiv:2004.03705 (2020)

  7. Drucker, H., Wu, D., Vapnik, V.N.: Support vector machines for spam categorization. IEEE Trans. Neural Networks 10(5), 1048–1054 (1999)

  8. Guzella, T.S., Caminhas, W.M.: A review of machine learning approaches to spam filtering. Expert Syst. Appl. 36(7), 10206–10222 (2009)

  9. Günal, S., Ergin, S., Gülmezoğlu, M.B., Gerek, Ö.N.: On feature extraction for spam e-mail detection. In: International Workshop on Multimedia Content Representation, Classification and Security, Springer, Berlin, Heidelberg, pp. 635–642, September 2006

  10. Yu, B., Zhu, D.H.: Combining neural networks and semantic feature space for email classification. Knowl.-Based Syst. 22(5), 376–381 (2009)

  11. Anagnostopoulos, I., Anagnostopoulos, C., Loumos, V., Kayafas, E.: Classifying Web pages employing a probabilistic neural network. IEEE Proc.-Softw. 151(3), 139–150 (2004)

  12. Chen, R.C., Hsieh, C.H.: Web page classification based on a support vector machine using a weighted vote schema. Expert Syst. Appl. 31(2), 427–435 (2006)

  13. Markov, I.: Automatic Native Language Identification (2018). https://doi.org/10.13140/RG.2.2.15566.51520

  14. Belazzoug, M., Touahria, M., Nouioua, F., Brahimi, M.: An improved sine cosine algorithm to select features for text categorization. J. King Saud Univ.-Comput. Inf. Sci. 32(4), 454–464 (2020)

  15. More, V.D.: https://moredvikas.wordpress.com/, 9 Oct 2018. [Online]. Available: https://moredvikas.wordpress.com/2018/10/09/machine-learning-introduction-to-feature-selection-variable-selection-or-attribute-selection-or-dimensionality-reduction/. Accessed 3 Feb 2021

  16. Raschka, S.: https://sebastianraschka.com, 21 Dec 2020. [Online]. Available: https://sebastianraschka.com/faq/docs/feature_sele_categories.html. Accessed 3 Feb 2021

  17. Lim, H., Kim, D.W.: Generalized term similarity for feature selection in text classification using quadratic programming. Entropy 22(4), 395 (2020)

  18. Goudjil, M., Koudil, M., Bedda, M., Ghoggali, N.: A novel active learning method using SVM for text classification. Int. J. Autom. Comput. 15(3), 290–298 (2018)

  19. Ranjan, N.M., Prasad, R.S.: LFNN: Lion fuzzy neural network-based evolutionary model for text classification using context and sense-based features. Appl. Soft Comput. 71, 994–1008 (2018)

  20. Borhani, M.: Multi-label Log-Loss function using L-BFGS for document categorization. Eng. Appl. Artif. Intell. 91, 103623 (2020)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Singh, G., Priya, R. (2022). A Comparative Study of Recent Feature Selection Techniques Used in Text Classification. In: Senjyu, T., Mahalle, P., Perumal, T., Joshi, A. (eds) IOT with Smart Systems. Smart Innovation, Systems and Technologies, vol 251. Springer, Singapore. https://doi.org/10.1007/978-981-16-3945-6_41

  • DOI: https://doi.org/10.1007/978-981-16-3945-6_41

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-3944-9

  • Online ISBN: 978-981-16-3945-6

  • eBook Packages: Engineering, Engineering (R0)
