DOI: 10.1145/3654446.3654519
research-article

Feature selection SVM through Universum and its applications on text classification

Published: 03 May 2024

ABSTRACT

The continuous emergence of digital texts has made text classification a key task. The Support Vector Machine (SVM) is a widely used classifier because it generalizes well and depends on only a few parameters. However, the SVM was not originally designed to identify relevant features. This research applies SVM and Universum learning to text classification, exploring their effects on handling unlabeled data, improving generalization, and performing feature selection. By introducing a Universum set, we construct an extended dataset and incorporate the concept of Universum embedding. We propose a Feature Selection Universum Support Vector Machine (FSUSVM), which adds constraints on the prediction boundaries of the Universum set to the existing SVM model on this extended dataset, with the aim of improving both accuracy and feature selection performance in text classification tasks. Finally, we substantiate the effectiveness of FSUSVM through numerical experiments on text data. We also evaluate FSUSVM on image data, with positive results.
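The abstract does not state the FSUSVM optimization problem, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a linear model with a hinge loss on labeled points, an ε-insensitive loss that keeps Universum points close to the decision boundary, and an L1 penalty on the weights to encourage sparse feature use. The function names, the toy data, and all hyperparameter values (`lam`, `c_u`, `eps`, `lr`, `steps`) are hypothetical choices made for this example, and the proximal subgradient solver is a stand-in for whatever solver the paper uses.

```python
# Illustrative sketch only: L1-regularized linear SVM with a Universum term.
# Objective (an assumption, not taken from the paper):
#   mean hinge(y_i * f(x_i)) + c_u * mean max(0, |f(u_j)| - eps) + lam * ||w||_1
# where f(x) = w . x + b.

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def decision(w, b, x):
    """Linear decision value f(x) = w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def objective(w, b, X, y, U, lam, c_u, eps):
    """Hinge loss on labeled data + Universum loss + L1 penalty."""
    hinge = sum(max(0.0, 1.0 - yi * decision(w, b, xi))
                for xi, yi in zip(X, y)) / len(X)
    univ = sum(max(0.0, abs(decision(w, b, u)) - eps) for u in U) / len(U)
    l1 = sum(abs(wj) for wj in w)
    return hinge + c_u * univ + lam * l1

def fit(X, y, U, lam=0.05, c_u=1.0, eps=0.1, lr=0.1, steps=200):
    """Proximal subgradient descent; returns the best (objective, w, b) seen."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    best = (objective(w, b, X, y, U, lam, c_u, eps), list(w), b)
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        # Subgradient of the mean hinge loss (only margin violators contribute).
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1.0:
                for j in range(d):
                    gw[j] -= yi * xi[j] / len(X)
                gb -= yi / len(X)
        # Subgradient of the Universum loss (only points outside the eps-tube).
        for u in U:
            f = decision(w, b, u)
            if abs(f) > eps:
                s = 1.0 if f > 0 else -1.0
                for j in range(d):
                    gw[j] += c_u * s * u[j] / len(U)
                gb += c_u * s / len(U)
        # Gradient step on the smooth part, then soft-threshold for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, gw)]
        b -= lr * gb
        obj = objective(w, b, X, y, U, lam, c_u, eps)
        if obj < best[0]:
            best = (obj, list(w), b)
    return best

# Hypothetical toy data: feature 0 separates the classes, feature 1 is weakly
# and spuriously correlated with the label. The Universum points vary only
# along feature 1, so the Universum term penalizes relying on it.
X = [[2.0, 0.5], [1.5, 0.3], [-2.0, -0.4], [-1.5, -0.2]]
y = [1, 1, -1, -1]
U = [[0.0, 1.0], [0.0, -1.0]]

if __name__ == "__main__":
    obj, w, b = fit(X, y, U)
    print("objective:", obj, "weights:", w, "bias:", b)
```

In this sketch the Universum points act as examples that belong to neither class, so the model is discouraged from assigning them a confident score. Because they differ only in the second feature here, that pressure combines with the L1 penalty to keep the weight on that feature small.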


    • Published in

      ACM Other conferences
      SPCNC '23: Proceedings of the 2nd International Conference on Signal Processing, Computer Networks and Communications
      December 2023
      435 pages
      ISBN: 9798400716430
      DOI: 10.1145/3654446

      Copyright © 2023 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited