ABSTRACT
The continuous emergence of digital texts makes text classification a key task. The Support Vector Machine (SVM) has become a widely used classification tool because it generalizes well and depends on only a few parameters. However, SVM was not originally designed to determine relevant features. This research applies SVM and Universum learning to text classification and explores their effects on processing unlabeled data, enhancing generalization, and solving the problem of feature selection. By introducing the Universum set, we construct an extended dataset and incorporate the concept of Universum embedding. We propose a Feature Selection Universum Support Vector Machine (FSUSVM). The model adds constraints on the prediction boundaries of the Universum set over this extended dataset to the existing SVM model, so that feature selection remains robust while the model's accuracy in text classification tasks is optimized. Finally, we substantiate the effectiveness of FSUSVM through numerical experiments conducted on text data. We also evaluate FSUSVM on image data, with positive results.
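The abstract does not give the FSUSVM formulation, so the following is only a rough sketch of the general idea it describes, not the authors' model. It assumes a linear classifier trained on three terms: a hinge loss on labeled points, an epsilon-insensitive loss that keeps Universum points close to the decision boundary, and an L1 penalty that drives the weights of irrelevant features to zero. The function names, the toy data, the proximal subgradient solver, and the hyperparameters `lam`, `c_u`, `eps`, `lr`, and `epochs` are all illustrative assumptions.

```python
# Sketch only: L1-regularized linear SVM with a Universum penalty.
# This is NOT the FSUSVM formulation from the paper; every name and
# hyperparameter below is an illustrative assumption.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def objective(w, b, X, y, U, lam=0.3, c_u=1.0, eps=0.1):
    """lam * ||w||_1 + mean hinge loss + c_u * mean Universum loss."""
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b))
                for xi, yi in zip(X, y)) / len(X)
    # Universum points are penalized when they fall outside the
    # eps-wide band around the decision boundary.
    univ = sum(max(0.0, abs(dot(w, u) + b) - eps) for u in U) / len(U)
    return lam * sum(abs(wj) for wj in w) + hinge + c_u * univ

def soft_threshold(v, t):
    """Proximal operator of t * |v|; this is what produces exact zeros."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def train(X, y, U, lam=0.3, c_u=1.0, eps=0.1, lr=0.05, epochs=200):
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        # Subgradient of the mean hinge loss over labeled points.
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:
                for j in range(d):
                    gw[j] -= yi * xi[j] / len(X)
                gb -= yi / len(X)
        # Subgradient of the mean eps-insensitive loss over Universum points.
        for u in U:
            s = dot(w, u) + b
            if abs(s) > eps:
                sign = 1.0 if s > 0 else -1.0
                for j in range(d):
                    gw[j] += c_u * sign * u[j] / len(U)
                gb += c_u * sign / len(U)
        # Gradient step, then L1 shrinkage on the weights (not the bias).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

# Toy data: feature 0 separates the classes strongly, feature 1 only weakly.
X = [[1.0, 0.2], [1.0, 0.2], [-1.0, -0.2], [-1.0, -0.2]]
y = [1, 1, -1, -1]
# Universum points: belong to neither class and vary only along feature 1.
U = [[0.0, 1.0], [0.0, -1.0]]

if __name__ == "__main__":
    w, b = train(X, y, U)
    selected = [j for j, wj in enumerate(w) if wj != 0.0]
    print("weights:", w, "bias:", b, "selected features:", selected)
```

On this symmetric toy set the L1 shrinkage alone keeps the weight of feature 1 at zero, and the Universum points then lie exactly on the decision boundary, so their penalty is inactive during training. The penalty becomes active as soon as a weight vector leans on feature 1, which can be checked by evaluating `objective` at such a point.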
Index Terms
- Feature selection SVM through Universum and its applications on text classification