A method of dimensionality reduction by selection of components in principal component analysis for text classification

Zhang, Yangwu; Li, Guohe; Zong, Heng

National library of Serbia

Filomat 2018 Volume 32, Issue 5, Pages: 1499-1506
https://doi.org/10.2298/FIL1805499Z

Zhang Yangwu (College of Geophysics and Information Engineering, China University of Petroleum-Beijing, China + Department of Science and Technology Teaching, China University of Political Science and Law, China)
Li Guohe (College of Geophysics and Information Engineering, China University of Petroleum-Beijing, China + Beijing Key Lab of Data Mining for Petroleum Data, China University of Petroleum-Beijing, China)
Zong Heng (Department of Science and Technology Teaching, China University of Political Science and Law, China)

Dimensionality reduction, which includes feature extraction and feature selection, is a key step in text classification. In this paper, we propose a hybrid method of dimensionality reduction that combines principal component analysis (PCA) with a selection of components. PCA is a feature extraction method. Not all of its components contribute to classification, because the objective of PCA is not discriminative: it maximizes variance without reference to class labels (see, e.g., Jolliffe, 2002). We therefore present a component selection function that returns the components useful for classification, judged by performance indicators measured on different subsets of the components. Compared with traditional feature selection methods, SVM classifiers trained on the selected components show improved classification performance and reduced computational overhead.
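
The idea of selecting components by their measured classification performance can be illustrated with a minimal sketch. It is not the authors' function: the abstract does not specify the search strategy or the indicators, so the sketch assumes a greedy forward search scored by validation accuracy, uses a nearest-centroid classifier as a lightweight stand-in for the SVM, and runs on synthetic data in place of a text corpus. The names `pca_fit`, `centroid_accuracy`, and `select_components` are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the mean and the top principal axes (as rows) of X via SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def centroid_accuracy(Z_train, y_train, Z_val, y_val):
    """Validation accuracy of a nearest-centroid classifier.

    Assumption: this simple classifier stands in for the SVM of the paper.
    """
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y_val).mean())

def select_components(Z_train, y_train, Z_val, y_val):
    """Greedy forward selection (an assumed strategy, not from the paper).

    Repeatedly adds the component whose inclusion gives the highest
    validation accuracy, and stops when no candidate improves it.
    """
    selected, best = [], 0.0
    remaining = list(range(Z_train.shape[1]))
    while remaining:
        scores = [
            centroid_accuracy(Z_train[:, selected + [j]], y_train,
                              Z_val[:, selected + [j]], y_val)
            for j in remaining
        ]
        k = int(np.argmax(scores))
        if scores[k] <= best:
            break
        best = scores[k]
        selected.append(remaining.pop(k))
    return selected, best

# Synthetic data: the highest-variance direction carries no class information,
# while a lower-variance direction separates the two classes.
rng = np.random.default_rng(0)
n = 200
y = np.arange(n) % 2
X = np.column_stack([
    rng.normal(0.0, 10.0, n),                   # large variance, uninformative
    (2 * y - 1) * 1.5 + rng.normal(0, 0.3, n),  # small variance, discriminative
    rng.normal(0.0, 0.1, n),                    # near-constant noise
])
X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

# Fit PCA on the training split only, then project both splits.
mean, axes = pca_fit(X_train, n_components=3)
Z_train = (X_train - mean) @ axes.T
Z_val = (X_val - mean) @ axes.T

# Baseline: keep only the first (highest-variance) component.
baseline = centroid_accuracy(Z_train[:, :1], y_train, Z_val[:, :1], y_val)
selected, best = select_components(Z_train, y_train, Z_val, y_val)
print("selected components:", selected)
print("accuracy with selection:", best, "| first component only:", baseline)
```

In this toy setting the usual rule of keeping the leading components by explained variance retains an uninformative direction, whereas scoring subsets by classification performance picks out the component that actually separates the classes. Fitting a classifier per candidate subset is more expensive than ranking by variance, so a real pipeline would need to bound the search.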

Keywords: Principal components analysis, Dimensionality reduction, Text classification
