Filomat 2018 Volume 32, Issue 5, Pages: 1499-1506
https://doi.org/10.2298/FIL1805499Z
A method of dimensionality reduction by selection of components in principal component analysis for text classification
Zhang Yangwu (College of Geophysics and Information Engineering, China University of Petroleum-Beijing, China + Department of Science and Technology Teaching, China University of Political Science and Law, China)
Li Guohe (College of Geophysics and Information Engineering, China University of Petroleum-Beijing, China + Beijing Key Lab of Data Mining for Petroleum Data, China University of Petroleum-Beijing, China)
Zong Heng (Department of Science and Technology Teaching, China University of Political Science and Law, China)
Dimensionality reduction, including feature extraction and feature
selection, is a key step in text classification. In this paper, we propose a
hybrid dimensionality reduction method that combines principal component
analysis (PCA) with component selection. PCA is a feature extraction method.
Not all principal components contribute to classification, because the
objective of PCA is not a form of discriminant analysis (see, e.g.,
Jolliffe, 2002). We therefore present a component selection function that
returns the components useful for classification, judged by performance
indicators measured on different subsets of the components. Compared with
traditional feature selection methods, SVM classifiers trained on the
selected components show improved classification performance and reduced
computational overhead.
Keywords: Principal components analysis, Dimensionality reduction, Text classification
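
The component selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the subsets of components are searched, so the greedy forward search below is an assumption, not the authors' procedure. The names `select_components`, `score`, `min_gain`, and `toy_score` are hypothetical; in practice `score` might return the validation accuracy of an SVM trained on the PCA-projected features restricted to the given components.

```python
from typing import Callable, List, Sequence


def select_components(
    n_components: int,
    score: Callable[[Sequence[int]], float],
    min_gain: float = 0.0,
) -> List[int]:
    """Greedy forward selection over principal-component indices.

    `score(subset)` is a hypothetical callback returning a performance
    indicator for the given subset of components (higher is better).
    The search strategy is an assumption made for illustration.
    """
    selected: List[int] = []
    best = float("-inf")
    remaining = list(range(n_components))
    while remaining:
        # Score the current subset extended by each remaining component.
        candidates = [(score(selected + [c]), c) for c in remaining]
        # Prefer the highest score; break ties by the lowest index.
        top_score, top_c = max(candidates, key=lambda t: (t[0], -t[1]))
        # Always keep the first component; afterwards stop once the
        # improvement is no larger than `min_gain`.
        if selected and top_score - best <= min_gain:
            break
        selected.append(top_c)
        remaining.remove(top_c)
        best = top_score
    return sorted(selected)


def toy_score(subset: Sequence[int]) -> float:
    # Toy indicator: components 1 and 3 help, all others slightly hurt.
    useful = {1: 0.30, 3: 0.15}
    return 0.5 + sum(useful.get(c, -0.01) for c in subset)


chosen = select_components(5, toy_score)
print(chosen)
```

With the toy indicator, the search keeps only the two components that raise the score, even though neither is the first (highest-variance) component. This mirrors the abstract's point that high-variance components are not necessarily the ones that help classification.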