Unsupervised feature selection for visual classification via feature-representation property
Introduction
High-dimensional data can incur expensive computation and suffer from the 'curse of dimensionality', which degrades the performance of learning from the data [1], [2], [3]. Over the past decades, dimensionality reduction (including feature selection and subspace learning) has become an efficient way to handle high-dimensional data [4], [5].
Feature selection directly removes a subset of features and thus outputs interpretable results, which makes it practical in real applications [6]. Previous feature selection methods can be classified into three categories: supervised, semi-supervised, and unsupervised feature selection [1]. Supervised feature selection methods select features according to the labels of training data. For example, Gu et al. proposed to seek a subset of features by maximizing a lower bound of the traditional Fisher score [4], while Zhang et al. proposed to use spectral-spatial feature combination for hyperspectral image analysis [7]. Since supervised feature selection methods exploit labels to conduct feature selection, they are able to select discriminative features.
Semi-supervised feature selection mainly utilizes a small number of labeled samples and a large number of unlabeled samples during training [8]. For example, Lv et al. employed a manifold regularization term to conduct discriminative semi-supervised feature selection [9]. Wang et al. proposed to first learn the class labels of unlabeled samples, and then use the learned class labels to define the margins for feature weight learning [10].
However, for various reasons, such as labels being unknown or time-consuming to obtain, it is often difficult to gather enough labels for learning from data; unsupervised feature selection is thus a practical way to eliminate irrelevant features [11], [7]. Compared with supervised and semi-supervised feature selection, unsupervised feature selection lacks label information, which makes it very challenging [12]. Recent unsupervised feature selection methods mainly rely on evaluation indicators to remove redundant features. For example, Liu et al. combined the Laplacian score with a distance-based entropy measure to conduct unsupervised feature selection [13], while Nie et al. proposed a score-based criterion to conduct feature selection [14].
In this paper, we propose a new unsupervised feature selection method that exploits the property of feature self-representation, i.e., each feature can be represented by the other features, so that representative features can be identified. Motivated by the successful application of self-similarity in subspace clustering [15], [7], [16], this paper first proposes a feature-level self-representation model for unsupervised learning, and then adds an ℓ2,1-norm regularizer to the objective function to yield sparse feature selection. The proposed loss function represents each feature by the other features, based on the rationale that important features are frequently used to represent other features, while unimportant features are rarely used to represent any feature. The group sparsity (i.e., the ℓ2,1-norm regularization term) penalizes all coefficients in the same row of the regression matrix together, so that the corresponding features are jointly selected or discarded when predicting the response variables. In addition, this paper devises a novel and efficient optimization method to solve the resulting objective function and proves its convergence. Note that self-representation is not a new concept; it has been widely used in machine learning and computer vision, e.g., in sparse coding [17] and low-rank representation [18]. However, previous work [19], [20] focused on sample-level self-similarity, where each sample is represented by all samples. In this paper, we instead represent each feature by its relevant features; that is, we conduct feature selection by devising a feature-level self-representation loss function. The contributions of our method are as follows:
- Unlike previous unsupervised feature selection methods, which mainly utilize evaluation indicators to remove redundant features, we propose a novel feature-level self-representation to remove irrelevant features. The proposed feature-level self-representation differs from sample-level self-similarity, which represents each sample by all samples.
- We propose a novel iterative optimization algorithm to solve the resulting objective function, and we prove that it efficiently converges to the optimal solution.
The remainder of this paper is organized as follows: Section 2 reviews related work on feature selection methods and Section 3 gives the details of our proposed feature selection model. Sections 4 and 5 present our experimental results and conclude the paper, respectively.
Related work
Dimensionality reduction methods are usually divided into two groups: feature selection methods [21] and subspace learning methods [22], [6]. Feature selection methods are widely used for reducing the dimensions of high-dimensional data to output interpretable results [23], [24]. That is, feature selection methods select a subset of features according to certain criteria, such as distinguishing features with good characteristics or correlating with a predefined goal. The state-of-the-art feature
Approach
In this section, we first define the notations used in this paper, and then describe the details of the proposed method, followed by the proposed optimization method to the resulting objective function.
Experiments
In this section, we compared our proposed Self-Representation Feature Selection (SR_FS for short) method with the comparison methods in terms of classification performance. Specifically, we first used each dimensionality reduction method to map the original high-dimensional data into a low-dimensional space, and then used the reduced data to conduct classification with a Support Vector Machine (SVM) via the LIBSVM toolbox.1 Then the
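The evaluation pipeline described above, reduce the dimensionality and then classify with an SVM, can be sketched as follows. This is an illustrative stand-in: scikit-learn's `SVC` replaces the LIBSVM toolbox used in the paper, and the function name, top-k selection, and 70/30 split are our own assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def evaluate_selection(X, y, selected, k=10, seed=0):
    """Train/test a linear SVM on the top-k selected feature columns."""
    X_sel = X[:, selected[:k]]          # keep only the k highest-ranked features
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_sel, y, test_size=0.3, random_state=seed)
    clf = SVC(kernel="linear").fit(X_tr, y_tr)
    return clf.score(X_te, y_te)        # classification accuracy on held-out data
```

Accuracy on the held-out split then serves as the proxy for the quality of the selected feature subset, which is how the comparison methods are ranked.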
Conclusion
In this paper, we have proposed a new feature selection method based on the property of feature self-representation. Experimental results showed the advantages of the proposed method over the comparison methods on both binary and multi-class classification.
In real applications, high-dimensional data often contain missing values [5], [17]. In future work, we will extend the proposed method to conduct feature selection on high-dimensional data with missing values.
Acknowledgement
This work was supported in part by the National Natural Science Foundation of China (Grant No: 61263035, 61573270, 61450001 and 61363009), the China 973 Program (Grant No: 2013CB329404), the Guangxi Natural Science Foundation (Grant No: 2015GXNSFCB139011), the Guangxi Higher Institutions' Program of Introducing 100 High-Level Overseas Talents, the Guangxi Collaborative Innovation Center of Multi-Source Information Integration and Intelligent Processing, Innovation Project of Guangxi Graduate
References (47)
- et al., Dimensionality reduction by mixed kernel canonical correlation analysis, Pattern Recognit. (2012)
- et al., Towards information-theoretic k-means clustering for image indexing, Signal Process. (2013)
- et al., Self-taught dimensionality reduction on the high-dimensional small-sized data, Pattern Recognit. (2013)
- et al., Scaling up cosine interesting pattern discovery: a depth-first method, Inf. Sci. (2014)
- et al., Feature subset selection in large dimensionality domains, Pattern Recognit. (2010)
- et al., An unsupervised feature selection algorithm based on ant colony optimization, Eng. Appl. Artif. Intell. (2014)
- et al., Mr2PSO: a maximum relevance minimum redundancy feature selection method based on swarm intelligence for support vector machine classification, Inf. Sci. (2011)
- et al., Evolutionary ELM wrapper feature selection for Alzheimer's disease CAD on anatomical brain MRI, Neurocomputing (2014)
- et al., Content-based image quality metric using similarity measure of moment vectors, Pattern Recognit. (2012)
- et al., Scale invariant texture representation based on frequency decomposition and gradient orientation, Pattern Recognit. Lett. (2015)
- Robust PCA via principal component pursuit: a review for a comparative evaluation in video surveillance, Comput. Vis. Image Underst.
- A novel local preserving projection scheme for use with face recognition, Expert Syst. Appl.
- Neurodegenerative disease diagnosis using incomplete multi-modality data via matrix shrinkage and completion, NeuroImage
- Continuous rotation invariant local descriptors for texton dictionary-based texture classification, Comput. Vis. Image Underst.
- Shilling attack detection utilizing semi-supervised learning method for collaborative recommender system, World Wide Web
- Semi-parametric optimization for missing data imputation, Appl. Intell.
- Automatic spatial-spectral feature selection for hyperspectral image via discriminative sparse multimodal learning, IEEE Trans. Geosci. Remote Sens.
- Local energy pattern for texture classification using self-adaptive quantization thresholds, IEEE Trans. Image Process.
Wei He is with the Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, M. S. China. Email: [email protected].
Xiaofeng Zhu is with the Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, M. S. China. Email: [email protected]. His research topics include feature selection and analysis, pattern recognition and data mining.
Debo Cheng is with the Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, M. S. China. Email:[email protected].
Rongyao Hu is with the Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, M. S. China. Email: [email protected].
Shichao Zhang is a Distinguished Professor and the director of the Institute of the School of Computer Science and Information Technology at Guangxi Normal University, Guilin, China. He holds a Ph.D. degree in Computer Science from Deakin University, Australia. His research interests include data analysis and smart pattern discovery. He has published over 50 international journal papers and over 60 international conference papers. He has won over 10 national-class grants, such as the China NSF, China 863 Program, China 973 Program, and Australia Large ARC. He is an Editor-in-Chief for International Journal of Information Quality and Computing, and serves as an associate editor for IEEE Transactions on Knowledge and Data Engineering, Knowledge and Information Systems, and IEEE Intelligent Informatics Bulletin. Email: [email protected].
1 Wei He and Debo Cheng have contributed equally to this work.