A novel feature selection method considering feature interaction
Introduction
Feature selection is a preprocessing step in pattern recognition and machine learning that has drawn the attention of researchers from many fields. Its main objective is to choose a subset of features that retains the salient characteristics of the original feature set. Feature selection brings many advantages, such as avoiding over-fitting, facilitating data visualization, reducing storage requirements, and reducing training time [1].
From the perspective of the subset evaluation function, feature selection algorithms fall into two main categories [2]: filter [3], [4], [5] and wrapper [6], [7], [8] models. In the filter model, the algorithms are independent of any classifier, i.e., they do not perform classification of the data while evaluating feature subsets; the best subset is selected against some predefined criterion without involving any learning algorithm. Wrapper models use the performance of a specific classifier to evaluate feature subsets under different search strategies. Although wrapper methods often achieve good results, they incur considerable computational expense and may produce subsets that are overly specific to the classifier used. Compared with wrapper methods, filter methods are computationally simple and fast, so they can easily be applied to very high-dimensional datasets. We believe that a feature selection method needs to be simple, robust and efficient; we therefore adopt the filter approach.
Traditionally, feature selection research has focused on removing as many irrelevant and redundant features as possible [9]. Irrelevant features provide no useful information in any context, and redundant features provide no more information than the currently selected features. Apart from the identification of irrelevant and redundant features, an important but usually ignored issue is feature interaction [10]. Interacting features appear irrelevant to the class individually, but may be highly correlated with the class when combined with other features. The XOR problem is a typical example: there are two features and a class that is zero if both features have the same value and one otherwise. Individually, neither feature carries any information about the class; combined, however, the two features completely determine the class.
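The XOR example can be checked numerically. The sketch below (plain Python with helper names of our own choosing, not code from the paper) estimates mutual information from observed frequencies and shows that each feature alone shares no information with the class, while the pair determines it completely:

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (base 2) estimated from observed frequencies."""
    n = len(xs)
    return sum((c / n) * math.log2(n / c) for c in Counter(xs).values())

def mutual_information(x, y):
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

# The four rows of the XOR truth table, equally likely.
f1  = [0, 0, 1, 1]
f2  = [0, 1, 0, 1]
cls = [a ^ b for a, b in zip(f1, f2)]  # class = f1 XOR f2

print(mutual_information(f1, cls))                 # 0.0: f1 alone says nothing
print(mutual_information(f2, cls))                 # 0.0: f2 alone says nothing
print(mutual_information(list(zip(f1, f2)), cls))  # 1.0: together they determine the class
```

Any univariate relevance measure would therefore discard both features, which is exactly the failure mode that motivates an interaction-aware method.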
Although some recent work has pointed out the existence and effect of feature interaction, there is little work that treats it explicitly. Some wrapper methods can deal with feature interaction to some extent, but they require a model to test each candidate subset, which is usually time-consuming, especially for computationally expensive models. Furthermore, wrapper methods are strongly tied to the classification algorithm used, and the model's performance does not necessarily reflect the actual predictive ability of the selected feature subset. It is therefore a challenge to filter out the irrelevant and redundant features while retaining only a small number of interacting features.
In this paper, we propose an Interaction Weight based Feature Selection algorithm (IWFS). We first define the concepts of feature relevance, feature redundancy and feature interaction, and then propose an interaction weight factor to measure the redundancy and interaction of candidate features. Since redundant features hurt prediction while interacting features help it, the weight factors of redundant features should be smaller than those of interacting features. Through the interaction weight factor, we adjust the traditional relevance measure between a feature and the class and rank the candidate features by this adjusted measure. To verify its performance, the proposed method is compared with five state-of-the-art feature selection methods (CFS, INTERACT, FCBF, MRMR and Relief-F) on six synthetic datasets and eight real-world datasets. Experimental results show that the proposed method not only removes redundant features but also detects interacting features.
The rest of this paper is organized as follows. In Section 2, some basic information-theoretic notions are reviewed. In Section 3, we describe the related work. In Section 4, we provide formal definitions of relevance, redundancy and interaction in the framework of information theory. In Section 5, we put forward the new feature subset selection algorithm. Experimental results and analysis are presented in Section 6. Finally, we make a brief conclusion and give the future research direction in Section 7.
Section snippets
Some basic information-theoretic notions
In this section, some basic information-theoretic notions for feature selection are reviewed.
Shannon’s information theory, first introduced in 1948 [11], provides a way to measure the information of random variables. Entropy is a measure of the uncertainty of a random variable [12]. Let $X$ be a discrete random variable and $p(x)$ the probability that $X$ takes the value $x$; the entropy of $X$ is defined by
$$H(X) = -\sum_{x} p(x)\log p(x).$$
Here the base of the log is 2 and the unit of entropy is the bit. Obviously, $H(X) \geq 0$.
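As a quick numerical illustration of the definition (the helper below is our own sketch, not code from the paper), entropy is highest for a uniform variable and zero for a constant one:

```python
import math
from collections import Counter

def entropy(xs):
    """H(X) = sum_x p(x) log2(1/p(x)), estimated from observed frequencies;
    algebraically equal to -sum p(x) log2 p(x), written this way to avoid -0.0."""
    n = len(xs)
    return sum((c / n) * math.log2(n / c) for c in Counter(xs).values())

print(entropy([0, 1]))        # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0, 0, 0, 1]))  # ~0.811 bits: a biased coin is less uncertain
print(entropy([1, 1, 1, 1]))  # 0.0 bits: a constant carries no uncertainty
```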
Related work
Feature subset selection can be regarded as a search problem: it searches for one or more informative subsets of features under some predefined criterion. The process can be formalized as follows. Let $F$ be the full set of $n$ input features and $S \subseteq F$ a selected feature subset, with $|S| \le n$. We would like to select the most informative subset $S$ that represents the original data under some criterion $J$.
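As an illustration of this formulation, the sketch below implements one common search strategy, greedy forward selection, using the joint mutual information of the selected set with the class as a stand-in for the criterion $J$ (this choice of criterion is our own illustration, not the paper's):

```python
import math
from collections import Counter

def entropy(xs):
    n = len(xs)
    return sum((c / n) * math.log2(n / c) for c in Counter(xs).values())

def mutual_information(x, y):
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def forward_select(features, cls, k):
    """Greedy forward search: repeatedly add the candidate feature that
    maximizes the joint mutual information of the selected set with the class."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(f):
            cols = [features[g] for g in selected + [f]]
            return mutual_information(list(zip(*cols)), cls)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Two interacting features (XOR) plus one irrelevant feature.
feats = {
    "f1":    [0, 0, 1, 1, 0, 0, 1, 1],
    "f2":    [0, 1, 0, 1, 0, 1, 0, 1],
    "noise": [0, 0, 0, 0, 1, 1, 1, 1],
}
cls = [a ^ b for a, b in zip(feats["f1"], feats["f2"])]
print(forward_select(feats, cls, 2))  # ['f1', 'f2']
```

Because each candidate is scored jointly with the features already chosen, this search recovers the XOR pair; scoring features one at a time against the class would not.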
Definitions of relevance, redundancy and interaction
Feature selection algorithms often rely on information-theoretic concepts such as the relevance, redundancy and interaction of features. In this section, we give definitions of feature relevance, redundancy and interaction.
Most previous work focuses on defining relevant and redundant features. Gennari et al. [32] consider a feature useful if it is correlated with, or predictive of, the class; otherwise it is irrelevant. The mutual information
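Redundancy has a natural characterization in conditional mutual information: a redundant feature adds no information about the class once the features it duplicates are known. A minimal sketch (our own illustration, not the paper's code):

```python
import math
from collections import Counter

def entropy(xs):
    n = len(xs)
    return sum((c / n) * math.log2(n / c) for c in Counter(xs).values())

def conditional_mi(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(z))

f1  = [0, 0, 1, 1]
dup = [0, 0, 1, 1]  # an exact copy of f1
cls = [0, 0, 1, 1]  # class fully determined by f1

# dup is individually relevant to cls, yet contributes nothing once f1 is known:
print(conditional_mi(dup, cls, f1))  # 0.0
```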
Proposed feature selection algorithm
In this section, we first define the interaction weight factor for measuring redundancy and interaction between features. Then, we present our proposed feature subset selection algorithm.
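The exact definition of the interaction weight factor appears in the paper's body, which this snippet truncates; the sketch below is only a hypothetical illustration of the general idea, using the interaction gain $I(X; Y; C) = I(X, Y; C) - I(X; C) - I(Y; C)$ to shrink the weights of candidates that are redundant with already-selected features and to grow the weights of candidates that interact with them:

```python
import math
from collections import Counter

def entropy(xs):
    n = len(xs)
    return sum((c / n) * math.log2(n / c) for c in Counter(xs).values())

def mi(x, y):
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def interaction_gain(x, y, cls):
    """I(X; Y; C): positive when X and Y interact with respect to C,
    negative when they are partly redundant with respect to C."""
    return mi(list(zip(x, y)), cls) - mi(x, cls) - mi(y, cls)

def iwfs_sketch(features, cls, k):
    weight = {f: 1.0 for f in features}          # neutral starting weights
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        # Rank candidates by weight-adjusted relevance to the class.
        best = max(remaining, key=lambda f: weight[f] * (1 + mi(features[f], cls)))
        selected.append(best)
        remaining.remove(best)
        # Redundancy with the newly selected feature shrinks a candidate's
        # weight; interaction with it grows the weight.
        for f in remaining:
            gain = interaction_gain(features[f], features[best], cls)
            norm = entropy(features[f]) + entropy(features[best])
            if norm > 0:
                weight[f] *= 1 + gain / norm
    return selected

feats = {
    "f1":    [0, 0, 1, 1, 0, 0, 1, 1],
    "f2":    [0, 1, 0, 1, 0, 1, 0, 1],
    "noise": [0, 0, 0, 0, 1, 1, 1, 1],
}
cls = [a ^ b for a, b in zip(feats["f1"], feats["f2"])]
print(iwfs_sketch(feats, cls, 2))  # ['f1', 'f2']
```

On the XOR data, selecting f1 boosts f2's weight (their interaction gain is positive) while leaving the irrelevant feature's weight unchanged, so the interacting pair is ranked to the top.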
Experimental results and analysis
In this section, we empirically evaluate the performance of the proposed algorithm and compare it with five other feature subset selection algorithms of different types, on both synthetic and real-world datasets.
Conclusions and future work
The main goal of feature selection is to find a feature subset that is as small as possible while retaining high prediction accuracy. Feature interaction exists in many applications, and finding interacting features is a challenging task. In this paper, we present a novel feature subset selection algorithm that accounts for interaction; it is effective not only in removing irrelevant and redundant features but also in retaining interacting features. First, the new definitions of redundancy
Acknowledgment
The authors would like to thank the anonymous reviewers for their constructive comments. This work was supported by the National Natural Science Foundation of China (70971137).
Zilin Zeng received the B.S. degree in applied mathematics from Jiangxi Normal University, Jiangxi, China, in 2008. She is currently a Ph.D. student of PLA University of Science & Technology, Nanjing, China. Her research focuses on feature subset selection and meta-learning.
References (43)
- et al., Feature selection for classification, Intell. Data Anal. (1997)
- et al., Filter-based optimization techniques for selection of feature subsets in ensemble systems, Expert Syst. Appl. (2014)
- et al., Non-parametric classifier-independent feature selection, Pattern Recognit. (2006)
- et al., A wrapper method for feature selection using support vector machines, Inf. Sci. (2009)
- et al., Markov blanket-embedded genetic algorithm for gene selection, Pattern Recognit. (2007)
- et al., Rough set based maximum relevance-maximum significance criterion and gene selection from microarray data, Int. J. Approx. Reasoning (2011)
- et al., Consistency-based search in feature selection, Artif. Intell. (2003)
- et al., Feature selection with dynamic mutual information, Pattern Recognit. (2009)
- et al., An efficient gene selection algorithm based on mutual information, Neurocomputing (2009)
- et al., Learning to classify by ongoing feature selection, Image Vis. Comput. (2010)
- Selecting feature subset for high dimension data via the propositional FOIL rules, Pattern Recognit.
- Models of incremental concept formation, Artif. Intell.
- An introduction to variable and feature selection, J. Mach. Learn. Res.
- Feature subset selection and ranking for data dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell.
- Gene selection for cancer classification using support vector machines, Mach. Learn.
- Analyzing attribute dependencies, in: Proceedings of the Seventh European Conference on Principles and Practice of Knowledge Discovery in Databases
- A mathematical theory of communication, ACM SIGMOBILE Mobile Comput. Commun. Rev.
- Elements of Information Theory
- Attribute interactions in machine learning (Master thesis)