
Pattern Recognition

Volume 48, Issue 8, August 2015, Pages 2656-2666

A novel feature selection method considering feature interaction

https://doi.org/10.1016/j.patcog.2015.02.025

Highlights

  • A novel feature selection method based on an interaction weight factor is proposed.

  • We redefined relevance, redundancy and interaction of features in the framework of information theory.

  • The algorithm can deal with irrelevant, redundant and interactive features.

  • Our method obtains the best average accuracies compared with the other five algorithms.

Abstract

Interacting features are those that appear individually irrelevant or weakly relevant to the class, but that may be highly correlated with the class when combined with other features. Discovering feature interaction is a challenging task in feature selection. In this paper, a novel feature selection algorithm considering feature interaction is proposed. First, feature relevance, feature redundancy and feature interaction are redefined in the framework of information theory. Then an interaction weight factor, which reflects whether a feature is redundant or interactive, is proposed. On this basis, we put forward an Interaction Weight based Feature Selection algorithm (IWFS). To evaluate the performance of the proposed algorithm, we compare IWFS with five other representative feature selection algorithms (CFS, INTERACT, FCBF, MRMR and Relief-F), in terms of classification accuracy and the number of selected features, using three different types of classifiers: C4.5, IB1 and PART. The results on six synthetic datasets show that IWFS can effectively identify irrelevant and redundant features while preserving interactive ones. The results on eight real-world datasets indicate that IWFS not only efficiently reduces the dimensionality of the feature space, but also yields the highest average accuracy for all three classification algorithms.

Introduction

Feature selection is a preprocessing step in pattern recognition and machine learning that has drawn the attention of researchers from many fields. Its main objective is to choose a subset of features that retains the salient characteristics of the original feature set. Feature selection brings many advantages, such as avoiding over-fitting, facilitating data visualization, reducing storage requirements, and reducing training time [1].

From the perspective of the subset evaluation function, feature selection algorithms fall into two main categories [2]: filter [3], [4], [5] and wrapper [6], [7], [8] models. In the filter model, the algorithms are independent of any classifier, i.e., they do not perform classification of the data while evaluating feature subsets; the best feature subset is selected by evaluating predefined criteria without involving any learning algorithm. Wrapper models use the performance of a specific classifier to evaluate the feature subsets produced by different search strategies. Although wrapper methods may yield good results, they incur considerable computational expense and may produce subsets that are overly specific to the classifier used. Compared with wrapper methods, filter methods are computationally simple and fast, so their efficiency and generality make them easy to apply to very high-dimensional datasets. We believe that a feature selection method needs to be simple, robust and efficient; we therefore adopt a filter-based approach.

Traditionally, feature selection research has focused on removing as many irrelevant and redundant features as possible [9]. Irrelevant features provide no useful information in any context, and redundant features provide no more information than the currently selected features. Apart from the identification of irrelevant and redundant features, an important but often ignored issue is feature interaction [10]. Interacting features are those that appear individually irrelevant to the class, but that may be highly correlated with the class when combined with other features. The XOR problem is a typical example: there are two binary features and a class that is zero if both features take the same value and one otherwise. Individually, neither feature carries any information about the class; combined, the two features completely determine the class, as the sketch below illustrates.
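To make the XOR example concrete, the following short sketch (with toy data generated on the spot, not taken from the paper's experiments) estimates the relevant mutual information quantities with scikit-learn's mutual_info_score; the helper name mi_bits is ours.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# Toy XOR data (illustrative only): the class is 1 exactly when the
# two binary features differ.
rng = np.random.default_rng(0)
f1 = rng.integers(0, 2, size=10_000)
f2 = rng.integers(0, 2, size=10_000)
c = f1 ^ f2

def mi_bits(x, y):
    """Mutual information between two discrete arrays, converted from nats to bits."""
    return mutual_info_score(x, y) / np.log(2)

print(mi_bits(f1, c))           # ~0 bits: feature 1 alone says nothing about the class
print(mi_bits(f2, c))           # ~0 bits: feature 2 alone says nothing about the class
print(mi_bits(2 * f1 + f2, c))  # ~1 bit: jointly, the two features determine the class
```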

Although some recent work has pointed out the existence and effect of feature interaction, there is little work that treats it explicitly. Some wrapper methods can deal with feature interaction to some extent, but they require a model to be trained and tested on each candidate feature subset, which is usually time-consuming, especially for computationally expensive models. Furthermore, wrapper methods are strongly tied to the classification algorithm used, and the performance of that model does not necessarily reflect the actual predictive ability of the selected feature subset. It therefore remains a challenge to filter out irrelevant and redundant features while retaining only a small number of interactive ones.

In this paper, we propose an Interaction Weight based Feature Selection algorithm (IWFS). We first define the concepts of feature relevance, feature redundancy and feature interaction, and then propose an interaction weight factor to measure the redundancy and interaction of candidate features. Since redundant features have a negative influence on prediction and interactive features a positive one, the weight factors of redundant features should be smaller than those of interactive features. Through the interaction weight factor, we adjust the traditional relevance measure between a feature and the class, and rank the candidate features by the adjusted measure. To verify its performance, the proposed method is compared with five state-of-the-art feature selection methods (CFS, INTERACT, FCBF, MRMR and Relief-F) on six synthetic datasets and eight real-world datasets. Experimental results show that the proposed method not only removes redundant features, but also detects interactive ones.

The rest of this paper is organized as follows. In Section 2, some basic information-theoretic notions are reviewed. In Section 3, we describe the related work. In Section 4, we provide formal definitions of relevance, redundancy and interaction in the framework of information theory. In Section 5, we put forward the new feature subset selection algorithm. Experimental results and analysis are presented in Section 6. Finally, Section 7 concludes the paper and outlines directions for future research.

Section snippets

Some basic information-theoretic notions

In this section, some basic information-theoretic notions for feature selection are reviewed.

Shannon’s information theory, first introduced in 1948 [11], provides a way to measure the information of random variables. Entropy is a measure of the uncertainty of a random variable [12]. Let X = {x1, x2, ..., xn} be a discrete random variable and p(xi) the probability of xi; the entropy of X is defined by

H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i).

Here the base of the logarithm is 2 and the unit of entropy is the bit. Obviously, H(
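For concreteness, a minimal plug-in estimate of the entropy defined above (log base 2, so the result is in bits); the helper name entropy_bits is ours, not the paper's.

```python
import numpy as np
from collections import Counter

def entropy_bits(values):
    """Plug-in estimate of H(X) = -sum p(x) log2 p(x) from the empirical
    distribution of a sequence of discrete values."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(entropy_bits([0, 1, 0, 1]))  # 1.0 bit: a fair coin is maximally uncertain
print(entropy_bits([0, 0, 0, 1]))  # ~0.811 bits: a biased coin is less uncertain
```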

Related work

Feature subset selection can be regarded as a search problem: it searches for one or more informative subsets of features under some predefined criteria. The process can be formalized as follows. Let F = {F1, F2, ..., Fn} be the full set of input features and S = {Fτ(1), Fτ(2), ..., Fτ(m)} (S ⊆ F, m < n) a selected feature subset. We would like to select the most informative subset Sopt ⊆ F that represents the original data under some criterion J. In
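As a sketch of this search view, a greedy forward-selection loop under an abstract subset criterion J looks like the following; the criterion is left as a placeholder callable, and the function name is ours rather than any specific algorithm from the paper.

```python
from typing import Callable, Sequence, Set

def greedy_forward_select(features: Sequence[str],
                          J: Callable[[Set[str]], float],  # subset-quality criterion (placeholder)
                          k: int) -> Set[str]:
    """Grow a subset S one feature at a time, always adding the feature
    whose inclusion maximises the criterion J, until |S| = k."""
    S: Set[str] = set()
    while len(S) < k:
        best = max((f for f in features if f not in S),
                   key=lambda f: J(S | {f}))
        S.add(best)
    return S
```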

Definitions of relevance, redundancy and interaction

Feature selection algorithms often rely on information-theoretic concepts such as relevance, redundancy and interaction of features. In this section, formal definitions of feature relevance, redundancy and interaction are given.

Most previous work focuses on the definitions of relevant and redundant features. Gennari et al. [32] consider a feature useful if it is correlated with or predictive of the class; otherwise, it is irrelevant. The mutual information I(Fi;C)

Proposed feature selection algorithm

In this section, we first define the interaction weight factor for measuring redundancy and interaction between features. Then, we present our proposed feature subset selection algorithm.
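The exact interaction weight factor is defined in the full text of this section, which is only excerpted here; the sketch below is therefore an illustration of the general idea rather than the published formula. It scales each candidate's relevance I(Fi;C) by a weight that is boosted by positive three-way interaction gain with already-selected features and shrunk by negative gain (redundancy). The function names, the 1 + interaction-gain weight update, and the tie-breaking epsilon are all our assumptions.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mi_bits(x, y):
    """Mutual information I(X;Y) in bits between two discrete arrays."""
    return mutual_info_score(x, y) / np.log(2)

def pair(x, y):
    """Encode two discrete arrays as one variable, so mi_bits(pair(x, y), c)
    estimates the joint information I(X,Y;C)."""
    return x * (int(y.max()) + 1) + y

def interaction_gain(fi, fj, c):
    """I(Fi;Fj;C) = I(Fi,Fj;C) - I(Fi;C) - I(Fj;C): positive when the two
    features carry synergistic information about the class (interaction),
    negative when their information overlaps (redundancy)."""
    return mi_bits(pair(fi, fj), c) - mi_bits(fi, c) - mi_bits(fj, c)

def interaction_weighted_selection(X, c, k):
    """Greedy ranking sketch (not the published IWFS formula): relevance
    I(Fi;C) is scaled by a weight updated from interaction gains with the
    features already selected."""
    n = X.shape[1]
    selected, weight = [], np.ones(n)
    while len(selected) < k:
        # A small epsilon keeps the weight decisive even when the raw
        # relevance of all remaining candidates is near zero, as in XOR.
        score = [weight[i] * (mi_bits(X[:, i], c) + 1e-6) if i not in selected
                 else -np.inf for i in range(n)]
        best = int(np.argmax(score))
        selected.append(best)
        # Boost candidates that interact with the new feature, shrink
        # candidates that are redundant with it.
        for i in range(n):
            if i not in selected:
                weight[i] *= max(0.0, 1.0 + interaction_gain(X[:, i], X[:, best], c))
    return selected
```

With this update rule, once one of two interacting features (e.g., either bit of the XOR example) has been selected, its partner's weight roughly doubles, so the partner is ranked ahead of unrelated noise features even though its individual relevance is near zero.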

Experimental results and analysis

In this section, we empirically evaluate the performance of the proposed algorithm and compare it with five other feature subset selection algorithms of different types, on both synthetic and real-world datasets.

Conclusions and future work

The main goal of feature selection is to find a feature subset that is as small as possible while achieving high prediction accuracy. Feature interaction exists in many applications, and finding interactive features is a challenging task. In this paper, we present a novel feature subset selection algorithm that takes interaction into account; it is effective not only at removing irrelevant and redundant features but also at preserving interactive ones. First, the new definitions of redundancy

Acknowledgment

The authors would like to thank the anonymous reviewers for their constructive comments. This work was supported by the National Natural Science Foundation of China (70971137).

Zilin Zeng received the B.S. degree in applied mathematics from Jiangxi Normal University, Jiangxi, China, in 2008. She is currently a Ph.D. student of PLA University of Science & Technology, Nanjing, China. Her research focuses on feature subset selection and meta-learning.

References (43)

  • G. Wang et al., Selecting feature subset for high dimension data via the propositional FOIL rules, Pattern Recognit. (2013)
  • J.H. Gennari et al., Models of incremental concept formation, Artif. Intell. (1989)
  • I. Guyon et al., An introduction to variable and feature selection, J. Mach. Learn. Res. (2003)
  • H.L. Wei et al., Feature subset selection and ranking for data dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • I. Guyon et al., Gene selection for cancer classification using support vector machines, Mach. Learn. (2002)
  • A. Jakulin et al., Analyzing attribute dependencies, in: Proceedings of Seventh European Conference on Principles and Practice of Knowledge Discovery in Databases (2003)
  • C.E. Shannon, A mathematical theory of communication, ACM SIGMOBILE Mobile Comput. Commun. Rev. (2001)
  • T.M. Cover et al., Elements of Information Theory (1991)
  • A. Jakulin, I. Bratko, Testing the significance of attribute interactions, in: Proceedings of the Twenty-first...
  • A. Jakulin, Attribute interactions in machine learning (Master thesis) (2003)
  • K. Kira, L.A. Rendell, The feature selection problem: traditional methods and a new algorithm, in: Proceedings of Ninth...