Elsevier

Pattern Recognition Letters

Volume 19, Issue 11, September 1998, Pages 997-1006
Unsupervised feature selection using a neuro-fuzzy approach

https://doi.org/10.1016/S0167-8655(98)00083-X

Abstract

A neuro-fuzzy methodology is described which involves connectionist minimization of a fuzzy feature evaluation index with unsupervised training. The concept of a flexible membership function incorporating weighted distance is introduced in the evaluation index to make the modeling of clusters more appropriate. A set of optimal weighting coefficients, expressed in terms of network parameters and representing individual feature importance, is obtained through connectionist minimization. In addition, the investigation includes the development of another algorithm for ranking different feature subsets using the aforesaid fuzzy evaluation index without neural networks. Results demonstrating the effectiveness of the algorithms on various real-life data are provided.

Introduction

Feature selection or extraction is a process of selecting a map of the form $x' = f(x)$ by which a sample $x = (x_1, x_2, \ldots, x_n)$ in an $n$-dimensional measurement space ($\mathbb{R}^n$) is transformed into a point $x' = (x'_1, x'_2, \ldots, x'_{n'})$ in an $n'$-dimensional ($n' < n$) feature space ($\mathbb{R}^{n'}$). The problem of feature selection deals with choosing some of the $x_i$'s from the measurement space to constitute the feature space. On the other hand, the problem of feature extraction deals with generating new $x'_j$'s (constituting the feature space) based on some $x_i$'s in the measurement space. The main objective of these processes is to retain the optimum salient characteristics necessary for the recognition process and to reduce the dimensionality of the measurement space so that effective and easily computable algorithms can be devised for efficient categorization.
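The distinction between the two maps can be illustrated with a short sketch. The function names `select_features` and `extract_features` are hypothetical, and the linear projection used for extraction is only one assumed choice of f; the text does not restrict f to linear maps.

```python
import numpy as np

def select_features(x, indices):
    """Feature selection: keep some of the original coordinates x_i unchanged."""
    return np.asarray(x, dtype=float)[list(indices)]

def extract_features(x, projection):
    """Feature extraction: build new coordinates x'_j from the x_i.

    A linear map is assumed here purely for illustration.
    """
    return np.asarray(projection, dtype=float) @ np.asarray(x, dtype=float)

# A sample in a 4-dimensional measurement space (n = 4), reduced to n' = 2.
x = [5.1, 3.5, 1.4, 0.2]
selected = select_features(x, [2, 3])            # a subset of the original features
projection = [[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5]]
extracted = extract_features(x, projection)      # new features, averages of pairs
```

In both cases the result lies in a lower-dimensional feature space; selection preserves the meaning of each retained measurement, whereas extraction generally does not.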

Fuzzy set theory enables one to deal with uncertainties in different tasks of a pattern recognition system, arising from deficiency (e.g., vagueness, incompleteness, etc.) in information, in an efficient manner. Artificial Neural Networks (ANNs), having the capability of fault tolerance, adaptivity and generalization, and scope for massive parallelism, are widely used in dealing with learning and optimization tasks. Fuzzy set theoretic approaches for feature selection are mainly based on measures of entropy and index of fuzziness (Pal and Chakraborty, 1986; Pal, 1992), fuzzy c-means and fuzzy ISODATA algorithms (Bezdek and Castelaz, 1977). Some of the recent attempts made for feature selection in the framework of ANN are mainly based on multilayer feedforward networks and self-organizing networks (Priddy et al., 1993; Steppe and Bauer, Jr., 1996; De et al., 1997; Pregenzer et al., 1996). Note that, depending on whether the class information of the samples is known or not, these methods are classified under supervised or unsupervised mode. For example, the algorithms described in (Pal and Chakraborty, 1986; Pal, 1992; Bezdek and Castelaz, 1977; Priddy et al., 1993; Steppe and Bauer, Jr., 1996; De et al., 1997) fall under the supervised category, whereas those in (Bezdek and Castelaz, 1977; Pregenzer et al., 1996) are in unsupervised mode.

Recently, attempts have been made to integrate the merits of fuzzy set theory and ANN under the heading `neuro-fuzzy computing' for making the systems artificially more intelligent. In the area of pattern recognition, neuro-fuzzy approaches have been attempted mostly for designing classification/clustering methodologies, not much for feature selection or extraction.

The present article is an attempt in this regard and provides a neuro-fuzzy approach for feature selection under unsupervised mode of training. First, a fuzzy feature evaluation index for a set of features is defined in terms of membership values denoting the degree of similarity between two patterns. The similarity between two patterns is measured by a weighted distance between them. The weight coefficients are used to denote the degree of importance of the individual features in characterizing/discriminating different clusters and to provide flexibility in modeling various clusters. The evaluation index is such that, for a set of features, the lower its value, the higher the importance of that set in characterizing/discriminating various clusters. A layered network is then formulated for performing the task of minimization of the evaluation index by an unsupervised learning process, thereby determining the optimum weight coefficients, which provide an ordering of the individual features.
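To make the quantities named in the paragraph above concrete, one possible formalization is sketched here. The exact definitions are not given in the text shown on this page, so every symbol and functional form below is an assumption for illustration: $w_i$ is the weight of feature $i$, $d_{pq}$ the weighted distance between patterns $p$ and $q$, $\mu_{pq}$ their similarity membership, $D = \beta\, d_{\max}$ a distance threshold, and $E$ an index over all $s$ patterns.

```latex
% Illustrative assumptions only; not the definitions from the full paper.
d_{pq} = \Big[ \sum_{i=1}^{n} w_i^{2}\,(x_{pi} - x_{qi})^{2} \Big]^{1/2},
\qquad w_i \in [0, 1]

\mu_{pq} =
\begin{cases}
1 - d_{pq}/D, & d_{pq} \le D \\
0, & \text{otherwise}
\end{cases}
\qquad D = \beta\, d_{\max}, \quad 0 < \beta \le 1

E = \frac{2}{s(s-1)} \sum_{p=1}^{s-1} \sum_{q=p+1}^{s} \mu_{pq}\,(1 - \mu_{pq})
```

Under this assumed form, $E$ is small when most memberships are close to 0 or 1, that is, when pairs of patterns are either clearly similar or clearly dissimilar. This agrees with the statement that a lower index value indicates a feature set that better characterizes the clusters.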

In another part of the investigation, the aforesaid fuzzy evaluation index is used alone to find the best subset of features. This is done by computing the evaluation index (with weight coefficients equal to 1) on different subsets of features and then ordering them accordingly. The effectiveness of these algorithms is demonstrated on four different data sets, namely, vowel (Pal and Dutta Majumder, 1986; Pal and Chakraborty, 1986), Iris (Fisher, 1936), medical (Hayashi, 1991) and mango-leaf (Pal, 1992).
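The subset-ranking procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the page does not give the index formula, so `evaluation_index` assumes a membership that falls linearly with Euclidean distance up to a threshold `beta * d_max`, and an index equal to the mean of mu * (1 - mu) over all pattern pairs. The names `evaluation_index`, `rank_subsets` and `beta` are hypothetical.

```python
from itertools import combinations
import numpy as np

def evaluation_index(X, weights=None, beta=0.5):
    """Assumed fuzzy evaluation index: mean of mu * (1 - mu) over pattern pairs."""
    X = np.asarray(X, dtype=float)
    s, n = X.shape
    if s < 2:
        return 0.0
    # Weight coefficients default to 1, as in the subset-ranking setting.
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt(((w * diff) ** 2).sum(axis=2))      # pairwise weighted distances
    threshold = beta * d.max()
    if threshold == 0.0:
        return 0.0                                   # all patterns coincide
    mu = np.clip(1.0 - d / threshold, 0.0, 1.0)      # assumed membership form
    upper = np.triu_indices(s, k=1)                  # each pair counted once
    return float(np.mean(mu[upper] * (1.0 - mu[upper])))

def rank_subsets(X, size, beta=0.5):
    """Score every feature subset of the given size; lowest index first."""
    X = np.asarray(X, dtype=float)
    scored = [(evaluation_index(X[:, list(cols)], beta=beta), cols)
              for cols in combinations(range(X.shape[1]), size)]
    return sorted(scored)

# Toy data: feature 0 forms two tight clusters, feature 1 is spread evenly.
X_demo = np.array([[0.0, 0.0], [0.1, 3.0], [0.2, 6.0],
                   [10.0, 1.5], [10.1, 4.5], [10.2, 7.5]])
ranking = rank_subsets(X_demo, size=1)
```

On the toy data, the clustered feature receives the lower index and is ranked first. Exhaustive enumeration grows combinatorially with the number of features, so a sketch like this is practical only for small feature sets.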

Feature evaluation index

In this section we first of all provide a definition of the fuzzy feature evaluation index. The membership function for its realization is then defined in terms of a distance measure and weight coefficients.

Feature selection

In this section we describe two unsupervised algorithms for feature selection. The first one considers the fuzzy feature evaluation index alone for ranking of different feature subsets. The second one is based on a neuro-fuzzy approach, where the fuzzy feature evaluation index is minimized with a layered neural network for ranking of individual features.

Results

Here we demonstrate the effectiveness of the algorithms presented above on four data sets, namely, vowel data (Pal and Dutta Majumder, 1986; Pal and Chakraborty, 1986), Iris data (Fisher, 1936), medical data (Hayashi, 1991) and mango-leaf data (Pal, 1992). The vowel data consists of a set of 437 Indian Telugu vowel sounds collected by trained personnel. These were uttered in a consonant-vowel-consonant context by three male speakers in the age group of 30 to 35 years. The data set has three

Conclusions

In this article we have demonstrated how the concept of neuro-fuzzy computing can be exploited for developing a methodology for feature selection in unsupervised mode. The methodology developed involves connectionist optimization of a fuzzy feature evaluation index, thereby determining the ranking of various features. The algorithm considers interdependence of the original features. Unlike the method based on the fuzzy c-means algorithm (Bezdek and Castelaz, 1977), the algorithm provides a

Acknowledgements

Mr. Rajat K. De is grateful to the Department of Atomic Energy, Government of India for providing him a Dr. K.S. Krishnan Senior Research Fellowship. The work is partly supported by Grant No. 25(0093)/97/EMR-II of CSIR, New Delhi. The work was partly done when Jayanta Basak was in RIKEN Brain Science Institute, Wakoshi, Saitama, Japan.
