Unsupervised feature selection using a neuro-fuzzy approach

doi:10.1016/S0167-8655(98)00083-X

Pattern Recognition Letters

Volume 19, Issue 11, September 1998, Pages 997-1006

Abstract

A neuro-fuzzy methodology is described which involves connectionist minimization of a fuzzy feature evaluation index with unsupervised training. The concept of a flexible membership function incorporating weighted distance is introduced in the evaluation index to make the modeling of clusters more appropriate. A set of optimal weighting coefficients in terms of network parameters representing individual feature importance is obtained through connectionist minimization. Besides, the investigation includes the development of another algorithm for ranking of different feature subsets using the aforesaid fuzzy evaluation index without neural networks. Results demonstrating the effectiveness of the algorithms for various real-life data are provided.

Introduction

Feature selection or extraction is a process of selecting a map of the form $x' = f(x)$ by which a sample $x\,(x_{1}, x_{2}, \ldots, x_{n})$ in an $n$-dimensional measurement space $(R^{n})$ is transformed into a point $x'\,(x'_{1}, x'_{2}, \ldots, x'_{n'})$ in an $n'$-dimensional ($n' < n$) feature space $(R^{n'})$. The problem of feature selection deals with choosing some of the $x_{i}$s from the measurement space to constitute the feature space. On the other hand, the problem of feature extraction deals with generating new $x'_{j}$s (constituting the feature space) based on some $x_{i}$s in the measurement space. The main objective of these processes is to retain the optimum salient characteristics necessary for the recognition process and to reduce the dimensionality of the measurement space so that effective and easily computable algorithms can be devised for efficient categorization.
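The distinction between the two maps can be illustrated with a minimal sketch. The function names and the sample values are hypothetical, and the use of a linear map for extraction is an assumption made only for illustration; the definition above allows any map $f$.

```python
def select_features(x, keep):
    """Feature selection: keep a subset of the original measurements x_i."""
    return [x[i] for i in keep]


def extract_features(x, matrix):
    """Feature extraction: generate new features from the x_i.

    A linear combination is assumed here purely for illustration.
    """
    return [sum(a * xi for a, xi in zip(row, x)) for row in matrix]


# A sample in a 4-dimensional measurement space (n = 4), mapped to n' = 2.
x = [5.1, 3.5, 1.4, 0.2]

selected = select_features(x, keep=[2, 3])  # [1.4, 0.2]
extracted = extract_features(
    x,
    [[0.5, 0.5, 0.0, 0.0],   # average of the first two measurements
     [0.0, 0.0, 0.5, 0.5]],  # average of the last two measurements
)
```

In the first case the feature space is made of original measurements; in the second, each new feature depends on several of them.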

Fuzzy set theory enables one to deal with uncertainties in different tasks of a pattern recognition system, arising from deficiency (e.g., vagueness, incompleteness, etc.) in information, in an efficient manner. Artificial Neural Networks (ANNs), having the capability of fault tolerance, adaptivity and generalization, and scope for massive parallelism, are widely used in dealing with learning and optimization tasks. Fuzzy set theoretic approaches for feature selection are mainly based on measures of entropy and index of fuzziness (Pal and Chakraborty, 1986; Pal, 1992), fuzzy c-means and fuzzy ISODATA algorithms (Bezdek and Castelaz, 1977). Some of the recent attempts made for feature selection in the framework of ANN are mainly based on multilayer feedforward networks and self-organizing networks (Priddy et al., 1993; Steppe and Bauer, Jr., 1996; De et al., 1997; Pregenzer et al., 1996). Note that, depending on whether the class information of the samples is known or not, these methods are classified under supervised or unsupervised mode. For example, the algorithms described in (Pal and Chakraborty, 1986; Pal, 1992; Bezdek and Castelaz, 1977; Priddy et al., 1993; Steppe and Bauer, Jr., 1996; De et al., 1997) fall under the supervised category, whereas those in (Bezdek and Castelaz, 1977; Pregenzer et al., 1996) are in unsupervised mode.

Recently, attempts have been made to integrate the merits of fuzzy set theory and ANN under the heading 'neuro-fuzzy computing' for making the systems artificially more intelligent. In the area of pattern recognition, neuro-fuzzy approaches have been attempted mostly for designing classification/clustering methodologies, and much less for feature selection or extraction.

The present article is an attempt in this regard and provides a neuro-fuzzy approach for feature selection under unsupervised mode of training. First of all, a fuzzy feature evaluation index for a set of features is defined in terms of membership values denoting the degree of similarity between two patterns. The similarity between two patterns is measured by a weighted distance between them. The weight coefficients are used to denote the degree of importance of the individual features in characterizing/discriminating different clusters and to provide flexibility in modeling various clusters. The evaluation index is such that, for a set of features, the lower its value, the higher is the importance of that set in characterizing/discriminating various clusters. A layered network is then formulated for performing the task of minimization of the evaluation index by an unsupervised learning process, thereby determining the optimum weight coefficients providing an ordering of the individual features.
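The ingredients named in this paragraph (a weighted distance, a membership value expressing pairwise similarity, and an index that is lower for better features) can be sketched as follows. The text above does not give the formulas, so everything specific here is an assumption: the linear membership function, the reference distance `beta * max distance`, the comparison against similarities computed with all features at unit weight, and the exact form of the index. It is one plausible instance of such an index, not necessarily the definition used in the article.

```python
import math
from itertools import combinations


def weighted_distance(x, y, w):
    """Weighted Euclidean distance; w[i] scales the contribution of feature i."""
    return math.sqrt(sum((wi * (xi - yi)) ** 2 for xi, yi, wi in zip(x, y, w)))


def membership(d, d_ref):
    """Assumed similarity in [0, 1]: 1 at zero distance, 0 at d_ref and beyond."""
    return max(0.0, 1.0 - d / d_ref)


def similarities(samples, w, beta=0.5):
    """Membership value for every pair of samples under weights w.

    d_ref = beta * (largest pairwise distance) is an assumed choice.
    """
    dists = [weighted_distance(x, y, w) for x, y in combinations(samples, 2)]
    d_ref = beta * max(dists)
    if d_ref == 0.0:  # all samples coincide under these weights
        return [1.0] * len(dists)
    return [membership(d, d_ref) for d in dists]


def evaluation_index(samples, w, beta=0.5):
    """Assumed index: average disagreement between similarities in the
    reference space (all features, unit weights) and in the weighted space.
    Each term lies in [0, 0.5]; lower values indicate crisper similarities."""
    n = len(samples[0])
    mu_ref = similarities(samples, [1.0] * n, beta)
    mu_w = similarities(samples, w, beta)
    terms = [0.5 * (t * (1.0 - o) + o * (1.0 - t)) for o, t in zip(mu_ref, mu_w)]
    return sum(terms) / len(terms)


# Hypothetical data: feature 0 separates two groups, feature 1 does not.
samples = [[0.0, 0.0], [0.1, 0.2], [1.0, 0.1], [1.1, 0.3]]
e_feature0 = evaluation_index(samples, [1.0, 0.0])
e_feature1 = evaluation_index(samples, [0.0, 1.0])
```

On this toy data the index is lower when all weight is placed on feature 0, which matches the stated convention that a lower value marks a more important feature set.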

In another part of the investigation, the aforesaid fuzzy evaluation index is used alone to find the best subset of features. This is done by computing the evaluation index (with weight coefficients equal to 1) on different subsets of features and then ordering them accordingly. The effectiveness of these algorithms is demonstrated on four different data sets, namely, vowel (Pal and Dutta Majumder, 1986; Pal and Chakraborty, 1986), Iris (Fisher, 1936), medical (Hayashi, 1991) and mango-leaf (Pal, 1992).
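The subset-ranking procedure described here amounts to enumerating feature subsets, scoring each one, and sorting by score. The sketch below shows only that enumeration step. The names `rank_subsets` and `placeholder_index` are hypothetical, and `placeholder_index` is a simple stand-in (mean nearest-neighbour distance relative to the largest pairwise distance), not the fuzzy evaluation index of the article; any function that returns a lower value for a better subset can be passed in its place.

```python
import math
from itertools import combinations


def rank_subsets(samples, index_fn, size):
    """Score every feature subset of the given size; lowest index first.

    Each subset is evaluated on the projected data, which corresponds to
    using unit weights on the chosen features and ignoring the rest.
    """
    n = len(samples[0])
    scored = []
    for subset in combinations(range(n), size):
        projected = [[row[i] for i in subset] for row in samples]
        scored.append((index_fn(projected), subset))
    return sorted(scored)


def placeholder_index(points):
    """Stand-in score, NOT the article's index: small when points form
    tight groups relative to the overall spread of the data."""
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    widest = max(math.dist(p, q) for p, q in combinations(points, 2))
    return sum(nearest) / (len(nearest) * widest) if widest else 0.0


# Hypothetical data: feature 0 separates two groups, feature 1 does not.
samples = [[0.0, 0.0], [0.1, 0.2], [1.0, 0.1], [1.1, 0.3]]
ranking = rank_subsets(samples, placeholder_index, size=1)
best_subset = ranking[0][1]
```

Exhaustive enumeration grows combinatorially with the number of features, so a sketch of this kind is practical only for small measurement spaces or fixed subset sizes.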

Section snippets

Feature evaluation index

In this section we first of all provide a definition of the fuzzy feature evaluation index. The membership function for its realization is then defined in terms of a distance measure and weight coefficients.

Feature selection

In this section we describe two unsupervised algorithms for feature selection. The first one considers the fuzzy feature evaluation index alone for ranking of different feature subsets. The second one is based on a neuro-fuzzy approach, where the fuzzy feature evaluation index is minimized with a layered neural network for ranking of individual features.

Results

Here we demonstrate the effectiveness of the algorithms presented above on four data sets, namely, vowel data (Pal and Dutta Majumder, 1986; Pal and Chakraborty, 1986), Iris data (Fisher, 1936), medical data (Hayashi, 1991) and mango-leaf data (Pal, 1992). The vowel data consists of a set of 437 Indian Telugu vowel sounds collected by trained personnel. These were uttered in a consonant-vowel-consonant context by three male speakers in the age group of 30 to 35 years. The data set has three

Conclusions

In this article we have demonstrated how the concept of neuro-fuzzy computing can be exploited for developing a methodology for feature selection in unsupervised mode. The methodology developed involves connectionist optimization of a fuzzy feature evaluation index, thereby determining the ranking of various features. The algorithm considers interdependence of the original features. Unlike the method based on the fuzzy c-means algorithm (Bezdek and Castelaz, 1977), the algorithm provides a

Acknowledgements

Mr. Rajat K. De is grateful to the Department of Atomic Energy, Government of India for providing him a Dr. K.S. Krishnan Senior Research Fellowship. The work is partly supported by Grant No. 25(0093)/97/EMR-II of CSIR, New Delhi. The work was partly done when Jayanta Basak was in RIKEN Brain Science Institute, Wakoshi, Saitama, Japan.

References (10)

S.K. Pal. Fuzzy set theoretic measures for automatic feature evaluation: II. Information Sciences (1992)
K.L. Priddy et al. Bayesian selection of important features for feedforward neural networks. Neurocomputing (1993)
R.K. De et al. Feature analysis: neural network and fuzzy set theoretic approaches. Pattern Recognition (1997)
M. Pregenzer et al. Automated feature selection with a distinction sensitive learning vector quantizer. Neurocomputing (1996)
J.C. Bezdek et al. Prototype classification and feature selection with fuzzy sets. IEEE Trans. on Systems, Man and Cybernetics (1977)

There are more references available in the full text version of this article.

Cited by (84)

Cluster Analysis for mixed data: An application to credit risk evaluation
2021, Socio-Economic Planning Sciences
Credit risk is one of the main risks faced by a bank to provide financial products and services to clients. To evaluate the financial performance of clients, several scoring methodologies have been proposed, which are based mostly on quantitative indicators. This paper highlights the relevance of both quantitative and qualitative features of applicants and proposes a new methodology based on mixed data clustering techniques. Indeed, cluster analysis may prove particularly useful in the estimation of credit risk. Traditionally, clustering concentrates only on quantitative or qualitative data at a time; however, since credit applicants are characterized by mixed personal features, a cluster analysis specific for mixed data can lead to discover particularly informative patterns, estimating the risk associated with credit granting.
Fuzzy Centroid and Genetic Algorithms: Solutions for Numeric and Categorical Mixed Data Clustering
2021, Procedia Computer Science
Statistical data analysis in machine learning and data mining usually uses the clustering technique. However, data with both numeric and categorical attributes (mixed data) exists universally in real life. K-prototype is a well-known algorithm for clustering mixed data because of its effectiveness in handling large data. In practice, however, k-prototype has two main weaknesses: the use of the mode as a cluster center for categorical attributes cannot accurately represent the objects, and the algorithm may stop at a local optimum because it is affected by the random initial cluster prototypes. To overcome the first weakness, we can use a fuzzy centroid, and for the second, we can implement a genetic algorithm to search for the global optimum. Our research combines the genetic algorithm and Fuzzy K-Prototype to address these two weaknesses. We set up two multivariate data sets, one with high correlation and one with low correlation, to test the robustness of the proposed algorithm. Four indexes for evaluating clustering results (Coefficient of Variance Index, Partition Coefficient, Partition Entropy, and Purity) show that our proposed algorithm gives better results than k-prototype. Based on the evaluation results, we conclude that our proposed algorithm can overcome both weaknesses of the k-prototype algorithm.
An off-center technique: Learning a feature transformation to improve the performance of clustering and classification
2019, Information Sciences
This paper proposes a feature transformation method to improve the performance of clustering and classification, which is named as weight-matrix learning (WML). A feed-forward neural network is particularly designed for WML, which aims to learn the optimal weights by minimizing an objective function similar to cross-entropy, and the training process is finished based on the technique of batch gradient descent or stochastic gradient descent. The proposed feature transformation is linear, which is a non-trivial extension of a previous technique named feature-weight learning (FWL). Essentially, WML can be considered as a learning technique of departing 0.5-similarity, since it can make the samples with similarity larger than 0.5 closer and the samples with similarity lower than 0.5 farther away. From this perspective, WML is identified as an off-center technique with the center of 0.5-similarity. Theoretically and experimentally, it is validated that WML can significantly improve the performance of some clustering algorithms like k-means, and enhance the performance of some classification algorithms like random weight neural network.
A neuro-fuzzy classification technique using dynamic clustering and GSS rule generation
2017, Journal of Computational and Applied Mathematics
Efficient feature subset selection for predictive and accurate classification is highly desirable in many application domains such as medical diagnosis and target marketing. Many neuro-fuzzy models have been proposed for feature selection and efficient classification. One such existing neuro-fuzzy model is the Enhance Neuro-Fuzzy (ENF) system for classification using dynamic clustering. The major problem of ENF is the huge number of linguistic variables generated for each feature, which results in poor interpretability of the rules generated for classification. Therefore, this paper proposes a neuro-fuzzy model which is an extension of ENF. The novelty of the proposed model lies in determining fewer linguistic variables for each feature and in generating significant linguistic variables in the rules for classification, with better interpretability and accuracy. Six datasets are used to test the performance of the proposed model. 10-fold cross validation is used to compare the performance of the proposed model with others. The experimental results show that the performance of the proposed model is superior to that of the others.
Heterogeneous feature subset selection using mutual information-based feature transformation
2015, Neurocomputing
Citation excerpt:
As a result, it is difficult to evaluate heterogeneous features concurrently. However, most conventional FS algorithms focus on datasets with homogeneous features, which can be roughly categorized into two types: numerical FS [18–23] and non-numerical FS [24–27]. Several methods were also proposed to solve the problem of heterogeneous feature selection.
Conventional mutual information (MI) based feature selection (FS) methods are unable to handle heterogeneous feature subset selection properly because of data format differences or estimation methods of MI between feature subset and class label. A way to solve this problem is feature transformation (FT). In this study, a novel unsupervised feature transformation (UFT) which can transform non-numerical features into numerical features is developed and tested. The UFT process is MI-based and independent of class label. MI-based FS algorithms, such as Parzen window feature selector (PWFS), minimum redundancy maximum relevance feature selection (mRMR), and normalized MI feature selection (NMIFS), can all adopt UFT for pre-processing of non-numerical features. Unlike traditional FT methods, the proposed UFT is unbiased while PWFS is utilized to its full advantage. Simulations and analyses of large-scale datasets showed that feature subset selected by the integrated method, UFT–PWFS, outperformed other FT–FS integrated methods in classification accuracy.
A novel pretreatment method of three-dimensional fluorescence data for quantitative measurement of component contents in mixture
2015, Spectrochimica Acta - Part A: Molecular and Biomolecular Spectroscopy
Three-dimensional fluorescence technique is commonly used for the determination of component contents in the mixture. Fluorescence intensity data are used directly in the fluorescent spectrum data processing method. The relationship between fluorescence intensity values and concentrations is linear. Random noise is inevitable in the process of measuring due to fluorescence spectrometer. The measurement accuracy is reduced due to the existence of noise. To reduce random noise and improve the measurement sensitivity, a novel pretreatment method of three-dimensional fluorescence data is proposed. The method is based on Quasi-Monte-Carlo integral. Due to the increased slope of fluorescence intensity data during the integral, the measurement sensitivity is improved. At the same time, the sum of different exponentials of fluorescence intensity at the points reduces the random noise, so the measurement sensitivity is improved more. The recovery rates of the mixture mixed by gasoline, kerosene and diesel oil are calculated to validate the effectiveness of the method.
