Feature selection using localized generalization error for supervised classification problems using RBFNN
Introduction
With the availability of fast computers, broadband Internet, and cheap, high-capacity storage, datasets have become ever larger. Usually, domain knowledge and personal bias influence the choice of features; although these features may not fully describe the problem, some may be included simply for fear of losing something useful. When the number of input features in a dataset becomes large, the pattern classification systems trained to separate the samples into different classes also become more complex. Conversely, if it is unnecessary to collect so many input features, the cost of data collection and storage is reduced.
A major problem in pattern classification is how to build a simple classifier that has good performance. By “good performance” we mean a system that can be quickly trained, is highly accurate and responds quickly to future unseen samples, and is easily understood by people. Perhaps the most straightforward way to reduce the complexity of a classifier is to reduce the number of input features.
Given a training dataset $D=\{(x_b, F(x_b))\}_{b=1}^{N}$ consisting of $N$ training samples, with $F$ denoting the unknown input–output mapping of the classification problem that one would like to approximate using a classifier $f_\theta$ (e.g. a neural network), the training error and the generalization error over the entire input space $T$ of the classifier are defined as
$$R_{emp}(\theta)=\frac{1}{N}\sum_{b=1}^{N}\big(f_\theta(x_b)-F(x_b)\big)^{2},\qquad R_{true}(\theta)=\int_{T}\big(f_\theta(x)-F(x)\big)^{2}\,p(x)\,\mathrm{d}x,$$
where $p(x)$ denotes the true unknown probability density function of the input $x$, and $\theta$ denotes the set of parameters in the classifier $f_\theta$. The ultimate goal of training a classifier is to minimize the generalization error for unseen samples (i.e. to minimize the difference between the real unknown input–output mapping $F$ and the mapping approximated by $f_\theta$). Likewise, the ultimate goal of feature selection is to maintain the classifier's generalization capability while using a reduced set of features. Classifiers (e.g. neural networks) are usually not expected to recognize unseen samples that are too different from the training samples, so assessing the generalization capability of a classifier on such samples may be counter-productive to classifier learning. Ng et al. [1], [2] therefore proposed a localized generalization error model that bounds, by a quantity $R_{SM}^{*}$, the generalization error of a classifier for unseen samples similar to the training samples. In our proposed feature selection method, we remove the feature subset that yields the smallest contribution to $R_{SM}^{*}$: in terms of probability, a classifier trained using the reduced feature set will not lose its generalization capability if $R_{SM}^{*}$ remains unchanged. In this paper, the widely adopted radial basis function neural networks (RBFNNs) with Gaussian basis functions [3], [4] are used to demonstrate the method.
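To make the notation concrete, the following is a minimal sketch (our own illustration, not code from the paper) of a Gaussian RBFNN forward pass and the empirical error $R_{emp}$ defined above; the function names and parameterization are our assumptions:

```python
import numpy as np

def rbfnn_output(X, centers, widths, weights):
    """Gaussian RBFNN forward pass:
    f(x) = sum_j w_j * exp(-||x - u_j||^2 / (2 * s_j^2))."""
    # Squared Euclidean distance from every sample to every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * widths ** 2))  # shape: (n_samples, n_centers)
    return phi @ weights

def empirical_error(X, y, centers, widths, weights):
    """Training (empirical) MSE, i.e. R_emp as defined above."""
    residual = rbfnn_output(X, centers, widths, weights) - y
    return float(np.mean(residual ** 2))
```

Here `centers`, `widths`, and `weights` would come from the usual two-stage RBFNN training, e.g. clustering to place the centers followed by least squares for the output weights.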
A brief literature review is presented in Section 2. In Section 3 we describe the localized generalization error model. The novel feature selection method is presented in Section 4, while experimental results are shown in Section 5. Section 6 concludes the paper.
Section snippets
Existing feature selection methods
Broadly speaking, the number of input features is reduced using three feature selection approaches: filters, wrappers, and embedded methods [5], [6], [7]. Under certain circumstances in the learning process of a decision tree, some features are ignored in the final tree if they have only a minor influence on the classification [8]. This is a special case of feature selection and we will not discuss it in this work. In the following two sub-sections, we introduce the filter and wrapper approaches.
Localized generalization error model
In this work, we concentrate our discussion on the use of an RBFNN as the classifier $f_\theta$, which is trained by minimizing the mean-square error (MSE) that indicates how well the RBFNN approximates the true unknown input–output mapping $F$. The localized generalization error bound $R_{SM}^{*}$ is an upper bound on the MSE of those unseen samples that have features similar to the training samples (i.e., lying at a distance smaller than a constant $Q$ from a training sample in the input space) [1], [2].
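For reference, we recall the form of the bound from [1], [2]; the statement below is a paraphrase rather than a verbatim reproduction, so the exact constants should be taken from those works. With probability at least $1-\eta$,
$$R_{SM}(Q)\;\le\;\Big(\sqrt{R_{emp}}\;+\;\sqrt{E_{S_Q}\big[(\Delta y)^{2}\big]}\;+\;A\Big)^{2}\;+\;\varepsilon\;=\;R_{SM}^{*}(Q),$$
where $S_Q$ is the union of the Q-neighborhoods of the training samples, $E_{S_Q}[(\Delta y)^{2}]$ is the expectation of the squared output difference $\Delta y=f_\theta(x)-f_\theta(x_b)$ between an unseen sample $x$ in the Q-neighborhood of $x_b$ and the training sample $x_b$ itself (the stochastic sensitivity measure of the RBFNN), and $A$ and $\varepsilon$ are constants depending on the range of the target outputs and on $N$ and $\eta$.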
Feature selection using the localized generalization error
We applied the $R_{SM}^{*}$ to RBFNN architecture selection problems [1] and image classification problems [2]. In this paper, we focus on the use of the $R_{SM}^{*}$ to select feature subsets for pattern classification problems with RBFNNs. We define the irrelevant features as those yielding the smallest contribution to the $R_{SM}^{*}$. By removing these irrelevant features, one could build a classifier with a smaller loss, or even no loss, in generalization performance using a reduced set of features; a sketch of such a selection loop is given below.
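As an illustration only (not the authors' exact algorithm), a greedy backward-elimination loop driven by a generic bound estimator might look as follows; `estimate_rsm` is a hypothetical stand-in for an implementation of the $R_{SM}^{*}$ computation from [1]:

```python
import numpy as np

def greedy_backward_selection(X, y, estimate_rsm, min_features=1):
    """Greedily drop the feature whose removal keeps the localized
    generalization error bound estimate smallest.

    X : (n_samples, n_features) training inputs
    y : (n_samples,) training targets
    estimate_rsm : callable(X_subset, y) -> float; a stand-in for an
        R_SM* estimator (e.g., built on an RBFNN retrained per subset).
    Returns the sequence of (feature subset, bound estimate) visited.
    """
    remaining = list(range(X.shape[1]))
    history = [(tuple(remaining), estimate_rsm(X[:, remaining], y))]

    while len(remaining) > min_features:
        # Score each candidate removal by the resulting bound estimate.
        scores = {
            f: estimate_rsm(X[:, [g for g in remaining if g != f]], y)
            for f in remaining
        }
        # Remove the feature whose absence yields the smallest bound,
        # i.e., the feature contributing least to R_SM*.
        least_relevant = min(scores, key=scores.get)
        remaining.remove(least_relevant)
        history.append((tuple(remaining), scores[least_relevant]))

    return history
```

One would then pick the subset along `history` whose bound estimate is smallest, or the smallest subset whose estimate stays within tolerance of the full-feature bound.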
Experimental results
We perform a thorough comparison of the proposed method with five other feature selection methods using the UCI Wine dataset [21] in Section 5.1. On top of the average testing accuracies of the RBFNNs built using the feature subsets selected by those methods, we also compare the features selected and show the advantages of our method. In Section 5.2, we provide more experimental results on a variety of datasets in terms of numbers of features, samples, and classes.
Conclusion and discussion
In this paper, we propose removing features that do not contribute to the localized generalization error bound $R_{SM}^{*}$ of the classifier. The model bounds from above the generalization error of unseen samples located within a neighborhood of the training samples. In the experiments, for two of the datasets, the RBFNNs built using feature subsets with 90% of the features removed by the proposed method yield average testing accuracies higher than those of RBFNNs trained using the full set of features.
Acknowledgment
This work is supported by the Hong Kong Polytechnic University Research Grant G-YD87.
References (32)
- W.W.Y. Ng, A. Dorado, D.S. Yeung, W. Pedrycz, E. Izquierdo, Image classification with the use of radial basis function neural networks and the minimization of localized generalization error, Pattern Recognition (2007).
- R. Kohavi, G.H. John, Wrappers for feature subset selection, Artif. Intell. (1997).
- G.J. Benston, A.L. Hartgraves, Enron: what happened and what we can learn from it, J. Accounting Public Policy (2002).
- D.S. Yeung, W.W.Y. Ng, D. Wang, E.C.C. Tsang, X.-Z. Wang, Localized generalization error and its application to architecture selection for radial basis function neural network, IEEE Trans. Neural Networks (2007).
- Neural Networks (1998).
- J. Moody, C.J. Darken, Fast learning in networks of locally-tuned processing units, Neural Comput. (1989).
- A.L. Blum, P. Langley, Selection of relevant features and examples in machine learning, Artif. Intell. (1997).
- H. Liu, H. Motoda, Feature Selection for Knowledge Discovery and Data Mining, Kluwer Academic Publishers (1998).
- I. Guyon, A. Elisseeff, An introduction to variable and feature selection, J. Mach. Learn. Res. (2003).
- J.R. Quinlan, Decision trees and decision-making, IEEE Trans. SMC (1990).
- I.T. Jolliffe, Principal Component Analysis, Springer.
- A. Malhi, R.X. Gao, PCA-based feature selection scheme for machine defect classification, IEEE Trans. Instrum. Meas. (2004).
- R. Battiti, Using mutual information for selecting features in supervised neural net learning, IEEE Trans. Neural Networks (1994).
- H. Peng, F. Long, C. Ding, Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. PAMI (2005).
- N. Kwak, C.-H. Choi, Input feature selection by mutual information based on Parzen window, IEEE Trans. PAMI (2002).
- P. Mitra, C.A. Murthy, S.K. Pal, Unsupervised feature selection using feature similarity, IEEE Trans. PAMI (2002).
About the Author—WING W.Y. NG received the B.Sc. degree in Information Technology and the Ph.D. degree from Hong Kong Polytechnic University in 2001 and 2006, respectively. In 2008, he joined the School of Computer Science and Engineering, South China University of Technology, China, where he is currently an Associate Professor.
About the Author—DANIEL S. YEUNG received the Ph.D. degree in Applied Mathematics from Case Western Reserve University in 1974. He was the Chairman of the Department of Computing, The Hong Kong Polytechnic University, Hong Kong, and was elected President of the IEEE SMC Society in 2007.
About the Author—MICHAEL FIRTH earned his Ph.D. from the University of Bradford and has professional experience and qualifications in accounting and finance. He is currently Chair Professor of Finance at Lingnan University.
About the Author—ERIC C.C. TSANG received the B.Sc. degree in Computer Studies from the City University of Hong Kong in 1990 and the Ph.D. degree in Computing from The Hong Kong Polytechnic University in 1996. He is an Assistant Professor in the Department of Computing, The Hong Kong Polytechnic University.
About the Author—XI-ZHAO WANG received the Ph.D. degree in Computer Science from Harbin Institute of Technology, Harbin, China, in 1998. Since 2001, he has been the Dean and Professor of the Faculty of Mathematics and Computer Science, Hebei University, China.