Neural Network and Adaptive Feature Extraction Technique for Pattern Recognition

In this paper, we propose an adaptive k-means algorithm, applied on top of principal component analysis (PCA) feature extraction, for pattern recognition with a neural network model. The adaptive k-means algorithm discriminates among objects belonging to different groups, using PCA for statistical feature extraction. PCA consistently reduces the dimensionality of the features, demonstrating that the suite of structure detectors effectively performs generalized feature extraction. The classification accuracies are obtained with the feature learning process of a back-propagation neural network. The primary goal of the experiments is a comparison of the proposed adaptive ensemble with the previous non-adaptive one. We evaluated the performance of the clustering ensemble algorithms by matching the detected and the known partitions of the iris dataset. The best possible matching of clusters provides a measure of performance expressed as the misassignment rate.

Clustering techniques have been widely applied in a variety of scientific areas such as pattern recognition [6], information retrieval, microbiology analysis, and so forth. In the literature, k-means [2] is a typical clustering algorithm, which aims to partition N inputs (also called data points interchangeably) into k clusters.
The most commonly used family of neural networks for pattern classification tasks [4] is the feedforward network, which includes multilayer perceptron and Radial-Basis Function (RBF) networks. These networks are organized into layers and have unidirectional connections between the layers. Another popular network is the Self-Organizing Map (SOM), or Kohonen network [5], which is mainly used for data clustering and feature mapping. The learning process involves updating the network architecture and connection weights so that a network can efficiently perform a specific classification or clustering task. The increasing popularity of neural network models for solving pattern recognition problems has been primarily due to their seemingly low dependence on domain-specific knowledge.

Adaptive K-means cluster algorithm
The adaptive k-means algorithm is built on the Kullback scheme [7]. Its detailed steps are as follows.
Step 1. We implement this step using Frequency Sensitive Competitive Learning (FSCL) [1], because it can achieve the goal as long as the number of seed points is not less than the exact number k* of clusters. Here, we suppose the number of clusters is k ≥ k*, and randomly initialize the k seed points m1, m2, ..., mk in the input data set.
Step 1.1. Randomly pick a data point xt from the input data set, and for j = 1, 2, ..., k, let ... (1) where nr is the cumulative number of occurrences of ur = 1.
Step 1.2. Update only the winning seed point mw by ... (2)
Steps 1.1 and 1.2 are repeated until the k series of uj, j = 1, 2, ..., k, remain unchanged for all xt. Then go to Step 2.
In the above, we have not included the input covariance information in Eqs. (2) and (3), because this step merely aims to allocate the seed points into some desired regions as stated before, rather than to estimate their values precisely. Hence, we can simply ignore the covariance information and save the considerable computing cost of estimating a covariance matrix.
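Step 1 can be illustrated with a minimal sketch. Since Eqs. (1) and (2) are not reproduced here, the code assumes the commonly used FSCL form: the winner minimizes the squared distance weighted by its relative winning frequency, and only the winner moves toward the sample. The function name `fscl`, the parameters `eta`, `epochs`, `init` and `seed`, and the fixed number of passes (in place of the stopping rule of Step 1.2) are illustrative assumptions, not part of the original method.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def fscl(points, k, eta=0.05, epochs=20, init=None, seed=0):
    """Frequency Sensitive Competitive Learning sketch (assumed standard form).

    points: list of equal-length tuples; k: number of seed points.
    Returns the seed points m_1..m_k and their cumulative win counts n_1..n_k.
    """
    rng = random.Random(seed)
    # Seed points are drawn from the data unless given explicitly.
    chosen = init if init is not None else rng.sample(points, k)
    seeds = [list(p) for p in chosen]
    wins = [1] * k  # n_j, started at 1 so every seed can compete
    for _ in range(epochs):  # fixed passes: a simplification of the stopping rule
        order = list(points)
        rng.shuffle(order)
        for x in order:
            total = sum(wins)
            # Frequent winners are handicapped by the factor n_j / sum(n).
            w = min(range(k), key=lambda j: wins[j] / total * sq_dist(x, seeds[j]))
            # Only the winning seed point moves toward the sample.
            seeds[w] = [m + eta * (xi - m) for xi, m in zip(x, seeds[w])]
            wins[w] += 1
    return seeds, wins
```

Because frequent winners are penalized, a seed point that starts far from every cluster still gets a chance to win, which is why the step works whenever k is at least k*.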
Step 2. Initialize αj = 1/k for j = 1, 2, ..., k, and let Σj be the covariance matrix of those data points with uj = 1. In the following, we adaptively learn the αj, mj and Σj.
Step 2.1. Given a data point xt, calculate I(j | xt) by Eq. (1).

Step 2.2. Update only the winning seed point mw by ... (3)
In the latter, we actually update mw along a direction that forms an acute angle to the gradient-descent direction. Further, we have to update the parameters αj and Σw through a constrained optimization algorithm, in view of the constraints on the αj in Eq. (5). Alternatively, we here let ... (5) where the constraints on the αj are automatically satisfied, but the new variables βj are totally free. Consequently, rather than learning all the βj, we update only βw by ... (6) with the other βj unchanged. It turns out that αw is exclusively increased while the other αj are penalized, i.e., their values are decreased. Note that, although the αj gradually converge, Eq. (6) keeps increasing βw without an upper bound, because αw is in general always smaller than 1. To avoid this undesirable situation, one feasible way is to subtract a positive constant cb from all the βj when the largest βj reaches a prespecified positive threshold value.
As for Σw, we update it with a small step size: ... (7) where ηs is a small positive learning rate, e.g. ηs = 0.1η. In general, the learning of a covariance matrix is more sensitive to the learning step size than the other parameters.
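The re-parameterization of the αj can be sketched as follows. Eqs. (5) and (6) are not reproduced here, so the code assumes a softmax map, αj = exp(βj) / Σi exp(βi), and an update that raises βw in proportion to 1 − αw; both are assumptions chosen to match the behaviour described in the text, and the names `alphas_from_betas`, `reward_winner`, `threshold` and `c_b` are illustrative.

```python
import math

def alphas_from_betas(betas):
    """Assumed softmax map: alpha_j = exp(beta_j) / sum_i exp(beta_i)."""
    top = max(betas)  # shifting by the maximum avoids overflow in exp
    exps = [math.exp(b - top) for b in betas]
    total = sum(exps)
    return [e / total for e in exps]

def reward_winner(betas, w, eta=0.1, threshold=50.0, c_b=25.0):
    """Raise beta_w only, then shift all betas down if the largest hits threshold."""
    alphas = alphas_from_betas(betas)
    new = list(betas)
    # Assumed update: the increment is positive whenever alpha_w < 1.
    new[w] += eta * (1.0 - alphas[w])
    if max(new) >= threshold:
        # Subtracting the same constant from every beta_j leaves the alphas unchanged.
        new = [b - c_b for b in new]
    return new
```

Under this assumed map the αj are positive and sum to one for any real βj, and subtracting cb from every βj changes none of the αj, which is what makes the threshold rule harmless.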
The simple k-means partitional clustering algorithm described above is computationally efficient and gives surprisingly good results if the clusters are compact, hyperspherical in shape and well separated in the feature space. If the Mahalanobis distance is used in defining the squared error in (3), then the algorithm is even able to detect hyperellipsoidal clusters.
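The effect of the Mahalanobis distance can be seen in a small two-dimensional sketch. The squared Mahalanobis distance is (x − m)ᵀ Σ⁻¹ (x − m); the helper name `mahalanobis_sq` and the example clusters are hypothetical and chosen only for illustration.

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance for 2-D points; cov is a 2x2 nested list."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form with the inverse covariance [[d, -b], [-c, a]] / det.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

# Hypothetical clusters: A is elongated along the first axis, B is compact.
mean_a, cov_a = (0.0, 0.0), [[9.0, 0.0], [0.0, 0.25]]
mean_b, cov_b = (4.0, 2.0), [[0.25, 0.0], [0.0, 0.25]]
x = (4.0, 0.2)

identity = [[1.0, 0.0], [0.0, 1.0]]
euclid_a = mahalanobis_sq(x, mean_a, identity)  # identity covariance = Euclidean
euclid_b = mahalanobis_sq(x, mean_b, identity)
maha_a = mahalanobis_sq(x, mean_a, cov_a)
maha_b = mahalanobis_sq(x, mean_b, cov_b)
```

In this example the Euclidean distance places x closer to the mean of B, whereas the Mahalanobis distance places it closer to the elongated cluster A, because x lies along the direction in which A has large variance.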

PCA Feature Extraction by Hebbian Algorithm
Principal component analysis (PCA) is a Multivariate Statistical Process Control (MSPC) method. Feature extraction methods determine an appropriate subspace of dimensionality m (either in a linear or a nonlinear way) in the original feature space of dimensionality n, such that m is less than n. Linear transforms, such as principal component analysis, factor analysis, linear discriminant analysis, and projection pursuit, have been widely used in pattern recognition for feature extraction and dimensionality reduction. Principal components can be extracted using single-layer feed-forward neural networks [8].

Hebbian Learning process
There is a close correspondence between the behavior of self-organizing neural networks and the statistical method of principal component analysis. In fact, a self-organizing, fully interconnected two-layer network with i inputs and j outputs (with i > j) can be used to extract the first j principal components from the input vector, thus reducing the size of the input vector by i − j elements, as shown in Fig. (1). With j = 1, such a network acts as a maximum eigenfilter by extracting the first principal component from the input vector. The algorithm used to train the network is based on Hebb's postulate of learning.
... (8)
The Hebbian learning process is given by Eq. (9): ... (9)
Here n denotes the number of training samples, and ... the learning rate. The attentive reader will notice that the unconstrained use of this learning algorithm would drive the weights to infinity, because a weight would always grow and never decrease. To overcome this problem, some sort of normalization or saturation factor needs to be introduced. A proportional decrease of wi by a normalization term introduces competition among the synapses, which, as a principle of self-organization, is essential for the stabilization of the learning process.
The adjusted weight is calculated as ... (10)
The pattern is reconstructed from the Hebb network by Eq. (11): ... (11)
In general, the algorithm converges very quickly. There is, however, one serious problem: errors in the input vector may cause some weights to break through the constraint mechanism and grow to infinity, which destroys the PCA filter. The implementation should throw an exception when a particular weight grows above an appropriate upper limit.
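A normalized Hebbian update with the overflow check described above can be sketched for the single-output case (j = 1). Eqs. (9) and (10) are not reproduced here, so the code uses Oja's rule, a well-known normalized Hebbian rule, as a stand-in; whether it coincides with Eq. (10) is an assumption. The names `oja_first_component`, `WeightOverflowError` and `limit` are illustrative, and the inputs are assumed to be zero-mean.

```python
import math
import random

class WeightOverflowError(RuntimeError):
    """Raised when a weight exceeds the allowed upper limit."""

def oja_first_component(samples, eta=0.01, epochs=50, limit=1e3, seed=0):
    """Estimate the first principal direction with Oja's rule (assumed variant)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    for _ in range(epochs):
        for x in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))  # network output y = w . x
            # Hebbian growth eta*y*x, counteracted by the decay term eta*y*y*w.
            w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
            if any(abs(wi) > limit for wi in w):
                raise WeightOverflowError("weight grew above %g" % limit)
    return w

# Hypothetical zero-mean data spread mostly along the direction (1, 1).
data = [(-2.0, -2.0), (-1.0, -1.0), (1.0, 1.0), (2.0, 2.0), (0.1, -0.1), (-0.1, 0.1)]
w = oja_first_component(data, eta=0.01, epochs=200)
norm = math.sqrt(sum(v * v for v in w))
```

The decay term keeps the weight vector near unit length, so `w` settles along the dominant direction of the data up to sign; with an overly large learning rate or badly scaled inputs the check raises `WeightOverflowError` instead of silently returning a diverged filter.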

Application of PCA on Handwritten Digits
We applied Hebbian PCA to handwritten digits from 0 to 9, each represented by an image of 28×28 pixels, as shown in Figure (). We then applied a Kohonen map network to recognize these patterns. The recognition results are shown in Table (1).

Application of adaptive K-means and PCA algorithm on Iris dataset
The data set consists of 150 samples in a four-dimensional feature space, taken from flowers of the iris species setosa, versicolor, and virginica, and collected by Anderson (1935). For each species there are 50 observations of sepal length, sepal width, petal length, and petal width in cm.
We applied the adaptive k-means algorithm to partition the 150-sample data set into three clusters, and applied the PCA algorithm to reduce the dimensionality to two. The classification accuracies were obtained by using half of the data set features in the learning process of a back-propagation neural network; the classification result is shown in Fig.
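The misassignment rate used to score a clustering against the known partition can be sketched as follows. Cluster identifiers are arbitrary, so the code tries every one-to-one assignment of clusters to classes and keeps the best match. The function name `misassignment_rate` is illustrative, the exhaustive search is only practical for a handful of clusters (such as the three iris species), and the labels in the example are made up, not results from the experiments.

```python
from itertools import permutations

def misassignment_rate(true_labels, cluster_ids):
    """Fraction of samples misassigned under the best cluster-to-class matching."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes; matching is not one-to-one")
    best_hits = 0
    # Try every injective mapping from cluster identifiers to class labels.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_ids))
        best_hits = max(best_hits, hits)
    return 1.0 - best_hits / len(true_labels)

# Made-up labels: one of six samples ends up in the wrong cluster.
truth = ["a", "a", "a", "b", "b", "b"]
found = [1, 1, 1, 0, 0, 1]
rate = misassignment_rate(truth, found)
```

In the example the best matching maps cluster 1 to class "a" and cluster 0 to class "b", which leaves one of the six samples misassigned.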