Using GA for Optimization of the Fuzzy C-Means Clustering Algorithm



INTRODUCTION
Pattern recognition is a field concerned with machine recognition of meaningful regularities in noisy or complex environments. In simpler words, pattern recognition is the search for structure in data. In pattern recognition, a group of data is called a cluster, Li-Xin (1997). In practice, the data are usually not well distributed; therefore the "regularities" or "structures" may not be precisely defined. That is, pattern recognition is, by its very nature, an inexact science. To deal with this ambiguity, it is helpful to introduce some "fuzziness" into the formulation of the problem. For example, the boundary between clusters could be fuzzy rather than crisp; that is, a data point could belong to two or more clusters with different degrees of membership. In this way, the formulation is closer to the real-world problem and therefore better performance may be expected. This is the first reason for using fuzzy models for pattern recognition: the problem by its very nature requires fuzzy modeling (in fact, fuzzy modeling means more flexible modeling; by extending the zero-one membership to membership in the interval [0, 1], more flexibility is introduced).
The second reason for using fuzzy models is that the formulated problem may be easier to solve computationally. This is due to the fact that a non-fuzzy model often results in an exhaustive search in a huge space (because some key variables can only take the values 0 and 1), whereas in a fuzzy model all the variables are continuous, so that derivatives can be computed to find the right direction for the search. A key problem is to find clusters in a set of data points.
Fuzzy C-Means (FCM) is a method of clustering which allows one piece of data to belong to two or more clusters. This method was developed by Dunn (1973), improved by Bezdek (1981) and is frequently used in pattern recognition. The Fuzzy C-Means algorithm makes soft partitions in which a datum can belong to different clusters with a different membership degree to each cluster. This clustering method is an iterative algorithm which uses the necessary conditions for minimization of the objective function, as presented in Zhang et al. (2008). Various segmentation techniques have been developed for image segmentation. Dipak and Amiya (2010) and Amiya and Nilavra (2011) present a GA-based segmentation, based on GA-based clustering of gray-level images. Zhang et al. (2009) presented a new modification of the fuzzy c-means clustering algorithm.
Thus what we want from the optimization is to improve the performance toward some optimal point or points, Beightler et al. (1979). Luus and Jaakola (1973) identify three main types of search methods: calculus-based, enumerative and random. The work of Hesam and Ajith (2011) proposes a hybrid fuzzy clustering method based on FCM and fuzzy PSO (FPSO), which makes use of the merits of both algorithms. Hall et al. (1999) describe a genetically guided approach for optimizing the hard (J_1) and fuzzy (J_m) c-means functionals used in cluster analysis. Their experiments show that a genetic algorithm ameliorates the difficulty of choosing an initialization for the c-means clustering algorithms. The experiments use six data sets, including the Iris data, magnetic resonance and color images. The genetic algorithm approach is generally able to find the lowest known J_m value, or a J_m associated with a partition very similar to that associated with the lowest J_m value. On data sets with several local extrema, the GA approach always avoids the less desirable solutions. Degenerate partitions are always avoided by the GA approach, which provides an effective method for optimizing clustering models whose objective function can be represented in terms of cluster centers. The time cost of genetically guided clustering is shown to be comparable to that of a series of random initializations of fuzzy/hard c-means in which the partition associated with the lowest J_m value is chosen, making it an effective competitor in many clustering domains.
The main differences between this work and the one by Hall et al. (1999) are:
• This study used the least square error as an objective function for the genetic algorithm, whereas Hall et al. (1999) used J_m as an objective function.
• This study optimized the weighting exponent m without changing the distance function, whereas Hall et al. (1999) kept the weighting exponent m = 2.00 and used two different distance functions to find an optimal value.
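The least-square-error objective used in this study as the GA fitness can be sketched as follows (the function and argument names are illustrative, not taken from the paper's MATLAB code):

```python
import numpy as np

def lse_objective(y_model, y_true):
    """Least square error between the fuzzy model's output and the test data,
    used here as the GA fitness (instead of J_m as in Hall et al., 1999)."""
    y_model = np.asarray(y_model, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sum((y_model - y_true) ** 2))
```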

METHODOLOGY
The subtractive clustering: The subtractive clustering method assumes each data point is a potential cluster center and calculates a measure of the likelihood that each data point would define the cluster center, based on the density of surrounding data points. The algorithm:
• Selects the data point with the highest potential to be the first cluster center.
• Removes all data points in the vicinity of the first cluster center (as determined by radii), in order to determine the next data cluster and its center location.
• Iterates on this process until all of the data is within radii of a cluster center.
The subtractive clustering method is an extension of the mountain clustering method proposed by Yager and Filev (1994).
The subtractive clustering is used to determine the number of clusters of the proposed data and then to generate a fuzzy model. An iterative search is then used to optimize the least square error between the generated model and the test model. After that, the number of clusters is passed to the Fuzzy C-Means algorithm.
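The steps above can be sketched in a minimal form as follows. This is an illustrative sketch, not MATLAB's `subclust`; the squash factor (1.25) and reject ratio (0.15) are assumed defaults, and the data are assumed scaled so that `radius` is meaningful:

```python
import numpy as np

def subtractive_clustering(X, radius=0.5):
    """Sketch of subtractive clustering: every point is a candidate center;
    pick the point of highest potential, subtract its influence, repeat."""
    alpha = 4.0 / radius ** 2
    # Potential of each point: density measure over all other points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    potential = np.exp(-alpha * d2).sum(axis=1)
    centers = []
    first_peak = potential.max()
    beta = 4.0 / (1.25 * radius) ** 2          # squash factor 1.25 (assumed)
    while potential.max() > 0.15 * first_peak:  # reject ratio 0.15 (assumed)
        k = int(potential.argmax())
        centers.append(X[k].copy())
        # Remove the vicinity of the new center by subtracting its potential.
        potential -= potential[k] * np.exp(-beta * ((X - X[k]) ** 2).sum(axis=1))
        potential = np.clip(potential, 0.0, None)
    return np.array(centers)
```

Each iteration zeroes the potential of the chosen center, so the loop terminates after at most one pass per data point.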
The fuzzy c-means clustering algorithm: Fuzzy C-Means (FCM) is a method of clustering which allows one piece of data to belong to two or more clusters. This method is frequently used in pattern recognition. It is based on minimization of the following objective function:

J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \| x_i - c_j \|^2    (1)

where,
m : Any real number greater than 1; it was set to 2.00 by Bezdek (1981)
u_ij : The degree of membership of x_i in the cluster j
x_i : The i-th of the d-dimensional measured data
c_j : The d-dimensional center of the cluster
||*|| : Any norm expressing the similarity between any measured data and the center

Fuzzy partitioning is carried out through an iterative optimization of the objective function shown above, with the update of the memberships u_ij and the cluster centers c_j by:

u_{ij} = 1 / \sum_{k=1}^{C} ( \| x_i - c_j \| / \| x_i - c_k \| )^{2/(m-1)}    (2)

c_j = \sum_{i=1}^{N} u_{ij}^{m} x_i / \sum_{i=1}^{N} u_{ij}^{m}    (3)

This iteration will stop when:

\max_{ij} | u_{ij}^{(k+1)} - u_{ij}^{(k)} | < \varepsilon    (4)

where,
ε : A termination criterion between 0 and 1
k : The iteration step

This procedure converges to a local minimum or a saddle point of J_m. The algorithm is composed of the following steps:
1. Initialize the membership matrix U(0) = [u_ij].
2. At step k, calculate the center vectors C(k) = [c_j] with U(k) using Eq. (3).    (5)
3. Update U(k+1) using Eq. (2).
4. If ||U(k+1) - U(k)|| < ε then STOP; otherwise return to step 2.
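The alternating updates of Eqs. (2) and (3) with the stopping test of Eq. (4) can be sketched as follows (a minimal illustration; the random initialization and the `eps`/`max_iter` defaults are assumptions, not values from the paper):

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Sketch of the FCM iteration: alternate the center update (Eq. 3)
    and the membership update (Eq. 2) until ||U(k+1) - U(k)|| < eps (Eq. 4)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # step 1: random fuzzy partition U(0)
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]      # Eq. (3)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        d = np.fmax(d, 1e-12)                  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)        # Eq. (2)
        done = np.abs(U_new - U).max() < eps   # Eq. (4) termination test
        U = U_new
        if done:
            break
    J = np.sum((U ** m) * d ** 2)              # Eq. (1) objective J_m
    return centers, U, J
```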

The genetics algorithm:
The GA is a stochastic global search method that mimics the metaphor of natural biological evolution. GAs operate on a population of potential solutions, applying the principle of survival of the fittest to produce (hopefully) better and better approximations to a solution, as given by Ginat (1988), Wang (1997) and Sinha et al. (2010). At each generation, a new set of approximations is created by selecting individuals according to their level of fitness in the problem domain and breeding them together using operators borrowed from natural genetics. This process leads to the evolution of populations of individuals that are better suited to their environment than the individuals they were created from, just as in natural adaptation.
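A minimal GA of this kind, for minimizing a one-dimensional objective such as the least square error over the weighting exponent m, could be sketched as below. The binary encoding, tournament selection, operator rates and elitism are illustrative assumptions, not the paper's actual MATLAB GA settings:

```python
import random

def ga_minimize(fitness, lo, hi, pop_size=30, bits=16, generations=60, seed=0):
    """Sketch of a binary-encoded GA: tournament selection, one-point
    crossover, bit-flip mutation, two-individual elitism; minimizes `fitness`."""
    rng = random.Random(seed)

    def decode(chrom):
        # Map a bit string onto [lo, hi]; resolution is (hi - lo) / (2**bits - 1).
        return lo + int(chrom, 2) * (hi - lo) / (2 ** bits - 1)

    pop = [format(rng.getrandbits(bits), f"0{bits}b") for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda ch: fitness(decode(ch)))
        nxt = scored[:2]                        # elitism: carry the two best over
        while len(nxt) < pop_size:
            a, b = (min(rng.sample(pop, 3), key=lambda ch: fitness(decode(ch)))
                    for _ in range(2))          # tournament selection, size 3
            cut = rng.randrange(1, bits)        # one-point crossover
            child = a[:cut] + b[cut:]
            child = "".join(bit if rng.random() > 0.02 else str(1 - int(bit))
                            for bit in child)   # bit-flip mutation, rate 0.02
            nxt.append(child)
        pop = nxt
    return decode(min(pop, key=lambda ch: fitness(decode(ch))))
```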

RESULTS AND DISCUSSION
A complete program using the MATLAB programming language was developed to find the optimal value of the weighting exponent. It starts by performing subtractive clustering for the input-output data, builds the fuzzy model using subtractive clustering and optimizes the parameters by optimizing the least square error between the output of the fuzzy model and the output from the original function for the entered data. This optimization is carried out by iteration. After that, the genetic algorithm optimizes the weighting exponent of FCM: in the same way, the fuzzy model is built using FCM, then the weighting exponent m is optimized by optimizing the least square error between the output of the fuzzy model and the output from the original function for the same test data. Figure 1 shows the flow chart of the program.

[Flowchart labels from Fig. 1: "Least squares error"; "The weighting exponent of the optimal least squares"; "Change the weighting exponent by GA if the number of generations is not reached".]

Fig. 2: Random data points of Eq. (7); blue circles for the data to be clustered and the red stars for the testing data

Table 1: The results of Eq. (7) for the subtractive clustering, the iteration with m = 2.00 and the GA-optimized FCM (m is the weighting exponent; time is in seconds)
The best way to introduce the results is through presenting four examples of modeling four highly nonlinear functions. Each example is discussed, plotted and then compared with the best error of the original FCM with weighting exponent (m = 2.00).

Example 1-modeling a two input nonlinear function:
In this example, a two-input nonlinear function was proposed:    (7)

The range X ∈ [-10.5, 10.5] and Y ∈ [-10.5, 10.5] is the input space of the above equation; 200 data pairs were obtained randomly (Fig. 2).

First, the best least square error was obtained for the FCM with weighting exponent (m = 2.00), which is (0.0126 with 53 clusters). Next, the optimized least square error of the subtractive clustering is obtained by iteration, which is (0.0115 with 52 clusters); the error improves by about 10%. Then, the cluster number is passed to the FCM algorithm and the error is optimized to (0.004 with 52 clusters), which means the error improves by about 310% (roughly a factor of 3), with the weighting exponent (m) at 1.4149. The results are shown in Table 1.

Example 2-modeling a one input nonlinear function:
In this example, a nonlinear function with one variable x was proposed:    (8)

The range X ∈ [-20.5, 20.5] is the input space of the above equation; 200 data pairs were obtained randomly and are shown in Fig. 3. First, the best least square error is obtained for the FCM with weighting exponent (m = 2.00), which is (5.1898e-7 with 178 clusters). Next, the least square error of the subtractive clustering is obtained by iteration, which is (1e-10 with 24 clusters); since this error is less than the pre-defined error, the cluster number is passed to the FCM algorithm and the error is (1.2775e-12 with 24 clusters).

Fig. 3: Random data points of Eq. (8); blue circles for the data to be clustered and the red stars for the testing data

Example 3-modeling a one input nonlinear function: In this example, a nonlinear function was proposed:    (9)

The range X ∈ [1, 50] is the input space of the above equation; 200 data pairs were obtained randomly and are shown in Fig. 4.
First, the best least square error is obtained for the FCM with weighting exponent (m = 2.00), which is (3.3583e-17 with 188 clusters). Next, the least square error of the subtractive clustering is obtained by iteration, which is (1.6988e-17 with 103 clusters), since the least error can be taken from the iteration. Then, the cluster number is passed to the FCM algorithm; the error is (2.2819e-18 with 103 clusters) and the weighting exponent (m) is 100.8656. Here we can see that the number of clusters is reduced from 188 to 103, which means the number of rules is reduced, and the error is improved by about 14 times. The results are shown in Table 3; the overall results are shown in Table 4.

CONCLUSION
In this study, the subtractive clustering parameters, which are the radius, squash factor, accept ratio and reject ratio, are optimized using the GA.
The original FCM proposed by Bezdek (1981) is optimized using the GA, and values of the weighting exponent other than (m = 2) give less approximation error. Therefore, the least square error is enhanced in most of the cases handled in this work. Also, the number of clusters is reduced.
The time needed to reach an optimum through the GA is less than the time needed by the iterative approach. The GA also provides higher resolution than the iterative search: the precision of the iterative search depends on the step value of the "for loop", which is at most 0.001 for the radius parameter in the subtractive clustering algorithm, whereas for the GA it depends on the length of the individual and the range of the parameter, which gives 0.00003 for the same radius parameter. So the GA gives better performance and less approximation error in less time.
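This resolution comparison can be made concrete with a small worked calculation. The 16-bit individual length and the [0, 2] radius range below are assumptions chosen for illustration; the paper itself only quotes the resulting resolutions of 0.001 (iterative) and 0.00003 (GA):

```python
# Resolution of a binary-encoded GA parameter vs. a fixed-step iterative search.
# The 16-bit chromosome and [0, 2] radius range are illustrative assumptions.
bits = 16
lo, hi = 0.0, 2.0
ga_resolution = (hi - lo) / (2 ** bits - 1)   # smallest representable step
iterative_step = 0.001                        # "for loop" step of the iterative search
print(ga_resolution)                          # ~3.05e-05, i.e. about 0.00003
print(iterative_step / ga_resolution)         # GA grid is roughly 33x finer
```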
It can also be concluded that the time needed for the GA to optimize an objective function depends on the number and length of the individuals in the population and on the number of parameters to be optimized.

Fig. 1: The flowchart of the software

Fig. 4: Random data points of Eq. (9); blue circles for the data to be clustered and the red stars for the testing data

Table 4: The final least square errors and cluster numbers for the original FCM and for the FCM whose cluster numbers were obtained from the iteratively or genetically optimized subtractive clustering