Improvement Method of Fuzzy Geographically Weighted Clustering Using Gravitational Search Algorithm

Geo-demographic analysis (GDA) is a useful method for analyzing information based on location, making explicit use of several spatial analyses. One of the most efficient and commonly used methods is Fuzzy Geographically Weighted Clustering (FGWC). However, because its cluster centroids are initialized randomly, FGWC tends to converge to a local optimal solution. A novel approach that integrates the Gravitational Search Algorithm (GSA) with FGWC is proposed to obtain a global optimal solution and thereby better cluster quality. Several cluster validity indexes are used to compare the proposed method with FGWC combined with other optimization approaches. The study shows that the hybrid method, FGWC-GSA, provides better cluster quality. Furthermore, the method has been implemented in the R package spatialClust.


Introduction
Geographical data are now widely available and easy to access, and they are increasingly included in analyses, commonly to observe people's behavior based on their location. Geo-demographic analysis (GDA) is the analysis of spatially referenced geo-demographic and lifestyle data [1]. It explores information based on location, making explicit use of several spatial analyses.
Geo-demographic analysis often uses clustering techniques to classify geo-demographic data into groups, making the data more manageable for analysis [2]. GDA relies on two assumptions: (1) two individuals who live in the same area are more likely to have similar characteristics than two individuals selected at random, and (2) two areas can be characterized in terms of their population, using demographics and other measures. Based on these two principles, clustering can be applied to group geo-demographic data and lead to meaningful results [1].
In GDA, fuzzy clustering is commonly applied through different approaches, such as Bezdek's Fuzzy C-Means clustering (FCM), Gustafson-Kessel, Neighborhood Effect, and Fuzzy Geographically Weighted Clustering (FGWC). FCM is the most popular clustering method because it is easy to use and efficient [2]. FGWC was proposed to improve FCM's handling of spatial data by incorporating geographic and neighborhood information [3]. It was motivated by the hypothesis that incorporating the neighborhood effect into fuzzy clustering makes the result geographically aware [3].
Like FCM, FGWC has a limitation in its initial phase. Random initialization of the cluster centroids makes FGWC easily trapped in a local optimal solution, which affects the cluster quality. Several attempts have been made to address this with different optimization approaches, for example Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO), and Simulated Annealing (SA) [1].
The Gravitational Search Algorithm (GSA) is a relatively new optimization algorithm that focuses on obtaining a global solution. To improve the cluster quality of FGWC, this research integrates GSA with FGWC so that FGWC avoids falling into a local optimal solution and yields a better clustering result.

Fuzzy C-Means (FCM)
Fuzzy C-Means is one of the most popular clustering algorithms. It minimizes the following objective function, which is based on the membership value of each object and its distance to the cluster centroids [4]:

$$J(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ik}^{m} \, \lVert y_k - v_i \rVert^2$$

where $y_k$ is the k-th observation, $n$ is the number of observations, $c$ is the number of clusters, $m$ is the degree of fuzziness, $U$ is the membership matrix containing the membership degrees $\mu_{ik}$, $\mu_{ik}$ is the membership degree of the k-th observation in cluster i, $V$ is the centroid matrix containing the cluster centroids, and $v_i$ is the centroid of cluster i.
The membership degree of each object in FCM is updated in each iteration by the following function:

$$\mu_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m-1)} \right]^{-1}$$

where $d_{ik}$ is the Euclidean distance between observation $y_k$ and cluster centroid $v_i$. The cluster centroid is defined as follows:

$$v_i = \frac{\sum_{k=1}^{n} \mu_{ik}^{m} \, y_k}{\sum_{k=1}^{n} \mu_{ik}^{m}}$$

Fuzzy Geographically Weighted Clustering (FGWC)
FGWC, proposed by Mason and Jacobson, is an extension of FCM that is more geographically aware [3]. The algorithm incorporates a basic spatial interaction effect into the model. The spatial effect is applied in each iteration through the following adjustment of the membership matrix:

$$\mu'_i = \alpha \, \mu_i + \beta \, \frac{1}{A} \sum_{j=1}^{n} w_{ij} \, \mu_j$$

where $\mu'_i$ is the new membership value of area i, $\mu_i$ is the old membership value before incorporating the spatial effect, $w_{ij}$ is the weight of the interaction between areas i and j, and $A$ is a scaling value that ensures the memberships sum to 1. The parameters $\alpha$ and $\beta$ control the proportions of the old membership and the spatially weighted membership, with $\alpha + \beta = 1$. The weight is defined as

$$w_{ij} = \frac{(p_i \, p_j)^{b}}{z_{ij}^{a}}$$

where $p_i$ and $p_j$ are the populations of areas i and j, respectively, and $z_{ij}$ is the distance between the two areas. The other two parameters, $a$ and $b$, tune the effect of distance and population on the weight and are defined by the user.
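The FCM and FGWC updates described above can be combined into one iteration. The following is a minimal pure-Python sketch under stated assumptions, not the spatialClust implementation: all function names are hypothetical, the self-weight w_ii is set to 0 (because z_ii = 0), and the scale A is computed per area so that each area's memberships sum to 1.

```python
import math


def fcm_memberships(data, centroids, m=2.0, eps=1e-12):
    """Standard FCM update; u[i][k] is the membership of observation k in cluster i."""
    u = [[0.0] * len(data) for _ in centroids]
    for k, y in enumerate(data):
        # eps guards against division by zero when y sits exactly on a centroid.
        d = [max(math.dist(y, v), eps) for v in centroids]
        for i in range(len(centroids)):
            u[i][k] = 1.0 / sum((d[i] / dj) ** (2.0 / (m - 1.0)) for dj in d)
    return u


def spatial_weights(pop, dist, a=1.0, b=1.0):
    """w[i][j] = (p_i * p_j)^b / z_ij^a, with w[i][i] = 0 (assumption)."""
    n = len(pop)
    return [[0.0 if i == j else (pop[i] * pop[j]) ** b / dist[i][j] ** a
             for j in range(n)] for i in range(n)]


def geographic_weighting(u, w, alpha=0.5, beta=0.5):
    """mu'_i = alpha * mu_i + beta * (1/A) * sum_j w_ij * mu_j for every area i."""
    c, n = len(u), len(u[0])
    new_u = [[0.0] * n for _ in range(c)]
    for i in range(n):
        # Neighbour-weighted membership of area i in each cluster.
        pulled = [sum(w[i][j] * u[cl][j] for j in range(n)) for cl in range(c)]
        scale = sum(pulled)  # plays the role of A for area i
        for cl in range(c):
            share = pulled[cl] / scale if scale > 0 else u[cl][i]
            new_u[cl][i] = alpha * u[cl][i] + beta * share
    return new_u


def update_centroids(data, u, m=2.0):
    """v_i = sum_k mu_ik^m * y_k / sum_k mu_ik^m."""
    dim = len(data[0])
    centroids = []
    for row in u:
        wts = [mu ** m for mu in row]
        total = sum(wts)
        centroids.append([sum(wt * y[d] for wt, y in zip(wts, data)) / total
                          for d in range(dim)])
    return centroids
```

One iteration would then call `fcm_memberships`, pass the result through `geographic_weighting`, and finish with `update_centroids`. Because alpha + beta = 1 and the weighted part is rescaled per area, the adjusted memberships of each area still sum to 1.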

Gravitational Search Algorithm (GSA)
GSA is a population-based algorithm [5] developed by [6]. It aims to improve the exploration and exploitation of population-based algorithms in reaching an optimal solution. GSA is inspired by Newton's laws of gravity and motion. Every object in GSA is called an agent, and the capability of each agent is measured by its mass. Agents interact according to the law of gravity: an agent with a small mass moves toward agents with larger mass. Figure 1 shows agent M1 being affected by agents M2, M3 and M4. According to the law of gravity, M1 experiences a resultant force that makes it move towards agent M3. Agents M2, M3 and M4 likewise experience resultant forces from the other agents.
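The first step in GSA is to randomly generate N initial solutions (agents) in an m-dimensional search space. The position of agent i is represented as follows:

$$X_i = (x_i^1, \ldots, x_i^d, \ldots, x_i^m), \quad i = 1, 2, \ldots, N$$

In each iteration, the total force acting on each agent is evaluated:

$$F_i^d(t) = \sum_{j=1, j \neq i}^{N} \mathrm{rand}_j \, G(t) \, \frac{M_i(t) \, M_j(t)}{R_{ij}(t) + \varepsilon} \left( x_j^d(t) - x_i^d(t) \right)$$

where $x_i^d$ is the position of agent i in dimension d, $G(t)$ is the gravitational constant at iteration t, $M_i(t)$ is the mass of agent i, $R_{ij}(t)$ is the Euclidean distance between agents i and j, $\mathrm{rand}_j$ is a uniform random number in [0, 1], and $\varepsilon$ is a small constant.
$G(t)$ is updated in each iteration using the following function:

$$G(t) = G_0 \, e^{-\alpha t / T}$$

where $G_0$ is the initial gravity constant, $T$ is the maximum number of iterations, and $\alpha$ is a decay constant specific to GSA. The agent mass $M_i(t)$ is defined as follows:

$$m_i(t) = \frac{\mathrm{fit}_i(t) - \mathrm{worst}(t)}{\mathrm{best}(t) - \mathrm{worst}(t)}, \qquad M_i(t) = \frac{m_i(t)}{\sum_{j=1}^{N} m_j(t)}$$

where $\mathrm{fit}_i(t)$ is the current fitness value of solution i. The best and worst values are determined by the fitness values. For a minimization problem they are

$$\mathrm{best}(t) = \min_{j} \mathrm{fit}_j(t), \qquad \mathrm{worst}(t) = \max_{j} \mathrm{fit}_j(t)$$

whereas for a maximization problem they are

$$\mathrm{best}(t) = \max_{j} \mathrm{fit}_j(t), \qquad \mathrm{worst}(t) = \min_{j} \mathrm{fit}_j(t)$$

The acceleration (a) and velocity (v) of each agent are defined as

$$a_i^d(t) = \frac{F_i^d(t)}{M_i(t)}, \qquad v_i^d(t+1) = \mathrm{rand}_i \, v_i^d(t) + a_i^d(t)$$

The last step is updating the position x of each agent:

$$x_i^d(t+1) = x_i^d(t) + v_i^d(t+1)$$

These steps are repeated until the maximum number of iterations or another stopping criterion is reached.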
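To make the loop concrete, the following is a minimal pure-Python sketch of GSA for a minimization problem, following the standard formulation of the algorithm. The function name `gsa_minimize`, the parameter defaults (`g0`, `alpha`), and the clipping of positions to the search bounds are illustrative assumptions rather than settings taken from this study. The acceleration is computed directly, since M_i cancels in a_i = F_i / M_i; this also avoids dividing by the zero mass of the worst agent.

```python
import math
import random


def gsa_minimize(fitness, dim, n_agents=20, max_iter=100, g0=100.0, alpha=20.0,
                 lower=-5.0, upper=5.0, eps=1e-12, seed=0):
    """Return (best_position, best_fitness) found by a basic GSA run."""
    rng = random.Random(seed)
    x = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_agents)]
    v = [[0.0] * dim for _ in range(n_agents)]
    best_x, best_fit = None, math.inf

    for t in range(max_iter):
        fit = [fitness(p) for p in x]
        for p, f in zip(x, fit):
            if f < best_fit:
                best_x, best_fit = p[:], f

        # Masses from fitness (minimization: best = min, worst = max).
        best, worst = min(fit), max(fit)
        if best == worst:
            mass = [1.0 / n_agents] * n_agents
        else:
            raw = [(f - worst) / (best - worst) for f in fit]
            total = sum(raw)
            mass = [r / total for r in raw]

        g = g0 * math.exp(-alpha * t / max_iter)  # G(t) = G0 * exp(-alpha * t / T)

        # a_i = F_i / M_i, with M_i cancelled out of the force expression.
        acc = [[0.0] * dim for _ in range(n_agents)]
        for i in range(n_agents):
            for j in range(n_agents):
                if i == j:
                    continue
                r = math.dist(x[i], x[j])
                coef = rng.random() * g * mass[j] / (r + eps)
                for d in range(dim):
                    acc[i][d] += coef * (x[j][d] - x[i][d])

        # Velocity and position updates; clipping to the bounds is an assumption.
        for i in range(n_agents):
            for d in range(dim):
                v[i][d] = rng.random() * v[i][d] + acc[i][d]
                x[i][d] = min(upper, max(lower, x[i][d] + v[i][d]))

    for p in x:  # also score the final positions
        f = fitness(p)
        if f < best_fit:
            best_x, best_fit = p[:], f
    return best_x, best_fit
```

For example, `gsa_minimize(lambda p: sum(c * c for c in p), dim=2)` searches for the minimum of a simple sphere function. The returned solution is the best position seen over all iterations, so it is never worse than the best agent of the initial random population.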

Imam Habib Pamungkas and Setia Paramana, Improvement Method of Fuzzy 13
Cluster Validity Index
The main problem of cluster validity is finding an objective criterion for assessing the quality of the partition produced by a clustering algorithm [7]. In this research we use several cluster validity indexes, namely the Partition Coefficient (PC), Classification Entropy (CE), Separation Index (S), Xie-Beni Index, and IFV index. The Partition Coefficient measures the average amount of membership sharing between clusters across the objects in the membership matrix. A greater PC value indicates better clustering quality.

$$PC = \frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{2}$$

where $\mu_{ij}$ is the membership degree of item j in cluster i, $c$ is the number of clusters, and $n$ is the number of items. The Classification Entropy (CE) index measures the fuzziness of the partition:

$$CE = -\frac{1}{n} \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij} \log \mu_{ij}$$

The Separation Index and Xie-Beni Index measure the compactness and the separation of the clusters. Smaller values of the Separation Index and Xie-Beni Index indicate better clustering validity. In both indexes, $v_i$ denotes the centroid of cluster i. To validate the cluster fuzziness in spatial data, the IFV index is used because it is stable and robust [8]. A higher IFV index indicates a better result.
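The two indexes that depend only on the membership matrix, PC and CE, are simple to compute. The sketch below uses their standard definitions and the natural logarithm; the function names are hypothetical, and the membership matrix is assumed to be stored as `u[i][j]` (cluster i, item j). The Separation, Xie-Beni and IFV indexes are not sketched here because they also depend on the data and centroids.

```python
import math


def partition_coefficient(u):
    """PC = (1/n) * sum_i sum_j mu_ij^2; ranges from 1/c (fuzziest) to 1 (crisp)."""
    n = len(u[0])
    return sum(mu ** 2 for row in u for mu in row) / n


def classification_entropy(u):
    """CE = -(1/n) * sum_i sum_j mu_ij * log(mu_ij); zero memberships contribute 0."""
    n = len(u[0])
    return -sum(mu * math.log(mu) for row in u for mu in row if mu > 0.0) / n
```

For a crisp partition such as `[[1.0, 0.0], [0.0, 1.0]]` these functions give PC = 1 and CE = 0, while the maximally fuzzy two-cluster partition `[[0.5, 0.5], [0.5, 0.5]]` gives PC = 0.5 and CE = log 2.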

Methods
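The Improved FGWC using GSA
As mentioned before, the cluster centroid initialization in FGWC can easily lead to a local optimal solution, which affects the clustering results. We propose to use GSA to set the initial centroids so that the objective function is minimized. The objective function to be minimized is

$$J(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ik}^{m} \, \lVert y_k - v_i \rVert^2$$

where $n$ is the number of observations and the remaining symbols are as defined for FCM.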
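One way to connect GSA to this objective is to let each agent's position encode a full set of candidate centroids and to use the objective value as the agent's fitness. The sketch below illustrates that idea under assumptions that are not confirmed by the text: the flat, cluster-by-cluster encoding, the function names, and the use of plain FCM memberships (without the geographic weighting step) when scoring a candidate.

```python
import math


def decode_agent(position, n_clusters, n_features):
    """Split a flat agent position into n_clusters centroids of n_features each."""
    return [position[i * n_features:(i + 1) * n_features] for i in range(n_clusters)]


def clustering_fitness(position, data, n_clusters, m=2.0, eps=1e-12):
    """Objective J(U, V) for the centroids encoded in `position` (lower is better)."""
    centroids = decode_agent(position, n_clusters, len(data[0]))
    total = 0.0
    for y in data:
        # eps avoids division by zero when y coincides with a centroid.
        d = [max(math.dist(y, v), eps) for v in centroids]
        for i in range(n_clusters):
            # FCM membership of y in cluster i, derived from the distances.
            mu = 1.0 / sum((d[i] / dj) ** (2.0 / (m - 1.0)) for dj in d)
            total += mu ** m * d[i] ** 2
    return total
```

With this encoding, a search space of dimension `n_clusters * n_features` is enough for the optimizer, and a position whose centroids sit near the natural groups of the data receives a lower fitness than one whose centroids are bunched together.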
The proposed method proceeds as follows:
Step 1: Determine the basic parameters: the number of clusters c, the degree of fuzziness m, the error threshold, the maximum number of iterations, and the parameters of the weighting function.
Step 2: Initialize the GSA parameters, such as the gravity constant G.
Step 3: Initialize the geographic weighting.
Step 6: Calculate the fitness function and start optimizing FGWC using GSA.
Step 7: Update the membership matrix.
Step 8: Apply the geographic weighting to the membership matrix.
Step 9: Repeat steps 5-8 until the stopping criterion is reached.
The steps are shown in more detail in Figure 2.
We compare the proposed method with the standard FGWC and with other optimization approaches, namely Particle Swarm Optimization, Artificial Bee Colony and Simulated Annealing, using a case study of the Educational Profile of Jawa Tengah Province 2015 published by BPS-Statistics of Jawa Tengah Province, Indonesia. The variables were selected based on research conducted by Bustomi in 2012 [9], which concluded that educational inequality in Central Java Province is caused by four dimensions.
Based on these dimensions, we chose 11 variables that represent civil participation, education quality and facilities. The details of each variable are presented in Table 1. The dataset therefore contains 11 variables for 35 regencies in Central Java Province, Indonesia. The proposed method is implemented in R and is available on CRAN (spatialClust package). A graphical interface is also available in FAST [10], which can be accessed through www.stis.ac.id/fast. Figure 3 shows the results of the different cluster validity indexes from the case study for standard FGWC, FGWC-GSA and the other approaches. The x-axis shows the number of clusters, and the y-axis shows the corresponding validity index.

Results and Analysis
It can be seen that FGWC-GSA gives a higher Partition Coefficient, IFV index and Classification Entropy (CE) than the other approaches. Furthermore, FGWC-GSA provides a lower Separation Index and Xie-Beni Index than the others.
In general, we observed that FGWC-GSA outperforms FGWC and the other optimization approaches on all validity indexes and for all numbers of clusters.
Based on the results of the analysis, the areas in Central Java can be grouped into three clusters according to educational indicators. Cluster 3 contains regencies with high education quality, such as Semarang and Salatiga. Cluster 1 (e.g., Cilacap, Purbalingga) is the cluster with poor education quality. Cluster 2 consists of regencies with medium education quality, such as Sragen and Banyumas.

Conclusion
In this research, we proposed a new method that uses the Gravitational Search Algorithm (GSA) to avoid the local optimal solutions that may arise from the centroid initialization phase of FGWC. The results show that the proposed method outperforms the standard FGWC and its other modifications in terms of cluster validity.