Performance of Clustering on ANFIS for Weather Forecasting

This paper compares K-Means and Fuzzy C-Means (FCM) as methods for optimizing the premise parameters of an Adaptive Neuro-Fuzzy Inference System (ANFIS) for weather forecasting. The ANFIS architecture groups each input feature in the first layer into three clusters and uses three rules in the second layer. The comparison is based on the RMSE value and the number of iterations. Testing is done with training shares of 40%, 50%, and 60% of the total data. In addition, testing is done by grouping the data by season, namely the rainy and dry seasons. The results show that K-Means and FCM have almost the same RMSE, except for the rainy season, where K-Means has a better RMSE. However, K-Means requires relatively more iterations to achieve convergence. In general, FCM gives better results than K-Means. It is also shown that ANFIS performs best on dry-season data.

algorithm [7], and traffic volume forecasting [8]. These studies demonstrate the superiority of the neural network in prediction or forecasting. However, the data obtained are sometimes incomplete or contain fuzziness, as is the case for observational data from geometeorology stations. Besides, forecasting sometimes also involves certain rules to produce a decision. Learning with a neural network alone cannot handle these problems.
Adaptive Neuro-Fuzzy Inference System (ANFIS) combines the concepts of backpropagation neural networks and fuzzy logic. A backpropagation neural network can recognize data based on a set of features that act as the system input. A fuzzy system expresses knowledge in "if-then" form, which has the advantage of not requiring mathematical analysis to build the model. Therefore, ANFIS has been used for time series forecasting [9], electricity forecasting [10], wind speed profiles [11], atmospheric turbidity [12], micro-generation performance [13], and others.
The ANFIS architecture consists of five layers, each of which treats its input differently. The first layer is one of the adaptive layers: each node in it is an adaptive node [14]. The values of this layer are obtained by fuzzifying the input with a bell function. A clustering process can be used as the initial step to obtain the premise parameters of the bell function in this layer.
Currently, two clustering methods are commonly used, namely K-Means and Fuzzy C-Means (FCM). K-Means is a simple technique that can be applied to several applications [15][16][17][18][19][20]. For example, K-Means can be used for image data [15,16] or multivariate geophysical data [17]. K-Means also produces high accuracy when it is applied to micro hydropower availability and even to gradual data. On the other hand, FCM can be applied to data with fuzzy characteristics [20] and has been implemented in various applications such as image segmentation [21], a medical image with
(Communication & Information Technology) Journal 12(1), 43-49, 2018.
Several studies have compared the performance of K-Means and FCM on various problems. The first compares the ability of K-Means and FCM in color image segmentation [24,25]. Other studies in image processing also compared K-Means and FCM for the quantification of color and graphic images [25]. The study by Velmurugan and Santhanam [26] compares the two methods for clustering large data sets with normal and uniform distributions. The results of these three studies demonstrate that K-Means performs similarly to FCM, but FCM has a longer computation time. Another study compares K-Means and FCM for edge preservation and image despeckling of Synthetic Aperture Radar (SAR) images. The results indicate that FCM has higher efficiency than K-Means [27].
Given the above, the effect of using K-Means and FCM on ANFIS for weather forecasting deserves further study. This paper analyzes the effect of these clustering techniques on ANFIS. Testing is performed to analyze the performance of ANFIS in forecasting time series weather data. In addition, this paper tests ANFIS parameters such as the learning rate and the number of iterations needed to reach convergence, together with the accuracy of the system.

II. RESEARCH METHOD
The ANFIS for weather forecasting accepts four input parameters: temperature, humidity, wind speed, and air pressure. The output is a single value that represents the weather condition. This value is classified as sunny, cloudy, or rainy.
The ANFIS architecture used in this research consists of five layers. A layer drawn as a box is an adaptive layer, while a circle denotes a fixed layer. The output of each layer is denoted by $O_{l,i}$, where $i$ is the node index and $l$ is the layer index. The ANFIS network architecture used in this research is shown in Fig. 1. Every node $i$ in the first layer is an adaptive node whose value represents a degree of membership, as shown in Equations (1) and (2).
where $x$ and $y$ are the inputs to the $i$-th node, and $A_i$ is a linguistic label such as cold, cool, and others. In this paper, $O_{1,i}$ is the degree of membership in a fuzzy set of temperature, humidity, wind speed, or air pressure. The membership function can be approximated by the bell function shown in Equation (3).
where $a$ and $c$ are the parameters of the membership function, called the premise parameters, and $\mu_{A_1}(x)$ is the degree of membership. The values of the premise parameters can be calculated using clustering techniques such as K-Means and FCM. Each input in this layer has three linguistic values (low, medium, and high). Therefore, three clusters are formed for each input.
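As a rough illustration of the first layer, the sketch below evaluates a bell-shaped membership function for one input. It assumes that Equation (3) takes the common generalized bell form with the exponent fixed at 1, that is, mu(x) = 1 / (1 + ((x - c) / a)^2), with c as the center and a as the width. The function names and the sample parameter values are hypothetical and are not taken from the paper.

```python
def bell_membership(x, a, c):
    """Degree of membership of x in a bell-shaped fuzzy set.

    Assumed form: 1 / (1 + ((x - c) / a) ** 2), where c is the center
    and a (non-zero) controls the width. This form is an assumption.
    """
    return 1.0 / (1.0 + ((x - c) / a) ** 2)


# Hypothetical premise parameters (a, c) for one input, e.g. temperature,
# with one pair per linguistic value.
PREMISE = {"low": (2.0, 20.0), "medium": (2.0, 25.0), "high": (2.0, 30.0)}


def fuzzify(x, params=PREMISE):
    """Return the membership degree of x in each linguistic value."""
    return {label: bell_membership(x, a, c) for label, (a, c) in params.items()}


degrees = fuzzify(25.0)  # the "medium" set has its center at 25.0
```

With this form, the membership is 1 at the center c and drops to 0.5 at a distance of a from the center, which is why the cluster mean and spread are natural starting values.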
Each node in the second layer is non-adaptive (fixed). Its output is obtained by multiplying all incoming signals from the first layer. Every node output represents the degree of activation (firing strength) of a fuzzy rule.
The third layer is also non-adaptive. Each node computes the normalized degree of activation using Equation (4). If more than two rules are established, the function is extended by dividing $w_i$ by the total $w$ over all rules.
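The second and third layers can be sketched as follows: each rule multiplies its incoming membership degrees, and the resulting firing strengths are divided by their total. The function names and the sample membership values are illustrative assumptions.

```python
from math import prod


def firing_strengths(memberships_per_rule):
    """Layer 2: multiply the incoming membership degrees of each rule."""
    return [prod(degrees) for degrees in memberships_per_rule]


def normalize(strengths):
    """Layer 3: divide each firing strength by the total over all rules."""
    total = sum(strengths)
    if total == 0:
        raise ValueError("all firing strengths are zero")
    return [w / total for w in strengths]


# Hypothetical membership degrees: three rules, four inputs per rule.
memberships = [
    [0.5, 0.5, 1.0, 1.0],
    [1.0, 0.5, 0.5, 0.5],
    [0.5, 1.0, 0.5, 1.0],
]
w = firing_strengths(memberships)
w_bar = normalize(w)
```

The normalized values always sum to 1, so they act as relative weights of the rules.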
Each node in the fourth layer is an adaptive node with the function shown in Equations (5) and (6), where $\bar{w}_i$ is the normalized degree of activation from the third layer, $p_i$ and $q_i$ are the parameters of this node, $y_i$ is the output, and $\{c_{i1}, c_{i2}, \ldots, c_{in}, c_{i0}\}$ are the parameters at the node (neuron). The parameters in this layer are called consequent parameters.
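A minimal sketch of the fourth layer, assuming a first-order (linear) consequent built from the parameters $c_{i1}, \ldots, c_{in}, c_{i0}$ named above. The final summation over rules follows the usual ANFIS structure and is an assumption here, as are the function names and the sample values.

```python
def rule_output(inputs, coeffs):
    """Linear consequent c_i1*x_1 + ... + c_in*x_n + c_i0 for one rule."""
    *weights, bias = coeffs
    if len(weights) != len(inputs):
        raise ValueError("need one coefficient per input plus a constant term")
    return sum(c * x for c, x in zip(weights, inputs)) + bias


def layer4(normalized_strengths, inputs, consequents):
    """Weight each rule output by its normalized firing strength."""
    return [w * rule_output(inputs, coeffs)
            for w, coeffs in zip(normalized_strengths, consequents)]


def anfis_output(normalized_strengths, inputs, consequents):
    """Sum the weighted rule outputs into a single value (assumed last layer)."""
    return sum(layer4(normalized_strengths, inputs, consequents))


# Hypothetical example: two inputs, two rules.
y = anfis_output([0.5, 0.5], [1.0, 2.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]])
```

Because the consequents are linear in their parameters once the firing strengths are fixed, this layer is the part of the network where the consequent parameters are adapted.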

A. K-Means Clustering
K-Means is a clustering algorithm that starts by selecting the number of clusters to be formed ($K$). In this research, the $K$ cluster centers are initialized randomly. The $K$ selected values serve as the cluster centers, commonly called centroids. The algorithm then calculates the distance from every data point to each centroid using the Euclidean formula to find the centroid closest to each data point. Each data point is assigned to the cluster whose centroid is closest. The Euclidean distance is calculated using Equation (7), where $X = (x_1, x_2, \ldots, x_j)$ is the data variable and $Y = (y_1, y_2, \ldots, y_j)$ is the variable at the cluster center.
Next, the cluster centers are recalculated using the new cluster memberships, and each data point is reassigned using the new cluster centers. If the cluster centers no longer change, the clustering process is complete. Otherwise, the algorithm returns to the step of finding the closest centroid for each data point and repeats until the cluster centers stop changing. Upon completion, the clustered data are used to calculate the mean and standard deviation of each cluster. These values are used as the initial values of $a$ and $c$ in Equation (3).
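The procedure above can be sketched for a single input feature as follows. This is a simplified one-dimensional version using only the Python standard library, not the authors' implementation. The mapping of the standard deviation to $a$ and the mean to $c$, the use of the population standard deviation, and all function names are assumptions.

```python
import random
import statistics


def kmeans_1d(values, k=3, init=None, max_iter=100, seed=0):
    """Cluster scalar values into k groups; return (centroids, labels).

    init: optional list of k starting centroids. If omitted, k distinct
    data values are drawn at random (needs at least k distinct values).
    """
    if init is None:
        centroids = random.Random(seed).sample(sorted(set(values)), k)
    else:
        centroids = list(init)
    labels = []
    for _ in range(max_iter):
        # Assign each value to its nearest centroid (1-D Euclidean distance).
        labels = [min(range(k), key=lambda i: abs(v - centroids[i])) for v in values]
        # Recompute each centroid as the mean of its members.
        updated = []
        for i in range(k):
            members = [v for v, lab in zip(values, labels) if lab == i]
            updated.append(statistics.fmean(members) if members else centroids[i])
        if updated == centroids:  # centers no longer change: converged
            break
        centroids = updated
    return centroids, labels


def premise_from_clusters(values, labels, k=3):
    """Per-cluster (a, c), assuming a = standard deviation and c = mean."""
    params = []
    for i in range(k):
        members = [v for v, lab in zip(values, labels) if lab == i]
        if not members:
            raise ValueError(f"cluster {i} is empty")
        # A single-member cluster gives a == 0 and needs special handling.
        params.append((statistics.pstdev(members), statistics.fmean(members)))
    return params


# Hypothetical temperature readings forming three loose groups.
temps = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
centers, labels = kmeans_1d(temps, k=3, init=[0.8, 5.0, 9.2])
premise = premise_from_clusters(temps, labels, k=3)
```

Passing explicit starting centroids makes the run reproducible. With random initialization, as described in the text, different runs can end in different clusterings, which is one reason for averaging results over several runs.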

B. FCM Clustering
FCM is a data clustering technique in which the membership of each data point in a cluster is determined by a degree of membership. The basic concept of FCM is the determination of the cluster centers, which mark the location of each cluster. This process is repeated so that the cluster centers move towards the right locations. The iteration is based on the minimization of an objective function, which describes the distance from each data point to a cluster center, weighted by the membership degree of that data point. The output of FCM is a set of cluster centers and a membership degree for each data point. The steps of FCM are described as follows [28].
1) Determine the matrix $X_{kj}$ of data to be clustered, where $k$ indexes the data points and $j$ indexes the attributes (criteria).
2) Determine the FCM parameters: the number of clusters ($c \geq 2$), the weighting exponent ($w > 1$), the maximum number of iterations ($n_{\max}$), the stopping threshold ($\epsilon$, a small positive value), and the initial objective value ($P_0$).
3) Generate the initial partition matrix $U_{ki}$ (the degree of membership of each data point in each cluster) randomly, where $k$ indexes the data points and $i$ indexes the clusters.
4) Calculate the center $V$ of each cluster using Equation (8), where $V_{ij}$ is the center of the $i$-th cluster for the $j$-th attribute, $\mu_{ik}$ is the partition value (in matrix $U$) of the $k$-th data point for the $i$-th cluster, $X_{kj}$ is the value (in matrix $X$) of the $j$-th attribute of the $k$-th data point, and $w$ is the weighting factor.
5) Calculate the objective value $P_n$ using Equation (9), where $P_n$ is the objective value at the $n$-th iteration and $d_{ik}$ is the Euclidean distance between the $i$-th cluster center and the $k$-th data point.
6) Update the membership degree of each data point in each cluster (correction of the partition matrix) using Equation (10), with $d_{ik}$ calculated using Equation (11), where $d_{ik}$ is the Euclidean distance between the $i$-th cluster center and the $k$-th data point, and $X_{kj}$ is the value (in matrix $X$) of the $j$-th attribute of the $k$-th data point.
7) Terminate the iteration if the cluster centers $V$ no longer change. An alternative stopping criterion is that the change in the objective value falls below the threshold, $|P_n - P_{n-1}| < \epsilon$, or the iteration count exceeds the maximum number of iterations ($n_{\max}$).
Clusters are selected based on the largest value in the partition matrix. Upon completion, the clustered data are used to calculate the mean and standard deviation of each cluster, and these values can be used as the initial values of $a$ and $c$ in Equation (3).
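The steps above can be sketched in plain Python as follows. Since the equations themselves are not reproduced in the text, the sketch assumes the standard FCM update rules: membership-weighted means for the centers, a sum of weighted squared distances for the objective, and inverse-distance memberships with exponent $-2/(w-1)$. The function name, default parameter values, and sample data are hypothetical.

```python
import math
import random


def fcm(X, c=3, w=2.0, max_iter=100, eps=1e-5, seed=0):
    """Fuzzy C-Means on a list of equal-length rows; returns (V, U, labels)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    # Step 3: random initial partition matrix U (n x c), each row sums to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])
    V = []
    p_prev = 0.0  # initial objective value P_0
    for _ in range(max_iter):
        # Step 4: cluster centers as membership-weighted means.
        new_V = []
        for i in range(c):
            weights = [U[k][i] ** w for k in range(n)]
            total = sum(weights)
            if total == 0.0:  # degenerate cluster: keep the previous center
                new_V.append(V[i])
            else:
                new_V.append([sum(weights[k] * X[k][j] for k in range(n)) / total
                              for j in range(m)])
        V = new_V
        # Euclidean distance from every data point to every center.
        D = [[math.dist(X[k], V[i]) for i in range(c)] for k in range(n)]
        # Step 5: objective value for this iteration.
        p = sum((U[k][i] ** w) * D[k][i] ** 2 for k in range(n) for i in range(c))
        # Step 6: update the partition matrix.
        for k in range(n):
            zeros = [i for i in range(c) if D[k][i] == 0.0]
            if zeros:  # point sits exactly on one or more centers
                U[k] = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(c)]
            else:
                inv = [D[k][i] ** (-2.0 / (w - 1.0)) for i in range(c)]
                s = sum(inv)
                U[k] = [v / s for v in inv]
        # Step 7: stop when the objective changes by less than the threshold.
        if abs(p - p_prev) < eps:
            break
        p_prev = p
    # Each point goes to the cluster with its largest membership degree.
    labels = [max(range(c), key=lambda i: U[k][i]) for k in range(n)]
    return V, U, labels


# Hypothetical two-attribute data points.
data = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.1], [9.0, 1.0], [9.1, 1.2]]
centers, partition, labels = fcm(data, c=3)
```

Unlike K-Means, every data point keeps a membership degree in every cluster, and the hard label is taken only at the end from the largest entry of its row in the partition matrix.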

A. Data and Testing Scenario
The data used in this research were obtained from the Indonesian Agency for Meteorology, Climatology, and Geophysics (Badan Meteorologi, Klimatologi, dan Geofisika, BMKG) of Karangploso District, Malang, East Java, Indonesia. An example of the data is shown in Table I. The data cover January 2011 to May 2012 and consist of daily records of temperature, humidity, wind speed, air pressure, and weather category. Based on discussions with experts from BMKG and observations of the ANFIS calculations, the weather categories are converted to 1 for sunny, 2 for cloudy, and 3 for rainy. This weather value is used as the target value in the calculation of the Root Mean Square Error (RMSE).
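As a small illustration of this error measure, the sketch below encodes the weather categories as 1, 2, and 3 and computes the RMSE between target values and system outputs. The sample observations and outputs are made up for the example.

```python
import math

# Encoding of the weather categories as described in the text.
WEATHER_CODE = {"sunny": 1, "cloudy": 2, "rainy": 3}


def rmse(targets, predictions):
    """Root Mean Square Error between two equal-length sequences."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical observations and real-valued system outputs.
observed = [WEATHER_CODE[w] for w in ["sunny", "cloudy", "rainy", "rainy"]]
outputs = [1.0, 2.0, 3.0, 1.0]
error = rmse(observed, outputs)
```

On this encoding, an RMSE above 1 means the outputs are on average more than one category away from the target.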
The test is done on time series data. For each combination of training and testing data, testing is carried out after training. Training aims to obtain the best performance of the system, so it is continued until convergence is achieved.
Testing is performed for several scenarios. In the first scenario, the training data are 40%, 50%, and 60% of the total available data, and the testing data are 30%, 40%, and 50%. In the second scenario, the data are split by season, namely the rainy season and the dry season. In addition, testing is done for short periods of one to three months. The dry-season data are from June to September and the rainy-season data are from November to April. May and October are treated as transition months between the rainy and dry seasons.
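The season grouping and the percentage-based split can be sketched as below. The month-to-season mapping follows the text. Taking the training and testing portions as consecutive blocks in time order is an assumption, as are the function names.

```python
from datetime import date

DRY_MONTHS = {6, 7, 8, 9}             # June to September
RAINY_MONTHS = {11, 12, 1, 2, 3, 4}   # November to April


def season_of(day):
    """Label a date as 'dry', 'rainy', or 'transition' (May and October)."""
    if day.month in DRY_MONTHS:
        return "dry"
    if day.month in RAINY_MONTHS:
        return "rainy"
    return "transition"


def split_by_share(records, train_share, test_share):
    """Take the first train_share of the records for training and the
    following test_share for testing (assumed chronological order)."""
    n = len(records)
    n_train = round(n * train_share)
    n_test = round(n * test_share)
    return records[:n_train], records[n_train:n_train + n_test]


# Hypothetical usage with ten daily records.
train, test = split_by_share(list(range(10)), 0.6, 0.4)
```

Records labelled as transition can simply be left out of the seasonal scenario, since they belong to neither season.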
To analyze the results, ANFIS with K-Means and ANFIS with FCM are tested separately for each scenario. Each test is performed five times, and the average RMSE and number of iterations are recorded.
The test results of the first, second, and third scenarios are shown in Tables II-IV, respectively. Table II shows the result of learning with 40%, 50%, and 60% training data. Based on Table II, K-Means yields a smaller RMSE with more training data. However, the number of iterations does not show the same pattern: more data do not guarantee more iterations. The smallest RMSE is found with 60% training data and 40% testing data. For training data of 40% and 50%, accuracy and RMSE are inversely related, which implies that a smaller RMSE produces better accuracy. This does not hold for 60% training data, where the relation is reversed. The comparison of training-to-test ratios shows that the best accuracy is obtained when the training data are at least 50%.
For FCM, Table II shows that the smallest RMSE and the highest accuracy are found with 60% training data and 40% test data. It also shows that increasing the training data decreases the RMSE. Surprisingly, the number of iterations decreases as the training data increase, except for 60% training data and 30% test data. Normally, more training data lead to a longer learning process. The table also shows that a smaller RMSE does not guarantee better accuracy.
Table III shows the result of learning on the rainy season. With K-Means, more training data result in a smaller RMSE. However, the number of iterations does not follow the same trend and shows no clear pattern. The RMSE for rainy-season data is worse than that in Table II, and the RMSE in Table II is also more stable than the RMSE from rainy-season training data. For rainy-season data the RMSE is poor, with values above 1. The highest accuracy is obtained with one month of training data and one month of test data. For the other test data, the accuracy is below 60%, and the lowest value is 34.46%. The training process for November also takes a long time. This is caused by the high variation in some of the November data, so the network needs more time to achieve convergence. However, adding training data decreases the RMSE and the number of iterations.
Furthermore, FCM on rainy-season data does not give a good RMSE either: the values are higher than 1. Training on November gives the highest accuracy but takes quite a long time. This is caused by the high variation in some of the November data, so the network needs more time to achieve convergence. However, adding training data decreases the RMSE and the number of iterations.
The result of learning on the dry season is shown in Table IV. With K-Means, the RMSE is the smallest compared with the random training data in Table II and the rainy-season data in Table III. This is caused by the data variation: dry-season data vary less than rainy-season data. However, more training data in the dry season do not guarantee a smaller RMSE. This can be seen from the training data between June and August, whose values are higher than for the previous data. The test results show good accuracy, with an average above 80%. Moreover, stable results above 90% are obtained for the June to August training data with three months of test data.
On the other hand, with FCM, increasing the training data decreases both the RMSE and the number of iterations. The average accuracy is above 80%, and more training data increase the accuracy.
The result from Table II indicates that the RMSE of ANFIS with K-Means is slightly higher than with FCM. However, the results for the rainy and dry seasons show the opposite: in these two seasons, K-Means has a lower average RMSE than FCM. FCM on the rainy-season data has quite high RMSE values. Both Tables III and IV show that FCM yields a more stable RMSE than K-Means. However, FCM gives an unstable RMSE when applied to rainy-season data. In general, FCM gives stable results on the mixed dry- and rainy-season data and on dry-season data. K-Means gives better results when used to forecast rainy-season data.
The comparison of the iterations ANFIS needs to converge shows that FCM needs fewer iterations, except for some rainy-season data. In addition, the number of iterations with FCM is more stable than with K-Means. FCM also produces higher accuracy for some training-testing combinations and for the rainy-season data. In the percentage-based tests, which combine rainy- and dry-season data, FCM also produces a lower RMSE. Thus, the use of FCM can shorten the training process while giving lower RMSE and higher accuracy.

IV. CONCLUSION
This paper compares the performance of K-Means and FCM on ANFIS for weather forecasting. The ANFIS architecture tested has three neurons in each of layers 1-4 and one output neuron. The results show that using K-Means to obtain the initial membership values of the ANFIS inputs gives a lower RMSE than FCM for data in the rainy and dry seasons. However, K-Means requires more iterations to achieve convergence and gives slightly unstable RMSE results. In addition, the test results show that both K-Means and FCM predict best on dry-season data because the fluctuation of the data in this season is low. In general, FCM gives better results than K-Means. For better results, the number of neurons in each layer needs to be optimized, especially in layers 1, 2, and 3.