Improved Firefly Algorithm with Variable Neighborhood Search for Data Clustering

Abstract: Among metaheuristic algorithms, population-based algorithms are explorative search algorithms that outperform local search algorithms at exploring the search space for globally optimal solutions. However, their primary drawback is a low exploitative capability, which limits their ability to refine solutions within a neighborhood of the search space. The firefly algorithm (FA) is a population-based algorithm that has been widely used in clustering problems. However, FA suffers from premature convergence when no neighborhood search strategy is employed to improve the quality of clustering solutions in the neighborhood region and to explore the global regions of the search space. On this basis, this work improves FA by using variable neighborhood search (VNS) as a local search method, which gives FA the benefit of the trade-off that VNS offers between exploration and exploitation. The proposed FA-VNS allows fireflies to improve their clustering solutions while maintaining the diversity of those solutions during the search process through the perturbation operators of VNS. To evaluate the performance of the algorithm, eight benchmark datasets are used, and the results are compared with those of four well-known clustering algorithms. The comparison according to internal and external evaluation metrics indicates that the proposed FA-VNS produces more compact clustering solutions than the well-known clustering algorithms.


Introduction:
Data clustering has roots in a number of fields, including statistics, bioinformatics, machine learning, exploratory data analysis, image segmentation, security, medical image analysis, web handling, and mathematical programming. Its role is to partition data into clusters such that objects in the same cluster are highly similar and objects in different clusters are highly dissimilar [1][2][3][4]. Clustering can be classified as partitional or hierarchical [5]. The former can be represented as center-, density-, grid-, and model-based clustering and produces spherical groups, whereas the latter produces a hierarchical tree. Hierarchical clustering is performed using either an agglomerative or a divisive approach. The divisive approach constructs a tree by recursively dividing the dataset into sub-clusters until each data item represents a single cluster. By contrast, the agglomerative approach computes the similarity among clusters and repeatedly combines the two most similar clusters until a single cluster remains. The advantage of hierarchical clustering is that it is suitable for text clustering because it views the data as a hierarchy of nested clusters. However, it is less suitable for big data than partitional clustering, which has lower computational complexity and runs faster.
Center-based clustering is one of the main clustering approaches and uses a center to represent each cluster. Each cluster has a unique center that minimizes the intra-cluster distance between the center and all members of the cluster [6]. The center can be an object of the data, which is known as medoid-based clustering, or the mean of the objects in the cluster, which is known as centroid-based clustering [7]. In centroid-based clustering, each object in a cluster is closer to its own centroid than to any other cluster centroid in the search space. A minimum intra-cluster distance within each cluster indicates good clustering quality, which becomes more difficult to obtain as the number of clusters increases [8]. In addition, minimizing the intra-cluster distance is considered an NP-hard problem when more than three centroids are involved [9,10]. Several distance-based algorithms have been adopted to assign objects to appropriate clusters, such as K-means combined with an artificial neural network (ANN) for identifying disease [11,12], fuzzy C-means, and multi K-means [13]. Nonetheless, finding the best initial cluster centroids and avoiding becoming stuck at local optima remain challenges for these traditional algorithms [14]. An unsupervised approach using clustering can identify several diseases, which is promising for diagnosing unfamiliar diseases or incomprehensible behavior when not enough information is available [11,12]. For instance, to find the abnormality of a brain tumor, K-means can be used to enhance the image and mark regions according to their texture features, and a trained ANN can then be used to choose the correct object.
Several studies have classified the algorithms used to solve the clustering problem into distance-based algorithms, including K-means, fuzzy C-means, and multi K-means; other local search algorithms, such as tabu search [15]; and population-based algorithms, such as artificial bee colony [16], the gray wolves algorithm [17], and the firefly algorithm (FA) [18].
Recently, different optimization algorithms have been published to minimize the intra-cluster distance within each cluster, such as iterative simulated annealing [19], a randomized local search algorithm [20], and the adaptive acceptance criterion algorithm for optimization problems [21]. Nonetheless, these are local search algorithms that only intensify the search in the neighborhood of the current clustering solutions. Thus, their exploration of the search space for more promising clustering solutions is weak [22]. By contrast, population-based algorithms have a high exploration capability but are limited in exploiting the neighborhood search space to improve clustering solutions [23]. FA has been combined with other algorithms as a hybrid algorithm for different optimization problems, such as its integration with two clustering techniques, one with K-means and one with K-harmonic [24,25], and a hybrid model of FA and fuzzy C-means (FCM). However, such integration is still limited because the initial centroids of the FA depend mainly on another algorithm, which leads to quick convergence to a local optimum.
On this basis, this study enhances the performance of the firefly clustering algorithm by incorporating variable neighborhood search (VNS) [26] as a local search method to overcome the limitations of FA in solving clustering problems; VNS has exhibited promising performance in different application domains [27,28]. The FA depends mainly on the initial selection, which causes premature convergence when no neighborhood search strategy is employed to improve the quality of the clustering solutions in the neighborhood region [27] and to explore the global regions of the search space [29]. VNS can enhance the exploitation capability by improving clustering solutions during the algorithm process. VNS can also enhance the exploration capability through its perturbation operators, thereby avoiding the known premature convergence of the FA [30] and avoiding getting stuck at local optima in the advanced stages of the search process [31]. VNS contributes as a local search method with four different operations, which play an important role in the trade-off between the exploration and exploitation abilities. Four different operations define four different neighborhoods, and thus different landscapes can be generated. VNS with local search therefore generates different local optima, each of which is a local optimum with respect to a given neighborhood. Using VNS enhances the learning process of the FA, which begins with the exploration of the search space and then intensifies the search for an optimal clustering solution in the neighborhood of the best clustering solution found during the search process.
The rest of this paper is organized as follows. Section 2 discusses the proposed FA-VNS clustering algorithm. Section 3 shows the benchmark and the evaluation performance, whereas Section 4 presents the results. Finally, Section 5 concludes this research and presents the future research direction.

Proposed FA-VNS:
FA is a population-based algorithm inspired by the behavior of fireflies, simulating their flash patterns and characteristics. Owing to its simplicity and the good results it obtains on optimization problems compared with other swarm intelligence algorithms, researchers have applied FA to different optimization problems in several data mining topics, including speech recognition, image segmentation, and feature selection. The flash of a firefly is a bioluminescent signal produced by each firefly as a light to attract other fireflies and to attract prey. The main purpose of a light flash is threefold, namely, to attract other fireflies, to be attractive to less bright ones, and to represent the fitness function in the search space. The two important issues in the main idea of the algorithm are light intensity and attractiveness. The light intensity reflects the objective function at a particular location, whereas attractiveness is a variable value that changes according to the distance between two fireflies. Figure 1 depicts the flowchart of the standard FA. As shown in Fig. 1, the algorithm starts by initializing all parameters in step 1. In the next step, FA evaluates all fireflies based on the objective function and then ranks them, as shown in step 3. In step 4, the algorithm finds the best firefly and compares it with the others in the colony, and the firefly with less attractiveness moves from its location to a better location, as shown in step 5. The algorithm repeats this process until all iterations are complete and then prints the best result produced by the best firefly. FA has been used for unsupervised clustering, including partitional and hierarchical clustering. The FA is used to produce the optimal number of clusters and the corresponding optimal centers. The optimal centers minimize the intra-cluster distance between each cluster center and each item in the same cluster.
However, the standard FA is limited by its premature convergence when no neighborhood search strategy is employed to improve the quality of clustering solutions in the neighborhood region and to explore the global regions of the search space. This study improves the FA by incorporating VNS in each iteration of the algorithm. The clustering solution is thereby improved by using the perturbation operators of VNS to change some parts of the solution, with the added ability to find more centroids during the algorithm process. The modification is controlled by a predefined threshold parameter, which is statically initialized to 0.98 and represents the probability of selecting the current iteration's best solution for enhancement. In each iteration, a random value is generated in [0,1] and compared with the threshold; when the random value is equal to or greater than the threshold, the local search status becomes active, and one of the VNS operations (pair-swap, inversion, insertion, or displacement) is chosen by a random number and performed in a greedy manner. The operations are shown in detail in Fig. 2.

As shown in Fig. 3, the best solution of the iteration is examined in a greedy manner, so that the solution is changed only when one of the VNS operations improves it. The new clustering solution contains either a small modification of the position (more exploitation), such as pair-swap, or a large modification (more exploration), such as the displacement operation, which exploits the trade-off between the exploration and exploitation abilities.
The algorithm generates initial centroids randomly for each agent, where the number of centroids is statically predefined [32]; Table 1 gives an example of three centroids carried by a single agent. Each agent attempts to optimize the fitness function by finding the minimum intra-cluster distance within each cluster, which corresponds to the optimal centroids. The algorithm begins the clustering task by sorting the fireflies according to their fitness. The two main factors of the FA are the light intensity ($I$) and the attractiveness ($\beta$), which respectively represent the fitness function and the improvement of the solution during the algorithm process according to the brightness of the agents and the distance among them. The less bright agent moves toward the brighter agent, and the attractiveness that drives this movement is calculated using Equation (1):
$$\beta(r) = \beta_0 e^{-\gamma r^2} \qquad (1)$$
where $\beta_0$ represents the initial attractiveness, $\gamma$ is the light absorption coefficient, which is usually initialized to 1, and $r^2$ represents the squared Euclidean distance between two agent positions ($x_i$, $x_j$) on the graph, which represents the distance between the two centroids calculated using Equations (2) and (3).
In the movement update, $x_i(t+1)$ is the new position of agent $i$, $x_i(t)$ is the current position of agent $i$, and $x_j(t)$ is the current position of agent $j$ in the search space. In each iteration, the less bright agent improves its centroids by moving to another position in the search space. The integration of the VNS procedure as a local search improves the solution quality and avoids premature convergence. Combining the FA and VNS improves the solution quality by maintaining the balance between exploration and exploitation. A small permutation in VNS refines the neighborhood structure during the search process by moving the centroids of the agent to another position in the neighborhood. Similarly, a large number of permutations enhances the exploration ability by moving the agent's centroids to another promising region of the search space. The iteration's best clustering solution is improved using one of the VNS operations according to a random activation value, as shown in Fig. 2. If this value exceeds the predefined threshold, then VNS is activated, and one of the operations is selected according to a second random value generated in [1,4], as shown in Equation (4). The purpose of the threshold, which is set by the user, is to control when the solution is improved: it guides the search either toward the objective function, following the quality of the clustering solution, or toward the neighborhood search, which improves the clustering solution in a stochastic manner. If the randomly generated value does not exceed the threshold, then the iteration's best clustering solution is carried forward without any improvement in its local region; otherwise, it is subjected to modification using one of the four operations.
The choice of the operation is based on a random integer generated in the range [1,4], where each value represents one of the available operations: a value of 1 selects pair-swap, a value of 2 selects inversion, and so on.
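The activation rule and the four operations described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the solution is a flat list of centroid coordinates, that lower fitness is better, that values 3 and 4 select insertion and displacement (the text only fixes 1 and 2), and that VNS is activated when the random value is at least the threshold. The function names and the `rng` argument are hypothetical.

```python
import random

def pair_swap(s, i, j):
    """Exchange the elements at positions i and j."""
    s = list(s)
    s[i], s[j] = s[j], s[i]
    return s

def inversion(s, i, j):
    """Reverse the segment between positions i and j (inclusive)."""
    s = list(s)
    s[i:j + 1] = s[i:j + 1][::-1]
    return s

def insertion(s, i, j):
    """Remove the element at position i and reinsert it at position j."""
    s = list(s)
    s.insert(j, s.pop(i))
    return s

def displacement(s, i, j, k):
    """Cut the segment i..j and reinsert it at position k of the remainder."""
    s = list(s)
    segment, rest = s[i:j + 1], s[:i] + s[j + 1:]
    k = min(k, len(rest))
    return rest[:k] + segment + rest[k:]

def vns_step(best, fitness, threshold=0.98, rng=random):
    """Apply one randomly chosen operation to `best`, accepted greedily."""
    if rng.random() < threshold:      # assumed rule: VNS stays inactive
        return best
    op = rng.randint(1, 4)            # 1 = pair-swap, 2 = inversion, ...
    i, j = sorted(rng.sample(range(len(best)), 2))
    if op == 1:
        candidate = pair_swap(best, i, j)
    elif op == 2:
        candidate = inversion(best, i, j)
    elif op == 3:
        candidate = insertion(best, i, j)
    else:
        k = rng.randrange(len(best) - (j - i + 1) + 1)
        candidate = displacement(best, i, j, k)
    # Greedy acceptance: keep the candidate only if it improves the fitness.
    return candidate if fitness(candidate) < fitness(best) else best
```

Pair-swap changes only two positions, whereas displacement can move a whole segment, which is one way to read the exploitation versus exploration contrast drawn in the text.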
In each successful repetition, the centroids of the agent are updated to the new centroids if they are accepted. The process is applied in a greedy manner; that is, only the best improvement is accepted in either exploration or exploitation. The greedy process improves the clustering solution in the advanced stages by accepting only the best solution and exploring the best clustering solution found in another region of the search space. Figure 4 shows the FA-VNS for clustering problems.

Figure 4. FA-VNS for clustering problems
    Generate random initial centroids for each firefly x_i, i = {1, 2, 3, ..., N}, where N is the number of fireflies
    Calculate the fitness f(x_i) for each firefly and update the light intensity I_i of each firefly, where f(x_i) is the intra-clustering distance
    Find the best firefly using f(x_i)
    while (t < maximum number of iterations) do
        for i = 1 to N do    % all fireflies
            for j = 1 to N do    % all fireflies
                if (I_j > I_i) then
                    Move x_i towards x_j    % attractiveness
                end-if
            end-for
        end-for
        Perform VNS according to the activation value and the operation-selection value    % select one of the operations
        Update centroids in each firefly according to their latest positions
        Update the best firefly by ranking the fireflies and find the current best
    end-while
    Print the best result
end-algorithm
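The loop in Fig. 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' Java implementation: each firefly is a flat list of `k * dim` centroid coordinates, a lower intra-cluster distance means a brighter firefly, and the move uses the standard FA attractiveness $\beta_0 e^{-\gamma r^2}$ without a random term. The `local_search` argument is a hypothetical hook for a VNS step and is accepted greedily; all function and parameter names are illustrative.

```python
import math
import random

def centroids_of(firefly, dim):
    """Split a flat coordinate list into centroids of length dim."""
    return [firefly[i:i + dim] for i in range(0, len(firefly), dim)]

def fitness(firefly, points, dim):
    """Intra-cluster distance: each point contributes its distance
    to the nearest centroid."""
    cents = centroids_of(firefly, dim)
    return sum(min(math.dist(p, c) for c in cents) for p in points)

def move_towards(x_i, x_j, beta0=1.0, gamma=1.0):
    """Move x_i towards the brighter x_j with beta0 * exp(-gamma * r^2)."""
    r2 = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    beta = beta0 * math.exp(-gamma * r2)
    return [a + beta * (b - a) for a, b in zip(x_i, x_j)]

def fa_clustering(points, k, n_fireflies=10, max_iter=50,
                  local_search=None, seed=0):
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]

    def cost(firefly):
        return fitness(firefly, points, dim)

    # Random initial centroids inside the bounding box of the data.
    swarm = [[rng.uniform(lo[d % dim], hi[d % dim]) for d in range(k * dim)]
             for _ in range(n_fireflies)]
    best = min(swarm, key=cost)
    for _ in range(max_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost(swarm[j]) < cost(swarm[i]):   # firefly j is brighter
                    swarm[i] = move_towards(swarm[i], swarm[j])
        current = min(swarm, key=cost)                # iteration's best
        if local_search is not None:
            candidate = local_search(current)
            if cost(candidate) < cost(current):       # greedy acceptance
                swarm[swarm.index(current)] = candidate
                current = candidate
        if cost(current) < cost(best):
            best = list(current)
    return centroids_of(best, dim), cost(best)
```

With $\gamma = 1$ the attractiveness decays quickly for distant fireflies, so on data with a large range the coordinates would likely need scaling before this sketch moves the swarm noticeably.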

Benchmark and Evaluation Performance:
To verify the performance of the proposed FA-VNS, internal and external evaluation metrics are used. These metrics are part of the clustering evaluation task and represent the compactness and separation of each cluster. The code was implemented in Java, and the Weka library was used to perform the evaluation, including the internal and external criteria. The results are compared with those of other clustering algorithms to show how the proposed algorithm can produce better clustering accuracy. The internal evaluation metrics are unsupervised methods that use the distances between clusters and within clusters to indicate the quality of clustering, such as the intra-cluster distance (intra) and Calinski-Harabasz (CH). The external evaluation metrics are supervised methods, such as entropy and F-measure, that measure the quality of clustering according to known information, such as the class of each instance, to indicate how many objects of the same class are placed in the same cluster.
The minimum intra-cluster distance (intra) and the CH metric [33] are the internal evaluation metrics, whereas the entropy and F-measure metrics are the external evaluation metrics used in this research. The intra-cluster distance shown in Equation (6) is the summation of the distances between each cluster centroid and the objects of that cluster. A minimum intra-cluster distance indicates that the clusters have good compactness and are well-separated from each other.
$$\text{intra} = \sum_{k=1}^{K} \sum_{i=1}^{n_k} d(c_k, x_i) \qquad (6)$$
where $K$ is the number of clusters, $n_k$ is the number of objects in cluster $k$, $c_k$ is the centroid of cluster $k$, and $x_i$ is an object belonging to cluster $k$.
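As a small illustration of the intra-cluster distance, the sketch below sums the Euclidean distance from every object to its nearest centroid. It assumes that each object belongs to the cluster of its closest centroid, as in centroid-based clustering; the function name and the toy data are hypothetical and are not taken from the benchmark.

```python
import math

def intra_cluster_distance(points, centroids):
    """Sum over all objects of the distance to the nearest centroid."""
    total = 0.0
    for p in points:
        # The nearest centroid defines the cluster the object belongs to.
        total += min(math.dist(p, c) for c in centroids)
    return total

# Toy example: two tight groups, one centroid in the middle of each.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centroids = [(0, 1), (10, 1)]
print(intra_cluster_distance(points, centroids))
```

Each of the four toy objects lies at distance 1 from its centroid, so a poorer choice of centroids would give a larger sum.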
The CH metric represents the ratio of the between-cluster sum of squared error ($SS_B$) to the within-cluster sum of squared error ($SS_W$), where $K$ is the number of clusters and $N$ is the number of objects in the dataset. A high value of CH reflects a high-quality clustering solution, in which the ratio of $SS_B$ to $SS_W$ is high. The CH metric can be calculated using Equation (7).
$$CH = \frac{SS_B / (K - 1)}{SS_W / (N - K)} \qquad (7)$$
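A minimal sketch of the CH computation is given below, using the standard definition of the index: the between-cluster sum of squares weights each centroid's squared distance to the overall mean by the cluster size, and the within-cluster sum of squares uses the squared distance of each object to its own centroid. The function name and the toy data are hypothetical.

```python
def calinski_harabasz(points, labels):
    """CH = (SS_B / (K - 1)) / (SS_W / (N - K)) for labelled points."""
    n, dim = len(points), len(points[0])

    def mean(pts):
        return [sum(p[d] for p in pts) / len(pts) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    k = len(clusters)

    overall = mean(points)
    ss_b = ss_w = 0.0
    for members in clusters.values():
        c = mean(members)
        ss_b += len(members) * sq_dist(c, overall)   # between-cluster term
        ss_w += sum(sq_dist(p, c) for p in members)  # within-cluster term
    return (ss_b / (k - 1)) / (ss_w / (n - k))

points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(calinski_harabasz(points, [0, 0, 1, 1]))  # prints 50.0
```

In the toy example $SS_B = 100$ and $SS_W = 4$ with $K = 2$ and $N = 4$, which gives $(100/1)/(4/2) = 50$.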
Equation (8) measures the entropy of a single cluster. The entropy of each cluster is measured first, and then the total entropy of all clusters is calculated using Equation (9). The probability term in Equation (8) represents the probability of an object in a given cluster, whereas $H(w)$ is the entropy of a single cluster. The sum of all cluster entropies, calculated according to Equation (9), reflects well-distributed objects in their right clusters if the value of the entropy is small [34]. The F-measure metric requires two other supervised external metrics, namely, Precision and Recall, which are used to determine the cluster assignment [35]. The two metrics are calculated using Equations (10) and (11), respectively, where $TP$ is the number of true positives, $FP$ is the number of false positives, and $FN$ is the number of false negatives.
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (10)$$
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (11)$$
Both metrics are calculated before the F-measure, which is given by Equation (12); a high F-measure indicates that most of the objects of a class are assigned to the same cluster.
$$F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (12)$$
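The external metrics can be sketched as follows. The entropy part assumes the Shannon form with a base-2 logarithm for a single cluster and a size-weighted sum for the total; both choices are assumptions made for illustration, since the exact form of Equations (8) and (9) is not reproduced here. The F-measure part uses the usual harmonic mean of Precision and Recall. All function names and counts are hypothetical.

```python
import math
from collections import Counter

def cluster_entropy(class_labels):
    """Shannon entropy (base 2) of the class labels inside one cluster."""
    n = len(class_labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(class_labels).values())

def total_entropy(clusters):
    """Cluster entropies combined with cluster-size weights (assumption)."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters)

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# One mixed cluster and one pure cluster: only the mixed one adds entropy.
clusters = [["a", "a", "b", "b"], ["a", "a"]]
print(total_entropy(clusters))
print(f_measure(tp=8, fp=2, fn=2))
```

A pure cluster has zero entropy, so lower totals correspond to clusters dominated by a single class, which matches the reading of the metric given in the text.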
The benchmark used in this research was extracted from UCI and contains eight datasets, which are popularly used for classification and clustering tasks. Table 2 shows the datasets, which cover different application domains, such as life and physical, with different instance sizes, such as small, medium, large, and very large. A comparison was performed against well-known algorithms, including centroid FA (C-FA) [32], genetic algorithm (GA) [36,37], simulated annealing (SA) [38], and K-means (KM) [39]. KM is a static algorithm, and its iterations and maximum runs are set to 1,000 and 50, respectively, because the algorithm is easily trapped at local optima. Table 3 shows the parameter settings of the above-mentioned algorithms. These parameters are set according to the clustering literature and the best-known settings for all algorithms [40].

Results:
Experimental Results:
Tables 4 to 7 show the comparisons between the algorithms using internal and external evaluation metrics. As shown in Table 4, the comparison based on the minimum intra-cluster distance (overall performance) indicates that FA-VNS produced the best results in seven datasets, that is, in approximately 88% of the datasets. The pairwise comparison (algorithm vs. algorithm) indicates that FA-VNS is better than SA, GA, and KM in all datasets (100%). The comparison between C-FA and FA-VNS shows that FA-VNS produced the best result in seven datasets (approximately 88%), including obesity, segment, hepatitis C virus, vehicle, Ecoli, and mammographic. C-FA produced the best result in only one dataset, namely, contraceptive method choice.

The overall comparison using the internal CH metric, shown in Table 5, indicates that FA-VNS performed better than the other algorithms. The proposed algorithm generated the best results in five datasets (approximately 63%), outperforming SA, GA, C-FA, and KM.
The KM algorithm ranks second, obtaining the best results in only three datasets (approximately 38%). FA-VNS is better than SA, GA, and C-FA in all datasets (100%). The comparison between KM and FA-VNS shows that FA-VNS produced the best results in five datasets (approximately 63%), namely, obesity, segment, hepatitis C virus, contraceptive method choice, and mammographic, whereas KM produced the best results in only three datasets, namely, vehicle, Ecoli, and glass (approximately 38%).

Table 6 shows the comparison between the algorithms using the F-measure metric, which indicates that FA-VNS performed better than the other algorithms in six datasets (75%). C-FA ranks second, obtaining the best results in two datasets (25%). The pairwise comparison (algorithm vs. algorithm) indicates that FA-VNS is better than SA, GA, and KM in all datasets (100%). The comparison between C-FA and FA-VNS shows that FA-VNS produced the best results in six datasets (75%), including obesity, segment, hepatitis C virus, glass, and mammographic. However, C-FA produced the best results in only two datasets, namely, Ecoli and contraceptive method choice (25%).

The last comparison is based on the entropy metric, which shows how the objects are assigned to their clusters. Table 7 shows that FA-VNS is better than the other algorithms on five datasets (approximately 63%). SA, C-FA, and KM each produced the best result in only one dataset, and GA did not produce the best result in any dataset (overall performance). The pairwise comparison (algorithm vs. algorithm) indicates that FA-VNS is better than SA in all datasets (100%) and better than GA in seven datasets (approximately 88%), namely, obesity, segment, hepatitis C virus, vehicle, Ecoli, contraceptive method choice, and mammographic. GA was better than FA-VNS only on the glass dataset.
The comparison between FA-VNS and C-FA indicates that FA-VNS is better than C-FA in six datasets (75%), namely, obesity, segment, hepatitis C virus, Ecoli, glass, and mammographic, whereas C-FA obtained the best results in only two datasets (25%), namely, vehicle and contraceptive method choice. The comparison between KM and FA-VNS shows that FA-VNS produced the best results in six datasets (75%), namely, obesity, segment, hepatitis C virus, vehicle, Ecoli, and mammographic, whereas KM produced the best results in only two datasets (25%), namely, glass and contraceptive method choice.

The experiments summarized in Fig. 5 indicate that FA-VNS produced the minimum intra-cluster distance in approximately 88% of the datasets, which is better than the other algorithms. This finding indicates that the clustering results are more compact around the cluster centers and, according to the CH metric, well-separated. FA-VNS achieved the best CH ratio in approximately 63% of the datasets, which is better than the other algorithms. The results of FA-VNS are also more accurate according to the F-measure, for which FA-VNS is the best in 75% of the datasets. The entropy shows the distribution of the objects to the right clusters, capturing the count of similar objects assigned to different clusters; FA-VNS achieved the best entropy in approximately 63% of the datasets. The results indicate that VNS helps the algorithm find better clustering assignments by finding promising regions in the search space and simultaneously improving the clustering solutions during the search process.

Figure 6 shows the behavior of the two algorithms, C-FA and FA-VNS, during the iteration process from 1 to 100. The FA-VNS algorithm starts with high exploration and then moves to a different region using the four operations in VNS.
The FA-VNS algorithm modifies the neighborhood structure of the iteration's best solution to find better-quality solutions, as shown at iteration 61, where the algorithm moves the search process to other regions, such as the new region at iteration 67. By contrast, the C-FA algorithm produced the same results throughout the run. This result means that FA-VNS can effectively explore more regions of the search space, which increases the probability of finding deeper regions with high-quality clustering solutions.

A paired-samples t-test is performed to test the difference in the means of the internal and external metrics. The difference between the means of two algorithms is considered significant when the p-value is less than 0.05. Table 8 shows the paired-samples t-test between FA-VNS and C-FA, which provides evidence to reject the null hypothesis. Most of the p-values are less than 0.05, except for the entropy metric, which indicates that no substantial difference exists between the means of the two algorithms for that metric.
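The paired-samples t-test mentioned above reduces to a one-sample test on the per-dataset differences. The sketch below computes only the t statistic and the degrees of freedom with the Python standard library; obtaining the p-value would additionally require the t distribution from a statistics package. The scores are hypothetical placeholders, not values from Table 8.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """t statistic and degrees of freedom for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical metric values for two algorithms on the same four datasets.
algorithm_a = [1.0, 2.0, 3.0, 4.0]
algorithm_b = [0.0, 0.0, 1.0, 1.0]
t, dof = paired_t_statistic(algorithm_a, algorithm_b)
print(t, dof)
```

A large absolute t value relative to the critical value for the given degrees of freedom corresponds to a p-value below 0.05, which is the criterion used in the text.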

Discussion:
One of the interesting facts is that metaheuristic performance is controlled by the trade-off between the exploration and exploitation abilities. Managing this trade-off yields high-quality solutions, as the search process proceeds in sequential stages over time. This sequence is adapted to produce high-quality solutions through exploration initially and then through deep exploitation in the neighborhood region during the advanced stages of the search. FA, as an optimization algorithm for clustering, follows the same procedure as other metaheuristic algorithms. However, FA suffers from premature convergence when no neighborhood search strategy is employed to improve the quality of clustering solutions in the neighborhood region and to explore the global regions of the search space. To improve the FA performance, a neighborhood search is required, which offers the benefit of the trade-off between the exploration and exploitation abilities through its perturbation operators. This research proposed to use VNS as a local search method to maintain the diversity of the clustering solutions using the perturbation operators and to improve the quality of clustering solutions in the neighborhood region using the benefit of the local search. Two aspects of this improvement can be identified. The first aspect is the generation of several landscapes with solutions of different quality. The second aspect is the generation of more diversity in the clustering solutions during the search process. Both aspects control the trade-off between the exploration and exploitation abilities.

[Figure: intra-cluster distance plotted against iteration number]

Conclusion:
This study addresses the problem of improving the clustering solution of FA by using VNS as the local search method (i.e., FA-VNS). The improvement is achieved by intensifying the search during the algorithm process and by moving the search with the VNS permutations to find more promising regions in the search space. VNS provides different neighborhoods through several operations, namely, pair-swap, inversion, insertion, and displacement. Different neighborhoods generate several landscapes and help the algorithm find more clustering solutions and avoid being stuck at local optima. The performance of FA-VNS has been compared with that of well-known clustering algorithms on UCI Machine Learning Repository datasets. The proposed algorithm produces better clustering solutions than the other clustering algorithms according to internal and external evaluation metrics. The reason is that the learning process of the proposed FA-VNS algorithm begins with high exploration, looking for global regions, and thus finds promising regions in the search space; the search for an optimal clustering solution is then intensified in the neighborhood of the best clustering solution found during the search process. The premature convergence of FA motivated this research to use the neighborhood search strategies of VNS to improve the quality of clustering solutions by finding global regions in the search space and avoiding local optima. Furthermore, the advantage of using VNS with the FA is twofold. The first aspect is the generation of several landscapes with clustering solutions of different quality, while the local search is improved to find deeper local regions. The second aspect is the maintenance of more diversity in the clustering solutions during the search process through the perturbation in VNS, which gives a high probability of improving the quality of clustering solutions in the neighborhood region and exploring the global regions of the search space.
The proposed FA-VNS algorithm has produced better clustering results than the other algorithms in terms of internal and external evaluation criteria. However, the FA-VNS algorithm still has some limitations. For instance, the choice of operation is based on a randomly generated number and provides no information on the best operation for a particular dataset. Another limitation is the time complexity, as the operations in the neighborhoods require more time to find more solutions. Furthermore, the algorithm cannot determine the right number of clusters, which users must supply as a predefined parameter.
Future research should focus on evaluating the proposed algorithm on other datasets using other evaluation criteria. Online parameter adaptation, including self-adaptive, adaptive, and search-based strategies, could be used to tune the VNS parameter that selects the best operation for a particular dataset. Future research could also explore other search methods, such as guided and iterated local search, with additional comparisons to determine their effect on the performance of the algorithm in terms of accuracy and time complexity.