Improving K-means Clustering with Enhanced Firefly Algorithms

In this research, we propose two variants of the Firefly Algorithm (FA), namely inward intensified exploration FA (IIEFA) and compound intensified exploration FA (CIEFA), for tackling the persistent problems of initialization sensitivity and local optima traps in the K-means clustering model. To enhance both exploitation and exploration capabilities, matrix-based search parameters and dispersing mechanisms are incorporated into the two proposed FA models. We first replace the attractiveness coefficient with a randomized control matrix in the IIEFA model to release FA from the constraints of biological law. As a result, the exploitation capability in the neighbourhood is elevated from a one-dimensional to a multi-dimensional search mechanism with enhanced diversity in search scopes, scales, and directions. Besides that, we employ a dispersing mechanism in the second model, CIEFA, to dispatch fireflies with high similarities to new positions outside the close neighbourhood to perform global exploration. This dispersing mechanism ensures sufficient variance between the fireflies in comparison to increase search efficiency. The ALL-IDB2 database, a skin lesion data set, and a total of 12 UCI data sets are employed to evaluate the efficiency of the proposed FA models on clustering tasks. The minimum Redundancy Maximum Relevance (mRMR)-based feature selection method is also adopted to reduce feature dimensionality. The empirical results indicate that the proposed FA models demonstrate statistically significant superiority in both distance and performance measures for clustering tasks in comparison with conventional K-means clustering, five classical search methods, and five advanced FA variants.


Introduction
Clustering analysis is one of the fundamental methods of discovering and understanding underlying patterns embodied in data by partitioning data objects into several clusters according to measured or perceived intrinsic characteristics or similarity [1]. As a result of the clustering process, data samples with high similarity are grouped in the same cluster, while dissimilar samples are assigned to different clusters. Clustering analysis has been widely adopted in many disciplines, such as image segmentation [2][3][4][5][6][7][8], text mining [9][10][11], bioinformatics [12,13], wireless sensor networks [14,15], and financial analysis [16]. In general, conventional clustering algorithms can be broadly categorized into two groups: partitioning and hierarchical methods. The partitioning methods divide data samples into several clusters simultaneously, where each instance belongs to exactly one cluster. On the other hand, the hierarchical methods build a hierarchy of clusters in either an agglomerative or a divisive mode. K-means (KM) clustering is one of the most popular partitioning methods and is widely used owing to its simplicity, efficiency, and ease of implementation [1].
Despite the abovementioned merits, KM clustering suffers from a number of limitations, such as initialization sensitivity [1,17], susceptibility to noise [18,19], and vulnerability to undesirable sample distributions [19]. Specifically, real-life clustering tasks pose diverse challenges to KM clustering owing to the complexity embedded in data samples, such as high dimensionality; noise and outliers; irregular, sparse, and imbalanced sample distributions; and clusters with overlap or narrow class margins [1]. These complexities clearly violate the restrictive assumptions embedded in KM, i.e. spherical sample distributions and evenly sized clusters, thereby limiting its interpretability for such complex data distributions [18,19]. Moreover, KM suffers from initialization sensitivity and local optima traps owing to its operating mechanism of local search around the configuration of initial centroids [1,17]. Owing to their powerful search capability in terms of exploration and exploitation, metaheuristic search algorithms have been widely employed to help KM escape from local optima traps by exploring and obtaining better configurations of initial centroids. The negative impacts imposed by challenging real-life data can, therefore, be mitigated by the more accurate cluster identification resulting from the optimized centroids. The effectiveness of such hybrid clustering models has been extensively validated by empirical studies, e.g. Tabu Search (TS) [20,21], Simulated Annealing (SA) [22], Genetic Algorithm (GA) [23], Artificial Bee Colony (ABC) [24,25], Ant Colony Optimization (ACO) [26,27], Particle Swarm Optimization (PSO) [27][28][29], Cuckoo Search (CS) [29,30], Firefly Algorithm (FA) [31,32], Gravitational Search Algorithm (GSA) [33,34], Black Hole Algorithm (BH) [35], and Big Bang-Big Crunch algorithm (BB-BC) [36].
As one of the recently proposed metaheuristic search algorithms, FA possesses a unique capability of automatic subdivision in comparison with other metaheuristic search algorithms. This property gives FA advantages in tackling multimodal optimisation problems, such as clustering analysis, which involve sub-optimal distractions and high nonlinearity [37][38][39][40][41][42]. However, the original FA model has limitations in search diversity and efficiency. With respect to search diversity, the search behaviour in FA is, in principle, always constrained to a diagonal-based search for any pair of fireflies in comparison. Owing to such a diagonal-based search action, instead of a region-based one, the search process reduces the probability that fireflies identify more promising search directions, leading to stagnation. With respect to search efficiency, the current search mechanism forces one firefly to approach the brighter ones in the neighbourhood without considering the fitness differences between them. As a result, many movements are futile and ineffective in navigating the search process to a more promising region, since movements towards neighbouring fireflies with large and small fitness differences relative to the current individual are treated identically. Search efficiency is therefore compromised, in addition to the constrained search diversity. The limitations of KM clustering, these identified deficiencies of FA, and the diverse challenges of real-life clustering tasks constitute the major motivations of this research. This research aims to address the above drawbacks of the original FA model and to resolve the initialization sensitivity and local optima traps of conventional KM clustering. Two modified FA models, namely inward intensified exploration FA (IIEFA) and compound intensified exploration FA (CIEFA), are proposed. As one of the main contributions of this research, two novel strategies are formulated to increase search diversification and efficiency. Firstly, a randomized control matrix is proposed in IIEFA to replace the attractiveness coefficient of the original FA model, in order to intensify exploitation diversity. It elevates the diagonal-based search action of the original FA model to a multi-dimensional region-based search mechanism with greater variety in scales and directions in the search space. Secondly, in addition to the above strategy, the diversity of global exploration is enhanced in CIEFA by dispersing fireflies with high similarities in the early stage of the search process and relocating them in various directions and scales outside the scope between the fireflies in comparison. This enables the firefly swarm to spread over a larger region of the search space, making it less likely to be trapped in local optima. Search efficiency is also improved by guaranteeing sufficient variance between the fireflies in comparison, especially in the early convergence stage. The proposed FA models are incorporated into the KM clustering algorithm to enhance its clustering performance. The minimum Redundancy Maximum Relevance (mRMR)-based feature selection method [43] is adopted to reduce feature dimensionality. A total of 12 UCI data sets, a skin lesion data set, and the ALL-IDB2 database are used to evaluate the proposed models. Five clustering performance indicators, i.e. intra-cluster distances, accuracy, sensitivity, specificity, and macro-average F-score ($\text{Fscore}_M$), are used to measure model performance. The empirical results indicate that the proposed IIEFA and CIEFA models demonstrate a superior capability of dealing with both high-dimensional and low-dimensional clustering tasks, and statistically outperform the KM clustering algorithm, five classical search methods, and five other FA variants.
The rest of the paper is organized as follows. Section 2 introduces conventional KM clustering and FA models, modified FA variants, and the incorporation of metaheuristic algorithms into clustering models. In Section 3, the proposed FA models, namely IIEFA and CIEFA, are presented in detail. Section 4 presents the evaluation of the proposed models and their comparison with other methods. Section 5 further explains the distinctiveness of the two proposed IIEFA and CIEFA models. Conclusions are drawn and future research directions are presented in Section 6.

Related Research
In this section, we first introduce the conventional KM clustering and FA models. Then, we review FA variants and clustering models that incorporate metaheuristic algorithms in the literature.

K-means Clustering
The KM clustering algorithm partitions data samples into different clusters based on distance measures. It finds a partition such that the squared error between the empirical mean of a cluster and the points in the cluster is minimized [1]. Let $X = \{x_1, x_2, \dots, x_n\}$ be a set of $n$ data samples to be clustered into a set of $K$ clusters, $C = \{c_k, k = 1, \dots, K\}$. The goal of KM clustering is to minimize the sum of the squared error over all $K$ clusters, which is defined as follows:

$$J(C) = \sum_{k=1}^{K} \sum_{x_i \in c_k} \lVert x_i - \mu_k \rVert^2 \qquad (1)$$

where $c_k$, $\mu_k$, $x_i$, and $K$ represent the $k$-th cluster, the centroid of the $k$-th cluster, the data samples belonging to the $k$-th cluster, and the total number of clusters, respectively.
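As a minimal sketch (Python with NumPy is our choice here, not the authors'), Eq. (1) can be evaluated for a given hard assignment of samples to centroids. The function name `km_objective` and the toy data are hypothetical.

```python
import numpy as np

def km_objective(X, labels, centroids):
    """J(C) in Eq. (1): squared Euclidean distance of each sample to the
    centroid of its own cluster, summed over all samples."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    diffs = X - centroids[np.asarray(labels)]  # x_i - mu_k for the cluster of x_i
    return float(np.sum(diffs ** 2))

X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[0.0, 1.0], [10.0, 1.0]])
print(km_objective(X, labels, centroids))  # 4.0
```

Each sample lies at distance 1 from its centroid, so the four squared errors sum to 4.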
In KM clustering, cluster centroids are initialized randomly. Each data sample is assigned to the closest cluster, as determined by the distance between the sample and the corresponding centroid. The centroid of each cluster is then updated as the mean of all data samples within that cluster. The process of partitioning data samples into clusters is repeated according to the updated centroids until the specified termination criteria are met. The KM clustering algorithm shows impressive performance in a wide range of applications, including computer vision [44], pattern recognition [45], and information retrieval [46]. It often serves as a pre-processing method that provides an initial configuration for other, more complex models.
Despite its advantages and popularity, KM clustering suffers from a number of limitations owing to its restrictive assumptions and operating mechanisms. One of the key drawbacks of KM is initialization sensitivity [1,17]. Specifically, the process of minimizing the sum of intra-cluster distances in KM is, in essence, a local search around the initial centroids. As a result, the performance of KM depends heavily on the initial configuration of cluster centroids. In addition, owing to its operating mechanisms and the randomness of centroid initialization, KM is prone to local optima traps. This drawback of KM clustering serves as one of the main motivations of this research.
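The assignment/update loop of KM, and the initialization sensitivity it causes, can be illustrated with a short sketch of standard Lloyd-style KM. The names `assign` and `kmeans` are hypothetical, and the four-point data set is made up for illustration: two well-separated groups. When both initial centroids are placed inside the same group, the loop settles on a poor partition; one centroid per group recovers the natural clusters.

```python
import numpy as np

def assign(X, centroids):
    """Index of the nearest centroid (Euclidean distance) for every sample."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, init=None, max_iter=100, seed=0):
    """Standard KM: alternate assignment and mean-update steps until the
    centroids stop moving. Returns labels, centroids and the Eq. (1) objective."""
    X = np.asarray(X, dtype=float)
    if init is None:
        # Random initialization: k distinct samples become the initial centroids
        rng = np.random.default_rng(seed)
        init = X[rng.choice(len(X), size=k, replace=False)]
    centroids = np.array(init, dtype=float)
    for _ in range(max_iter):
        labels = assign(X, centroids)
        # Update step; an empty cluster keeps its previous centroid
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    labels = assign(X, centroids)
    sse = float(np.sum((X - centroids[labels]) ** 2))
    return labels, centroids, sse

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
_, _, sse_bad = kmeans(X, 2, init=[[0.0, 0.0], [0.0, 1.0]])    # both seeds in one group
_, _, sse_good = kmeans(X, 2, init=[[0.0, 0.0], [10.0, 0.0]])  # one seed per group
print(sse_bad, sse_good)  # 100.0 1.0
```

The first run converges to centroids (5, 0) and (5, 1), splitting the data horizontally instead of vertically, which is exactly the kind of local optimum that metaheuristic seeding aims to avoid.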

Firefly Algorithm
The FA model performs the search operation according to the foraging behaviours of fireflies [47]. In FA, a swarm of fireflies is initialized randomly, and each firefly denotes one candidate solution. A fitness score is calculated for each firefly based on the objective function and is then assigned as its light intensity. According to [47], fireflies with lower light intensities are attracted to those with stronger illumination in the neighbourhood, as defined in Eq. (2):

$$x_i^{t+1} = x_i^t + \beta_0 e^{-\gamma r_{ij}^2}\,(x_j^t - x_i^t) + \alpha_t \epsilon_t \qquad (2)$$

where $i$ and $j$ denote fireflies with lower and higher light intensities, respectively, while $x_i^t$ and $x_j^t$ denote the current positions of fireflies $i$ and $j$ at the $t$-th iteration, respectively. Parameter $\beta_0$ is the initial attractiveness, $\gamma$ is the light absorption coefficient, and $r_{ij}$ denotes the distance between fireflies $i$ and $j$. In addition, $\alpha_t$ is a randomization coefficient, while $\epsilon_t$ is a vector of random numbers drawn from a Gaussian or a uniform distribution.
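The FA movement rule can be sketched as follows. The function name `fa_move` is hypothetical, and the random vector is drawn here from a uniform distribution on [-0.5, 0.5], one of the two options mentioned above. With $\beta_0 = \gamma = 1$, the attractiveness $e^{-r_{ij}^2}$ is about 0.37 at $r_{ij} = 1$ but only about $1.2 \times 10^{-4}$ at $r_{ij} = 3$, which shows why distant fireflies barely move towards each other.

```python
import numpy as np

def fa_move(x_i, x_j, alpha_t, beta0=1.0, gamma=1.0, rng=None):
    """Move the dimmer firefly i towards the brighter firefly j (original FA rule)."""
    rng = np.random.default_rng() if rng is None else rng
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    r_ij = np.linalg.norm(x_j - x_i)               # Euclidean distance between i and j
    beta = beta0 * np.exp(-gamma * r_ij ** 2)      # one scalar shared by all dimensions
    eps = rng.uniform(-0.5, 0.5, size=x_i.shape)   # uniform random vector (assumed)
    return x_i + beta * (x_j - x_i) + alpha_t * eps

rng = np.random.default_rng(0)
near = fa_move([0.0, 0.0], [1.0, 0.0], alpha_t=0.0, rng=rng)  # about [0.368, 0.0]
far = fa_move([0.0, 0.0], [3.0, 0.0], alpha_t=0.0, rng=rng)   # about [0.00037, 0.0]
```

The random term is switched off ($\alpha_t = 0$) so that only the attraction step is visible.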
The major advantage of FA lies in its attraction mechanism. The attractiveness-based movements enable the firefly swarm to automatically subdivide into subgroups, where each group swarms around one mode or local optimum [40,47]. When the population size is sufficiently larger than the number of local optima, the subdivision ability of FA can, in principle, find all optima simultaneously and, therefore, attain the global optima. This automatic subdivision ability enables the FA model to tackle highly nonlinear and multimodal optimisation problems, which matches the characteristics of the clustering problems evaluated in this research, namely data sets with many local optima traps and strong nonlinearity.

FA Variants
While the original FA model demonstrates some unique properties in its search mechanism, it suffers from slow convergence and high computational complexity owing to its behaviour of following all brighter fireflies in the neighbourhood [48]. Additionally, fireflies can fall into stagnation during the search process, since the attractiveness component $\beta_0 e^{-\gamma r_{ij}^2}$ approaches zero as the distance between fireflies increases. Many FA variants have been proposed to overcome these problems by increasing the exploration ability and search diversification of the original FA model. The strategies employed to improve the original FA model can generally be categorized into three groups, i.e. adaptive parameter tuning, population diversification, and integration of hybrid search patterns [49]. Ozsoydan and Baykasoglu [50] proposed a quantum firefly swarm model to tackle multimodal dynamic optimization problems. Four strategies were incorporated into their model: (1) multi-swarm-based search; (2) two types of movements, undertaken by neutral and quantum fireflies, respectively, in each sub-swarm; (3) simplification of firefly position updating; and (4) two sub-swarm prioritizing techniques, i.e. sequential selection and roulette wheel selection. The quantum firefly swarm model was evaluated on the Moving Peaks Benchmark problem to locate and track the moving optima. The obtained results indicated that the model was competitive and promising in comparison with 13 well-known algorithms for dynamic optimization problems, including mCPSO with anticonvergence, mCPSO without anticonvergence, mQSO with anticonvergence, mQSO without anticonvergence, SPSO, rSPSO, BSPSO, RWS, and SPSO-PD. Banerjee et al. [51]
Baykasoglu and Ozsoydan [52] proposed a variant of FA, i.e. FA2, with two strategies: (1) replacing the exponential function with an inverse function of distance as the attractiveness coefficient, and (2) constructing a threshold probability that determines whether a firefly's position is updated. The FA2 model was tested on both static and dynamic multidimensional knapsack problems. The obtained results indicated that FA2 was more effective than GE, DA, and FA. Sadhu et al. [53] proposed a Q-learning induced FA (QFA) model. Q-learning was used to generate the light absorption coefficient $\gamma$ and the randomization coefficient $\alpha_t$ with a fitness-rank-based rewarding and penalizing mechanism. The generated pair, $\langle \gamma, \alpha_t \rangle$, was capable of producing high-performing fireflies in each step. The QFA model was tested with fifteen benchmark functions from CEC 2015 and with a real-world path planning problem for a robotic manipulator with various obstacles. The empirical results confirmed the superiority of the QFA model in terms of solution quality and run-time complexity in comparison with other algorithms, e.g. AFA (adaptive FA), DEsPA (Differential Evolution with success-based parameter adaption), SRPSO (Self-regulating PSO), SDMS-PSO2 (Self adaptive dynamic multi-swarm PSO), SLPSO (social learning PSO), and LFABC (Levy flight Artificial Bee Colony). Zhang et al. [54] proposed a modified FA model for feature selection by incorporating three strategies, i.e.
the improved attractiveness operations guided by SA-enhanced neighbouring and global optimal signals, chaotic diversified search mechanisms, and diversion of weak solutions. The modified FA model was tested on feature selection problems using 29 classification and 11 regression benchmark data sets. The experimental results indicated that, in undertaking diverse feature selection tasks, the proposed FA variant outperformed 11 classical search methods, i.e. PSO, GA, FA, SA, CS, TS, Differential Evolution (DE), Bat Swarm Optimization (BSO), Dragonfly Algorithm (DA), Ant-Lion Optimization (ALO), and Memetic Algorithm with Local Search Chain (MA-LS), as well as 10 popular FA variants, i.e. FA with neighbourhood attraction (NaFA) [48], SA incorporated with FA (SFA) [55], SA incorporated with both Levy flights and FA (LSFA) [55], Opposition and Dimensional FA (ODFA) [56], FA with a logistic map as the attractiveness coefficient (CFA1) [57], FA with a Gauss map as the attractiveness coefficient (CFA2) [58], FA with a variable step size (VSSFA) [59], FA with a random attraction (RaFA) [60], a modified FA incorporating a chaotic Tent map and a global-best-based search operation (MCFA) [61], and a hybrid multi-objective FA (HMOFA) [62].
FA and its variants have also been widely used for solving multimodal optimisation problems. Gandomi et al. [41] applied FA to a set of seven mixed-variable structural optimization problems with nonlinearity and multiple local optima. The empirical results indicated that FA was more efficient than other metaheuristic algorithms, such as PSO, GA, and Harmony Search (HS), on these tasks. Nekouie and Yaghoobi [38] proposed a hybrid FA-based method for solving multimodal optimisation problems. In their study, KM was used to cluster the FA population into several subpopulations. FA with a roaming technique was employed to identify multiple local optima, while SA was used to further improve the promising local solutions. A set of 15 multimodal test functions was used to evaluate the hybrid model. The empirical results demonstrated its advantages over other methods, such as Niche GSA (NGSA), r2PSO (an l-best PSO with a ring topology in which each member interacts with its immediate neighbour on the right), r3PSO (an l-best PSO with a ring topology in which each member interacts with its immediate neighbours on both the left and the right), r2PSO-lhc (r2PSO with no overlapping neighbourhoods), FER-PSO (Fitness Euclidean-distance Ratio based PSO), and SPSO (Speciation-based PSO). Zhang et al.
[39] proposed a modified FA model for ensemble model construction for classification and regression problems. Their FA variant embedded attractiveness strategies guided by both neighbouring and global promising solutions, as well as evading mechanisms that consider local and global worst experiences. It was evaluated with standard, shifted, and composite test functions, the Black-Box Optimization Benchmarking test suite, and several high-dimensional UCI data sets. The experimental results indicated that their FA model outperformed several state-of-the-art FA variants and classical search methods in solving diverse complex unimodal and multimodal optimization and ensemble reduction problems. Yang [42] proposed a multi-objective FA model (MOFA) for solving optimization problems with multiple objectives and complex nonlinear constraints. Evaluated with five mathematical artificial landscapes with convex, nonconvex, and discontinuous Pareto fronts and complex Pareto sets, MOFA outperformed seven established multi-objective algorithms, including vector evaluated GA (VEGA), Non-dominated Sorting GA II (NSGA-II), multi-objective DE (MODE), DE for multi-objective optimization (DEMO), multi-objective Bees algorithms (Bees), and Strength Pareto Evolutionary Algorithm (SPEA). A comprehensive review of evolutionary algorithms for multimodal optimization is provided in [63].
Despite the abovementioned studies, there are certain limitations in search diversification imposed by the strict adherence to biological laws in the original FA model. These limitations are rarely addressed in the existing literature. Specifically, the position updating strategy of FA in Eq. (2) is constructed according to firefly foraging behaviours, and guides one firefly to approach another with a higher light intensity by multiplying the position difference of the two fireflies, $(x_j^t - x_i^t)$, by their relative attractiveness component, $\beta_0 e^{-\gamma r_{ij}^2}$. While this inheritance of biological laws enables one firefly to approach another with a more favourable position, the dimensionality and diversity of the approaching process are severely constrained since, according to the formula, the movement can only occur along the diagonal direction defined by the two fireflies. As illustrated in Fig. 1, in a two-dimensional scenario, the green and red dots symbolize fireflies $i$ and $j$. If we view both fireflies as vectors, their position difference $(x_j^t - x_i^t)$ can be represented by the red line denoted as $\Delta$ in Fig. 1. The calculation of attractiveness effectively imposes one constant isotropic factor, $\beta_0 e^{-\gamma r_{ij}^2}$, on all dimensions of the position difference between fireflies $i$ and $j$, so there is no variance among different dimensions. As a result, instead of exploring flexibly in the entire solution space, the fireflies can merely move along the specific diagonal trajectory between the two fireflies in comparison, and the search area shrinks drastically from a two-dimensional green rectangle into a one-dimensional red line, as shown in Fig. 1. Therefore, the chances of finding the global optima are reduced, since search diversification is severely constrained by the biological laws embedded in the original FA model. In order to mitigate these limitations, matrix-based search parameters and dispersing mechanisms are proposed in this research and incorporated into the proposed FA models to enhance exploitation and exploration.
In [32], two FA variants, namely the probabilistic firefly KM (PFK) and the greedy probabilistic firefly KM (GPFK), were proposed for data clustering. The PFK model employed a cluster channel array to store the probability of each data object belonging to each cluster in the encoding system. Instead of moving towards all brighter fireflies as in PFK, the GPFK algorithm adopted a greedy search strategy, in which each firefly only moved towards the brightest firefly in the swarm. The PFK and GPFK models outperformed KM clustering and FA in an evaluation with four UCI data sets. Hassanzadeh and Meybodi [64] proposed a modified FA model (MFA) for clustering analysis. The MFA model employed not only the neighbouring brighter fireflies but also the global best solution to guide the search process. Evaluated with five UCI data sets, it outperformed three other clustering methods, namely KM, PSO, and KPSO. Han et al.
[34] proposed a modified GSA model for clustering analysis, namely BFGSA.The mean position of the seven nearest neighbours of the global best solution was used to enable the leader to escape from the local optima traps.Based on 13 UCI data sets, BFGSA outperformed nine classical search methods, including GSA, PSO, ABC, FA, KM, NM-PSO (fusion of Nelder-Mead simplex and PSO), K-PSO (fusion of KM and PSO), K-NM-PSO (fusion of KM, Nelder-Mead simplex and PSO), and CPSO (Chaotic PSO) [34].A comprehensive survey on metaheuristic algorithms for partitioning clustering can be found in Nanda and Panda [65].

Methodology
We construct the hybrid clustering models on the basis of FA owing to its unique property of automatic subdivision and its advantages in tackling multimodal optimisation problems [40,47]. However, the identified limitations in the search diversity and search efficiency of the original FA model may hinder the identification of optimal centroids in clustering analysis. Therefore, in this research, we propose two modified FA models, namely IIEFA and CIEFA, to overcome the limitations of the original FA model and mitigate the problems of initialization sensitivity and local optima traps of KM clustering. The proposed models intensify the diversification of exploration in both the neighbourhood and the global search space, and lift the constraints of the biological laws in the original FA model. We introduce the proposed models in detail in the following subsections.

The Proposed Inward Intensified Exploration FA (IIEFA) Model
The aim of IIEFA is to expand the one-dimensional search of the original FA model to a multi-dimensional scale by replacing the attractiveness term $\beta_0 e^{-\gamma r_{ij}^2}$ with a random control matrix, as illustrated in Eq. (3):

$$x_i^{t+1} = x_i^t + \mathbf{R} \odot (x_j^t - x_i^t) + \alpha_t \epsilon_t \qquad (3)$$

where $\mathbf{R}$ denotes a control matrix in which each element is drawn randomly from $[0, 1]$, $\odot$ denotes element-wise multiplication, and $\alpha_t$ denotes an adaptive randomization step based on a geometric annealing schedule, $\alpha_t = \alpha_0 \theta^t$, with $\theta$ as an adaptive coefficient.
According to [47], $\theta$ is recommended to take a value in the range of 0.95 to 0.99. We set $\theta$ to 0.97 in this study, in accordance with the recommendation in [47] and several trial-and-error experiments. This adaptive randomization step enables the search process to start with a larger random step to increase global exploration, and then to fine-tune the solution vectors in subsequent iterations with a smaller step.
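The geometric annealing schedule can be sketched as below, assuming the common form $\alpha_t = \alpha_0 \theta^t$. The initial step $\alpha_0 = 0.2$ is an assumption borrowed from the FA setting reported in Section 4, not a value stated for IIEFA, and `annealed_alpha` is a hypothetical name. With $\theta = 0.97$, the step halves roughly every 23 iterations ($\ln 0.5 / \ln 0.97 \approx 22.8$), so after 200 iterations it has fallen well below 1% of its initial value.

```python
import numpy as np

def annealed_alpha(t, alpha0=0.2, theta=0.97):
    """Geometric annealing alpha_t = alpha0 * theta**t (alpha0 = 0.2 is assumed)."""
    return alpha0 * theta ** np.asarray(t, dtype=float)

t = np.arange(201)                        # iterations 0..200
alphas = annealed_alpha(t)
half_life = np.log(0.5) / np.log(0.97)    # iterations needed to halve the step
print(round(half_life, 1), alphas[-1] / alphas[0])
```

Early iterations therefore take comparatively large random steps (exploration), while late iterations make only small perturbations (fine-tuning).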
By multiplying the position difference $(x_j^t - x_i^t)$ between the two fireflies element-wise by the control matrix, each dimension is assigned a unique random number in [0, 1] and is therefore shrunk disproportionately, with various magnitudes. Consequently, the resulting solution can be any vector originating from the current firefly solution and lying within the green approaching area, rather than on the red diagonal line as in the original FA model, as illustrated in Fig. 1. The random control matrix operation offers two advantages. Firstly, the search directions in the neighbourhood are no longer constrained to the diagonal line, but become more diversified. Secondly, the movement scales become more diverse owing to the various magnitudes applied to each dimension. Fig. 1 provides an example of possible directions and scales in the neighbourhood search, indicated by the green arrows. Therefore, IIEFA possesses a better search capability by extending the exploration of fireflies from a one-dimensional diagonal direction to a multi-dimensional space in the neighbourhood. In other words, the exploration of the swarm is intensified along with the firefly congregation process. This first proposed FA variant is hence characterized as an inward intensified exploration FA model. The pseudo-code of IIEFA is presented in Algorithm 1.
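The geometric argument of Fig. 1 can be checked numerically with a small sketch. It assumes that the control matrix acts element-wise on the position difference and that the random term is uniform; `iiefa_move` and `cross2d` are hypothetical names, and the random term is switched off ($\alpha_t = 0$) to isolate the attraction step. A scalar attractiveness keeps the step parallel to $x_j^t - x_i^t$ (zero 2-D cross product), whereas per-dimension random factors scatter the steps over the rectangle spanned by the two fireflies.

```python
import numpy as np

def iiefa_move(x_i, x_j, alpha_t, rng):
    """Sketch of the IIEFA move: the scalar attractiveness of FA is replaced by
    a random control vector with one U[0, 1) entry per dimension (element-wise)."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    control = rng.uniform(0.0, 1.0, size=x_i.shape)
    eps = rng.uniform(-0.5, 0.5, size=x_i.shape)  # assumed uniform random term
    return x_i + control * (x_j - x_i) + alpha_t * eps

def cross2d(a, b):
    """z-component of the 2-D cross product; zero means a and b are parallel."""
    return a[0] * b[1] - a[1] * b[0]

rng = np.random.default_rng(42)
x_i, x_j = np.array([0.0, 0.0]), np.array([4.0, 2.0])
delta = x_j - x_i

fa_step = 0.3 * delta  # original FA: any scalar attractiveness, e.g. 0.3
steps = np.array([iiefa_move(x_i, x_j, 0.0, rng) - x_i for _ in range(1000)])

in_rectangle = np.all((steps >= 0.0) & (steps <= delta), axis=1)  # green area in Fig. 1
off_diagonal = np.abs([cross2d(s, delta) for s in steps]) > 1e-9
print(cross2d(fa_step, delta), in_rectangle.all(), off_diagonal.mean())
```

The FA step has a zero cross product with $\Delta$, while practically every IIEFA step leaves the diagonal but stays inside the approaching rectangle.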
Algorithm 1 - The pseudo-code of the proposed IIEFA model

The Proposed Compound Intensified Exploration FA (CIEFA) Model
As illustrated in Eq. (5), we employ a dissimilarity level to distinguish fireflies with weak or strong light intensity differences relative to the current firefly. During position updating, neighbouring solutions with a dissimilarity level below 0.5 are labelled as 'ineffective individuals', whereas those with distinctive variance in light intensity, i.e. a dissimilarity level above 0.5, are labelled as 'effective individuals'. Eqs. (6)-(7) define the outward search operation for the 'ineffective individuals'. This new outward search operation enables firefly $i$ not only to perform local exploitation around firefly $j$, but also to jump out of the space between $i$ and $j$ so as to explore the outer space. It expands the exploration of the weaker firefly $i$ to accelerate convergence. Conversely, when the dissimilarity level exceeds 0.5, the inward intensified exploration formula of IIEFA is used to move firefly $i$ towards the 'effective individuals'.
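Because Eqs. (5)-(7) are not reproduced in this text, the following is only a structural sketch of how CIEFA routes each pairwise comparison. The normalized light-intensity gap used as the dissimilarity level is a hypothetical stand-in for Eq. (5), and the inward and outward operators are passed in as functions rather than implemented; `dissimilarity`, `ciefa_dispatch` and the two placeholder operators are hypothetical names. A level of exactly 0.5, which the text does not assign to either group, is routed to the inward move here by assumption.

```python
from typing import Callable, Tuple

import numpy as np

Move = Callable[[np.ndarray, np.ndarray], np.ndarray]

def dissimilarity(f_i: float, f_j: float, f_min: float, f_max: float) -> float:
    """HYPOTHETICAL stand-in for Eq. (5): light-intensity gap between fireflies
    i and j, normalized by the range of intensities in the current swarm."""
    span = f_max - f_min
    return 0.0 if span == 0 else abs(f_j - f_i) / span

def ciefa_dispatch(x_i: np.ndarray, x_j: np.ndarray, level: float,
                   inward_move: Move, outward_move: Move,
                   threshold: float = 0.5) -> Tuple[str, np.ndarray]:
    """'Ineffective' pairs (level < threshold) use the outward dispersing
    operator; 'effective' pairs use the IIEFA inward move."""
    if level < threshold:
        return "outward", outward_move(x_i, x_j)
    return "inward", inward_move(x_i, x_j)

def placeholder_inward(xi: np.ndarray, xj: np.ndarray) -> np.ndarray:
    return xi + 0.5 * (xj - xi)  # NOT the paper's IIEFA formula; illustration only

def placeholder_outward(xi: np.ndarray, xj: np.ndarray) -> np.ndarray:
    return xj + 0.5 * (xj - xi)  # NOT Eqs. (6)-(7); simply lands beyond firefly j

x_i, x_j = np.array([0.0, 0.0]), np.array([1.0, 1.0])
mode_a, _ = ciefa_dispatch(x_i, x_j, dissimilarity(3.0, 4.0, 0.0, 10.0),
                           placeholder_inward, placeholder_outward)
mode_b, _ = ciefa_dispatch(x_i, x_j, dissimilarity(1.0, 9.0, 0.0, 10.0),
                           placeholder_inward, placeholder_outward)
print(mode_a, mode_b)  # outward inward
```

Keeping the operators as parameters makes it easy to plug in the actual Eqs. (3) and (6)-(7) without changing the routing logic.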
In Eq. (6), a step control matrix regulates this new outward operation, while a direction control matrix, whose elements are each drawn randomly from {-1, 1}, sets the direction of movement. The step control matrix for the outward search operation is further defined in Eq. (7) as a function of the current iteration number and the maximum number of iterations. Eq. (7) also involves the control matrix of random numbers in [0, 1] defined earlier for IIEFA, which has the same feature dimension as the firefly swarm.
The step control matrix is employed to regulate the extent of outward exploration in each dimension and the balance between exploration and exploitation throughout the whole search process. Owing to the randomness introduced by the IIEFA control matrix defined in Eq. (3), the elements of the step control matrix take different values from each other, but all follow the same trend of variation as the iteration number increases. As an example, the change of one element of the step control matrix against the iteration number is illustrated in Fig. 2. This example element decreases from 2 to 0, governing the exploration scale on its dimension as the iteration count builds up. The exploration operation is conducted outwardly when the element is greater than 1; otherwise, it is performed inwardly. In the second stage, both inward and outward explorations occur between the 50th and 90th iterations, in order to balance exploitation and exploration. In the third stage, once the number of iterations exceeds 90, the inward exploration operation replaces the outward exploration movement and takes control, as the whole swarm gradually congregates and converges. It should be noted that the iteration numbers dividing the three search modes fluctuate slightly around the thresholds given in the example in Fig. 2, since the randomness of the control matrix slightly affects the magnitude of the elements of the step control matrix. Nevertheless, the general adaptive pattern applies consistently to the whole search process with respect to all dimensions of the fireflies. Moreover, each element (either -1 or 1) of the direction control matrix controls the direction of movement along the corresponding dimension, which enables fireflies to fully explore and exploit the search space.
The whole search process of 'ineffective individuals' with low dissimilarity levels (below 0.5) is depicted in Fig. 3. With the assistance of three different position updating operations (indicated in three colours in Fig. 3), not only is the search diversity in direction and scope among fireflies with high similarities significantly improved and local stagnation effectively mitigated, but search efficiency is also enhanced owing to the guaranteed heterogeneity between fireflies in movement. On the other hand, the movement of 'effective individuals' with distinctive position variance follows the same strategy as in IIEFA, as illustrated in Eq. (3). In short, CIEFA takes the diversity of exploration one step further and inherits all the merits of IIEFA by combining both inward and outward intensified exploration mechanisms.
Moreover, according to the empirical results, the proportion of position updates in which CIEFA invokes the dispersing search mechanism for 'ineffective individuals' varies slightly, and depends on the parameter settings (e.g. the maximum number of iterations and the size of the firefly population) as well as on the problem at hand (e.g. the data set employed). Taking the Sonar data set as an example, with a population of 50 fireflies and a maximum of 200 iterations, this proportion varies between 40% and 52% across trials, with an average of 47.18% over 30 trials. The pseudo-code of CIEFA is provided in Algorithm 2.

To improve search efficiency and accelerate convergence, a seed solution for the cluster centroids is first generated by the original KM clustering algorithm and used to replace the first firefly in the swarm. Similarities among data samples are measured by the Euclidean distance during the partitioning process. The quality of the centroid solution represented by each firefly is evaluated based on the sum of intra-cluster distances. The search process and movement patterns of the swarm are governed by the proposed IIEFA and CIEFA models. Benefiting from the enhanced diversity of search scopes, scales, and directions in IIEFA and CIEFA, a cluster centroid solution of better quality is identified through the intensified neighbourhood and global search processes, and the possibility of being trapped in local optima is significantly reduced. Owing to the high dimensionality of some of the data sets evaluated in this study, e.g. 80 for ALL, 72 for Ozone, and 60 for Sonar, and the benefits of feature selection on these data sets as validated in previous studies [54,66], we employ mRMR [43] to reduce feature dimensionality and improve clustering performance by eliminating redundant and irrelevant features. A comprehensive evaluation of the proposed clustering method is presented in the next section.
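A minimal sketch of this set-up, assuming each firefly encodes K centroids as a list of coordinate lists: `intra_cluster_distance` is the fitness (the sum of Euclidean distances from each sample to its nearest centroid, as in Eq. (8)), and `init_swarm` replaces the first firefly with a plain Lloyd's K-means seed. The helper names, the uniform initialization inside the data's bounding box, and the 20 K-means iterations are illustrative assumptions rather than the paper's settings.

```python
import math
import random


def intra_cluster_distance(data, centroids):
    """Fitness of one firefly: sum of Euclidean distances from every sample
    to its nearest centroid (lower is better)."""
    return sum(min(math.dist(x, c) for c in centroids) for x in data)


def nearest(x, centroids):
    """Index of the centroid closest to sample x."""
    return min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))


def kmeans(data, k, iters, rng):
    """Plain Lloyd's K-means, used only to produce the seed solution."""
    centroids = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        labels = [nearest(x, centroids) for x in data]
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def init_swarm(data, k, n_fireflies, rng, km_iters=20):
    """Each firefly is a list of k centroids drawn uniformly inside the data's
    bounding box; firefly 0 is replaced by the K-means seed solution."""
    lows = [min(col) for col in zip(*data)]
    highs = [max(col) for col in zip(*data)]
    swarm = [[[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in range(k)]
             for _ in range(n_fireflies)]
    swarm[0] = kmeans(data, k, km_iters, rng)
    return swarm


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    swarm = init_swarm(data, k=2, n_fireflies=5, rng=rng)
    print([round(intra_cluster_distance(data, f), 3) for f in swarm])
```

Seeding one firefly with the K-means result gives the swarm a reasonable starting point, while the remaining random fireflies keep the initial population diverse.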

Evaluation and Discussion
To investigate the clustering performance objectively and comprehensively, the proposed FA models are compared not only with FA-related methods but also with several other classical metaheuristic search methods. In view of their novelty and contributions to the development of a variety of metaheuristic algorithms, GA and ACO are two of the most successful metaheuristic search methods [67]. As such, we compare the proposed IIEFA and CIEFA models against GA [68], ACO [69], and four other classical methods, i.e. KM clustering, FA [47], Dragonfly (DA) [70], and Sine Cosine Algorithm (SCA) [71], as well as five FA variants, i.e. CFA1 [58], CFA2 [57], NaFA [48], VSSFA [59], and MFA [64]. Each optimization model is integrated with KM clustering for performance comparison. A total of ten data sets characterised by a wide range of dimensionalities are evaluated with five performance indicators, namely the sum of intra-cluster distances (i.e. fitness scores), average accuracy [72], average sensitivity, average specificity, and macro-average F-score (Fscore_M) [72]. To ensure a fair comparison, the same number of function evaluations (i.e. population size × maximum number of iterations) is used as the stopping criterion for all search methods. The population size and the maximum number of iterations are set to 50 and 200, respectively. Each experiment consists of 30 independent runs to mitigate the influence of fluctuations in the results. Moreover, following the empirical study in [37], the original FA model and the FA variants use an initial attractiveness of 1.0, an absorption coefficient of 1.0, and a randomization parameter of 0.2, while the proposed IIEFA and CIEFA models employ randomized search parameters as described in Section 3.
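For reference, the baseline FA update that these settings parameterize is commonly written as x_i ← x_i + β0·exp(−γ·r_ij²)·(x_j − x_i) + α·(ε − 0.5), with ε drawn uniformly from [0, 1). The sketch below uses that textbook form with the reported values (β0 = 1.0, γ = 1.0, α = 0.2) and computes the shared evaluation budget; whether every compared variant uses exactly this randomization term is an assumption. The single scalar attractiveness `beta` is the term that IIEFA replaces with a randomized control matrix.

```python
import math
import random

# Settings reported for the original FA and the FA variants (following [37]).
BETA0, GAMMA, ALPHA = 1.0, 1.0, 0.2
POP_SIZE, MAX_ITER = 50, 200
BUDGET = POP_SIZE * MAX_ITER  # function evaluations used as the common stopping criterion


def fa_move(x_i, x_j, rng, beta0=BETA0, gamma=GAMMA, alpha=ALPHA):
    """Move firefly x_i towards a brighter firefly x_j (textbook FA update).

    The attractiveness `beta` is one scalar shared by all dimensions.
    """
    r2 = sum((a - b) ** 2 for a, b in zip(x_i, x_j))  # squared Euclidean distance
    beta = beta0 * math.exp(-gamma * r2)
    return [a + beta * (b - a) + alpha * (rng.random() - 0.5) for a, b in zip(x_i, x_j)]


if __name__ == "__main__":
    print(BUDGET)  # 10000
    print(fa_move([0.0, 0.0], [1.0, 1.0], random.Random(0)))
```

With γ = 1.0, attractiveness decays quickly with distance, so distant fireflies barely move towards each other; this is one reason the standard model relies heavily on its neighbourhood.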

Data sets
Clustering performance is significantly influenced by the characteristics of data samples, such as data distribution, noise, and dimensionality. Therefore, the following data sets with various characteristics from different domains are used to investigate the efficiency of the proposed models. Specifically, we employ the ALL-IDB2 database [73], denoted as ALL (Acute Lymphoblastic Leukaemia), and nine data sets from the UCI machine learning repository [74], namely Sonar, Ozone, Wisconsin breast cancer diagnostic data set (Wbc1), Wisconsin breast cancer original data set (Wbc2), Wine, Iris, Balance, Thyroid, and E.coli, for evaluation. Among the selected data sets, Sonar, Ozone, and ALL possess significantly high feature dimensionality, i.e. 60, 72, and 80, respectively, and are characterised as high-dimensional data sets. The remaining data sets have comparatively small feature dimensions (e.g. 9 for Wbc2, 4 for Iris, and 5 for Thyroid) and are characterised as low-dimensional data sets. Additionally, because the data samples are extremely imbalanced between classes in certain data sets, e.g. E.coli, we only select classes with a sufficient number of samples for clustering performance comparison. The main characteristics of the employed data sets are summarised in Table 1.
The employed data sets impose various challenges on clustering analysis. As an example, the ALL data set used in [66,75] is obtained from the analysis of ALL-IDB2 microscopic blood cancer images. Essential features, such as colour, shape, and texture details, were extracted from the ALL-IDB2 images, and an 80-dimensional feature vector was obtained for each white blood cell image [63]. This data set poses diverse challenges to classification and clustering models, owing to the complex, irregular morphology of the nucleus, variations in the nucleus-to-cytoplasm ratio, and the subtle differences between blast and normal blood cells, which introduce noise and misleading local optima into the subsequent clustering process for identifying lymphoblasts and lymphocytes. The other UCI data sets contain similarly challenging factors. Therefore, owing to the diversity of the employed data sets in terms of sample distribution and dimensionality, a comprehensive evaluation of the proposed clustering models can be established.

Performance Comparison Metrics
Five performance indicators are employed to evaluate clustering performance, namely the sum of intra-cluster distances (i.e. fitness scores), average accuracy, average sensitivity, average specificity, and macro-average F-score (Fscore_M) [72]. The first, distance-based metric is used to indicate the convergence speed of the proposed models, while the last four metrics are used as the main criteria for clustering performance comparison. Each performance metric is introduced in detail as follows.
1. Sum of intra-cluster distances: This measure is obtained by summing the distances between the data samples and their corresponding centroids, as defined in Eq. (8). The smaller the sum of intra-cluster distances, the more compact the partitioned clusters. Similar to KM clustering, the proposed models use the sum of intra-cluster distances as the objective function, which is minimized during the search process.
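$$f(O, Z) = \sum_{i=1}^{K} \sum_{o_j \in C_i} \sqrt{(o_j - z_i)^2} \qquad (8)$$
where O and Z denote the data set and the set of centroids, C_i and z_i represent the i-th cluster and the centroid of the i-th cluster, while o_j and K denote a data sample belonging to the i-th cluster and the total number of clusters, respectively. The squared difference is summed over all features, so each term is the Euclidean distance between o_j and z_i.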

2. Average accuracy:
The mean clustering accuracy is obtained by averaging the accuracy rate of each class, as defined in Eq. (9). The merit of this metric is that it treats all classes equally, rather than being dominated by classes with a large number of samples [72].
$$\text{Ave\_accuracy} = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_i + TN_i}{TP_i + FN_i + FP_i + TN_i} \qquad (9)$$
where TP_i, FN_i, FP_i, and TN_i represent the true positives, false negatives, false positives, and true negatives of the i-th cluster, respectively.
3. Average sensitivity: As defined in Eq. (10), sensitivity (i.e. recall) measures the proportion of correctly identified positive samples among all positive samples in the data set. As with the average accuracy, the macro-average of sensitivity is calculated to ensure that all classes are treated equally in multi-class clustering tasks [72].
$$\text{Ave\_sensitivity} = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_i}{TP_i + FN_i} \qquad (10)$$
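4. Average specificity: Specificity measures the proportion of correctly identified negative samples among all negative samples in the data set [72]. Eq. (11) is used to obtain the macro-average specificity for multi-class tasks.
$$\text{Ave\_specificity} = \frac{1}{K} \sum_{i=1}^{K} \frac{TN_i}{TN_i + FP_i} \qquad (11)$$
5. Macro-average F-score (Fscore_M): Fscore_M is a well-accepted performance metric calculated from the macro-averaged precision and recall scores [72], as defined in Eqs. (12)-(14):
$$\text{Precision}_M = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_i}{TP_i + FP_i} \qquad (12)$$
$$\text{Recall}_M = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_i}{TP_i + FN_i} \qquad (13)$$
$$\text{Fscore}_M = \frac{(\beta^2 + 1)\,\text{Precision}_M \cdot \text{Recall}_M}{\beta^2\,\text{Precision}_M + \text{Recall}_M} \qquad (14)$$
where β = 1, so that precision and recall are weighted equally.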
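The macro-averaged metrics above can be computed directly from one-vs-rest confusion counts per class. The sketch below assumes that cluster indices have already been mapped to class labels (e.g. by majority vote), a step this section does not specify; the function names are illustrative, and a class with an empty denominator contributes 0, which is one convention among several.

```python
def confusion_counts(y_true, y_pred, cls):
    """One-vs-rest TP, FN, FP, TN for class `cls`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    return tp, fn, fp, tn


def safe_div(num, den):
    return num / den if den else 0.0


def macro_metrics(y_true, y_pred, beta=1.0):
    """Macro-averaged accuracy, sensitivity, specificity, precision and Fscore_M."""
    classes = sorted(set(y_true))
    acc = sens = spec = prec = 0.0
    for c in classes:
        tp, fn, fp, tn = confusion_counts(y_true, y_pred, c)
        acc += safe_div(tp + tn, tp + fn + fp + tn)  # Eq. (9) term
        sens += safe_div(tp, tp + fn)                # Eq. (10) term (also macro recall)
        spec += safe_div(tn, tn + fp)                # Eq. (11) term
        prec += safe_div(tp, tp + fp)                # macro precision term
    k = len(classes)
    acc, sens, spec, prec = acc / k, sens / k, spec / k, prec / k
    # Fscore_M combines the macro-averaged precision and recall (beta = 1 here).
    fscore = safe_div((beta ** 2 + 1) * prec * sens, beta ** 2 * prec + sens)
    return {"accuracy": acc, "sensitivity": sens, "specificity": spec,
            "precision": prec, "fscore_m": fscore}


if __name__ == "__main__":
    print(macro_metrics([0, 0, 1, 1], [0, 1, 1, 1]))
```

Note that Fscore_M is computed from the averaged precision and recall, not by averaging per-class F-scores; the two can differ when classes are imbalanced.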
For each data set, a total of 30 runs with each search method integrated with the KM clustering algorithm are conducted.The average performance over 30 runs for each performance metric is calculated and used as the main criterion for comparison.

Feature Selection and Clustering Performance Evaluation
As mentioned earlier, owing to the high dimensionality of the Sonar, Ozone, and ALL data sets and the possible inclusion of redundant features, mRMR [43] is used to reduce feature dimensionality and to investigate its impact on clustering performance. The clustering results before and after feature selection for each data set are shown in Tables 2-11. For the three high-dimensional data sets, namely ALL, Sonar, and Ozone, 9, 17, and 22 features are selected from the original 80, 60, and 72 features, respectively. These feature subset sizes were determined by trial and error and yield the best performance for nearly all evaluated models. They are also consistent with existing studies [54,66], where the numbers of selected features range over 9-36 [66], 15-20 [54], and 18-25 [54] for ALL, Sonar, and Ozone, respectively, which supports the effectiveness of the mRMR-based feature selection method employed in this research.
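As a rough illustration of how mRMR ranks features, the sketch below implements the common greedy "mutual information difference" criterion: at each step it selects the feature that maximizes its mutual information with the labels minus its mean mutual information with the features already selected. It assumes the features are already discretized; the exact mRMR variant and discretization used in this study are not stated here, so treat both as assumptions.

```python
import math
from collections import Counter


def mutual_info(a, b):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())


def mrmr(features, labels, n_select):
    """Greedy mRMR (difference form). `features` maps a name to a list of discrete values."""
    relevance = {name: mutual_info(col, labels) for name, col in features.items()}
    selected, remaining = [], list(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_info(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < n_select:
        best = max(remaining, key=score)  # ties resolved by insertion order
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    f1 = [0, 0, 0, 1, 1, 1, 1, 1]        # informative about y
    feats = {"f1": f1, "f2": list(f1),   # f2 duplicates f1 (redundant)
             "f3": [0, 1, 0, 1, 0, 1, 0, 1]}
    print(mrmr(feats, y, 2))  # ['f1', 'f3']
```

In this toy case the exact duplicate `f2` is as relevant as `f1` but is skipped in the second round, because its redundancy penalty outweighs its relevance.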
The empirical results indicate that, in combination with feature selection, clustering performance improves in most test cases. For example, for the ALL data set shown in Table 2, the number of features is reduced from the original 80 to 9, while the mean accuracy, sensitivity, specificity, and Fscore_M of the proposed CIEFA model over 30 runs increase significantly, i.e. from 51.23% to 80.4%, 51.67% to 74.67%, 50.8% to 86.13%, and 51.27% to 78.73%, respectively. The selected features include the cytoplasm and nucleus areas, the ratio between the nucleus area and the cytoplasm area, form factor, compactness, perimeter, and eccentricity, which represent the most significant clinical factors for blood cancer diagnosis [66,75,76]. This also indicates that redundant or even contradictory features exist in the original data set [66], which may drastically deteriorate the performance of clustering models. Similar findings apply to the other data sets, especially the high-dimensional ones [54]. The only exception is the low-dimensional Balance data set, shown in Table 6, where the full feature set (only four features in total) yields the best performance for nearly all clustering models. In short, eliminating redundant and irrelevant features is essential for enhancing clustering performance.

Performance Comparison and Analysis
As mentioned earlier, five metrics are used for clustering performance comparison, namely the fitness scores based on the sum of intra-cluster distances, average accuracy, average sensitivity, average specificity, and macro-average F-score (Fscore_M). Since the best performance is achieved with the identified feature subsets in most test cases for nearly all methods, we use the results obtained in combination with feature selection for further analysis and comparison. The detailed evaluation results over 30 runs for each performance measure after feature selection are shown in Tables 12-16.
Table 12 The mean results of the minimum intra-cluster distance measure over 30 runs
In terms of mean accuracy and Fscore_M, as shown in Tables 13-14, the proposed models achieve the best scores on all data sets over 30 runs. Overall, the average accuracy, sensitivity, specificity, and Fscore_M results clearly indicate the superiority of IIEFA and CIEFA over the other search methods, in terms of robustness and flexibility, for both high- and low-dimensional clustering problems in combination with feature selection. In particular, the proposed models significantly outperform the five other FA variants in nearly all test cases. Moreover, CIEFA shows a clear advantage over IIEFA on the Wine data set across all five performance metrics, while attaining results similar to those of IIEFA on the remaining data sets. In addition, nearly all methods (except SCA) achieve similar scores on all five performance measures for the Iris data set. Since only two significant features are retained after feature selection for Iris, the complexity of this clustering task is significantly reduced.
The advantage demonstrated by IIEFA and CIEFA can be ascribed to the enhanced exploration and exploitation capabilities contributed by the proposed search strategies. The first mechanism intensifies inward exploration by replacing the attractiveness coefficient with a random search matrix. The diversity of search directions, scales, and spaces is enhanced significantly, thereby improving the exploration ability and mitigating the constraints of biological laws. The second strategy intensifies outward exploration in the early stage of the search process by relocating the 'ineffective fireflies' to an extended space outside the neighbourhood of the fireflies in comparison. The search territory of the firefly swarm is thus further expanded, facilitating global exploration. With intensified neighbourhood and global exploration from these two strategies, plus the automatic subdivision inherited from the original FA model [47], the probability of being trapped in local optima is effectively reduced, while the diversity of movement of the proposed FA models is significantly enhanced. As evidenced by the experimental and statistical results, these advantages enable the proposed FA models to undertake challenging clustering tasks with high dimensionality, noise, and less separable clusters, e.g. the ALL data set.
In contrast, empirical studies have identified limitations related to search diversity and search efficiency in classical search methods. For example, Radcliffe and Surry [77] indicated that GA-based clustering algorithms in some cases suffer from degeneracy, which results from multiple chromosomes representing the same or very similar solutions. Such degeneracy can lead to inefficient coverage of the search space, since centroid solutions with the same or very similar configurations are repeatedly explored [78]. Moreover, multiple occurrences of strongly favourable individuals in GA can lead to the reproduction of many highly correlated offspring, thereby reducing population diversity and resulting in premature convergence. Similarly, in ACO, as the differences between individuals decrease and the quality of solutions becomes similar, the effect of emphasizing short paths diminishes and search stagnation emerges [79]. Premature convergence can also occur in ACO when sub-optimal solutions dominate the search process at an early stage and the trail persistence parameter is not tuned properly [69,80,81]. Consequently, owing to potential local optima traps (GA) and search stagnation (ACO) without proper counteracting strategies, classical evolutionary algorithms such as GA and ACO are less competitive than the proposed CIEFA and IIEFA models according to the five metrics, i.e. intra-cluster distances, accuracy, Fscore_M, sensitivity, and specificity, as illustrated in Tables 12-16.
Similar limitations also apply to other FA variants. For example, in the MFA model [64], each firefly moves not only towards all brighter fireflies in its neighbourhood but also towards the swarm leader at the same time. The search diversity and exploration capability of the swarm are hindered by this continuous attraction towards the global best solution during the search process. Consequently, the firefly swarm is more likely to converge prematurely and be trapped in local optima.
Overall, with the assistance of the two proposed strategies, CIEFA and IIEFA are able to overcome local optima traps and outperform the classical search methods, i.e. GA, ACO, FA, DA, and SCA. They also outperform the advanced FA variants employed in this study, i.e. CFA1, CFA2, NaFA, VSSFA, and MFA. Additionally, the merits of the proposed strategies indicate that strict adherence to biological laws imposes certain constraints on the exploration ability of heuristic search algorithms. As a result, the biological laws borrowed from nature need to be further abstracted and refined, so as to maximise the effectiveness of metaheuristic algorithms and remove potential restrictions in their development. Furthermore, other insightful research on metaheuristic algorithms provides promising directions for future investigation [67].

Statistical Tests
To examine the significance of the performance difference between the proposed models and baseline methods, both Friedman and Wilcoxon rank sum tests are conducted.

The Friedman Test
In the Friedman test, a test statistic Q is constructed based on the mean rankings of the test treatments, and its distribution can be approximated by a chi-squared distribution with K − 1 degrees of freedom. The null hypothesis that the K treatments come from the same population is then tested using the p-value given by $P(\chi^2_{K-1} > Q)$ [82,83]. The Friedman test is conducted with respect to three comprehensive performance metrics (intra-cluster distance, average clustering accuracy, and Fscore_M) for IIEFA and CIEFA. Tables 17-18 show the mean rankings on the three performance metrics for the CIEFA and IIEFA models, respectively. For each metric, the mean ranking of each method is obtained by averaging its rankings over the ten data sets, based on the results shown in Tables 12-14.
The significance level is set to 0.05 (i.e. α = 0.05) in all test cases. Tables 19-20 show the detailed statistical test results for the CIEFA and IIEFA models, respectively.
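A minimal sketch of the Friedman statistic as described above: methods are ranked within each data set (rank 1 = best, ties averaged), ranks are averaged over the data sets, and Q = 12/(N·K·(K+1))·ΣR_j² − 3N(K+1), where R_j is the rank sum of method j over N data sets. It omits the tie correction that some implementations apply (SciPy's `scipy.stats.friedmanchisquare` is one reference point); the p-value can then be obtained from the chi-squared survival function with K − 1 degrees of freedom, e.g. `scipy.stats.chi2.sf(q, k - 1)`.

```python
def average_ranks(values):
    """1-based ascending ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def friedman_statistic(table, higher_is_better=False):
    """table[b][j]: score of method j on data set b.

    Returns (Q, mean_ranks). Use higher_is_better=True for accuracy or
    Fscore_M, and False for intra-cluster distances.
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        keyed = [-v for v in row] if higher_is_better else list(row)
        for j, r in enumerate(average_ranks(keyed)):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    return q, [s / n for s in rank_sums]


if __name__ == "__main__":
    # Three data sets, three methods, perfectly consistent ordering.
    print(friedman_statistic([[1.0, 2.0, 3.0]] * 3))
```

When every data set ranks the methods identically, Q reaches its maximum N(K − 1), which is 6 for three data sets and three methods.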

The Wilcoxon Rank Sum Test
The Wilcoxon rank sum test is conducted based on the mean accuracy rates of all methods to further examine the statistical distinctiveness of the proposed FA models against each baseline method. As indicated in Tables 21-22, the majority of the p-values are lower than 0.05 for both the CIEFA and IIEFA models, indicating that the proposed FA models significantly outperform the 11 baseline algorithms on most data sets from a statistical perspective. The Iris data set is an exception, since all algorithms except SCA achieve the same highest accuracy of 97.33% with feature selection. Moreover, as shown in Tables 21-22, IIEFA shows statistically insignificant differences in clustering accuracy from the baseline models more frequently than CIEFA does. This tendency is most evident on the ALL data set, where IIEFA shows no statistically significant difference from seven baseline methods, i.e. KM, CFA1, VSSFA, DA, MFA, GA, and ACO, whereas for CIEFA this occurs for only two baseline methods, i.e. GA and VSSFA. This phenomenon may be attributed to the challenging nature of the ALL data set, owing to its high dimensionality and highly inseparable data distributions caused by the subtle differences between normal and blast cells. On the other hand, the advantage of CIEFA over IIEFA on the ALL data set can be ascribed to the proposed dispersing mechanism, which further enhances the exploration capability of IIEFA and reduces the probability of being trapped in local optima. Therefore, CIEFA is capable of delivering better clustering performance than IIEFA when tackling data samples with complex distributions and narrow class margins.
Although the distinctiveness between IIEFA and CIEFA is evident on certain challenging data sets, such as ALL, the two models in general demonstrate similar performance on the other clustering tasks evaluated so far. To better distinguish between CIEFA and IIEFA, both models are further evaluated with four additional high-dimensional data sets, i.e. a skin lesion data set (denoted as Lesion) [84] and three UCI data sets [74], namely Human Activity (Activity), Libras Movements (Libras), and Mice Protein Expression (Protein). The skin lesion data set used in [84] consists of shape, colour, and texture features extracted from 660 dermoscopic skin lesion images from the Edinburgh Research and Innovation (Dermofit) lesion data set [85]; a 98-dimensional feature vector was obtained for each image to represent the lesion information for subsequent clustering analysis. The dimensionalities of the Human Activity, Libras, and Mice Protein data sets are 560, 90, and 77, respectively. In this research, we employ three classes for the Libras data set and two classes each for the Skin Lesion, Human Activity, and Mice Protein data sets. Details of the data sets are shown in Table 23. For each high-dimensional data set, a total of 30 runs are conducted for each proposed model. To fully evaluate model efficiency, no feature selection is applied. The detailed clustering results are provided in Table 24.
As illustrated in Table 24, the CIEFA model demonstrates clear advantages over IIEFA on these high-dimensional data sets according to the five performance metrics, i.e. intra-cluster distances, accuracy, sensitivity, specificity, and Fscore_M, over 30 runs. For example, CIEFA achieves higher average accuracy rates of 67.12%, 80.20%, 76.62%, and 79.07% for the Human Activity, Skin Lesion, Mice Protein, and Libras data sets, respectively, while maintaining lower intra-cluster distances. In contrast, IIEFA produces slightly lower accuracy rates of 64.36%, 78.54%, 72.38%, and 78.01% on the same data sets, respectively, with slightly higher intra-cluster distances. A similar observation holds for the other three performance metrics, i.e. sensitivity, specificity, and Fscore_M, for both models in most test cases. This indicates that CIEFA is a better option than IIEFA for high-dimensional clustering tasks. As discussed above, the complexity of clustering tasks increases significantly on these high-dimensional data sets owing to a higher probability of including noise and redundant or contradictory features. The clustering tasks become even more challenging when the data samples are not well separated and their distributions are far from compact and spherical. For example, the skin lesion data set [84] consists of two types of lesions, benign and malignant. The difference in appearance between these two types in terms of shape, colour, and texture can be very subtle, which sometimes confuses even dermatologists and poses great challenges for clustering. In other words, this high-dimensional skin lesion data set contains highly inseparable and non-compact clusters. The enhanced exploration capability acquired from the additional dispersing mechanism in CIEFA accounts for its efficiency, compared with IIEFA, in identifying optimal centroids for this challenging lesion problem as well as for the UCI data sets.
In summary, the dispersing mechanism in CIEFA is able to boost the exploration capability by dispatching fireflies with high similarities in fitness values to the extended and unexploited search space.As such, the probability of identifying optimal centroids closer to the global optima is increased with the assistance of intensified local exploration as well as the expanded search territory.Therefore, CIEFA offers a better option, as compared with IIEFA, to deal with challenging clustering tasks such as data samples with high dimensionality, noise, and complicated distributions.
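The Wilcoxon rank sum test used in this section can be sketched as follows, assuming the two samples are, for instance, the per-run accuracy rates of two methods on one data set. This version uses the large-sample normal approximation with averaged ranks for ties but no tie or continuity correction to the variance, which matches the approach of `scipy.stats.ranksums`; exact small-sample tables or corrected variants may give slightly different p-values.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (z, p): z > 0 means sample x tends to have larger values than y.
    """
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):  # assign averaged 1-based ranks to tied values
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])                        # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2              # expected rank sum under H0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))       # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    print(rank_sum_test([1, 2, 3], [4, 5, 6]))
```

For the toy call above, z is about -1.964 and p is about 0.0495, just under α = 0.05. With 30 runs per method, the normal approximation is generally considered adequate.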

Conclusion
In this research, we have proposed two FA variants, namely IIEFA and CIEFA, to address the problems of initialization sensitivity and local optima traps in the conventional KM clustering algorithm. Two new strategies have been incorporated into IIEFA and CIEFA to increase search diversification and efficiency. Firstly, the attractiveness coefficient in the original FA model is replaced by a randomized control matrix, so that the one-dimensional search strategy of the original FA model is elevated to a multi-dimensional search mechanism with greater search scales and directions for exploration in the neighbourhood. Secondly, in the early stage of the search process, a firefly solution sharing a high similarity with another is relocated to a new position outside the scope between the two fireflies in comparison. As such, the chances of identifying global optima and avoiding local optima are enhanced, since fireflies with high similarities are dispersed and the distribution of the whole swarm is more diversified. The search efficiency is therefore improved by guaranteeing sufficient variance between the fireflies in comparison at the early convergence stage. The performance of the IIEFA- and CIEFA-enhanced KM clustering methods was first investigated with ALL and nine UCI data sets, covering both high-dimensional and low-dimensional problems. In combination with mRMR-based feature selection, the proposed methods show superiority over the KM clustering algorithm, five classical search methods, and five other FA variants in terms of convergence speed and clustering performance with respect to average accuracy, sensitivity, specificity, and macro-average F-score (Fscore_M) over 30 runs. The results have been confirmed using the Friedman and Wilcoxon rank sum tests.
In short, the proposed search strategies account for the improved efficiency in refining the cluster centroids of the original KM clustering, which in turn helps overcome local optima traps. Moreover, a dedicated study has been conducted to further identify the distinctiveness between IIEFA and CIEFA using four additional high-dimensional data sets. The empirical results indicate that CIEFA outperforms IIEFA in dealing with challenging clustering tasks with noise, complicated data distributions, and non-compact, less separable clusters, owing to its enhanced exploration capability and expanded search territory.
For future research, other objective functions that consider both inter- and intra-cluster measurements will be employed to enhance the proposed models for dealing with complex and irregular data distributions. The proposed FA variants will also be evaluated on other optimization tasks, such as discriminative feature selection [39,84,86], image segmentation [87,88], and the evolution of deep neural networks [89].
proposed a Repulsion-Propulsion FA (PropFA) model by incorporating three strategies, i.e. (1) introduction of adaptive mechanisms for both the randomization coefficient and the light absorption coefficient, (2) incorporation of the global best solution as a component of the swarm position update, and (3) replacement of the Euclidean distance with the Manhattan distance. Three ratios, based on a short-term memory of the last positions and light intensities of the fireflies, were used to construct the adaptive search parameter mechanisms. The PropFA model was evaluated using 18 classical benchmark functions, 14 additional functions of CEC-2005, and 28 functions of CEC-2013. The results demonstrated the competitiveness of PropFA in finding better solutions in comparison with PSO, EDA (Estimation of Distribution Algorithms), RC-EA (Mutation Step Co-evolution), RC-Memetic (Real-Coded Memetic Algorithm), and CMA-ES (Covariance Matrix Adaptation Evolution Strategy) on the CEC-2005 benchmark functions, and with SHADE, CoDE (DE with composite trial vector generation strategies and control parameters), and JADE (adaptive DE with optional external archive) on the CEC-2013 benchmark functions. PropFA was also employed to estimate the spill area of a fast-expanding oil spill, and the PropFA-based confinement strategy proved successful.

Fig. 1. The movement of fireflies in a two-dimensional search space (∆ denotes the position difference between the two fireflies)
2.4 Clustering Models Integrated with Metaheuristic Algorithms
A number of metaheuristic search algorithms have been employed to overcome the problems of initialization sensitivity and local optima traps of classical clustering algorithms. Karaboga and Ozturk [24] proposed an ABC-based clustering method by incorporating the original ABC model with KM clustering. The ABC-based clustering method was evaluated using 13 UCI data sets. The results demonstrated the competitiveness of combining ABC with KM clustering for clustering tasks in comparison with PSO and nine classification techniques (e.g. Bayes Net, MultiLayer Perceptron Artificial Neural Network (MLP), Radial Basis Function Artificial Neural Network (RBF), Naïve Bayes Tree (NBTree), and Bagging). Shelokar et al. [26] incorporated the original ACO model with KM clustering. Two simulated and three UCI data sets were used to evaluate the performance of the ACO-based clustering method, which showed advantages over SA, GA, and TS in terms of solution quality, average number of function evaluations, and processing time. Chen and Ye [28] proposed a PSO-based clustering method (PSO-clustering) and evaluated its performance on four artificial data sets. The results indicated better performance of PSO-clustering over the KM and Fuzzy C-Means clustering algorithms. Senthilnath et al. [31] employed FA for clustering analysis. The performance of the FA-based clustering method was tested with 13 UCI data sets. The FA model demonstrated superiority in terms of clustering error rates and computational efficiency over ABC, PSO, and nine other traditional classification methods (e.g. Bayes Net, MLP, and RBF).

Fig. 2. An example of the change of one element from the step control matrix through iterations
Based on the variation of the element in Fig. 2, the whole search process of 'ineffective individuals' with low fitness dissimilarities (below 0.5) goes through three stages as the iteration count builds up. In the first stage, the outward exploration action dominates approximately the first 50 (out of 200) iterations, during which the 'ineffective individuals' are dispersed to explore a larger unexploited search domain. In the second stage, both inward and outward explorations occur within the 50th-90th iterations, in order to balance exploitation and exploration. In the third stage, once the number of iterations exceeds 90, the inward exploration operation replaces the outward exploration movement and takes control, as the whole swarm gradually congregates and converges. It should be noted that the iteration numbers dividing the three search modes fluctuate slightly around the thresholds in the example shown in Fig. 2, since randomness slightly affects the magnitudes of the elements in the step control matrix. Nevertheless, the general adaptive pattern applies consistently throughout the whole search process for all dimensions of the fireflies. Moreover, the direction of movement along each dimension is controlled by a corresponding element that takes the value -1 or 1, which enables fireflies to fully explore and exploit the search space.

Fig. 3. Distribution of the updated positions of a firefly through iterations in the CIEFA model in a two-dimensional search space when the dissimilarity measure is below 0.5

Algorithm 2 - The pseudo-code of the proposed CIEFA model
1. Start
2. Initialize a swarm of fireflies
3. Initialize the randomization parameter and set experimental parameters
4. Define the objective function/light intensity

Fig. 4. Flowchart of the proposed clustering method

Moreover, as mentioned earlier, nearly all hybrid KM-based clustering models assign data samples to clusters based on the Euclidean distance, and the quality of the cluster centroids is improved by minimising the sum of intra-cluster distances. As a result, irrelevant and redundant features in the data samples can degrade distance-based clustering, since under such circumstances the distance measures cannot accurately represent the compactness of the clusters. Owing to the high dimensionality of some of the data sets evaluated in this study, e.g., 80 features for ALL, 72 for Ozone, and 60 for Sonar, and the use of feature selection on these data sets as validated in previous studies [54,66], we employ mRMR [43] to reduce feature dimensionality and improve clustering performance by eliminating redundant and irrelevant features. A comprehensive evaluation of the proposed clustering method is presented in the next section.
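To make the feature selection step more concrete, the following Python sketch implements a greedy mRMR search in the commonly used difference form, where each new feature maximises its relevance to the class label minus its mean redundancy with the features already selected. It is an illustration under stated assumptions rather than the configuration used in this study: the equal-width discretisation into five bins, the plug-in mutual information estimator, and the helper names `mutual_info`, `discretize`, and `mrmr` are all hypothetical choices made for this example.

```python
import numpy as np

def mutual_info(a, b):
    """Plug-in mutual information (in nats) between two discrete 1-D arrays."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    ai, bi = ai.ravel(), bi.ravel()
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1.0)          # joint counts
    joint /= joint.sum()                     # joint probabilities
    # Product of the marginals, i.e. the joint distribution under independence.
    expected = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / expected[nz])))

def discretize(column, bins=5):
    """Equal-width binning of a continuous feature (an illustrative assumption)."""
    lo, hi = column.min(), column.max()
    if hi == lo:
        return np.zeros(len(column), dtype=int)
    edges = np.linspace(lo, hi, bins + 1)[1:-1]
    return np.digitize(column, edges)

def mrmr(X, y, n_select, bins=5):
    """Greedy mRMR, difference form: maximise I(f; y) - mean over s of I(f; s)."""
    Xd = np.column_stack([discretize(X[:, j], bins) for j in range(X.shape[1])])
    relevance = np.array([mutual_info(Xd[:, j], y) for j in range(Xd.shape[1])])
    selected = [int(np.argmax(relevance))]   # start with the most relevant feature
    while len(selected) < min(n_select, Xd.shape[1]):
        best, best_score = None, -np.inf
        for j in range(Xd.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(Xd[:, j], Xd[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

if __name__ == "__main__":
    # Toy data: a 4-class label encoded by two independent bits, plus an exact
    # duplicate of the first bit. mRMR should keep both bits and skip the duplicate.
    y = np.array([0, 1, 2, 3, 0, 1, 2, 3])
    high_bit = (y // 2).astype(float)
    low_bit = (y % 2).astype(float)
    X = np.column_stack([high_bit, high_bit.copy(), low_bit])
    print(mrmr(X, y, 2))
```

In the toy example all three features are equally relevant, so the redundancy term is what steers the second pick away from the duplicated column, which is exactly the behaviour that motivates using mRMR before a distance-based clustering objective.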
… algorithm to enhance the KM clustering performance. The BH-based clustering method was tested with six UCI data sets and demonstrated better performance than KM clustering, GSA, and PSO. Moreover, Hatamlou et al. [36] also applied the Big Bang-Big Crunch (BB-BC) algorithm to clustering analysis, and the BB-BC results outperformed those of KM clustering, GA, and PSO on several UCI data sets.

A number of modified metaheuristic search algorithms are available to further improve the performance of the original metaheuristic algorithm-based clustering methods. Das et al. [25] proposed a modified Bee Colony Optimization (MBCO) model by adopting both fairness and cloning concepts. The fairness concept gave bees with low selection probabilities a chance to be selected, enhancing search diversity, while the cloning concept kept the global best solution in the next iteration to accelerate convergence. Two hybrid clustering methods, namely MKCLUST and KMCLUST, were subsequently constructed based on MBCO. Additionally, a probability-based selection method was introduced to allocate the remaining unassigned data samples to clusters. The MBCO method was evaluated with seven UCI data sets and outperformed some existing algorithms, e.g., ACO, PSO, and KM clustering, while the hybrid MKCLUST and KMCLUST models, on average, outperformed some existing hybrid methods, e.g., K-PSO (combination of PSO and KM), K-HS (combination of Harmony Search and KM), and IBCOCLUST (improved BCO clustering algorithm). In Niknam and Amiri [27], a hybrid evolutionary clustering model, namely FAPSO-ACO-K, was proposed by combining three traditional algorithms, i.e., FAPSO (fuzzy adaptive PSO), ACO, and KM. The model was tested with four artificial and six UCI data sets. FAPSO-ACO-K was able to resolve the problem of initialization sensitivity in KM clustering and outperformed other algorithms such as PSO, ACO, SA, PSO-SA (combination of PSO and SA), ACO-SA (combination of ACO and SA), PSO-ACO (combination of PSO and ACO), GA, and TS. Boushaki et al. [30] constructed a quantum chaotic Cuckoo Search (QCCS) algorithm using chaotic maps and a nonhomogeneous update based on quantum theory to increase global exploration. The QCCS model was tested with six UCI data sets and outperformed eight well-known methods for solving clustering problems, namely GQCS (genetic quantum CS), HCSDE (hybrid CS and DE), KICS (hybrid KM and improved CS), CS, QPSO (quantum PSO), KCPSO (hybrid KM chaotic PSO), GA, and DE. In Zhou and Li [25] …
In the original FA model, once initialized, the whole firefly swarm tends to congregate continuously until it converges at one point. As such, the search process can be regarded as an inward contracting process, regardless of how early the search stage is or how close or similar two neighbouring fireflies are. Consequently, at an early stage, moving a firefly towards a neighbour with a similar light intensity (i.e., fitness score) is likely to waste computational resources: the fitness score of the current firefly is very unlikely to improve drastically by following a neighbouring solution that is only slightly better, while the probability of being trapped in local optima is high. Therefore, we propose the second FA variant, i.e., a compound intensified exploration FA (CIEFA) model, which integrates both inward and outward search mechanisms to overcome this limitation of the original FA model. CIEFA is built on the first IIEFA model. Specifically, CIEFA combines the inward exploration strategy embedded in IIEFA with a newly proposed dispersing mechanism based on dissimilarity measures to increase diversification. Eq. (5) defines the proposed dissimilarity measure between two fireflies in terms of the fitness scores of the two fireflies in the current (t-th) iteration, the current global best solution, and the fitness score of the global best solution in the same iteration.
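The inward contracting behaviour of the original FA can be reproduced with a short simulation. The Python sketch below implements the standard FA movement rule, in which a firefly moves towards every brighter firefly with attractiveness β0·exp(−γr²) plus a small random perturbation, and reports how the spread of the swarm shrinks on a simple sphere function. As a loosely analogous, hypothetical illustration of the IIEFA idea of replacing the scalar attractiveness with a randomized control matrix, the flag `control_matrix=True` draws one step size per dimension uniformly from [0, 1]; this distribution is an assumption, and the sketch reproduces neither Eq. (3) nor the CIEFA dispersing mechanism. All parameter values and helper names are illustrative.

```python
import numpy as np

def sphere(x):
    """Toy cost function to minimise; a lower cost means a brighter firefly."""
    return float(np.sum(x ** 2))

def firefly_sweep(swarm, cost, rng, beta0=1.0, gamma=0.01, alpha=0.1,
                  control_matrix=False):
    """One sweep of the standard FA movement rule (minimisation).

    Each firefly i moves towards every firefly j that is brighter than it.
    With control_matrix=True the scalar attractiveness is replaced by a random
    per-dimension step matrix (a hypothetical stand-in, not the paper's Eq. (3)).
    In this simplified sketch the current best firefly does not move.
    """
    fit = np.array([cost(x) for x in swarm])
    for i in range(len(swarm)):
        for j in range(len(swarm)):
            if fit[j] < fit[i]:
                diff = swarm[j] - swarm[i]
                if control_matrix:
                    # Independent step sizes per dimension (assumed U[0, 1]).
                    step = rng.uniform(0.0, 1.0, size=diff.shape) * diff
                else:
                    # Classical scalar attractiveness, identical in every dimension.
                    step = beta0 * np.exp(-gamma * np.sum(diff ** 2)) * diff
                noise = alpha * (rng.random(diff.shape) - 0.5)
                swarm[i] = swarm[i] + step + noise
                fit[i] = cost(swarm[i])
    return swarm

def spread(swarm):
    """Mean Euclidean distance of the fireflies from the swarm centroid."""
    return float(np.linalg.norm(swarm - swarm.mean(axis=0), axis=1).mean())

def run(control_matrix, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    swarm = rng.uniform(-5.0, 5.0, size=(15, 2))
    before = spread(swarm)
    for t in range(iters):
        # Decaying randomization parameter, a common choice in FA implementations.
        swarm = firefly_sweep(swarm, sphere, rng, alpha=0.1 * 0.95 ** t,
                              control_matrix=control_matrix)
    return before, spread(swarm)

if __name__ == "__main__":
    for flag in (False, True):
        before, after = run(flag)
        print(f"control_matrix={flag}: spread {before:.3f} -> {after:.3f}")
```

In both modes the spread collapses as every firefly is pulled towards brighter neighbours, which is the purely inward movement that the CIEFA dispersing mechanism is intended to counteract for fireflies with similar fitness scores.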
1. Start
2. Initialize a population of fireflies
3. Initialize the randomization parameter and set experimental parameters
4. Define the objective function/light intensity
5. Calculate the light intensity of each firefly
6. while (t < Max iteration) or (other convergence criteria not being met)
7.    for each firefly i
8.       for each firefly j
9.          if the light intensity of firefly i is lower than that of firefly j
10.            Generate a control matrix
11.            Update the position of firefly i by moving towards firefly j using Eq. (3)
…
25. Export the global best position and the global best fitness value
26. End

3.3 The Proposed Clustering Approach based on the IIEFA and CIEFA Models

The proposed IIEFA and CIEFA algorithms are subsequently employed to construct two novel clustering models that address the initialization sensitivity and local optima traps of the original KM clustering algorithm. The flowchart and pseudo-code of the proposed clustering method are presented in Fig. 4 and Algorithm 3, respectively.
5. Calculate the light intensity of each firefly
6. while (t < Max iteration) or (other convergence criteria not being met)
…

Algorithm 3 - The pseudo-code of the proposed clustering method
1. Start
2. Import data sets and set initial parameters
3. Initialize a firefly swarm as a series of possible cluster centroids
4. Run KM on the data set and generate the initial cluster centroids as a seed solution
5. Replace the first firefly in the swarm with the KM centroids
6. while (t < Max iteration) or (other termination criteria not being met)
7.    Use each firefly as the centroids to cluster the data set based on Euclidean distance
8.    Evaluate the fitness value/light intensity of each firefly using the sum of intra-cluster distances defined in Eq. (8) in Section 4.2
9.    Update firefly positions using the proposed IIEFA/CIEFA models
10. end while
11. Export the global best position and the global best fitness value
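Steps 3-8 of Algorithm 3 can be sketched in Python as follows. Each firefly is encoded as a (K × d) array of candidate centroids, the first firefly is replaced by a K-means seed, and each firefly is scored by assigning samples to their nearest centroid. Because Eq. (8) is not reproduced in this section, the sketch assumes that the fitness is the sum of Euclidean distances between samples and their assigned centroids; the helper names, the uniform initialisation within the bounding box of the data, and the plain Lloyd's K-means used for seeding are likewise illustrative assumptions.

```python
import numpy as np

def assign_and_score(data, centroids):
    """Assign samples to the nearest centroid (Euclidean) and return
    (labels, sum of sample-to-centroid distances), i.e. the assumed fitness."""
    # dists[n, k] = ||data[n] - centroids[k]||
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    fitness = float(dists[np.arange(len(data)), labels].sum())
    return labels, fitness

def kmeans(data, k, iters=100, seed=0):
    """Plain Lloyd's K-means, used here only to produce a seed solution."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        labels, _ = assign_and_score(data, centroids)
        # Keep the old centroid if a cluster becomes empty.
        new = np.array([data[labels == c].mean(axis=0) if np.any(labels == c)
                        else centroids[c] for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

def init_swarm(data, k, n_fireflies, seed=0):
    """Each firefly is a (k, d) array of candidate centroids drawn uniformly
    within the data's bounding box; firefly 0 is replaced by the KM seed."""
    rng = np.random.default_rng(seed)
    lo, hi = data.min(axis=0), data.max(axis=0)
    swarm = rng.uniform(lo, hi, size=(n_fireflies, k, data.shape[1]))
    swarm[0] = kmeans(data, k, seed=seed)
    return swarm

if __name__ == "__main__":
    data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
    swarm = init_swarm(data, k=2, n_fireflies=5)
    scores = [assign_and_score(data, firefly)[1] for firefly in swarm]
    print(scores)  # the seeded firefly 0 should have the lowest score
```

Seeding one firefly with the K-means solution guarantees that the swarm starts no worse than plain KM under this fitness, while the remaining randomly initialised fireflies preserve diversity for the subsequent IIEFA/CIEFA updates.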

Table 1
Ten selected data sets for evaluation (columns: Data set, Number of attributes, Number of classes, Missing values, Number of instances)

Table 2
The mean clustering results over 30 independent runs on the ALL data set

Table 3
The mean clustering results over 30 independent runs on the Sonar data set

Table 4
The mean clustering results over 30 independent runs on the Ozone data set

Table 5
The mean clustering results over 30 independent runs on the Thyroid data set

Table 6
The mean clustering results over 30 independent runs on the Balance data set

Table 7
The mean clustering results over 30 independent runs on the E.coli data set

Table 8
The mean clustering results over 30 independent runs on the Wbc1 data set

Table 9
The mean clustering results over 30 independent runs on the Wbc2 data set

Table 10
The mean clustering results over 30 independent runs on the Wine data set

Table 11
The mean clustering results over 30 independent runs on the Iris data set

Table 13
The mean results of average accuracy after feature selection over 30 runs

Table 14
The mean results of Fscore after feature selection over 30 runs

Table 15
The mean results of average sensitivity after feature selection over 30 runs

Table 16
The mean results of average specificity after feature selection over 30 runs

With respect to the fitness scores, i.e., the intra-cluster distance measure, as shown in Table 12, IIEFA and CIEFA together achieve the minimum distance measures on eight out of ten data sets. Specifically, based on the average performance over 30 runs, IIEFA yields the minimum intra-cluster measures on five data sets, i.e., Thyroid, Balance, E.coli, Ozone, and Wbc1, while CIEFA achieves the minimum fitness scores on four data sets, i.e., E.coli, ALL, Wbc2, and Wine. Moreover, KM clustering produces the minimum intra-cluster measures on the Sonar and Iris data sets when combined with mRMR-based feature selection, although IIEFA and CIEFA achieve the minimum objective function scores when the full feature sets of both data sets are used. Overall, in comparison with the six classical methods, i.e., GA, ACO, DA, SCA, FA, and KM, and the five other FA variants, i.e., CFA1, CFA2, NaFA, VSSFA, and MFA, both IIEFA and CIEFA demonstrate faster convergence and clear superiority in identifying enhanced centroids that lead to more compact clusters. The proposed search mechanisms account for the enhanced global exploration capability of IIEFA and CIEFA, compared with the other classical methods and FA variants, in attaining the global best solutions.

As shown in Table 13, IIEFA achieves the highest average accuracy rates over 30 runs on seven data sets (i.e., Thyroid, Balance, E.coli, Ozone, Wbc1, Wbc2, and Iris), while CIEFA achieves the best results on five data sets (i.e., Sonar, E.coli, ALL, Wine, and Iris). Both IIEFA and CIEFA demonstrate a clear advantage over the other methods on four data sets, i.e., Thyroid, Sonar, Balance, and ALL. Regarding the Fscore measure shown in Table 14, IIEFA and CIEFA each achieve the best average scores over 30 runs on six data sets, i.e., Thyroid, Balance, E.coli, Ozone, Wbc2, and Iris for IIEFA, and Sonar, E.coli, ALL, Wbc1, Wine, and Iris for CIEFA. As with the accuracy indicator, a clear performance distinction can be observed between the proposed models and the other methods in the Fscore results. The observed advantages of IIEFA and CIEFA are further reinforced by the sensitivity and specificity results shown in Tables 15 and 16. IIEFA achieves the highest scores for both metrics on five data sets (i.e., Thyroid, Balance, E.coli, Wbc2, and Iris), while CIEFA achieves the best results for both metrics on three data sets (i.e., E.coli, Wine, and Iris). This indicates that both CIEFA and IIEFA outperform the other baseline models on most of the employed data sets and are capable of effectively clustering and recognising data samples from different classes.
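The multi-class indicators discussed above can only be computed once cluster labels have been mapped to class labels. The Python sketch below assumes a majority-vote mapping from clusters to classes, one-vs-rest per-class counts, and macro averaging; the Fscore is taken here as the harmonic mean of the macro-averaged precision and sensitivity, which is one common definition. The mapping and averaging actually used in the experiments are not specified in this section, so the function names and both choices are hypothetical.

```python
import numpy as np

def map_clusters_to_classes(cluster_labels, y_true):
    """Assumed mapping: every cluster takes the majority class of its members."""
    y_pred = np.empty_like(y_true)
    for c in np.unique(cluster_labels):
        members = cluster_labels == c
        classes, counts = np.unique(y_true[members], return_counts=True)
        y_pred[members] = classes[np.argmax(counts)]
    return y_pred

def average_metrics(y_true, y_pred):
    """One-vs-rest per-class metrics, macro-averaged over the true classes."""
    n = len(y_true)
    acc, sens, spec, prec = [], [], [], []
    for k in np.unique(y_true):
        tp = np.sum((y_pred == k) & (y_true == k))
        fn = np.sum((y_pred != k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        tn = n - tp - fn - fp
        acc.append((tp + tn) / n)
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp) if tn + fp > 0 else 0.0)
        prec.append(tp / (tp + fp) if tp + fp > 0 else 0.0)
    p_m, r_m = float(np.mean(prec)), float(np.mean(sens))
    return {
        "average_accuracy": float(np.mean(acc)),
        "average_sensitivity": r_m,
        "average_specificity": float(np.mean(spec)),
        # Harmonic mean of macro precision and macro sensitivity (assumed definition).
        "fscore": 2 * p_m * r_m / (p_m + r_m) if p_m + r_m > 0 else 0.0,
    }

if __name__ == "__main__":
    y_true = np.array([0, 0, 0, 1, 1, 1])
    clusters = np.array([1, 1, 0, 0, 0, 0])   # arbitrary cluster ids
    y_pred = map_clusters_to_classes(clusters, y_true)
    print(y_pred, average_metrics(y_true, y_pred))
```

The mapping step matters because cluster identifiers are arbitrary; without it, a perfect partition with swapped cluster ids would score zero accuracy.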

Table 17
The mean ranking results based on the Friedman test for the CIEFA model

Table 18
The mean ranking results based on the Friedman test for the IIEFA model

Table 19
Statistical results of the Friedman test for the CIEFA model

Table 20
Statistical results of the Friedman test for the IIEFA model

As shown in Tables 17 and 18, the proposed CIEFA and IIEFA models dominate the highest rankings and demonstrate clear advantages in the intra-cluster distance measure, accuracy, and Fscore under the Friedman test. In comparison with the five FA variants, i.e., VSSFA, NaFA, CFA1, CFA2, and MFA, the proposed models achieve significant improvements in all three performance metrics, indicating the advantages of the proposed search mechanisms. The proposed FA models also outperform KM and five classical search methods, i.e., GA, ACO, FA, DA, and SCA. Comparatively, the CIEFA model achieves a better overall ranking than the IIEFA model based on the experimental results. Furthermore, as indicated in Tables 19 and 20, the p-values of the Friedman test are all lower than 0.05 for each metric for both the IIEFA and CIEFA models, which suggests an overall statistically significant difference between the mean ranks of IIEFA and CIEFA and those of the other test algorithms.
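The mean ranks and p-values reported for the Friedman test follow the usual procedure: algorithms are ranked within each data set (rank 1 for the best result, with tied results receiving averaged ranks), the ranks are averaged over data sets, and the Friedman statistic tests whether these mean ranks differ overall. The sketch below uses SciPy's `scipy.stats.rankdata` and `scipy.stats.friedmanchisquare`; the score matrix is a small hypothetical example, not the values behind Tables 17-20, and the helper `mean_ranks` is introduced only for illustration.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def mean_ranks(scores, lower_is_better=True):
    """scores[d, a] is the result of algorithm a on data set d.
    Ranks algorithms within each data set (1 = best, ties averaged)
    and returns the mean rank of each algorithm over all data sets."""
    oriented = scores if lower_is_better else -scores
    return rankdata(oriented, axis=1).mean(axis=0)

# Hypothetical intra-cluster distances: 5 data sets x 3 algorithms.
scores = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 3.0, 2.0],
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],
    [1.0, 3.0, 2.0],
])

ranks = mean_ranks(scores)
# Each positional argument holds one algorithm's results across the data sets.
stat, p_value = friedmanchisquare(*scores.T)
print(ranks, stat, p_value)
```

For distance-type metrics lower values are better, whereas for accuracy or Fscore the scores should be ranked with `lower_is_better=False`. Note that a significant Friedman p-value only indicates that the mean ranks are not all equal; pairwise conclusions require a post-hoc or pairwise test such as the Wilcoxon rank sum test reported in Tables 21 and 22.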

Table 21
The Wilcoxon rank sum test results of the proposed CIEFA model

Table 22
The Wilcoxon rank sum test results of the proposed IIEFA model

Fscore measurements, with fewer parameter settings. Moreover, CIEFA shows greater advantages than IIEFA, especially on data sets containing inseparable and less compact clusters, owing to its enhanced exploration capability. Besides that, since redundant features in high-dimensional data sets can severely deteriorate clustering performance, it is important to eliminate redundant and irrelevant features in clustering tasks. Subsequently, we employ four additional high-dimensional data sets to further identify the distinctive characteristics of the proposed IIEFA and CIEFA models and demonstrate their usefulness.

Table 23
Four additional high-dimensional data sets for further comparison between IIEFA and CIEFA (columns: Data set, Number of attributes, Number of classes, Missing values, Number of instances)

Table 24
The mean clustering results over 30 independent runs with four high-dimensional data sets