A Hybrid Pathfinder Optimizer for Unconstrained and Constrained Optimization Problems

Hybridization of metaheuristic algorithms with local search has been investigated in many studies. This paper proposes a hybrid pathfinder algorithm (HPFA), which incorporates the mutation operator of differential evolution (DE) into the pathfinder algorithm (PFA). The proposed algorithm combines the search abilities of PFA and DE. Tests on a set of twenty-four unconstrained benchmark functions, comprising unimodal continuous functions, multimodal continuous functions, and composition functions, show that HPFA improves significantly on the pathfinder algorithm and the other comparison algorithms. HPFA is then applied to data clustering, constrained problems, and engineering design problems. The experimental results show that the proposed HPFA obtained better results than the other comparison algorithms and is a competitive approach for solving partitioning clustering, constrained problems, and engineering design problems.


Introduction
Metaheuristic algorithms are characterized by having few parameters and operators, which makes them easy to apply to practical problems. Every metaheuristic algorithm has its advantages and disadvantages. For instance, the artificial bee colony (ABC) algorithm is a relatively new metaheuristic algorithm inspired by the foraging behavior of honey bee colonies [1]. Because ABC is easy to implement, has few control parameters, and offers good optimization performance [2], it has been successfully applied to optimization problems [3,4]. However, ABC converges poorly as the dimensionality of the search space increases. The reason is that ABC relies on the exchange of information between individuals, yet in each search step an individual exchanges information with a random neighbor on only one dimension. Yang carried out a critical analysis of the ABC algorithm by examining how it mimics evolutionary operators [5]. In essence, the operators in ABC are mutation operators, so ABC shows a slow convergence speed. Like ABC, the artificial butterfly optimization (ABO) algorithm simulates a biological phenomenon: the mate-finding strategy of some butterfly species. It has been tested on various benchmarks [6]. However, the "no free lunch" theorems [7] suggest that no single algorithm can show the best performance on all problems. Many strategies, whether improving existing algorithms or developing new ones, can yield better optimization results. These strategies include opposition-based learning, chaotic theory, topological structure-based methods, and hybridization. Hybridizing heterogeneous biologically inspired algorithms is a good way to balance exploration and exploitation [8].
In order to increase the diversity of the bat swarm, a hybrid HS/BA method that adds the pitch adjustment operation of harmony search (HS) to the bat algorithm (BA) was proposed [9]. The hybridization of nature-inspired algorithms evolved as a solution for overcoming certain shortcomings observed in the use of classical algorithms [10]. Good convergence requires clever exploitation at the right time and at the right place, which is still an open problem [11].
The pathfinder algorithm (PFA) is a relatively new metaheuristic algorithm inspired by the collective movement of animal groups; it mimics the leadership hierarchy of swarms to find the best food area or prey [12]. PFA provides superior performance on some optimization problems. However, its performance decreases when the dimension of a problem becomes very large, because PFA mainly relies on mathematical formulas. Hybridizing heterogeneous biologically inspired algorithms can avoid the shortcomings of a single algorithm by increasing the exchange of information between individuals. The differential evolution (DE) algorithm, proposed by Storn and Price [13], performs very well in convergence [14]. In particular, DE performs well in searching for local optima and has good robustness [15]. In view of the fast convergence speed of DE, the mutation operator of DE is incorporated into PFA, which gives the hybrid pathfinder algorithm proposed in this paper.
The rest of the paper is organized as follows. Section 2 introduces the canonical PFA. Section 3 presents HPFA in detail. Section 4 gives the details of the experiment for unconstrained problems, together with a presentation and discussion of the results. Section 5 introduces the data clustering problem and how HPFA is used for clustering. Section 6 gives the details of the experiment for constrained problems. Section 7 gives the details of the experiment for engineering design problems. Section 8 gives the conclusions.

The Canonical Pathfinder Algorithm
The canonical PFA mimics the leadership hierarchy of an animal group searching for the best prey. In PFA, the individual with the best fitness is called the pathfinder. The remaining members of the swarm, called followers in this paper, follow the pathfinder. PFA includes three phases: the initialization phase, the pathfinder's phase, and the followers' phase.
In the initialization phase, the algorithm randomly produces a number of positions in the search range according to equation (1). After that, the fitness values of the positions are calculated, and the individual with the best fitness is selected as the pathfinder. In the pathfinder's phase, the position of the pathfinder is updated using equation (2). A greedy selection strategy is employed by comparing the fitness value of the new position of the pathfinder with that of the old one. In equation (2), $x_p$ is the position vector of the pathfinder, $k$ is the current iteration, and $r_3$ is a random vector uniformly generated in the range [0, 1]. The term $A$ is generated using equation (3), in which $u_2$ is a random vector in the range [−1, 1], $k$ is the current iteration, and $E$ is the maximum number of iterations.
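The displayed equations (1)-(3) are not reproduced in this text. A hedged reconstruction, assuming the form commonly given for the canonical PFA and using the symbols defined in the paragraph above, might read as follows. The bounds $x_j^{\min}$ and $x_j^{\max}$ and the uniform random number $\mathrm{rand}(0,1)$ in equation (1) are assumed notation and do not come from the text.

```latex
\begin{align}
x_{ij} &= x_j^{\min} + \mathrm{rand}(0,1)\,\bigl(x_j^{\max} - x_j^{\min}\bigr), \tag{1}\\
x_p^{k+1} &= x_p^{k} + 2 r_3 \cdot \bigl(x_p^{k} - x_p^{k-1}\bigr) + A, \tag{2}\\
A &= u_2 \cdot e^{-2k/E}. \tag{3}
\end{align}
```

Under this assumed form, the factor $e^{-2k/E}$ decays from 1 at $k = 0$ to $e^{-2} \approx 0.135$ at $k = E$, so the random perturbation $A$ of the pathfinder narrows as the run proceeds.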
In the followers' phase, the position of each follower is updated using equation (4). A greedy selection strategy is employed by comparing the fitness value of the new position of the follower with that of the old one. If the best follower has a better fitness than the pathfinder, the pathfinder is replaced with that follower. In equation (4), $x_i$ is the position vector of the $i$th follower, $x_j$ is the position vector of the $j$th follower, $k$ is the current iteration, and $E$ is the maximum number of iterations. $r_1$ and $r_2$ are random values generated in the range [0, 1], $\alpha$ and $\beta$ are random values generated in the range [1, 2], and $D_{ij}$ is the distance between the $i$th follower and the $j$th follower. The termination condition of PFA may be a maximum number of cycles or a maximum number of function evaluations.
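Equation (4) and its auxiliary definitions are likewise not reproduced in this text. The sketch below assumes the follower update usually stated for the canonical PFA. The numbering (5)-(7), the coefficient names $R_1$ and $R_2$, the vibration term $\varepsilon$, and the random vector $u_1 \in [-1, 1]$ are assumptions; only $x_i$, $x_j$, $x_p$, $r_1$, $r_2$, $\alpha$, $\beta$, $D_{ij}$, $k$, and $E$ appear in the text.

```latex
\begin{align}
x_i^{k+1} &= x_i^{k} + R_1 \cdot \bigl(x_j^{k} - x_i^{k}\bigr) + R_2 \cdot \bigl(x_p^{k} - x_i^{k}\bigr) + \varepsilon, \tag{4}\\
R_1 &= \alpha r_1, \tag{5}\\
R_2 &= \beta r_2, \tag{6}\\
\varepsilon &= \Bigl(1 - \frac{k}{E}\Bigr)\, u_1\, D_{ij}, \qquad D_{ij} = \lVert x_i - x_j \rVert. \tag{7}
\end{align}
```

In this assumed form, $R_1$ scales the pull toward a neighboring follower, $R_2$ scales the pull toward the pathfinder, and the random term $\varepsilon$ vanishes at $k = E$.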

The Hybrid Pathfinder Algorithm
The commonly used hybrid methods are mainly series and parallel hybridization. In the series method, the optimization operation is applied to all members of the swarm in each generation. The series method is used in the proposed hybrid algorithm. In DE, the differential mutation operator is the main operation. In view of the fast convergence speed of DE, the mutation operator of DE is incorporated into PFA to form a new hybrid pathfinder algorithm (HPFA). HPFA is the same as the canonical PFA except that a mutation phase is added after the followers' phase. The pseudocode of HPFA is listed in Algorithm 1. The steps of the mutation phase are given below. For each follower $x_i$ in the swarm, do the following steps:
Step 1: select three different followers $x_r$, $x_p$, and $x_q$ from the followers. The three indices $r$, $p$, and $q$ are not equal to $i$.
Step 2: for each dimension in D, produce a new position vector depending on CR, a probability in the range [0, 1]. The new position vector is produced according to equation (8). The new position vector $v_{ij}$ is determined by changing one dimension of $x_i$ and is set to its boundary value if it exceeds its predetermined boundaries. In equation (8), $i$, $r$, $p$, and $q$ are four different integers generated by random permutation, $F$ is the differential weight in the range [0, 2], and $j$ is a randomly selected dimension index in [1, D].
Step 3: calculate the fitness of the new position vector.
Step 4: a greedy selection strategy is employed by comparing the fitness value of the new position vector with that of the original one. If the fitness of the new position vector is better than that of the original one, the new vector replaces the original one. Otherwise, the original one is kept unchanged.
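Steps 1-4 can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab implementation. Equation (8) is not reproduced in this text, so the sketch assumes the classical DE/rand/1 form $v_{ij} = x_{rj} + F\,(x_{pj} - x_{qj})$. It also assumes that each dimension is mutated with probability CR and that one randomly chosen dimension is always mutated. The function name `mutation_phase` and all argument names are hypothetical.

```python
import random


def mutation_phase(followers, fitness, objective, lower, upper,
                   F=0.1, CR=0.9, rng=random):
    """One mutation pass over the followers (minimization).

    followers: list of position vectors (lists of floats), modified in place.
    fitness:   list of objective values matching followers, modified in place.
    Needs at least four followers so that r, p, q can all differ from i.
    """
    n = len(followers)
    dim = len(followers[0])
    for i in range(n):
        # Step 1: three distinct followers r, p, q, none equal to i.
        r, p, q = rng.sample([k for k in range(n) if k != i], 3)

        # Step 2: build the new position vector.
        trial = list(followers[i])
        j_forced = rng.randrange(dim)  # assumption: one dimension always changes
        for j in range(dim):
            if j == j_forced or rng.random() < CR:
                # Assumed form of equation (8): DE/rand/1 differential mutation.
                v = followers[r][j] + F * (followers[p][j] - followers[q][j])
                # Values outside the bounds are set to the boundary value.
                trial[j] = min(max(v, lower[j]), upper[j])

        # Step 3: evaluate the new position vector.
        f_trial = objective(trial)

        # Step 4: greedy selection keeps the better of the two vectors.
        if f_trial < fitness[i]:
            followers[i] = trial
            fitness[i] = f_trial
    return followers, fitness
```

The defaults F = 0.1 and CR = 0.9 follow the values selected in the parameter study. Because selection is greedy, no follower's fitness can get worse during this phase.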

Unconstrained Benchmark Problems
For ease of visualization, we implemented all algorithms in Matlab for the various test functions. In order to compare the different algorithms fairly, we use the number of function evaluations (FEs) as the measure criterion in this paper.

Benchmark Functions.
Evolutionary algorithms are usually assessed experimentally on various test problems because an analytical assessment of their behavior is very complex [16]. The twenty-four benchmark functions are widely adopted by other researchers to test their algorithms [2,17,18]. In this paper, all functions use their standard ranges. These twenty-four diverse and difficult minimization problems comprise unimodal continuous functions ($f_1$-$f_8$), multimodal continuous functions ($f_9$-$f_{16}$), and composition functions ($f_{17}$-$f_{24}$). The formulas of these functions are presented in Table 1. Functions $f_{13}$-$f_{16}$ are four rotated functions employed in Liang's work [19]. In the rotated functions, a rotated variable $y$, which is produced by left-multiplying the original variable $x$ by an orthogonal matrix, is used to calculate the fitness (instead of $x$). The orthogonal matrix is generated according to Salomon's method [20].
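The rotation described above can be illustrated with a short Python sketch. The orthogonal matrix here is built by Gram-Schmidt orthonormalization of Gaussian random vectors, which is a substitute chosen for brevity and is not Salomon's method [20]. The Rastrigin formula is the standard one and stands in for the definition in Table 1. All function names are hypothetical.

```python
import math
import random


def random_orthogonal(dim, rng):
    """Return a dim x dim orthogonal matrix as a list of orthonormal rows.

    Built by Gram-Schmidt on Gaussian vectors (NOT Salomon's method).
    """
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for u in rows:
            # Remove the component of v along each row already accepted.
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-10:  # discard nearly dependent draws
            rows.append([a / norm for a in v])
    return rows


def rotate(matrix, x):
    """Left-multiply x by the matrix: y = M x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in matrix]


def rastrigin(y):
    """Standard Rastrigin function, with its minimum 0 at the origin."""
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in y)


def rotated(func, matrix):
    """Wrap func so that fitness is computed on y = M x instead of x."""
    return lambda x: func(rotate(matrix, x))
```

For example, `rotated(rastrigin, random_orthogonal(30, random.Random(1)))` gives a 30-dimensional rotated Rastrigin function. The rotation preserves the norm of $x$ and the location of the optimum at the origin, but it makes the variables nonseparable.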
Functions $f_{17}$-$f_{24}$ are eight composition functions. These composition functions were specifically designed for the competition and comprise the sum of three of five unimodal and/or multimodal functions, leading to very challenging properties: multimodal, nonseparable, asymmetrical, and with different properties around different local optima.

Parameter Study.
The choice of parameters can have an effect on the performance of an algorithm. QPSO has demonstrated high potential for setting the parameters of optimization methods [21]. Computational intelligence methods have demonstrated their ability to monitor complex large-scale systems, but the selection of optimal parameters for efficient operation is very challenging [22]. CR and F are the two parameters of HPFA. In order to analyze their impact, we carried out the following experiments. In all experiments, the population size of all algorithms was 100. For the 20-dimensional functions, the maximum number of function evaluations was 100,000.
Four continuous benchmark functions, Sphere 20D, Zakharov 20D, Sumsquares 20D, and Quadric 20D, are employed to investigate the impact of CR and F. CR and F are set to different values, and all the functions are run 20 sample times. It is worth noting that the intervals of CR and F are continuous and contain infinitely many values. Here, three different values of the two parameters are used. The experimental results in terms of the mean values and standard deviations of the optimal solutions over 30 runs are listed in Tables 2-5. The results show that HPFA with CR equal to 0.9 and F equal to 0.1 performs best on all four functions. Accordingly, we chose CR equal to 0.9 and F equal to 0.1 for the following experiments.

Comparison with Other Algorithms.
To evaluate the performance of HPFA, PFA [12], differential evolution (DE) [13], canonical PSO with constriction factor (PSO) [23], and cooperative PSO (CPSO) [24] were employed for comparison. PSO is a classical population-based paradigm simulating the foraging behavior of social animals. CPSO is a cooperative PSO model that cooperatively coevolves multiple PSO subpopulations. The set of twenty-four well-known benchmark functions was used in this experiment.

Experiment Sets.
The population size of all algorithms was 100. For the 30-dimensional functions, the maximum number of function evaluations was 100,000. In order to do meaningful statistical analysis, each algorithm was run 30 times, and the mean value and the standard deviation are taken as the final result. For the specific parameters of the comparison algorithms, we follow the parameter settings of the original literature. For CPSO and PSO, the learning rates C1 and C2 were both set to 2.05, and the constriction factor was $\chi = 0.729$. The split factor for CPSO is equal to the number of dimensions. In DE, single-point crossover is employed, the crossover rate is 0.95, and F is 0.1.
All algorithms were implemented in Matlab R2010a on a computer with an Intel Core i5-2450M CPU (2.5 GHz) and 2 GB RAM. The operating system of the computer is Windows 7.
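The constriction factor of 0.729 is consistent with the learning rates quoted above if it is computed with the Clerc-Kennedy constriction formula. The text does not state which formula was used, so its use here is an assumption, and $\varphi$ is notation introduced only for this worked example.

```latex
\begin{align*}
\varphi &= C_1 + C_2 = 2.05 + 2.05 = 4.1,\\
\chi &= \frac{2}{\left|\,2 - \varphi - \sqrt{\varphi^{2} - 4\varphi}\,\right|}
      = \frac{2}{\left|\,2 - 4.1 - \sqrt{0.41}\,\right|}
      \approx \frac{2}{2.7403} \approx 0.7298.
\end{align*}
```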

Experimental Results and Analysis.
The experimental results, including the mean and the standard deviation of the function values obtained by the five algorithms with 30 dimensions, are listed in Table 6. The best values obtained on each function are marked in bold. Rank represents the performance order of the five algorithms on each benchmark function. It is obvious that HPFA performed best on most functions. The mean best function value profiles of the five algorithms with 30 dimensions are shown in Figure 1.
(1) Continuous Unimodal Functions. On the Sphere function, the performance order for the five intelligent algorithms is HPFA > PFA > DE > CPSO > PSO. The result achieved by HPFA improved continually and reached the best value, as seen from Figure 1

Computational Intelligence and Neuroscience

... strong solving performance on Sphere function. HPFA and these three algorithms differ by about 30 orders of magnitude in solution quality. On the Sinproblem function, the performance order for the five intelligent algorithms is HPFA > DE > CPSO > PFA > PSO. The performance of HPFA is very similar to that on Sphere: the result achieved by HPFA improved continually. PFA and PSO converged very fast at the beginning and then became trapped in a local minimum. CPSO and DE converged continually, but the speed of convergence was slow, as seen from Figure 1.
(2) Continuous Multimodal Functions. On the Ackley function, the performance order for the five intelligent algorithms is HPFA > DE > CPSO > PFA > PSO. The Ackley function is widely used for testing optimization algorithms because it poses a risk of trapping them in one of its many local minima. The performance of DE, PSO, CPSO, and PFA deteriorates in optimizing this function. HPFA has a much stronger global searching ability, as seen from Figure 1(k). The solution of HPFA is about 11 orders of magnitude better than that of DE.
The multimodal functions $f_{13}$-$f_{16}$ are regarded as the most difficult functions to optimize since the number of local minima increases exponentially as the function dimension increases.
On the Rot_rastrigin function, the performance order for the five intelligent algorithms is HPFA > PFA > PSO > DE > CPSO. The result achieved by HPFA improved continually and was the best, as seen from Figure 1(m). PFA and PSO converged to a local minimum at about 40,000 FEs. CPSO and DE performed worse than PSO and PFA.
On the Rot_schwefel function, the performance order for the five intelligent algorithms is HPFA > PFA > DE > CPSO > PSO. CPSO and PSO converged fast at first but soon became trapped in a local minimum. DE converged continually, but the speed of convergence was slow. Finally, HPFA obtained better results than PFA, as seen from Figure 1(n).
On the Rot_ackley function, the performance order for the five intelligent algorithms is HPFA > DE > PFA > PSO > CPSO. The solution of HPFA is about 14 orders of magnitude better than that of DE. At the very beginning, PFA, CPSO, DE, and PSO converged very fast and then became trapped in a local minimum. The result achieved by HPFA improved continually and was the best, as seen from Figure 1(o).
On the Rot_griewank function, the performance order for the five intelligent algorithms is HPFA > PFA > DE > PSO > CPSO. CPSO, PSO, and DE converged very slowly. HPFA and PFA converged continually and then became trapped in a local minimum, but HPFA performed better than PFA, as seen from Figure 1.
From the above analysis, we can observe that HPFA has a very strong ability to exploit the optimum. HPFA appeared able to improve continually, especially on Sphere, Sinproblem, Sumsquares, Schwefel2.22, Ackley, Rotated Rastrigin, Rotated Schwefel, Rotated Ackley, and Rotated Griewank.

Statistical Analysis.
It is obvious that HPFA obtained the best ranking with a dimension of 30. Statistical evaluation of experimental results is considered an essential part of validating new intelligent methods. The Iman-Davenport and Holm tests are nonparametric statistical methods that have been used to analyze the behavior of evolutionary algorithms in many recent works, and both are used in this section. Details of the two statistical methods are given in reference [25]. The results of the Iman-Davenport test are shown in Table 7. The values are distributed according to the F-distribution with 4 and 92 degrees of freedom. The critical values are looked up in the F-distribution table at a level of 0.05. As can be seen in Table 7, the Iman-Davenport values are larger than their critical values, which means that significant differences exist among the rankings of the algorithms.
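The degrees of freedom quoted above are consistent with the usual Iman-Davenport correction of the Friedman statistic. With $k = 5$ algorithms and $N = 24$ functions, $k - 1 = 4$ and $(k - 1)(N - 1) = 92$. A minimal Python sketch of that computation is given below, assuming the standard textbook formulas. The function name is hypothetical, and the average ranks in the usage note are made-up illustrative values, not the data of Table 6.

```python
def iman_davenport(avg_ranks, n_problems):
    """Friedman chi-square and Iman-Davenport F statistic from average ranks.

    avg_ranks:  average rank of each of the k algorithms over all problems.
    n_problems: number of benchmark problems N.
    Returns (chi2, f_stat, (df1, df2)).
    Note: f_stat is undefined when chi2 equals N * (k - 1), which happens
    only if every problem ranks the algorithms identically.
    """
    k = len(avg_ranks)
    n = n_problems
    # Friedman statistic.
    chi2 = (12.0 * n / (k * (k + 1))) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0
    )
    # Iman-Davenport correction, F-distributed with (k-1, (k-1)(N-1)) dof.
    f_stat = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return chi2, f_stat, (k - 1, (k - 1) * (n - 1))
```

For the illustrative ranks `[1.5, 2.5, 3.0, 3.5, 4.5]` and 24 problems, the Friedman statistic is 48 and the Iman-Davenport value is 23, with (4, 92) degrees of freedom. The computed value would then be compared with the critical value of the F-distribution at the chosen level.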
The Holm test was employed as a post hoc procedure, with HPFA chosen as the control algorithm. The results of the Holm tests are given in Table 8. The α/i values listed in the table are for a level of 0.05.
HPFA obtained the best ranking and is the control algorithm. As seen in Table 8, the p values of PSO, CPSO, DE, and PFA are smaller than their α/i values, which means that the equality hypotheses are rejected and significant differences exist between these four algorithms and the control algorithm.
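The comparison of p values against α/i thresholds can be sketched as a standard Holm step-down procedure. This is a minimal illustration under that assumption: the function name is hypothetical, and the p values in the usage note are invented for illustration and are not those of Table 8.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down test against a control algorithm.

    p_values: dict mapping algorithm name -> unadjusted p value of its
              comparison with the control.
    Returns a dict mapping name -> (threshold, rejected).
    """
    m = len(p_values)
    ordered = sorted(p_values, key=p_values.get)  # smallest p value first
    results = {}
    rejecting = True
    for step, name in enumerate(ordered):
        # Thresholds are alpha/m, alpha/(m-1), ..., alpha/1.
        threshold = alpha / (m - step)
        # Once one hypothesis is retained, all later ones are retained too.
        rejecting = rejecting and p_values[name] <= threshold
        results[name] = (threshold, rejecting)
    return results
```

For example, `holm({"PSO": 0.0001, "CPSO": 0.0002, "DE": 0.01, "PFA": 0.04})` rejects all four equality hypotheses, because each p value lies below its threshold of 0.0125, 0.0167, 0.025, and 0.05, respectively.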

Algorithm Complexity Analysis.
In many heuristic algorithms, most of the computation is spent on fitness evaluation in each generation. The computation cost of one individual is associated with the test function complexity. It is very difficult to give a brief analysis in terms of time for all ...
... Figure 2. From the results, it is observed that CPSO takes the most computing time in all compared algorithms. PSO takes the least computing time in all compared algorithms. In summary, it is concluded that, compared with other algorithms, HPFA requires less computing time to achieve better results.

Application to Data Clustering
Data clustering is a typical kind of unsupervised learning, used to divide samples of unknown categories into groups. Clustering algorithms are widely used in banking, retail, insurance, medical, military, and other fields. Many clustering algorithms, including hierarchical methods, partitioning methods, and density-based methods, have been proposed. In this paper, we mainly focus on partitioning clustering. Given a set of n data objects and the number of clusters w to be formed, the partitioning method divides the set of objects into w parts. Each partition represents a cluster. The final clustering optimizes a partition criterion so that the objects in a cluster are similar, while the objects in different clusters are not. Generally, the total mean square quantization error (MSE) is used as the standard measure function for partitioning. Let $X = (x_1, x_2, \ldots, x_n)$ be a set of n data objects and $C = (c_1, c_2, \ldots, c_w)$ be a set of w clusters. Equation (9) gives the definition of MSE. Minimizing this objective function is known to be an NP-hard problem (even for K = 2) [26]. In equation (9), w is the number of clusters, $i \in [1, n]$, $c_j$ denotes a clustering center, and n denotes the size of the dataset. Each data object $x_i$ in the dataset has p features. $\lVert x_i - c_j \rVert$ denotes the Euclidean distance between $x_i$ and $c_j$.

ALGORITHM 2: Pseudocode of fitness calculation.
Decode $X_i$ to w cluster centers following equation (12)
Calculate Euclidean distance between all data objects and clustering centers following equation (10)
Distribute the data to the nearest clustering center
Calculate the total within-cluster variance following equation (9)
Return fitness

The HPFA Algorithm for Data Clustering.
In HPFA for clustering, each individual denotes a set of cluster centers according to equation (11). According to equation (12), each individual can be decoded to w cluster centers. In equations (11) and (12), w is the number of clusters and p is the number of features of the data clustering problem. According to equation (9), the fitness of each individual can be calculated. Algorithm 2 gives the main steps of the fitness function.
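The fitness calculation of Algorithm 2 can be sketched in Python as follows. Equations (9)-(12) are not reproduced in this text, so two points are assumptions: an individual is taken to be a flat vector of length w × p holding the w centers one after another, and the fitness is taken to be the sum of squared Euclidean distances from each data object to its nearest center. The function names are hypothetical.

```python
def decode(individual, w, p):
    """Split a flat vector of length w * p into w cluster centers (assumed encoding)."""
    return [individual[j * p:(j + 1) * p] for j in range(w)]


def clustering_fitness(individual, data, w):
    """Total within-cluster variance of the partition encoded by individual.

    data: list of data objects, each a list of p feature values.
    Each object is assigned to its nearest center; the squared distances to
    the assigned centers are summed (assumed form of the objective).
    """
    p = len(data[0])
    centers = decode(individual, w, p)
    total = 0.0
    for x in data:
        # Squared Euclidean distance to the nearest cluster center.
        total += min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
    return total
```

As a small check, the four points (0, 0), (0, 2), (10, 0), and (10, 2) with centers (0, 1) and (10, 1), encoded as `[0, 1, 10, 1]`, give a fitness of 4.0, since every point lies at squared distance 1 from its nearest center.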

Experiment Sets.
To verify the performance of the HPFA algorithm for data clustering, PFA, CPSO, and PSO are used for comparison on several datasets, including Glass, Wine, Iris, and LD. These datasets are selected from the UCI machine learning repository [27].
In order to provide meaningful statistical analyses, each algorithm is run 30 times independently. The experimental results include the mean value and the standard deviation. The population size of the four algorithms is set to 20. The maximum number of evaluations is 10,000. The parameters for HPFA, PFA, CPSO, and PSO are the same as those in Section 4. Table 9 gives the results obtained by HPFA, PFA, CPSO, and PSO. Figure 3 shows the mean minimum total within-cluster variance profiles of HPFA, PFA, CPSO, and PSO.

Results and Analysis.
The Glass dataset consists of 214 instances characterized by nine attributes. There are two categories in the data. As seen from Figure 3(a), CPSO converged quickly from the beginning and became trapped in a local minimum. HPFA and PFA converged continually.
The Wine dataset consists of 178 objects characterized by thirteen features. There are three categories in the data. As seen from Figure 3(b), CPSO and PSO converged quickly from the beginning and became trapped in a local minimum. PFA and HPFA converged continually before about 1000 FEs.
The Iris dataset consists of 150 objects characterized by four features. There are three categories in the data. As seen from Figure 3(c), CPSO converged more quickly and clearly became trapped in a local minimum. PSO converged slowly.
The LD dataset consists of 345 objects characterized by six features. There are two categories. With the LD dataset, CPSO clearly became trapped in a local minimum at the very beginning. PSO converged slowly but obtained better results than CPSO, as seen from Figure 3(d).
The performance of HPFA and PFA is very similar on Wine, Iris, and LD, but HPFA obtained the best result. The experimental results given in Table 9 show that HPFA outperforms the other clustering algorithms in terms of the quality of the solutions on all four datasets: Glass, Wine, Iris, and LD.

Constrained Benchmark Problems
The experimental settings are as follows: the population size was 100 for HPFA. In order to do meaningful statistical analyses, each algorithm was run 25 times, and the mean value and the standard deviation are taken as the final result. "FEs," "SD," and "NA" stand for number of function evaluations, standard deviation, and not available, respectively. The mathematical formulations for the constrained benchmark functions (problems 1-4) are given in Appendixes A-D.

Constrained Problem 1.
In order to compare the performance of HPFA on constrained problem 1 (see Appendix A), WCA [28], IGA [29], PSO [30], CPSO-GD [31], and CDE [32] were employed for comparison. Table 10 gives the best results obtained by HPFA, WCA, and IGA. Table 11 gives the comparison of statistical results obtained from various algorithms for constrained problem 1. As shown in Table 11, in terms of the number of function evaluations, HPFA shows superiority to other algorithms.

Constrained Problem 2.
In order to compare the performance of HPFA on constrained problem 2 (see Appendix B), WCA [28], PSO [30], PSO-DE [30], GA1 [33], HPSO [34], and DE [35] were employed for comparison. Table 12 gives the best results obtained by GA1, WCA, and HPFA. Table 13 gives the comparison of statistical results obtained from various algorithms for constrained problem 2. As shown in Table 13, HPFA offered the best solution quality with fewer function evaluations for this problem. The proposed HPFA reached the best solution (−30665.5386) in 15,000 function evaluations.

Constrained Problem 3.
In order to compare the performance of HPFA on constrained problem 3 (see Appendix C), WCA [28], PSO [30], PSO-DE [30], DE [35], and CULDE [36] were employed for comparison. Table 14 gives the best results obtained by GA1, WCA, and HPFA. Table 15 gives the comparison of statistical results obtained from various algorithms for constrained problem 3. As shown in Table 15, HPFA reached the best solution (−0.999989) in 100,000 function evaluations.

Constrained Problem 4.
In order to compare the performance of HPFA on constrained problem 4 (see Appendix D), WCA [28], HPSO [34], PESO [37], and TLBO [38] were employed for comparison. Table 16 gives the best results obtained by WCA and HPFA. Table 17 gives the comparison of statistical results obtained from various algorithms for constrained problem 4. As shown in Table 17, HPFA reached the best solution (−1) in 5,000 function evaluations, which is considerably fewer than the other compared algorithms.

Three-Bar Truss Design Problem.
In order to compare the performance of HPFA on the three-bar truss design problem (see Appendix E), WCA [28] and PSO-DE [30] were employed for comparison. Table 18 gives the best results obtained by PSO-DE, WCA, and HPFA. The comparison of obtained statistical results for HPFA with previous studies including WCA and PSO-DE is presented in Table 19. As shown in Table 19, HPFA obtained the best mean value in 10,000 function evaluations, which is superior to PSO-DE.
... Table 21. As shown in Table 21, HPFA obtained the best mean value in 11,000 function evaluations, which is superior to the other considered algorithms.

Pressure Vessel Design Problem.
In order to compare the performance of HPFA on the pressure vessel design problem (see Appendix G), WCA [28], PSO [31], CPSO [31], and GA3 [40] were employed for comparison. Table 22 gives the best results obtained by WCA, HPFA, CPSO, and GA3. The comparison of obtained statistical results for HPFA with previous studies including WCA, CPSO, and PSO is presented in Table 23. As shown in Table 23, HPFA obtained a better mean value than PSO in 25,000 function evaluations.

Tension/Compression Spring Design Problem.
In order to compare the performance of HPFA on the tension/compression spring design problem (see Appendix H), WCA [28], CPSO [31], and GA3 [40] were employed for comparison. Table 24 gives the best results obtained by WCA, HPFA, CPSO, and GA3. The comparison of obtained statistical results for HPFA with previous studies including WCA, CPSO, and GA3 is presented in Table 25. As shown in Table 25, HPFA obtained the best mean value in 22,000 function evaluations, which is superior to WCA, CPSO, and GA3.

Welded Beam Design Problem.
In order to compare the performance of HPFA on the welded beam design problem (see Appendix I), WCA [28], CPSO [31], and GA3 [40] were employed for comparison. Table 26 gives the best results obtained by WCA, HPFA, CPSO, and GA3. The comparison of obtained statistical results for HPFA with previous studies including WCA, CPSO, and GA3 is presented in Table 27. As shown in Table 27, HPFA obtained the best mean value in 22,000 function evaluations, which is superior to WCA, CPSO, and GA3.

Conclusion
Hybridizing heterogeneous biologically inspired algorithms can avoid the shortcomings of a single algorithm by increasing the exchange of information between individuals. This paper proposed a hybrid pathfinder algorithm (HPFA), in which the mutation operator of DE is introduced. To validate the performance of HPFA, extensive experiments on twenty-four unconstrained benchmark functions were carried out, with PFA, CPSO, PSO, and DE as comparison algorithms. The numerical results show that HPFA has a good optimizing ability on most benchmark functions and outperforms the original PFA and the other comparison algorithms. HPFA was then used for data clustering on real datasets selected from the UCI machine learning repository. The experimental results show that the proposed HPFA obtained better results than the other comparison algorithms on the four datasets. HPFA was then employed to solve four constrained benchmark problems and five engineering design problems. The experimental results show that HPFA obtained better solutions than the other comparison algorithms with fewer function evaluations on most problems. This indicates that HPFA is an effective method for solving constrained problems. However, HPFA still becomes trapped in a local minimum on a few functions, as can be seen from the benchmark results.
Future work will identify the features of the functions on which HPFA does not work well and improve the algorithm on these functions.

Data Availability
Data for clustering in this study have been taken from the UCI machine learning repository (http://archive.ics.uci.edu/ml/index.php). Data are provided freely for academic research purposes only.

Conflicts of Interest
The authors declare that they have no conflicts of interest.