An integrated cuckoo search optimizer for single and multi-objective optimization problems

Integrating heterogeneous biological-inspired strategies and mechanisms into one algorithm can avoid the shortcomings of single algorithm. This article proposes an integrated cuckoo search optimizer (ICSO) for single objective optimization problems, which incorporates the multiple strategies into the cuckoo search (CS) algorithm. The paper also considers the proposal of multi-objective versions of ICSO called MOICSO. The two algorithms presented in this paper are benchmarked by a set of benchmark functions. The comprehensive analysis of the experimental results based on the considered test problems and comparisons with other recent methods illustrate the effectiveness of the proposed integrated mechanism of different search strategies and demonstrate the performance superiority of the proposed algorithm.


INTRODUCTION
Meta-heuristic algorithm provides a wide space for solving optimization problems from a new perspective. The meta-heuristic algorithms are created by simulating natural processes or phenomena which show infinite charm with its magic. In order to solve all kinds of optimization problems, many novel meta-heuristics algorithms are developed, such as Ant Colony Optimization (ACO) (Dorigo & Gambardella, 1997), Simulated Annealing (SA) (Kirkpatrick, Gelatt & Vecchi, 1983), Genetic Algorithms (GA) (Holland, 1975), Taboo Search (TS) (Al-Sultan & Fedjki, 1997), Particle Swarm Optimization (PSO) (Kennedy & Eberhart, 1995), Artificial Bee Colony (ABC) (Karaboga & Basturk, 2007), Firefly Algorithm (FA) (Yang, 2010), Cuckoo Search (CS) (Yang, 2014), Pathfinder Algorithm (PFA) (Yapici & Cetinkaya, 2019), Dragonfly Algorithm (DA) (Mirjalili, 2016) and Water Cycle Algorithm (WCA) (Sadollah et al., 2015). It can be seen from these documents that meta-heuristic algorithm has spanned over half a century. The metaheuristic algorithms can be taken into account in three classes: evolutionary based, physical-based and swarm intelligence based. GA is an evolutionary based algorithm which simulates the natural selection and genetic mechanism of Darwinian biological evolution process (Holland, 1975). GA has two major advantages: the ability of dealing with complex problems and parallelism. GA can deal with various types of optimization, but GA converges to local optimum easily. SA is a physical-based algorithm which comes from the principle of solid state annealing (Kirkpatrick, Gelatt & Vecchi, 1983). SA has the advantages of high efficiency and less constrained by initial conditions, but the search time of SA is longer. TS (Al-Sultan & Fedjki, 1997) uses the taboo technique to jump out of local optimum, but TS has strong dependence on the initial solution. A group of seemingly simple ants can find a shorter path to forage, and ants can find food synchronously without centralized control. ACO is a swarm intelligence based algorithm which mimics the foraging behavior of ants (Dorigo & Gambardella, 1997). ACO is suitable for discrete optimization problems, but the search process takes a long time. PSO is a popular swarm intelligence based algorithm which simulates the social behavior of bird flocking or fish schooling (Kennedy & Eberhart, 1995). ABC is a popular swarm intelligence based algorithm by modeling waggle dance and foraging behaviors of real bee colonies (Karaboga & Basturk, 2007). The ABC algorithm divides the bees into three kinds: employed bees, onlooker bees and scout bees. The three kinds of bees cooperate to realize the exploration and exploitation of the solutions. However, ABC is easy to implement and has few control parameters. However, the optimization performance of ABC becomes worse when the problem dimension becomes larger. FA is a newly proposed swarm intelligence algorithm which is based on the idealized behavior of the flashing characteristics of fireflies (Yang, 2010). DA is a swarm intelligence based algorithm which simulates the static and dynamic swarming behaviors of dragonflies and five parameters are needed to control different behaviors (Mirjalili, 2016). DA is inspired by the static and dynamic swarming behaviors of dragonflies. The multi-objective DA (MODA) tends to find very accurate approximations of Pareto optimal solutions with high uniform distribution for multi-objective problems (Mirjalili, 2016). PFA is a relatively new meta-heuristic algorithm inspired by collective movement of animal group and mimics the leadership hierarchy of swarms to find the best food area or prey (Yapici & Cetinkaya, 2019). WCA simulates the circulation of water in nature, especially the process that small streams on the surface gather into large streams and finally flow to the sea. Experimental results on a set of instances demonstrate the validity of multi-objective WCA (MOWCA) for solving multi-objective optimization problems (Sadollah et al., 2015). Recent work has shown that the optimization ability of the CS algorithm is better than that of PSO and GA (Yang & Deb, 2010). In particular, the meta-heuristic optimization algorithm based on population is now making a significant contribution to the optimization of modern engineering problems (Yapici & Cetinkaya, 2019).
The swarm intelligence method proposed in recent years has a certain ability to solve complex problems, but it is still difficult to meet the requirements (robustness, accuracy and stability) in solving complex and large-scale optimization problems. Each algorithm has its advantages and disadvantages. Beyond that, "No Free Lunch" theorems (Wolpert & Macready, 1997) suggest that there is no meta-heuristic algorithm which appropriate for all optimization problems. Many strategies including improving existing algorithms or integrating all kinds of search strategies into one algorithm can get better optimization effects. Good integration requires clever use at the right time and place, which is still an unsolved problem (Ting et al., 2015). In order to improve the optimization performance of CS, an integrated CS optimizer (ICSO) is proposed in this article.
Our contributions can be summarized to four aspects: (1) A two subpopulation structure based on dynamic size is used. (2) The two subpopulations adopt different position updating methods. (3) The scaling factor of step size is reduced by linear decreasing in Lévy flight phase. (4) After the two subpopulations are merged into one population, an individual information exchange mechanism based on DE operation is applied to the whole population.
The structure of the reminder of this article is as follows. The canonical CS and its variants are introduced in the section "CS algorithm and its variants". The section "The proposed ICSO" proposes several search strategies and presents the proposed ICSO in detail. The section "Experiment and results of ICSO" gives the details of the experiment for single objective problems, presents the experiment results and discusses the results. The section "Experiment and results of MOICSO" presents the details of the experiment for multi-objective problems and the experiment results. Finally, the section "Conclusion" summarizes the proposed algorithm and draws the conclusions.

CS ALGORITHM AND ITS VARIANTS
Cuckoos can't make nests or brood. When other birds (host birds) go out to look for food, cuckoos lay their eggs in the nest of the host birds and let the host birds raise their children. Of course, before laying eggs, cuckoos remove the host birds' eggs in order not to be detected by the host birds. Once the chicks are hatched by their foster mothers, they also have the nature to push their nestlings of the host bird itself out of their nests. The canonical CS algorithm mimics the reproduction of the cuckoo birds (Yang, 2014). The CS algorithm supposes that each cuckoo lays only one egg at a time. When the host bird finds an alien egg, it abandons the nest and finds another place to build a new one. The process of cuckoos' looking for the nest and laying eggs is the process of finding the solution in the D-dimensional space, and the quality of the bird's nest symbolizes the quality of the solution. Whether an egg (or a nest) can be successfully hatched by the host bird and thrive is the only criterion to judge whether the solution is good or not.
In the CS algorithm, there are two ways to update the solutions. One is the cuckoo's way to find the nest to lay eggs, the other is the location path of the host bird to rebuild the nest after finding the foreign bird with a certain probability P a 2 0; 1 ½ . The way of rebuilding the nest's position can be in Lévy flight or random mode. The cuckoo's way of finding a nest and laying eggs is to use the Lévy flight according to Eq. (1). Lévy flight provides a random walk whose random step length is drawn from a Lévy distribution according to Eq. (2). An instance of simulation of Lévy flight in 2D plane is shown in Fig. 1. The triangle is the starting point and the cross is the end after several Lévy flights. From Fig. 1, we can see that Lévy flight is a kind of walk between long steps and short steps, which is actually the main feature of the Lévy flight. It is confirmed that many birds in nature follow Lévy's flight (Viswanathan, 2010).
where x p is the position vector of the pth solution, k is the current iteration and stepSize is the step size which should be related to the scales of the problem. represents entry-wise multiplications.
The pseudo code of CS is listed in Table 1. The termination condition of the CS algorithm may be the maximum cycles or the maximum function evaluation.
CS uses the whole update and evaluation strategy on solutions. This strategy may deteriorate the convergence speed and the quality of solution due to interference phenomena among dimensions when solving multi-dimension function optimization problems. To overcome this shortage, a dimension by dimension improvement based CS algorithm (DDICS) is proposed (Wang, Yin & Zhong, 2013). The way of rebuilding the nest's position adopts a random mode according to Eq. (3).
where x p is the position vector of the pth solution, k is the current iteration. P a 2 0; 1 ½ : r and e are random numbers uniformly generated in the range of [0,1]. represents entry-wise multiplications. x i is the position vector of the ith solution, x j is the position vector of the jth solution. The article (Zhang, Ding & Jia, 2019) proposed a hybrid CSDE algorithm. In the CSDE process, population updates are done by partitioning and combining. In the division process, the entire population is divided into two equal subgroups randomly. The two subgroups have the same number of individuals. After all individuals complete the search process, the two subgroups obtained by CS and DE are combined into one group so that individuals can share location information across the search space. The goal of this operation is to enable each individual to find the best solution in less time and maintain its useful information.
The parameter P a plays an important role which maintains balance between the local random walk and the global random walk. In the article (Mareli & Twala, 2018), three CS algorithms including CSLI, CSEI and CSPI are proposed on dynamic changing P a value as the number of CS iterations increases. CSLI uses linear increasing P a value. CSEI uses exponential increasing P a value. CSPI uses power increasing P a value. Simulation results show that CSEI is more effective in finding global minimum values of the sample to test functions used.

THE PROPOSED ICSO
To achieve good optimization performance with higher convergence speed and avoid falling into local optimal solutions, an ICSO is proposed by introducing multiple strategies into the original CS algorithm.

Two subpopulation structure
Many species boast complex societies with multiple levels of communities. A common case is when two dominant levels exist, one corresponding to leaders and the other consisting of followers (Ferdinandy, Ozogány & Vicsek, 2017). In this paper we change the structure of population of the CS algorithm. At each iteration, the population is divided into two subpopulations according to the individual fitness. Those who own better fitness form high fitness subpopulation while those who own worse fitness form low fitness subpopulation, seen from Fig. 2. The pentagram symbols represents the individuals with high fitness and the circle symbols represents the individuals with low fitness, as shown in Fig. 2.
The high fitness subpopulation is responsible for exploiting better solutions, while the low fitness subpopulation is responsible for exploring unknown solutions. Exploration plays a major role in the early iterations while exploitation plays a major role in the late iterations. So the size of each subpopulation is dynamically changed as the iteration number increases. The size of the high fitness subpopulation increases with the iteration while the size of the low fitness subpopulation decreases with the iteration. In the whole iteration process, the size of the whole population remains unchanged. The value of P L indicates the proportion of the low fitness subpopulation in the whole population. The value of P L decreases linearly from P L max to P L min with the increasing of iteration times, as shown in Eq. (4).
where k is the current iteration and E is the maximum iterations. P L is the proportion of the low fitness subpopulation in the whole population. P H is the proportion of the high fitness subpopulation in the whole population. The two subpopulations employ different search strategies which can guarantee the diversity. Meanwhile, two subpopulations adopt different search strategies to realize a balanced combination of a local random walk and a global explorative random walk. A lévy flight is employed to exploit the most promising area, so the high fitness population adopts a lévy flight mode. The previous literature has also shown that some of the new solutions should be produced by lévy flight around the best solution obtained so far, which will often speed up the local search (Pavlyukevich, 2007). A random walk method is used to explore the unknown area, so the low fitness population adopts a random walk method. In this paper, a random method is adopted according to Eq. (3).

Adaptive step size strategy
The step scaling factor (stepSize in Eq. (1)) plays an important role in updating the position. This parameter is related to the scales of the problem. When the variable range increases, the step size should be increased accordingly. When the variable range decreases, the step size should be reduced accordingly.
In addition, it is very important to balance the ability of global exploration and local exploitation in designing a population-based algorithm. Exploration plays a major role at the beginning of the algorithm, and the step size should be larger. In the later stage of the algorithm, exploitation plays a major role and a big step size may make it difficult for the algorithm to converge to the optimal point. So the step size should be smaller in the later stage of the algorithm.
Based on the above two points, in order to make step size adapt to different optimization problems, a self-adaptive step size mechanism for high fitness subpopulation is introduced. The value of the step scaling factor stepSize decreases linearly from step max to step min with the increasing of iteration times, as shown in the Eq. (6).
where k is the current iteration and E is the maximum iterations. range is the variable range. For problems with different variable ranges in different dimensions, range is set to the maximum value.

Crossover based on DE operation
Because the two subpopulations adopt different search strategies, the solutions have diversity on the whole, seen from Fig. 2. To take advantage of the diversity of these solutions to find the potential optimal solution, the two subpopulations are combined into a population and a crossover operation is applied to the population. Differential evolution (DE) proposed by Storn & Price (1997) performs very well in convergence (Gong, Cai & Ling, 2011). Especially, DE has a good performance on searching the local optima and good robustness (Mallipeddi et al., 2011). Based on the fast convergence speed of DE, a crossover phase based on DE is added after the two population are combined into a whole population. From the perspective of biodiversity, this phase can increase the individual information exchange. The steps of the crossover operation are given in Table 2.
where i, r, p; q are random numbers between 1 and the size of population and mutually different, j is a dimension index between [1, D]. F is the differential weight in the range of [0, 2].
In the crossover operation, the differential mutation operator is the main operation. Figure 3 shows a two-dimensional example for producing new position using Eq. (9). The DE operator is employed to produce a self-adaptive mechanism to change the search  range. In the initial stage, individual differences are large, and the algorithm searches in a wide range. At the later stage of algorithm, the population is in a state of convergence, and the individual difference is small. The algorithm searches in a narrow range in the later period of the optimization. Therefore, using DE operator alone will cause the diversity to disappear quickly. In the proposed ICSO, the initial population is divided into two subpopulations according to the fitness. The two subpopulations update their positions according to different search methods to ensure the diversity of the solutions. Crossover operation is applied after the two subpopulations are merged. After the crossover operation, the whole population is divided into two subpopulations according to their fitness. The above process goes on and on until the end of the algorithm iteration.

The proposed algorithm for single-objective problems
The procedure of ICSO is shown in Table 3 and the flowchart of ICSO is summarized in Fig. 4.
From Table 3, we can see that initialization phase is responsible for initializing the solution. Eq. (10) gives the way to initialize the solution. Table 3 Pseudo code of the proposed ICSO.
Step 1: Initialization phase: randomly produces a number of positions according to the Eq. (10) Step 2: Division phase: Step 2.1: Rank the positions according to fitness Step 2.2: Divide population into two subpopulations: high fitness subpopulation and low fitness subpopulation Step 3: High fitness subpopulation phase: Step 3.1: Produce step scaling factor according to the Eq. (6) Step 3.2: Produce new positions according to the Eq. (1) Step 3.3: Calculate the fitness of new positions Step 3.4: Greedy selection between new positions and old positions Step 4: Low fitness subpopulation phase: Step 4.1: Produce new positions according to the Eq. (3) Step 4.2: Calculate the fitness of new positions Step 4.3: Greedy selection between new positions and old positions Step 5: Combination phase: Combine high fitness subpopulation and low fitness subpopulation into a whole population Step 6: Crossover phase:

DE operation
Step 7: If meet termination condition, stop the procedure; otherwise, go to step 2 Step 8: post processing where x i;j (1 i N; 1 j D) represents the initial value of the jth variable of ith agent. N and D are the size of population and the number of variables, respectively. x max j and x min j denote the upper and lower bounds of the jth parameter, respectively.

The proposed algorithm for multi-objective problems
Many problems in real life are multi-objective. Without loss of generality, the mathematical expression of the multi-objective problem is as follows: where m > 1, f x ð Þ is called multi-objective functions. p is the number of equality constraints. q is the number of inequality constraints. low a is the lower bound of the ath variable. high a is the upper bound of the ath variable.
The Pareto front method is used to solve the multi-objective problems through the calculation of Pareto front.
Þare two vectors. U is said to dominate V if and only if U is partially less than V in objective space. x Ã is said to be a Pareto optimal solution if and only if any other solutions can't be detected to dominate x Ã . Pareto optimal solutions are also called non-dominated solutions. A set of Pareto optimal solution is called Pareto optimal front.
In order to evaluate the performance of multi-objective optimization algorithm quantitatively, two kinds of performance metrics are given here. One is convergence metric, the other is diversity metric convergence (Deb et al., 2002). The generation distance (GD) measure is defined as the criterion of convergence, which is calculated according to Eq. (16).
where n pf is the number of optimal solutions in the obtained Pareto front. d i is the Euclidean distance from each non dominated solution to the nearest true Pareto front solution.
The computing of GD is illustrated by Fig. 5 where triangles and circles correspond to the solutions on the true Pareto front and the obtained Pareto front, respectively.
The metric of spacing (S) measure is defined as the criterion of diversity, which is calculated according to Eq. (17).
where n pf is the number of optimal solutions in the obtained Pareto front. d i is the crowding distance of the i-th solution.
d is the average of all d i . The computing of crowding-distance is illustrated by Fig. 6. The crowding distance of the i-th solution is the average side length of the rectangle formed by the dotted line (Deb et al., 2002).
For the advantages of the ICSO, this article proposed a multi-objective ICSO (MOICSO) which can deal with multi-objective optimization problems. The main steps of the MOICSO are shown in Table 4. It is worth noting that compared with ICSO, MOICSO has four more phases. Nondominated sorting phase is to calculate the non-dominated solutions in the whole population. Crowding distance calculating procedure has been illustrated by Fig. 6.

EXPERIMENT AND RESULTS OF ICSO
In this research, the number of function evaluations is used as the termination condition of each algorithm. All algorithms were coded in Matlab.

Benchmark problems
Numerical optimization problem solving capability of ICSO is examined by using 24 classic benchmark problems (Civicioglu & Besdokb, 2019;Karaboga & Akay, 2009), f 1 -f 24 . All benchmark functions used their standard ranges. Detailed mathematical definitions of f 1 -f 24 are given in Table 5. Table 4 Pseudo code of the proposed MOICSO.
Step 1: Initialization phase: Randomly produces a number of positions according to the Eq. (10) Calculate the fitness of each position Step 2: Non-dominated sorting Step 3: Crowding distance calculating Step 4: Division phase: Step 4.1: Rank the positions according to crowding distance Step 4.2: Divide population into two subpopulations: high fitness subpopulation and low fitness subpopulation Step 5: High fitness subpopulation phase: Step 5.1: Produce step scaling factor according to the Eq. (6) Step 5.2: Produce new positions according to the Eq. (1) Step 5.3: Calculate the fitness of new positions Step 5.4: Greedy selection between new positions and old positions Step 6: Low fitness subpopulation phase: Step 4.1: Produce new positions according to the Eq. (3) Step 4.2: Calculate the fitness of new positions Step 4.3: Greedy selection between new positions and old positions Step 7: Combination phase: Combine high fitness subpopulation and low fitness subpopulation into a whole population Step 8: Crossover phase:

DE operation
Step 9: Non-dominated sorting Step 10: Crowding distance calculating Step 11: If meet termination condition, stop the procedure; otherwise, go to step 4 Step 12: post processing Table 5 Benchmark functions.

Name
Function Limits Functions f 20 -f 24 are four rotated functions (Liang et al., 2006). The fitness of the rotated function is calculated with a rotated variable y instead of x. The rotated variable y is calculated by the original variable x left multiplied an orthogonal matrix. The orthogonal matrix is calculated using Salomon's method (Salomon, 1996).

Parameter study
The influences of the parameters including ðf 7 Þ and ðf 8 Þ are experimented in this section. The dimension of the test functions including Sphere, Powers, Ackley and Griewank is set to 30. The population size of each algorithm is set to 50. The computational cost of each experiment was set to 100,000 function evaluations. Each test problem is solved by assigning different values to the two parameters in 30 independent runs. The numerical results in terms of the mean results (Mean), standard deviation (Std), the best result (Best) and the worst result (Worst) of the optimal function value were given in Tables 6-9. The results demonstrate that ICSO performs best on all test functions when both the parameter CR and F are set to 0.1. Therefore, these two parameters are set to 0.1 in the next experiments.

Comparison with other algorithms
In order to compare the performance of ICSO, DDICS (Wang, Yin & Zhong, 2013), DE/rand (Storn & Price, 1997), CSDE (Zhang, Ding & Jia, 2019) and CSEI (Mareli & Twala, 2018) were employed for comparison. DE/rand is a classical population-based algorithm. Here DE/rand stands for DE/rand/1/bin (Storn & Price, 1997). Detailed descriptions of DDICS, CSDE and CSEI are given in "CS Algorithm and its Variants". The population size of ICSO and DDICS was 50. Dimensions of f 1 -f 9 and f 11 -f 24 are selected as 50. Dimensions of f 10 is selected as 24. The maximum evaluation count for functions f 1 -f 24 is 200,000. P L max and P L min are set to 0.9 and 0.2 respectively. To gather significant statistical results, each algorithm carried out 30 independent runs. For the other specific parameters for involved algorithms, we can follow parameters settings of the original literatures of DDICS (Wang, Yin & Zhong, 2013), CSEI (Mareli & Twala, 2018), DE/rand and CSDE (Zhang, Ding & Jia, 2019).
The results of our experiments are illustrated in Table 10 and the average convergence curves obtained for ICSO, DDICS, CSDE, CSEI and DE/rand are depicted in Fig. 7. M stands for the mean value and S stands for the standard deviation. B stands for the best value and W stands for the worst value. R stands for the performance order of the different algorithms on each standard function. The best mean values obtained on each function are marked as bold. It is obvious that ICSO performed best on most functions. Particularly, ICSO has a strong ability to mine the optimal value in the later stage of iteration on f 4 and f 12 , seen from Fig. 7. The performance of ICSO on f 4 is much similar with it on f 12 . ICSO converged very fast at the beginning and then trapped in local minimum. However, after a certain number of iterations, ICSO jumps out of the local optimum. With multiple search strategies, ICSO keeps sufficient diversity to escape local optimum on functions like f 4 and f 12 , even at the later stage of algorithm iteration.
As per the results and discussions of this study, we can conclude that ICSO can enhance both exploration and exploitation capabilities effectively. As can be clearly seen from Fig. 7, ICSO seemed to be able to sustain improvement especially on f 1 , f 5 , f 6 , f 8 , f 14 , f 20 , f 21 and f 24 .

Statistical analysis
In this section, two statistical evaluation approaches including Iman-Daveport and Holm tests are considered for validating the performance of the proposed ICSO. A detailed introduction of the two statistical evaluation methods can be found in reference (García et al., 2010).
The results of the Iman-Davenport test and Holm tests are given in Tables 11 and  12 respectively. The critical value listed in the Table 11 and the a=i values listed in the Table 12 are with level of 0.05.   Note: The best results are shown in bold.  From Table 11, there are significant differences in the ranking of algorithms because the Iman-Davenport values are greater than the critical value. From Table 12, there are significant differences between comparison algorithms and control algorithm (ICSO) because the p values of CSDE, DE/rand, DDICS and CSEI are less than their a=i values.
In this article, the validity and stability of ICSO are also studied by analysis of variance (ANOVA). Figure 8 shows the box plots of the statistical performance of all algorithms. The general characteristics of the distribution can be obtained by looking at these box plots. It can be seen clearly from Fig. 8 that ICSO obtains a good compromise solution variance distribution on most test functions.

Dynamic size of subpopulation
This simulation is conducted to investigate how the size of subpopulation in the ICSO model changes along with the generations on the benchmark environments. In order to clearly see the change of subpopulation size, the initial population size is set to 100. The subpopulation evolution based on ICSO model was simulated on two-dimensional Ackley function. Figure 9 shows the positions of the individuals and the subpopulation variation in different phases, where each red star represents a position with high fitness and each blue circle represents a position with low fitness.
As can be seen from Fig. 9, the initial positions are distributed randomly over the map defined by the two-dimensional Ackley function at the first stage (FEs = 100). We can directly see from Fig. 9 that with the iteration, the size of high fitness subpopulation increases and the size of low fitness subpopulation decreases. From the first phase (FEs = 100) to third phase (FEs = 1,500), the whole population moves toward the optimal position. The two subpopulations adopt different ways to update the position, and exploit the optimal region and explore the unknown region respectively. Under the collaborative effect of the two subpopulations, ICSO can make the whole population move toward the optimal position.
Various Algorithms (C)
The DTLZ1 problem (Deb et al., 2002) is described as follows: DTLZ1 : here, M = 3, x M j j ¼ k = 5, the total number of variables is n ¼ M þ k À 1. The Paretooptimal solution corresponds to x Ã i ¼ 0:5 x Ã i 2 x M À Á and the objective function values lie on the linear hyper plane: P M m¼1 f Ã m ¼ 0:5. The DTLZ2 problem (Deb et al., 2002) is described as follows: DTLZ2 : here, M = 3, x M j j ¼ k = 10, the total number of variables is n ¼ M þ k À 1. The Paretooptimal solution corresponds to x Ã i ¼ 0:5 x Ã i 2 x M À Á and the objective function values must satisfy the P M m¼1 ðf Ã m Þ 2 ¼ 1. The DTLZ3 problem (Deb et al., 2002) is described as follows: DTLZ3 : here, M = 3, x M j j ¼ k = 10, the total number of variables is n = M + k -1. The Paretooptimal solution corresponds to The DTLZ6 problem (Deb et al., 2002) is described as follows: DTLZ6 : here, M = 3, jx M j = k = 20, the total number of variables is n = M + k -1.

Experiment sets
To evaluate the performance of MOICSO for multi-objective problems, we compared it with MODA (Mirjalili, 2016) and MOWCA (Sadollah et al., 2015). In order to do meaningful statistical analysis, each algorithm runs for 10 times. The maximum evaluation count is 20,000. The population size of MOICSO was 100. The experimental results include the best value, the worst value, the mean value, the median value and the variance of the convergence measure and the diversity measure. Parameters for MOICSO are the same as ICSO in the section "Experiment and results of ICSO". For the other specific parameters for involved algorithms, we can follow parameters settings of the original literatures of MODA (Mirjalili, 2016) and MOWCA (Sadollah et al., 2015).  For ZDT1 problem, it can be seen from Table 13 that MOICSO has better performance than MODA and MOWCA in diversity. However, MOWCA got the best performance in convergence. It can be seen from Fig. 10 that MODA cannot converge to the Pareto front of ZDT1.

Experiment results and analysis
For ZDT3 problem, it can be seen from Table 14 that MODA has better performance than MOICSO and MOWCA in diversity. However, it can be seen from Fig. 11 that MODA cannot converge to the Pareto front of ZDT3. MOWCA got the best performance in convergence. MOICSO outperforms MODA in terms of convergence and MOWCA in terms of diversity.
For ZDT6 problem, MOICSO outperforms the other two comparison algorithms in terms of convergence and diversity, as seen from Fig. 12 and Table 15. In particular, the performance of MOICSO in convergence metric is three order of magnitude better than  that of MODA and MOWCA, as shown in Table 15. The performance of MOICSO in diversity metric is two order of magnitude better than that of MOWCA and one order of magnitude better than that of MODA, as shown in Table 15.       For DTLZ1 problem, it can be seen from Fig. 14 and Table 16 that MOICSO has better performance than MODA and MOWCA in both convergence and diversity.
For DTLZ2 problem, it can be seen from Fig. 16 and Table 17 that MOICSO outperforms the other two comparison algorithms in convergence. However, MODA obtains the best result among all the algorithms in diversity.
For DTLZ3 problem, it can be seen from Fig. 18 and Table 18 that MOICSO outperforms the other two comparison algorithms in both convergence and diversity. In particular, the performance of MOICSO in convergence metric is one order of magnitude better than that of MODA and MOWCA.
For DTLZ6 problem, it can be seen from Fig. 20 and Table 19 that MOICSO outperforms the other two comparison algorithms in terms of convergence. MODA obtains the best result among all the algorithms in diversity.

CONCLUSION
Most of real problems such as tuning proportional integral derivative (PID) controller parameters and scheduling problem are uncertain, dynamic, nonlinear, multi-modal or NP-hard problems. PID controller has been widely used in industry for many years. However, the performance of the systems with standard PID controllers cannot meet the design requirements because of the nonlinear dynamics, strong external disturbance and the random noise (Tang, Zhang & Hu, 2020). When tuning PID controller parameters, it is difficult to obtain the optimal or near optimal PID parameters by using some traditional tuning methods. The permutation flow-shop scheduling problem (PFSP) has been extensively explored in production scheduling. As the PFSP is NP-hard even for the single-objective problem (Gonzalez & Sahni, 1978). Recently, meta-heuristic algorithm provides novel approaches for solving complicated problems as mentioned above. This article proposed an ICSO which is a hybrid meta-heuristic algorithm. In order to avoid the shortcomings of single algorithms, ICSO integrates heterogeneous biologicalinspired strategies and mechanisms into one algorithm. These strategies include a unique two-subpopulation structure, an adaptive variable step size mechanism based a linear decreasing function and a crossover based on DE operation. With the iteration of the algorithm, the size of the two subpopulations changes dynamically, but the size of the whole population remains unchanged. Furthermore, the individual fitness information in the optimization process is used to divide the two subpopulations. The two subpopulations adopt different optimization methods. The high fitness subpopulation is responsible for exploiting better solutions, while the low fitness subpopulation is responsible for exploring unknown solutions. The diversity of solutions is increased due to the different optimization methods adopted by the two subpopulations. In order to exchange information between individuals, the two subpopulations are merged after iteration and a crossover operation based on DE is applied.
In order to see how ICSO performs, a series of experiments on 24 classic standard functions compared with DDICS, CSDE, CSEI and DE/rand are carried out.
The comprehensive analysis of ICSO shows that ICSO yielded better results than the other comparison methods with the same number of evaluations and runs on most cases.