Multiscale Cooperative Differential Evolution Algorithm

A multiscale cooperative differential evolution algorithm is proposed to overcome two weaknesses of traditional differential evolution algorithms: a narrow search range in the early stage and slow convergence in the later stage. First, a multipopulation structure is adopted in which each subpopulation is combined with a corresponding mutation strategy to maintain individual diversity during evolution. Then, covariance learning among populations is used to establish a suitable rotated coordinate system for the crossover operation. Meanwhile, an adaptive parameter adjustment strategy is introduced to balance population exploration and convergence. Finally, the proposed algorithm is tested on the CEC 2005 benchmark functions and compared with other state-of-the-art evolutionary algorithms. The experimental results show that the proposed algorithm performs better on global optimization problems than the compared algorithms.


Introduction
Differential evolution (DE) is a bionic intelligent optimization method proposed by the American scholars Rainer Storn and Kenneth Price in 1995 that simulates survival of the fittest [1,2]. The algorithm uses mutation, crossover, and selection operations to mimic genetic changes during biological evolution and retains highly adapted individuals to reach the optimal solution. To address population stagnation and premature convergence, researchers have mainly improved DE in three aspects: control parameter setting and mutation strategy selection [3][4][5][6], the crossover operation [7][8][9], and the population structure [10][11][12]. DE has attracted wide attention because of its simple encoding, good convergence, and strong robustness, and it has been applied in many fields such as industrial control [13], antenna design [14], power systems [15], and image processing [16]. Research on parameter control and strategy selection in DE addresses two questions. The first is how to set the control parameters, namely the scaling factor F, the crossover probability CR, and the population size NP [17]. The second is how to choose the most suitable mutation strategy, since different optimization problems favor different strategies [18]. The parameter setting affects the population diversity [19], the search ability in the early stage, and the convergence in the later stage [20]. The choice of evolutionary strategy is the key step in balancing exploration and convergence in DE, because different strategies show different exploration capabilities and convergence tendencies. Likewise, different crossover operations influence the global search differently. The traditional binomial crossover is widely used and effective to some extent, but it depends heavily on the coordinate system in which crossover is performed. In addition, the population structure is an important factor in algorithm performance.
If the population is too small, effective alleles are easily lost, which reduces the generation of competitive individuals. Conversely, if the population is too large, the probability that the algorithm searches in the correct direction decreases [11].
To further improve convergence and reduce population stagnation, a multiscale cooperative differential evolution (MCDE) algorithm is proposed. For parameter setting, the scaling factor F and the crossover probability CR are adapted mainly following [24]. For mutation, MCDE uses "current-to-pbest/1," "current-to-rand/1," and "rand/1" as its mutation strategy set. In the initial phase, the population is divided into multiple subpopulations, and one of them is designated the experimental population and allocated to whichever mutation strategy produces better evolutionary results. In the evolutionary phase, global search ability is continually strengthened by rotating the crossover coordinate system and coordinating the multiple subpopulations. Finally, the best surviving individual is returned as the optimal solution. Simulation tests on the CEC 2005 benchmark functions in 30 and 50 dimensions, compared against contemporary evolutionary algorithms, show that MCDE performs better. The rest of the paper is organized as follows. Section 2 briefly introduces the standard DE algorithm. Section 3 describes the proposed improvements. Section 4 analyzes the performance of the proposed algorithm through experimental data. Section 5 concludes the paper.

Standard Differential Evolution Algorithm
DE can be regarded as a greedy evolutionary algorithm for global optimization based on real-number encoding. During evolution, mutation, crossover, and selection are repeated until the stopping condition is satisfied.
The fitness function f(x) is used to evaluate the quality of each individual, and the best individual is recorded.

Initialization.
Assume that the population size is NP and the dimension of the feasible solution space is D, and let $x^G$ denote the population at generation G. Each individual consists of D parameters and can be expressed as

$$x_i^G = \left(x_{i,1}^G, x_{i,2}^G, \ldots, x_{i,D}^G\right), \quad i = 1, 2, \ldots, NP,$$

where $x_{i,j}^G \in (x_L, x_H)$, and $x_L$ and $x_H$ are the lower and upper bounds of the individual, respectively.
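As a minimal sketch of this representation, each row of an NP × D NumPy array can hold one individual. The text does not state how the initial population is generated, so this sketch assumes the common uniform random initialization; the function name `initialize_population` and the bounds of ±100 in the usage lines are illustrative only.

```python
import numpy as np


def initialize_population(NP, D, x_L, x_H, rng):
    """Return an (NP, D) array whose row i is the individual x_i^G.

    Assumption: components are drawn uniformly from [x_L, x_H); the
    bounds may be scalars or length-D arrays.
    """
    lower = np.broadcast_to(np.asarray(x_L, dtype=float), (D,))
    upper = np.broadcast_to(np.asarray(x_H, dtype=float), (D,))
    return lower + rng.random((NP, D)) * (upper - lower)


rng = np.random.default_rng(0)
pop = initialize_population(NP=250, D=30, x_L=-100.0, x_H=100.0, rng=rng)
print(pop.shape)  # (250, 30)
```

Storing the population as one array keeps the later mutation, crossover, and selection steps vectorizable.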

Mutation Operation.
The target individual $x_i^G$ in the parent population generates a mutant individual $v_i^G$ through a mutation strategy. "DE/rand/1" perturbs a randomly chosen base individual with one scaled difference vector:

$$v_i^G = x_{r1}^G + F \cdot \left(x_{r2}^G - x_{r3}^G\right),$$

where r1, r2, and r3 are mutually distinct random indices in {1, 2, ..., NP}, all different from i. The scaling factor F is chosen from [0, 1].
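The DE/rand/1 step can be sketched in a few lines of NumPy. This is an illustrative helper, not the authors' MATLAB code. The name `rand_1` is hypothetical, and `F` may be a scalar or a length-D array, which also covers per-dimension scaling factors.

```python
import numpy as np


def rand_1(pop, i, F, rng):
    """DE/rand/1 mutant for target i: v_i = x_r1 + F * (x_r2 - x_r3).

    r1, r2, r3 are mutually distinct and different from i, so NP >= 4.
    """
    NP = pop.shape[0]
    candidates = [k for k in range(NP) if k != i]
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])


rng = np.random.default_rng(1)
pop = rng.random((10, 4))
v = rand_1(pop, i=0, F=0.5, rng=rng)
```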

Crossover Operation.
The crossover operation combines each mutant individual with the corresponding individual of the original population to generate a new trial individual. DE adopts the binomial crossover scheme:

$$u_{i,j}^G = \begin{cases} v_{i,j}^G, & \text{if } \mathrm{rand}_j \le CR \text{ or } j = j_{rand}, \\ x_{i,j}^G, & \text{otherwise}, \end{cases}$$

where $\mathrm{rand}_j \in [0, 1]$ is a uniform random number, $j_{rand}$ is chosen at random from {1, 2, ..., D}, and the crossover probability CR ∈ [0, 1].
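A minimal NumPy sketch of binomial crossover follows; the function name is hypothetical. `CR` may be a scalar or a length-D array of per-dimension probabilities.

```python
import numpy as np


def binomial_crossover(x, v, CR, rng):
    """Trial vector u: u_j = v_j if rand_j <= CR or j == j_rand, else x_j."""
    D = x.shape[0]
    take_from_v = rng.random(D) <= CR
    take_from_v[rng.integers(D)] = True  # j_rand: at least one component comes from v
    return np.where(take_from_v, v, x)


rng = np.random.default_rng(2)
x, v = np.zeros(10), np.ones(10)
print(binomial_crossover(x, v, CR=0.0, rng=rng).sum())  # 1.0 (only j_rand is copied)
```

The forced $j_{rand}$ component guarantees that the trial vector differs from the target in at least one dimension, even when CR is very small.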

Selection Operation.
The selection operation adopts greedy survival-of-the-fittest selection, so the individual that survives is never worse than the parent individual $x_i$. When the fitness value of the trial individual $u_i$ is better than or equal to that of the target individual, $u_i$ is accepted into the population. Otherwise, $x_i$ remains in the next generation and serves again as the target individual for mutation and crossover in the next iteration, so that the population always evolves toward the optimal solution. For minimization, the selection operation is

$$x_i^{G+1} = \begin{cases} u_i^G, & \text{if } f(u_i^G) \le f(x_i^G), \\ x_i^G, & \text{otherwise}, \end{cases}$$

where f(x) is the objective function to be optimized.
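The one-to-one greedy selection can be sketched as a vectorized NumPy step over the whole population. The function name is hypothetical; besides the new population and fitness values, it returns the boolean mask of targets that were replaced by their trial individuals.

```python
import numpy as np


def greedy_selection(pop, fit, trials, trial_fit):
    """One-to-one greedy selection for minimization.

    Row i of the next population is trials[i] if f(u_i) <= f(x_i), else pop[i].
    """
    replaced = trial_fit <= fit
    new_pop = np.where(replaced[:, None], trials, pop)
    new_fit = np.where(replaced, trial_fit, fit)
    return new_pop, new_fit, replaced


pop = np.array([[0.0], [1.0]])
trials = np.array([[5.0], [0.5]])
new_pop, new_fit, replaced = greedy_selection(
    pop, np.array([0.0, 1.0]), trials, np.array([5.0, 0.5])
)
print(replaced)  # [False  True]
```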

Multiscale Cooperative Differential Evolution Algorithm
In the proposed algorithm, the whole population is divided into multiple subpopulations, and each is assigned a corresponding mutation strategy. Then, the crossover coordinate system of each subpopulation is established by covariance learning, and the control parameters of each evolving subpopulation are adapted. Finally, selection is applied to the trial individuals and those with better fitness are retained, driving the whole population toward the global optimum.

Multiscale Mutation Strategy Integration Method.
In recent years, because different mutation strategies suit different optimization functions, some researchers have focused on methods that combine multiple mutation strategies [23,24]. Even for a single optimization problem, the most appropriate mutation strategy may differ between stages of evolution. Therefore, the mutation strategy strongly influences the results of DE. Because evolution places different demands on the mutation strategy, this paper selects "current-to-pbest/1," "current-to-rand/1," and "rand/1" as the multiscale mutation strategy set. In "current-to-rand/1" and "rand/1," the individuals involved in mutation are all selected at random, so these strategies support global exploration in the early stages of evolution. "current-to-pbest/1" guides the search with the current best individuals of the population; during evolution, it narrows the search range to the vicinity of the optimal solution and accelerates convergence. The two strategies other than "rand/1" are defined as

Current-to-pbest/1:
$$v_i^G = x_i^G + F \cdot \left(x_{pbest}^G - x_i^G\right) + F \cdot \left(x_{r1}^G - x_{r2}^G\right),$$

Current-to-rand/1:
$$v_i^G = x_i^G + F \cdot \left(x_{r1}^G - x_i^G\right) + F \cdot \left(x_{r2}^G - x_{r3}^G\right),$$

where $x_{pbest}^G$ is chosen uniformly at random from the top 100p% individuals of the current population, with p ∈ (0, 1].

Because each of the three mutation strategies has its own strengths, this paper introduces a population multiscale mechanism. The whole population Pop is divided into three subpopulations $Pop_1$, $Pop_2$, and $Pop_3$. $Pop_1$, which has the largest size, is designated the experimental population; during evolution, it is allocated to the mutation strategy that produced better evolutionary results. The population structure is

$$Pop = \bigcup_{i=1}^{3} Pop_i, \qquad NP_i = \sigma_i \cdot NP, \qquad \sum_{i=1}^{3} \sigma_i = 1,$$

where $Pop_1$ is the experimental population, NP is the population size, $NP_i$ is the size of $Pop_i$, and $\sigma_i$ is its population size ratio. After the population structure is designed, the allocation rules of the subpopulations must be given.
First, subpopulations $Pop_1$, $Pop_2$, and $Pop_3$ are combined with their corresponding mutation strategies. Then, the population undergoes mutation, crossover, and selection. Finally, the number $bd_i$ of superior individuals retained after the evolution of each subpopulation is counted. That is, the superior rate $br_i$ of subpopulation i can be expressed as

$$br_i = \frac{bd_i}{NP_i}.$$

The superior rate $br_i$ of each subpopulation is calculated in every generation, and at the initialization stage of the next generation the subpopulations are reallocated among the three mutation strategies according to $br_i$. The multiscale mutation strategy set makes full use of the advantages of the three mutation strategies to balance the conflict between population diversity and convergence speed, as the later experimental results show. In the first generation, each subpopulation is randomly assigned a mutation strategy. At the end of the first generation, the superior rate of each subpopulation is calculated by equation (9). The maximum superior rate identifies the best mutation strategy in that generation. For example, if "current-to-pbest/1" has the highest superior rate in the first generation, the second generation assigns $Pop_1$ to it, and the remaining subpopulations $Pop_2$ and $Pop_3$ are randomly assigned the other two strategies ("current-to-rand/1" and "rand/1").
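The two additional mutation strategies, the σ-based population split, and the superior-rate reallocation rule can be sketched together in NumPy. Several details are assumptions not fixed by the text: the same F multiplies both difference terms of "current-to-rand/1"; subpopulations are formed from a random permutation of the individuals, with rounding absorbed by the last one; and $bd_i$ counts the trial individuals of subpopulation i that replaced their targets in selection. All function names are hypothetical.

```python
import numpy as np

STRATEGIES = ["current-to-pbest/1", "current-to-rand/1", "rand/1"]


def current_to_pbest_1(pop, fit, i, F, p, rng):
    """v_i = x_i + F*(x_pbest - x_i) + F*(x_r1 - x_r2); x_pbest from the best 100p% (minimization)."""
    NP = pop.shape[0]
    n_top = max(1, int(round(p * NP)))
    pbest = rng.choice(np.argsort(fit)[:n_top])
    r1, r2 = rng.choice([k for k in range(NP) if k != i], size=2, replace=False)
    return pop[i] + F * (pop[pbest] - pop[i]) + F * (pop[r1] - pop[r2])


def current_to_rand_1(pop, i, F, rng):
    """v_i = x_i + F*(x_r1 - x_i) + F*(x_r2 - x_r3) (same F in both terms: an assumption)."""
    NP = pop.shape[0]
    r1, r2, r3 = rng.choice([k for k in range(NP) if k != i], size=3, replace=False)
    return pop[i] + F * (pop[r1] - pop[i]) + F * (pop[r2] - pop[r3])


def split_population(NP, sigma, rng):
    """Randomly partition indices 0..NP-1 into subpopulations of size about sigma_k * NP."""
    perm = rng.permutation(NP)
    sizes = [int(round(s * NP)) for s in sigma]
    sizes[-1] = NP - sum(sizes[:-1])  # absorb rounding so the sizes sum to NP
    bounds = np.cumsum([0] + sizes)
    return [perm[bounds[k]:bounds[k + 1]] for k in range(len(sizes))]


def superior_rates(replaced, groups):
    """br_k = bd_k / NP_k; replaced is a population-wide mask, groups hold index arrays."""
    return [float(replaced[g].sum()) / len(g) for g in groups]


def reassign_strategies(assignment, rates, rng):
    """Pop_1 (index 0, the largest) takes the strategy whose subpopulation had the highest
    superior rate; the remaining two strategies go to Pop_2 and Pop_3 in random order."""
    best = assignment[int(np.argmax(rates))]
    rest = [s for s in assignment if s != best]
    rng.shuffle(rest)
    return [best] + rest


rng = np.random.default_rng(0)
groups = split_population(250, [0.6, 0.2, 0.2], rng)
print([len(g) for g in groups])  # [150, 50, 50]
assignment = STRATEGIES.copy()
rng.shuffle(assignment)  # generation 1: random strategy for each subpopulation
```

With NP = 250 and σ = (0.6, 0.2, 0.2), as in the experimental settings, the split yields subpopulations of 150, 50, and 50 individuals.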

Covariance Learning.
The crossover operators described above depend on the coordinate system, while the distribution of the population reflects the direction of evolution to some extent [20]. The population distribution is often neglected during evolution, which makes the population prone to falling into local optima and converging prematurely. In this paper, variances and covariances are used to analyze the population distribution and form a covariance matrix that reflects population diversity information. Systematically using the covariance matrix therefore reduces the dependence on the coordinate system and the effect of interactions between variables. Covariance matrix learning involves two related techniques: eigendecomposition of the covariance matrix and coordinate transformation. The covariance matrix learning steps are as follows:
Step 1. Calculate the covariance matrix C of the subpopulation.
Step 2. Decompose the covariance matrix as $C = R \Lambda R^{T}$, where $\Lambda$ is the diagonal matrix of eigenvalues λ and the columns of R are the corresponding eigenvectors.
Step 3. Transform the target individual and the mutant individual into the eigen coordinate system: $x_i' = R^{T} x_i^G$ and $v_i' = R^{T} v_i^G$.
Step 4. Perform crossover in the eigen coordinate system, rotate the resulting individuals back to the original coordinate system, and retain those with better fitness through selection.
Based on these four steps, we establish the population eigen coordinate system. Figure 1(a) shows the initial coordinate system of population evolution, and Figure 1(b) shows the eigen coordinate system. Analyzing the population distribution yields the $ox_1x_2$ coordinate system, in which the global optimum can be found faster.
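A minimal NumPy sketch of Steps 1 to 4 for one target/mutant pair follows. It assumes that binomial crossover is applied to the rotated coordinates and that the eigenvectors come from `numpy.linalg.eigh`, which is appropriate because C is symmetric. The function name is hypothetical, and the selection part of Step 4 is left to the ordinary greedy rule applied to the returned trial vector.

```python
import numpy as np


def eigen_crossover(subpop, x, v, CR, rng):
    """Binomial crossover performed in the eigen coordinate system of a subpopulation.

    subpop: (n, D) array of the subpopulation; x, v: target and mutant (length D).
    """
    C = np.cov(subpop, rowvar=False)       # Step 1: D x D covariance matrix
    _, R = np.linalg.eigh(C)               # Step 2: columns of R are orthonormal eigenvectors
    x_e, v_e = R.T @ x, R.T @ v            # Step 3: rotate into eigen coordinates
    D = x.shape[0]
    mask = rng.random(D) <= CR             # Step 4: binomial crossover there ...
    mask[rng.integers(D)] = True           # ... with at least one component from v_e
    u_e = np.where(mask, v_e, x_e)
    return R @ u_e                         # ... then rotate back (R is orthogonal)


rng = np.random.default_rng(4)
subpop = rng.normal(size=(20, 5))
x, v = rng.normal(size=5), rng.normal(size=5)
u = eigen_crossover(subpop, x, v, CR=0.5, rng=rng)
```

Because R is orthogonal, rotating back with R exactly undoes the rotation by $R^T$: with CR = 1 the trial vector equals the mutant, and when target and mutant coincide the trial vector equals them too.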

Adaptive Control Parameter Settings.
At present, researchers have proposed many effective parameter adaptation methods [21,23,24]. Different combinations of control parameters and mutation strategies yield different results on a given optimization problem. In this paper, each scale strategy has its own control parameters. Because the method in [24] suits the proposed algorithm best, it is adopted here with improvements.
During evolution, the scaling factor F plays a decisive role in the search range around the base vectors. In the standard DE algorithm, F is fixed, which is not suitable for every global optimization function. In this paper, F is generated from the inverse cumulative distribution function of the Cauchy distribution. Let $F_{i,j}$ denote the scaling factor of dimension j of individual i; then

$$F_{i,j} = Fm_j + 0.1 \tan\left(\pi \left(\mathrm{rand}_{i,j} - 0.5\right)\right),$$

where $\mathrm{rand}_{i,j}$ is a uniform random number in [0, 1], $Fm_j$ is the location parameter of the Cauchy distribution for dimension j with initial value 0.5, and 0.1 is the scale parameter. To carry parental information into the next generation, a weighting factor c combines the current location parameter with the parental scaling factors:

$$Fm_j = (1 - c)\, Fm_j + c \cdot \mathrm{mean}_P(S_{F,j}),$$

where $c \in [0, 1]$ and $\mathrm{mean}_P(S_{F,j})$ is the power mean of the set $S_{F,j}$ of parental scaling factors in dimension j:

$$\mathrm{mean}_P(S_{F,j}) = \left(\frac{1}{|S_{F,j}|} \sum_{F \in S_{F,j}} F^{\,n}\right)^{1/n},$$

where n is the exponent of the power mean, which quantifies the influence of the parents' scaling factors on the offspring.

In the DE algorithm, the crossover probability CR determines how likely a target individual is to inherit genes from the mutant individual $v_i^G$. In this paper, CR is generated from a normal distribution. Let $CR_{i,j}$ denote the crossover probability of dimension j of individual i; then

$$CR_{i,j} \sim \mathcal{N}\left(CRm_j,\ 0.1^2\right),$$

where $CRm_j$ is the mean crossover probability of dimension j, initialized to 0.5, and the standard deviation is 0.1. To inherit the parental genes, a weighting factor c again combines the parental crossover probabilities with the current mean:

$$CRm_j = (1 - c)\, CRm_j + c \cdot \mathrm{mean}_L(S_{CR,j}),$$

where $c \in [0, 1]$ and $\mathrm{mean}_L(S_{CR,j})$ is the Lehmer mean of the set $S_{CR,j}$ of parental crossover probabilities.
The Lehmer mean is

$$\mathrm{mean}_L(S_{CR,j}) = \frac{\sum_{CR \in S_{CR,j}} CR^2}{\sum_{CR \in S_{CR,j}} CR}.$$

The Lehmer mean lets the value of CR be adjusted flexibly according to the parental crossover probabilities.
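The per-dimension sampling and update rules for F and CR can be sketched as below. The text does not say how sampled values outside [0, 1] are handled, what exponent n is used, or what happens when a parental set is empty. This sketch therefore leaves any truncation or regeneration to the caller and keeps the previous location when the set is empty. These choices and all function names are assumptions.

```python
import numpy as np


def sample_F(Fm, rng, scale=0.1):
    """F_{i,j} via the Cauchy inverse CDF: Fm_j + scale * tan(pi * (U - 0.5)), U ~ U[0, 1)."""
    U = rng.random(np.shape(Fm))
    return Fm + scale * np.tan(np.pi * (U - 0.5))


def sample_CR(CRm, rng, sd=0.1):
    """CR_{i,j} drawn from a normal distribution with mean CRm_j and std 0.1."""
    return rng.normal(CRm, sd)


def power_mean(S, n):
    """((1/|S|) * sum(s**n)) ** (1/n)."""
    S = np.asarray(S, dtype=float)
    return np.mean(S ** n) ** (1.0 / n)


def lehmer_mean(S):
    """sum(s**2) / sum(s)."""
    S = np.asarray(S, dtype=float)
    return np.sum(S ** 2) / np.sum(S)


def update_location(old, S, c, mean_fn):
    """(1 - c) * old + c * mean_fn(S); keeps old when S is empty (assumption)."""
    if len(S) == 0:
        return old
    return (1 - c) * old + c * mean_fn(S)


rng = np.random.default_rng(5)
Fm = np.full(30, 0.5)   # initial location of F in each dimension
CRm = np.full(30, 0.5)  # initial mean of CR in each dimension
F_i = sample_F(Fm, rng)
CR_i = sample_CR(CRm, rng)
```

For the F update, the power mean needs its exponent bound in, for example `update_location(Fm[j], S_F_j, c, lambda S: power_mean(S, n))`, where `S_F_j`, `c`, and `n` stand for the values of the running algorithm.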

Algorithm Framework.
The proposed algorithm combines the multiscale strategy with covariance learning and introduces adaptive control parameters to keep the population moving toward the global optimum. Based on the above analysis, the basic flow of MCDE is summarized in Algorithm 1.

Experimental Results and Analysis
MCDE is tested on the 25 benchmark functions of IEEE CEC 2005, which comprise the unimodal functions F1-F5, the basic multimodal functions F6-F12, the expanded multimodal functions F13-F14, and the hybrid composition functions F15-F25. For details, please refer to [45]. The parameters of MCDE are set as follows: population size NP = 250 and subpopulation ratios σ1 = 0.6 and σ2 = σ3 = 0.2. Experimental environment: Windows 7 Professional 64-bit, Core i7 CPU (3.40 GHz), 8 GB RAM, and MATLAB R2014b.

Comparison with Improved DE Algorithm.
To verify the performance of MCDE, it is compared with seven classic improved DE algorithms: JADE [24], jDE [21], SaDE [23], EPSDE [17], CoDE [18], CoBiDE [31], and LSHADE [46]. JADE and jDE are representative and heavily cited algorithms. SaDE and EPSDE improve DE through multiple strategies. CoDE and CoBiDE are improvements based on population structure. The experimental results of these algorithms are shown in Table 1, where D = 30 and MaxFES = 300000. Each entry in the table is the mean error ± standard deviation. "−/+/≈" means that the compared algorithm is significantly better than, worse than, or similar to MCDE, respectively. Based on Table 1 and Figure 2, the following conclusions can be drawn. (1) Unimodal functions F1-F5: among the compared algorithms, JADE and LSHADE perform best, because the greedy strategy "current-to-pbest/1" achieves fast convergence and high precision. However, the MCDE multiscale strategy achieves better accuracy than JADE on F3 (a), F4 (b), and F5 (c). (2) Basic multimodal functions F6-F12: the best-performing compared algorithm is CoBiDE, which is better than MCDE on F6, F8, F9, and F11 but worse on F7 (d), F10 (e), and F12 (f).
Overall, MCDE is similar to CoBiDE on this type of benchmark function. (3) Expanded multimodal functions F13-F14 (g): the mean errors of all algorithms are of the same order of magnitude, but JADE, CoDE, and MCDE perform slightly better than the others. (4) [...]
Figure 1: Population initial coordinate system and population eigen coordinate system: (a) the initial coordinate system of population evolution; (b) the eigen coordinate system of population evolution.
Table 2 shows that, under the Wilcoxon test at α = 0.05 and α = 0.1, MCDE is significantly better than JADE, jDE, SaDE, EPSDE, CoDE, CoBiDE, and LSHADE. According to the Friedman average ranking for D = 30 in Table 3, MCDE performs well on all types of benchmark functions and achieves the best ranking.

The experimental results of the seven compared algorithms for D = 50 and MaxFES = 500000 are shown in Table 4. Based on Table 4 and Figure 3, the following conclusions can be drawn. (1) Unimodal functions F1-F5: at D = 50, MCDE outperforms the other algorithms on F4 (b) and F5 (c), which suggests that its mutation strategies search a wider range and its adaptive parameter control performs better. On F2 and F3, MCDE is inferior only to JADE and LSHADE. (2) Basic multimodal functions F6-F12: CoBiDE and LSHADE perform significantly better on this type of benchmark function, whereas MCDE is slightly better than them on F6 (d), F7 (e), F10 (f), and F12 (g). (3) [...] 18, 20, 21, 17, 22, 13, and 15 benchmark functions, worse than them on 4, 2, 2, 7, 2, 6, and 7 benchmark functions, and similar to them on 3, 3, 2, 1, 1, 6, and 3 benchmark functions, respectively. Table 5 shows that, under the Wilcoxon test at α = 0.05, MCDE is significantly better than JADE, jDE, SaDE, EPSDE, CoDE, and LSHADE, while the p value against CoBiDE is 0.06. At α = 0.1, MCDE differs significantly from all compared algorithms. According to the Friedman average ranking for D = 50 in Table 6, MCDE performs well on all types of benchmark functions and achieves the top ranking.
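Friedman average rankings such as those in Tables 3 and 6 are obtained, under the standard convention, by ranking the algorithms on each benchmark function and averaging each algorithm's ranks over all functions. Rank 1 goes to the smallest mean error, and tied values share the average of their ranks. The pure-Python sketch below uses made-up error values, not data from the tables; the function name is hypothetical.

```python
def average_ranks(errors):
    """Friedman-style average ranks.

    errors[f][a] is the mean error of algorithm a on benchmark function f
    (lower is better). On each function the algorithms get ranks 1..k, and
    tied values share the mean of the ranks they occupy.
    """
    k = len(errors[0])
    totals = [0.0] * k
    for row in errors:
        order = sorted(range(k), key=lambda a: row[a])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared_rank = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
            for a in order[pos:end + 1]:
                totals[a] += shared_rank
            pos = end + 1
    return [t / len(errors) for t in totals]


# Two toy functions, three hypothetical algorithms (not data from the paper).
print(average_ranks([[1.0, 2.0, 3.0], [2.0, 2.0, 1.0]]))  # [1.75, 2.25, 2.0]
```

A lower average rank is better, so the algorithm with the smallest value is the one reported as best ranked.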

Comparison with Related Evolutionary Algorithms.
To further evaluate MCDE, it is compared with CLPSO [47], CMA-ES [48], and GL-25 [25,26]. CLPSO is a local version of PSO that adopts a new learning strategy. CMA-ES adopts a covariance matrix adaptation mechanism and is mainly used to solve continuous optimization problems. GL-25 is a global and local real-coded genetic algorithm based on a new crossover operator. The results of MCDE, CLPSO, CMA-ES, and GL-25 for D = 30 and MaxFES = 300000 are shown in Table 7. MCDE performs best on the unimodal functions (F2-F5), where its mean error is smaller than those of the other evolutionary algorithms. On the basic multimodal functions F10 and F12, MCDE is significantly better than the other algorithms. On the expanded multimodal and hybrid composition functions, MCDE performs significantly better on most functions (F14, F16, F17, F21, F23, F24, and F25). Overall, at D = 30, MCDE is better than CLPSO, CMA-ES, and GL-25 on 19, 15, and 21 benchmark functions, worse on 2, 5, and 1, and similar on 4, 5, and 3, respectively.
The proposed algorithm is further compared with these evolutionary algorithms statistically. Table 8 shows that, under the Wilcoxon test, MCDE's p values are below 0.05 and 0.1, so its advantage is significant at both α = 0.05 and α = 0.1. The Friedman average ranking for D = 30 in Table 9 shows that MCDE performs best on the benchmark functions.

Runtime Comparison and Mechanism Comparison.
In general, the running time of an evolutionary algorithm consists of the time spent executing its operators and the time spent evaluating the fitness function. JADE, jDE, SaDE, EPSDE, CoDE, CoBiDE, and the proposed algorithm were each run 25 times independently on the 25 benchmark functions with MaxFES = 300000 and D = 30, and the average CPU time was recorded. To compare running speed, this paper uses the ratio of mean CPU times (AR) between each algorithm and MCDE. AR > 1 indicates that an algorithm runs slower than MCDE, and AR < 1 that it runs faster.
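The text reports an "average AR" but does not spell out the order of operations. Assuming that AR is first computed per benchmark function as the ratio of mean CPU times and then averaged over functions, the calculation can be sketched as below; the function names and toy timings are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def average_ar(times_alg, times_mcde):
    """times_*[f] holds the CPU times of the independent runs on benchmark function f.

    AR_f = mean time of the compared algorithm / mean time of MCDE on f;
    AR_f > 1 means slower than MCDE. Returns (average AR, per-function ARs).
    """
    ratios = [mean(a) / mean(m) for a, m in zip(times_alg, times_mcde)]
    return mean(ratios), ratios


avg, per_function = average_ar([[2.0, 2.0], [1.0, 1.0]], [[1.0, 1.0], [2.0, 2.0]])
print(avg, per_function)  # 1.25 [2.0, 0.5]
```

Averaging ratios rather than taking the ratio of total times keeps functions with very long runs from dominating the comparison.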
The average AR values in Table 10 lie in the range [0.85, 13.41]. jDE runs fastest and EPSDE slowest, and the proposed algorithm ranks third. It is slower than jDE and JADE because the multiscale strategies widen the search range but spend more time in the mutation step.
Table 11 and Figure 4 report additional experiments with and without the multipopulation mechanism and covariance learning. The multiscale mechanism (DE-1) stands out on the unimodal function F4 (a), while covariance learning (DE-2) performs markedly better on the basic multimodal and hybrid composition functions with relatively complex structures, F10 (b), F16 (c), and F17 (d). In summary, the multipopulation structure combines each subpopulation with a corresponding mutation strategy to maintain individual diversity during evolution; covariance learning then establishes a suitable rotated coordinate system for crossover within the population; and adaptive control parameters balance population exploration and algorithm convergence.

Conclusions
MCDE introduces multiscale strategies, comprising local and global mutation strategies, to expand the search scope of the population. During evolution, covariance matrix learning rotates the initial coordinate system appropriately, and the target and mutant individuals are transformed into the rotated system. Through parameter adaptation, good crossover probabilities CR and scaling factors F are inherited from the previous generation via the Lehmer mean and the power mean, respectively. Compared with JADE, jDE, SaDE, EPSDE, CoDE, CoBiDE, and LSHADE on the CEC 2005 benchmark functions, the proposed algorithm shows significant improvements on global optimization problems with D = 30 and D = 50. In a further comparison at D = 30 with other evolutionary algorithms, namely CLPSO, CMA-ES, and GL-25, MCDE performs best. In terms of running time, the proposed algorithm ranks third among the seven algorithms compared. In summary, MCDE improves both accuracy and convergence speed, which makes it a practical choice for global optimization.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.
Computational Intelligence and Neuroscience