An Improved Grey Wolf Optimization Algorithm with Variable Weights

Based on the hypothesis that the social hierarchy of grey wolves is also followed in their searching positions, an improved grey wolf optimization (GWO) algorithm with variable weights (VW-GWO) is proposed. To reduce the probability of being trapped in local optima, a new governing equation of the controlling parameter is also proposed. Simulation experiments are carried out, and comparisons are made. Results show that the proposed VW-GWO algorithm works better than the standard GWO, the ant lion optimization (ALO), the particle swarm optimization (PSO) algorithm, and the bat algorithm (BA). The novel VW-GWO algorithm is also verified in high-dimensional problems.


Introduction
Many problems with huge numbers of variables, massive complexity, or no analytical solutions have been met while human beings explore, exploit, and conquer nature. Optimization methods are proposed to solve them. Unfortunately, because of the no free lunch rule [1], it is hard to find a universally efficient method for almost all problems. Therefore, scientists and engineers around the world are still working to find more optimization algorithms and more suitable methods.
Traditionally, optimization algorithms are divided into two classes: deterministic algorithms and stochastic algorithms [2]. Deterministic algorithms have been shown to be easily trapped in local optima, while stochastic algorithms are found to be capable of avoiding local solutions through randomness. Thus, more attention is paid to the stochastic algorithms, and more and more of them are proposed. Among the research on stochastic algorithms, the presentation, improvement, and application of nature-inspired computing (NIC) algorithms have become a hot spot. The NIC algorithms are inspired by nature, and they have been proved to be efficient in solving the problems humans meet [3,4]. One of the most important parts of the NIC algorithms is the bionic algorithms, and most of the bionic algorithms are metaheuristic [5][6][7]. They can solve problems with parallel computing and global searching. The metaheuristic algorithms divide the swarms between global and local searching with some methods. They cannot guarantee the global optimal solutions; thus, most of the metaheuristic algorithms introduce randomness to avoid local optima. The individuals in swarms are controlled to separate, align, and cohere [8] with randomness; their current velocities are composed of the former velocities, random multipliers of the frequency [9], or Euclidean distances of specific individuals' positions [10][11][12][13][14]. Some improvements are made with inertia weight modification [15][16][17], hybridization with invasive weed optimization [18], chaos [19], binary vectors [20], etc. Most of these improvements result in slightly better performance of the specific algorithms, but the overall structures remain unchanged.
Almost all of the metaheuristic algorithms and their improvements so far are inspired directly by the behaviors of organisms, such as searching, hunting [11,21], pollinating [13], and flashing [14]. In the older metaheuristic algorithms, such as the genetic algorithm (GA) [22], simulated annealing (SA) [23], and the ant colony optimization (ACO) algorithm [24], the individuals are treated in the same way, and the final results are the best fitness values. The individuals of a metaheuristic algorithm perform their behavior under the same governing equations. To achieve better performance and decrease the possibility of being trapped in local optima, random walks or Levy flights are introduced to the individuals when specific conditions are met [25,26]. This mostly means that the swarms behave in less controllable ways. Furthermore, most organisms living in swarms in nature have social hierarchies as long as they are slightly intelligent. For example, in an ant colony, the queen is the commander in addition to its reproductive role; the dinergates are soldiers that guard the colony, while the ergates are charged with building, gathering, and breeding. It can be concluded that the hierarchy of the ant colony is queen → dinergates → ergates if they are classified by job. The ergates' behavior could be directed by their elders' experience and by their queen or the dinergates. If the ergates are commanded by the queen, some dinergates, or elders, and such operations are mathematically described and introduced to the ACO algorithm in some way, will the ACO algorithm perform better in solving problems? In other words, what if the social hierarchy of the swarms is considered in the metaheuristic algorithms? This work was done by Mirjalili et al., and a new optimization method called the grey wolf optimization (GWO) algorithm was proposed [27]. The GWO algorithm considers the searching and hunting behavior and the social hierarchy of the grey wolves.
Due to less randomness and varying numbers of individuals assigned in global and local searching procedures, the GWO algorithm is easier to use and converges more rapidly. It has been proved to be more efficient than the PSO algorithm [27] and other bionic algorithms [28][29][30][31][32]. More attention has been paid to its applications due to its better performance. Efforts have been made in feature and band selection [33,34], automatic control [29,35], power dispatching [32,36], parameter estimation [31], shop scheduling [28], and multiobjective optimization [37,38]. However, the standard GWO algorithm was formulated with equal importance given to the positions of the dominant grey wolves, which is not strictly consistent with their social hierarchy. Recent developments of the GWO algorithm, such as the binary GWO algorithm [34], the multiobjective GWO algorithm [37], and hybrids with other algorithms [39], together with their applications [40][41][42][43], leave this unchanged. If the searching and hunting positions of the grey wolves also follow the social hierarchy, the GWO algorithm may be improved. With the hypothesis that the social hierarchy of the grey wolves is also functional in their searching procedure, we report an improvement of the original GWO algorithm in this paper. Considering that in engineering applications a maximum admissible error (MAE) is usually specified for a given problem, an exponentially declining governing equation of the controlling parameter is introduced so that the maximum iteration number need not be known in advance. The rest of this paper is organized as follows: Section 2 presents the inspiration of the improvement and the revision of the controlling equations to meet the needs of the later experiments. The experiment setup is described in Section 3, and results are compared in Section 4. Finally, Section 5 concludes the work and makes suggestions for further research.

Algorithms
According to Mirjalili et al. [27], grey wolves live together and hunt in groups. The searching and hunting process can be described as follows: (1) If a prey is found, they first track, chase, and approach it. (2) If the prey runs, the grey wolves pursue, encircle, and harass it until it stops moving. (3) Finally, the attack begins.

Standard GWO Algorithm.
Mirjalili designed the optimization algorithm by imitating the searching and hunting process of grey wolves. In the mathematical model, the fittest solution is called the alpha (α), the second best is the beta (β), and the third best is the delta (δ). The rest of the candidate solutions are all assumed to be omegas (ω). All of the omegas are guided by these three grey wolves during searching (optimizing) and hunting.
When a prey is found, the iteration begins (t = 1). Thereafter, the alpha, beta, and delta wolves lead the omegas to pursue and eventually encircle the prey. Three coefficients $\vec{A}$, $\vec{C}$, and $\vec{D}$ are proposed to describe the encircling behavior. In the standard formulation [27], they read
$$\vec{D}_\alpha = |\vec{C}_1 \cdot \vec{X}_\alpha(t) - \vec{X}(t)|, \quad \vec{D}_\beta = |\vec{C}_2 \cdot \vec{X}_\beta(t) - \vec{X}(t)|, \quad \vec{D}_\delta = |\vec{C}_3 \cdot \vec{X}_\delta(t) - \vec{X}(t)|,$$
$$\vec{X}_1 = \vec{X}_\alpha(t) - \vec{A}_1 \cdot \vec{D}_\alpha, \quad \vec{X}_2 = \vec{X}_\beta(t) - \vec{A}_2 \cdot \vec{D}_\beta, \quad \vec{X}_3 = \vec{X}_\delta(t) - \vec{A}_3 \cdot \vec{D}_\delta,$$
where t indicates the current iteration, $\vec{X}$ is the position vector of the grey wolf, and $\vec{X}_1$, $\vec{X}_2$, and $\vec{X}_3$ are the position vectors associated with the alpha, beta, and delta wolves, which are located at $\vec{X}_\alpha$, $\vec{X}_\beta$, and $\vec{X}_\delta$. $\vec{X}$ is then computed as follows:
$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}. \tag{5}$$
The parameters $\vec{A}$ and $\vec{C}$ are combinations of the controlling parameter a and the random numbers $\vec{r}_1$ and $\vec{r}_2$, whose components lie in [0, 1] [27]:
$$\vec{A} = 2a \cdot \vec{r}_1 - a, \quad \vec{C} = 2 \cdot \vec{r}_2.$$
The controlling parameter a changes $\vec{A}$ and finally causes the omega wolves to approach or run away from the dominant wolves, that is, the alpha, beta, and delta. Theoretically, if $|\vec{A}| > 1$, the grey wolves run away from the dominants; the omega wolves then run away from the prey and explore more space, which is called a global search in optimization. If $|\vec{A}| < 1$, they approach the dominants; the omega wolves then follow the dominants in approaching the prey, which is called a local search in optimization. The controlling parameter a is defined to decline linearly from a maximum value of 2 to zero as the iterations are carried on:
$$a = 2\left(1 - \frac{it}{N}\right), \tag{7}$$
where N is the maximum iteration number, which is initialized by the user at the beginning, and it is the cumulative iteration number.
The application procedure can be divided into three parts. (1) The given problem is understood and mathematically described, and some elemental parameters are then known. (2) A pack of grey wolves is randomly initialized all through the space domain. (3) The alpha and the other dominant grey wolves lead the pack to search, pursue, and encircle the prey. When the prey is encircled by the grey wolves and stops moving, the search finishes and the attack begins. The pseudocode is listed in Table 1.
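The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names `gwo` and `sphere`, the default pack size, the iteration count, the domain, and the clamping of positions to the domain are all choices made for this sketch. The three leaders are re-ranked from the current pack at every iteration, and a declines linearly as in equation (7).

```python
import random


def sphere(x):
    """Sphere function (F1 in the text); minimum 0 at the origin."""
    return sum(v * v for v in x)


def gwo(fitness, dim, lower, upper, n_wolves=20, n_iter=200, seed=0):
    """Minimal standard GWO for minimization (illustrative sketch only)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_wolves)]
    for it in range(n_iter):
        # Copy the three fittest wolves: alpha, beta, and delta.
        leaders = [list(w) for w in sorted(wolves, key=fitness)[:3]]
        a = 2.0 * (1.0 - it / n_iter)  # linear decline from 2 towards 0
        for w in wolves:
            for d in range(dim):
                guided = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a  # A lies in [-a, a)
                    C = 2.0 * rng.random()          # C lies in [0, 2)
                    D = abs(C * leader[d] - w[d])
                    guided.append(leader[d] - A * D)
                # Equal-weight average of the three guided positions.
                new_value = sum(guided) / 3.0
                # Clamping to the domain is an assumption of this sketch.
                w[d] = min(max(new_value, lower), upper)
    best = min(wolves, key=fitness)
    return best, fitness(best)


best, value = gwo(sphere, dim=2, lower=-10.0, upper=10.0, seed=1)
print(best, value)
```

The equal-weight average in the inner loop is the step that the variable weights proposed in the next subsection modify.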

Proposed Variable Weights and Their Governing Equations.
We can see from the governing equation (5) that the dominants play the same role in the searching process; every grey wolf approaches or runs away from the dominants with an equal weight for the alpha, beta, and delta. However, although the alpha is the nearest to the prey at the beginning of the search, it might be far away from the final result, let alone the beta and delta. Therefore, at the beginning of the searching procedure, only the position of the alpha should be considered in equation (5), or its weight should be much larger than those of the other dominants. Moreover, the equal weighting in equation (5) is also against the social hierarchy hypothesis of the grey wolves. If the social hierarchy is strictly followed in the pack, the alpha is the leader and might always be the nearest one to the prey. The alpha wolf should be the most important, which means that the weight of the alpha's position in equation (5) should always be no less than those of the beta and the delta. Consequently, the weight of the beta's position should always be no less than that of the delta. Based on these considerations, we further hypothesize the following: (1) The searching and hunting processes are always governed by the alpha; the beta plays a less important role, and the delta plays a much less important role. Any other grey wolf transfers its position to the alpha if it obtains the best one. It should be noted that, in real searching and hunting procedures, the best position is the one nearest to the prey, while in optimization for a global optimum of a given problem, the best position is the maximum or minimum of the fitness value under given restrictions. (2) During the searching process, a hypothesized prey is always surrounded by the dominants, while in the hunting process, a real prey is encircled. The dominant grey wolves are at positions surrounding the prey in order of their social hierarchy. This means that the alpha is the nearest one among the grey wolves, the beta is the nearest one in the pack except for the alpha, and the delta ranks third. The omega wolves are involved in the processes, and they transfer their better positions to the dominants.
With the hypotheses above, the positions should not be updated with the equal weights used in equation (5).
When the search begins, the alpha is the nearest, and the rest are not important. So, the alpha's position should contribute to the new searching individuals, while all of the others could be ignored. This means that the weight of the alpha should be near 1.0 at the beginning, while the weights of the beta and delta could be near zero at this time. At the final state, the alpha, beta, and delta wolves should encircle the prey, which means they have equal weights, as in equation (5). Along the searching procedure from the beginning to the end, the beta catches up with the alpha because it always ranks second, and the delta catches up with the beta because of its third rank. This means that the weights of the beta and delta rise with the cumulative iteration number, while the weight of the alpha is reduced.
The above ideas can be formulated mathematically. First of all, the weights should vary and sum to 1.0. Equation (5) is then changed as follows:
$$\vec{X}(t+1) = w_1 \vec{X}_1 + w_2 \vec{X}_2 + w_3 \vec{X}_3, \quad w_1 + w_2 + w_3 = 1.$$
Secondly, the weight of the alpha $w_1$, that of the beta $w_2$, and that of the delta $w_3$ should always satisfy $w_1 \geq w_2 \geq w_3$. Mathematically speaking, the weight of the alpha changes from 1.0 to 1/3 along the searching procedure, and at the same time the weights of the beta and delta increase from 0.0 to 1/3. Generally speaking, a cosine function can be introduced to describe $w_1$, that is, $w_1 = \cos\theta$, when the angle θ is restricted to vary in [0, arccos(1/3)].
Thirdly, the weights should vary with the cumulative iteration number, denoted "it". We know that $w_2, w_3 \to 0$ when it = 0 and $w_1, w_2, w_3 \to 1/3$ when it → ∞. So, we introduce an arc-tangent function of it, which varies from 0.0 to π/2. Since $\sin(\pi/4) = \cos(\pi/4) = \sqrt{2}/2$, another angular parameter φ, which tends to π/4 as it → ∞, is introduced. Considering that $w_2$ increases from 0.0 to 1/3 along with it, we hypothesize that $w_2$ contains sin θ and cos φ and that, when it → ∞, θ → arccos(1/3) and $w_2$ = 1/3; $w_2$ can then be formulated in detail. Based on these considerations, a new update method of the positions with variable weights is proposed. The curves of the variable weights are drawn in Figure 1. We can then find that the variable weights satisfy the hypothesis that the social hierarchy of the grey wolves functions in their searching behavior.
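The constraints stated above can be checked numerically with a small sketch. The exact weight expressions are not reproduced here, so the construction below is hypothetical: the scaling of θ and φ by arctan(it), the factor 1/2, the sine term used for the delta, and the final rescaling that forces the weights to sum to one are all assumptions chosen only so that the stated properties hold (start at (1, 0, 0), tend to (1/3, 1/3, 1/3), and keep w1 ≥ w2 ≥ w3). The names `variable_weights` and `weighted_position` are illustrative.

```python
import math

THETA_MAX = math.acos(1.0 / 3.0)  # upper limit of theta stated in the text


def variable_weights(it):
    """Hypothetical weights (w1, w2, w3) for a cumulative iteration it >= 0."""
    s = math.atan(it)                        # grows from 0 towards pi/2
    theta = (2.0 / math.pi) * THETA_MAX * s  # assumed: 0 -> arccos(1/3)
    phi = 0.5 * s                            # assumed: 0 -> pi/4
    raw = (
        math.cos(theta),
        0.5 * math.sin(theta) * math.cos(phi),
        0.5 * math.sin(theta) * math.sin(phi),
    )
    total = sum(raw)  # assumed rescaling so that the weights sum to 1
    return tuple(r / total for r in raw)


def weighted_position(x1, x2, x3, weights):
    """Weighted replacement for the equal-weight average of equation (5)."""
    w1, w2, w3 = weights
    return [w1 * p + w2 * q + w3 * r for p, q, r in zip(x1, x2, x3)]


for it in (0, 1, 10, 100, 1000):
    print(it, variable_weights(it))
```

Because of the rescaling, w1 equals cos θ only at the two end points in this sketch; a closure of the form w3 = 1 - w1 - w2 would keep w1 = cos θ throughout, at the cost of different intermediate values.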

Proposed Exponentially Declining Governing Equation of the Controlling Parameter.
In equation (7), the controlling parameter declines linearly from two to zero as the iterations proceed from zero to the maximum N. However, in engineering, an optimization is usually ended when a requested maximum admissible error (MAE) is met. This also means that the maximum iteration number N is unknown.
Furthermore, the controlling parameter restricts A, which is responsible for making a grey wolf approach or run away from the dominants. In other words, the controlling parameter governs whether the grey wolves search globally or locally in the optimizing process. The global search probability is expected to be larger when the search begins, and the local search probability is expected to be larger when the algorithm is approaching the optimum. Therefore, to obtain better performance of the GWO algorithm, the controlling parameter is expected to decrease quickly when the optimization starts so that the algorithm converges to the optimum very fast. On the other hand, some grey wolves are expected to keep searching globally to avoid being trapped in local optima. For these reasons, a controlling parameter that declines exponentially with the cumulative iteration number [44] is introduced, in which $a_m$ is the maximum value and M is an admissible maximum iteration number. The parameter M restricts the algorithm to avoid long running times and nonconvergence. It is expected to be larger than $10^4$ or $10^5$ based on the computing hardware currently used in most laboratories.
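The difference between the two schedules can be illustrated with a short sketch. The linear form follows equation (7). The exponential form a = a_m · exp(-it / M) is an assumption made here for illustration: the text only states that the parameter declines exponentially from a_m and involves M, so the actual decay rate may differ. The default values a_m = 1.6 and M = 10^5 are taken from the empirical study and the experiment settings reported later in the paper.

```python
import math


def a_linear(it, n_max):
    """Standard GWO schedule: a declines linearly from 2 to 0 over n_max iterations."""
    return 2.0 * (1.0 - it / n_max)


def a_exponential(it, a_max=1.6, m_max=1.0e5):
    """Assumed exponential schedule: a = a_max * exp(-it / m_max)."""
    return a_max * math.exp(-it / m_max)


for it in (0, 1000, 10000, 100000):
    print(it, a_linear(it, 100000), a_exponential(it))
```

Unlike the linear schedule, the exponential one never needs the final iteration number in advance: the run can stop as soon as the MAE is met, and M only acts as a cap.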

Empirical Studies and the Experiments Prerequisite
The goal of the experiments is to verify the advantages of the improved GWO algorithm with variable weights (VW-GWO) by comparison with the standard GWO algorithm and other metaheuristic algorithms. Classically, optimization algorithms are applied to benchmark functions that are used to describe the real problems humans meet.

Empirical Study of the GWO Algorithm.
Although there are fewer parameters in the GWO algorithm than in other algorithms such as the ALO, PSO, and bat algorithm (BA) [45], suitable values of the parameters remain important for the algorithm to be efficient and economical. An empirical study has been carried out, and the results show that a population size of 20∼50 balances the computing complexity and the convergence rate. In an empirical study on the maximum value $a_m$, the sphere function (F1) and Schwefel's problems 2.22 (F2) and 1.2 (F3) are optimized to find the relationship between $a_m$ and the mean least iteration times (MLIT) with a given error tolerance of $10^{-25}$, as shown in Figure 2.
Computational Intelligence and Neuroscience
We can see from Figure 2 the following: (1) The maximum value $a_m$ of the controlling parameter a influences the MLIT under a given MAE; when $a_m$ is smaller than 1.0, the smaller $a_m$ is, the larger the MLIT needed. Conversely, when $a_m$ is larger than 2.5, the larger $a_m$ is, the larger the MLIT needed. (2) $a_m$ should be chosen in [1.0, 2.5], and $a_m$ is found to be the best when it is 1.6 or 1.7.

Benchmark Functions.
Benchmark functions are standard functions derived from research on nature. They are usually diverse and unbiased, and difficult to solve with analytical expressions. Benchmark functions have been an essential way to test the reliability, efficiency, and validity of optimization algorithms. They vary in the number of ambiguous peaks in the function landscape, the shape of the basins or valleys, separability, and dimensionality. Mathematically speaking, the benchmark functions can be classified with the following five attributes [46]. (e) Unimodal or multimodal: some of the functions have only one peak in their landscape, while others have many peaks. The former attribute is called unimodal, and the latter multimodal.
There are 175 benchmark functions summarized in the literature [46]. In this paper, we choose 11 benchmark functions, from simple to complex, covering all of the above five characteristics. They are used to test the capability of the involved algorithms, as listed in Table 2.
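For orientation, several of the benchmark functions named in this paper can be written down directly from their usual textbook definitions. The sketch below is illustrative only: the function names are chosen here, the search domains from Table 2 are not reproduced, and only the functions whose names appear in the text are included.

```python
import math


def sphere(x):           # F1
    return sum(v * v for v in x)


def schwefel_2_22(x):    # F2: sum of |x_i| plus product of |x_i|
    return sum(abs(v) for v in x) + math.prod(abs(v) for v in x)


def schwefel_1_2(x):     # F3: sum of squared partial sums
    total, partial = 0.0, 0.0
    for v in x:
        partial += v
        total += partial * partial
    return total


def schwefel_2_21(x):    # F4: largest absolute coordinate
    return max(abs(v) for v in x)


def rosenbrock(x):       # F5: minimum 0 at (1, ..., 1)
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))


def zakharov(x):         # F11
    s = sum(0.5 * (i + 1) * v for i, v in enumerate(x))
    return sum(v * v for v in x) + s ** 2 + s ** 4


print(sphere([1.0, 2.0]), schwefel_1_2([1.0, 2.0]), rosenbrock([1.0, 1.0]))
```

All of these accept a list of any length, which is what allows the same definitions to be reused in the 2-, 10-, 30-, and 200-dimensional experiments described later.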

Results and Discussion
There are 11 benchmark functions involved in this study. Comparisons are made with the standard grey wolf optimization algorithm (std. GWO) and three other bionic methods: the ant lion optimization algorithm (ALO), the PSO algorithm, and the BA.

General Reviews of the Algorithms.
Randomness is involved in all of the algorithms studied in this paper, for example, in the random positions, random velocities, and random controlling parameters. The randomness causes the fitness values obtained during the optimization procedure to fluctuate. So, when an individual of the swarm is initialized at, or randomly jumps to, a position quite near the optimum, the best fitness value would be met. Table 3 lists the best and worst fitness results of some chosen benchmark functions and their corresponding algorithms. During this experiment, 100 Monte Carlo (MC) simulations are carried out for every benchmark function. The results show that the randomness indeed leads to some random outcomes, but most of the time the final results depend more on the algorithms. At first glance of Table 3, the GWO algorithms always work the best; either the VW-GWO or the std. GWO algorithm optimizes the benchmark functions closest to their optima with small absolute errors, and the proposed VW-GWO algorithm is almost always the best one. The other compared algorithms, the PSO, the ALO, and the BA, lead to the worst results most of the time. This means that the GWO algorithms are more capable and that the proposed VW-GWO algorithm indeed improves the capability of the std. GWO algorithm. The absolute errors averaged over MC = 100 versus iterations lead to the same conclusion: the convergence rate curves for the F3 benchmark function are shown in Figure 3. They show that the proposed VW-GWO algorithm results in faster convergence, lower residual errors, and stable convergence.

Comparison, Statistical Analysis, and Test.
A general impression of the metaheuristic algorithms can be obtained from Table 3 and Figure 3. However, optimization problems often demand statistical analysis and tests. To do this, 100 MC simulations are carried out on the benchmark functions. The benchmark functions are all two dimensional, and they are optimized by the newly proposed VW-GWO and the other four algorithms 100 times each. Because the optima of the benchmark functions are all zero, the simulated fitness results are also their absolute errors.
The mean values of the absolute errors and the standard deviations of the final results are listed in Table 4; some of the values are quoted from published works, and the references are listed correspondingly. The proposed VW-GWO algorithm and the compared algorithms are almost all capable of finding the global optima of the benchmark functions. The detailed values in Table 4 show that the standard deviations of the 100 MC simulations are all small. We can further draw the following conclusions: (1) All of the algorithms involved in this study were able to find the optimum. (2) All of the benchmark functions tested in this experiment could be optimized, whether they are unimodal or multimodal, and whether the domain is symmetric or unsymmetric. (3) Comparatively speaking, although the bat algorithm involves much more randomness, it did the worst job. The PSO and the ALO algorithm did a little better. (4) The GWO algorithms carry out the optimization procedure much better. The proposed VW-GWO algorithm optimized most of the benchmark functions involved in this simulation the best, and it did much better than the standard algorithm. Therefore, the proposed VW-GWO algorithm performs better in optimizing the benchmark functions than the std. GWO algorithm as well as the ALO, the PSO algorithm, and the BA, which can also be seen from the Wilcoxon rank sum test [47] results listed in Table 5.
In Table 5, the p values of the Wilcoxon rank sum test are reported; they show that the proposed VW-GWO algorithm is superior on most of the benchmark functions except F5, the Rosenbrock function.
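To make the statistical comparison concrete, the following sketch shows how a two-sided Wilcoxon rank sum p value can be computed for two samples of final errors. It is a plain-Python illustration using the normal approximation, with average ranks for ties and no tie or continuity correction; the function name `rank_sum_test` and the sample data are hypothetical, and the paper does not state which implementation was used for Table 5.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation); returns (z, p)."""
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    mean = n1 * (n1 + n2 + 1) / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - mean) / std
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical final errors of two algorithms over ten runs each.
errors_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
errors_b = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0]
print(rank_sum_test(errors_a, errors_b))
```

A small p value indicates that the two sets of final errors are unlikely to come from the same distribution; a negative z means the first sample tends to have the smaller errors.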

Mean Least Iteration Times (MLIT) Analysis over Multidimensions.
Compared with other bionic algorithms, the GWO algorithm has fewer parameters. Compared with the std. GWO algorithm, the proposed VW-GWO algorithm does not introduce additional uncontrolled parameters. It furthermore improves the feasibility of the std. GWO algorithm by introducing an admissible maximum iteration number. By contrast, there is a large amount of randomness in the compared bionic algorithms such as the ALO, the PSO algorithm, and the BA. Therefore, the proposed algorithm is expected to be favored by engineers, who need the fastest convergence, the most precise results, and the most control. Thus, there is a need to verify that the proposed algorithm converges fast, beyond the brief impression given by Figure 3.
Generally speaking, optimization algorithms are used to find the optima under constrained conditions. In reality the optimization procedure must end, and it is expected to end as fast as possible. The admissible maximum iteration number M prevents the algorithm from running endlessly, but the algorithm is still expected to end quickly under the current conditions. This experiment calculates the mean least iteration times (MLIT) under a maximum admissible error. The absolute value of the MAE is constrained to be less than $1.0 \times 10^{-3}$, and $M = 1.0 \times 10^{5}$. In this experiment, 100 MC simulations are carried out, and for simplicity, not all classical benchmark functions are involved. The final statistical results are listed in Tables 6-8. Note that the complexity of the ALO algorithm is very large, and it is very time-consuming on the current simulation hardware described in the Appendix, so it is not included in this experiment. Table 8 lists the MLIT data when the VW-GWO, std. GWO, PSO algorithm, and BA are applied to the unimodal benchmark function F1. The best, worst, and standard deviation MLIT values are listed. The mean values are also calculated, and t-tests are carried out with α = 0.05. The last column lists the number of MC simulations remaining after discarding all runs in which the searching process reaches the admissible maximum iteration number M.
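The MLIT bookkeeping described above can be sketched as follows. This is an illustrative harness, not the code behind Tables 6-8: the names `least_iterations`, `mlit`, and `make_halving_run` are hypothetical, the optimizer is replaced by a toy process whose error halves at every step, and the population form of the standard deviation is an assumption. Runs that never meet the MAE within M iterations are discarded, as in the last column of the tables.

```python
import statistics


def least_iterations(step, error, mae=1.0e-3, m_max=100_000):
    """Iterations needed until |error| < mae, or None if not met within m_max."""
    for it in range(1, m_max + 1):
        step()
        if abs(error()) < mae:
            return it
    return None


def mlit(make_run, n_mc=100, mae=1.0e-3, m_max=100_000):
    """Summarize least iteration counts over n_mc independent runs."""
    counts = []
    for run in range(n_mc):
        step, error = make_run(run)
        n = least_iterations(step, error, mae, m_max)
        if n is not None:  # discard runs that hit the cap M
            counts.append(n)
    if not counts:
        return {"kept": 0}
    return {
        "kept": len(counts),
        "mean": statistics.mean(counts),
        "best": min(counts),
        "worst": max(counts),
        "std": statistics.pstdev(counts),  # population form; an assumption
    }


def make_halving_run(run_index):
    """Toy stand-in for an optimizer: the error halves at every step."""
    state = {"err": 1.0}

    def step():
        state["err"] *= 0.5

    def error():
        return state["err"]

    return step, error


print(mlit(make_halving_run, n_mc=5))
```

In a real experiment, `make_run` would build a fresh optimizer instance per Monte Carlo run and expose its single-iteration update and current absolute error through the two callables.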
The final results demonstrate the best performance of the proposed VW-GWO algorithm on unimodal benchmark functions compared with the other algorithms involved. The data in Tables 6-8 are obtained under the same conditions; the only difference is that Table 6 lists the data obtained when the algorithms are applied to a multimodal benchmark function with a symmetrical domain, whereas Table 8 lists the data obtained when the algorithms are applied to a multimodal benchmark function with an unsymmetrical domain. The same conclusion can be drawn.
Note that, in this experiment, the dimensions of the benchmark functions are varied from 2 to 10 and 30. The final results also show that if the dimensions of the benchmark functions are raised, the MLIT values increase dramatically. This phenomenon raises the doubt whether the proposed algorithm still performs the best and is capable of solving high-dimensional problems. Tables 6-8 show that the larger the dimensions are, the larger the MLIT values needed to meet the experiment constraints. However, as described in the first part, optimization algorithms are mostly developed to solve problems with huge numbers of variables, massive complexity, or no analytical solutions. Thus, the high-dimensional availability is of great interest. Like the standard GWO algorithm, the proposed VW-GWO algorithm should also be able to solve large-scale problems. An experiment with dim = 200 is carried out to find the capability of the algorithms in solving high-dimensional problems. For simplicity, three classical benchmark functions, F4 (Schwefel's problem 2.21 function), F8 (exponential function), and F11 (Zakharov function), are used to demonstrate the results, as listed in Table 9. The final results of 100 MC experiments are evaluated and counted, and each time the search procedure is iterated a hundred times. The data listed in Table 9 show that the GWO algorithms converge quickly and that the proposed algorithm is the best at solving large-scale problems.

High-Dimensional Availability Test.
To test its capability even further, we also carry out an experiment to verify its capability of solving some benchmark functions in high dimensions, with the restrictions MC = 100 and MLIT = 500. In this experiment, we change the dimensions from 100 to 1000, and the final results, which are also the absolute errors, are shown in Figure 4. We can see from Figure 4 that the VW-GWO is capable of solving high-dimensional problems.

Conclusions
In this paper, an improved grey wolf optimization (GWO) algorithm with variable weights (VW-GWO algorithm) is proposed. A hypothesis is made that the social hierarchy of the packs is also functional in their searching positions, and variable weights are then introduced to their searching process. To reduce the probability of being trapped in local optima, a governing equation of the controlling parameter is introduced so that it declines exponentially from the maximum. Finally, three types of experiments are carried out to verify the merits of the proposed VW-GWO algorithm. Comparisons are made with the original GWO, the ALO, the PSO algorithm, and the BA.
All the selected experiment results show that the proposed VW-GWO algorithm works better than the others under different conditions. Varying the dimensions does not change its first place among them, and the proposed VW-GWO algorithm is expected to be a good choice for solving large-scale problems.
However, the proposed improvements mainly focus on the ability to converge. They lead to faster convergence and wide applications, but the algorithm is not found to be capable for all the benchmark functions. Further work is needed to explain the reasons mathematically. Other initialization algorithms might be needed to let the initial swarm individuals spread all through the domain, and new searching rules for when the individuals are in the basins would be another hot spot of future work.