A-DVM: A Self-Adaptive Variable Matrix Decision Variable Selection Scheme for Multimodal Problems

Artificial Bee Colony (ABC) is a Swarm Intelligence optimization algorithm well known for its versatility. Its selection of decision variables to update is purely stochastic, which raises several issues for the local search capability of the ABC. To address these issues, a self-adaptive decision variable selection mechanism is proposed with the goal of balancing the degree of exploration and exploitation throughout the execution of the algorithm. This selection, named Adaptive Decision Variable Matrix (A-DVM), represents both stochastic and deterministic parameter selection in a binary matrix and regulates how much each selection is employed based on an estimate of the sparsity of the solutions in the search space. The influence of the proposed approach on the performance and robustness of the original algorithm is validated by experiments on 15 highly multimodal benchmark optimization problems. Numerical comparison on those problems is made against the ABC and its variants and prominent population-based algorithms (e.g., Particle Swarm Optimization and Differential Evolution). Results show an improvement in the performance of the algorithms equipped with the A-DVM on the most challenging instances.


Introduction
Artificial Bee Colony (ABC) is a swarm intelligence (SI) heuristic for optimization problems inspired by the foraging behavior of honeybees. It was initially designed to solve box-constrained continuous problems [1]. The algorithm consists of three main steps (employed bees, onlooker bees and scout bees) that perform local and global search. In the original implementation of the ABC, at the employed and onlooker bees steps, a single component of each solution is chosen to be updated by a position update rule.
Following its conception, improvements to the search capabilities of the original ABC were proposed by many researchers. The great majority of these propositions centered around changes to the initialization of solutions in the solution space, the update procedure of the first two phases, and the selection method in the onlooker phase [2]. Despite differences between the variants, they all share a common trait: the solution update rule chooses from one up to n decision variables with equal probability under a uniform random distribution. This may allow for better exploration of the search space and prevent solutions from collapsing into the same subspace at later iterations. However, this design choice may compromise the consistency and convergence of the algorithm.

Initialization
If no information about the solution space is provided, each x_i is sampled from a uniform distribution over the feasible interval [x̲_j, x̄_j] of each decision variable x_ij, i.e.,

x_ij = x̲_j + U(0, 1)(x̄_j − x̲_j),  i = 1, 2, . . . , SN,  j = 1, 2, . . . , d,  (2)

where U(0, 1) denotes a uniform distribution between 0 and 1. A counter l_i = 0 that tracks unsuccessful updates is initialized for each i.
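The initialization step above can be sketched in numpy as follows. This is a minimal illustration; the function and argument names are ours, not from the paper.

```python
import numpy as np

def initialize_population(SN, d, lower, upper, rng=None):
    """Sample SN solutions uniformly inside the box [lower, upper]^d
    and reset the failure counters l_i (one per solution)."""
    rng = rng or np.random.default_rng()
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # x_ij = lower_j + U(0,1) * (upper_j - lower_j), per eq. (2)
    X = lower + rng.uniform(0.0, 1.0, size=(SN, d)) * (upper - lower)
    l = np.zeros(SN, dtype=int)  # unsuccessful-update counters
    return X, l
```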

Employed Bees Cycle
A randomly chosen component x_ij of each solution x_i ∈ X is moved by a random step size towards the j-th component of some x_k ∈ X, k ≠ i. Therefore x_i is updated into

v_i = x_i + φ(x_ij − x_kj)e_j,  (3)

where φ ∈ U(−1, 1) and e_j is the j-th fundamental vector. To verify whether the update step was successful, the value of f is evaluated and a greedy selection is performed for each i = 1, 2, . . . , SN:

x_i ← v_i if f(v_i) ≤ f(x_i), otherwise x_i is kept.  (4)

Needless to say, if (3) fails to improve x_i, then (4) flags it as a failed update.
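A compact sketch of the update rule (3) and the greedy selection (4) in numpy. Function and variable names are illustrative; `f` is any objective function to minimize.

```python
import numpy as np

def employed_bee_step(X, f_vals, f, l, rng=None):
    """One employed-bees pass: perturb a single random component of each
    solution towards a random partner, keeping the move only if it does
    not worsen f (greedy selection); l counts failed updates."""
    rng = rng or np.random.default_rng()
    SN, d = X.shape
    for i in range(SN):
        j = rng.integers(d)                               # component to move
        k = rng.choice([s for s in range(SN) if s != i])  # partner solution
        candidate = X[i].copy()
        candidate[j] += rng.uniform(-1.0, 1.0) * (X[i, j] - X[k, j])
        fc = f(candidate)
        if fc <= f_vals[i]:   # successful update: accept, reset counter
            X[i], f_vals[i], l[i] = candidate, fc, 0
        else:                 # failed update: increment counter
            l[i] += 1
    return X, f_vals, l
```

By construction, the greedy selection guarantees that no solution's objective value ever worsens during this step.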

Onlooker Bees Phase
The solution x_i ∈ X is chosen with probability p_i according to a weighted roulette selection scheme and updated using (3). This step can be thought of as enhancing local search around solutions with better objective function values. The probability p_i is determined for each solution x_i ∈ X as

p_i = F(x_i) / Σ_{k=1}^{SN} F(x_k),

where F(·) is the adjusted objective function value.
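The selection probabilities can be computed as below. The particular form of the adjusted fitness F (1/(1+f) for non-negative f, 1+|f| otherwise) is the one commonly used with the ABC; it is an assumption here, since the paper does not spell it out.

```python
import numpy as np

def onlooker_probabilities(f_vals):
    """Roulette-wheel probabilities p_i from objective values.
    F is the adjusted fitness commonly paired with the ABC:
    F = 1/(1+f) for f >= 0 and 1+|f| otherwise (assumed form)."""
    f_vals = np.asarray(f_vals, dtype=float)
    F = np.where(f_vals >= 0, 1.0 / (1.0 + f_vals), 1.0 + np.abs(f_vals))
    return F / F.sum()  # p_i = F_i / sum_k F_k
```

Lower objective values receive proportionally larger selection probabilities, which is what biases the onlooker phase towards better solutions.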

Scout Bees Phase
Let X̃ = {x_k ∈ X | l_k ≥ Lit} denote the set of solutions flagged as stagnated. If X̃ ≠ ∅, then x_w ∈ argmax{f(x) | x ∈ X̃} is chosen and resampled as a new point in R^d using (2). This prevents the algorithm from premature convergence to bad local optima and increases the amount of exploration. The parameter Lit is commonly set equal to SN · d.

ABC Variants
Due to the modular nature of the ABC, it is easy to make changes to any of the three steps of the algorithm [6]. Many modifications were proposed to address difficulties of the algorithm, such as the inclusion of memory to assist local search; efficient mechanisms to displace solutions stuck in local optima; handling of high-dimensional (d > 100) problem instances; changes to the update rule; and initialization of solutions using local information. As observed by Aydin et al. [2], the ABC variants often differ in a few core components: the initialization of the first solution set; the update step in the employed and onlooker bees phases; the computation of selection probabilities in the onlooker bees phase; and the way of displacing solutions in the scout step.
Some well-known and successful variants include the chaotic ABC of Alatas [7], which uses chaotic maps for solution initialization, and the ABCs of Akay and Karaboga [8] and Gao and Liu [9], which update multiple decision variables in a single update step. Additionally, the ABC with a modified selection scheme based on neighborhood distances by Diwold et al. [10] and the integrations of the Differential Evolution algorithm with the ABC by Xiang et al. [11] and Akay et al. [12] are well known. The reader is encouraged to consult the surveys of Karaboga et al. [6] and Sharma and Bhambu [13] for further information.

Issues Of Randomization
Population-based optimization methods usually employ randomization. By choosing step sizes, decision variables or even target solutions at random during the update steps, population-based optimization methods can "cover more ground" in the search space effortlessly. This is a key element to the success of population-based heuristics, but not without some unintended side effects.
For the sake of clarity, following [14], we refer to a neighborhood N_ε(x_k) = {x | ‖x − x_k‖_2 ≤ ε} in the classical sense of a Euclidean ball centered at a point x_k, where ‖·‖_2 is the ℓ2 norm and ε ≥ 0. Assume that a stochastic heuristic, such as the Artificial Bee Colony (ABC), runs infinitely on a problem (f, S) ∈ P, where f is the objective function, S is the feasible set, and P is a problem family. Moreover, borrowing concepts explained in [15], let X_{f,S}(ω) = {f(ω_k) | k = 1, 2, . . .} denote the infinite sequence of iterates generated by the heuristic, where ω = {ω_k | k = 1, 2, . . .} is a sequence of random numbers distributed independently from (f, S). Let X′_{f,S} and X̄_{f,S} denote the set of accumulation points and the closure of the sequence X_{f,S}(ω), respectively. Lastly, let X*_{f,S} denote the set of global optima. Clearly, no point of X*_{f,S} can be "seen" by the heuristic unless it belongs to X̄_{f,S}. In the following section, we will see how randomization affects the performance of the ABC.

An Analysis of the ABC Decision Variable Selection
Often overlooked, a common aspect of the ABC variants is that the decision variable x_ij is chosen according to the same uniform distribution, with equal probability for all j = 1, 2, . . . , d, during the Onlooker and Employed bees steps.
Let Pr(x_ij) be the probability that x_ij is chosen in (3). For each situation below, we assume that (f, S) ∈ P, X′_{f,S} ∩ X*_{f,S} = ∅, and X_{f,S}(ω) is monotonically decreasing, i.e., f(ω_1) ≥ f(ω_2) ≥ . . ..
The original ABC chooses a single x_ij each time it calls (3) during the Onlooker and Employed bees steps. We note the following issues brought about by selecting j at random.

1.
Failed update steps cause solutions to be trapped in basins of attraction: choosing the same wrong decision variable many times fails to move solutions out of basins of attraction, contributing to wasteful iterations, premature convergence, and needless flagging of solutions at the scout bees step. Let x*_w, x*_k, x*_s ∈ X′_{f,S} and suppose that x_w ∈ N_ε(x*_w). Lastly, let x_wj be a component of x_w such that a successful update moves x_w to N_ε(x*_k), while an update to x_wq for any q ≠ j moves x_w to N_ε(x*_s). Since x_w is moved in (3) one axis at a time, if x_wj is chosen, then (4) accepts x_w and l_w = 0; otherwise, l_w is incremented by 1 every time x_w is rejected by (4). If each component is chosen in (3) with equal probability, then the probability of x_wj being chosen is Pr(x_wj) = 1/d. Therefore, x_w has a probability of 1 − 1/d of moving to a basin of attraction N_ε(x*_s) similar to N_ε(x*_w), and a probability of only 1/d of moving to the more promising region N_ε(x*_k).

2.
Decision variables may never be chosen: if the problem is high-dimensional (d > 100) or the evaluation of f(x) is so expensive that only a limited number of objective function calls are allowed, there may be at least one component x_ij that is never chosen in (3). Let Pr(∼x_ij) = 1 − 1/d be the probability of x_ij not being chosen at (3). Then the probability of x_ij not being chosen by the end of MCN iterations is Pr_MCN(∼x_ij) = (1 − 1/d)^MCN. It is clear that Pr_MCN(∼x_ij) converges to 0 as the number of iterations goes to infinity. However, if d > 100 and the ABC runs only t ≈ d iterations, then Pr_MCN(∼x_ij) ≈ e^(−1) ≈ 0.37, so there is a substantial chance that x_ij is never chosen in (3).

At first glance, there would be two ways to resolve these issues: either assign non-equal probabilities to the decision variables, or choose more than one x_ij at (3) to be updated simultaneously. We disprove the effectiveness of these "quick fixes" through the following arguments.

•
Changing the choice probabilities of the decision variables to be unequal would not solve issue #2 in high-dimensional problems, because Pr_MCN(∼x_ij) still converges to 0. A sufficient measure, in this case, would be to keep previously chosen components in memory. However, this increases the complexity of the Onlooker and Employed bees phases from O(n) to O(n log n), even if the non-visited components are kept in a separate list for each solution in X in an efficient way.

•
Changing (3) to choose multiple components from x_i would not improve issue #1. Let J ⊂ {1, . . . , d} and suppose that x_ij is chosen for each j ∈ J to be updated. Update rule (3) is an affine transformation in the j-th axis along the line segment between x_ij and x_kj, i ≠ k. If |J| > 1, then |J| simultaneous affine transformations would be performed in the |J|-dimensional subspace between x_i and x_k. In terms of complexity, there would be no additional burden if the |J| decision variables were updated at once by means of a matrix product operation. However, in terms of performance, there would be no improvement, for two reasons. First, moving along many axes at once does not reduce the possibility of x_i remaining in N_ε(x*) if x_k ∈ N_ε(x*). Second, setting |J| > 1 in (3) has been shown to be inferior to |J| = 1 in later iterations, due to the coarseness of the search when most of the solutions have converged to a single accumulation point [2].
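The probability in issue #2 is easy to verify numerically. The snippet below (an illustration, not code from the paper) evaluates (1 − 1/d)^t for a fixed component.

```python
import numpy as np

def prob_never_chosen(d, t):
    """Probability that a fixed component x_ij is never selected in t
    iterations when one of d components is drawn uniformly each time."""
    return (1.0 - 1.0 / d) ** t

# For t = d the probability approaches e^-1 as d grows, so roughly a
# third of the components are expected to remain untouched after d
# iterations, while it vanishes only for very large budgets.
print(prob_never_chosen(100, 100))  # ≈ 0.366
```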
In the following section, we present a method that addresses the issues stated above: the Adaptive Decision Variable Matrix (A-DVM), a decision variable selection scheme proposed for the ABC.

A Novel Decision Variable Selection Mechanism
We propose a method for selecting decision variables efficiently without any additional memory or simultaneous updates of multiple components. The Adaptive Decision Variable Matrix (A-DVM) is an extension of the decision variable selection procedure of Mollinetti et al. [3]. It exploits the same modular nature as the Artificial Bee Colony (ABC), and can thereby be integrated into the employed and/or onlooker bees phase without interfering with any other steps of the original algorithm or any variant. To emphasize the difference between the A-DVM and the deterministic selection of Mollinetti et al. [3], we briefly explain their proposition as follows.

Fully Deterministic Decision Variable Selection
The selection scheme proposed by Mollinetti et al. [3] is inspired by Cantor's diagonalization argument, used to prove the non-existence of a bijection from the set of natural numbers to the set of real numbers [16,17]. Cantor's argument states that no column of a binary square matrix T equals the vector consisting of the complements of the diagonal elements of T. The authors extended this notion to generate new solutions x_i in the solution set X. For any given problem, the deterministic decision variable selection arranges the solution set X into a d × SN matrix A whose columns are the solutions. If A is a square matrix, the entries on the main diagonal are stored in a vector c = (a_11, a_22, . . . , a_dd) and undergo the update step. In general, the higher the number of solutions, the better the exploration of the search, and so SN > d usually holds. If A is wide, then the vector c consists of the entries on the main diagonal and on the superdiagonals of A offset d units to the right. For instance, if A is a 2 × 6 matrix, then c will be c = (a_11, a_22, a_13, a_24, a_15, a_26). The vector c allows (3) to be performed simultaneously for all columns of A by means of a simple vector operation:

c′ = c + Φ ⊙ (c − z),  (8)

where ⊙ is the Hadamard product, Φ is an SN-dimensional column vector of values sampled from U(−1, 1), z = (z_1, . . . , z_SN) collects the components of partner solutions corresponding to c, and ψ(·) is a greedy acceptance function analogous to (4). In the matrix A of the above example, if f(x_2 + (c′_2 − a_22)e_2) ≤ f(x_2), then c = (c′_1, c′_2, a_13, a_24, a_15, a_26); thus, entries a_11 and a_22 are replaced with c′_1, c′_2 in A, and the corresponding values of f are updated.
Lastly, a safeguard step is performed so that every decision variable of each candidate solution is updated at least once before the algorithm terminates. The last column x_SN of A is moved to the first position and the remaining columns are shifted one position to the right. Referring to the example, the columns of A are now ordered x_6, x_1, . . . , x_5. This step ensures that every decision variable is updated by (8) every d iterations.
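The diagonal extraction and the safeguard rotation can be sketched in numpy as follows (helper names are ours; the wrapping index pattern reproduces the 2 × 6 example above).

```python
import numpy as np

def diagonal_indices(d, SN):
    """Row index extracted from each column of the d x SN matrix A:
    the main diagonal, wrapping every d columns (i.e., superdiagonals
    offset d units to the right)."""
    return np.arange(SN) % d

def rotate_columns(A):
    """Safeguard step: the last column becomes the first and the
    remaining columns shift one position to the right."""
    return np.roll(A, 1, axis=1)
```

For a 2 × 6 matrix, `diagonal_indices(2, 6)` yields rows (0, 1, 0, 1, 0, 1), i.e., the entries a_11, a_22, a_13, a_24, a_15, a_26 of c.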
The results in Mollinetti et al. [3] indicate that eliminating the randomness in the choice of the decision variable in (3) boosted the performance of the original ABC on multimodal problems of up to 30 decision variables. However, it was observed that the diversity of solutions was compromised because local search was emphasized over global search. From this result, we suppose that the bias towards local search brought by the fully deterministic parameter selection has not solved issue #1; if anything, the fully deterministic selection made it worse. Therefore, reintroducing a small degree of randomness while guaranteeing that every solution is chosen at some iteration is a step in the right direction to restore global search.

A Self-Adaptive Decision Variable Selection Procedure (A-DVM)
Let us change the focus to a partially deterministic selection, and reintroduce an adaptive degree of randomness to the selection process based on the "spread" of solutions throughout the search space.
The variables x_ij are chosen via a binary decision matrix. The goal of the A-DVM is not only to provide an acceptable solution to the issues discussed in Section 3, but also to improve the overall performance of state-of-the-art ABCs on multimodal and high-dimensional problems of the form of (1).
The main piece of the A-DVM is the d × SN binary matrix P_am that represents which x_ij has been chosen to be updated by (3) or (8). The matrix P_am is a composition of two matrices: P_r, a binary matrix with a single 1 in each column, whose row is determined randomly according to a uniform distribution; and P_d, a matrix with 0 or 2 in each entry, generated by the fully deterministic scheme of Mollinetti et al. [3]. The matrix P_am is the result of composing P_r into P_d. That means some solutions x_i ∈ X have their j-th component randomly selected when updated by (3) or (8), while the rest have their j-th component chosen by the fully deterministic scheme. We write P_am = βP_r ⊕ αP_d when β% of the columns of P_am come from P_r and the remaining α(= 1 − β)% come from P_d. The degree to which P_d is favored over P_r is represented by the coefficient α, which is iteratively adjusted to maintain a healthy diversity of solutions while balancing local and global search:

α = K_1 + K_2 ∆,  (10)

where ∆ ∈ [0, 1] is a measure of the dispersion of the population at the current iteration, and K_1 and K_2 are scaling parameters set to 0.3 and 0.7 in accordance with McGinley et al. [18]. Values of α close to 1 signify high population diversity and activate exploitation by the deterministic selection.
On the other hand, values close to 0 boost exploration by the random selection. Because solutions in population-based algorithms tend to concentrate around accumulation points x ∈ X′_{f,S} after a considerable number of iterations [14], α is increased by a growth function ρ after t iterations in order to intensify local search around x, where the growth rate γ is set to 0.01. An acceptable value for the threshold λ_t that determines t was empirically verified to be 0.1.
To ensure that every decision variable is chosen at least once every d iterations, we introduce a history H ∈ {0, 1}^d that stores which columns of P_d were put into P_am and gives the remaining columns of P_d a chance to be contained in P_am at the next iteration. We also enforce a bound on the fraction of iterations in which solutions are chosen by the fully deterministic selection: no more than 3/5 K_1 and no less than 1/2 K_2 (refer to (10)). When the entries of H are all ones, H is reinitialized and the whole process runs again. The overall steps of the A-DVM are outlined in Algorithm 2.
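The composition of P_am and the α adjustment can be sketched as below. This is a simplified illustration: the function names are ours, and the affine α update is one plausible reading of (10) consistent with K_1 = 0.3, K_2 = 0.7 and α ∈ [0.3, 1].

```python
import numpy as np

def adapt_alpha(delta, K1=0.3, K2=0.7):
    """Map the dispersion estimate delta in [0, 1] onto [K1, K1 + K2]
    (assumed affine form): high delta -> alpha near 1, which favors
    the deterministic selection."""
    return K1 + K2 * delta

def compose_P_am(P_r, P_d, alpha, rng=None):
    """Build P_am by drawing round(alpha * SN) columns from the
    deterministic matrix P_d and taking the remaining columns
    from the random matrix P_r."""
    rng = rng or np.random.default_rng()
    d, SN = P_d.shape
    det_cols = rng.choice(SN, size=int(round(alpha * SN)), replace=False)
    P_am = P_r.copy()
    P_am[:, det_cols] = P_d[:, det_cols]
    return P_am
```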

Algorithm 2:
Steps of the A-DVM

The ∆ Dispersion Estimate
Estimating the dispersion of the solutions in the search space is especially effective for population-based algorithms dealing with multimodal or high-dimensional problems. Measuring how far apart the solutions in X are from each other is very helpful to guide them towards accumulation points or free them from local optima. Significant contributions on this subject can be found in Ursem [19] and Back et al. [20], who introduced the Sparse Population Diversity (SPD) metric, a method for estimating the variation of the solution set by measuring the distance of each solution to the centroid. McGinley et al. [18] proposed the Healthy Population Diversity (HPD), an extension of the SPD that introduces the concept of individual contribution to the computation of the centroid.
Metrics like SPD and HPD may accurately and inexpensively identify differences between the solutions in X by measuring the distances to each x_i. However, this kind of measurement does not take into account how the solutions are distributed in the search space, which is problematic since the same measurement values from SPD and HPD may indicate different search-space coverage by the solutions of X. Because of that, we employ the ∆ dispersion measure introduced by Morrison [4], initially proposed for Evolutionary Algorithms with binary solution encoding, and adapt it to continuous problems of the form of (1). The computation of ∆ combines ∆_1 = 0.75 − S_1 and ∆_2 = 1 − S_2, with S = S_1 + S_2. The values of S_1 and S_2 are obtained by measuring the moment of inertia of the solutions about the solution centroid. Denote by P = |X| the number of solutions x_i ∈ X. The centroid cr_j of the j-th components and the moment of inertia I_j about cr_j are

cr_j = (1/P) Σ_{i=1}^{P} x_ij,   I_j = Σ_{i=1}^{P} (x_ij − cr_j)²,  j = 1, 2, . . . , d.

The first measure S_1 involves a quantitative assessment of the solutions around the distribution centroid; assuming the distribution around the centroid to be uniform, S_1 compares the measured inertia against the inertia I_Uo of a uniform distribution. Measure ∆_2 indicates how much the calculation of ∆_1 is misleading when the distribution is not uniform in the search space, since ∆_1 only verifies non-uniformity along the principal diagonal of the search space. The second measure S_2 uses a characteristic function that returns 0 or 1 according to whether a solution belongs to c− or c+, respectively, where φ_j is the range [x̲_j, x̄_j] normalized so that ∏_{j=1}^{d} φ_j = 1 for a d-dimensional unit volume.
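The centroid and inertia computations above are straightforward in numpy (function name is ours; rows of X are solutions):

```python
import numpy as np

def centroid_inertia(X):
    """Per-dimension centroid cr_j and moment of inertia I_j of the
    solution set X (one solution per row), as used by the dispersion
    estimate of Morrison [4]."""
    cr = X.mean(axis=0)                  # cr_j = (1/P) sum_i x_ij
    I = ((X - cr) ** 2).sum(axis=0)      # I_j = sum_i (x_ij - cr_j)^2
    return cr, I
```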

Remarks On Complexity
As for complexity, SN function evaluations are performed in each of the employed and onlooker bees phases, so the addition of the A-DVM preserves the same 2n + 1 function evaluations per iteration as the classical ABC. The effort to compute the sum of the moments of inertia and the ∆ dispersion is proportional to the solution set size SN [4]. Updating solutions one by one in a loop via (3) during the employed or onlooker bees phase requires O(n) time for the size SN of the solution set X. On the other hand, the offline update (8) can be done in linear time thanks to the vector operation; we recommend (8) for parallel versions of the ABC, when MCN is large, or when the evaluation of f(x) is expensive.
Regarding the lookup table H, it is verified in O(n) time which columns of P_d were not chosen to be part of P_am. Lastly, regarding the binary matrix P_d: because the deterministic parameter selection extracts the diagonal of the solution set X, it is recommended to set SN ≥ d to ensure that every decision variable of each solution is chosen within at most d iterations.

Experiment and Results
A numerical experiment was carried out to answer the following research question: "Does incorporating the Adaptive Decision Variable Matrix (A-DVM) improve the overall performance of the Artificial Bee Colony (ABC) and its variants on multimodal problems?" To answer that question, we chose 15 instances of (1), each designed to validate the capability of metaheuristics to handle multimodal and non-smooth objective functions. The instances rank among the top 30 hardest continuous optimization functions in the Global Optimization Benchmarks suite [21].
The number of variables ranges from 2 to 30 to test the robustness of the solvers when dealing with many as well as few variables. Each algorithm was executed 30 times, with the runs interleaved in random order to avoid bias from machine load. The number of variables, the box-constraint range [l_j, u_j] and the global optimum of each instance are listed in Table 1.
Testing involves the incorporation of the A-DVM into the onlooker and employed bees phases of the following versions of the ABC: the original ABC from Karaboga [22] (ABC+A-DVM), two versions of the global best guided ABC (GBestABC) from Gao et al. [23] (GBESTABC+A-DVM, GBESTABC2+A-DVM) and two versions of the ABC-X from [2] for multimodal problems (ABC-XM1+A-DVM, ABC-XM5+A-DVM). The original counterparts were also used for the baseline (ABC, GBESTABC, GBESTABC2, ABC-XM1, ABC-XM5), together with the modified ABC for multidimensional functions (MABC) from Akay and Karaboga [8] and its version with the A-DVM (MABC+A-DVM). Comparison is not limited to ABCs and their variants: popular population-based algorithms, such as the Particle Swarm Optimization of Kennedy and Eberhart [24], the Evolutionary Particle Swarm Optimization of Miranda and Fonseca [25] and Differential Evolution (DE) [26], were also included in the experiment.
The stopping criterion for each algorithm was 10^5 function evaluations (FEs), or a difference between the best value found so far and the global optimum f(x*) of less than 10^−8. The population size was common to all algorithms and fixed at 30. For PSO, the inertia factor (w_1) was set to 0.6 and both the cognitive and social parameters (w_2, w_3) to 1.8. For Differential Evolution (DE) [26] with the best1bin strategy, F was 0.5 and CR was 0.9. For each version of the ABC, Lit = SN · d. For MABC, MR, SF and m were 0.5, 0.7, and 2.5% of the maximum FEs, respectively. For ABC-Xm1, Lit = 1.06 · d with a maximum population of 66 and a minimum of 15; for ABC-Xm5, Lit = 0.83 · d with a maximum population of 78 and a minimum of 17. Lastly, the parameters γ and λ_t of the A-DVM were set to 0.01 and 0.1, respectively.
The experiment was conducted on a machine with the following hardware configuration: Intel Core i7-6700 "Skylake" 3.4 GHz CPU and 16 GB DDR4-3200 RAM clocked at 3000 MHz, running Ubuntu 18.04. All algorithms were written in the Python 3 programming language. Floating point operations were handled by the numpy package, version 1.19.1.

Tables 2-7 show the computational results obtained from this experiment. The statistics used for comparison are the mean, standard deviation, median, and best-worst results obtained from 30 runs with distinct random seeds, shown in Tables 6 and 7. Statistical significance between pairs is verified by the Mann-Whitney U-test for non-parametric data, with the significance level set to 0.05, as shown in Tables 2-5. Entries where p > 0.05 denote no statistical difference between the algorithms. For better legibility, the precision of decimals is set to 5 digits and values lower than 10^−6 are rounded to 0. Plots of the behavior of each algorithm are shown in Figures 1-3; each line represents the mean of the best solution over all executions for each function evaluation call. All plots are log-scaled for better legibility. If the performance of an algorithm on a particular instance is statistically significant, its p-value in the U-test is less than 0.05 in the pairwise comparison against all other algorithms. The bold numbers in the tables indicate the least value for that particular statistic and instance.

First, we discuss the Rosenbrock, Whitley and Zimmerman instances, where the A-DVM resulted in overall worse performance than all the original counterparts. The A-DVM was indeed able to guide solutions towards a valley, but a thorough local search mechanism was lacking due to the parabolic surface of the Rosenbrock function [27]. The same behavior is observed in the Whitley and Zimmerman functions, which share the same property as the Rosenbrock instance.
The poor results on these functions imply a failure of the A-DVM to properly address issues #1 and #2 discussed in Section 3. Additionally, we can relate this case to the no-free-lunch theorem of Wolpert [28], which states that no algorithm can be strictly better than all others on every problem instance. Inferior results of the A-DVM are also seen for the Rastrigin function in the ABC-X variants. The cause of such behavior could be the intensification of the local search mechanism, which forced solutions to stay far from the local attractors of the surfaces of these functions.
Strong evidence of the robustness of the A-DVM against strongly multimodal surfaces was found in the Damavandi, DeVilliersGlasser02 and CrossLegTable instances, ranked as the three hardest functions in the benchmark suite [21]. These functions feature large basins of attraction around bad local optima, the number of which is directly proportional to the problem dimensionality. There are two possible causes explaining why the A-DVM versions were not superior to all other versions in these particular instances. First, a small number of dimensions means that a square matrix can be built, providing a thorough exploration of the search space. Second, exploration in the early stages allowed solutions to escape from the basins of attraction.
Evidence that the A-DVM improved the search process in comparison to the counterparts without the scheme can be seen in the Bukin06, SineEnvelope, CrownedCross and Schwefel06 instances. Although the ABCs with A-DVM were not the best solvers, their robustness was statistically significant in comparison to the versions without the A-DVM. Lastly, for the Cola, Griewank, XinSheYang03 and Trefethen instances, no statistical significance was found to corroborate that the incorporation of the A-DVM improved or worsened the performance of the original algorithms.

Conclusions
In this paper, a decision variable selection scheme named Adaptive Decision Variable Matrix (A-DVM) was proposed for incorporation into the Artificial Bee Colony (ABC) algorithm. The A-DVM can be incorporated in the employed and/or onlooker bees phases and can be used with any variant of the ABC. It attempts to balance exploration and exploitation throughout the execution of the algorithm by constructing an augmented binary matrix that represents the choice of components of solutions in the solution set. This matrix is composed of a deterministic selection binary matrix that chooses matrix diagonals according to the proposal in [3], and another binary matrix whose components are selected by a random uniform distribution. The number of columns to be used from the deterministic matrix is determined by a self-adaptive parameter based on the ∆ value, a measure of the sparsity of the current solution set in the search space. Introducing a lookup table of the chosen columns of P_d guarantees that every solution takes part in the deterministic update step at least once before termination.
Effects of the A-DVM on the performance of the ABC were verified by a numerical experiment including several versions of the ABC with the A-DVM and their original counterparts. Representative heuristics such as Particle Swarm Optimization (PSO), Evolutionary Particle Swarm Optimization (EPSO) and Differential Evolution (DE) were included in the experiment to provide a baseline for the results. For the sake of brevity and to narrow the scope of this work, other prominent Swarm Intelligence (SI) algorithms suited to the multimodal family of problems, such as the monarch butterfly optimization (MBO) [29], earthworm optimization algorithm (EWA) [30], elephant herding optimization (EHO) [31], and moth search (MS) algorithm [32], were not part of the experiment.
The results indicate that the A-DVM enhances the ability of the ABC to adapt to highly multimodal functions. However, the elimination of the full global search of the stochastic selection resulted in solutions not converging towards accumulation points that are located in basins, as seen in some instances where the A-DVM performed poorly. Integration with ABC variants with smart restart procedures in the scout bees phase may be a possible direction to improve this issue.
Future work includes an in-depth sensitivity analysis, integration of the selection mechanism into state-of-the-art ABCs used in optimization competitions, and testing on large-scale problems, mechanical design, and power systems to further investigate the performance of the selection. Moreover, a thorough comparative study of multimodal problems using only SI algorithms, including the aforementioned examples, is due. Another research direction is applying the proposed method to weight tuning of shallow networks [33,34]; such networks may benefit from the proposed optimization mechanism since it tackles small-sample-size problems featuring rough objective function landscapes.