Nested Markov chain hyper-heuristic (NMHH): a hybrid hyper-heuristic framework for single-objective continuous problems

This article introduces a new hybrid hyper-heuristic framework for single-objective continuous optimization problems. The approach employs a nested Markov chain on the base level to search for the best-performing operators and their sequences, and simulated annealing on the hyperlevel to evolve the chain and the operator parameters. The novelty of the approach lies in the upper level of the Markov chain, which expresses the hybridization of global and local search operators, and in the lower level, which automatically selects the best-performing operator sequences for the problem. Numerical experiments on well-known benchmark functions, together with comparisons against another hyper-heuristic framework and six state-of-the-art metaheuristics, show the effectiveness of the proposed approach.


INTRODUCTION
Optimization is an essential task not only in computer science but also in other research fields. Most processes can be described as optimization problems, where the best solution needs to be found from the set of all feasible solutions.
The literature contains many optimization algorithms and heuristics: some are inspired by nature or physics, others are iterative or hybrid. Finding the appropriate approach is often problem-specific and can be tedious. Population-based methods may approximate the global optimum, but at a high computational cost. Iterative methods converge to a local minimum faster but depend heavily on the initial solution.
Recently, hyper-heuristic algorithms have attracted considerable interest, as they provide an automatic way of selecting or generating heuristics for unseen problems (see Ryser-Welch & Miller, 2014 for a review). These approaches can be thought of as the optimization of the optimization process. The selection or generation of heuristics yields a problem-specific optimization algorithm that in many cases performs better than a single standard heuristic.
Although several hyper-heuristic frameworks have been proposed, most of them address specific combinatorial optimization problems; only a few are designed to solve continuous numerical optimization problems. A research gap exists regarding hyper-heuristic frameworks that balance the exploration-exploitation rate and tune the operator parameters in an online fashion. Another gap is the lack of generality of the proposed frameworks: the majority incorporate domain-specific knowledge about a particular problem (for example, Guerriero & Saccomanno, 2023).
The goal of this study is to propose a new hyper-heuristic framework, to present its advantages for single-objective continuous problems, and to compare it with a well-known hyper-heuristic and six state-of-the-art metaheuristics. The novelty of our approach consists of introducing a nested Markov chain on the base level to search for the best-performing heuristic operators and their sequences. Simulated annealing is used on the hyperlevel to evolve the chain and the operator parameters. In our approach, the upper level of the Markov chain expresses the hybridization of global and local search operators, and the lower level automatically selects the best-performing operator sequences for the problem. The general formulation of the model allows the use of other arbitrary operators as well. Our model can achieve good optimization results without the user having deep domain (problem-specific) knowledge. The limitations of our approach are similar to those of other hyper-heuristics: finding the right operator configurations and balance can require many function evaluations.
The remainder of the article is organized as follows: the ''Related Work'' section describes the related work, the ''Proposed Model'' section presents the proposed framework, and the ''Numerical Experiments'' section describes the numerical experiments conducted. The article ends with conclusions and further research directions.

RELATED WORK
The literature proposes several hyper-heuristic classifications. Two main categories appear in Burke et al. (2010): selection-based and generation-based. Selection-based approaches pick the best-performing heuristics from an existing catalogue, while generation-based approaches design new algorithms from existing components and create problem-specific ones. Four categories of heuristic selection are mentioned in Chakhlevitch & Cowling (2008). Metaheuristic-based approaches employ genetic algorithms (Cowling, Kendall & Soubeiga, 2001), simulated annealing (Bai & Kendall, 2005), tabu search (Kendall & Hussin, 2005), or some other metaheuristic for the selection process. Bándi & Gaskó (2023) use simulated annealing on the hyperlevel and, on the base level, a genetic algorithm, a differential evolution algorithm, and a grey wolf optimizer. Random approaches employ uniform selection (Cowling & Chakhlevitch, 2003). Other approaches use reinforcement learning for adaptive selection. McClymont & Keedwell (2011) adapt a Markov chain that models heuristic sequences. Karapetyan, Punnen & Parkes (2017) use the Conditional Markov Chain Search (CMCS) algorithm for the bipartite Boolean quadratic programming problem (BBQP). Greedy selection methods preliminarily evaluate all heuristics and choose the one that performs best at each step (Cowling, Kendall & Soubeiga, 2001). Oteiza, Ardenghi & Brignole (2021) present a parallel cooperative hyper-heuristic optimizer (PCHO) for solving systems of nonlinear algebraic equations (with equality and inequality constraints); it uses a master-worker architecture with three algorithms on the worker level: GA, SA, and PSO.
Since our proposed hyper-heuristic framework uses a hybridization of global and local search, we will present existing approaches in this category.
The use of local search algorithms is a straightforward direction in the study of hyper-heuristics but was used mainly for combinatorial optimization problems. Burke, Kendall & Soubeiga (2003) incorporate tabu search in hyper-heuristics for the timetabling problem. Turky et al. (2020) propose a two-stage hyper-heuristic to control the local search and its operators; the framework is used for two combinatorial optimization problems. Hsiao, Chiang & Fu (2012) propose a hyper-heuristic based on variable neighbourhood search, where local search is used and tested for four combinatorial optimization problems. Soria-Alcaraz et al. (2016) design a hyper-heuristic based on an iterated local search algorithm for a course timetabling problem. Additionally, the reviews (Ryser-Welch & Miller, 2014; Drake et al., 2020) present several hyper-heuristic frameworks, such as HyFlex (hyper-heuristics flexible framework) for combinatorial optimization problems (Ochoa et al., 2012), or Hyperion (Swan, Özcan & Kendall, 2011) for the Boolean satisfiability problem.
In terms of continuous optimization, Oliva et al. (2022) propose the HHBNO framework, a hyper-heuristic approach based on Bayesian learning for single-objective continuous problems. The framework evolves heuristic sequences by learning their interdependencies and estimates the best-performing heuristic distributions. In Tapia-Avitia et al. (2022), an artificial neural network is trained to identify patterns that can be used to learn the best-performing heuristic sequences. Cruz-Duarte et al. (2021) propose a new framework for continuous optimization problems, where new sequences are designed with the help of different search operators.
Our proposed hyper-heuristic framework incorporates local and global search algorithms, which were previously used mostly for combinatorial optimization problems. Another advantage of the proposed method is the general structure of the base level, which can easily be extended with other algorithms. At the same time, the framework remains general: no domain-specific knowledge is needed for the optimization process.

PROPOSED MODEL
The structure of the proposed hyper-heuristic framework is presented in Fig. 1. The approach is based on two levels. The base level optimizes the problem, starting with a population of candidate solutions and a limited number of function evaluations for each candidate. The hyperlevel guides and improves the base level: it searches the algorithm space by finding the best-performing operator sequences and operator parameters via simulated annealing. The base level performs the optimization according to the operator sequence modelled by a nested Markov chain. The first layer models the transitions between perturb, selection, and refinement operators, and the second layer models the operator sequences of each category.

Figure 1 The structure of the hyper-heuristic framework. The base level is used to optimize the problem using a given number of function evaluations. On this level, a genetic algorithm (GA) and a differential evolution (DE) operator are used in the perturb category, an elitist selector (ES) in the select category, and the gradient descent (GD) and limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) operators in the refiner category. The hyperlevel is used to guide and improve the base level. Full-size DOI: 10.7717/peerjcs.1785/fig-1
Our approach contains a genetic algorithm (GA) and a differential evolution (DE) operator in the perturb category and an elitist selector (ES) in the select category. It incorporates a refiner category containing the gradient descent (GD) and limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) operators that perform the local search. In this way, the base level is parameterized so that it can express a continuum between exploration and exploitation.

Formal definition. A more formal definition of the parameterization of the operator sequence that the base level applies during optimization can be given as follows. Let C denote the set of operator categories in the base level, and c ∈ C an operator category. Let (c_i) denote the sequence of categories modeled by the Markov chain parameterized by the initial distribution π_C and transition matrix P_C, and let (o^c_j) denote the sequence of operators modeled by the Markov chain associated with category c, parameterized by the initial distribution π_c and transition matrix P_c.
Then the interleaved sequence (o^{c_i}_j) is called the operator sequence of the base level modeled by the nested Markov chain.
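The nested sampling scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function name and the dictionary-based representation of π_C, P_C, π_c, and P_c are our assumptions.

```python
import random

def sample_operator_sequence(pi_C, P_C, pi_c, P_c, length, seed=0):
    """Sample a base-level operator sequence from a nested Markov chain.

    pi_C / P_C: initial distribution and transition matrix over categories.
    pi_c / P_c: per-category initial distributions and transition matrices
    over that category's operators (dicts keyed by category name).
    """
    rng = random.Random(seed)

    def draw(dist):
        # Draw an index from a discrete distribution given as a list of probabilities.
        r, acc = rng.random(), 0.0
        for i, p in enumerate(dist):
            acc += p
            if r < acc:
                return i
        return len(dist) - 1

    categories = list(pi_C)                  # e.g. ["perturb", "select", "refine"]
    cat = categories[draw(list(pi_C.values()))]
    op_state = {c: None for c in categories}  # current state of each lower chain
    sequence = []
    for _ in range(length):
        ops = list(pi_c[cat])
        if op_state[cat] is None:            # first visit: use initial distribution
            op_state[cat] = ops[draw(list(pi_c[cat].values()))]
        else:                                # later visits: follow the lower chain
            row = P_c[cat][op_state[cat]]
            op_state[cat] = ops[draw([row[o] for o in ops])]
        sequence.append((cat, op_state[cat]))
        row = P_C[cat]                       # upper-layer category transition
        cat = categories[draw([row[c] for c in categories])]
    return sequence
```

With deterministic (0/1) transition probabilities the chain alternates between perturbation and selection exactly as the transition matrices dictate, which makes the interleaving of the two layers easy to inspect.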
The set of all operator parameters associated with the operators in category c is denoted by β_c. The hyperlevel searches the sequence and parameter space via simulated annealing with linear multiplicative cooling for the best-performing base-level configuration.
Hyper-heuristic optimization algorithm. At each step, a statistically significant number of base-level evaluations is performed, and the performance metric is the median plus interquartile range of the costs. The next step in the design space is taken by perturbing the previous point with normally distributed noise scaled according to the parameter bounds. The advantage of this formalization lies in the expressiveness of the base level, as it allows the selection and combination of sets of operators that have different roles (exploration, exploitation, selection).
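The hyper-level loop just described can be sketched as follows. The cooling law T_k = T_0/(1 + αk) follows the linear multiplicative schedule named in the text; the clamping of proposals to the bounds and the default parameter values are our assumptions, not details taken from the paper.

```python
import math
import random

def simulated_annealing(evaluate, lower, upper, steps=100, t0=10000.0,
                        alpha=1.0, sigma=0.1, seed=0):
    """Hyper-level search sketch with linear multiplicative cooling.

    Candidates are proposed by adding Gaussian noise scaled to the width of
    each parameter's bounds, mirroring the perturbation described above."""
    rng = random.Random(seed)
    n = len(lower)
    x = [rng.uniform(lower[i], upper[i]) for i in range(n)]
    fx = evaluate(x)
    best, fbest = list(x), fx
    for k in range(steps):
        t = t0 / (1.0 + alpha * k)               # T_k = T0 / (1 + alpha * k)
        # Perturb by normal noise scaled to each parameter range, clamp to bounds.
        y = [min(upper[i], max(lower[i],
             x[i] + rng.gauss(0.0, sigma * (upper[i] - lower[i]))))
             for i in range(n)]
        fy = evaluate(y)
        # Metropolis acceptance: always accept improvements, sometimes worse moves.
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = list(x), fx
    return best, fbest
```

On a simple sphere function the loop steadily improves the best-seen point while the high initial temperature keeps early exploration broad.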
The model can perform well in high-dimensional settings, as it can iteratively find the equilibria between exploration and exploitation operators. The random initialization of the hyper-heuristic is presented in Algorithm 1. The hyper-heuristic search process using simulated annealing and the determination of the next simulated annealing step are detailed in Algorithm 2 and Algorithm 3. Algorithm 4 shows the base-level optimization procedure that is modelled by the Markov chain and operator parameters.

Operators used. The GA operator is parameterized to express both arithmetic and one-point crossover; the mutation is carried out by adding normally distributed noise, as detailed in Algorithm 5. The DE/rand/1/bin scheme is used for the differential evolution operator; it is parameterized by the crossover rate and scaling factor. The local search operators are parameterized by the initial step size and the number of iterations performed. The LBFGS operator also exposes the c_1, c_2 parameters that control the step length in the line search phase. The selection operator uses the elitist strategy.

Population evolution. All operators in the perturb category generate a new population of candidates. The operators in the refine category perform an iterated local search starting from these candidate solutions. The elitist selection operator then selects the best-performing points from the old and new populations to form the next generation. Perturbed points that land in the attraction basin of a better solution are iterated closer to that attractor by the refiner operators and are then kept in the next generation by the ES operator.
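The population-evolution step described above (perturb, refine, elitist select) can be illustrated with a minimal sketch; the function names and the scalar encoding of candidates are ours.

```python
def evolve_generation(population, f, perturb, refine, keep):
    """One base-level generation sketch: perturb -> refine -> elitist select.

    perturb(x) produces a new candidate (exploration); refine(x, f) performs
    local search on it (exploitation); the elitist step keeps the `keep` best
    points from the merged old and new populations."""
    candidates = [perturb(x) for x in population]     # exploration
    candidates = [refine(x, f) for x in candidates]   # local search on offspring
    merged = population + candidates
    merged.sort(key=f)                                # elitist selection (ES)
    return merged[:keep]
```

For example, halving each candidate and then taking one gradient step on f(x) = x^2 produces refined offspring that displace the worse parents in the next generation.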

NUMERICAL EXPERIMENTS
The results of the numerical experiments were compared with those of state-of-the-art metaheuristics and of another recent hyper-heuristic that incorporates several state-of-the-art metaheuristic operators.

Benchmarks
For the numerical experiments, we used six well-known continuous benchmark functions: Rastrigin, Rosenbrock, Styblinski Tang, Schwefel 2.23, Trid, and Qing. The basic properties of the test functions are presented in Table 1. The dimensionality, convexity, separability, and multimodality of the functions were varied to assess performance in different settings.

Parameter tuning
The performance of the simulated annealing algorithm within the hyperlevel is sensitive to the initial temperature. The performance of the process was assessed for the Rosenbrock function in various dimensions with varying initial temperatures and iteration counts. Figure 2 depicts these results. The plots show that a higher initial temperature (10,000) yields improved, more robust results. Detailed results are presented in the Supplemental Files.

The comparison hyper-heuristic, CUSTOMHyS (Cruz-Duarte et al., 2020), uses the following operators:
• the central force dynamic operator of central force optimisation (CFO) (Formato, 2008);
• the differential crossover and mutation operators from differential evolution (DE) (Storn & Price, 1997);
• the genetic mutation and crossover operators from the genetic algorithm (GA) (Whitley, 1994);
• the spiral dynamic operator of stochastic spiral optimisation (SSO) (Cruz-Duarte et al., 2017);
• the gravitational search operator of the gravitational search algorithm (GSA) (Rashedi, Nezamabadi-pour & Saryazdi, 2009);
• the swarm dynamic operator from particle swarm optimisation (PSO) (Kennedy & Eberhart, 1995);
• the firefly dynamic operator from the firefly algorithm (FA) (Gandomi, Yang & Alavi, 2011);
• the random flight and search operators from random search (RS) and uniform random sampling; and
• the local random walk operator from cuckoo search (CS) (Yang & Deb, 2013).
CUSTOMHyS uses simulated annealing on the hyperlevel and searches for a fixed-length operator sequence that is repeated.
The comparison with CUSTOMHyS was performed on the six benchmark functions, using the same number of function evaluations, base-level sample size, simulated annealing steps, population size, and problem size. The approaches were tested with each offspring limited to 100 function evaluations to highlight the limitations that appear in costly optimization problems. The base-level performance sample size was fixed at 30 to ensure statistical significance. The number of simulated annealing steps was limited to 100. The population size was fixed at 30. Various problem dimensionalities were considered (5, 50, 100, 500). The minimal median plus interquartile range of the performances over the simulated annealing steps was considered the final performance of each method. This metric was chosen so that the performances were not affected by outliers.
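The performance metric, the median plus interquartile range of a cost sample, can be computed with the standard library. The quantile method (`inclusive`) is our choice; the paper does not specify one, so library implementations may differ slightly.

```python
import statistics

def performance_metric(costs):
    """Median plus interquartile range of a sample of base-level costs.

    Robust to outliers: both the median and the IQR ignore extreme values,
    so a single diverging run does not dominate the score."""
    q = statistics.quantiles(costs, n=4, method="inclusive")
    return statistics.median(costs) + (q[2] - q[0])
```

For the sample [1, 2, 3, 4, 100] the metric is 3 + (4 - 2) = 5.0; the outlier 100 has no influence, which is exactly why the authors prefer this score over the mean.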
We compare our results with the following state-of-the-art metaheuristics:
1. the slime mould algorithm (SMA) (Li et al., 2020), a biology-inspired metaheuristic based on the oscillation of slime mould;
2. the artificial ecosystem optimizer (AEO) (Zhao, Wang & Zhang, 2019), a system-based heuristic that mimics the behavior of an ecosystem of living organisms;
3. the battle royal optimizer (BRO) (Rahkar Farshi, 2020), a human-based metaheuristic that simulates a survival game;
4. the Archimedes optimization algorithm (ArchOA) (Hashim et al., 2020), a physics-inspired metaheuristic that imitates the buoyancy of objects immersed in a fluid;
5. the particle swarm optimizer (PSO) (Kennedy & Eberhart, 1995), a swarm-based metaheuristic that simulates the movement of particles; and
6. the coral reef optimizer (CRO) (Salcedo-Sanz et al., 2014), a nature-inspired algorithm that simulates the growth of coral reefs.
The six metaheuristics were tested in similar settings, resulting in the same number of total function evaluations.This was achieved by having equal numbers of performance samples and hyper-heuristic steps; that is, each performance sample had the same size and total function evaluations as in a hyper-heuristic step.All populations were initialized uniformly within the problem bounds and were of the same size.
The experiments were performed on a system equipped with an Intel Core i7-9750H CPU, an NVIDIA GeForce GTX 1660 Ti Mobile GPU, and 16 GB of RAM, running Ubuntu 20.04.1 LTS. CUDA Toolkit version 11.4 was used along with Python 3.8.10. The results of the experiments can be reproduced with the public NMHH implementation, available on GitHub (https://github.com/BNandor/MatOpt/tree/main/NMHH).
Metaheuristic parameters. The NMHH operator parameter bounds used are as follows: the DE operator force F ∈ [0.4, 0.7] and crossover rate c_r ∈ [0.9, 1]; the GA operator one-point crossover rate and point c_r, cp_r ∈ [0, 1], the arithmetic crossover constant α ∈ [0, 1], the mutation rate and size m_r ∈ [0, 0.1], m_σ ∈ [0, 100], and the ratio of the parent pool p_r ∈ [0.2, 1]. The GD and LBFGS initial step lengths were α ∈ [0.5, 5]. The evaluation limits were set to f_GD ∈ [1, 3] and f_LBFGS ∈ [6, 10]. The LBFGS memory was fixed to 5, and the step coefficients were

The parameters of the state-of-the-art metaheuristics were set to those suggested by the MEALPY package. For the CRO, the rate of occupation was set to 0.4, the broadcast/existing rate (F_b) to 0.9, the duplication rate (F_a) to 0.1, the depredation rate (F_d) to 0.1, the maximum depredation probability (P_d) to 0.5, the probability of the mutation process (GCR) to 0.1, the mutation process factors gamma_min to 0.02 and gamma_max to 0.2, and the number of attempts of a larva to settle in the reef (n_trials) to 5. For BRO, the dead threshold was set to 3. For ArchOA, the factors were set to c_1 = 2, c_2 = 5, c_3 = 2, and c_4 = 0.5, and the accelerations to acc_min = 0.1 and acc_max = 0.9. The AEO does not expose any parameters. The SMA probability threshold was set to 0.3. The local and global coefficients of the PSO were set to 2.05, the minimum weight to 0.4, and the maximum weight to 0.9.
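For reference, the NMHH parameter bounds listed above can be collected into a single configuration table. This is a sketch: the key names are ours, and the truncated step-coefficient values are omitted rather than guessed.

```python
# NMHH operator parameter bounds as reported in the text (key names hypothetical).
NMHH_BOUNDS = {
    "DE":    {"F": (0.4, 0.7), "cr": (0.9, 1.0)},
    "GA":    {"cr": (0.0, 1.0), "cp_r": (0.0, 1.0), "alpha": (0.0, 1.0),
              "m_r": (0.0, 0.1), "m_sigma": (0.0, 100.0), "p_r": (0.2, 1.0)},
    "GD":    {"alpha": (0.5, 5.0), "f_evals": (1, 3)},
    "LBFGS": {"alpha": (0.5, 5.0), "f_evals": (6, 10), "memory": (5, 5)},
}

def clamp(operator, param, value):
    """Clamp a sampled parameter value into its NMHH bounds."""
    lo, hi = NMHH_BOUNDS[operator][param]
    return max(lo, min(hi, value))
```

Such a table makes it straightforward for the hyperlevel to keep perturbed parameters inside their legal ranges.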
For CUSTOMHyS, the suggested heuristic collection and parameters were used. The simulated annealing initial temperature (max_temperature) was set to 200, the temperature cooling rate (cooling_rate) to 0.05, the stagnation rate (stagnation_percentage) to 0.3, and the length of the operator sequence (cardinality) to 3. The population size (num_agents) was set to 30, and each offspring was limited to 100 iterations (num_iterations). The number of hyper-heuristic steps (num_steps) was limited to 100, with each step having a performance sample size (num_replicas) of 30. The suggested (default) heuristic collection contains variations of twelve operators: random search, central force dynamic, differential mutation, firefly dynamic, genetic crossover, genetic mutation, gravitational search, random flight, local random walk, random sample, spiral dynamic, and swarm dynamic. The parameters of all 205 variations can be found in the CUSTOMHyS implementation (https://github.com/ElsevierSoftwareX/SOFTX-D-20-00035, last accessed 12/11/2022) (Cruz-Duarte et al., 2020).

Results and discussion
Numerical results are presented in Table 2. The best results are highlighted in bold, and the Wilcoxon rank-sum statistical test was used for comparison. The results point to a considerable difference between the hyper-heuristic and standard metaheuristic approaches.
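The Wilcoxon rank-sum comparison can be reproduced with the normal approximation. This is a stdlib-only sketch without tie or continuity corrections, so p-values may differ slightly from library implementations.

```python
import math

def rank_sum_test(a, b):
    """Wilcoxon rank-sum (Mann-Whitney) test via the normal approximation.

    Returns (U, p) for the two-sided test that samples a and b come from the
    same distribution; ranks of tied values are averaged."""
    combined = sorted((v, i) for i, v in enumerate(a + b))
    ranks = {}
    k = 0
    while k < len(combined):                     # average ranks over ties
        j = k
        while j + 1 < len(combined) and combined[j + 1][0] == combined[k][0]:
            j += 1
        avg = (k + j) / 2 + 1
        for m in range(k, j + 1):
            ranks[combined[m][1]] = avg
        k = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(ranks[i] for i in range(n1))        # rank sum of the first sample
    u = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return u, p
```

Two clearly separated cost samples yield a small p-value, while identical samples yield p = 1, matching the intuition behind the bold entries in Table 2.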
For the majority of problems (Qing, Rosenbrock, Styblinski Tang, Trid), NMHH outperformed all metaheuristics and CUSTOMHyS. In the case of Rastrigin and Schwefel 2.23, the SMA, AEO, and BRO found the global minimum. For Rastrigin, both hyper-heuristic approaches performed considerably worse than the metaheuristics, but for Schwefel 2.23, NMHH approximated the solution several orders of magnitude better than CUSTOMHyS. NMHH found the best solution in 66% of the test cases. These results indicate that in many cases our approach can outperform state-of-the-art metaheuristics and the investigated selection hyper-heuristic.

Computational time. Table 3 shows the computational times measured for the Rosenbrock function in low- and high-dimensional settings over 10 runs. In the low-dimensional setting, NMHH and CUSTOMHyS had times of the same order of magnitude, but in the high-dimensional setting, NMHH performs an order of magnitude better than CUSTOMHyS and has a lower variance. This shows that the NMHH implementation scales well with increasing dimensionality.
Evolved Markov chain. The ability of NMHH to adapt the operator and operator-category distributions to the problem is best seen in Figs. 4 and 5. Figure 4 depicts the best-performing operator-category transition matrices and initial distributions of the upper layer for the Styblinski Tang and Rosenbrock functions. Figure 5 depicts the best-performing operator transitions. NMHH adapted the exploration-exploitation rate to the shape of the cost landscapes. The Styblinski Tang function favours a balance between exploration and exploitation, as it contains many local minima. The transition matrices reflect that the best-performing distributions include both iterated local search and perturb operators. In 500 dimensions, the best-performing configuration proved to be biased towards continuous refinement. NMHH adapted to the valley-shaped landscape of the Rosenbrock test function and evolved to use the iterated local search approach in the limited function evaluation setting.
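How strongly an evolved upper-layer chain favours refinement can be read off its stationary distribution, obtainable by power iteration; the matrix values below are illustrative, not the evolved ones from Fig. 4.

```python
def stationary_distribution(P, iters=1000):
    """Long-run share of time spent in each category, by power iteration.

    P is a row-stochastic transition matrix as nested lists; for an ergodic
    chain the iteration converges to the stationary distribution pi = pi P."""
    n = len(P)
    d = [1.0 / n] * n
    for _ in range(iters):
        d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    return d
```

For a two-category chain [[0.2, 0.8], [0.1, 0.9]] (perturb, refine) the stationary distribution is (1/9, 8/9): the chain spends almost 90% of its time refining, the kind of bias the evolved Rosenbrock chains exhibit.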

CONCLUSION AND FURTHER WORK
Optimization plays an important role in computational tasks. Hyper-heuristics are a new paradigm for solving optimization problems, as they can significantly improve numerical results. In this article, we propose a new hyper-heuristic framework (NMHH) with two main innovations: the use of a nested Markov chain to model complex distributions of operators, and the search for the equilibrium between exploration and exploitation in this space by balancing the category of iterated local search against metaheuristic exploratory operators. Numerical experiments conducted on continuous benchmark problems in high dimensions confirm the effectiveness of the proposed approach. The results show that NMHH evolved operator sequences and found an exploration-exploitation rate that outperformed state-of-the-art metaheuristics and the CUSTOMHyS hyper-heuristic in 66% of the cases in the high-dimensional setting.

Algorithm 4
Base-level optimization process
{ l_x, u_x - problem bounds }
{ G - current generation of offspring }
{ G′ - next generation of offspring }
{ best(G) - cost of the best-performing offspring in G }
{ o^c_s - next operator state in category c }
{ C_s - next category state }
{ f_e - objective function evaluation count }
{ f_o - function evaluations required by operator o }
for all x

Algorithm 5
Combined one-point and arithmetic crossover for GA
{ |G| - number of offspring in G }
{ n - dimensionality of the objective function }
{ α - arithmetic crossover coefficient }
{ c_r - crossover rate }
{ m_r - mutation rate }
{ m_σ - standard deviation of the mutation distribution }
{ cp_r - crossover point ratio }
{ p_r - parent pool ratio }
U
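One plausible reading of Algorithm 5, under the notation above, combines the two crossover modes and Gaussian mutation as follows. This is a sketch: the exact way the crossover point and the parent pool are used in the paper is our assumption.

```python
import random

def ga_offspring(p1, p2, alpha, cr, cp_r, m_r, m_sigma, rng):
    """One child from combined one-point and arithmetic crossover plus
    Gaussian mutation (an interpretation of Algorithm 5, not its source).

    With probability cr the crossover point splits the genome at cp_r of its
    length; genes before the point are blended arithmetically with weight
    alpha, genes after the point come from the second parent. Each gene then
    mutates with probability m_r by adding N(0, m_sigma) noise."""
    n = len(p1)
    child = list(p2)
    if rng.random() < cr:
        point = int(cp_r * n)
        for i in range(point):
            child[i] = alpha * p1[i] + (1 - alpha) * p2[i]  # arithmetic part
    for i in range(n):
        if rng.random() < m_r:
            child[i] += rng.gauss(0.0, m_sigma)             # Gaussian mutation
    return child
```

Setting alpha = 0.5, cr = 1, and m_r = 0 blends the first half of the genome and copies the second half from the other parent, showing how one parameterization covers both crossover styles.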

Figure 3
Figure 3 Evolution of the Rosenbrock and Styblinski Tang problems over time, comparing CUSTOMHyS with two variants of the method: the presented SA-based variant (NMHH) and the random-based variant (Random NMHH). The evolution of the minimum median plus interquartile range is shown. Full-size DOI: 10.7717/peerjcs.1785/fig-3

Figure 4
Figure 4 The best-performing operator-category transition matrices for the Styblinski Tang and Rosenbrock functions. For Styblinski Tang, the alternation of perturb and selection operators with occasional refinement performed best. In the case of the Rosenbrock function, the evolved sequences initially perform perturbation and finish with constant refinement. Full-size DOI: 10.7717/peerjcs.1785/fig-4

Figure 5
Figure 5 The best-performing operator transition matrices for the Styblinski Tang and Rosenbrock functions. While the perturbing operators alternate in the majority of cases, for refinement the gradient descent operator was preferred. Full-size DOI: 10.7717/peerjcs.1785/fig-5

Algorithm 1 Hyper-heuristic parameter initialization
{ l_θ, u_θ - bounds of parameter θ }
P_C ← U(T) {// random uniform transition matrix}
π_C ← U(π) {// random uniform initial distribution}
for all θ ∈ β_c, ∀c ∈ C do
  θ ← U(l_θ, u_θ) {// random uniform parameter within bounds}
end for

Algorithm 2 Hyper-heuristic search via simulated annealing (excerpt)
{ f - function evaluation limit of offspring per step }
{ T - initial simulated annealing temperature }
{ α - linear multiplicative cooling coefficient }
{ h_p - base-level performance sample size }
initialize chain and operator parameters (Algorithm 1)
t ← T
f_best ← ∞
for l ← 0; l < h_l; l ← l + 1 do
  P′, β′ ← mutation of P, β (Algorithm 3)
  for s ← 0; s < h_p; s ← s + 1 do
    p_{l,s} ← performance sample for P′, β′ (Algorithm 4)
  end for
end for

Algorithm 4 Operator transition step (excerpt)
{// apply the next operator }
f_e ← f_e + f_{o^{C_s}_s} {// track function evaluations}
C_s ← c, c ∼ P_{C, C_s} {// go to the next category state from C_s }
o^{C_s}_s ← o, o ∼ P_{C_s, o^{C_s}_s} {// go to the next operator state in P_{C_s} from o^{C_s}_s }

Table 2 Results obtained for the six test functions.
The minimal median plus interquartile range is presented. Best-ranking results according to the Wilcoxon rank-sum test are highlighted in bold.

Table 3 Average and standard deviation of the computational times measured in seconds for the Rosenbrock function.
Best results are highlighted in bold.

Table 4 Evolved operator sequences of CUSTOMHyS for Rosenbrock and Styblinski Tang.
Evolved CUSTOMHyS operator sequences. Table 4 depicts the operator sequences that CUSTOMHyS evolved for Rosenbrock and Styblinski Tang; the subscripts give the index of each heuristic operator variation within the heuristic collection.
Genetic crossover 73, spiral dynamic 180, random search 172
100: Genetic crossover 117, swarm dynamic 203, random search 169 | Differential mutation 19, swarm dynamic 193, genetic crossover 56
500: Random search 168, local random walk 158, genetic crossover 11 | Genetic crossover 99, random search 174, genetic crossover 98