1 Introduction

Optimization runs through the whole of human civilization: people improve existing things to solve practical problems and adapt to their environment. The mission of global optimization is to find the optimal solution among all possible solutions to a particular problem. Current optimization algorithms can generally be divided into two categories: deterministic and stochastic. Although deterministic algorithms rest on mature mathematical theory, their performance degrades when dealing with discontinuous and non-differentiable functions, and sometimes they cannot solve such problems at all. Yet most engineering optimization problems, with their many local optima, are discontinuous, non-differentiable, and often difficult to express as mathematical models. For these reasons, more and more scholars have shifted their attention to stochastic optimization algorithms, whose notable feature is the introduction of randomness, which offers the possibility of escaping local optima (Hoos and Stützle 2004). Hence, stochastic optimization algorithms are important tools for obtaining optimal solutions to global optimization problems.

The swarm intelligence algorithm (Chakraborty and Kar 2017; Nayyar et al. 2018) is a kind of stochastic optimization technique inspired by the various behaviors of biological communities in nature, and it opens a new way to solve optimization problems. In nature, to compensate for the limits of individual foraging and predator avoidance, many species form coordinated and remarkable swarm behaviors through cooperation and competition among individuals: the pack hunting of wolves, the flocking and migration of birds, and the social behavior of bees and ants, for example. Swarm intelligence optimization algorithms can therefore be realized by studying the underlying behaviors of individuals in a population and using mathematical modeling to build the working mechanism of the population system, including cooperation and competition among individuals within the population and the interaction between the population and the external environment.

Existing swarm intelligence algorithms are classified into five categories in this paper:

1. Bird-based swarm intelligence algorithms. This type of swarm intelligence algorithm simulates a series of activities of birds, such as swarming, migration, courtship, reproduction, nesting, hatching and brooding. Particle Swarm Optimization (PSO) (Eberhart and Kennedy 1995) is the most famous and classical bird-based swarm intelligence algorithm. Other new algorithms of this category mainly include Crow Search Algorithm (CSA) (Askarzadeh 2016), Sooty Tern Optimization Algorithm (STOA) (Dhiman and Kaur 2019), Harris Hawks Optimization (HHO) (Heidari et al. 2019), African Vultures Optimization Algorithm (AVOA) (Abdollahzadeh et al. 2021), Aquila Optimizer (AO) (Abualigah et al. 2021a, b), Artificial Hummingbird Algorithm (AHA) (Zhao et al. 2022), etc.

2. Terrestrial animal-based swarm intelligence algorithms. The second category of swarm intelligence algorithm is inspired by the behaviors of terrestrial animals such as group division of labor, hierarchical leadership, hunting and hedging, and information communication. Some popular instances of this kind are Multi-Objective Artificial Sheep Algorithm (MOASA) (Lai et al. 2019), Red Deer Algorithm (RDA) (Fathollahi-Fard et al. 2020), Horse herd Optimization Algorithm (HOA) (MiarNaeimi et al. 2021), Zebra Optimization Algorithm (ZOA) (Trojovská et al. 2022), and Reptile Search Algorithm (RSA) (Abualigah et al. 2022).

3. Aquatic animal-based swarm intelligence algorithms. The behaviors of aquatic animals mainly include schooling, following, ingestion, breeding, encircling predation, and spawning migration, which constitute the important search phases in the algorithms of this category. A well-known case of aquatic animal-based swarm intelligence algorithms is the Whale Optimization Algorithm (WOA) (Mirjalili and Lewis 2016), which mimics the feeding behaviors of humpback whales. Some other recently proposed algorithms of this type are Manta Ray Foraging Optimization (MRFO) (Zhao et al. 2020), Marine Predators Algorithm (MPA) (Faramarzi et al. 2020), and the Jellyfish Search (JS) optimizer (Chou and Truong 2021).

4. Insect-based swarm intelligence algorithms. Social insects are insects that live together in groups whose members are divided into several classes or types, each with a specific function. The idea behind this category of algorithms reflects the social behaviors of insects, including group living with overlapping generations, fine division of labor, building defensive structures or nests, competition and cooperation, information sharing, recruitment and alarm, etc. Ant Colony Optimization (ACO) (Dorigo et al. 1996) is a typical representative of insect-based swarm intelligence algorithms. Dragonfly Algorithm (DA) (Mirjalili 2016), Beetle Antennae Search Algorithm (BSA) (Jiang and Li 2018), and Grasshopper Optimization Algorithm (GOA) (Mirjalili et al. 2018) are other examples of this category.

5. Microorganism-based swarm intelligence algorithms. Microorganisms include bacteria, viruses, fungi, some small protozoa, and microscopic algae. They are individually tiny and closely related to humans. The last category of swarm intelligence algorithms mimics microorganism behaviors such as diffusion, foraging, reproduction, variation, invasion, infection, and phagocytosis. Some powerful microorganism-based swarm intelligence algorithms proposed in the literature are Mushroom Reproduction Optimization (MRO) (Bidar et al. 2018), Slime Mould Algorithm (SMA) (Li et al. 2020), Coronavirus Herd Immunity Optimizer (CHIO) (Alweshah 2022), and so on.

Crow Search Algorithm (CSA) (Askarzadeh 2016), an outstanding meta-heuristic, belongs to the category of bird-based swarm intelligence algorithms. CSA uses a simple simulation of foraging behavior, considering the tracking and hiding of food, to obtain the optimal solution in the search space. CSA has the advantages of being easy to understand and implement, having few control parameters, good universality, and strong global search ability. Since its development, CSA has been widely used in many fields, including numerical optimization (Khalilpourazari and Pasandideh 2020; Necira et al. 2022; Gholami et al. 2021), feature selection (Sayed et al. 2019; Ouadfel and Abd Elaziz 2020), image processing (Upadhyay and Chhabra 2020; Fred et al. 2020), optimal power flow (Saha et al. 2017; Fathy and Abdelaziz 2018), economic load dispatch (Mohammadi and Abdi 2018; Spea 2020), cloud computing (Kumar and Vimala 2019; Kumar and Kousalya 2020), control engineering (Turgut et al. 2020), and chemical engineering (Abdallh and Algamal 2020).

However, CSA only uses randomly selected individuals to guide the search and lacks guidance from excellent or elite individuals, which easily leads to low convergence accuracy and slow convergence speed. Up to now, many strategies and techniques have been proposed to improve the original CSA, as manifested in the following aspects.

1. Control parameter improvement strategies. In CSA, the awareness probability AP and the flight length fl are two important parameters affecting algorithm performance. dos Santos Coelho et al. (2016) proposed a modified CSA that uses population diversity information and the Gaussian distribution to adjust the control parameters. Cuevas et al. (2019) proposed another variant of CSA that dynamically adjusts AP with the fitness quality of each candidate solution. Later, in Makhdoomi and Askarzadeh (2020), an adaptive chaotic awareness probability AP was formulated to improve CSA's efficiency. Also, Necira et al. (2022) designed a dynamic CSA (DCSA) in which fl changes dynamically based on the generalized Pareto probability density function and AP is adjusted linearly over the optimization process.

2. Search mode improvement strategies. In Jain et al. (2017), to solve high-dimensional optimization problems, Jain et al. balanced the exploration and exploitation of CSA by adding Lévy flight, an experience factor, and an adaptive adjustment factor. Moghaddam et al. (2019) used crossover and mutation operators in CSA to improve performance and prevent it from falling into suboptimal regions. Zamani et al. (2019) introduced three new search strategies, called neighborhood-based local search (NLS), non-neighborhood-based global search (NGS), and wandering-around-based search (WAS), to improve the movement of crows in different search spaces.

3. Hybridization with other algorithms. This type of improved variant makes up for the shortcomings of CSA by hybridizing in the advantages of other algorithms, such as BCSA (Javaid et al. 2018) hybridized with the bat algorithm, DECSA (Mahesh and Vijayachitra 2019) hybridized with dolphin echolocation, GWOCSA (Arora et al. 2019) hybridized with grey wolf optimization, SCCSA (Khalilpourazari and Pasandideh 2020) hybridized with the sine cosine algorithm, and PSO-CSA (Farh et al. 2020) hybridized with particle swarm optimization.

From the above literature review, it can be concluded that although most current studies have improved CSA to some extent, there are still drawbacks to be noted, including a single type of search-guiding individual and search stage, inadequate adaptability to complex and high-dimensional problems, and an insufficient balance between global exploration and local exploitation. This means that CSA still has room for further improvement. Hence, the overall purpose of this paper is to propose an enhanced crow search algorithm with multi-stage search integration (MSCSA) that is suited to different complex and high-dimensional optimization problems. The principal contributions and novelty of this study include:

1. Multiple search-guiding individuals are applied to balance the global search and local search capabilities of the algorithm. In this study, three mixed search-guiding individuals are introduced for gradual search guidance with respect to the evolution process.

2. Multiple search modes are designed to extend the search scope of the algorithm and increase the probability of finding the optimal solution. By using different guiding individuals to lead the search and formulating different flight length control parameters for each search mode, six new individual search modes are designed in this paper.

3. The proportion parameter of the guiding individuals and the flight length parameter in each mode are designed to control the gradual transition from large-scale search in the early and middle iterations to small-scale mining in the later iterations.

4. Multiple search stages are integrated to achieve different search intensities and enhance population diversity. Following the basic foraging principle of crows, combined with the multiple guiding individuals and position-update modes mentioned above, this study extends the single search stage of the original CSA to three search stages, excluding the initialization stage.

5. The proposed MSCSA has good practicability and applicability. It is capable of solving different complex and high-dimensional problems.

The organization of the paper is as follows: Sect. 2 reviews the original CSA. The details of proposed MSCSA are presented in Sect. 3. In Sect. 4, a range of compared algorithms are applied to verify the effectiveness of MSCSA for solving complex and high-dimensional benchmark functions. The results are also analyzed in this section. Finally, Sect. 5 concludes the study and provides some future research directions.

2 Crow search algorithm

CSA is proposed based on the idea that crows store their redundant food in hiding places and bring it back when they need it. In CSA, each crow represents a solution (position) to the optimization problem. Assuming that the dimension of the solution space of the optimization problem is D, and the size of the crow population is N, the position of the ith crow at iteration t can be expressed as: \(X_{i}^{t} = (x_{i,1}^{t} ,x_{i,2}^{t} ,...,x_{i,D}^{t} )\), where \(i = 1,2,...,N\), \(t = 1,2,...,T_{\max }\), and \(T_{\max }\) represents the maximum iteration number. Each crow has a memory and can remember the hiding place of food, namely the best position obtained so far. Assume that crow i’s hiding place at iteration t is \(M_{i}^{t}\). In the foraging process, crows move in the environment and search for better food sources (hiding places).

The main steps of CSA are described as follows:

Step 1: Randomly initialize a crow population P, and take P as the initial memory M (hiding places) of crows.

Step 2: Crow i updates its position by randomly selecting another crow j and following it. The position is generated by Eq. (1).

$$ X_{i}^{t + 1} = \left\{ {\begin{array}{*{20}l} {X_{i}^{t} + r_{i} \times fl_{i}^{t} \times (M_{j}^{t} - X_{i}^{t} ),} \hfill &\quad {\;r_{j} \ge AP_{j}^{t} } \hfill \\ {{\text{a}}\;{\text{random}}\;{\text{position,}}} \hfill &\quad {\;\;{\text{otherwise}}} \hfill \\ \end{array} } \right., $$
(1)

where \(r_{i}\) and \(r_{j}\) are random values between 0 and 1, \(fl_{i}^{t}\) is the flight length of crow i at iteration t, and \({{AP}}_{j}^{t}\) is the awareness probability of crow j at iteration t, i.e., the probability that crow j is aware of being followed.

Step 3: Calculate the fitness of crow i according to the new position, and update the memory of it as follows:

$$ M_{i}^{t + 1} = \left\{ {\begin{array}{*{20}l} {X_{i}^{t + 1} ,} \hfill & {\quad {\text{if}}\;f(X_{i}^{t + 1} )\;{\text{is better than}}\;f(M_{i}^{t} )} \hfill \\ {M_{i}^{t} ,} \hfill & {\quad {\text{otherwise}}} \hfill \\ \end{array} } \right., $$
(2)

where \( \, f(.)\) denotes the fitness value.

Step 4: Repeat the steps 2–3 for all crows until the termination conditions are met.
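To make the procedure concrete, the four steps above can be sketched in Python as follows. This is a minimal sketch assuming a minimization problem; the function names, the fixed values of fl and AP, and the clipping of positions to the bounds are illustrative choices of ours, not part of the original description.

```python
import numpy as np

def csa(f, lb, ub, D, N=20, T_max=500, fl=2.0, AP=0.1, seed=0):
    """Minimal CSA sketch following Eqs. (1)-(2); minimization assumed."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (N, D))          # Step 1: random crow positions
    M = X.copy()                             # initial memories (hiding places)
    fM = np.array([f(m) for m in M])         # fitness of each memory
    for _ in range(T_max):
        for i in range(N):
            j = rng.integers(N)              # Step 2: pick a crow j to follow
            if rng.random() >= AP:           # crow j unaware: move toward M_j
                x_new = X[i] + rng.random() * fl * (M[j] - X[i])
            else:                            # crow j aware: fly to a random position
                x_new = rng.uniform(lb, ub, D)
            X[i] = np.clip(x_new, lb, ub)
            fx = f(X[i])                     # Step 3: evaluate and update memory
            if fx < fM[i]:
                M[i], fM[i] = X[i], fx
    b = np.argmin(fM)                        # Step 4 ends: return best hiding place
    return M[b], fM[b]
```

On a simple test function such as the sphere function, the returned memory steadily improves on the initial random population, since each crow's memory is monotonically non-worsening.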

3 CSA with multi-stage search integration (MSCSA)

Simplicity is the main characteristic of CSA, but it is also the algorithm's disadvantage, because CSA has only a single search stage, namely the random following mode or the free flight mode. For some complex optimization problems, especially high-dimensional functions and multi-modal problems, CSA may fall into evolutionary stagnation, and its solving performance declines sharply.

We introduce four strategies, namely chaos, multi-OBL, multi-guidance, and multi-position-update, to improve the performance of the original CSA. These strategies are embedded in the different optimization stages of MSCSA. The details of the proposed algorithm are described below, stage by stage.

3.1 Initialization with the hybridization of chaos and OBL

Population initialization is a critical stage of meta-heuristic algorithms, as it affects the convergence accuracy and speed of the final results. Improving the ergodicity and quality of the initial solutions as much as possible is an effective way to generate an excellent initial population.

In recent years, chaos theory and chaotic maps have been widely used to improve the performance of meta-heuristic algorithms (Chen et al. 2020a, b; Lu et al. 2020). Owing to its randomness, unpredictability, non-repetition, and ergodicity, chaos can conduct a more thorough search at a higher speed than traditional probabilistic random search. Hence, in this study, a chaotic map is utilized to initialize the population so that the problem space information can be extracted as much as possible to enhance the coverage of the search.

Moreover, opposition-based learning (OBL) (Tizhoosh 2005), a relatively new technique in the field of computational intelligence, has played a significant role in enhancing the global search ability of algorithms (Rahnamayan et al. 2008). In order to increase the proportion of excellent individuals and enhance population diversity, multiple OBL techniques constructed from different reference individuals are also used to generate the initial population.

The main steps of initialization that mixes chaos and OBL are described as follows.

Step 1: Use Tent map (Lu et al. 2020) to generate a chaotic sequence \(C = \{ C_{1} ,\;...,C_{i} ,...,C_{N} \}\), where \(C_{i} = (c_{i,1} ,...,c_{i,j} ,...,c_{i,D} )\) and \(c_{i,j} {\kern 1pt} (i \ge 2)\) is calculated as follows:

$$ c_{i,j} = \left\{ {\begin{array}{*{20}l} {\frac{{c_{i - 1,j} }}{0.7},} \hfill & {\quad c_{i - 1,j} < 0.7} \hfill \\ {\frac{10}{3}(1 - c_{i - 1,j} ),} \hfill & {\quad c_{i - 1,j} \ge 0.7} \hfill \\ \end{array} } \right., $$
(3)

Specifically, \(c_{1,j}\) is a random number between 0 and 1.

Step 2: The chaotic sequence is applied to the population \(P = \{ X_{1} ,\;...,X_{i} ,...,X_{N} \}\) generated by random method, and a chaotic population \(CP = \{ CX_{1} ,\;...,CX_{i} ,...,CX_{N} \}\) is then obtained through Eq. (4). Calculate the fitness of each individual in CP.

$$ CX_{i} = X_{i} \times C_{i} , $$
(4)

Step 3: For each individual in CP, use the following formula to calculate its opposition solution.

$$ OX_{i} = \left\{ {\begin{array}{*{20}l} {U + L - CX_{i} ,} \hfill & {\quad {\text{if}}\;r = 1} \hfill \\ {2CX_{{{{best}}}} - CX_{i} ,} \hfill & {\quad {\text{if}}{\kern 1pt} {\kern 1pt} {\kern 1pt} r = 2} \hfill \\ {2CX_{{{\text{mean}}}} - CX_{i} ,} \hfill & {\quad {\text{if}}\;{\kern 1pt} r = 3} \hfill \\ \end{array} } \right., $$
(5)
$$ CX_{{{{mean}}}} = \frac{1}{N}\sum\limits_{i = 1}^{N} {CX_{i} } , $$
(6)

where U and L represent the upper bound and lower bound of the optimization problem, respectively; \(CX_{best}\) and \(CX_{mean}\) are the best individual and mean individual (center position) of the population CP, respectively; r is a random integer from 1 to 3.

Step 4: For each opposition solution \(OX_{i}\), if \(f(OX_{i} )\) is better than \(f(CX_{i} )\), replace \(CX_{i}\) with \(OX_{i}\).

Step 5: Take the final population CP as the initial population P for MSCSA.

As can be seen from Eq. (5), three different OBL methods are applied to calculate the opposition solutions, including one traditional method based on lower and upper bounds, and two new proposed methods based on best individual reference point and mean individual reference point, respectively. Obviously, the multiple OBL methods can enhance the population quality and increase the diversity.
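The five initialization steps can be sketched as follows. This is a sketch under our own assumptions: minimization, scalar bounds L and U, and clipping of chaotic and opposition solutions back into [L, U] (the clipping is our addition, not stated in the original steps).

```python
import numpy as np

def init_population(f, L, U, N, D, seed=0):
    """Chaos + multi-OBL initialization sketch (Eqs. 3-6); minimization assumed."""
    rng = np.random.default_rng(seed)
    # Step 1: Tent-map chaotic sequence (Eq. 3), first row random in (0, 1)
    C = np.empty((N, D))
    C[0] = rng.random(D)
    for i in range(1, N):
        p = C[i - 1]
        C[i] = np.where(p < 0.7, p / 0.7, (10.0 / 3.0) * (1.0 - p))
    # Step 2: chaotic population CP from a random population (Eq. 4)
    X = rng.uniform(L, U, (N, D))
    CP = np.clip(X * C, L, U)
    fCP = np.array([f(x) for x in CP])
    # Step 3: one of three opposition solutions per individual (Eqs. 5-6)
    best = CP[np.argmin(fCP)]
    mean = CP.mean(axis=0)
    for i in range(N):
        r = rng.integers(1, 4)               # random integer in {1, 2, 3}
        if r == 1:
            OX = U + L - CP[i]               # bound-based opposition
        elif r == 2:
            OX = 2.0 * best - CP[i]          # best-individual reference point
        else:
            OX = 2.0 * mean - CP[i]          # mean-individual reference point
        OX = np.clip(OX, L, U)
        fOX = f(OX)
        if fOX < fCP[i]:                     # Step 4: keep the better solution
            CP[i], fCP[i] = OX, fOX
    return CP, fCP                           # Step 5: initial population for MSCSA
```

Note that the best and mean reference points are computed once from CP before any replacement, matching the step order in the text.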

3.2 Free foraging stage

The free foraging means that crows conduct a random search around themselves to find a location closer to the food source. Two search modes with different intensities are proposed in the free foraging stage, namely the expanded search based on Lévy flight and the narrowed search based on standard normal distribution, shown as follows:

$$ X_{i}^{t + 1} = \left\{ {\begin{array}{*{20}l} {X_{i}^{t} + fl \times w \times {{sign}} (R - 0.5) \times {\text{Levy}} (D),} \hfill & {\quad {\text{if}}\;{\kern 1pt} r_{{{c}}} < 0.5} \hfill \\ {X_{i}^{t} + fl \times w \times {{randn}} (D),} \hfill & {\quad {\text{otherwise}}} \hfill \\ \end{array} } \right., $$
(7)
$$ w = 1 - \frac{1}{{1 + {{e}}^{{ - 10\left( {\frac{t - 1}{{T_{\max } - 1}} - 0.5} \right)}} }}, $$
(8)

where R is a random vector of size D with each element between 0 and 1; rc is a random number between 0 and 1; sign(.) is a sign function with values in {− 1, 0, 1}, evaluated on each dimension of a vector; Levy(D) is a Lévy flight function that generates a D-dimensional vector; randn(D) generates a D-dimensional vector from the standard normal distribution; and \(w\) is a Sigmoid-based function that acts as the flight length control parameter, decreasing nonlinearly and smoothly from 1 to 0, as shown in Fig. 1.

Fig. 1 Change curve of w

The description of Lévy flight function with one dimension is as follows (Yang 2010):

$$ {\text{Levy}} (x) = \frac{u \times \sigma }{{\left| v \right|^{{\frac{1}{\beta }}} }},{\kern 1pt} \quad \sigma = \left( {\frac{\Gamma (1 + \beta ) \times \sin (\pi \times \beta /2)}{{\Gamma \left( {\frac{1 + \beta }{2}} \right) \times \beta \times 2^{{\frac{\beta - 1}{2}}} }}} \right)^{{\frac{1}{\beta }}} , $$
(9)

where \(u\) and \(v\) are random values obeying the standard normal distribution, and \(\beta\) is a constant with a value of 1.5.

Since the random position offset produced by the Lévy flight or the standard normal distribution is small in most cases, the free foraging search concentrates on local exploitation; that is, it is very beneficial for individuals to carry out fine-grained exploitation of their adjacent regions. However, the Lévy flight occasionally generates random walks with large steps during optimization, so this search mode also helps the algorithm jump out of local optima. Furthermore, as can be seen from Eq. (7), each crow does not need information from other individuals in the population when updating its position, which avoids premature convergence and enhances population diversity. Finally, the introduction of the flight length control parameter w accelerates the convergence speed of the algorithm.
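Under the definitions of Eqs. (7)-(9), one free-foraging move can be sketched as below. Treating fl as a fixed scalar is our assumption; the text does not pin down its value in this stage, and the helper names are our own.

```python
import numpy as np
from math import gamma, sin, pi

def levy(D, beta=1.5, rng=None):
    """Lévy flight step for D dimensions (Eq. 9)."""
    if rng is None:
        rng = np.random.default_rng()
    sigma = ((gamma(1 + beta) * sin(pi * beta / 2))
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.standard_normal(D)
    v = rng.standard_normal(D)
    return u * sigma / np.abs(v) ** (1 / beta)

def w_factor(t, T_max):
    """Sigmoid-based control parameter (Eq. 8): ~1 at t = 1, ~0 at t = T_max."""
    return 1.0 - 1.0 / (1.0 + np.exp(-10.0 * ((t - 1) / (T_max - 1) - 0.5)))

def free_forage(X, t, T_max, fl=2.0, rng=None):
    """One free-foraging position update per crow (Eq. 7)."""
    if rng is None:
        rng = np.random.default_rng()
    N, D = X.shape
    w = w_factor(t, T_max)
    X_new = np.empty_like(X)
    for i in range(N):
        if rng.random() < 0.5:   # expanded search: signed Lévy flight steps
            R = rng.random(D)
            X_new[i] = X[i] + fl * w * np.sign(R - 0.5) * levy(D, rng=rng)
        else:                    # narrowed search: standard normal steps
            X_new[i] = X[i] + fl * w * rng.standard_normal(D)
    return X_new
```

Since w shrinks toward 0, the offsets (and hence the search radius of this stage) contract as the iterations proceed, which is exactly the behavior Fig. 1 depicts.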

3.3 Following stage

After the crows accomplish their free foraging, the population will distribute in some high-quality areas of the search space. At this time, each crow can move to a promising position (better food hiding place) by following the excellent guiding individuals. The following stage is also achieved through two different intensities of search, depicted in the following formula:

$$ X_{i}^{t + 1} = \left\{ {\begin{array}{*{20}l} {X_{i}^{t} + fl \times R \times (G_{1} - X_{i}^{t} ),} \hfill & {\quad {\text{if}}\;\;{\kern 1pt} r_{{{c}}} < 0.5} \hfill \\ {X_{i}^{t} + fl \times r \times (G_{2} - X_{i}^{t} ),} \hfill & {\quad {\text{otherwise}}} \hfill \\ \end{array} } \right., $$
(10)
$$ G_{1} = (1 - w) \times M_{{{{best}}}}^{t} + w \times M_{j}^{t} , $$
(11)
$$ G_{2} = (1 - w) \times M_{{{{best}}}}^{t} + w \times M_{{{{mean}}}}^{t} , $$
(12)

where G1 and G2 denote the guiding individuals performing the extended and narrowed searches, respectively; \(M_{{{{best}}}}^{t}\) is the best hiding place of all crows at iteration t; \(M_{{{{mean}}}}^{t}\) is the mean value of all hiding places at iteration t; \(M_{j}^{t}\) is the hiding place of a randomly selected crow j at iteration t; r is a random number lying in (0, 1); the meanings of R and rc are the same as described in Sect. 3.2; and w is used to control the proportion of the components in the guiding individuals, as calculated in Eq. (8).

The search of this stage focuses on global exploration. On the one hand, as can be seen from Eq. (10), under the guidance of G1 or G2, the current individual can carry out a large range of movement, owing to the large gap between the guiding individual and the current individual. In particular, in the early stage of evolution, individuals have a wider search range because the population has good diversity and the difference between the guiding individual and the current individual is more pronounced. On the other hand, as the evolution proceeds, the global search intensity of the algorithm decreases. This is because the diversity of the population gradually deteriorates, and G1 and G2 gradually converge to the best hiding place \(M_{{{{best}}}}\), so the whole population gradually approaches the optimal solution.

The first search mode, guided by G1, conducts a more intense global search than the second mode, guided by G2, for two reasons. First, the randomly chosen hiding place is introduced to form G1, so each crow has a different guiding individual; G2, in contrast, is formed by hybridizing the best hiding place and the mean hiding place and is the same for every individual within an iteration. Hence, the first mode (expanded search) allows individuals to search more dispersedly than the second (narrowed search). Furthermore, in the first mode, the flight length control parameter R is a D-dimensional vector, so the position offset has a different flight length factor in each dimension, whereas in the second mode, the flight length control parameter r is a scalar and the position offset is multiplied by the same coefficient in every dimension. Therefore, individuals in the first mode can obtain a larger scale of position change than in the second mode.
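A sketch of one following-stage update per Eqs. (10)-(12), again treating fl as a fixed scalar (our assumption) and assuming minimization so that the best hiding place is the one with the smallest fitness:

```python
import numpy as np

def follow(X, M, fM, t, T_max, fl=2.0, rng=None):
    """One following-stage position update per crow (Eqs. 10-12)."""
    if rng is None:
        rng = np.random.default_rng()
    N, D = X.shape
    # Proportion control parameter w (Eq. 8)
    w = 1.0 - 1.0 / (1.0 + np.exp(-10.0 * ((t - 1) / (T_max - 1) - 0.5)))
    M_best = M[np.argmin(fM)]        # best hiding place (minimization)
    M_mean = M.mean(axis=0)          # mean of all hiding places
    X_new = np.empty_like(X)
    for i in range(N):
        if rng.random() < 0.5:       # expanded search: vector step R, random M_j
            j = rng.integers(N)
            G1 = (1 - w) * M_best + w * M[j]          # Eq. (11)
            X_new[i] = X[i] + fl * rng.random(D) * (G1 - X[i])
        else:                        # narrowed search: scalar step r, mean-based G2
            G2 = (1 - w) * M_best + w * M_mean        # Eq. (12)
            X_new[i] = X[i] + fl * rng.random() * (G2 - X[i])
    return X_new
```

As w decays, both G1 and G2 collapse onto M_best, reproducing the convergence behavior described above.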

3.4 Large-scale migration stage

After the above two stages, the crows are able to conduct a thorough search of an area. However, if this area continues to be searched, the population will fall into local optima and its diversity will gradually deteriorate, especially in the later stages of evolution. Thus, for the sake of improving the population diversity and preventing evolutionary stagnation, a large-scale migration stage is proposed to help the population move to a new search region and recover its diversity. This search phase consists of two position-update modes, corresponding to different search strengths, detailed below:

$$ X_{i}^{t + 1} = \left\{ {\begin{array}{*{20}l} {M_{{{{best}}}}^{t} + fl \times w \times (2R - 1) \times G_{3} ,} \hfill & {\quad {\text{if}}{\kern 1pt} {\kern 1pt} {\kern 1pt} r_{{{c}}} < 0.5} \hfill \\ {M_{{{{best}}}}^{t} + fl \times {{sign}} (R - 0.5) \times ({{e}}^{ - \lambda \times w} - 1) \times {{abs}} (G_{4} - X_{i}^{t} ),} \hfill & {\quad {\text{otherwise}}} \hfill \\ \end{array} } \right. $$
(13)
$$ G_{3} = (1 - w) \times M_{{{{best}}}}^{t} + w \times X_{{{{rand}}}} , $$
(14)
$$ G_{4} = G_{2} , $$
(15)

where G3 and G4 represent the two guiding individuals that are applied to construct the migrated positions, and the calculation of G4 is consistent with that of G2; \(X_{{{{rand}}}}\) is a randomly generated individual (position); \(\lambda\) is a D-dimensional random vector in the range of 0 to 1; abs(.) is a function that evaluates the absolute value of each dimension of a vector. R, rc are the same as described in Sect. 3.2; and w is used to control the flight length in Eq. (13) and component proportion in Eqs. (14) and (15), respectively.

For the first migration mode (extended migration), \(\left| {w \times (2R - 1)} \right|\) is the flight length control parameter, whose curve for one dimension is illustrated in Fig. 2. The flight length control parameter of the second migration mode (narrowed migration) is \(\left| {{{e}}^{ - \lambda \times w} - 1} \right|\), whose curve for one dimension is displayed in Fig. 3. The curve shapes of the control parameters for both modes are basically the same. However, the range of the control parameter in the extended mode is larger than in the narrowed mode, and its curve also decreases faster.

Fig. 2 Flight length control parameter of the extended migration

Fig. 3 Flight length control parameter of the narrowed migration

The large-scale migration stage concentrates on helping the population jump out of local optima and recovering population diversity. Most importantly, as can be seen from Eq. (13), the base vector used to calculate the migration position is independent of the current individual; that is, individuals can migrate to positions quite different from their current ones. Moreover, in the extended migration, the guiding individual G3 is built by mixing the best hiding place \(M_{{{{best}}}}^{t}\) and the randomly generated individual \(X_{{{{rand}}}}\), which is unrelated to the current population. In the early and middle stages of evolution, \(X_{{{{rand}}}}\) plays the major role in G3, so the migration position produced by \(M_{{{{best}}}}^{t}\) and G3 can be far away from an individual's previous position. As for the narrowed migration, since the current individual participates in the calculation of the position offset and the control parameter takes smaller values, the resulting range of migration positions is smaller than that of the first mode.

Meanwhile, the design of this stage also takes the solution quality and convergence speed into account. Although the migration operation can effectively enhance population diversity and help the population jump to a new area, excessive diversity or poor quality of the new area will slow the convergence of the algorithm and even worsen its convergence accuracy. Hence, two measures are proposed to deal with these problems. The first is that the migration position fuses the valuable information of the so-far-best solution through the base vector \(M_{{{{best}}}}^{t}\), which keeps the migrated population of good quality. The second is that, as the algorithm iterates, the migration position of each individual gradually converges to \(M_{{{{best}}}}^{t}\), which enhances the convergence accuracy and speed of the algorithm. Specifically, this relies on w, which decreases smoothly and nonlinearly from 1 to 0. With the assistance of w, the guiding individuals G3 and G4 transform from the random individual \(X_{{{{rand}}}}\) and the mean individual \(M_{{{{mean}}}}^{t}\) to the best individual \(M_{{{{best}}}}^{t}\), and the migration position also transforms from regions far away from \(M_{{{{best}}}}^{t}\) to its vicinity. Thus, this stage realizes the transition from large-scale exploration in the early and middle stages to small-scale exploitation in the later stage, which not only improves the convergence precision but also accelerates the convergence speed.
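One migration-stage update per Eqs. (13)-(15) can be sketched as follows. We assume fl is a fixed scalar, minimization, and that \(X_{{{{rand}}}}\) means a position drawn uniformly within the problem bounds [L, U]; these readings are ours where the text leaves them implicit.

```python
import numpy as np

def migrate(X, M, fM, t, T_max, L, U, fl=2.0, rng=None):
    """One large-scale-migration position update per crow (Eqs. 13-15)."""
    if rng is None:
        rng = np.random.default_rng()
    N, D = X.shape
    # w controls both the flight length and the component proportions (Eq. 8)
    w = 1.0 - 1.0 / (1.0 + np.exp(-10.0 * ((t - 1) / (T_max - 1) - 0.5)))
    M_best = M[np.argmin(fM)]            # best hiding place (minimization)
    M_mean = M.mean(axis=0)
    X_new = np.empty_like(X)
    for i in range(N):
        R = rng.random(D)
        if rng.random() < 0.5:           # extended migration, guided by G3
            X_rand = rng.uniform(L, U, D)
            G3 = (1 - w) * M_best + w * X_rand            # Eq. (14)
            X_new[i] = M_best + fl * w * (2.0 * R - 1.0) * G3
        else:                            # narrowed migration, guided by G4 = G2
            lam = rng.random(D)
            G4 = (1 - w) * M_best + w * M_mean            # Eq. (15)
            X_new[i] = (M_best + fl * np.sign(R - 0.5)
                        * (np.exp(-lam * w) - 1.0) * np.abs(G4 - X[i]))
    return X_new
```

Note that both branches use M_best as the base vector, so the new position is anchored to the best hiding place rather than to the crow's own position, as the analysis above emphasizes.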

3.5 Procedure and overall analysis of MSCSA

The proposed MSCSA procedure is represented in Algorithm 1, and its flowchart is shown in Fig. 4.

Fig. 4 Flowchart of MSCSA


In view of the detailed description of each stage above and the overall algorithm procedure, the optimization of the algorithm follows the principle of achieving a good trade-off between global exploration and local exploitation, embodied in the following three levels of balance (from top to bottom, from macro to micro).

Search-stage level Specifically, the initialization is devoted to improving the ergodicity and quality of the original population. The free foraging stage mainly conducts local search, enhancing solution accuracy through each individual's fine-grained exploitation of its own neighborhood. The global search is performed in the following stage, expanding the search space as much as possible through individuals following one another. The last stage, large-scale migration, concentrates on greatly increasing the population diversity and helping the population jump out of local optima by transferring the population to a new potential region. Each of these stages has its own characteristics and excels at a different intensity of search, so they form complementary advantages. Together they achieve the balance between global exploration and local exploitation across the whole search process of the algorithm.

Position-update level There are two position-update modes, extended and narrowed, of varying intensities in each search phase. The extended mode and narrowed mode can cover different search ranges and form different search fineness, which diversifies the population and increases the probability of finding better solutions. Combining these two modes in one search phase also sticks to the principle of giving consideration to large-range coarse-grained exploration and small-range fine-grained exploitation.

Control-parameter level The proportion control parameters can make the guiding individuals converge to the best individual (the best hiding place of the population). Under the lead of the guiding individuals, with the changes of flight length control parameters, the search area of each individual will gradually shrink from a large area far away from itself to the nearest area of the best individual or its own neighborhood. Therefore, even at the lowest micro level, the design of the algorithm still reflects the process of transition from large-scale search in the early and middle iterations to small-scale mining in the later iterations.

3.6 Computational complexity

All the algorithms involved in this study use the maximum number of function evaluations (maxFEs) as the termination criterion. For analysis purposes, the time complexity is first derived in terms of the number of iterations and then converted to an expression based on maxFEs.

Generally, the time complexity of an algorithm consists of three main parts: initialization, fitness function evaluation, and updating of solutions. Let the population size be N, the problem space dimension be D, and the maximum number of iterations be \(T_{\max }\). The computational complexity of one fitness calculation (one function evaluation) is \(O(f)\). In the initialization stage, the time complexity of generating the chaotic sequence is \(O(ND)\), as is that of producing the chaotic population and the opposition population. Evaluating the fitness of these two populations costs \(O(2Nf)\). Hence, the total time complexity of the initialization stage is \(O(3ND + 2Nf)\). Each iteration of the algorithm consists of three search stages; each stage generates a new population and evaluates it. After \(T_{\max }\) iterations, the time complexity is \(O((3ND + 3Nf)T_{\max } )\). Thus, the time complexity of MSCSA is \(O(3ND + 2Nf + (3ND + 3Nf)T_{\max } )\). Since \(2N + 3NT_{\max } = {{maxFEs}}\) and \(2N\) is much less than \(3NT_{\max }\), the approximation \(3NT_{\max } \approx {{maxFEs}}\) holds. Replacing \(3NT_{\max }\) with maxFEs, the time complexity can finally be expressed as \(O(3ND + 2Nf + {{maxFEs}}(D + f))\). By the same method, the time complexity of the original CSA is \(O(ND + Nf + {{maxFEs}}(D + f))\). Clearly, the time complexity of MSCSA is only slightly greater than that of CSA, mainly because chaos and OBL techniques are used to generate the initial population.
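The simplification \(3NT_{\max } \approx {{maxFEs}}\) can be checked numerically with the settings used later in Sect. 4.2 (population size N = 20, maxFEs = 20,000):

```python
N = 20
maxFEs = 20_000

# Exact iteration budget: 2N evaluations at initialization (chaotic and
# opposition populations), then 3N evaluations per iteration.
T_max_exact = (maxFEs - 2 * N) / (3 * N)

# Simplified budget used in the text: 3N * T_max ~ maxFEs.
T_max_approx = maxFEs / (3 * N)

# The two differ by less than one iteration, so dropping the 2N
# initialization term changes nothing at this scale.
gap = T_max_approx - T_max_exact
```

Here the exact budget is about 332.7 iterations versus 333.3 for the approximation, confirming that the 2N term is negligible.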

In addition, the time complexity of the other algorithms can be calculated in a similar way; the results for all comparison algorithms are shown in Table 1. Pm in the time complexity of GA denotes the mutation probability. As the table shows, the time complexity of these algorithms can be considered essentially the same, namely \(O({{maxFEs}}(D + f))\). However, because the algorithms use different position-update modes and control parameters, their actual computational times vary greatly. This point is fully verified in the computational-time comparison in Sect. 4.6.

Table 1 Time complexity of the algorithms

4 Experimental results and analysis

In order to comprehensively evaluate the performance of the proposed MSCSA, two types of experiments are conducted in this study. MSCSA is first tested on the CEC 2017 special session benchmark functions (Awad et al. 2016) for complex global optimization problems with dimensions 30, 50, and 100. Then, its effectiveness on complex, high-dimensional global optimization problems is evaluated on the CEC 2010 benchmark functions (Tang et al. 2009), with dimension up to 1000.

In all experiments, the performance of MSCSA is compared with some state-of-the-art meta-heuristic algorithms, including the original CSA (Askarzadeh 2016), Sine–cosine CSA (SCCSA) (Khalilpourazari and Pasandideh 2020), Dynamic CSA (DCSA) (Necira et al. 2022), Improved CSA (ICSA) (Gholami et al. 2021), Whale Optimization Algorithm (WOA) (Mirjalili 2016), Harris Hawks Optimization (HHO) (Heidari et al. 2019), Sine Tree-Seed Algorithm (STSA) (Jiang et al. 2020), Arithmetic Optimization Algorithm (AOA) (Abualigah et al. 2021a), Aquila Optimizer (AO) (Abualigah et al. 2021b), Reptile Search Algorithm (RSA) (Abualigah et al. 2022), Genetic Algorithm (GA) (Bonabeau et al. 1999), and Particle Swarm Optimization (PSO) (Kennedy and Eberhart 1995). Among these algorithms, DCSA, SCCSA, and ICSA are newly developed variants of CSA; WOA, HHO, AO, and RSA are powerful swarm intelligence algorithms; STSA is an improved variant of the evolutionary Tree-Seed Algorithm (TSA) (Kiran 2015); AOA is a recently proposed mathematics-inspired algorithm; and GA and PSO are two classical meta-heuristic algorithms. Hence, the compared algorithms cover various types of meta-heuristic algorithms, and the effectiveness of MSCSA can be fully verified by comparing against them.

In particular, we implemented CSA, DCSA, SCCSA, ICSA, STSA, GA, and PSO in MATLAB according to the corresponding papers. For WOA, HHO, AO, AOA, and RSA, the authors provide MATLAB implementations, so we downloaded the codes from the links given in the corresponding papers and used them in the experiments.

4.1 Benchmark functions

The benchmark functions of CEC 2017 can be divided into four categories: unimodal, simple multi-modal, hybrid, and composition. The unimodal functions (F1–F3), each with a unique global optimum, reveal the exploitation capabilities of the algorithms, while the multi-modal functions (F4–F10), possessing many local optima, disclose the exploration and premature-convergence-avoidance capabilities of the compared optimizers. As for the hybrid functions (F11–F20) and composition functions (F21–F30), since they maintain continuity around the local and global optima, they are well suited to detecting the balance between global and local search abilities and the potential to escape local optima.

Generally, as the dimension of a global optimization problem increases, the search space grows exponentially, and meta-heuristic algorithms often suffer from the curse of dimensionality on large-scale problems. Thus, the high-dimensional CEC 2010 benchmark functions with dimension 1000 are applied to investigate the performance of the proposed MSCSA. This test suite consists of twenty functions with different features: unimodal, multi-modal, shifted, separable, and fully non-separable functions, scalable over different search ranges.

4.2 Experimental setup

All algorithms are implemented in MATLAB R2018b, and the simulations are run on a 2.4 GHz Core i7 processor with 16 GB of main memory under Windows 11. The maximum number of function evaluations (maxFEs) is used as the termination condition, with its value set to 20,000. For all following experiments, the population size is set to 20, and for each function, every algorithm is run 30 times independently.

In MSCSA, only the flight length fl needs to be set, and its value is 2. For a fair comparison, the specific parameters of the compared algorithms are taken from their corresponding papers, as shown in Table 2.

Table 2 Parameter settings of the algorithms

4.3 Numerical performance evaluation

In this subsection, the performance of MSCSA is comprehensively and thoroughly evaluated on the CEC 2017 and CEC 2010 benchmark functions. The experimental results are shown in Tables 3, 4, 5, 6 and 7, where the average (Avg) and standard deviation (Std) of the best fitness obtained in each run are used to assess the compared algorithms; the minimum values are marked in bold.
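For clarity, the Avg and Std entries are simple statistics over the independent runs. With hypothetical best-fitness values from five runs (the paper uses 30), the computation looks like:

```python
from statistics import mean, stdev

# Hypothetical best fitness from each independent run of one algorithm
# on one function; in the paper this list would hold 30 values.
best_per_run = [1.2e3, 9.8e2, 1.1e3, 1.05e3, 9.9e2]

avg = mean(best_per_run)
std = stdev(best_per_run)  # sample standard deviation (n - 1 denominator)
```

Note that `stdev` uses the sample (n − 1) form; whether the paper uses the sample or population form is not stated, but the comparison between algorithms is unaffected either way.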

Table 3 Comparison results on unimodal functions of CEC 2017
Table 4 Comparison results on multimodal functions of CEC 2017
Table 5 Comparison results on hybrid functions of CEC 2017
Table 6 Comparison results on composite functions of CEC 2017
Table 7 Comparison results on high-dimensional functions of CEC 2010 with dimensions 1000

4.3.1 Evaluation of local exploitation capability

The unimodal functions are suitable for evaluating the exploitation capability of the algorithms. Table 3 reports the optimal values obtained by MSCSA and its competitors on the unimodal functions F1–F3 with dimensions 30, 50, and 100. It can be seen that MSCSA is superior to the compared algorithms except on F3 with dimensions 50 and 100.

4.3.2 Evaluation of global exploration capability

The simple multimodal functions F4–F10 in CEC 2017 have one global optimum and several local optima. As the dimension increases, the number of local optima increases exponentially, so these functions are very suitable for evaluating the global exploration capability of algorithms. Table 4 shows the results of the proposed MSCSA and the other compared algorithms for dimensions 30, 50, and 100. Except on F6 and F10 with dimension 30, MSCSA is superior to the other algorithms.

4.3.3 Evaluation of capability to escape from local optima

The capability of an algorithm to jump out of local optima mainly depends on maintaining population diversity and on the potential quality of the region it jumps to. Thus, the power of the proposed MSCSA to avoid falling into local optima and to balance global and local search is assessed with the hybrid functions in Table 5 and the composite functions in Table 6, respectively. As shown in Table 5, MSCSA outperforms the other compared algorithms in most cases. Although the convergence accuracy of MSCSA is inferior to that of the best algorithm on the low-dimensional F13, F14, F15, and F18 functions, it regains the advantage in the high-dimensional cases. For the results in Table 6, MSCSA performs better than the other algorithms except on F22 with dimension 30, F27 with dimensions 50 and 100, F29, and F30 with dimensions 30 and 50. All in all, the ability of MSCSA to escape from local minima is better than that of its competitors, and it is very suitable for solving high-dimensional problems.

4.3.4 Evaluation of large-scale global optimization

In order to investigate the ability to solve high-dimensional complex problems, the global optimization performance of the proposed algorithm is further evaluated on the CEC 2010 benchmark with 1000 dimensions. The results, recorded in Table 7, show that in most cases MSCSA is more efficient than the comparison algorithms when handling large-scale problems.

4.4 Convergence analysis

In this section, the convergence behavior of the algorithms on the different test suites and dimensions is compared and analyzed. For each category of CEC 2017, several functions are chosen as representatives: F1–F3 for unimodal, F5, F7, and F9 for simple multi-modal, F12, F16, and F19 for hybrid, and F23, F26, and F30 for composite. Functions F1–F3, F5–F7, F9–F11, F13–F15, and F18–F20 are selected to represent CEC 2010. Figures 5, 6, 7, 8 and 9 show the convergence curves of the involved algorithms on these functions for different dimensions. In the CEC 2017 test suite, except on F3 with dimensions 50 and 100, F19 with dimension 50, and F30 with dimensions 30 and 50, MSCSA converges to more accurate solutions at a faster rate than the other algorithms. In the CEC 2010 test suite, except on F2, F3, F10, and F15, the convergence of MSCSA is better than that of the compared algorithms.

Fig. 5

Convergence curves of MSCSA and other algorithms for unimodal benchmark functions

Fig. 6

Convergence curves of MSCSA and other algorithms for multi-modal benchmark functions

Fig. 7

Convergence curves of MSCSA and other algorithms for hybrid benchmark functions

Fig. 8

Convergence curves of MSCSA and other algorithms for composite benchmark functions

Fig. 9

Convergence curves of MSCSA and other algorithms for CEC 2010 functions (D = 1000)

4.5 Non-parametric statistical analysis

In addition to the above experimental evaluations, non-parametric statistical analysis is utilized to further validate the overall performance of MSCSA. Hence, in this section, Wilcoxon signed rank test (Derrac et al. 2011) and Friedman test (García et al. 2010) are applied as non-parametric methods to experiment on CEC 2017 and CEC 2010 benchmark functions.

The Wilcoxon signed rank test is used to detect significant differences between the results of two algorithms. The symbols "+", "−", and "=" indicate the number of times MSCSA is superior, inferior, or equal to the compared algorithm, respectively. As shown in Table 8 for the CEC 2017 benchmark, MSCSA is inferior to a comparison algorithm on no more than five functions, and in all other cases it is superior or equal to its competitors. As the function dimension increases, MSCSA still maintains obvious advantages. In Table 9 for the CEC 2010 benchmark, MSCSA is superior to the others in most cases; in fact, it fails to achieve the best performance on only five functions. Moreover, the p values of the comparisons for dimensions 30, 50, 100, and 1000 are less than 0.05, which statistically verifies the superiority of the proposed MSCSA.
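The test itself is standard; a small sketch with SciPy and hypothetical paired per-function results (not the paper's data) looks like:

```python
from scipy.stats import wilcoxon

# Hypothetical mean errors of MSCSA and one competitor on eight functions;
# in the paper, the paired samples are the per-function results of the suite.
mscsa_err = [1.1, 0.9, 2.3, 0.5, 1.7, 0.8, 1.2, 0.6]
rival_err = [1.5, 1.4, 2.2, 0.9, 2.5, 1.3, 1.8, 1.1]

# Two-sided test on the paired differences; p < 0.05 indicates a
# statistically significant difference between the two algorithms.
stat, p = wilcoxon(mscsa_err, rival_err)
```

Counting, per function, which algorithm is better then yields the "+/−/=" tallies reported in Tables 8 and 9.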

Table 8 Wilcoxon signed rank test for CEC 2017 functions
Table 9 Wilcoxon signed rank test for CEC 2010 functions

The Friedman test can be used for multiple comparisons among several algorithms by computing the ranks of the observed results. As shown in Tables 10 and 11, the mean rank of MSCSA is the smallest in every dimension. Therefore, MSCSA ranks first among all algorithms and is significantly different from the others.
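The mean ranks in those tables follow the standard Friedman procedure: rank the algorithms on each function (1 = best), then average the ranks. A small sketch with hypothetical errors of three algorithms on four functions (not the paper's data):

```python
# Hypothetical per-function errors for three algorithms on four functions.
results = {
    "MSCSA": [0.5, 1.0, 0.2, 0.8],
    "CSA":   [0.9, 1.6, 0.4, 1.2],
    "PSO":   [0.7, 1.4, 0.6, 1.5],
}

n_funcs = len(next(iter(results.values())))
mean_rank = {}
for alg in results:
    ranks = []
    for j in range(n_funcs):
        # Rank of this algorithm's error within column j (1 = smallest error).
        column = sorted(results[a][j] for a in results)
        ranks.append(column.index(results[alg][j]) + 1)
    mean_rank[alg] = sum(ranks) / n_funcs
```

On these values MSCSA obtains the smallest mean rank; `scipy.stats.friedmanchisquare` can then test whether the rank differences are significant.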

Table 10 Friedman test for CEC 2017 functions
Table 11 Friedman test for CEC 2010 functions

4.6 Computational time

The computational times of the compared algorithms are listed in Table 12 for CEC 2017 and Table 13 for CEC 2010. These values are the running times averaged over the benchmark functions of the same dimension.

Table 12 Computational times of CEC 2017 functions
Table 13 Computational times of CEC 2010 functions

In CEC 2017, MSCSA ranks seventh among all algorithms in every dimension. Regardless of the dimension of the function, it takes only slightly longer than CSA, with a difference of less than 0.1 s per run. In CEC 2010, with the dimension up to 1000, MSCSA ranks fourth, indicating better optimization efficiency on high-dimensional problems. Moreover, as described in Sect. 3.6, the time complexity of all the algorithms is essentially the same, yet the actual running times vary greatly. The reason for this phenomenon is that the control-parameter construction and position-update methods differ considerably across algorithms.

4.7 Overall analysis of the improvements

Based on the results for the complicated and high-dimensional global benchmarks, the effectiveness of MSCSA is enhanced relative to the original CSA and several state-of-the-art methods. As stated in Sects. 3 and 4, the main reason is that the proposed multi-stage search integration structure achieves multi-level, multi-granularity balances between exploratory and exploitative propensities throughout the iterations. These balances are accomplished by the different mechanisms designed into MSCSA. In the initialization stage, chaos-based and OBL techniques improve the coverage and quality of the initial population. In the free foraging stage, the random flights strengthen the exploitation capability of the algorithm, which enhances the quality of the solutions. The mutual following of individuals promotes information exchange within the population, which helps MSCSA expand the search space for exploration. MSCSA also benefits from the large-scale migration stage, which greatly increases population diversity and helps the population jump out of local optima by moving to a promising search area. In addition, the proposed parameter-control strategy gradually adjusts the component proportion of the guiding individuals and the flight length of the individuals, so that the search intensity shifts from extensive exploration at the beginning to focused local search as the iterations proceed.

5 Conclusions and future directions

This study first proposes a multi-stage search structure and then incorporates chaos, multi-OBL, multi-guidance, and multi-position-update strategies into it to improve the performance of the original CSA. The proposed MSCSA balances global exploration and local exploitation at three different levels: the search-stage level, the position-update level, and the control-parameter level. These balances significantly improve population diversity and the algorithm's ability to jump out of local optima. Tests on a comprehensive set of complex and high-dimensional benchmark functions from CEC 2017 and CEC 2010 show that MSCSA is remarkably more efficient than the original CSA and its variants. Furthermore, compared with various other categories of meta-heuristic algorithms, including WOA, AO, RSA, HHO, STSA, AOA, GA, and PSO, MSCSA also demonstrates outstanding performance in convergence accuracy, convergence speed, and non-parametric statistics. Therefore, MSCSA improves both the global and local search capabilities of the original CSA.

In the near future, it would be interesting to extend MSCSA to handle more complex real-world problems, such as multi-objective, combinatorial, and constrained optimization problems.