1 Introduction

Optimization has long been a popular area of research across science and technology. In real-world problems, resources, time, and money are limited, which necessitates efficient optimization algorithms. According to the “No-free-lunch” theorem, no single algorithm works well in all applications; hence, optimization algorithms with improved performance are always needed. Optimization algorithms systematically find the solution to a particular problem with or without constraints. Traditional mathematical optimization methods, such as linear programming and the Newton–Raphson method, struggle with complex problems due to discontinuity, high dimensionality, the computation of derivatives, etc. Many nature-inspired algorithms have been proposed in the literature to mitigate these issues.

Nature-inspired algorithms (NIA) are meta-heuristic algorithms inspired by natural processes around us. NIAs can be classified into various categories: swarm based, evolution based, human based, bio based, math based, and physics based. Particle Swarm Optimization (PSO) (Dhalwar et al. 2016), Ant Lion Optimization (ALO) (Mafarja and Mirjalili 2019), Ant Colony Optimization (Kumar et al. 2020), Grey Wolf Optimization (GWO) (Faris et al. 2018), Cuckoo Search Algorithm (CSA) (Miao et al. 2021), Salp Swarm Algorithm (SSA) (Bairathi and Gopalani 2021), Harris Hawks Optimization (HHO) (Fan et al. 2020), Whale Optimization Algorithm (WOA) (Mafarja and Mirjalili 2017), and Follow The Leader (FTL) (Singh and Kottath 2022a; Singh et al. 2022) are a few examples of the swarm-based optimization algorithms proposed over the years. Evolution-based optimization techniques include Differential Evolution (DE) (Deng et al. 2021), Genetic Algorithm (GA) (Wang et al. 2020), Evolutionary Programming (EP) (Hong et al. 2018), Evolutionary Strategies (ES) (Cofnas 2018), etc. Influencer Buddy Optimization (IBO) (Kottath and Singh 2023), Teaching Learning-Based Optimization (TLBO) (Peng et al. 2019), Cultural Algorithm (CA) (Chen et al. 2020b), Coronavirus Herd Immunity Optimization (CHIO) (Al-Betar et al. 2020), and Forensic-Based Investigation Optimization (FBIO) (Kuyu and Vatansever 2021) are well-known algorithms in the category of human-based optimization algorithms. Bio-based optimization algorithms include Biogeography-Based Optimization (BBO) (Chen et al. 2019), Virus Colony Search (VCS) (Li et al. 2016), Satin Bowerbird Optimizer (SBO) (Moosavi and Bardsiri 2017), Earthworm Optimization Algorithm (EOA) (Wang et al. 2018), etc., whereas Hill Climbing (HC) (El Yafrani and Ahiod 2018) and the Sine Cosine Algorithm (SCA) (Li and Wang 2020) come under math-based optimization. Atom Search Optimization (ASO) (Zhao et al. 2019), Electromagnetic Field Optimization (EFO) (Abedinpourshotorban et al. 2016), Multi-verse Optimizer (MVO) (Benmessahel et al. 2018), and Simulated Annealing (SA) (Fathollahi-Fard et al. 2019) are some famous examples of physics-based optimization algorithms.

Swarm intelligence has received wide popularity due to its performance in solving various optimization problems. Particle Swarm Optimization is the pioneering work in the area of swarm intelligence and has motivated the development of various swarm-based algorithms. Communication within the swarm brings the exploitation property, whereas the randomness added through different operators brings the exploration property within the search space. A balance between these two properties is essential for an optimization algorithm to find an optimal solution. Many of the proposed meta-heuristic algorithms have the ability to avoid local optima, but they suffer from problems like parameter selection, premature convergence, etc. In the past few years, many hybrid optimization algorithms have been proposed to alleviate these issues (Ahmadian et al. 2021; Jakubik et al. 2021; Chong et al. 2021). A hybrid algorithm improves information exchange among candidates and diversity within the population, thus enhancing the ability to solve complex engineering problems (Singh and Kottath 2021). Wang et al. proposed a hybrid algorithm, CS-PSO, combining Cuckoo Search with PSO, which improves the algorithm’s ability to avoid local optima (Wang et al. 2011). Zhang et al. proposed a hybrid optimization technique combining PSO with Tabu Search for flexible job-shop scheduling problems (Zhang et al. 2009). Yang et al. proposed a hybrid model based on the Fruit Fly optimization algorithm and a neural network to optimize the network parameters and predict underwater acoustic signals (Yang et al. 2018). Singh and Dwivedi integrated ANN and FTL algorithms to learn the weight parameters of the network architecture and showed the efficiency of the proposed model on a short-term electricity load forecasting problem (Singh and Dwivedi 2018).
Further, the authors extended this work and proposed a hybrid model based on ANN and an optimization algorithm using Controlled Gaussian Mutation (CGM) for electricity demand prediction (Singh et al. 2019).

The literature shows that hybrid algorithms can generate better results than traditional ones. Based on this idea, we combine multiple algorithms in a single stack to find an optimal solution. In this work, a novel method of combining swarm-based optimization techniques is proposed. The motivation behind this work comes from multiple-model systems, where the individual particles of an algorithm update their positions by selecting the best available solutions from several optimization algorithms. The novelty of this work lies in developing a framework that combines multiple algorithms and can be extended to any swarm-based approach. To verify the proposed framework, we use four well-known optimization algorithms, CSA, GWO, HHO, and WOA, as the base algorithms. These algorithms were selected solely on the basis of their wide acceptance and their application to a huge range of complex problems. The number of base algorithms is limited to four to keep the number of possible combinations manageable for the analysis of the proposed model. The set of all possible pairwise combinations of these algorithms, namely CSA-GWO, CSA-HHO, CSA-WOA, GWO-HHO, GWO-WOA, and HHO-WOA, is evaluated in this work. As the algorithms are executed in parallel, the order of the algorithms in the name does not affect performance. The proposed hybrid algorithms are tested on twenty-four standard unimodal and multimodal benchmark functions. Furthermore, their performance is evaluated on electricity load and price forecasting problems, and the Friedman test is performed to show the significance of the generated results. The major contributions of our work are as follows:

  • A novel approach of combining two different optimization algorithms in a single framework has been proposed.

  • Exhaustive combinations of CSA, GWO, HHO, and WOA have been evaluated.

  • The proposed algorithms are validated on twenty-four unimodal and multimodal benchmark functions with varying dimensions and population sizes.

  • The hybrid algorithms have been combined with ANN to demonstrate their performance on electricity load and price forecasting problems.

The rest of the paper is organized as follows. Section 2 gives background details of artificial neural networks and the different optimization algorithms. Section 3 describes the proposed novel approach of combining CSA, GWO, HHO, and WOA with a detailed flow diagram. Experimental results with a detailed analysis are presented in Sect. 4. Section 5 discusses the results and compiles their interpretation. Finally, Sect. 6 concludes the work and sheds light on its future scope.

2 Related works

In this section, we provide the background details of artificial neural networks and the mathematical representation of different optimization algorithms utilized in this work. The detailed theory about these optimization algorithms can be read from their base papers as per the references.

2.1 Artificial Neural Network

An Artificial Neural Network models the biological neural network of our brain to learn the mapping between input and output neurons. ANN architectures have evolved a lot over the period, and the multi-layer perceptron (MLP) is one of the most widely used networks. This consists of an input layer, one or more hidden layers, and an output layer connected through network weights. ANN has been one of the major choices for time series prediction applications such as prediction of rainfall, water demand, electricity load (Singh and Dwivedi 2019), etc.

For a two-layered feed-forward network containing n input neurons, h hidden neurons, and m output neurons, the output \(Y_k\) can be obtained as:

$$\begin{aligned} Y_k=\sum _{j=1}^{h} W_{k,j}\cdot \frac{1}{1+\exp \left( -\sum _{i=1}^{n}W_{j,i}X_i+b_j \right) }+b_k \end{aligned}$$
(1)

where \(b_j\) represents the bias term in the hidden layer and \(b_k\) represents the bias term in the output layer. Machine learning problems can be broadly classified into regression and classification problems, and neural networks and their variants have been widely used for both. The selection of hyper-parameters, such as the number of neurons in each layer, the number of layers in the network, and the choice of activation function, has always been challenging; hence, the trial-and-error method is used for the same (Hamzaçebi 2008).
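For concreteness, the forward pass in Eq. (1) can be sketched in a few lines of NumPy (a minimal illustration; the layer sizes and random weights below are placeholders, not the configuration used in this work):

```python
import numpy as np

def mlp_forward(X, W_hid, b_hid, W_out, b_out):
    """Forward pass of a single-hidden-layer MLP, following Eq. (1):
    sigmoid activation in the hidden layer, linear output layer."""
    H = 1.0 / (1.0 + np.exp(-(W_hid @ X + b_hid)))  # hidden activations, shape (h,)
    return W_out @ H + b_out                        # outputs Y_k, shape (m,)

rng = np.random.default_rng(0)
n, h, m = 8, 20, 1                      # e.g. 8 inputs, 20 hidden neurons, 1 output
W_hid, b_hid = rng.standard_normal((h, n)), rng.standard_normal(h)
W_out, b_out = rng.standard_normal((m, h)), rng.standard_normal(m)
y = mlp_forward(rng.standard_normal(n), W_hid, b_hid, W_out, b_out)
```

In a hybrid model, the flat vector of all weights and biases is what an optimization algorithm would adjust in place of gradient descent.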

2.2 Optimization algorithms

Several optimization algorithms have been proposed in the literature. Algorithms such as PSO, CSA, HHO, GWO, and WOA show similar behavior in updating individual candidates for the next generation. As mentioned earlier, a combination of two or more algorithms may improve accuracy on a given problem and overcome the drawbacks of the existing algorithms. This section discusses the basics of the CSA, GWO, HHO, and WOA optimization algorithms.

2.2.1 Cuckoo Search Algorithm (CSA)

The Cuckoo Search Algorithm is inspired by the aggressive reproduction strategy of female cuckoo birds. Cuckoo birds lay their eggs in communal nests, and the host birds raise them (Gandomi et al. 2013). Cuckoos use some strategies to increase the hatching probability of their eggs and reduce the probability of abandonment of eggs by the host birds. The cuckoo search algorithm works on three basic rules:

  • Each cuckoo lays one egg at a time and dumps it in a randomly chosen nest among the available host nests

  • The best nests with high-quality eggs will be carried over to the next generation

  • \(p_a\) is the probability that an egg laid by a cuckoo is discovered by the host bird, and the number of host nests is fixed. In this situation, the host can either throw the egg away or abandon the nest and build a new one (Basu and Chowdhury 2013).

A cuckoo uses Levy flight distribution to create a new nest based on the previous best nests. The new nest is calculated as follows:

$$\begin{aligned} X_{\text {next}}= X_{c}+ \alpha *r_1*\text {step}\left( X_{c}- X_{\text {cbest}} \right) \end{aligned}$$
(2)

where step is calculated using Mantegna’s algorithm and \(\alpha > 0\) (Yang 2010). Apart from basic CSA, numerous variants have been published by researchers and practitioners to improve the performance of existing versions. Chen and Yu proposed a hybrid meta-heuristic algorithm combining biogeography-based optimization (BBO) and CSA to identify photo-voltaic model parameters. As CSA is good at global exploration while BBO favors local exploitation, their combination brings a good balance of exploration and exploitation (Chen and Yu 2019).
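Eq. (2) together with Mantegna’s step can be sketched as follows (an illustrative implementation; the Levy exponent beta = 1.5 and alpha = 0.01 are common choices, not necessarily those of the base paper):

```python
import numpy as np
from math import gamma, sin, pi

def mantegna_step(beta, size, rng):
    """Levy-flight step length via Mantegna's algorithm (beta is the Levy exponent)."""
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def cuckoo_update(X_c, X_best, alpha, rng):
    """New nest per Eq. (2): X_next = X_c + alpha * r1 * step * (X_c - X_best)."""
    r1 = rng.random()
    step = mantegna_step(1.5, X_c.shape, rng)
    return X_c + alpha * r1 * step * (X_c - X_best)

rng = np.random.default_rng(1)
X_c, X_best = rng.random(5), rng.random(5)
X_next = cuckoo_update(X_c, X_best, alpha=0.01, rng=rng)
```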

2.2.2 Grey Wolf Optimizer (GWO)

Grey Wolf Optimizer is a population-based algorithm that imitates the leadership hierarchy and group hunting behavior of wolves (Nadimi-Shahraki et al. 2021). In this algorithm, three categories of leader wolves, namely \(\alpha \), \(\beta \), and \(\delta \), are considered the best solutions (Faris et al. 2018). These three groups of wolves lead the remaining wolves, termed \(\omega \) wolves, toward promising regions of the search space to find the global solution. The hunting strategy followed by the wolves can be described in three main steps:

  • Encircling Prey:

    $$\begin{aligned} X_{\text {next}}=X_{\text {prey},c} - A\cdot |C\cdot X_{\text {prey},c}-X_{c}|\end{aligned}$$
    (3)
    $$\begin{aligned} C=2\cdot r_2 \end{aligned}$$
    (4)
    $$\begin{aligned} A=2\cdot a\cdot r_1-a,\qquad a=2\left( 1-\frac{\text {ite}}{\text {Maxite}}\right) \end{aligned}$$
    (5)

    where \(r_1\) and \(r_2\) are random numbers, \(\text {ite}\) is the current iteration, and \(\text {Maxite}\) is the maximum number of iterations. \(X_c\) is the current position and \(X_{\text {next}}\) is the next position of the wolf; \(X_{\text {prey},c}\) is the position of the prey in the current iteration.

  • Hunting:

    $$\begin{aligned} \begin{matrix} X_{1,c}=X_\alpha -A_1\cdot |C_1 X_\alpha -X_c |\\ X_{2,c}=X_\beta -A_2\cdot |C_2 X_\beta -X_c |\\ X_{3,c}=X_\delta -A_3\cdot |C_3 X_\delta -X_c |\end{matrix} \end{aligned}$$
    (6)

    where \(X_\alpha \), \(X_\beta \), and \(X_\delta \) are the best solutions of the current iteration. \(C_1\), \(C_2\), and \(C_3\) are calculated using Eq. (4), and the A coefficients using Eq. (5). Wolves belonging to \(\omega \) update their positions using:

    $$\begin{aligned} X_c=\frac{X_{1,c}+X_{2,c} +X_{3,c}}{3} \end{aligned}$$
    (7)
  • Attacking the prey: the grey wolves finish the hunt by attacking the prey when it stops moving.

Due to simplicity and few control parameters, the GWO algorithm has been applied in different areas to solve optimization problems such as the economic load dispatch problem (Nithiyananthan and Ramachandran 2013), feature selection (Kiziloz and Deniz 2021), scheduling problem (Chen et al. 2021), and recommendation system (Katarya and Verma 2018).
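Eqs. (3)–(7) translate directly into code. The sketch below (illustrative only; per-dimension random coefficients are assumed, as in vectorized GWO implementations) updates one \(\omega \) wolf:

```python
import numpy as np

def gwo_update(X_c, X_alpha, X_beta, X_delta, ite, max_ite, rng):
    """Omega-wolf update per Eqs. (4)-(7): average of moves toward alpha, beta, delta."""
    a = 2.0 * (1.0 - ite / max_ite)                      # Eq. (5): a decays from 2 to 0
    moves = []
    for leader in (X_alpha, X_beta, X_delta):
        A = 2.0 * a * rng.random(X_c.shape) - a          # Eq. (5)
        C = 2.0 * rng.random(X_c.shape)                  # Eq. (4)
        moves.append(leader - A * np.abs(C * leader - X_c))  # Eq. (6)
    return sum(moves) / 3.0                              # Eq. (7)

rng = np.random.default_rng(2)
X_c = rng.random(4)
X_new = gwo_update(X_c, rng.random(4), rng.random(4), rng.random(4),
                   ite=10, max_ite=500, rng=rng)
```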

2.2.3 Harris Hawks Optimizer (HHO)

Harris Hawks Optimizer is a meta-heuristic algorithm that imitates the cooperative behavior and prey-catching manner of Harris hawks (Chen et al. 2020a). The HHO algorithm operates in two phases: exploration and exploitation.

$$\begin{aligned} X_{\text {next}}=\left\{ \begin{matrix} X_r-r_1|X_r-2r_2 X_c |&{}p\ge 0.5\\ \left( X_{\text {prey}}-X_{\text {avg}} \right) -r_3\left( lb+r_4\left( ub-lb \right) \right) &{}p< 0.5 \end{matrix}\right. \end{aligned}$$
(8)

where \(X_{\text {next}}\) and \(X_c\) are the positions of a hawk in the next and current iteration, respectively. \(X_{\text {prey}}\) is the best position of the prey, \(X_r\) is the position of a randomly selected hawk, and \(X_{\text {avg}}\) is the average location of the hawks in the current iteration; lb and ub are the lower and upper bounds of the search space. p, \(r_1\), \(r_2\), \(r_3\), and \(r_4\) are random numbers between 0 and 1. The energy of the prey decreases during its escaping behavior, which can be modeled as follows:

$$\begin{aligned} E=E_0\left( 2-\frac{2t}{T} \right) \end{aligned}$$
(9)
$$\begin{aligned} X_{\text {next}}=\left\{ \begin{matrix} (X_{\text {prey}}-X_c)-E|2(1-r_5)X_{\text {prey}}-X_c |&{} r_5\ge 0.5~\text {and}~|E |\ge 0.5~~\text {(soft besiege: prey has energy to escape)}\\ X_{\text {prey}}-E|X_{\text {prey}}-X_c |&{} r_5\ge 0.5~\text {and}~|E |<0.5~~\text {(hard besiege: prey is extremely tired)} \end{matrix}\right. \end{aligned}$$
(10)

Soft besiege with progressive rapid dives:

$$\begin{aligned} \left. \begin{matrix} Y=X_{\text {prey}}-E|2(1-r_5)X_{\text {prey}}-X_c |\\ Z=Y+S\times \text {LF}(d) \end{matrix}\right\} ~r_5<0.5~\text {and}~|E |\ge 0.5 \end{aligned}$$
(11)

where \(\text {LF}\) is the Levy flight function and S is a random vector of size d. The position of the hawks is then updated as follows:

$$\begin{aligned} X_{\text {next}}=\left\{ \begin{matrix} Y &{} \text {if}~ F(Y)<F(X_c)\\ Z &{} \text {if}~ F(Z)<F(X_c) \end{matrix}\right. \end{aligned}$$
(12)

Hard besiege with progressive rapid dives:

$$\begin{aligned} \left. \begin{matrix} Y=X_{\text {prey}}-E|2(1-r_5)X_{\text {prey}}-X_{\text {avg}} |\\ Z=Y+S\times \text {LF}(d) \end{matrix}\right\} ~r_5<0.5~\text {and}~|E |< 0.5 \end{aligned}$$
(13)

The final strategy to update the position of hawks can be mathematically represented as:

$$\begin{aligned} X_{\text {next}}=\left\{ \begin{matrix} Y &{} \text {if}~ F(Y)<F(X_c)\\ Z &{} \text {if}~ F(Z)<F(X_c) \end{matrix}\right. \end{aligned}$$
(14)
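A condensed sketch of one HHO move is given below. Following the original HHO, exploration (Eq. (8)) is assumed to occur when |E| >= 1; the progressive rapid-dive branches (Eqs. (11)–(14)) are omitted for brevity, with hard besiege used as the remaining fallback:

```python
import numpy as np

def hho_step(X_c, X_prey, X_rand, X_avg, lb, ub, t, T, rng):
    """One condensed HHO move: exploration (Eq. 8) when |E| >= 1,
    soft/hard besiege (Eq. 10) otherwise."""
    E = rng.uniform(-1, 1) * (2.0 - 2.0 * t / T)     # escaping energy, Eq. (9)
    if abs(E) >= 1.0:                                # exploration, Eq. (8)
        if rng.random() >= 0.5:
            return X_rand - rng.random() * np.abs(X_rand - 2 * rng.random() * X_c)
        return (X_prey - X_avg) - rng.random() * (lb + rng.random() * (ub - lb))
    r5 = rng.random()                                # exploitation, Eq. (10)
    if r5 >= 0.5 and abs(E) >= 0.5:                  # soft besiege
        return (X_prey - X_c) - E * np.abs(2 * (1 - r5) * X_prey - X_c)
    # hard besiege; also used here in place of the omitted dive branches
    return X_prey - E * np.abs(X_prey - X_c)

rng = np.random.default_rng(3)
d = 5
X = hho_step(rng.random(d), rng.random(d), rng.random(d), rng.random(d),
             lb=-10.0, ub=10.0, t=50, T=500, rng=rng)
```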

2.2.4 Whale Optimization Algorithm (WOA)

The whale optimization algorithm is a population-based optimization algorithm that mimics the bubble-net feeding behavior of humpback whales while foraging (Aljarah et al. 2018). The bubble net created by humpback whales helps trap the prey and makes it easier for the whale to hunt closer to the surface (Mafarja and Mirjalili 2017). The algorithm depicts the exploitation phase by encircling the prey and creating a bubble net and the exploration phase by randomly searching for prey. The exploration and exploitation phase can be represented as follows:

  • Exploitation Phase: the whale moves around the prey using a shrinking encircling mechanism and an upward spiral-shaped path.

    $$\begin{aligned} X_{\text {next}}=\left\{ \begin{matrix} X_{\text {best}}-A\cdot |C\cdot X_{\text {best}}-X_c |&{} \text {if}~ r_2<0.5 \\ |X_{\text {best}}-X_c |\cdot e^{bl}\cdot \cos (2\pi l)+X_{\text {best}}&{} \text {if}~ r_2\ge 0.5 \end{matrix}\right. \end{aligned}$$
    (15)

    where \(A=2\cdot a\cdot r_1-a\) and \(C=2\cdot r_1\), a decreases linearly from 2 to 0, b defines the spiral shape, l lies between −1 and 1, and \(r_1\) and \(r_2\) are random numbers between 0 and 1.

  • Exploration phase: the position of a randomly selected whale, \(X_r\), is used to update the position of the whales. A is a vector with random values between −1 and 1 that forces a solution to move away from the best solution.

    $$\begin{aligned} X_{\text {next}}=X_{r}-A.|C.X_{r}-X_c |\end{aligned}$$
    (16)
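Eqs. (15)–(16) can be sketched as follows (an illustrative implementation; following the original WOA, the random-whale exploration branch of Eq. (16) is assumed to be taken when |A| >= 1):

```python
import numpy as np

def woa_update(X_c, X_best, X_rand, ite, max_ite, b, rng):
    """One WOA move per Eqs. (15)-(16): spiral or shrinking-encircling around the
    best whale, or exploration around a random whale when |A| >= 1."""
    a = 2.0 * (1.0 - ite / max_ite)            # a decreases linearly from 2 to 0
    r1, r2 = rng.random(), rng.random()
    A, C = 2 * a * r1 - a, 2 * r1
    if r2 >= 0.5:                              # spiral path, Eq. (15), lower branch
        l = rng.uniform(-1, 1)
        return np.abs(X_best - X_c) * np.exp(b * l) * np.cos(2 * np.pi * l) + X_best
    if abs(A) < 1:                             # shrink toward best, Eq. (15), upper
        return X_best - A * np.abs(C * X_best - X_c)
    return X_rand - A * np.abs(C * X_rand - X_c)  # explore, Eq. (16)

rng = np.random.default_rng(4)
X = woa_update(rng.random(6), rng.random(6), rng.random(6),
               ite=100, max_ite=500, b=1.0, rng=rng)
```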
Fig. 1
figure 1

Flowchart of proposed model

3 Proposed methodology

This section details the proposed hybrid optimization algorithms and their approach to combining various algorithms. This section also discusses the method to develop the hybrid model by combining hybrid optimization algorithms with ANN.

3.1 Proposed hybrid algorithm

A hybrid algorithm is one in which aspects of multiple algorithms are combined in a single framework. Exploration and exploitation are the two primary ingredients of an optimization algorithm, and the algorithms proposed in the literature aim to balance these two properties. One of the significant issues with optimization algorithms is that they fail to generalize their performance across multiple problems. The convergence rate and accuracy of the CSA algorithm are considered its main limitations (Cuong-Le et al. 2021). The single search strategy in GWO makes it insufficient for various optimization problems (Faris et al. 2018). Major limitations of the HHO algorithm include limited solution diversity and the problem of local optima (Elgamal et al. 2020). WOA suffers mainly from a limited degree of exploration (Subramanian et al. 2020). This paper evaluates an exhaustive combination of CSA, GWO, HHO, and WOA. Overall, six different hybrid algorithms, namely CSA-GWO, CSA-HHO, CSA-WOA, GWO-HHO, GWO-WOA, and HHO-WOA, are developed by taking all pairwise combinations. These combinations ensure that the positive aspects of the base algorithms are retained and that performance can be maintained across multiple optimization problems.

In this work, two algorithms are chosen from the base algorithms to create their hybrid. After the initialization step, each particle updates its position using the update equations of both selected algorithms. In each iteration, the particle selects the best of the positions generated by these update equations. Through this modification, the essence of both algorithms is incorporated in a single stack. Each particle gets the option to update its position based on the best available candidates, allowing it to explore the search space in the best possible way. This process continues until the termination condition is met. Finally, the best solution is selected from the obtained solutions. A generalized flow diagram of the proposed hybrid algorithm is shown in Fig. 1. The update equations of CSA, GWO, HHO, and WOA are given in Eq. (2), Eqs. (3)–(7), Eqs. (8)–(14), and Eqs. (15)–(16), respectively.
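The per-particle selection described above can be sketched as follows (a toy illustration: the two update rules below are placeholders for any pair of base-algorithm update equations, and the sphere function stands in for the objective):

```python
import numpy as np

def hybrid_step(population, fitness, update_a, update_b):
    """One iteration of the proposed framework: each particle is moved by both
    base algorithms' update rules and keeps whichever candidate is fitter."""
    new_pop = []
    for x in population:
        cand_a, cand_b = update_a(x), update_b(x)
        new_pop.append(cand_a if fitness(cand_a) <= fitness(cand_b) else cand_b)
    return np.array(new_pop)

# Toy demo on the sphere function with two placeholder update rules.
rng = np.random.default_rng(5)
sphere = lambda x: float(np.sum(x ** 2))
pop = rng.uniform(-5, 5, size=(10, 3))
shrink = lambda x: 0.9 * x                           # stand-in for algorithm A
jitter = lambda x: x + rng.normal(0, 0.1, x.shape)   # stand-in for algorithm B
new_pop = hybrid_step(pop, sphere, shrink, jitter)
best = min(map(sphere, new_pop))
```

Because each particle keeps the fitter of its two candidates, the population-best fitness cannot get worse between iterations.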

3.2 Proposed hybrid model

The proposed hybrid algorithms (CSA-GWO, CSA-HHO, CSA-WOA, GWO-HHO, GWO-WOA, and HHO-WOA) have been applied to the time series prediction problem by replacing the gradient descent algorithm of ANN with the proposed hybrid optimization algorithms. This combination of ANN and hybrid algorithms has been tested on real-life prediction problems to analyze its performance. For better clarity, the framework of one of the proposed hybrid models, ANN-CSA-GWO, is summarized in Table 1.

Table 1 Pseudocode of the proposed hybrid model

4 Experiment and results

In this section, we validate our proposed hybrid algorithms on benchmark functions and test them on two different time series prediction problems to verify their efficacy and performance. All the simulations have been performed on MATLAB R2020b software and Python 3.8 on a Windows 10, 64-bit machine with Intel(R) Core(TM) i5 CPU 760 @ 2.80 GHz.

4.1 Parameter settings

To show the effectiveness of the proposed hybrid algorithms, they have been compared with the CSA, GWO, HHO, and WOA algorithms. The parameters of the CSA (Shehab et al. 2017), GWO (Kohli and Arora 2018), HHO (Heidari et al. 2019), and WOA (Mirjalili and Lewis 2016) algorithms have been chosen from their respective base papers, and the same parameters have been used in the hybrid algorithms, as shown in Table 2. During the validation phase, the algorithms were tested for 30 epochs with 100 particles and 500 iterations over 2, 5, 10, and 20 dimensions for the unimodal and multimodal functions. Furthermore, in the testing phase, these algorithms have been integrated with ANN for the short-term electricity load and price forecasting problems, executed for 2000 iterations with a population size of 100 and 20 hidden neurons.

Table 2 Experimental parameter values

4.2 Validation on benchmark functions

The performance evaluation of the proposed models has been carried out on twenty-four standard benchmark functions. These functions are categorized into two classes: unimodal (F1–F12) and multimodal (F13–F24). The mathematical equations for these functions and their domains are given in Table 13 (Appendix). Unimodal functions have only one global solution, whereas multimodal functions have multiple local solutions along with one global solution. Thus, unimodal functions can be utilized to validate the exploitation strategy of an optimization algorithm, whereas multimodal functions can be used to test both exploration and exploitation.

Table 3 Comparison of optimal results obtained from different algorithms over unimodal functions
Table 4 Comparison of optimal results obtained from different algorithms over multimodal functions

4.2.1 Influence on dimension

Tables 3 and 4 show the results generated by the different optimization algorithms. All the mentioned algorithms have been implemented in four dimensions (2, 5, 10, 20) to analyze their performance with an increasing number of input variables. In Table 3, for the F1 function, GWO, CSA-HHO, CSA-WOA, GWO-HHO, GWO-WOA, and HHO-WOA obtained the global value in 2D, and HHO-WOA in 5D. For the F5 function, CSA-GWO, GWO-WOA, and HHO-WOA generated optimal results in 2D, whereas CSA-GWO and GWO-WOA did so in 5D. Similar results are obtained from CSA-WOA for the F7 and F12 benchmark functions. CSA-HHO and HHO-WOA obtained the best value for the F8, F9, and F11 unimodal benchmark functions in all dimensions. Also, the HHO and WOA algorithms obtained their optimal values for the F9 and F11 functions in 2, 5, 10, and 20 dimensions. Table 4 shows that HHO, CSA-GWO, CSA-WOA, and HHO-WOA generated optimal values for the F13, F15, F19, and F23 benchmark functions in all dimensions. GWO, WOA, and CSA-HHO obtained the best values for the F13 function in both 2D and 5D. Also, CSA-HHO generated the best function values for the F15, F19, and F23 multimodal benchmark functions. For a better understanding, the best values obtained for the unimodal and multimodal functions are highlighted in Tables 3 and 4.

Tables 3 and 4 show that CSA-GWO and HHO-WOA generated superior results for almost all the benchmark functions across the different dimensions. In contrast, the performance of CSA-HHO, GWO-HHO, and GWO-WOA is inferior to that of their base algorithms under the given circumstances. Results generated by CSA-HHO are significantly closer to those of CSA-GWO and better than those of its base algorithms in almost all dimensions. Figure 2 shows the convergence curves of the different algorithms over the benchmark functions. The convergence plots show that GWO-HHO and CSA-GWO converge well compared to the other algorithms. On the other hand, the performance of the CSA algorithm has been inferior in all dimensions for both unimodal and multimodal functions, as can be observed from Tables 3 and 4.

Fig. 2
figure 2

Convergence plot of unimodal and multimodal benchmark functions

Table 5 Effect of population size on F1 to F6 unimodal functions
Table 6 Effect of population size on F7 to F12 unimodal functions
Table 7 Effect of population size on F13 to F18 multimodal functions
Table 8 Effect of population size on F19 to F24 multimodal functions

4.2.2 Influence on population size

This section describes the effect of population size on the performance of the hybrid algorithms. These algorithms and their base algorithms are tested with population sizes of 25, 50, 75, 100, 125, and 150 for 500 iterations. The simulation is performed over 30 epochs to analyze the stability of the obtained results, and the dimensionality of the functions has been fixed at 10 to isolate the effect of varying population size. The mean and standard deviation of the generated results for all the benchmark functions are given in Tables 5, 6, 7, and 8, with the best value obtained for each population size highlighted for better interpretation. In some cases, multiple algorithms perform equally well, and all of them are highlighted. The mean value is given priority in selecting the best algorithm; if the mean values are equal, the standard deviation is considered.

From Table 5, CSA-GWO, CSA-WOA, and HHO-WOA obtained zero standard deviation for the F1 function over the 25, 50, 75, 100, 125, and 150 population sizes, and HHO-WOA obtained the global minimum for the 125 and 150 population sizes. HHO-WOA found zero std for the F2 and F3 functions for the 100, 125, and 150 populations. In Table 6, CSA-GWO, CSA-WOA, and HHO-WOA obtained zero std for F7, F8, F9, and F10 for different population sizes. Among the compared algorithms, HHO-WOA obtained the minimum fitness value across all populations for F1, F2, F3, and F4, while CSA-GWO did so for the F5 and F6 unimodal functions. From Table 6, for the F7 function, HHO-WOA generated the best average and zero std value, while CSA-GWO and CSA-WOA obtained zero std values with varying population sizes in a fixed dimension. CSA-WOA and HHO-WOA found the global optimum, i.e., zero average and std value, for the F8 function. HHO, CSA-GWO, CSA-WOA, and HHO-WOA achieved the global optima for the F9 and F11 functions, and HHO-WOA achieved the best average value for the F10 and F12 functions.

Tables 7 and 8 show the mean and std of the compared algorithms on the F13–F24 benchmark functions over varying population sizes with a fixed dimension. For F13 and F15, HHO, CSA-GWO, CSA-WOA, and HHO-WOA obtained their global optima with zero std values, and they produced better solutions for the F14 function across different populations. Also, CSA-GWO obtained the best mean solution for F16 and F17, and CSA-HHO for the F18 function. As shown in Table 8, HHO, CSA-GWO, CSA-WOA, and HHO-WOA found the global optima for F19 and F23 over varying population sizes. In addition, HHO-WOA performed superiorly for the F20 and F21 functions, and CSA-GWO for the F22 and F24 functions.

From Tables 5, 6, 7, and 8, for a population size of 25 on F1, F2, F3, F4, F7, F9, F10, F11, F12, F13, F15, F19, F20, F21, F23, and F24, the hybrid algorithms CSA-GWO, CSA-WOA, and HHO-WOA show better convergence, and among these, HHO-WOA generates superior results. The F14 and F22 functions do not show any clear pattern with increasing population size. It can be observed from Tables 5, 6, 7, and 8 that, with an increase in population size, the performance of the algorithms improves for all functions except F14 and F22. For small population sizes, CSA-GWO and CSA-HHO exhibit better performance than the other algorithms on F16 and F18. The performance of GWO-WOA on F16 improves when the population size is greater than 100. For the F24 function, CSA-GWO significantly outperformed the other algorithms as the population size increased. It is evident from Tables 5, 6, 7, and 8 that, with an increase in population size, the performance of the hybrid algorithms is comparable to or better than that of their base algorithms. The results also show that the performance of the algorithms saturates beyond a population size of 100; therefore, a population size of 100 has been used for the other experiments.

4.3 Test on real-world problem

To demonstrate the effectiveness of the proposed hybrid algorithms, they are tested on short-term electricity load and price forecasting problems. This section discusses the results generated by hybrid algorithms when integrated with ANN and compares their results with base algorithms.

Electricity load and price forecasting are two important but challenging tasks in the deregulated power market. Errors in prediction lead to substantial economic losses; therefore, an accurate model is required to meet future demand and mitigate the gap between supply and demand. Unfortunately, the nonlinear and random behavior of load and price data makes forecasting difficult and unreliable. In the past few decades, various models have been proposed to improve the accuracy of electricity load and price forecasting (Kottath and Singh 2022; Singh and Kottath 2022b; Singh and Dwivedi 2022).

4.3.1 Problem statement

Short-term electricity load and price forecasting have become important issues in the power market; therefore, developing an efficient and accurate forecasting model is necessary. ANN is one of the widely accepted models for time series prediction applications. Great learning capability, robustness, strong generalization ability, and high fault tolerance are a few characteristics of the ANN model (Singh and Dwivedi 2018). However, ANN carries certain limitations, such as the selection of an appropriate training algorithm and network architecture, and the choice of appropriate numbers of hidden neurons and hidden layers (Singh et al. 2019). One solution to these problems is a hybrid model that uses an optimization algorithm to train the neural network. We have proposed hybrid models based on this concept by using the aforementioned hybrid algorithms to train a vanilla ANN.
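In such a hybrid model, each particle encodes the full weight set of the network, and its fitness is the training error of the ANN it represents. A minimal sketch of this encoding is given below (the n-h-m architecture, MSE fitness, and random data are illustrative placeholders):

```python
import numpy as np

def unpack(theta, n, h, m):
    """Map a flat particle vector onto the MLP weights/biases (n-h-m architecture)."""
    i = 0
    W_hid = theta[i:i + h * n].reshape(h, n); i += h * n
    b_hid = theta[i:i + h]; i += h
    W_out = theta[i:i + m * h].reshape(m, h); i += m * h
    b_out = theta[i:i + m]
    return W_hid, b_hid, W_out, b_out

def mse_fitness(theta, X, y, n, h, m):
    """Fitness of one particle: training MSE of the ANN it encodes."""
    W_hid, b_hid, W_out, b_out = unpack(theta, n, h, m)
    H = 1.0 / (1.0 + np.exp(-(X @ W_hid.T + b_hid)))   # sigmoid hidden layer
    pred = H @ W_out.T + b_out                          # linear output layer
    return float(np.mean((pred - y) ** 2))

rng = np.random.default_rng(6)
n, h, m = 8, 20, 1
dim = h * n + h + m * h + m          # particle dimensionality
X, y = rng.random((32, n)), rng.random((32, m))
f = mse_fitness(rng.standard_normal(dim), X, y, n, h, m)
```

Any of the hybrid algorithms can then minimize `mse_fitness` over vectors of length `dim` in place of gradient descent.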

4.3.2 Data description

The ISO New England electricity load dataset and ISO New England electricity price dataset have been used to test the performance of hybrid models. In the case of load forecasting, the models have been trained on hourly data from 2004 to 2007 and tested on out-of-sample data from 2008 to 2009. The purpose of this experiment is to forecast a day-ahead load on an hourly interval. The load data incorporated in the training dataset consist of eight input parameters (\(~L_1,~L_2,~L_3,~L_4,~L_5,~L_6,~L_7,~L_8~\)) where \(L_1\): previous 24-hr. average load, \(L_2\): 24-hr lagged load, \(L_3\): 168-hr lagged load, \(L_4\): dry bulb temperature in \(^{\circ }C\), \(L_5\): dew point temperature in \(^{\circ }C\), \(L_6\): hr. of the day, \(L_7\): day of the week, and \(L_8\): holiday/weekend indicator (England 2009).

The electricity price data from 2004 to 2007 have been used to train the model for price forecasting, whereas out-of-sample data for 2008 are used to test it. The dataset consists of fourteen input variables \((P_1, P_2, \ldots , P_{14})\), where the first eight parameters \(P_1 - P_8\) are the same as \(L_1 - L_8\) in the load dataset, \(P_9\): system load, \(P_{10}\): previous 24-hr average price, \(P_{11}\): 24-hr lagged price, \(P_{12}\): 168-hr lagged price, \(P_{13}\): 24-hr lagged natural gas price, and \(P_{14}\): 168-hr average lagged natural gas price (England 2009). Before training and testing, the data are pre-processed using the min-max normalization technique to reduce the training time.
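The min-max step can be sketched as follows. Note that the scaling statistics are taken from the training split only, so that the out-of-sample years do not influence the scaler; this is a standard convention, assumed here rather than stated in the text:

```python
import numpy as np

def min_max_normalize(train, test):
    """Scale each feature column to [0, 1] using statistics of the
    training split only, so no information leaks from the test period."""
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (train - lo) / span, (test - lo) / span
```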

4.3.3 Evaluation metrics

We have used seven evaluation metrics to critically analyze the forecasting results of the hybrid models. These metrics are calculated between the predicted and actual values. The mathematical definitions of the metrics are given below:

  • Average error (AE)

    $$\begin{aligned} \text {AE} = \frac{1}{N}\sum _{j=1}^{N}(Y_{j} - Y'_{j}) \end{aligned}$$
    (17)
  • Mean absolute error (MAE)

    $$\begin{aligned} \text {MAE} = \frac{1}{N}\sum _{j=1}^{N}|Y_{j} - Y'_{j}| \end{aligned}$$
    (18)
  • Normalized mean squared error (NMSE)

    $$\begin{aligned} \text {NMSE} = \frac{1}{\Delta ^2 N}\sum _{j=1}^{N}(Y_{j} - Y'_{j})^2 \end{aligned}$$
    (19)
  • Root of mean squared error (RMSE)

    $$\begin{aligned} \text {RMSE} = \sqrt{\frac{1}{N}\sum _{j=1}^{N}(Y_{j} - Y'_{j})^2} \end{aligned}$$
    (20)
  • Mean absolute percent error (MAPE)

    $$\begin{aligned} \text {MAPE} = \frac{1}{N}\sum _{j=1}^{N} \frac{|Y_{j} - Y'_{j}|}{Y_j}\times 100 \end{aligned}$$
    (21)
  • Directional Change (DC)

    $$\begin{aligned} \text {DC} = \frac{100}{N-1}\sum _{j=1}^{N-1}a_{j}, \quad a_j = {\left\{ \begin{array}{ll} 1, &{} \text {if}~(Y'_{j+1}-Y_j)(Y_{j+1}-Y_j) > 0\\ 0, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
    (22)
  • Pearson’s correlation coefficient (r)

    $$\begin{aligned} r=\frac{\sum _{j=1}^{N}(Y_{j} - {\bar{Y}})(Y'_{j}-\bar{Y'})}{\sqrt{\sum _{j=1}^{N}(Y_{j} - {\bar{Y}})^2\sum _{j=1}^{N}(Y'_{j}-\bar{Y'})^2}} \end{aligned}$$
    (23)

where \(Y_j\) is the actual price/load value on day j, \(Y'_j\) is the predicted price/load value on day j, \({\bar{Y}}\) is the mean of the actual values, \(\bar{Y'}\) is the mean of the predicted values, and \(N\) is the number of elements in the data.
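Under these definitions, the metrics can be computed directly with NumPy. In this sketch, \(\Delta ^2\) in NMSE is taken as the variance of the actual series, which is one common convention but an assumption on our part:

```python
import numpy as np

def forecast_metrics(y, y_hat):
    """Evaluation metrics of Eqs. (17)-(23); y = actual, y_hat = predicted."""
    e = y - y_hat
    metrics = {
        'AE':   e.mean(),
        'MAE':  np.abs(e).mean(),
        'RMSE': np.sqrt((e ** 2).mean()),
        'NMSE': (e ** 2).mean() / y.var(),   # Delta^2 taken as var of actuals
        'MAPE': (np.abs(e) / y).mean() * 100,
        # share of steps where forecast and actual move in the same direction
        'DC':   100.0 * np.mean((y_hat[1:] - y[:-1]) * (y[1:] - y[:-1]) > 0),
        'r':    np.corrcoef(y, y_hat)[0, 1],  # Pearson's correlation
    }
    return metrics
```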

Fig. 3 Convergence plot of different algorithms for electricity load forecasting

4.3.4 Result: electricity load forecasting

The ANN model has been trained with 20 hidden neurons for both electricity load and price forecasting. During the training phase, the neural network updates its weights to optimize the objective function, i.e., the mean squared error (MSE). Figure 3 shows the convergence graph of the ANN-CSA, ANN-GWO, ANN-HHO, ANN-WOA, ANN-CSA-GWO, ANN-CSA-HHO, ANN-CSA-WOA, ANN-GWO-HHO, ANN-GWO-WOA, and ANN-HHO-WOA hybrid models during training. From the figure, we can deduce that ANN-GWO-HHO reaches the minimum MSE when the termination condition is met. It can also be noted that, when combined with the neural network, the hybrid algorithms converge earlier than the standalone optimization algorithms. For example, ANN-CSA-GWO started with the lowest MSE and competed closely with ANN-GWO-HHO until about 1250 iterations, but after 1500 iterations ANN-GWO-HHO converged rapidly. The graph also shows that ANN-GWO generated the maximum training error. Overall, the hybrid models built on combined algorithms produce lower training error than those based on a single algorithm.
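The training procedure described above, a population of candidate weight vectors driven by an optimizer while the best MSE of each iteration is recorded for the convergence plot, can be sketched generically. The `toy_step` rule below is a placeholder stand-in, not any of the actual CSA/GWO/HHO/WOA update equations:

```python
import numpy as np

def train_with_metaheuristic(objective, dim, step, n_iter=100, pop=30, seed=0):
    """Generic population-based training loop that records the convergence
    curve (best objective value per iteration), as plotted in Fig. 3.
    `step` is the position-update rule of whichever optimizer is plugged in."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1, 1, size=(pop, dim))   # initial candidate vectors
    fit = np.array([objective(p) for p in pos])
    curve = []
    for t in range(n_iter):
        best = pos[fit.argmin()]                # best candidate this iteration
        pos = step(pos, best, t, rng)           # optimizer-specific update
        fit = np.array([objective(p) for p in pos])
        curve.append(fit.min())                 # convergence history
    return best, curve

# toy stand-in for an optimizer's update: move toward the best, plus noise
def toy_step(pos, best, t, rng):
    return pos + 0.5 * (best - pos) + 0.1 * rng.standard_normal(pos.shape)
```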

A detailed analysis of the results generated by the different algorithms is shown in Table 9, which reports the forecasting results for the years 2008 and 2009. Evaluation metrics such as AE, MAE, MAPE, RMSE, NMSE, r, and DC have been used to compare the hybrid models ANN-CSA, ANN-GWO, ANN-HHO, ANN-WOA, ANN-CSA-GWO, ANN-CSA-HHO, ANN-CSA-WOA, ANN-GWO-HHO, ANN-GWO-WOA, and ANN-HHO-WOA. The table shows that the performance of the hybrid models ANN-CSA-GWO and ANN-WOA-HHO is superior to the standalone algorithms and the other combinations. The error values show that the ANN-WOA-HHO model generates the least value, whereas ANN-CSA-GWO gives the second best. The ANN-WOA model failed to converge compared to the other prediction models, generating maximum MAE and MAPE values of 1708.092 MWh and 12.28%, respectively, which proves its inefficacy on the electricity load forecasting problem. However, the WOA-based hybrid models ANN-CSA-WOA, ANN-GWO-WOA, and ANN-HHO-WOA perform better than ANN-WOA. The MAPE metric shows the least value of 4.372268% for ANN-CSA-GWO, whereas ANN-WOA has the maximum value of 12.28163%. The RMSE metric behaves similarly to MAE, with the least value for ANN-WOA-HHO. The last row of the table gives the Friedman test statistic of each hybrid model for the electricity load forecast; the values show that ANN-CSA-GWO and ANN-WOA-HHO produced almost similar values. Overall, Table 9 indicates that the CSA-GWO and WOA-HHO hybrid algorithms add better learning ability to the ANN than the other algorithms.

Table 9 Performance metrics of different algorithms on load forecasting

Table 10 shows the monthly MAE and RMSE values of the ANN-CSA, ANN-GWO, ANN-HHO, ANN-WOA, ANN-CSA-GWO, ANN-CSA-HHO, ANN-CSA-WOA, ANN-GWO-HHO, ANN-GWO-WOA, and ANN-HHO-WOA hybrid algorithms. The table shows that for the months of October and November, ANN-WOA-HHO generated the minimum MAE values of 430.2016 MWh and 576.8031 MWh and the minimum RMSE values of 553.6002 MWh and 742.1469 MWh, respectively. Based on the results, we conclude that ANN-GWO-WOA is superior among the hybrid models discussed, whereas ANN-CSA-GWO ranked second.

Figure 4 shows the bar graph of MAE and RMSE of the different hybrid algorithms over the days of the week for 2008. From the graph, it can be noted that all the hybrid algorithms generated their maximum MAE and RMSE on Monday. ANN-GWO-HHO generated the minimum MAE and RMSE on Tuesday, Wednesday, Thursday, Friday, and Monday, and on Saturday and Sunday its results were close to the minimum. It is also visible from the graph that ANN-WOA performed the worst in terms of MAE and RMSE on all days.

Table 10 MAE and RMSE results of different algorithms on a monthly basis for the year 2008 (Load)
Fig. 4 Graph comparison of MAE and RMSE on a daily basis for electricity load forecasting

4.3.5 Result: electricity price forecasting

To analyze the performance of the hybrid algorithms in more depth, they are integrated with the ANN and applied to the electricity price forecasting problem. Figure 5 plots the MSE against the number of iterations, showing the decreasing convergence values generated by the ANN-CSA, ANN-GWO, ANN-HHO, ANN-WOA, ANN-CSA-GWO, ANN-CSA-HHO, ANN-CSA-WOA, ANN-GWO-HHO, ANN-GWO-WOA, and ANN-HHO-WOA models. From the figure, we can see that ANN-WOA generated the maximum MSE and did not converge well. It can be noted that ANN-CSA-GWO showed good convergence, while ANN-GWO, ANN-CSA-HHO, and ANN-GWO-HHO showed similar MSE patterns. The figure reveals that most hybrid models generate lower training error than the standalone ones.

Fig. 5 Convergence plot of different algorithms for electricity price forecasting

The test results of the hybrid models are shown in Table 11. Evaluation metrics such as AE, MAE, MAPE, RMSE, NMSE, r, and DC have been used to compare the performance of the different hybrid models. The table shows that the performance of ANN-CSA-GWO and ANN-WOA-HHO is superior to the standalone and other hybrid algorithms; their error values are the best and second best compared to the other algorithms. The ANN-WOA model failed to converge, generating maximum MAE and MAPE values of 15.97964 $/MWh and 20.70291%, respectively, which proves its inefficacy on the electricity price forecasting problem. However, the WOA-based hybrid models ANN-CSA-WOA, ANN-GWO-WOA, and ANN-HHO-WOA perform better than the base ANN-WOA model. The table shows that ANN-CSA-GWO generates the least MAPE value of 7.77011%. The last row of the table gives the Friedman test statistic for the electricity price forecast, where ANN-CSA-GWO achieved the lowest value of 51. These results show that the hybrid algorithms perform better than the individual algorithms when combined with the ANN model.

Table 11 Performance metrics of different algorithms on price forecasting
Table 12 MAE and RMSE results of different algorithms on a monthly basis for the year 2008 (Price)
Fig. 6 Graph comparison of MAE and RMSE on a daily basis for electricity price forecasting

Table 12 shows the MAE and RMSE values generated by the ANN-CSA, ANN-GWO, ANN-HHO, ANN-WOA, ANN-CSA-GWO, ANN-CSA-HHO, ANN-CSA-WOA, ANN-GWO-HHO, ANN-GWO-WOA, and ANN-HHO-WOA models on a monthly basis. The table reveals that ANN-GWO produced the minimum MAE values of 10.25275 $/MWh, 7.372667 $/MWh, and 3.095078 $/MWh for Jan, May, and Oct, and the minimum RMSE values of 13.61383 $/MWh and 3.931929 $/MWh for Jan and Oct. ANN-CSA-GWO generated the minimum MAE for Apr, Jun, Jul, Nov, and Dec, and the minimum RMSE for Apr, May, Jun, Jul, Nov, and Dec. The ANN-WOA-HHO model performed well in terms of MAE and RMSE for the remaining months. From the results, we can conclude that ANN-CSA-GWO gives superior results for most of the months, ANN-WOA-HHO generated comparable results, and ANN-WOA generated the maximum error. Figure 6 shows the bar graph of MAE and RMSE of the different hybrid algorithms over the days of the week. The MAE and RMSE values on Tuesday and Monday are higher than on the other days. ANN-CSA-GWO and ANN-GWO-HHO generated the minimum MAE and RMSE values, whereas ANN-CSA, ANN-WOA, and ANN-HHO-WOA generated the maximum MAE and RMSE on all days. ANN-CSA-GWO outranked all the stated algorithms on all weekdays except Sunday.

5 Discussions

This section interprets the results presented in the previous section and summarizes the key outcomes of this work.

The literature suggests that the performance of an algorithm can be improved by combining it with other appropriate algorithms. Based on this, the CSA, GWO, HHO, and WOA algorithms are chosen and combined to form several new algorithm combinations. Tables 5, 6, 7, 8, and 9 show the impact of increasing population size on the different hybrid algorithms. The outcome reveals that the performance of the algorithms saturates after a population size of 100.

According to the “No-Free-Lunch” theorem, there is no algorithm that can perform well for all applications. Hence, this method of combining multiple algorithms can reduce the performance degradation of algorithms when the application changes. The key takeaways from this work can be listed as follows:

  • A novel method of combining multiple algorithms in a single stack has been discussed and devised in this paper.

  • Four different optimization algorithms: CSA, GWO, HHO, and WOA, have been utilized to create exhaustive combinations of two algorithms.

  • The hybrid algorithms are tested on twenty-four unimodal and multimodal benchmark functions to evaluate their performance along with the base algorithms.

  • Two real-world time series prediction problems (electricity load and electricity price) have been used to further test the performance of the hybrid algorithms by integrating them with the ANN.

  • The evaluation results show that ANN-CSA-GWO and ANN-GWO-HHO outranked other hybrid models.

  • The metrics show that the performance of WOA improved significantly when combined with other algorithms. Hence, hybridization can be utilized to improve the performance of individual algorithms in different applications.

  • The Friedman values 47 and 49 for electricity load and price show the improved performance of ANN-CSA-GWO on these applications.
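For reference, the Friedman statistic cited above ranks the competing models within each test case and measures how far the mean ranks deviate from what chance would give. A minimal sketch over an error matrix (ties broken arbitrarily by `argsort`; the data layout is assumed, not taken from the paper):

```python
import numpy as np

def friedman_statistic(errors):
    """Friedman chi-square over an (n_cases, k_models) error matrix:
    models are ranked within each test case (lower error = better rank).
    Ties are broken arbitrarily rather than by average ranks."""
    n, k = errors.shape
    # rank 1..k within each row via double argsort
    ranks = errors.argsort(axis=1).argsort(axis=1) + 1.0
    mean_ranks = ranks.mean(axis=0)
    return 12.0 * n / (k * (k + 1)) * np.sum((mean_ranks - (k + 1) / 2) ** 2)
```

A lower mean rank (and hence a lower contribution to the statistic for that model) indicates consistently better performance across test cases.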

The main advantage of the proposed algorithm is that every particle can update its position based on multiple available update rules, which allows us to combine the strengths of multiple algorithms in the same stack. In optimization, some algorithms perform well on particular problems but poorly on others; through this hybrid method, we can combine algorithms to make them more generalized. The proposed approach provides a way of creating hybrid algorithms without degrading their performance. However, the selection of the algorithms to combine is one of the challenging tasks: the combined algorithms must be complementary to obtain a cumulative performance gain. If the selected algorithms are not complementary, we may end up with similar performance at an increased computational cost, which can be considered one of the limitations of the proposed algorithm.
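The per-particle selection mechanism described above can be sketched as follows: at each iteration, every particle independently applies the update rule of one of the constituent algorithms. The two rules below are toy stand-ins for the actual CSA/GWO/HHO/WOA position-update equations, which are given elsewhere in the paper:

```python
import numpy as np

def hybrid_step(pos, best, updates, rng):
    """One iteration of the combined stack: each particle independently
    applies the update rule of one constituent algorithm, chosen at random
    (conceptual sketch of the selection mechanism, not the exact scheme)."""
    choice = rng.integers(len(updates), size=len(pos))
    new_pos = pos.copy()
    for i, rule in enumerate(updates):
        mask = choice == i
        new_pos[mask] = rule(pos[mask], best, rng)  # rule applied per subgroup
    return new_pos

# two toy stand-in update rules (not the actual CSA/GWO/HHO/WOA equations)
rule_a = lambda p, b, rng: p + 0.5 * (b - p)                       # exploit
rule_b = lambda p, b, rng: p + 0.3 * rng.standard_normal(p.shape)  # explore
```

Because each rule only ever sees its own subgroup of particles, complementary rules can cover exploration and exploitation in the same iteration.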

6 Conclusions and future work

Hybrid optimization techniques combine multiple optimization algorithms in a single framework. This work discusses a novel method of combining different optimization algorithms. The algorithms CSA, GWO, HHO, and WOA are chosen as the base for building the hybrid algorithms. Exhaustive pairwise combinations of these algorithms, namely CSA-GWO, CSA-HHO, CSA-WOA, GWO-HHO, GWO-WOA, and HHO-WOA, are discussed in detail. The performance of these algorithms is tested on twenty-four well-known unimodal and multimodal benchmark functions, with rigorous analysis performed by varying the dimensions and population size. CSA-GWO performed well on almost all the benchmark functions, and CSA-WOA and HHO-WOA generated competitive results. It can be interpreted that a hybrid algorithm's performance is superior to that of the base algorithms when the individual algorithms perform well. To test the algorithms' efficacy, they have been applied to short-term electricity load and price forecasting problems, where the hybrid algorithms are combined with an ANN to learn the network parameters during training. The results indicate that the performance of ANN-CSA-GWO is superior in all the test cases; the remaining hybrid algorithms are competitive but do not lead on the problems under discussion. The reason for the superior performance is that hybridization ensures better exploration of the search space, which helps the algorithms converge faster. In the future, more algorithms can be tested for building hybrid algorithms, and they can also be evaluated on different classification and regression problems.