A Hybrid Model of Extreme Learning Machine Based on Bat and Cuckoo Search Algorithm for Regression and Multiclass Classification

Extreme learning machine (ELM), as a new and simple feedforward neural network learning algorithm, has been used extensively in practical applications because of its good generalization performance and fast learning speed. However, the standard ELM requires more hidden nodes in applications due to the random assignment of hidden layer parameters, which in turn brings disadvantages such as poor hidden layer sparsity, low adjustment ability, and a complex network structure. In this paper, we propose a hybrid ELM algorithm based on the bat and cuckoo search algorithms to optimize the input weights and thresholds of the ELM algorithm. We test the numerical performance of the proposal on function approximation and classification problems over several benchmark datasets; simulation results show that the proposed algorithm obtains significantly better prediction accuracy than similar algorithms.


Introduction
In recent years, artificial intelligence algorithms have drawn extensive attention from the research community. As an important part of artificial intelligence, machine learning has been widely used in data mining [1], speech recognition [2], feature selection [3,4], learning incentivization strategies [5], natural language processing [6], and nonlinear function approximation and benchmark problems [7]. As a branch of machine learning, neural networks have been successfully applied to many tasks of learning from data. However, most traditional neural networks use gradient-based learning algorithms for network training, which leads to problems such as low training efficiency, slow convergence, and a tendency to fall into local optima.
Extreme Learning Machine (ELM) is a supervised method for training artificial neural networks, specifically single hidden layer feedforward networks (SLFNs), put forward by Huang et al. [8][9][10]. Huang et al. [11] argue that existing neural networks have defects in learning speed; the main reason for the low learning rate is that all the parameters of the network are determined repeatedly by an iterative training method. In the ELM learning algorithm, the input weights and thresholds are generated randomly. Then, the hidden layer output matrix is used to calculate the final output weights, which are obtained using the Moore-Penrose (MP) generalized inverse. Compared with other neural networks based on gradient learning algorithms, the ELM learning algorithm has great advantages in learning speed; it is capable of producing good generalization performance and greatly reduces the computational complexity of demanding application problems [12,13]. These strengths have led to wide adoption in various practical application fields such as biomedicine [14][15][16], fault diagnosis [17,18], and indoor positioning systems [19,20]. However, since the input parameters are generated randomly and the ELM requires a large number of hidden neurons, the amplitude of the output weights will be large when the hidden layer output matrix is ill-conditioned, which can cause the trained model to fall into a local minimum and exhibit overfitting [21]. In [22,23], ELMs based on different regularizations were proposed to effectively overcome the overfitting phenomenon. The accuracy and effectiveness of the ELM algorithm largely depend on the internal parameters of the model. To choose suitable model parameters, many researchers use bionic optimization algorithms to optimize the input weights and thresholds.
In the literature [24], an improved ELM algorithm was proposed, which used a differential evolution algorithm to choose the input weights and then used the MP generalized inverse to determine the output weights analytically. This improvement enables it to obtain better generalization performance with a compact network. In the literature [25], coral reefs optimization (CRO − ELM) was used to evolve the ELM weights and enhance the performance of these machines. An evolutionary variant based on particle swarm optimization (PSO − ELM) was introduced to optimize the input weights and hidden biases of ELM [26,27] so that the network has better generalization performance in benchmark classification experiments and is more suitable for some prediction problems. A real-coded genetic algorithm (RCGA − ELM) was proposed [28] to select the number of hidden neurons and the input weights such that the generalization performance of the classifier is maximized, but it required many parameters of the genetic operators to be adjusted manually. The cuckoo search algorithm (CS − ELM) was proposed [29][30][31][32][33] to pretrain the ELM, ensuring optimal solutions and further improving the accuracy and stability of CS − ELM. References [34,35] proposed the ICS − ELM model, which combines an improved cuckoo search algorithm with ELM. Both CS − ELM and ICS − ELM select the input weights and biases before calculating the output weights, and they ensure the full column rank of the hidden layer output matrix.
The bat algorithm (BA) [36,37] and the cuckoo search algorithm (CS) [38,39] are two recent heuristic swarm intelligence optimization algorithms. The bat algorithm has the advantages of a simple model, a fast convergence rate, and strong global optimization ability and has been widely used in engineering optimization, model identification, and other problems. The cuckoo search algorithm is simple and efficient, has few parameters, is easy to implement, and follows an effective random search path; it has been successfully applied to medical image optimization [40], multiobjective optimization [41], image processing [42], and other practical problems. The literature [43] shows that the bat algorithm and the cuckoo search algorithm have great advantages over the genetic algorithm and particle swarm optimization among recent metaheuristics. In this paper, we combine the BACS hybrid algorithm with traditional ELM and propose an optimization algorithm for ELM based on BACS. The basic idea of the BACS − ELM algorithm is to use the BACS algorithm to train the input weights and thresholds randomly generated by ELM to find the optimal parameters and then determine the output weights using the MP generalized inverse, so as to improve the convergence speed and stability of the network model. The main contributions are as follows: (1) Based on the idea of swarm intelligence optimization, this paper introduces how to train ELM with the BACS hybrid algorithm. With this method, the input weights and thresholds of the ELM network can be reasonably optimized to address the randomness of the hidden layer parameters so that the network parameters reach the optimum. (2) By improving the traditional ELM network with the BACS hybrid algorithm, local and global optimization are effectively balanced, and the generalization performance of the network is improved.
(3) Nonlinear function fitting and classification problems demonstrate that the BACS − ELM algorithm can achieve a better approximation effect and generalization performance than other algorithms. The rest of the paper is organized as follows: Section 2 introduces the traditional ELM network model and algorithm. Section 3 introduces the principles and implementation steps of the bat algorithm and the cuckoo search algorithm.
The hybrid Extreme Learning Machine algorithm based on the bat cuckoo algorithm is described in Section 4. Numerical experiments are discussed in Section 5. Section 6 offers some conclusions.

The Preliminary of ELM
In this section, we begin with the introduction of the standard ELM. The network model of ELM is shown in Figure 1; it can be divided into three layers: the input layer, the hidden layer, and the output layer. All of this provides fundamental theoretical support for the new method proposed next. Let (x_j, o_j) ∈ R^n × R^m denote P arbitrary distinct samples, where x_j = (x_j1, x_j2, . . . , x_jn)^T ∈ R^n and o_j = (o_j1, o_j2, . . . , o_jm)^T ∈ R^m. The traditional SLFN with L hidden nodes can be mathematically modeled as

t_j = Σ_{i=1}^{L} β_i G(w_i, b_i, x_j), j = 1, 2, . . . , P, (1)

where G is an activation function, which can take various forms, such as the sigmoid function

G(x) = 1/(1 + e^{−x}), (2)

or the Gaussian function

G(x) = e^{−x^2}. (3)

The above SLFN can approximate these P samples in a training process of gradual iteration. When the learning error is reduced to zero, that is, Σ_{j=1}^{P} ‖t_j − o_j‖ = 0, the learning capacity of the ELM is optimal, and there exist (w_i, b_i) and β_i such that

Σ_{i=1}^{L} β_i G(w_i, b_i, x_j) = o_j, j = 1, 2, . . . , P, (4)

where w_i = (w_i1, w_i2, . . . , w_in)^T ∈ R^n is the input weight vector linking the i-th hidden node, as presented in Figure 1, b_i ∈ R is the threshold of the i-th hidden node and is generated randomly, β_i = (β_i1, β_i2, . . . , β_im)^T ∈ R^m is the output weight of the i-th hidden node, and t_j is the actual network output for input x_j. The above P equations can be rewritten in the matrix form

Hβ = O, (5)

where

H = [G(w_1, b_1, x_1) ⋯ G(w_L, b_L, x_1); ⋮ ; G(w_1, b_1, x_P) ⋯ G(w_L, b_L, x_P)] ∈ R^{P×L}, β = (β_1^T, . . . , β_L^T)^T ∈ R^{L×m}, O = (o_1^T, . . . , o_P^T)^T ∈ R^{P×m}. (6)

Here H is called the hidden layer output matrix and β is the final output weight matrix. The basic principle of ELM is to obtain the output weight β through the formula Hβ = O.
In practical training, the number of nodes L in the hidden layer is usually less than the number of training samples P.
Therefore, on the premise that the activation function is differentiable, the input weights and thresholds randomly selected before training can remain unchanged during training. In this way, the output weights of the network can be obtained by solving the least-squares problem of the following linear system:

min_β ‖Hβ − O‖, (7)

whose explicit solution is

β = H†O, (8)

where H† represents the MP generalized inverse of H [44]. Therefore, ELM can be described as follows (Algorithm 1).
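As an illustration of these two steps (random hidden parameters, then a single least-squares solve), a minimal Python (NumPy) sketch of ELM training might look as follows; the sigmoid activation and the uniform [−1, 1] initialization range are assumptions of the sketch, not prescriptions from the paper:

```python
import numpy as np

def elm_train(X, O, L, seed=0):
    """Basic ELM training: random hidden layer, least-squares output weights.

    X: (P, n) inputs, O: (P, m) targets, L: number of hidden nodes.
    Returns (W, b, beta); predictions are sigmoid(X @ W + b) @ beta.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n, L))  # random input weights w_i
    b = rng.uniform(-1.0, 1.0, size=L)       # random thresholds b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ O             # beta = H'O via the MP pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because no iterative tuning of W and b takes place, the entire training cost is one matrix pseudoinverse, which is the source of ELM's speed advantage.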

Bat Algorithm.
Bat algorithm (BA) is a swarm intelligence optimization algorithm that simulates the predation behavior of bats. Because of its simple model, fast convergence speed, and strong global optimization ability, it has been widely used in data mining, wireless sensor networks, and power systems. However, there are also some problems in practical applications, such as premature convergence and low optimization accuracy. The bat algorithm determines the optimal bat in the current search space by adjusting the frequency, wavelength, and loudness and then obtains the optimal solution to the optimization problem. To simulate this predation behavior, the following assumptions are made:
(1) All bat individuals can use echolocation to perceive distance and to distinguish between a target and an obstacle in a special way.
(2) A bat flies randomly at position x_i with velocity v_i, searches for the target with a fixed minimum frequency f_min, variable wavelength λ, and loudness A_0, and automatically adjusts the wavelength (or frequency) and the pulse emission rate r ∈ [0, 1] according to the distance to the target.
(3) The loudness is assumed to decrease from a maximum value A_0 to a minimum value A_min.
Assume a search space of dimension d and iteration number t. The update formulas for the frequency, velocity, and position of bat i in the t-th generation are as follows:

f_i = f_min + (f_max − f_min)β, (9)
v_i^t = v_i^{t−1} + (x_i^t − x*)f_i, (10)
x_i^t = x_i^{t−1} + v_i^t, (11)

where f_i represents the frequency of the i-th bat, with adjustment range [f_min, f_max], β is a random number drawn from the uniform distribution on [0, 1], and x* represents the current optimal solution. For the local search, a random number rand_1 is generated. If rand_1 > r_i, a new solution is generated by randomly perturbing the current optimal solution. The update formula is as follows:

x_new = x_old + εA^t, (12)

where ε is a random number in [−1, 1] and A^t represents the average loudness of the bat population in the t-th generation.
When the bat is constantly approaching the target, its loudness A drops toward a fixed value while the pulse rate r continues to increase. A random number rand_2 is generated; if rand_2 < A_i and the new fitness value f(x_new) > f(x_old), the new solution generated by (12) is accepted. The update formulas for the loudness A_i and pulse rate r_i of the i-th bat are as follows:

A_i^{t+1} = αA_i^t, (13)
r_i^{t+1} = r_i^0[1 − exp(−σt)], (14)

where α represents the loudness attenuation coefficient, 0 < α < 1, and σ represents the pulse frequency enhancement coefficient, σ > 0.
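The bat update rules above can be collected into a minimal, self-contained Python sketch. It is written for minimization, so the acceptance test is inverted relative to the maximization form f(x_new) > f(x_old) used in the text, and the population size, frequency range, and local-search step size are illustrative assumptions rather than the paper's settings:

```python
import math
import random

def bat_algorithm(objective, dim, n_bats=25, iters=200,
                  f_lo=0.0, f_hi=2.0, alpha=0.9, sigma=0.9,
                  lo=-2.0, hi=2.0, seed=0):
    """Minimal bat algorithm sketch (minimization of `objective`)."""
    rng = random.Random(seed)
    clip = lambda z: min(hi, max(lo, z))
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    A = [1.0] * n_bats          # loudness A_i, decays by factor alpha on acceptance
    r = [0.5] * n_bats          # pulse emission rate r_i, grows toward r0
    r0 = r[:]
    fit = [objective(p) for p in x]
    g = min(range(n_bats), key=fit.__getitem__)
    x_star, f_star = x[g][:], fit[g]
    for t in range(1, iters + 1):
        A_avg = sum(A) / n_bats
        for i in range(n_bats):
            f_i = f_lo + (f_hi - f_lo) * rng.random()             # eq. (9)
            for d in range(dim):
                v[i][d] += (x[i][d] - x_star[d]) * f_i            # eq. (10)
            cand = [clip(x[i][d] + v[i][d]) for d in range(dim)]  # eq. (11)
            if rng.random() > r[i]:                               # local search, eq. (12)
                cand = [clip(x_star[d] + 0.1 * rng.uniform(-1, 1) * A_avg)
                        for d in range(dim)]
            f_new = objective(cand)
            if rng.random() < A[i] and f_new < fit[i]:  # accept: quieter, faster pulse
                x[i], fit[i] = cand, f_new
                A[i] *= alpha                                     # eq. (13)
                r[i] = r0[i] * (1.0 - math.exp(-sigma * t))       # eq. (14)
            if fit[i] < f_star:
                x_star, f_star = x[i][:], fit[i]
    return x_star, f_star
```

On a simple test function such as the sphere function, the loudness-gated acceptance and the shrinking local search around x* drive the swarm toward the minimum.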

Cuckoo Search Algorithm.

The cuckoo search algorithm (CS) is a simplification and simulation of the cuckoo's nest-finding and egg-laying behavior. The special habit of cuckoos is brood parasitism; that is, other host birds hatch and brood their eggs on their behalf. To make this difficult to detect, the cuckoo first finds, during the breeding period, a host bird whose eggs have characteristics similar to its own. If the foreign egg is recognized by the host bird, it is removed, or the host abandons the nest and builds a new one. To simulate this reproductive behavior, the following assumptions are made:
(1) Each cuckoo lays only one egg at a time and places it in a randomly selected nest.
(2) The best nest is retained to the next generation.
(3) The number of available nests n remains unchanged, and the host bird discovers a foreign egg with probability p_a ∈ [0, 1].
For the cuckoo search algorithm, n nest positions are randomly initialized in the d-dimensional search space, and the best position is passed to the next generation.
The new position is generated by a Levy flight. The cuckoo's nest search path and position update formula are as follows:

x_i^{t+1} = x_i^t + α ⊕ Levy(λ), (15)

where x_i^t represents the position of the i-th nest in the t-th generation, α > 0 is the step-length control factor, ⊕ denotes entrywise multiplication, and Levy(λ) is the random search path, with

Levy(λ) ∼ u = t^{−λ}, 1 < λ ≤ 3. (16)

After the position is updated, a random number r ∈ [0, 1] is compared with p_a; if r > p_a, the position is changed by a random walk so as to retain a set of better values, and the current optimal nest position and optimal solution are obtained through iteration. The update formula is as follows:

x_i^{t+1} = x_i^t + τ(x_m^t − x_k^t), (17)

where τ is a scaling factor uniformly distributed on [0, 1] and x_m^t and x_k^t are two random solutions in the t-th generation.
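A minimal Python sketch of one possible CS implementation follows. Levy(λ) is sampled with Mantegna's algorithm, a standard choice the paper does not specify, and scaling the Levy step by the distance to the best nest is a common implementation convention rather than part of the formulas above:

```python
import math
import random

def levy_step(lam=1.5, rng=random):
    """Draw one Levy(lam)-distributed step via Mantegna's algorithm."""
    num = math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
    den = math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2)
    sigma = (num / den) ** (1 / lam)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / lam)

def cuckoo_search(objective, dim, n_nests=20, iters=200, alpha=0.05, pa=0.25,
                  lo=-2.0, hi=2.0, seed=0):
    """Minimal cuckoo search sketch (minimization of `objective`)."""
    rng = random.Random(seed)
    clip = lambda z: min(hi, max(lo, z))
    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    fit = [objective(x) for x in nests]
    for _ in range(iters):
        best = nests[min(range(n_nests), key=fit.__getitem__)][:]
        for i in range(n_nests):
            # Levy flight, eq. (15), with a step scaled toward the best nest
            cand = [clip(nests[i][d] + alpha * levy_step(rng=rng)
                         * (nests[i][d] - best[d])) for d in range(dim)]
            # abandonment with probability pa: biased random walk, eq. (17)
            if rng.random() < pa:
                m, k = rng.randrange(n_nests), rng.randrange(n_nests)
                tau = rng.random()
                cand = [clip(cand[d] + tau * (nests[m][d] - nests[k][d]))
                        for d in range(dim)]
            f_new = objective(cand)
            if f_new < fit[i]:       # greedy replacement keeps the better nest
                nests[i], fit[i] = cand, f_new
    g = min(range(n_nests), key=fit.__getitem__)
    return nests[g], fit[g]
```

The heavy-tailed Levy steps occasionally make long jumps (global exploration) while most moves stay short (local exploitation), which is the characteristic CS search pattern.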

Bat Cuckoo Hybrid Algorithm.
Although the bat algorithm has low convergence accuracy, its global search ability is strong. In order to improve the quality of the cuckoo population, the bat algorithm is integrated into the cuckoo algorithm for optimization, and a bat cuckoo hybrid algorithm (BACS) is proposed. In this algorithm, the nest positions obtained by the cuckoo algorithm are not used directly as the initial positions; instead, after the position update, the bat algorithm continues to optimize toward the optimal value, which greatly strengthens the global search ability of the algorithm. Therefore, the integration of the two algorithms effectively balances local and global optimization. The specific steps of the bat cuckoo hybrid algorithm are shown in Table 1.
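Under heavy simplification, the hybrid loop just described can be sketched in Python as below: each generation performs a cuckoo-style Levy/abandonment update and then a bat-style refinement around the current best. All parameter values, the Mantegna Levy sampler, and the greedy replacement rule are assumptions of this sketch, not the paper's exact procedure:

```python
import math
import random

def bacs(objective, dim, n=20, iters=200, pa=0.25, alpha=0.05,
         f_lo=0.0, f_hi=2.0, loud=0.9, lo=-2.0, hi=2.0, seed=0):
    """Sketch of the BACS hybrid loop (minimization): a cuckoo Levy-flight /
    abandonment update proposes nest positions, then a bat-style local search
    around the current best refines them."""
    rng = random.Random(seed)
    clip = lambda z: min(hi, max(lo, z))

    def levy(lam=1.5):  # Mantegna's algorithm for a Levy(lam) step
        num = math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
        den = math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2)
        u = rng.gauss(0.0, (num / den) ** (1 / lam))
        return u / abs(rng.gauss(0.0, 1.0)) ** (1 / lam)

    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    fit = [objective(x) for x in X]
    A = 1.0                                  # shared loudness, decays by `loud`
    for _ in range(iters):
        g = min(range(n), key=fit.__getitem__)
        best = X[g][:]
        for i in range(n):
            # cuckoo stage: Levy flight plus probabilistic abandonment
            cand = [clip(X[i][d] + alpha * levy() * (X[i][d] - best[d]))
                    for d in range(dim)]
            if rng.random() < pa:
                m, k = rng.randrange(n), rng.randrange(n)
                cand = [clip(cand[d] + rng.random() * (X[m][d] - X[k][d]))
                        for d in range(dim)]
            # bat stage: frequency-scaled pull toward the best nest, plus a
            # small loudness-sized perturbation around it
            f_i = f_lo + (f_hi - f_lo) * rng.random()
            cand = [clip(cand[d] + f_i * (best[d] - cand[d]) * rng.random())
                    for d in range(dim)]
            refine = [clip(best[d] + 0.1 * rng.uniform(-1, 1) * A)
                      for d in range(dim)]
            for trial in (cand, refine):
                f_new = objective(trial)
                if f_new < fit[i]:           # greedy replacement
                    X[i], fit[i] = trial[:], f_new
                    A = max(0.05, A * loud)
    g = min(range(n), key=fit.__getitem__)
    return X[g], fit[g]
```

The design point to notice is the division of labor: the Levy/abandonment stage supplies diverse candidates, while the bat stage exploits the neighborhood of the current best, which is how the hybrid balances global and local search.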

Hybrid Algorithm of Extreme Learning Machine Based on Bat Cuckoo Algorithm
Extreme Learning Machine (ELM) selects the hidden layer parameters randomly and does not need to update them iteratively during training, and the output weights can be determined by a least-squares solution, which greatly accelerates the learning process. Although ELM overcomes the shortcomings of traditional gradient descent algorithms, the number of hidden nodes still needs to be set in advance, which may lead to many redundant nodes. Therefore, ELM requires more random hidden nodes in some applications than traditional neural network algorithms. This leads to decreased sparsity and adjustment ability of the hidden layer, a more complex network structure, and longer training times, and finally affects the generalization ability and robustness of the network.

Input: a training set (x_j, o_j) ∈ R^n × R^m, an activation function G(w_i, b_i, x_j), and the number of hidden nodes L.
Output: β.
Step 1: randomly set the learning parameters w_i and b_i of the hidden nodes, 1 ≤ i ≤ L.
Step 2: calculate the output matrix H based on (5).
Step 3: calculate the output weight β = H†O.
ALGORITHM 1: ELM algorithm.

The BACS algorithm has strong search accuracy and a fast convergence speed, is not prone to falling into local optima, and effectively balances local and global search. Using this optimization ability, the hidden layer parameters of ELM can be selected appropriately, solving the problem that these parameters need to be optimized because of their randomness. Therefore, this paper uses the BACS algorithm to optimize ELM and proposes a hybrid Extreme Learning Machine algorithm based on the bat cuckoo algorithm (BACS − ELM). We first use the BACS algorithm to train the input weights and thresholds randomly generated by ELM. The population is taken as the initial hidden layer parameters of ELM, and the fitness function of the BACS algorithm is used to conduct iterative optimization.
The positions of the individuals of the population are constantly adjusted to find the optimal hidden layer parameters until the maximum number of iterations or the required search accuracy is reached. At the end of the iteration, the optimal individual position is obtained, and the optimized results are used as the input weights and thresholds of ELM to train the network, improving the convergence speed and stability of the network model. To prevent output saturation caused by excessively large input values, we use the following formula to normalize the data:

x' = (x − x_min)/(x_max − x_min), (18)

where x is the original data and x_max and x_min are the maximum and minimum values of the original data, respectively. Next, the input weights and thresholds of ELM are represented by the cuckoo individuals using real-valued coding. Following Section 2, the numbers of neurons in the input layer and hidden layer are fixed as n and L, respectively. Therefore, the coding length D of a cuckoo individual is

D = L × n + L = L(n + 1). (19)

The position of a cuckoo individual can be expressed as

θ = (w_11, w_12, . . . , w_Ln, b_1, b_2, . . . , b_L). (20)

The input weights w_i and thresholds b_i of ELM are mapped to the position of the cuckoo individual, the population is randomly initialized, and the obtained random individuals are assigned one by one to the input weights and thresholds of ELM and placed in the ELM network. In the training process of ELM, in order to evaluate the prediction performance more objectively, we use the root mean square error as the evaluation index of model prediction, so the fitness function is designed as

fitness = √( (1/P) Σ_{j=1}^{P} (t_j − o_j)^2 ), (21)

where P is the total number of samples, T = (t_1, t_2, . . . , t_P) represents the actual output values of the samples, and O = (o_1, o_2, . . . , o_P) represents the expected output values of the samples. Table 2 shows the specific implementation steps of the BACS − ELM algorithm.
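The individual encoding and the RMSE fitness described above can be sketched as follows (Python/NumPy); the sigmoid activation and the flat layout (all weights first, then thresholds) are assumptions of the sketch:

```python
import numpy as np

def decode(theta, n, L):
    """Split an individual of length D = L*(n+1) into ELM input weights
    W (n x L) and thresholds b (L,)."""
    theta = np.asarray(theta, dtype=float)
    return theta[: n * L].reshape(n, L), theta[n * L:]

def rmse_fitness(theta, X, O, L):
    """Fitness of one cuckoo individual: build the hidden layer from theta,
    solve the output weights with the MP pseudoinverse, and return the
    training RMSE (smaller is better)."""
    n = X.shape[1]
    W, b = decode(theta, n, L)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # assumed sigmoid activation
    beta = np.linalg.pinv(H) @ O
    T = H @ beta
    return float(np.sqrt(np.mean((T - O) ** 2)))
```

The optimizer then simply searches over vectors theta of length L(n + 1), calling `rmse_fitness` as its objective.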

Experimental Results
In order to verify the performance of the proposed algorithm, a function-fitting problem and several classification problems are tested in this section, and the validity of BACS − ELM is assessed by comparing it with the ELM, BA − ELM, and CS − ELM algorithms.

Function Fitting.
In order to demonstrate the performance of the proposed algorithm more intuitively and effectively, we adopt ELM, BA − ELM, CS − ELM, and BACS − ELM to approximate the Sinc function and then compare their function approximation capabilities. The Sinc function is defined as follows:

y(x) = sin(x)/x for x ≠ 0, and y(x) = 1 for x = 0. (22)

Training and test sets of 5000 samples each were selected, and the input variables x_i obey the uniform distribution on the interval [−10, 10]. In order to increase the authenticity of the task and test the generalization performance of the algorithms, random noise was added to the training samples, whereas the testing data remained noise-free. For the different optimization methods, the initial parameter settings are presented in Table 3, the maximum iteration number is set to I = 100, the activation function is the RBF function, and the fitness function is the RMSE.

Table 1: Steps of the bat cuckoo hybrid algorithm.
Step 1: initialize the basic parameters and set the loop termination criteria.
Step 2: initialize the nest locations, calculate the fitness value of each nest, and obtain the optimal position and optimal value.
Step 3: record the optimal position of the previous generation, update it according to formula (15) to obtain a new set of positions, calculate the fitness values, and compare them with those of the previous generation to determine the current better positions.
Step 4: compare a random number r with p_a; if r > p_a, update the position randomly; otherwise, leave it unchanged.
Step 5: use the new positions as the initial points of the bat algorithm and use equations (9)-(14) to update the nest positions.
Step 6: record the positions from step 5 and calculate the fitness values to determine the current optimal position and optimal value.
Step 7: if the termination conditions are met, continue to the next step; otherwise, go to step 3.
Step 8: output the global optimal position; the algorithm ends.

In order to compare the results of the algorithms more objectively, each experiment was run 20 times and the mean value was taken. The choice of the number of hidden nodes has a direct influence on the performance of the model. Therefore, the experiment on BACS − ELM was carried out while adjusting the number of hidden nodes, and the test results obtained are shown in Table 4. The results show that the function has the best fitting effect when the number of hidden nodes is 12, and the mean square error of training and testing tends to be stable as the number of nodes increases. To ensure the performance of the algorithm and reduce the complexity of the model, the architecture of the optimized ELM network can be determined as 1-12-1.
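For reproducibility, the Sinc target and a noisy training set of the kind described above can be generated as follows; the Gaussian noise with standard deviation 0.1 is an assumption, since the paper does not state the noise distribution:

```python
import math
import random

def sinc(x):
    """Sinc target: sin(x)/x for x != 0 and 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def make_sinc_data(n=5000, noise_std=0.1, seed=0):
    """Inputs uniform on [-10, 10]; noise added to training targets only."""
    rng = random.Random(seed)
    train, test = [], []
    for _ in range(n):
        x = rng.uniform(-10.0, 10.0)
        train.append((x, sinc(x) + rng.gauss(0.0, noise_std)))
        x2 = rng.uniform(-10.0, 10.0)
        test.append((x2, sinc(x2)))      # test set stays noise-free
    return train, test
```

Keeping the test targets noise-free, as the experiment specifies, means the reported test RMSE measures how well each model recovers the underlying function rather than the noise.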
Then, based on the parameter values selected above, simulation experiments were carried out with the ELM, BA − ELM, CS − ELM, and BACS − ELM algorithms. It can be seen from Figure 2 that the approximation effect of the BACS − ELM algorithm is better than that of the other algorithms. Moreover, the performance comparison of the algorithms is shown in Table 5. According to these results, the test RMSE of the BACS − ELM algorithm is the smallest, which means that the algorithm has higher accuracy and better stability. As can be seen from the training times in the table, owing to the randomness of its hidden layer parameters, ELM has a very fast learning speed, but its fitting effect is not ideal.
The results also show that the three optimization methods are all effective. There is little difference in training and testing time between the BA − ELM, CS − ELM, and BACS − ELM algorithms, so no clear advantage in learning efficiency emerges. Nevertheless, the ELM model based on the BACS algorithm greatly improves the convergence accuracy of the function fitting, so its computational cost remains within an acceptable range.

Classification Problems.
In this section, in order to more accurately appraise the effectiveness of the BACS − ELM algorithm, its performance is compared on multiple classification problems. The relevant information on the datasets is given in Table 6. The initial parameter setting of each group was consistent with the above, the maximum iteration number was I = 100, and the activation function was the sigmoid function. Each group of experiments was run 20 times, and the average value was taken. Figure 3 shows the comparison of the classification accuracy of the algorithms on the different datasets as the number of nodes changes. Figure 3(a) shows the variation trend on breast cancer; it can be seen that ELM needs the most nodes to achieve relatively high accuracy, while the other algorithms all achieve their highest accuracy at 20 nodes, and, further, BACS − ELM is slightly better. Figure 3(b) shows the trend on heart failure. The four algorithms all show a similar curve as the number of hidden nodes increases, and they all have the best accuracy at 20 nodes, but at that point, BACS − ELM has the highest value, 84.23%. Figure 3(c) shows the variation trend on Iris. BACS − ELM has the best accuracy at 10 nodes, which is 5 fewer nodes than the other algorithms need to reach their maximum values. Figure 3(d) shows the trend on the vertebral column dataset, where BACS − ELM needs the fewest nodes to reach its best accuracy.

Table 2: Steps of the BACS − ELM learning algorithm.
Step 1: initialize the basic parameters and set the loop termination criteria.
Step 2: initialize the cuckoo individuals and code the input weights and thresholds of ELM into each individual; each individual represents an ELM network structure.
Step 3: normalize the training data, randomly initialize the individual positions, and calculate the fitness values according to equation (21).
Step 4: record the optimal position, obtain a group of new positions according to equation (15), calculate the fitness values, and determine the current optimal position.
Step 5: compare a random number r with p_a; if r > p_a, update the position randomly; otherwise, leave it unchanged.
Step 6: take the new positions as the starting points of BA and randomly generate rand_1; if rand_1 > r_i, update the current optimal position; otherwise, go to step 7.
Step 7: randomly generate rand_2; if rand_2 < A_i and f(x_new) > f(x_old), replace the current position x_i^{t+1} with x_new; otherwise, do not update.
Step 8: calculate the fitness value of each individual and determine the current optimal position and optimal value.
Step 9: if the termination condition is met, proceed to the next step; otherwise, go to step 4.
Step 10: decode the cuckoo individual into the input weights and thresholds of ELM, and obtain the optimal ELM network structure from these parameters.

Table 3: The population parameter settings of the three optimization methods.

This is because, when the BACS algorithm optimizes the input weights and thresholds of ELM, it has a strong local optimization ability in the initial stage of the search and makes full use of the global optimization ability of the BA algorithm. The combination of the two greatly improves the convergence accuracy.
Based on the above analysis, the performance of the four algorithms in terms of the number of hidden nodes, training time, and training and test accuracy is also reported. It can be clearly seen from Table 7 that the BACS − ELM algorithm achieves the best test accuracy with the smallest number of hidden nodes on all four datasets, which indicates that the algorithm can effectively optimize the hidden layer parameters of the ELM model with the BACS algorithm and thereby obtain a more appropriate and simplified network structure. At the same time, the best generalization performance and classification ability are obtained. In terms of computing time and efficiency, the hidden layer parameters of ELM do not need to be tuned iteratively, so its learning speed is very fast, but its classification success rate is low. In Table 7, we did not list test times because, in the experimental results, the values for the four algorithms on the different datasets are all very small and of similar magnitude; their impact on the overall results is too small for test time to serve as an evaluation criterion. Compared with the other two optimization methods, although the BACS − ELM algorithm is slightly worse in learning efficiency, it shows a great advantage in classification accuracy.

Conclusions
In this paper, we propose a hybrid Extreme Learning Machine algorithm based on the bat and cuckoo search algorithms to optimize the input weights and thresholds of the traditional ELM algorithm, thereby alleviating the disadvantages of traditional ELM, such as poor hidden layer sparsity, low adjustment ability, and a complex network structure. Meanwhile, the BACS algorithm has strong search accuracy and a fast convergence speed and is not prone to falling into local optima, which effectively balances local and global optimization. Therefore, the proposed BACS-ELM algorithm can effectively solve the optimization problem caused by the randomness of the hidden layer parameters and improve the generalization performance of the network.
Experimental results show that the BACS-ELM algorithm is superior to other algorithms in function fitting and classification. In the future, we consider extending the BACS-ELM algorithm to practical application problems and solving a wider class of even tougher optimization problems.

Data Availability
All data included in this study are available upon request from the corresponding author.

Disclosure

This manuscript is the authors' original work and has not been published nor submitted simultaneously elsewhere.

Conflicts of Interest
The authors declare that they have no conflicts of interest.