1 Introduction

Artificial neural networks (ANNs) have been widely used in various fields owing to their strong learning ability and fast optimization capability [1, 2]. However, ANN parameters are usually estimated by gradient descent, which increases the time complexity of the algorithm and makes it prone to falling into local optima. Because of these shortcomings, training the network may take a long time and the best solution is not guaranteed. Searching for efficient, real-time neural networks has therefore become a main research direction for many scholars.

The Extreme Learning Machine (ELM) [3] is a single-hidden-layer feed-forward neural network (SLFN) proposed by Huang et al. in 2006. The method randomly generates the input weights and hidden-node biases and computes the output weights analytically, without iterative training, so its learning speed is much higher than that of traditional neural network algorithms. ELM also differs from traditional neural networks in how the model is constructed: it obtains the minimum-norm output weights while reaching the minimum training error. According to neural network theory, for feed-forward networks, the smaller the training error and the norm of the output weights, the stronger the generalization ability of the network [4]; the generalization ability of ELM can therefore be argued theoretically. In recent years, because it effectively alleviates the defects of traditional neural networks, ELM has been applied in many fields, such as face recognition [5], classification, regression, image processing [6], ground reconstruction [7], and so on.

Although ELM has a fast learning speed and good generalization performance, some column vectors of the hidden layer design matrix may be approximately linearly dependent; that is, the hidden layer design matrix may be multicollinear or ill-conditioned. Estimating the solution of such an ill-conditioned system by ordinary least squares results in poor generalization performance and stability. To this end, many scholars have improved the ELM model; for example, Li et al. proposed an improved extreme learning machine regression algorithm based on condition indexes and variance decomposition proportions (CV-ELM) [8], and Ceng Lin et al. proposed an extreme learning machine based on principal component estimation (PC-ELM) [9].

The CV-ELM algorithm improves the generalization performance of the extreme learning machine to a certain extent and ensures good robustness. However, the algorithm still has defects in some cases, so the model cannot reach the minimum error. Our analysis suggests two main reasons. First, high-dimensional data may obscure the noise components in the data, so the method cannot completely isolate the relevant variables; that is, the algorithm cannot fully remove the noise. Second, some initial parameters of CV-ELM, such as the input weights and hidden layer biases, are generated randomly, which affects the generalization ability and robustness of the model. Therefore, this paper introduces the ensemble learning method [10] into the CV-ELM algorithm. The method trains a number of similar learners in parallel and then selects the best subset of these learners for integration [11]. Regression experiments on multiple data sets show that the method achieves good generalization performance and stability.

2 Review of ELM and CV-ELM

In this section, we briefly review ELM [3] and CV-ELM [8].

2.1 Extreme Learning Machine

For N arbitrary distinct samples \( (x_i, y_i) \), where \( x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in R^n \) is the n-dimensional feature vector of the i-th sample and \( y_i = [y_{i1}, y_{i2}, \ldots, y_{im}]^T \in R^m \) is the corresponding target, the output of a feed-forward neural network with L hidden nodes and activation function G(x) can be expressed as

$$ f_L(x) = \sum_{i=1}^{L} \beta_i G(a_i \cdot x + b_i), \quad a_i \in R^n, \; \beta_i \in R^m, $$
(1)

where \( a_i = [a_{i1}, a_{i2}, \ldots, a_{in}]^T \) is the weight vector connecting the i-th hidden node to the input nodes, \( b_i \) is the bias of the i-th hidden node, \( \beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^T \) is the weight vector connecting the i-th hidden node to the output nodes, and \( a_i \cdot x \) denotes the inner product of \( a_i \) and \( x \). The activation function G(x) can be chosen as the sigmoid, sine, or RBF function, among others.
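As an illustration only (not part of the original formulation), the hidden-node output for these common activation choices could be computed as follows in Python/NumPy; the function names are ours, and the RBF form follows the usual ELM convention of treating \( a_i \) as a centre and \( b_i \) as an impact width:

```python
import numpy as np

def sigmoid_node(a, b, x):
    # Additive sigmoid node: G(a, b, x) = 1 / (1 + exp(-(a . x + b)))
    return 1.0 / (1.0 + np.exp(-(np.dot(a, x) + b)))

def sine_node(a, b, x):
    # Additive sine node: G(a, b, x) = sin(a . x + b)
    return np.sin(np.dot(a, x) + b)

def rbf_node(a, b, x):
    # RBF node with centre a and width b > 0: G(a, b, x) = exp(-b * ||x - a||^2)
    return np.exp(-b * np.sum((x - a) ** 2))
```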

If this feed-forward neural network with L hidden nodes and m output nodes can approximate these N samples with zero error, then the above N equations can be expressed as

$$ f_L(x_j) = \sum_{i=1}^{L} \beta_i G(a_i \cdot x_j + b_i) = y_j, \quad j = 1, 2, \ldots, N, $$
(2)

Equation (2) can be written compactly as

$$ H\beta = Y $$
(3)

where

$$ H = \left[ \begin{array}{ccc} G(a_1, b_1, x_1) & \cdots & G(a_L, b_L, x_1) \\ \vdots & \ddots & \vdots \\ G(a_1, b_1, x_N) & \cdots & G(a_L, b_L, x_N) \end{array} \right] $$
(4)
$$ \beta = \left[ \begin{array}{c} \beta_1^T \\ \vdots \\ \beta_L^T \end{array} \right]_{L \times m} \qquad Y = \left[ \begin{array}{c} y_1^T \\ \vdots \\ y_N^T \end{array} \right]_{N \times m} $$
(5)

H is called the hidden layer output matrix of the network, and ELM training can thus be transformed into the problem of solving the least-squares solution for the output weights. The output weight matrix \( \hat{\beta} \) can be obtained from (6):

$$ \hat{\beta} = \left( H^T H \right)^{-1} H^T Y = H^{+} Y $$
(6)

where \( H^{+} \) denotes the Moore-Penrose generalized inverse of the hidden layer output matrix H.
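As a concrete illustration, a minimal NumPy sketch of the ELM training and prediction steps described above is given below; the function and variable names are ours, and a sigmoid activation is assumed:

```python
import numpy as np

def elm_train(X, Y, L, seed=0):
    """Minimal ELM: X is an N x n input matrix, Y is an N x m target matrix, L hidden nodes."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    A = rng.uniform(-1.0, 1.0, size=(L, n))      # random input weights a_i
    b = rng.uniform(-1.0, 1.0, size=L)           # random hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))     # hidden layer output matrix, N x L
    beta = np.linalg.pinv(H) @ Y                 # least-squares output weights, cf. Eq. (6)
    return A, b, beta

def elm_predict(X, A, b, beta):
    """Predict with the trained ELM."""
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
    return H @ beta
```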

2.2 CV-ELM

The hidden layer output matrix H is first computed as in the ELM model, and its columns are then partitioned according to the condition indexes and variance decomposition proportions to obtain

$$ H = \left( H_1, H_2 \right) $$
(7)

where \( H_1 \) contains the non-interfering columns of the hidden layer matrix H and \( H_2 \) contains the interfering columns.

According to the least-squares (LS) principle, the output weight matrix \( \hat{\beta} \) can be obtained from (8):

$$ \begin{aligned} \hat{\beta} & = \left( H^T H \right)^{-1} H^T Y \\ & = \left( [H_1, H_2]^T [H_1, H_2] \right)^{-1} [H_1, H_2]^T Y \\ & = \left[ \begin{array}{cc} H_1^T H_1 & H_1^T H_2 \\ H_2^T H_1 & H_2^T H_2 \end{array} \right]^{-1} [H_1, H_2]^T Y \end{aligned} $$
(8)

In order to enhance the generalization performance and stability of the model without distorting the non-interfering data, a small constant is added to the diagonal elements of the interfering block only. The output weight matrix \( \hat{\beta} \) is then obtained from (9):

$$ \hat{\beta} = \left[ \begin{array}{cc} H_1^T H_1 & H_1^T H_2 \\ H_2^T H_1 & H_2^T H_2 + kI \end{array} \right]^{-1} [H_1, H_2]^T Y $$
(9)

where k is a small constant (the ridge parameter) and \( I \) is the identity matrix.

The CV-ELM algorithm is described as follows:

Given training samples \( (x_i, y_i), \; i = 1, \ldots, N \), the number of hidden nodes L, and the activation function \( G(x) \):

1. Randomly set the input weights \( a_i \) and biases \( b_i \), \( i = 1, \ldots, L \);

2. Compute the hidden layer output matrix H;

3. Partition H into \( (H_1, H_2) \) using the condition indexes and variance decomposition proportions;

4. Determine the ridge parameter k;

5. Compute the output weight matrix \( \hat{\beta} \) by Eq. (9).
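The following NumPy sketch illustrates this procedure. The exact column-partitioning rule of [8] is not reproduced here; `split_columns` is a simplified stand-in that flags columns whose variance decomposition proportion is dominated by a large condition index, and all names are ours:

```python
import numpy as np

def split_columns(H, cond_threshold=30.0):
    """Crude stand-in for the condition-index / variance-decomposition split:
    a column is treated as 'interfering' (H2) if a singular value with a large
    condition index accounts for most of its coefficient variance."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    cond_index = s[0] / s                        # condition index of each singular value
    phi = (Vt.T / s) ** 2                        # phi[j, k] = v_{jk}^2 / s_k^2
    prop = phi / phi.sum(axis=1, keepdims=True)  # variance decomposition proportions
    bad = np.any((cond_index >= cond_threshold) & (prop > 0.5), axis=1)
    return np.where(~bad)[0], np.where(bad)[0]   # column indices of H1 and H2

def cv_elm_solve(H, Y, k=1e-3):
    """Compute Eq. (9): ridge-augment only the interfering block H2."""
    idx1, idx2 = split_columns(H)
    order = np.concatenate([idx1, idx2])
    Hs = H[:, order]                             # reordered H = (H1, H2)
    reg = np.zeros(H.shape[1])
    reg[len(idx1):] = k                          # add k only to the H2 diagonal block
    beta_s = np.linalg.solve(Hs.T @ Hs + np.diag(reg), Hs.T @ Y)
    beta = np.empty_like(beta_s)
    beta[order] = beta_s                         # restore the original column order
    return beta
```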

3 Improved Ensemble Extreme Learning Machine

The CV-ELM algorithm overcomes the deterioration of generalization performance and robustness that occurs when the hidden layer design matrix is ill-conditioned. However, the random generation of the input weights and the incomplete removal of noise in high-dimensional data can still degrade generalization performance and robustness. Therefore, this paper proposes a CV-ELM regression algorithm based on ensemble learning (ECV-ELM), which exploits the complementarity among multiple learners to achieve better ensemble performance. ECV-ELM overcomes the poor model stability caused by the random generation of the input weights and biases and by the incomplete noise removal of CV-ELM. It combines the ensemble learning method with the CV-ELM regression algorithm and selects appropriate sub CV-ELM models with common selection methods, which further improves the performance of the whole model.

Assume that the training set and test set are \( G = \{ (x_i, y_i) \mid i = 1, 2, \ldots, l \} \) and \( G' = \{ (x_i, y_i) \mid i = 1, 2, \ldots, l \} \), where \( x_i \) is the model input and \( y_i \) is the model output. First, different training subsets \( \{ G_1, \ldots, G_T \} \) are drawn from the training set G, and several different sub CV-ELM models are generated from these subsets. Then, a portion of the best sub CV-ELM models is selected according to the training results. Finally, their outputs are combined by the simple average method.

In summary, the proposed ECV-ELM ensemble regression algorithm can be summarized as follows (a minimal sketch is given after the list):

Input: Training sample set G

Output: Ensemble CV-ELM regression model

1. Use the training set G to randomly generate T overlapping data subsets \( \{ G_1, G_2, \ldots, G_T \} \); set the activation function of all sub-models to g(x) and the number of hidden layer neurons to L;

2. Initialize t = 1;

3. Check whether the number of iterations has been reached, i.e., whether t <= T; if yes, execute step (4); otherwise, execute step (7);

4. Use a random function to generate the input weights a and the hidden layer biases b;

5. Train the t-th sub CV-ELM model using the randomly generated a and b and the t-th data subset, then set t = t + 1;

6. Return to step (3);

7. Calculate the MSE of all the sub-models by Eq. (10) and select the k best sub CV-ELM models according to these values;

    $$ MSE = \frac{1}{n} \left( Y - H\hat{\beta} \right)^{T} \left( Y - H\hat{\beta} \right) $$
    (10)

8. Integrate the k selected sub CV-ELM models with the simple average method to obtain the final model, i.e., the ECV-ELM model.
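A compact NumPy sketch of this ensemble procedure is given below. It reuses the hypothetical `cv_elm_solve` helper sketched earlier in Sect. 2.2; the subset sampling scheme and the choice of evaluating the MSE on the full training set are our own simplifications, and all names are illustrative:

```python
import numpy as np

def ecv_elm_train(X, Y, L, T=20, k_best=10, subset_ratio=0.75, ridge_k=1e-3, seed=0):
    """Train T CV-ELM sub-models on random overlapping subsets and keep the k_best by MSE."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    models = []
    for t in range(T):
        idx = rng.choice(N, size=int(subset_ratio * N), replace=False)  # t-th data subset G_t
        A = rng.uniform(-1.0, 1.0, size=(L, n))            # random input weights a
        b = rng.uniform(-1.0, 1.0, size=L)                 # random hidden biases b
        H = 1.0 / (1.0 + np.exp(-(X[idx] @ A.T + b)))      # hidden layer output on G_t
        beta = cv_elm_solve(H, Y[idx], k=ridge_k)          # sub CV-ELM output weights, Eq. (9)
        H_full = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
        mse = np.mean((Y - H_full @ beta) ** 2)            # training MSE, cf. Eq. (10)
        models.append((mse, A, b, beta))
    models.sort(key=lambda m: m[0])                        # keep the k_best sub-models
    return models[:k_best]

def ecv_elm_predict(X, models):
    """Simple average of the selected sub-models' predictions."""
    preds = [1.0 / (1.0 + np.exp(-(X @ A.T + b))) @ beta for _, A, b, beta in models]
    return np.mean(preds, axis=0)
```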

4 Experiment and Analysis

This section analyzes the ensemble-learning-based CV-ELM regression method (ECV-ELM) proposed in the previous section. For the experimental analysis, the running time and prediction results of the standard ELM, CV-ELM, and ECV-ELM algorithms are compared. The experiments use five regression data sets from the UCI repository and LIACC: the Balloon, California House, Cloud, Strike, and Bodyfat data sets. The numbers of input attributes and of samples differ greatly among these five data sets, which allows a better analysis of the algorithms' performance, as shown in Table 1. The number of hidden layer nodes required for each data set in the ECV-ELM sub-models is shown in Table 2. All experiments in this section were run under the Windows 7 64-bit operating system in the Matlab 2016 environment on a 3.30 GHz i5-4590 CPU with 4 GB RAM.

Table 1. Regression analysis dataset.
Table 2. Number of required hidden layer nodes in each dataset.

Table 3 compares the training and testing times of ELM, ECV-ELM, and CV-ELM on the data sets. Table 4 compares their test RMSE, and Table 5 compares their test DEV. The activation function of the standard ELM, CV-ELM, and ECV-ELM models is the sigmoid function in all cases. In the ECV-ELM model, the number of trained sub-models is T = 20, the number of integrated sub-models is k = 10, and the training subset of each sub-model is three quarters of the training set, drawn at random.

Table 3. Comparison of training and testing time of ELM, ECV-ELM, CV-ELM.
Table 4. Comparison of testing RMSE of ELM, ECV-ELM, CV-ELM.
Table 5. Comparison of testing DEV of ELM, ECV-ELM, CV-ELM.

5 Conclusions

In this paper, a CV-ELM regression algorithm based on ensemble learning is proposed. The algorithm uses the ensemble learning method to overcome the poor model stability caused by the random input weights and biases of CV-ELM and by its incomplete noise removal. The performance of the ECV-ELM algorithm is analyzed on different regression data sets. The results show that although the time cost of ECV-ELM is higher than that of CV-ELM, the generalization performance and robustness of the algorithm are greatly improved.

Compared with the CV-ELM algorithm, the proposed ECV-ELM algorithm trains multiple CV-ELM sub-models and then uses ensemble learning to exploit the complementarity among these learners, thus improving the generalization ability and robustness of the algorithm.