
Neurocomputing

Volume 129, 10 April 2014, Pages 428-436

Learning of a single-hidden layer feedforward neural network using an optimized extreme learning machine

https://doi.org/10.1016/j.neucom.2013.09.016

Abstract

This paper proposes a learning framework for single-hidden layer feedforward neural networks (SLFN) called optimized extreme learning machine (O-ELM). In O-ELM, the structure and the parameters of the SLFN are determined using an optimization method. The output weights, as in the batch ELM, are obtained by a least squares algorithm, but using Tikhonov's regularization in order to improve the SLFN performance in the presence of noisy data. The optimization method is used to select the set of input variables, the hidden-layer configuration and bias, the input weights, and Tikhonov's regularization factor. The proposed framework has been tested with three optimization methods (genetic algorithms, simulated annealing, and differential evolution) over 16 benchmark problems available in public repositories.

Introduction

Multilayer feedforward neural networks (FFNN) have been used in the identification of unknown linear or non-linear systems (see, e.g. [1], [2]). Their appeal is based on their universal approximation properties [3], [4]. However, in industrial applications, linear models are often preferred due to their faster training in comparison with multilayer FFNN trained with gradient-descent algorithms [5]. In order to overcome the slow construction of FFNN models, a method called extreme learning machine (ELM) was proposed in [6]. ELM is a batch learning algorithm for single-hidden layer FFNN (SLFN) in which the input weights (weights of the connections between the input variables and the neurons in the hidden-layer) and the bias of the neurons in the hidden-layer are randomly assigned. The output weights (weights of the connections between the neurons in the hidden-layer and the output neuron) are obtained using the Moore–Penrose (MP) generalized inverse, considering that the activation function of the output neuron is linear.
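
To make this procedure concrete, the following is a minimal sketch of batch ELM training in NumPy; it is not the authors' code, and the sigmoid hidden activation, the data shapes, and all names (elm_train, elm_predict, etc.) are assumptions made only for illustration.

```python
import numpy as np

def elm_train(X, y, h=20, rng=np.random.default_rng(0)):
    """Minimal batch-ELM sketch: random hidden layer, least-squares output weights.
    X: (N, n) inputs (samples in rows, transposed w.r.t. the paper's notation),
    y: (N,) desired outputs, h: number of hidden neurons."""
    n = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(n, h))   # random input weights
    b = rng.uniform(-1.0, 1.0, size=h)        # random hidden biases
    V = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # hidden-layer outputs (sigmoid assumed)
    w_o = np.linalg.pinv(V) @ y               # output weights via the MP generalized inverse
    return W, b, w_o

def elm_predict(X, W, b, w_o):
    V = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return V @ w_o                            # linear output neuron, zero output bias
```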

Since in ELM the output weights are computed from randomly assigned input weights and hidden-node biases, some of these input weights and biases may be non-optimal or unnecessary. Furthermore, ELM tends to require more hidden neurons than conventional tuning-based learning algorithms (based on error backpropagation or other learning methods where the output weights are not obtained by the least squares method) in some applications, which can negatively affect SLFN performance on unknown testing data [6]. Using the least squares method without regularization on noisy data also gives the model poor generalization capability [7]. Fitting problems may also be encountered in the presence of irrelevant or correlated input variables [5].

Optimization methods have been used jointly with analytical methods for network training. In [8] a new method to choose the most appropriate FFNN topology, type of activation functions and parameters of the training algorithm using a genetic algorithm (GA) is proposed. Each chromosome is composed of the specification of the minimization algorithm used in the back-propagation (BP) method, the network architecture, the activation function of the neurons of the hidden layer, and the activation function of the neurons of the output layer using binary encoding. In [9] a new nonlinear system identification scheme is proposed, where differential evolution (DE) is used to optimize the initial weights used by a Levenberg–Marquardt (LM) algorithm in the learning of a FFNN. In [10] a similar method is proposed using a simulated annealing (SA) approach. In these algorithms, the evaluation of each individual or state is made by training the FFNN with a BP method, which is computationally expensive.

In [11] an improved GA is used to optimize the structure (connections layout) and the parameters (connection weights and biases) of a SLFN with switches. The switches are unit step functions that make it possible to remove each connection. Using a real encoding scheme, and new crossover and mutation techniques, this improved GA obtains better results in comparison with traditional GAs. The structure and the parameters of the same kind of SLFN with switches are also tuned in [12], in this case using a hybrid Taguchi GA. This approach is similar to a traditional GA but a Taguchi method [13] is used for the crossover process. The use of this method implies the construction of an (n+1)×n two-level orthogonal matrix, where n is the number of variables of the optimization process. However, the construction of this orthogonal matrix is not simple. There are some standard orthogonal matrices but they can only be used when n is small. In large networks, n is large and therefore this method is not a good practical approach. In these methodologies, the weights between the hidden-layer and the output layer are optimized by the GA. Using the ELM approach, the output weights could be calculated using the Moore–Penrose generalized inverse (considering an output neuron with linear activation function) and a good solution could be quickly obtained, reducing the convergence time of the GA. Furthermore, as the number of variables of the optimization process is lower, the search space to be explored by the GA narrows. This approach was used in [14], where a GA is used to tune the (selective existence of) connections and parameters between the input layer and the hidden layer, and a least squares algorithm is applied to tune the parameters between the hidden layer and the output layer. However, this type of approach does not easily address ELM's tendency to require more hidden nodes than conventional tuning-based algorithms, nor the problems caused by the presence of irrelevant input variables. These problems also occur in the methods proposed in [15] and [16]. In [15] a learning method called evolutionary ELM (E-ELM) is proposed, where the weights between the input layer and the hidden layer and the bias of the hidden layer neurons are optimized by a DE method and the output weights are calculated using the Moore–Penrose generalized inverse as in ELM. In [16] a similar method called self-adaptive E-ELM (SaE-ELM) is proposed; however, in this methodology the generation strategies and control parameters of the DE method are self-adapted by the optimization method.

In this paper, a novel learning framework for SLFNs called optimized extreme learning machine (O-ELM) is proposed. This framework uses the same concept as ELM, where the output weights are obtained using least squares, with the difference that Tikhonov's regularization [17] is used in order to obtain a robust least squares solution. ELM performance is known to degrade in the presence of irrelevant variables, and ELM is also prone to requiring more hidden nodes than conventional tuning-based learning algorithms. To address these problems, the proposed framework uses an optimization method to select the set of input variables and the configuration of the hidden-layer, namely the number of neurons in this layer and the activation function of each neuron. Furthermore, in order to optimize the fitting performance, the optimization method also selects the weights of the connections between the input layer and the hidden-layer, the bias of the neurons of the hidden-layer, and the regularization factor. Using this framework, no trial-and-error experiments are needed to search for the best SLFN structure. In this paper, three optimization methods (GA, SA, and DE) are tested in the proposed framework.

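To illustrate how such a framework can operate, below is a hedged sketch of evaluating a single candidate solution: the binary input switches, the sigmoid activation, the ridge-style regularized least squares, and the RMSE fitness are assumptions made for the example (per-neuron switches and activation-function selection are omitted for brevity), not the paper's exact implementation.

```python
import numpy as np

def evaluate_candidate(cand, X_tr, y_tr, X_val, y_val):
    """Fitness of one O-ELM-style candidate: build the SLFN it encodes, fit the
    output weights by Tikhonov-regularized least squares, return validation error."""
    s = cand["input_switch"].astype(bool)      # which input variables are used
    W = cand["W"][s, :]                        # input weights of the active inputs
    b = cand["bias"]                           # hidden-layer biases
    lam = cand["reg_factor"]                   # Tikhonov regularization factor

    def hidden(X):
        return 1.0 / (1.0 + np.exp(-(X[:, s] @ W + b)))   # sigmoid activation assumed

    V = hidden(X_tr)                           # (N, h) hidden-layer outputs
    h = V.shape[1]
    w_o = np.linalg.solve(V.T @ V + lam * np.eye(h), V.T @ y_tr)  # ridge solution
    err = y_val - hidden(X_val) @ w_o
    return np.sqrt(np.mean(err ** 2))          # RMSE as fitness (assumption)
```

Any of the three optimizers (GA, SA, or DE) can then search over the candidate encoding while an evaluation routine of this kind supplies the fitness.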

The paper is organized as follows. The SLFN architecture is overviewed in Section 2. Section 3 gives a brief review of the batch ELM. The proposed learning framework is presented in Section 4. Section 5 gives a brief review of the optimization methods tested in the O-ELM. Section 6 presents experimental results. Finally, concluding remarks are drawn in Section 7.

Section snippets

Adjustable single hidden-layer feedforward network architecture

The neural network considered in this paper is a SLFN with adjustable architecture as shown in Fig. 1, which can be mathematically represented by

y = g(b_O + \sum_{j=1}^{h} w_j^O v_j),   (1)

v_j = f_j(b_j + \sum_{i=1}^{n} w_{ij} s_i x_i).   (2)

n and h are the number of input variables and the number of the hidden layer neurons, respectively; v_j is the output of the hidden-layer neuron j; x_i, i = 1, …, n, are the input variables; w_{ij} is the weight of the connection between the input variable i and the neuron j of the hidden layer; w_j^O is the weight of the
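
A minimal sketch of the forward pass defined by (1)-(2), assuming sigmoid hidden activations f_j and a linear output activation g; the switch vector s and all variable names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def slfn_forward(x, W, s, b, w_o, b_o):
    """Forward pass of the adjustable SLFN of Eqs. (1)-(2).
    x: (n,) input sample, W: (n, h) input weights, s: (n,) 0/1 input switches,
    b: (h,) hidden biases, w_o: (h,) output weights, b_o: output bias."""
    a = (W * s[:, None]).T @ x + b        # b_j + sum_i w_ij * s_i * x_i
    v = 1.0 / (1.0 + np.exp(-a))          # hidden outputs v_j (sigmoid f_j assumed)
    return b_o + w_o @ v                  # linear output activation g assumed
```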

Extreme learning machine

The batch ELM was proposed in [6]. In [18] it is proved that a SLFN with randomly chosen weights between the input layer and the hidden layer, and adequately chosen output weights, is a universal approximator for any bounded non-linear piecewise continuous hidden-layer activation functions.

Considering that N samples are available, the output bias is zero, and the output neuron has a linear activation function, (1), (2) can be rewritten as

y = (w_O^T V)^T,

where y = [y(1), …, y(N)]^T is the vector of outputs of the SLFN, w_O = [w_1^O, …, w
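
Although the snippet is truncated here, the output weights in batch ELM are obtained as the minimum-norm least-squares solution via the Moore–Penrose generalized inverse; in the notation above this has the standard form (the equation numbering of the full text is not reproduced here, and the symbols are assumptions consistent with Section 4):

\hat{w}_O = (V^T)^{\dagger} y_d,

where V is the h×N matrix of hidden-layer outputs, (·)^{\dagger} denotes the Moore–Penrose generalized inverse, and y_d is the vector of desired outputs.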

Optimized extreme learning machine

In O-ELM, the weights of the output connections are obtained using the same ELM methodology presented in Section 3, however, with one modification: Tikhonov's regularization is introduced in the least squares solution.

The objective of the least squares method is to obtain the best output weights by solving the following problem:

\min \| y - y_d \|_2,

where \| \cdot \|_2 is the Euclidean norm. The minimum-norm solution to this problem is given by (8).

The use of least squares can be considered as a two-stage minimization problem involving the determination of the solutions to (9), and the
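
The snippet is cut off at this point; for completeness, the Tikhonov-regularized counterpart of the above least-squares solution has the standard ridge form (the symbol λ for the regularization factor is an assumption, and this expression is the standard form rather than necessarily the paper's exact equation):

\hat{w}_O = (V V^T + \lambda I)^{-1} V y_d,

which penalizes output-weight vectors with large norm and thereby improves generalization on noisy data, consistent with the discussion in the conclusion.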

Optimization methods

This section presents the three methods used in the optimization of the SLFN.
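
As an illustration of one of the three optimizers, below is a generic DE/rand/1/bin sketch; the population size, the mutation and crossover factors F and CR, and the real-vector candidate encoding are assumptions, and the paper's exact DE variant and parameter settings may differ.

```python
import numpy as np

def differential_evolution(fitness, bounds, pop_size=30, F=0.5, CR=0.9,
                           generations=100, rng=np.random.default_rng(0)):
    """Generic DE/rand/1/bin minimizer over a real-valued vector.
    fitness: maps a candidate vector to a scalar cost;
    bounds: (d, 2) array of per-dimension lower/upper bounds."""
    d = bounds.shape[0]
    pop = rng.uniform(bounds[:, 0], bounds[:, 1], size=(pop_size, d))
    cost = np.array([fitness(p) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            idx = rng.choice([k for k in range(pop_size) if k != i], 3, replace=False)
            a, b, c = pop[idx]
            mutant = np.clip(a + F * (b - c), bounds[:, 0], bounds[:, 1])
            cross = rng.random(d) < CR
            cross[rng.integers(d)] = True          # ensure at least one mutated gene
            trial = np.where(cross, mutant, pop[i])
            t_cost = fitness(trial)
            if t_cost <= cost[i]:                   # greedy selection
                pop[i], cost[i] = trial, t_cost
    best = np.argmin(cost)
    return pop[best], cost[best]
```

In the O-ELM setting, the fitness function would be a candidate-evaluation routine such as the one sketched in the Introduction, with the bounds covering the input weights, biases, switches (relaxed or rounded to binary), and the regularization factor.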

Results

This section presents experimental results in 16 benchmark data sets. Table 1 presents the number of training samples Nt, the number of testing samples Ntt, and the number of input variables of these benchmark data sets. In these data sets, delays between each input variable and the output could be considered and the performance of the methods could possibly be improved [33]; however, as only the learning capability of the SLFN is being analyzed, no delay between the input variables and the

Conclusion

A novel learning framework for SLFNs called optimized extreme learning machine was presented. As in the original ELM, the output weights are obtained using the least squares algorithm, but with Tikhonov's regularization. The proposed integration of regularization penalizes solutions with larger norms, improving the SLFN generalization capability and its performance on the test data. In order to solve the tendency of ELM to require more neurons in the hidden-layer than

Acknowledgments

This work was supported by Project FAir-Control “Factory Air Pollution Control” (reference: E!6498), supported by the Eurostars Programme of the EUREKA network, financed by “Fundação para a Ciência e a Tecnologia” (FCT), “Agência de Inovação” (AdI), and the Seventh Framework Programme for Research and Technological Development (FP7) of the European Union.

Tiago Matias and Rui Araújo acknowledge the support of FCT project PEst-C/EEI/UI0048/2011.

Francisco Souza has been supported by FCT under

References (34)

  • Y. Miche et al., OP-ELM: optimally pruned extreme learning machine, IEEE Trans. Neural Netw. (2010)
  • S. Chen et al., Regularized orthogonal least squares algorithm for constructing radial basis function networks, Int. J. Control (1996)
  • B. Subudhi et al., Differential evolution and Levenberg–Marquardt trained neural network scheme for nonlinear system identification, Neural Process. Lett. (2008)
  • P.A.C. Valdivieso, J.J.M. Guervós, J. González, V.M.R. Santos, G. Romero, SA-Prop: optimization of multilayer...
  • F.H.F. Leung et al., Tuning of the structure and parameters of neural network using an improved genetic algorithm, IEEE Trans. Neural Netw. (2003)
  • J.-T. Tsai et al., Tuning the structure and parameters of a neural network by using hybrid Taguchi-genetic algorithm, IEEE Trans. Neural Netw. (2006)
  • J.-T. Tsai et al., Hybrid Taguchi-genetic algorithm for global numerical optimization, IEEE Trans. Evol. Comput. (2004)

    Tiago Matias received his B.Sc. and M.Sc. degrees in Electrical and Computer Engineering (Automation branch) from the University of Coimbra, in 2011. He is currently pursuing his Ph.D. degree in Electrical and Computer Engineering at the University of Coimbra. Since 2011, he has been a Researcher at the “Institute for Systems and Robotics - University of Coimbra” (ISR-UC). His research interests include optimization, meta-heuristics, and computational intelligence.

    Francisco Souza was born in Fortaleza, Ceará, Brazil, in 1986. He received the B.Sc. degree in Electrical Engineering (Automation branch) from the Federal University of Ceará, Brazil. He is currently pursuing his Ph.D. degree in Electrical and Computer Engineering at the University of Coimbra. Since 2009, he has been a Researcher at the “Institute for Systems and Robotics - University of Coimbra” (ISR-UC). His research interests include machine learning and pattern recognition with application to industrial processes.

    Rui Araújo received the B.Sc. degree (Licenciatura) in Electrical Engineering, the M.Sc. degree in Systems and Automation, and the Ph.D. degree in Electrical Engineering from the University of Coimbra, Portugal, in 1991, 1994, and 2000, respectively. He joined the Department of Electrical and Computer Engineering of the University of Coimbra where he is currently an Assistant Professor. He is a founding member of the Portuguese Institute for Systems and Robotics (ISR-Coimbra), where he is now a researcher. His research interests include computational intelligence, intelligent control, computational learning, fuzzy systems, neural networks, estimation, control, robotics, mobile robotics and intelligent vehicles, robot manipulators control, sensing, soft sensors, automation, industrial systems, embedded systems, real-time systems, and in general architectures and systems for controlling robot manipulators, mobile robots, intelligent vehicles, and industrial systems.

    Carlos Henggeler Antunes received his Ph.D. degree in Electrical Engineering (Optimization and Systems Theory) from the University of Coimbra, Portugal, in 1992. He is a full professor at the Department of Electrical and Computer Engineering, University of Coimbra. His research interests include multiple objective optimization, meta-heuristics, and energy planning, namely demand-responsive systems.
