Neural Networks

Volume 9, Issue 9, December 1996, Pages 1583-1596

Fast Second Order Learning Algorithm for Feedforward Multilayer Neural Networks and its Applications

https://doi.org/10.1016/S0893-6080(96)00029-9

Abstract

The paper presents an efficient training program for multilayer feedforward neural networks. It is based on second-order optimization algorithms, including the variable metric and conjugate gradient methods, combined with directional minimization at each step. Its efficiency is demonstrated on standard tests, including the parity, dichotomy, logistic and two-spiral problems. Applications of the algorithm to problems of higher dimensionality, such as deconvolution, separation of sources and identification of a nonlinear dynamic plant, are also given and discussed. It is shown that an appropriately trained neural network can provide a nonconventional solution of these standard signal processing tasks with satisfactory accuracy. The results of numerical experiments are included and discussed in the paper. Copyright © 1996 Elsevier Science Ltd.

Section snippets

INTRODUCTION

Methods to speed up the learning phase and to optimize the learning process in feedforward neural networks (NN) have recently been studied, and several new adaptive learning algorithms have been proposed (Fahlman, 1988; Jutten et al., 1991a; Battiti, 1992; Charalambous, 1992; Karayiannis and Venetsanopoulos, 1992; Moeller, 1993; Sperduti and Starita, 1993). Some of them introduce a momentum term, others use alternative cost functions or dynamic adaptation of the learning parameters. Many

OPTIMIZATION ROUTINES

The energy function to be minimized is defined in the usual way as the squared difference between the desired and the actual responses of the output neurons, summed over all p training samples. Let us assume a multilayer feedforward NN with N input and M output neurons. The number of hidden layers, as well as the number of neurons in each layer, may be arbitrary. The supervised learning of this net is equivalent to the minimization of the energy function (the error function), which can be written as
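In the usual notation, with d_k^(i) the desired and y_k^(i) the actual response of the k-th output neuron for the i-th training sample and w the vector of all weights, a standard sum-of-squared-errors form consistent with this description is (a reconstruction, not necessarily the paper's exact notation):

    E(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{p} \sum_{k=1}^{M} \left( y_k^{(i)}(\mathbf{w}) - d_k^{(i)} \right)^2

In the usual second-order scheme, both routines then generate a search direction \mathbf{p}_t, from the BFGS approximation of the inverse Hessian in the variable metric case or from the conjugate gradient recurrences, and update the weights as \mathbf{w}_{t+1} = \mathbf{w}_t + \eta_t \mathbf{p}_t, with the step length \eta_t chosen by the directional (line) minimization mentioned in the abstract.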

GENERATION OF GRADIENT USING SIGNAL FLOW GRAPHS

Both methods (BFGS and conjugate gradient) use the gradient of the energy function as the basic quantity in the process of updating the weights. The gradient ∇E, defined as the vector of derivatives of the energy function with respect to the parameters (weights) of the system, can be easily generated for the multilayer neural network using the signal flow graph (SFG) technique and the notion of the adjoint signal flow graph (Osowski, 1993). It has been shown that the gradient
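For a layered network, evaluating the adjoint signal flow graph amounts to the familiar backward sweep of backpropagation. The following sketch illustrates that computation for a single tanh hidden layer; the architecture, activation and NumPy usage are assumptions for illustration, not the authors' SFG implementation. It returns both E and the gradient so that a BFGS or conjugate gradient routine can consume them.

import numpy as np

# Hypothetical one-hidden-layer network with tanh units; the parameter tuple is
# (W1, b1, W2, b2). X holds the p training input vectors row-wise, D the desired outputs.
def energy_and_gradient(params, X, D):
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)                 # hidden-layer responses
    y = np.tanh(h @ W2 + b2)                 # output-layer responses
    err = y - D
    E = 0.5 * np.sum(err ** 2)               # sum-of-squared-errors energy
    # Backward sweep: the layered-network equivalent of evaluating the adjoint graph.
    delta2 = err * (1.0 - y ** 2)            # local gradients at the outputs (tanh derivative)
    delta1 = (delta2 @ W2.T) * (1.0 - h ** 2)
    grad = (X.T @ delta1, delta1.sum(axis=0),    # dE/dW1, dE/db1
            h.T @ delta2, delta2.sum(axis=0))    # dE/dW2, dE/db2
    return E, grad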

THE RESULTS OF NUMERICAL TESTS

The efficiency of the program is compared with that of existing ones on the standard tests: the parity, dichotomy, logistic map and two-spiral problems. We compare the variable metric BFGS method (VM) with standard backpropagation based on the steepest descent algorithm (BP), the conjugate gradient method implemented in the standard way used in optimization theory (CG) (Gill et al., 1981), and the scaled conjugate gradient (SCG) presented by Moeller (1993). A comparison of the
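A comparison in this spirit can be reproduced, for illustration only, with off-the-shelf optimizers: the sketch below trains a small 2-2-1 tanh network on the 2-bit parity (XOR) problem using SciPy's generic BFGS and CG routines and reports the iteration counts. These are not the VM, BP, CG or SCG programs evaluated in the paper, and the architecture, random seed and stopping tolerances are assumptions.

import numpy as np
from scipy.optimize import minimize

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # 2-bit inputs
D = np.array([[0.], [1.], [1.], [0.]])                   # parity (XOR) targets

shapes = [(2, 2), (2,), (2, 1), (1,)]                    # W1, b1, W2, b2 of a 2-2-1 net
sizes = [int(np.prod(s)) for s in shapes]

def energy(w):
    # Unpack the flat weight vector and evaluate the sum-of-squared-errors energy.
    parts, i = [], 0
    for s, n in zip(shapes, sizes):
        parts.append(w[i:i + n].reshape(s))
        i += n
    W1, b1, W2, b2 = parts
    y = np.tanh(np.tanh(X @ W1 + b1) @ W2 + b2)
    return 0.5 * np.sum((y - D) ** 2)

w0 = np.random.default_rng(0).normal(scale=0.5, size=sum(sizes))
for method in ("BFGS", "CG"):
    result = minimize(energy, w0, method=method)         # gradient estimated numerically here;
    print(method, result.nit, result.fun)                # an analytic one (see previous sketch) could be passed via jac=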

APPLICATIONS

Although the results of the standard tests have confirmed the good performance of the developed learning methods, they demonstrate the good properties of the program only for networks with a small number of weights. The following investigations have therefore been carried out for high-dimensionality neural networks, with the number of weights exceeding a few thousand. These investigations concern the application of the neural network to the deconvolution of signals, the separation of sources and the identification

CONCLUSIONS

A neural network learning algorithm superior to conventional backpropagation has been presented in this paper. The algorithm combines the variable metric method and the conjugate gradient direction of search with a suitable line search. The most important advantages of the program are: a practically unlimited number of variables (under the UNIX system), high efficiency, a small number of iterations and a short learning time, and an easy method of generating the network structure and training data

Acknowledgements

This work was partly supported by FRP RIKEN, Tokyo, Japan.

References (18)

  • Denoeux, T., et al. (1993). Initializing back propagation with prototypes. Neural Networks.
  • Sperduti, A., & Starita, A. (1993). Speed up learning and network optimization with extended backpropagation. Neural Networks.
  • Achenie, L.E. (1994). Computational experience with a quasi Newton method based training of the feedforward neural...
  • Battiti, R. (1992). First and second order methods for learning: between steepest descent and Newton methods. Neural Computation.
  • Battiti, R., & Masulli, F. (1990). BFGS optimization for faster and automated supervised learning. Proc. IEEE Int. NN...
  • Charalambous, C. (1992). Conjugate gradient algorithm for efficient training of artificial neural networks. IEE Proceedings G.
  • Fahlman, S. (1988). Fast learning variations on backpropagation: an empirical study. Proc. 1988 Connectionist Models,...
  • Gill, P., Murray, W., & Wright, M. (1981). Practical optimization. New York: Academic...
  • Jutten, C., Nguyen Thi, L., Dijkstra, E., Vittoz, E., & Caelen, J. (1991a). Blind separation of sources: algorithm for...

Cited by (52)

  • Evolving multi-dimensional wavelet neural networks for classification using Cartesian Genetic Programming

    2017, Neurocomputing
    Citation Excerpt:

    The training of this relatively complex architecture took 10,000 - 20,000 epochs. In [73] a 2-50-1 MLP was trained by employing a second-order Newton optimization method where training took only 650 epochs. Hence, WNNs with powerful activation functions have the ability to approximate functions with a minimum number of wavelons efficiently.

  • Online gradient method with smoothing ℓ₀ regularization for feedforward neural networks

    2017, Neurocomputing
    Citation Excerpt:

    Multilayer feedforward neural networks (FNNs) has been widely used in various fields [1,2]. The training of FNNs can be reduced to solving nonlinear least square problems, to which numerous traditional numerical methods, such as the gradient descent method, Newton method [3], conjugate gradient method [4], extended Kalman filtering [5], Levenberg-Marquardt method [6], etc., can be applied. Among those training methods, backpropagation algorithm, which is derived based on the gradient descent rule, has become one of the most popular training strategy for its simplicity and ease of implementation [7].

  • Improving the accuracy of prediction of PM₁₀ pollution by the wavelet transformation and an ensemble of neural predictors

    2012, Engineering Applications of Artificial Intelligence
    Citation Excerpt:

    In the case of MLP the optimal number of hidden neurons was found by using trial and error approach. It means learning many different structures of MLP networks (Osowski et al., 1996) and accepting this one which provides the least value of the error on the validation data extracted from the learning data set (10% of learning data). On the basis of these experiments, we have found the optimal MLP structure consisting of one hidden layers of 8 sigmoidal neurons (the structure 8-8-1).

  • Local coupled feedforward neural network

    2010, Neural Networks
    Citation Excerpt:

    Very slow convergence rate and the need for predetermined learning parameters limit the practical use of this algorithm. Many improved learning algorithms have been reported in literature (see e.g. Chen, Cowan, Billings, & Grant, 1990; Ergezinger & Thomsen, 1995; Evans & Zainuddin, 1997; Fu, Hsu, & Principe, 1996; Gorse, Romano-Critchley, & Taylor, 1997; Liu, Feng, & Zhang, 2009; Man, Wu, Liu, & Yu, 2006; Navia-Vázquez & Figueiras-Vidal, 2000; Ng, Cheung, & Leung, 2004; Nishiyama & Suzuki, 2001; Ooyen & Neinhuis, 1992; O’Reilly, 1996; Osowski, Bojarczak, & Stodolski, 1996; Plagianakos, Magoulas, & Vrahatis, 2002; RoyChowdhury, Singh, & Chansarkar, 1999; Rubanov, 2000; Salomon & Hemmen, 1996; Sang-Hoon & Soo-Young, 1999; Siu, Yang, Lee, & Ho, 2007; Solomon & Hemmen, 1996; Unnikrishnan & Venugopal, 1994; Wang & Lin, 1998; Yu & Chen, 1997; Zweiri, 2007). These improvements achieve better convergence rates; and for many purposes, they perform sufficiently.
