Neural Networks

Volume 9, Issue 9, December 1996, Pages 1583-1596

Fast Second Order Learning Algorithm for Feedforward Multilayer Neural Networks and its Applications

https://doi.org/10.1016/S0893-6080(96)00029-9

Abstract

The paper presents an efficient training program for multilayer feedforward neural networks. It is based on second-order optimization algorithms, including the variable metric and conjugate gradient methods, combined with directional minimization at each step. Its efficiency is demonstrated on standard tests, including the parity, dichotomy, logistic and two-spiral problems. Applications of the algorithm to problems of higher dimensionality, such as deconvolution, separation of sources and identification of a nonlinear dynamic plant, are also given and discussed. It is shown that an appropriately trained neural network can provide a nonconventional solution of these standard signal processing tasks with satisfactory accuracy. The results of numerical experiments are included and discussed in the paper. Copyright © 1996 Elsevier Science Ltd.

Section snippets

INTRODUCTION

Methods to speed up the learning phase and to optimize the learning process in feedforward neural networks (NN) have recently been studied, and several new adaptive learning algorithms have been proposed (Fahlman, 1988; Jutten et al., 1991a; Battiti, 1992; Charalambous, 1992; Karayiannis and Venetsanopoulos, 1992; Moeller, 1993; Sperduti and Starita, 1993). Some of them introduce a momentum term, others use alternative cost functions or dynamic adaptation of the learning parameters. Many

OPTIMIZATION ROUTINES

The energy function to be minimized is defined in the usual way as the squared difference between the desired and the actual responses of the output neurons, summed over all p training samples. Let us assume a multilayer feedforward NN with N input and M output neurons. The number of hidden layers, as well as the number of neurons in each layer, may be arbitrary. The supervised learning of this net is equivalent to the minimization of the energy function (the error function), which can be written as
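In the usual notation, with d_k^(i) the desired and y_k^(i) the actual response of the k-th output neuron for the i-th training sample and w the vector of all weights, a standard sum-of-squared-errors form consistent with this description is (a reconstruction, not necessarily the paper's exact notation):

    E(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{p} \sum_{k=1}^{M} \left( y_k^{(i)}(\mathbf{w}) - d_k^{(i)} \right)^2

In the usual second-order scheme, both routines then generate a search direction \mathbf{p}_t, from the BFGS approximation of the inverse Hessian in the variable metric case or from the conjugate gradient recurrences, and update the weights as \mathbf{w}_{t+1} = \mathbf{w}_t + \eta_t \mathbf{p}_t, with the step length \eta_t chosen by the directional (line) minimization mentioned in the abstract.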

GENERATION OF GRADIENT USING SIGNAL FLOW GRAPHS

Both methods (BFGS and conjugate gradient) use the gradient of the energy function as the basic quantity in the process of updating the weights. The gradient ∇E, defined as the vector of derivatives of the energy function with respect to the parameters (weights) of the system, can be easily generated for the multilayer neural network using the signal flow graph (SFG) technique and the notion of the adjoint signal flow graph (Osowski, 1993). It has been shown that the gradient
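For a layered network, evaluating the adjoint signal flow graph amounts to the familiar backward sweep of backpropagation. The following sketch illustrates that computation for a single tanh hidden layer; the architecture, activation and NumPy usage are assumptions for illustration, not the authors' SFG implementation. It returns both E and the gradient so that a BFGS or conjugate gradient routine can consume them.

import numpy as np

# Hypothetical one-hidden-layer network with tanh units; the parameter tuple is
# (W1, b1, W2, b2). X holds the p training input vectors row-wise, D the desired outputs.
def energy_and_gradient(params, X, D):
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)                 # hidden-layer responses
    y = np.tanh(h @ W2 + b2)                 # output-layer responses
    err = y - D
    E = 0.5 * np.sum(err ** 2)               # sum-of-squared-errors energy
    # Backward sweep: the layered-network equivalent of evaluating the adjoint graph.
    delta2 = err * (1.0 - y ** 2)            # local gradients at the outputs (tanh derivative)
    delta1 = (delta2 @ W2.T) * (1.0 - h ** 2)
    grad = (X.T @ delta1, delta1.sum(axis=0),    # dE/dW1, dE/db1
            h.T @ delta2, delta2.sum(axis=0))    # dE/dW2, dE/db2
    return E, grad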

THE RESULTS OF NUMERICAL TESTS

The efficiency of the program is compared with that of existing ones on the standard tests: the parity, dichotomy, logistic map and two-spiral problems. We compare the variable metric BFGS method (VM) with standard backpropagation based on the steepest descent algorithm (BP), the conjugate gradient method implemented in the standard way used in optimization theory (CG) (Gill et al., 1981), and the scaled conjugate gradient (SCG) presented by Moeller (1993). A comparison of the
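A comparison in this spirit can be reproduced, for illustration only, with off-the-shelf optimizers: the sketch below trains a small 2-2-1 tanh network on the 2-bit parity (XOR) problem using SciPy's generic BFGS and CG routines and reports the iteration counts. These are not the VM, BP, CG or SCG programs evaluated in the paper, and the architecture, random seed and stopping tolerances are assumptions.

import numpy as np
from scipy.optimize import minimize

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # 2-bit inputs
D = np.array([[0.], [1.], [1.], [0.]])                   # parity (XOR) targets

shapes = [(2, 2), (2,), (2, 1), (1,)]                    # W1, b1, W2, b2 of a 2-2-1 net
sizes = [int(np.prod(s)) for s in shapes]

def energy(w):
    # Unpack the flat weight vector and evaluate the sum-of-squared-errors energy.
    parts, i = [], 0
    for s, n in zip(shapes, sizes):
        parts.append(w[i:i + n].reshape(s))
        i += n
    W1, b1, W2, b2 = parts
    y = np.tanh(np.tanh(X @ W1 + b1) @ W2 + b2)
    return 0.5 * np.sum((y - D) ** 2)

w0 = np.random.default_rng(0).normal(scale=0.5, size=sum(sizes))
for method in ("BFGS", "CG"):
    result = minimize(energy, w0, method=method)         # gradient estimated numerically here;
    print(method, result.nit, result.fun)                # an analytic one (see previous sketch) could be passed via jac=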

APPLICATIONS

Although the results of the standard tests have confirmed the good performance of the developed learning methods, they demonstrate the good properties of the program only for networks with a small number of weights. The following investigations have therefore been carried out for high-dimensionality neural networks, with the number of weights exceeding a few thousand. These investigations concern the application of the neural network to the deconvolution of signals, the separation of sources and the identification

CONCLUSIONS

A neural network learning algorithm superior to conventional backpropagation has been presented in this paper. The algorithm combines the variable metric method and the conjugate gradient direction of search with a suitable line search. The most important advantages of the program are: a practically unlimited number of variables (under the UNIX system), high efficiency, a small number of iterations and a short learning time, and an easy method of generating the network structure and training data

Acknowledgements

This work was partly supported by FRP RIKEN, Tokyo, Japan.

References (18)

  • Denoeux, T., et al. (1993). Initializing back propagation with prototypes. Neural Networks.
  • Sperduti, A., & Starita, A. (1993). Speed up learning and network optimization with extended backpropagation. Neural Networks.
  • Achenie, L.E. (1994). Computational experience with a quasi Newton method based training of the feedforward neural...
  • Battiti, R. (1992). First and second order methods for learning: between steepest descent and Newton methods. Neural Computation.
  • Battiti, R., & Masulli, F. (1990). BFGS optimization for faster and automated supervised learning. Proc. IEEE Int. NN...
  • Charalambous, C. (1992). Conjugate gradient algorithm for efficient training of artificial neural networks. IEE Proceedings G.
  • Fahlman, S. (1988). Fast learning variations on backpropagation: an empirical study. Proc. 1988 Connectionist Models,...
  • Gill, P., Murray, W., & Wright, M. (1981). Practical optimization. New York: Academic...
  • Jutten, C., Nguyen Thi, L., Dijkstra, E., Vittoz, E., & Caelen, J. (1991a). Blind separation of sources: algorithm for...

Cited by (52)

  • Evolving multi-dimensional wavelet neural networks for classification using Cartesian Genetic Programming

    2017, Neurocomputing
    Citation Excerpt:

    The training of this relatively complex architecture took 10,000 - 20,000 epochs. In [73] a 2-50-1 MLP was trained by employing a second-order Newton optimization method where training took only 650 epochs. Hence, WNNs with powerful activation functions have the ability to approximate functions with a minimum number of wavelons efficiently.

  • Online gradient method with smoothing ℓ₀ regularization for feedforward neural networks

    2017, Neurocomputing
    Citation Excerpt:

    Multilayer feedforward neural networks (FNNs) has been widely used in various fields [1,2]. The training of FNNs can be reduced to solving nonlinear least square problems, to which numerous traditional numerical methods, such as the gradient descent method, Newton method [3], conjugate gradient method [4], extended Kalman filtering [5], Levenberg-Marquardt method [6], etc., can be applied. Among those training methods, backpropagation algorithm, which is derived based on the gradient descent rule, has become one of the most popular training strategy for its simplicity and ease of implementation [7].

  • Improving the accuracy of prediction of PM₁₀ pollution by the wavelet transformation and an ensemble of neural predictors

    2012, Engineering Applications of Artificial Intelligence
    Citation Excerpt:

    In the case of MLP the optimal number of hidden neurons was found by using trial and error approach. It means learning many different structures of MLP networks (Osowski et al., 1996) and accepting this one which provides the least value of the error on the validation data extracted from the learning data set (10% of learning data). On the basis of these experiments, we have found the optimal MLP structure consisting of one hidden layers of 8 sigmoidal neurons (the structure 8-8-1).

  • Local coupled feedforward neural network

    2010, Neural Networks
    Citation Excerpt:

    Very slow convergence rate and the need for predetermined learning parameters limit the practical use of this algorithm. Many improved learning algorithms have been reported in literature (see e.g. Chen, Cowan, Billings, & Grant, 1990; Ergezinger & Thomsen, 1995; Evans & Zainuddin, 1997; Fu, Hsu, & Principe, 1996; Gorse, Romano-Critchley, & Taylor, 1997; Liu, Feng, & Zhang, 2009; Man, Wu, Liu, & Yu, 2006; Navia-Vázquez & Figueiras-Vidal, 2000; Ng, Cheung, & Leung, 2004; Nishiyama & Suzuki, 2001; Ooyen & Neinhuis, 1992; O’Reilly, 1996; Osowski, Bojarczak, & Stodolski, 1996; Plagianakos, Magoulas, & Vrahatis, 2002; RoyChowdhury, Singh, & Chansarkar, 1999; Rubanov, 2000; Salomon & Hemmen, 1996; Sang-Hoon & Soo-Young, 1999; Siu, Yang, Lee, & Ho, 2007; Solomon & Hemmen, 1996; Unnikrishnan & Venugopal, 1994; Wang & Lin, 1998; Yu & Chen, 1997; Zweiri, 2007). These improvements achieve better convergence rates; and for many purposes, they perform sufficiently.
