Accelerating the convergence of the back-propagation method

Abstract

The utility of the back-propagation method in establishing suitable weights in a distributed adaptive network has been demonstrated repeatedly. Unfortunately, in many applications, the number of iterations required before convergence can be large. Modifications to the back-propagation algorithm described by Rumelhart et al. (1986) can greatly accelerate convergence. The modifications consist of three changes: (1) instead of updating the network weights after each pattern is presented, the network is updated only after the entire repertoire of patterns to be learned has been presented, at which time the algebraic sums of all the weight changes are applied; (2) instead of keeping η, the “learning rate” (i.e., the multiplier on the step size), constant, it is varied dynamically so that the algorithm utilizes a near-optimum η, as determined by the local optimization topography; and (3) the momentum factor α is set to zero when, as signified by a failure of a step to reduce the total error, the information inherent in prior steps is more likely to be misleading than beneficial. Only after the network takes a useful step, i.e., one that reduces the total error, does α again assume a non-zero value. Considering the selection of weights in neural nets as a problem in classical nonlinear optimization theory, the rationale for algorithms seeking only those weights that produce the globally minimum error is reviewed and rejected.
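For concreteness, the sketch below shows one way the three modifications described above might be combined, using a small one-hidden-layer network trained on the XOR patterns. It is not the authors' code: the network size, the initial values of η and α, and the adaptation factors eta_up and eta_down are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch of batch updating, a dynamically varied learning rate,
# and momentum reset on failed steps, applied to XOR. All constants are
# illustrative assumptions, not values from Vogl et al. (1988).
import numpy as np

rng = np.random.default_rng(0)

# XOR problem: the entire "repertoire" of four patterns is presented each epoch.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weight matrices with biases folded in as an extra constant input of 1.
W1 = rng.normal(scale=0.5, size=(3, 4))   # input + bias -> 4 hidden units
W2 = rng.normal(scale=0.5, size=(5, 1))   # hidden + bias -> 1 output unit

def total_error(W1, W2):
    """Total sum-squared error over the whole pattern set."""
    h = sigmoid(np.hstack([X, np.ones((4, 1))]) @ W1)
    y = sigmoid(np.hstack([h, np.ones((4, 1))]) @ W2)
    return 0.5 * np.sum((T - y) ** 2)

def gradients(W1, W2):
    """Back-propagation gradients summed over all patterns (modification 1)."""
    Xb = np.hstack([X, np.ones((4, 1))])
    h = sigmoid(Xb @ W1)
    hb = np.hstack([h, np.ones((4, 1))])
    y = sigmoid(hb @ W2)
    d2 = (y - T) * y * (1 - y)               # output-layer deltas
    d1 = (d2 @ W2[:-1].T) * h * (1 - h)      # hidden-layer deltas
    return Xb.T @ d1, hb.T @ d2

eta, alpha0 = 0.5, 0.9         # assumed initial learning rate and momentum
eta_up, eta_down = 1.05, 0.7   # assumed adaptation factors for eta
alpha = alpha0
dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
E = total_error(W1, W2)

for epoch in range(5000):
    g1, g2 = gradients(W1, W2)
    # Candidate batch step with momentum, applied once per epoch (modification 1).
    dW1 = -eta * g1 + alpha * dW1
    dW2 = -eta * g2 + alpha * dW2
    E_new = total_error(W1 + dW1, W2 + dW2)
    if E_new < E:
        # Useful step: accept it, grow eta, and restore the momentum factor.
        W1, W2, E = W1 + dW1, W2 + dW2, E_new
        eta *= eta_up
        alpha = alpha0
    else:
        # Failed step: reject it, shrink eta, and zero the momentum term,
        # since the history of prior steps is now likely misleading.
        eta *= eta_down
        alpha = 0.0
        dW1[:], dW2[:] = 0.0, 0.0

print("final total error:", total_error(W1, W2))
```

The key point is that the candidate batch step is evaluated against the total error before it is accepted; a failed step shrinks η and discards the momentum history until a useful step is taken again.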

References

  • Akaike H (1959) On a successive transformation of probability distribution and its application to the analysis of the optimum gradient method. Ann Inst Statist Math 11:1–17

  • Alkon DL (1983) Learning in a marine snail. Scientific American (July 1983), pp 70–84

  • Alkon DL (1984) Calcium-mediated reduction in ionic currents: a biophysical memory trace. Science 226:1037–1045

  • Alkon DL (1985) Conditioning-induced changes of Hermissenda channels: relevance to mammalian brain function. In: Weinberger NM, McGaugh JL, Lynch G (eds) Memory systems of the brain. The Guilford Press, New York

  • Levy AV, Gomez S (1985) The tunneling method applied to global optimization. In: Boggs PT, Byrd RH, Schnabel RB (eds) Numerical optimization 1984. SIAM, Philadelphia, pp 213–244

  • Luenberger DG (1984) Linear and nonlinear programming, 2nd ed. Addison-Wesley, Reading, Mass

  • Lundy M, Mees A (1986) Convergence of an annealing algorithm. Math Prog 34:111–124

  • Muneer T (1988) Comparison of optimization methods for nonlinear least squares minimization. Int J Math Educ Sci Tech 19:192–197

  • Pardalos PM, Rosen JB (1986) Methods for global concave minimization: a bibliographic survey. SIAM Rev 28:367–379

  • Parker DB (1987) Optimal algorithms for adaptive networks: second order back propagation, second order direct propagation, and second order Hebbian learning. In: Caudill M, Butler C (eds) Proceedings of the 1st International Conference on Neural Networks, San Diego, Calif., June 1987. IEEE Cat. #87TH0191-7, pp II-593-II-600

  • Pegis RJ, Grey DS, Vogl TP, Rigler AK (1966) The generalized orthonormal optimization program and its applications. In: Lavi A, Vogl TP (eds) Recent advances in optimization techniques, Wiley, New York, pp 47–60

  • Pineda FJ (1987) Generalization of back propagation to recurrent and higher order neural networks. Proceedings of the IEEE Conference on Neural Information Processing Systems, Denver, Colorado, 1987 (to be published)

  • Rinnooy-Kan AHG, Timmer GT (1985) A stochastic approach to global optimization. In: Boggs PT, Byrd RH, Schnabel RB (eds) Numerical optimization 1984. SIAM, Philadelphia, pp 245–262

  • Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL and the PDP Research Group (eds) Parallel distributed processing, vol 1, chap 8. MIT Press, Cambridge, Mass

  • Walster GW, Hansen ER, Sengupta S (1985) Test results for a global optimization algorithm. In: Boggs PT, Byrd RH, Schnabel RB (eds) Numerical optimization 1984. SIAM, Philadelphia, pp 272–287

  • Watson LT (1986) Numerical linear algebra aspects of globally convergent homotopy methods. SIAM Rev 28:529–545

  • Whitson GM (1988) An introduction to the parallel distributed processing model of cognition and some examples of how it is changing the teaching of artificial intelligence. In: Dreshem HL (ed) Proceedings of the 19th Annual Technical Symposium on Comp Sci Education. ACM, New York, pp 59–62

  • Whitson GM, Kulkarni A (1988) A testbed for sensory PDP models. Proceedings of the 16th Annual Comp Sci Conf. ACM, New York, pp 467–468

Cite this article

Vogl, T.P., Mangis, J.K., Rigler, A.K. et al. Accelerating the convergence of the back-propagation method. Biol. Cybern. 59, 257–263 (1988). https://doi.org/10.1007/BF00332914
