Abstract
A new neural network training algorithm is described which optimises performance with respect to the available memory. Numerically, it is equivalent to full-memory BFGS optimisation (FM) when memory is unrestricted, and to FM with periodic reset when memory is limited. Achievable performance is determined by the ratio of available memory to problem size, and accordingly varies between that of the full-memory and memory-less versions of the BFGS algorithm.
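The paper's variable-memory algorithm itself is not reproduced on this page. As background for how limited-memory quasi-Newton methods trade stored curvature pairs against approximation quality, the sketch below implements the standard L-BFGS two-loop recursion (Nocedal, 1980): the search direction is computed from the gradient and the m most recent pairs (s_i, y_i) without ever forming the inverse Hessian. The function name and NumPy implementation are illustrative choices, not the authors' code.

```python
import numpy as np

def two_loop_direction(grad, s_list, y_list):
    """Search direction -H*grad via the L-BFGS two-loop recursion.

    H is the implicit inverse-Hessian approximation built from the
    stored curvature pairs s_i = x_{i+1} - x_i (steps) and
    y_i = g_{i+1} - g_i (gradient differences), ordered oldest first.
    With no stored pairs this reduces to steepest descent.
    """
    q = np.array(grad, dtype=float)
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: walk the pairs from newest to oldest.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        alphas.append(alpha)
        q -= alpha * y
    # Initial scaling H0 = gamma * I (standard heuristic from the
    # most recent pair); gamma = 1 when memory is empty.
    if s_list:
        gamma = np.dot(s_list[-1], y_list[-1]) / np.dot(y_list[-1], y_list[-1])
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: walk the pairs from oldest to newest.
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return -r
```

On a quadratic error surface, supplying pairs that span the parameter space recovers the exact Newton direction, while truncating the pair history degrades the direction gracefully toward steepest descent, which is the memory/performance trade-off the abstract describes.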
McLoone, S., Irwin, G. A Variable Memory Quasi-Newton Training Algorithm. Neural Processing Letters 9, 77–89 (1999). https://doi.org/10.1023/A:1018676013128