Abstract
Supervised neural networks generalize well if there is much less information in the weights than there is in the output vectors of the training cases. So during learning, it is important to keep the weights simple by penalizing the amount of information they contain. The amount of information in a weight can be controlled by adding Gaussian noise and the noise level can be adapted during learning to optimize the trade-off between the expected squared error and the information in the weights. We describe a method of computing the derivatives of the expected squared error and of the amount of information in the noisy weights in a network that contains a layer of non-linear hidden units. Provided the output units are linear, the exact derivatives can be computed efficiently without time-consuming Monte Carlo simulations.
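The trade-off described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: each weight has a Gaussian distribution with mean `mu` and noise level `sigma_q`, the information in a weight is measured as the Kullback-Leibler divergence from a zero-mean Gaussian prior with standard deviation `sigma_p`, and the network is reduced to a single linear output unit with no hidden layer. The function names are hypothetical and chosen only for illustration.

```python
import math


def kl_gaussian_nats(mu, sigma_q, sigma_p):
    """Information (in nats) in one noisy weight.

    Assumption: measured as KL( N(mu, sigma_q^2) || N(0, sigma_p^2) ).
    """
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + mu ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)


def expected_squared_error(x, y, mus, sigmas):
    """Exact E[(y - sum_i w_i x_i)^2] for independent w_i ~ N(mu_i, sigma_i^2).

    Assumption: a single linear output unit fed directly by the inputs,
    so the output is Gaussian and no sampling is needed.
    """
    mean_out = sum(m * xi for m, xi in zip(mus, x))
    var_out = sum((s * xi) ** 2 for s, xi in zip(sigmas, x))
    return (y - mean_out) ** 2 + var_out


def total_cost(x, y, mus, sigmas, sigma_p):
    """Expected squared error plus the information in all weights."""
    info = sum(kl_gaussian_nats(m, s, sigma_p) for m, s in zip(mus, sigmas))
    return expected_squared_error(x, y, mus, sigmas) + info
```

In this sketch, raising a weight's noise level lowers its information cost but adds variance to the output, which raises the expected squared error. Both terms are closed-form and differentiable in `mu` and `sigma_q`, which is the property the abstract relies on to avoid Monte Carlo simulation.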
Copyright information
© 1993 Springer-Verlag London Limited
About this paper
Cite this paper
Hinton, G.E., van Camp, D. (1993). Keeping Neural Networks Simple. In: Gielen, S., Kappen, B. (eds) ICANN ’93. ICANN 1993. Springer, London. https://doi.org/10.1007/978-1-4471-2063-6_2
DOI: https://doi.org/10.1007/978-1-4471-2063-6_2
Publisher Name: Springer, London
Print ISBN: 978-3-540-19839-0
Online ISBN: 978-1-4471-2063-6