
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7700)

Abstract

Validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting (“early stopping”). The exact criterion used for validation-based early stopping, however, is usually chosen in an ad-hoc fashion, or training is stopped interactively. This trick describes how to select a stopping criterion in a systematic fashion; it is a trick for either speeding learning procedures or improving generalization, whichever is more important in the particular situation. An empirical investigation on multi-layer perceptrons shows that there exists a tradeoff between training time and generalization: from the given mix of 1296 training runs using 12 different problems and 24 different network architectures, I conclude that slower stopping criteria allow for small improvements in generalization (here: about 4% on average), but cost much more training time (here: about a factor of 4 longer on average).
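The abstract does not spell out the stopping criteria it compares, but the mechanism it relies on (monitor the error on a held-out validation set and stop once that error clearly deteriorates) can be sketched in code. The snippet below is a minimal illustration only, assuming a simple threshold criterion on the relative increase of the validation error over the best value seen so far; the names train_epoch, eval_validation, and the threshold alpha are hypothetical placeholders, not definitions taken from the chapter.

```python
# Illustrative sketch of validation-based early stopping (not necessarily the
# chapter's exact criteria): stop once the relative increase of the validation
# error over the best value seen so far exceeds a threshold alpha. A larger
# alpha means a "slower" criterion: more training time, possibly slightly
# better generalization.

def generalization_loss(val_errors):
    """Relative increase (in percent) of the latest validation error
    over the minimum validation error observed so far."""
    best = min(val_errors)
    return 100.0 * (val_errors[-1] / best - 1.0)

def train_with_early_stopping(train_epoch, eval_validation,
                              max_epochs=1000, alpha=5.0):
    """Generic training loop: train_epoch() performs one epoch of training,
    eval_validation() returns the current validation error.
    Both callables are placeholders for the user's own training code."""
    val_errors = []
    best_error = float("inf")
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        train_epoch()
        err = eval_validation()
        val_errors.append(err)
        if err < best_error:
            best_error = err
            best_epoch = epoch
            # in practice: save a copy of the current weights here
        if generalization_loss(val_errors) > alpha:
            break  # validation error has deteriorated; overfitting presumed
    return best_epoch, best_error

if __name__ == "__main__":
    # Toy stand-ins: validation error falls, then rises (overfitting).
    errors = iter([0.9, 0.7, 0.55, 0.5, 0.49, 0.5, 0.53, 0.6, 0.7, 0.8])
    epoch, err = train_with_early_stopping(lambda: None, lambda: next(errors),
                                           max_epochs=10)
    print(f"stopped; best validation error {err:.2f} at epoch {epoch}")
```

Raising alpha corresponds to a "slower" stopping criterion in the sense of the abstract: training runs longer, and generalization may improve slightly.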

Previously published in: Orr, G.B. and Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998).



References

  1. Amari, S., Murata, N., Müller, K.-R., Finke, M., Yang, H.: Statistical theory of overtraining - is cross-validation effective? In: [23], pp. 176–182 (1996)

  2. Amari, S., Murata, N., Müller, K.-R., Finke, M., Yang, H.: Asymptotic statistical theory of overtraining and cross-validation. IEEE Trans. on Neural Networks 8(5), 985–996 (1997)

  3. Baldi, P., Chauvin, Y.: Temporal evolution of generalization during learning in linear networks. Neural Computation 3, 589–603 (1991)

  4. Cowan, J.D., Tesauro, G., Alspector, J. (eds.): Advances in Neural Information Processing Systems 6. Morgan Kaufmann Publishers Inc., San Mateo (1994)

  5. Le Cun, Y., Denker, J.S., Solla, S.A.: Optimal brain damage. In: [22], pp. 598–605 (1990)

  6. Fahlman, S.E.: An empirical study of learning speed in back-propagation networks. Technical Report CMU-CS-88-162, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA (September 1988)

  7. Fahlman, S.E., Lebiere, C.: The Cascade-Correlation learning architecture. In: [22], pp. 524–532 (1990)

  8. Fiesler, E.: Comparative bibliography of ontogenic neural networks (1994) (submitted for publication)

  9. Finnoff, W., Hergert, F., Zimmermann, H.G.: Improving model selection by nonconvergent methods. Neural Networks 6, 771–783 (1993)

  10. Geman, S., Bienenstock, E., Doursat, R.: Neural networks and the bias/variance dilemma. Neural Computation 4, 1–58 (1992)

  11. Hanson, S.J., Cowan, J.D., Giles, C.L. (eds.): Advances in Neural Information Processing Systems 5. Morgan Kaufmann Publishers Inc., San Mateo (1993)

  12. Hassibi, B., Stork, D.G.: Second order derivatives for network pruning: Optimal brain surgeon. In: [11], pp. 164–171 (1993)

  13. Krogh, A., Hertz, J.A.: A simple weight decay can improve generalization. In: [16], pp. 950–957 (1992)

  14. Levin, A.U., Leen, T.K., Moody, J.E.: Fast pruning using principal components. In: [4] (1994)

  15. Lippmann, R.P., Moody, J.E., Touretzky, D.S. (eds.): Advances in Neural Information Processing Systems 3. Morgan Kaufmann Publishers Inc., San Mateo (1991)

  16. Moody, J.E., Hanson, S.J., Lippmann, R.P. (eds.): Advances in Neural Information Processing Systems 4. Morgan Kaufmann Publishers Inc., San Mateo (1992)

  17. Morgan, N., Bourlard, H.: Generalization and parameter estimation in feedforward nets: Some experiments. In: [22], pp. 630–637 (1990)

  18. Nowlan, S.J., Hinton, G.E.: Simplifying neural networks by soft weight-sharing. Neural Computation 4(4), 473–493 (1992)

  19. Prechelt, L.: PROBEN1 — A set of benchmarks and benchmarking rules for neural network training algorithms. Technical Report 21/94, Fakultät für Informatik, Universität Karlsruhe, Germany. Anonymous FTP: /pub/papers/techreports/1994/1994-21.ps.gz on ftp.ira.uka.de (September 1994)

  20. Reed, R.: Pruning algorithms — a survey. IEEE Transactions on Neural Networks 4(5), 740–746 (1993)

  21. Riedmiller, M., Braun, H.: A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In: Proc. of the IEEE Intl. Conf. on Neural Networks, San Francisco, CA, pp. 586–591 (April 1993)

  22. Touretzky, D.S. (ed.): Advances in Neural Information Processing Systems 2. Morgan Kaufmann Publishers Inc., San Mateo (1990)

  23. Touretzky, D.S., Mozer, M.C., Hasselmo, M.E. (eds.): Advances in Neural Information Processing Systems 8. MIT Press, Cambridge (1996)

  24. Wang, C., Venkatesh, S.S., Judd, J.S.: Optimal stopping and effective machine complexity in learning. In: [4] (1994)

  25. Weigend, A.S., Rumelhart, D.E., Huberman, B.A.: Generalization by weight-elimination with application to forecasting. In: [15], pp. 875–882 (1991)




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Prechelt, L. (2012). Early Stopping — But When? In: Montavon, G., Orr, G.B., Müller, K.-R. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_5


  • DOI: https://doi.org/10.1007/978-3-642-35289-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35288-1

  • Online ISBN: 978-3-642-35289-8

  • eBook Packages: Computer Science (R0)
