
Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7700)

Abstract

Validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting (“early stopping”). The exact criterion used for validation-based early stopping, however, is usually chosen in an ad-hoc fashion, or training is stopped interactively. This trick describes how to select a stopping criterion in a systematic fashion; it is a trick for either speeding learning procedures or improving generalization, whichever is more important in the particular situation. An empirical investigation on multi-layer perceptrons shows that there exists a tradeoff between training time and generalization: from the given mix of 1296 training runs using 12 different problems and 24 different network architectures, I conclude that slower stopping criteria allow for small improvements in generalization (here: about 4% on average), but cost much more training time (here: about a factor of 4 longer on average).
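The abstract does not spell out the stopping criteria it compares, but the mechanism it relies on (monitor the error on a held-out validation set and stop once that error clearly deteriorates) can be sketched in code. The snippet below is a minimal illustration only, assuming a simple threshold criterion on the relative increase of the validation error over the best value seen so far; the names train_epoch, eval_validation, and the threshold alpha are hypothetical placeholders, not definitions taken from the chapter.

```python
# Illustrative sketch of validation-based early stopping (not necessarily the
# chapter's exact criteria): stop once the relative increase of the validation
# error over the best value seen so far exceeds a threshold alpha. A larger
# alpha means a "slower" criterion: more training time, possibly slightly
# better generalization.

def generalization_loss(val_errors):
    """Relative increase (in percent) of the latest validation error
    over the minimum validation error observed so far."""
    best = min(val_errors)
    return 100.0 * (val_errors[-1] / best - 1.0)

def train_with_early_stopping(train_epoch, eval_validation,
                              max_epochs=1000, alpha=5.0):
    """Generic training loop: train_epoch() performs one epoch of training,
    eval_validation() returns the current validation error.
    Both callables are placeholders for the user's own training code."""
    val_errors = []
    best_error = float("inf")
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        train_epoch()
        err = eval_validation()
        val_errors.append(err)
        if err < best_error:
            best_error = err
            best_epoch = epoch
            # in practice: save a copy of the current weights here
        if generalization_loss(val_errors) > alpha:
            break  # validation error has deteriorated; overfitting presumed
    return best_epoch, best_error

if __name__ == "__main__":
    # Toy stand-ins: validation error falls, then rises (overfitting).
    errors = iter([0.9, 0.7, 0.55, 0.5, 0.49, 0.5, 0.53, 0.6, 0.7, 0.8])
    epoch, err = train_with_early_stopping(lambda: None, lambda: next(errors),
                                           max_epochs=10)
    print(f"stopped; best validation error {err:.2f} at epoch {epoch}")
```

Raising alpha corresponds to a "slower" stopping criterion in the sense of the abstract: training runs longer, and generalization may improve slightly.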

Previously published in: Orr, G.B. and Müller, K.-R. (Eds.): LNCS 1524, ISBN 978-3-540-65311-0 (1998).



References

  1. Amari, S., Murata, N., Müller, K.-R., Finke, M., Yang, H.: Statistical theory of overtraining - is cross-validation effective? In: [23], pp. 176–182 (1996)

  2. Amari, S., Murata, N., Müller, K.-R., Finke, M., Yang, H.: Asymptotic statistical theory of overtraining and cross-validation. IEEE Trans. on Neural Networks 8(5), 985–996 (1997)

  3. Baldi, P., Chauvin, Y.: Temporal evolution of generalization during learning in linear networks. Neural Computation 3, 589–603 (1991)

  4. Cowan, J.D., Tesauro, G., Alspector, J. (eds.): Advances in Neural Information Processing Systems 6. Morgan Kaufmann Publishers Inc., San Mateo (1994)

  5. Le Cun, Y., Denker, J.S., Solla, S.A.: Optimal brain damage. In: [22], pp. 598–605 (1990)

  6. Fahlman, S.E.: An empirical study of learning speed in back-propagation networks. Technical Report CMU-CS-88-162, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA (September 1988)

  7. Fahlman, S.E., Lebiere, C.: The Cascade-Correlation learning architecture. In: [22], pp. 524–532 (1990)

  8. Fiesler, E.: Comparative bibliography of ontogenic neural networks (1994) (submitted for publication)

  9. Finnoff, W., Hergert, F., Zimmermann, H.G.: Improving model selection by nonconvergent methods. Neural Networks 6, 771–783 (1993)

  10. Geman, S., Bienenstock, E., Doursat, R.: Neural networks and the bias/variance dilemma. Neural Computation 4, 1–58 (1992)

  11. Hanson, S.J., Cowan, J.D., Giles, C.L. (eds.): Advances in Neural Information Processing Systems 5. Morgan Kaufmann Publishers Inc., San Mateo (1993)

  12. Hassibi, B., Stork, D.G.: Second order derivatives for network pruning: Optimal brain surgeon. In: [11], pp. 164–171 (1993)

  13. Krogh, A., Hertz, J.A.: A simple weight decay can improve generalization. In: [16], pp. 950–957 (1992)

  14. Levin, A.U., Leen, T.K., Moody, J.E.: Fast pruning using principal components. In: [4] (1994)

  15. Lippmann, R.P., Moody, J.E., Touretzky, D.S. (eds.): Advances in Neural Information Processing Systems 3. Morgan Kaufmann Publishers Inc., San Mateo (1991)

  16. Moody, J.E., Hanson, S.J., Lippmann, R.P. (eds.): Advances in Neural Information Processing Systems 4. Morgan Kaufmann Publishers Inc., San Mateo (1992)

  17. Morgan, N., Bourlard, H.: Generalization and parameter estimation in feedforward nets: Some experiments. In: [22], pp. 630–637 (1990)

  18. Nowlan, S.J., Hinton, G.E.: Simplifying neural networks by soft weight-sharing. Neural Computation 4(4), 473–493 (1992)

  19. Prechelt, L.: PROBEN1 — A set of benchmarks and benchmarking rules for neural network training algorithms. Technical Report 21/94, Fakultät für Informatik, Universität Karlsruhe, Germany. Anonymous FTP: /pub/papers/techreports/1994/1994-21.ps.gz on ftp.ira.uka.de (September 1994)

  20. Reed, R.: Pruning algorithms — a survey. IEEE Transactions on Neural Networks 4(5), 740–746 (1993)

  21. Riedmiller, M., Braun, H.: A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In: Proc. of the IEEE Intl. Conf. on Neural Networks, San Francisco, CA, pp. 586–591 (April 1993)

  22. Touretzky, D.S. (ed.): Advances in Neural Information Processing Systems 2. Morgan Kaufmann Publishers Inc., San Mateo (1990)

  23. Touretzky, D.S., Mozer, M.C., Hasselmo, M.E. (eds.): Advances in Neural Information Processing Systems 8. MIT Press, Cambridge (1996)

  24. Wang, C., Venkatesh, S.S., Judd, J.S.: Optimal stopping and effective machine complexity in learning. In: [4] (1994)

  25. Weigend, A.S., Rumelhart, D.E., Huberman, B.A.: Generalization by weight-elimination with application to forecasting. In: [15], pp. 875–882 (1991)




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Prechelt, L. (2012). Early Stopping — But When? In: Montavon, G., Orr, G.B., Müller, K.-R. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 7700. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35289-8_5


  • DOI: https://doi.org/10.1007/978-3-642-35289-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35288-1

  • Online ISBN: 978-3-642-35289-8

  • eBook Packages: Computer Science (R0)
