Bias in Estimating the Variance of K-Fold Cross-Validation

Abstract

Most machine learning researchers perform quantitative experiments to estimate generalization error and compare the performance of different algorithms (in particular, their proposed algorithm). In order to be able to draw statistically convincing conclusions, it is important to estimate the uncertainty of such estimates. This paper studies the very commonly used K-fold cross-validation estimator of generalization performance. The main theorem shows that there exists no universal (valid under all distributions) unbiased estimator of the variance of K-fold cross-validation, based on a single computation of the K-fold cross-validation estimator. The analysis that accompanies this result is based on the eigen-decomposition of the covariance matrix of errors, which has only three distinct eigenvalues, corresponding to three degrees of freedom of the matrix and three components of the total variance. This analysis helps to better understand the nature of the problem and how it can make naive estimators (those that do not take into account the error correlations due to the overlap between training sets) grossly underestimate variance. This is confirmed by numerical experiments in which the three components of the variance are compared as the difficulty of the learning problem and the number of folds are varied.
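To make the "three components" concrete: as derived in the paper's analysis, with n examples split into K test blocks of size m = n/K, the covariance matrix of the per-example test errors $e_i$ takes only three distinct values, and the variance of the cross-validation estimate $\hat{\mu} = \frac{1}{n}\sum_{i=1}^n e_i$ decomposes accordingly:

$$
\operatorname{Cov}(e_i, e_j) =
\begin{cases}
\sigma^2 & \text{if } i = j,\\
\omega & \text{if } i \neq j \text{ lie in the same test block},\\
\gamma & \text{if } i, j \text{ lie in different test blocks},
\end{cases}
\qquad
\operatorname{Var}(\hat{\mu}) = \frac{1}{n}\,\sigma^2 + \frac{m-1}{n}\,\omega + \frac{n-m}{n}\,\gamma.
$$

A naive variance estimator keeps only the $\sigma^2/n$ term. As a quick empirical check, here is a minimal Monte Carlo sketch in Python with NumPy; the training-mean predictor, sample size, and fold count are illustrative choices of ours, not the paper's experimental setup. It compares the naive estimate against a ground truth obtained by rerunning the whole K-fold procedure on many fresh datasets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices, not taken from the paper.
n, K = 120, 6            # total sample size, number of folds
m = n // K               # examples per test block
n_repeats = 2000         # fresh datasets for the Monte Carlo ground truth

def kfold_cv_errors(y):
    """Per-example squared errors from K-fold CV of a training-mean predictor."""
    errors = np.empty(n)
    folds = np.arange(n).reshape(K, m)  # contiguous blocks; data is i.i.d.
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        y_hat = y[train_mask].mean()    # toy "learning algorithm"
        errors[test_idx] = (y[test_idx] - y_hat) ** 2
    return errors

cv_estimates = np.empty(n_repeats)
naive_var = np.empty(n_repeats)
for r in range(n_repeats):
    y = rng.normal(size=n)              # i.i.d. Gaussian targets
    e = kfold_cv_errors(y)
    cv_estimates[r] = e.mean()
    # Naive variance estimate: treats the n errors as independent,
    # i.e. drops the omega and gamma covariance terms entirely.
    naive_var[r] = e.var(ddof=1) / n

print(f"Monte Carlo Var(CV): {cv_estimates.var(ddof=1):.3e}")
print(f"mean naive estimate: {naive_var.mean():.3e}")
```

In this deliberately stable toy setting the gap may be modest; the paper's experiments indicate that the neglected ω and γ terms, and hence the naive estimator's underestimation, grow with the difficulty of the learning problem and the number of folds.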




Copyright information

© 2005 Springer Science+Business Media, Inc.

Cite this chapter

Bengio, Y., Grandvalet, Y. (2005). Bias in Estimating the Variance of K-Fold Cross-Validation. In: Duchesne, P., Rémillard, B. (eds) Statistical Modeling and Analysis for Complex Data Problems. Springer, Boston, MA. https://doi.org/10.1007/0-387-24555-3_5
