On the asymptotic distribution of Pearson’s X² in cross-validation samples

Joe, Harry; Maydeu-Olivares, Albert

doi:10.1007/s11336-005-1284-z

Published: 25 August 2006

Psychometrika, Volume 71, pages 587–592 (2006)

Harry Joe¹ &
Albert Maydeu-Olivares²

Abstract

In categorical data analysis, two-sample cross-validation is used not only for model selection but also to obtain a realistic impression of the overall predictive effectiveness of the model. The latter is of particular importance in the case of highly parametrized models capable of capturing every idiosyncrasy of the calibrating sample. We show that, for maximum likelihood estimators or other asymptotically efficient estimators, Pearson’s X² is not asymptotically chi-square in the two-sample cross-validation framework, because of the extra variability induced by using different samples for estimation and goodness-of-fit testing. We propose an alternative test statistic, X²_xval, obtained as a modification of X², which is asymptotically chi-square with C − 1 degrees of freedom in cross-validation samples. Stochastically, X²_xval ≤ X². Furthermore, the use of X² instead of X²_xval with a χ²_{C−1} reference distribution may provide an unduly poor impression of the fit of the model in the cross-validation sample.
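The inflation of X² described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the authors' procedure, and it does not implement X²_xval, whose formula is not given in the abstract. It assumes the simplest highly parametrized case, a saturated multinomial model fitted by the calibration-sample proportions, and the cell probabilities, sample sizes, replication count, and the function name `simulate_xval_pearson` are all hypothetical choices made for illustration. In this special case a standard argument suggests that X² behaves roughly like (1 + N_val/N_cal) times a χ²_{C−1} variable, so its mean should sit well above C − 1.

```python
import numpy as np


def simulate_xval_pearson(probs, n_cal, n_val, reps, seed=0):
    """Simulate Pearson's X^2 in the validation sample when the cell
    probabilities are estimated from an independent calibration sample.

    Assumption (illustrative only): a saturated multinomial model, so the
    fitted probabilities are simply the calibration-sample proportions.
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    # Independent calibration and validation samples, one row per replication.
    cal = rng.multinomial(n_cal, probs, size=reps)
    val = rng.multinomial(n_val, probs, size=reps)
    # Expected validation counts under the calibration-sample estimates.
    # With the sample sizes used below, empty calibration cells (which would
    # cause a division by zero) are practically impossible.
    expected = n_val * (cal / n_cal)
    return ((val - expected) ** 2 / expected).sum(axis=1)


# Hypothetical settings: C = 5 cells, equal calibration and validation sizes.
probs = [0.1, 0.2, 0.3, 0.25, 0.15]
x2 = simulate_xval_pearson(probs, n_cal=2000, n_val=2000, reps=2000)

df = len(probs) - 1   # C - 1 degrees of freedom
crit_95 = 9.488       # 0.95 quantile of the chi-square distribution with 4 df
mean_x2 = x2.mean()
reject_rate = (x2 > crit_95).mean()

print(f"mean X^2 = {mean_x2:.2f} (chi-square reference mean = {df})")
print(f"rejection rate at nominal 5% = {reject_rate:.3f}")
```

With equal sample sizes, the simulated mean should be close to twice the reference mean, and the nominal 5% test should reject a correctly specified model far more often than 5% of the time, which is the "unduly poor impression of fit" the abstract warns about.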

Author information

Authors and Affiliations

University of British Columbia, Columbia
Harry Joe
University of Barcelona and Instituto de Empresa, Barcelona
Albert Maydeu-Olivares

Additional information

This paper is dedicated to the memory of Michael V. Levine.

Requests for reprints should be sent to Albert Maydeu-Olivares, Faculty of Psychology, University of Barcelona, P. Valle de Hebrón, 171, 0835 Barcelona, Spain.

About this article

Cite this article

Joe, H., Maydeu-Olivares, A. On the asymptotic distribution of Pearson’s X ² in cross-validation samples. Psychometrika 71, 587–592 (2006). https://doi.org/10.1007/s11336-005-1284-z

Received: 20 January 2005
Accepted: 30 May 2005
Published: 25 August 2006
Issue Date: September 2006
DOI: https://doi.org/10.1007/s11336-005-1284-z
