Abstract
In categorical data analysis, two-sample cross-validation is used not only for model selection but also to obtain a realistic impression of a model's overall predictive effectiveness. The latter is particularly important for highly parametrized models that can capture every idiosyncrasy of the calibration sample. We show that, for maximum likelihood estimators and other asymptotically efficient estimators, Pearson's $X^2$ is not asymptotically chi-square in the two-sample cross-validation framework, because using different samples for estimation and for goodness-of-fit testing induces extra variability. We propose an alternative test statistic, $X^2_{\text{xval}}$, a modification of $X^2$ that is asymptotically chi-square with $C - 1$ degrees of freedom in cross-validation samples, where $C$ is the number of cells. Stochastically, $X^2_{\text{xval}} \le X^2$. Furthermore, using $X^2$ instead of $X^2_{\text{xval}}$ with a $\chi^2_{C-1}$ reference distribution may give an unduly poor impression of the model's fit in the cross-validation sample.
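To make the extra variability concrete, here is a minimal Python simulation sketch under stated assumptions: a saturated multinomial model (the most highly parametrized case), whose maximum likelihood estimates are the calibration-sample proportions, and an independent validation sample drawn from the same true cell probabilities. For this special case, independence of the two samples plus the usual multinomial central limit theorem gives $X^2 \to (1 + N_{\text{val}}/N_{\text{cal}})\,\chi^2_{C-1}$ rather than $\chi^2_{C-1}$. The helper names (`multinomial_counts`, `pearson_x2`, `cross_validated_x2`), the cell probabilities, and the sample sizes are illustrative assumptions. The sketch does not implement the authors' $X^2_{\text{xval}}$, whose definition is not reproduced here. It also assumes no calibration cell is empty, since a zero fitted probability would leave $X^2$ undefined.

```python
import random
from collections import Counter


def multinomial_counts(probs, n, rng):
    """Draw a multinomial sample of size n and return the count in each cell."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    tally = Counter(draws)
    return [tally[c] for c in range(len(probs))]


def pearson_x2(counts, probs):
    """Pearson's X^2 = sum_c (n_c - N*p_c)^2 / (N*p_c) for fitted cell probabilities p_c."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def cross_validated_x2(true_probs, n_cal, n_val, reps, seed=0):
    """Hypothetical helper: simulate X^2 on a validation sample when the cell
    probabilities are estimated on an independent calibration sample.
    Assumes a saturated model, so the MLE is the vector of sample proportions."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        cal = multinomial_counts(true_probs, n_cal, rng)
        # Saturated fit: reproduces the calibration sample exactly (X^2 = 0 there).
        fitted = [c / n_cal for c in cal]
        val = multinomial_counts(true_probs, n_val, rng)
        stats.append(pearson_x2(val, fitted))
    return stats


if __name__ == "__main__":
    pi = [0.10, 0.15, 0.20, 0.25, 0.30]  # assumed true probabilities, C = 5 cells, C - 1 = 4 df
    stats = cross_validated_x2(pi, n_cal=2000, n_val=2000, reps=500, seed=1)
    mean_x2 = sum(stats) / len(stats)
    crit_95 = 9.4877  # 95th percentile of the chi-square distribution with 4 df
    reject = sum(s > crit_95 for s in stats) / len(stats)
    print(f"mean X^2 = {mean_x2:.2f} (the chi-square_4 mean is 4)")
    print(f"rejection rate at nominal 5% level = {reject:.3f}")
```

With equal calibration and validation sizes the heuristic inflation factor is 2, so the simulated mean should sit near 8 rather than 4, and far more than 5% of replications should exceed the $\chi^2_4$ critical value even though the model is correct. This illustrates the unduly poor impression of fit described above; it is not a check of the proposed $X^2_{\text{xval}}$.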
Additional information
This paper is dedicated to the memory of Michael V. Levine.
Requests for reprints should be sent to Albert Maydeu-Olivares, Faculty of Psychology, University of Barcelona, P. Valle de Hebrón, 171, 08035 Barcelona, Spain.
About this article
Cite this article
Joe, H., Maydeu-Olivares, A. On the asymptotic distribution of Pearson's $X^2$ in cross-validation samples. Psychometrika 71, 587–592 (2006). https://doi.org/10.1007/s11336-005-1284-z
DOI: https://doi.org/10.1007/s11336-005-1284-z