Abstract
Model selection uncertainty in longitudinal data analysis is often much more serious than in simpler regression settings. When that uncertainty is high, conclusions drawn from a single selected model are of questionable validity. We advocate the use of appropriate model selection diagnostics to formally assess the degree of uncertainty in variable/model selection as well as in estimating a quantity of interest. We propose a model combining method and examine its theoretical properties. Simulations and real data examples demonstrate its advantage over popular model selection methods.
About this article
Cite this article
Liu, S., Yang, Y. Combining models in longitudinal data analysis. Ann Inst Stat Math 64, 233–254 (2012). https://doi.org/10.1007/s10463-010-0306-5
DOI: https://doi.org/10.1007/s10463-010-0306-5