Pitfalls of post-model-selection testing: experimental quantification

Empirical Economics

Abstract

Traditional specification testing does not always improve subsequent inference. Using computer experiments, we show under which circumstances, and how severely, data-driven model selection can destroy the size properties of subsequent parameter tests when these are applied without adjusting for the model-selection step. The investigated models are representative of macroeconometric and microeconometric workhorses. The model selection procedures include information criteria as well as sequences of significance tests (“general-to-specific”). We find that size distortions can be particularly large when competing models are close, with closeness defined relative to the sample size.
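The kind of experiment described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' design: the two-regressor linear model, the t-test pretest on the second regressor, the correlation `rho`, the coefficient `gamma`, and the function names are all assumptions chosen for illustration. It estimates how often a naive 5% t-test rejects a true null hypothesis on the first coefficient after the pretest has decided whether to keep the second regressor.

```python
import numpy as np


def ols_tstats(X, y):
    """Return OLS t-statistics for each column of X (no intercept)."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - k)          # unbiased error variance estimate
    se = np.sqrt(s2 * np.diag(xtx_inv))   # conventional standard errors
    return b / se


def posttest_rejection_rate(n=100, gamma=0.2, rho=0.7, reps=2000,
                            crit=1.96, seed=0):
    """Empirical size of a naive t-test of beta = 0 after a pretest on gamma.

    Hypothetical design: y = beta*x1 + gamma*x2 + e with beta = 0 (the null
    is true), corr(x1, x2) = rho, and standard normal errors.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x1 = rng.standard_normal(n)
        x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        y = gamma * x2 + rng.standard_normal(n)   # beta = 0 by construction

        # Model-selection step: keep x2 only if its t-statistic is significant.
        t_full = ols_tstats(np.column_stack([x1, x2]), y)
        if abs(t_full[1]) > crit:
            t_beta = t_full[0]                    # test beta in the full model
        else:
            t_beta = ols_tstats(x1[:, None], y)[0]  # test beta without x2

        # Naive test: same critical value, no adjustment for the selection.
        rejections += int(abs(t_beta) > crit)
    return rejections / reps


if __name__ == "__main__":
    print("gamma = 0.0:", posttest_rejection_rate(gamma=0.0))
    print("gamma = 0.2:", posttest_rejection_rate(gamma=0.2))
```

With `gamma = 0.0` the smaller model is correct and the rejection rate should stay near the nominal level. With `gamma = 0.2` and `n = 100` the two models are hard to tell apart, the pretest often drops a relevant regressor, and the rejection rate should typically land well above 5%. Varying `n` while holding `gamma` fixed shows why closeness has to be judged relative to the sample size.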


Author information

Corresponding author

Correspondence to Uwe Hassler.

Additional information

An earlier version was presented at the first meeting of the European Time-Series Econometrics Research Network (ETSERN) in Frankfurt, June 17, 2008, and at the Econometrics Workshops at the University of Konstanz and at the University of California, Los Angeles.

About this article

Cite this article

Demetrescu, M., Hassler, U. & Kuzin, V. Pitfalls of post-model-selection testing: experimental quantification. Empir Econ 40, 359–372 (2011). https://doi.org/10.1007/s00181-009-0334-2

  • DOI: https://doi.org/10.1007/s00181-009-0334-2
