Missing Data

  • Living reference work entry
  • In: Principles and Practice of Clinical Trials

Abstract

Missing data are common in randomized clinical trials. When data are not missing completely at random, a complete-case analysis that ignores the missing data process often yields biased estimates of the average treatment effect. This chapter defines the different missing data mechanisms, discusses their impact on inference, and presents statistical methods that address missing data, including likelihood-based analysis, inverse probability weighting, and imputation. Each of these methods models either the missingness process or the observed outcome distribution. A more robust approach that combines the strengths of both modeling strategies is also introduced. This approach is doubly robust: it yields a consistent estimate of the average treatment effect if either the missingness model or the outcome model is correctly specified, without requiring both to be correct. The chapter concludes with a brief discussion of sensitivity analyses used to assess the impact of unmeasured factors that affect both missingness and outcomes. Throughout, statistical and practical considerations are discussed in the context of randomized clinical trials in which the primary analysis compares two treatments and estimates the average comparative effect in the enrolled population.
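To make the doubly robust idea concrete, the following is a minimal sketch of one common doubly robust estimator, the augmented inverse probability weighted (AIPW) mean, applied separately in each arm of a simulated two-arm trial. It is not taken from the chapter. The simulated data, the function names (`fit_logistic`, `aipw_mean`), the single covariate `x`, and the assumption that outcomes are missing at random given `x` within each arm are all illustrative assumptions.

```python
import numpy as np

def expit(u):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-u))

def fit_logistic(X, r, n_iter=25):
    """Logistic regression of r on X by Newton-Raphson (X includes an intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (r - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def aipw_mean(y, r, X):
    """AIPW estimate of E[Y] when y is observed only where r == 1.

    Combines a missingness model (logistic) with an outcome model (linear);
    the estimate is consistent if either model is correctly specified.
    """
    pi_hat = expit(X @ fit_logistic(X, r))           # P(observed | X)
    obs = r == 1
    gamma, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    m_hat = X @ gamma                                # predicted outcome for everyone
    y_filled = np.where(obs, y, 0.0)                 # placeholder where y is missing
    return np.mean(m_hat + r * (y_filled - m_hat) / pi_hat)

# Hypothetical simulated trial: true average treatment effect is 2.
rng = np.random.default_rng(0)
n = 20000
z = rng.integers(0, 2, size=n)                       # randomized treatment indicator
x = rng.normal(size=n)                               # baseline covariate
y_full = 1.0 + 2.0 * z + x + rng.normal(size=n)

# Missingness depends on x in opposite directions in the two arms (MAR given x, z).
sign = np.where(z == 1, 1.0, -1.0)
r = (rng.random(n) < expit(0.5 + sign * x)).astype(float)
y = np.where(r == 1, y_full, np.nan)

X = np.column_stack([np.ones(n), x])
t, c = z == 1, z == 0

# Complete-case analysis: difference in observed means, ignoring missingness.
cc_effect = y[t & (r == 1)].mean() - y[c & (r == 1)].mean()

# Doubly robust analysis: AIPW mean in each arm, then the difference.
dr_effect = aipw_mean(y[t], r[t], X[t]) - aipw_mean(y[c], r[c], X[c])

print(f"complete-case estimate: {cc_effect:.2f}")
print(f"doubly robust estimate: {dr_effect:.2f}")
```

In this simulated setting both working models happen to be correctly specified, so the sketch illustrates only the bias correction relative to the complete-case estimate. Variance estimation, for example by the bootstrap, is omitted.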

Author information

Corresponding author

Correspondence to Andrew S. Allen.

Copyright information

© 2020 Springer Nature Switzerland AG

About this entry

Cite this entry

Tong, G., Li, F., Allen, A.S. (2020). Missing Data. In: Piantadosi, S., Meinert, C. (eds) Principles and Practice of Clinical Trials. Springer, Cham. https://doi.org/10.1007/978-3-319-52677-5_117-1

  • DOI: https://doi.org/10.1007/978-3-319-52677-5_117-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52677-5

  • Online ISBN: 978-3-319-52677-5

  • eBook Packages: Springer Reference Mathematics, Reference Module Computer Science and Engineering
