Abstract
Missing data are common in randomized clinical trials. When data are not missing completely at random, a complete-case analysis that ignores the missing data process often yields biased estimates of the average treatment effect. This chapter defines the different missing data mechanisms, discusses their impact on inference, and presents statistical methods that address missing data, including likelihood-based analysis, inverse probability weighting, and imputation. Each of these methods models either the missingness process or the observed outcome distribution. A more robust approach that combines the strengths of both modeling strategies is also introduced. This approach is doubly robust: it yields a consistent estimate of the average treatment effect if either the missingness model or the outcome model is correctly specified, but not necessarily both. The chapter concludes with a brief discussion of sensitivity analyses used to assess the impact of unmeasured factors that affect both missingness and outcomes. Throughout, statistical and practical considerations are discussed in the context of randomized clinical trials in which the primary analysis compares two treatments and estimates the average comparative effect in the enrolled population.
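The double robustness property described above can be illustrated with a small simulation. The sketch below is not taken from the chapter: it assumes an augmented inverse probability weighting (AIPW) form of the doubly robust estimator, a single binary baseline covariate `x`, and outcomes that are missing at random given `x`. All function names, the data-generating model, and the response probabilities are hypothetical choices made for illustration.

```python
import random


def stratum_means(values, x, keep):
    """Mean of `values` within each level of binary x, over rows where keep == 1."""
    means = {}
    for level in (0, 1):
        sel = [v for v, xi, k in zip(values, x, keep) if xi == level and k == 1]
        means[level] = sum(sel) / len(sel)
    return means


def aipw_mean(y, r, pi_hat, m_hat):
    """Augmented IPW estimate of one arm's mean outcome.

    Averages r*y/pi + (1 - r/pi)*m over all randomized participants,
    where pi is the fitted P(observed | x) and m the fitted E[Y | x].
    Outcomes with r == 0 are never read.
    """
    total = 0.0
    for yi, ri, pi, mi in zip(y, r, pi_hat, m_hat):
        weighted_outcome = yi / pi if ri == 1 else 0.0
        total += weighted_outcome + (1.0 - ri / pi) * mi
    return total / len(y)


def simulate_arm(n, effect, p_obs, rng):
    """Hypothetical data: Y = 1 + 2x + effect + noise; P(observed) depends on x only."""
    x = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    y = [1.0 + 2.0 * xi + effect + rng.gauss(0.0, 1.0) for xi in x]
    r = [1 if rng.random() < p_obs[xi] else 0 for xi in x]
    return x, y, r


def estimate_arm(x, y, r):
    n = len(y)
    observed = [yi for yi, ri in zip(y, r) if ri == 1]
    cc = sum(observed) / len(observed)           # complete-case mean
    pi_by_x = stratum_means(r, x, [1] * n)       # correct missingness model
    m_by_x = stratum_means(y, x, r)              # correct outcome model
    pi_ok = [pi_by_x[xi] for xi in x]
    m_ok = [m_by_x[xi] for xi in x]
    pi_bad = [sum(r) / n] * n                    # misspecified: ignores x
    m_bad = [cc] * n                             # misspecified: ignores x
    return {
        "complete_case": cc,
        "dr_outcome_model_correct": aipw_mean(y, r, pi_bad, m_ok),
        "dr_missingness_model_correct": aipw_mean(y, r, pi_ok, m_bad),
    }


rng = random.Random(2020)
TRUE_ATE = 1.0
# Dropout depends on x in the treated arm only, so the arms lose different people.
treated = estimate_arm(*simulate_arm(20000, TRUE_ATE, {0: 0.9, 1: 0.4}, rng))
control = estimate_arm(*simulate_arm(20000, 0.0, {0: 0.9, 1: 0.9}, rng))
ate = {name: treated[name] - control[name] for name in treated}
for name, value in ate.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed settings, the complete-case contrast converges to roughly 0.62 rather than the true effect of 1.0, while each doubly robust estimate stays close to 1.0 even though one of its two working models ignores the covariate. Because the covariate is binary and the correctly specified models are saturated, the two doubly robust estimates coincide numerically here; with continuous covariates and parametric models they would generally differ.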
Copyright information
© 2020 Springer Nature Switzerland AG
About this entry
Cite this entry
Tong, G., Li, F., Allen, A.S. (2020). Missing Data. In: Piantadosi, S., Meinert, C. (eds) Principles and Practice of Clinical Trials. Springer, Cham. https://doi.org/10.1007/978-3-319-52677-5_117-1
DOI: https://doi.org/10.1007/978-3-319-52677-5_117-1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52677-5
Online ISBN: 978-3-319-52677-5
eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering