Analysis of Incomplete Data Using Inverse Probability Weighting and Doubly Robust Estimators
Abstract
This article reviews inverse probability weighting methods and doubly robust estimation methods for the analysis of incomplete data sets. We first consider methods for estimating a population mean when the outcome is missing at random, in the sense that measured covariates can explain whether or not the outcome is observed. We then sketch the rationale of these methods and elaborate on their usefulness in the presence of influential inverse weights. We finally outline how to apply these methods in a variety of settings, such as for fitting regression models with incomplete outcomes or covariates, emphasizing the use of standard software programs.
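To make the population-mean setting concrete, the following is a minimal sketch, not the article's own implementation, of a complete-case mean, an inverse probability weighted (IPW) mean, and a doubly robust (augmented IPW) mean on simulated data. The simulated design (a single covariate `x`, a logistic missingness model, a linear outcome model, and a true mean of 1) and all function names are assumptions chosen for illustration.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_propensity(x, r, n_iter=25):
    """Logistic regression of r on (1, x) by Newton-Raphson; returns fitted P(R=1 | x)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ri in zip(x, r):
            p = expit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += ri - p
            g1 += (ri - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g in closed form.
        b0, b1 = (b0 + (h11 * g0 - h01 * g1) / det,
                  b1 + (h00 * g1 - h01 * g0) / det)
    return [expit(b0 + b1 * xi) for xi in x]


def fit_outcome(x, y, r):
    """Least-squares line fitted to the observed cases; returns predictions for everyone."""
    xs = [xi for xi, ri in zip(x, r) if ri == 1]
    ys = [yi for yi, ri in zip(y, r) if ri == 1]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    intercept = my - slope * mx
    return [intercept + slope * xi for xi in x]


def ipw_mean(y, r, pi):
    """Weight each observed outcome by the inverse of its estimated response probability."""
    return sum(yi / p for yi, ri, p in zip(y, r, pi) if ri == 1) / len(y)


def dr_mean(y, r, pi, m):
    """Outcome-model prediction plus an inverse-weighted residual for observed cases."""
    total = 0.0
    for yi, ri, p, mi in zip(y, r, pi, m):
        total += mi + ((yi - mi) / p if ri == 1 else 0.0)
    return total / len(y)


# Simulated data (assumed design): Y = 1 + 2X + noise, so the true mean of Y is 1.
random.seed(0)
n = 20000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y_full = [1.0 + 2.0 * xi + random.gauss(0.0, 1.0) for xi in x]
# Missing at random: the chance of observing Y depends only on the measured covariate X.
r = [1 if random.random() < expit(0.5 + xi) else 0 for xi in x]
y = [yi if ri == 1 else None for yi, ri in zip(y_full, r)]

pi_hat = fit_propensity(x, r)
m_hat = fit_outcome(x, y, r)

cc = sum(yi for yi, ri in zip(y, r) if ri == 1) / sum(r)  # complete-case mean
ipw = ipw_mean(y, r, pi_hat)
dr = dr_mean(y, r, pi_hat, m_hat)
print("complete-case:", cc, "IPW:", ipw, "doubly robust:", dr)
```

In this design both working models are correctly specified, so the IPW and doubly robust estimates should both fall near the true mean, while the complete-case mean is biased upward because subjects with large `x` are more likely to be observed. Very small fitted values of `pi_hat` would produce the influential inverse weights mentioned above.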