Abstract
The combination of modern machine learning algorithms with the nonparametric bootstrap can enable effective predictions and inferences on Big Observational Data. An increasingly prominent and critical objective in such analyses is to draw causal inferences from the Big Observational Data. A fundamental step toward this objective is to design the Big Observational Data prior to the application of machine learning algorithms. The design step directly reduces biases in the causal inferences that arise from the non-randomized treatment assignment. In particular, performing the design step before implementing a machine learning algorithm ensures that subjects in different treatment groups with comparable covariates are subclassified or matched together, which reduces biases due to the confounding of covariates with treatment. However, applying the traditional nonparametric bootstrap to Big Observational Data requires excessive computational effort, because every bootstrap sample would need to be re-designed under the traditional approach, which can be prohibitive in practice. We propose a design-based bootstrap for deriving causal inferences, with reduced bias, from the application of machine learning algorithms on Big Observational Data. Our bootstrap procedure operates by resampling from the original designed observational data, eliminating the additional, costly design steps that the standard nonparametric bootstrap performs on each bootstrap sample. We demonstrate the computational efficiency of this procedure compared to the traditional nonparametric bootstrap, and its equivalence in confidence interval coverage rates for the average treatment effects, by means of simulation studies and a real-life case study.
Ultimately, our procedure enables researchers to effectively use straightforward design procedures to obtain valid causal inferences with reduced computational efforts from the application of machine learning algorithms on Big Observational Data.
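The resampling scheme described in the abstract can be illustrated with a minimal sketch: perform the design step once, then bootstrap the designed units rather than re-designing each replicate. The sketch below is illustrative only, not the paper's exact algorithm: the synthetic data, the single confounder, and the greedy 1:1 nearest-neighbor matching rule are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observational data: a single confounder x drives both
# treatment assignment and the outcome; the true ATE is 2.
n = 2000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(x - 1.5)))          # propensity: treatment is rarer
z = rng.binomial(1, p)                        # non-randomized treatment
y = 2.0 * z + 1.5 * x + rng.normal(size=n)    # outcome confounded by x

# --- Design step, performed ONCE: greedy 1:1 nearest-neighbor matching on x ---
treated = np.flatnonzero(z == 1)
available = set(np.flatnonzero(z == 0))
pairs = []
for t in treated:
    c = min(available, key=lambda j: abs(x[j] - x[t]))
    pairs.append((t, c))
    available.remove(c)
pairs = np.array(pairs)

# --- Designed bootstrap: resample matched pairs; no re-matching per replicate ---
B = 500
ates = np.empty(B)
for b in range(B):
    idx = rng.integers(0, len(pairs), size=len(pairs))
    ates[b] = np.mean(y[pairs[idx, 0]] - y[pairs[idx, 1]])

lo, hi = np.percentile(ates, [2.5, 97.5])     # percentile confidence interval
```

Because the matching is done only once, each bootstrap replicate costs a single vectorized resampling pass; the standard nonparametric bootstrap would instead repeat the matching loop for all B replicates.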
Acknowledgements
We are grateful to two reviewers for many valuable comments that improved this paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
The research is supported by the Purdue University ITaP Explanatory Modeling Project Grant.
This article is part of the topical collection “Special Issue: State of the art in research on design and analysis of experiments” guest edited by John Stufken, Abhyuday Mandal, and Rakhi Singh.
Cite this article
Zhang, Y., Sabbaghi, A. The Designed Bootstrap for Causal Inference in Big Observational Data. J Stat Theory Pract 15, 80 (2021). https://doi.org/10.1007/s42519-021-00213-z