
The Designed Bootstrap for Causal Inference in Big Observational Data

Original Article · Journal of Statistical Theory and Practice

Abstract

The combination of modern machine learning algorithms with the nonparametric bootstrap can enable effective predictions and inferences on Big Observational Data. An increasingly prominent and critical objective in such analyses is to draw causal inferences from the Big Observational Data. A fundamental step toward this objective is to design the Big Observational Data prior to applying machine learning algorithms. The design step directly reduces biases in the causal inferences that arise from the non-randomized treatment assignment. In particular, performing the design step before implementing a machine learning algorithm ensures that subjects in different treatment groups with comparable covariates are subclassified or matched together, which reduces biases due to the confounding of covariates with treatment. However, applying the traditional nonparametric bootstrap to Big Observational Data requires excessive computational effort, because every bootstrap sample would need to be re-designed under the traditional approach, which can be prohibitive in practice. We propose a design-based bootstrap for deriving causal inferences with reduced bias from the application of machine learning algorithms on Big Observational Data. Our bootstrap procedure resamples from the original designed observational data, eliminating the additional, costly design steps that the standard nonparametric bootstrap performs on each bootstrap sample. We demonstrate the computational efficiency of this procedure compared to the traditional nonparametric bootstrap, and its equivalence in terms of confidence interval coverage rates for the average treatment effects, by means of simulation studies and a real-life case study. Ultimately, our procedure enables researchers to use straightforward design procedures to obtain valid causal inferences, with reduced computational effort, from the application of machine learning algorithms on Big Observational Data.
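The core idea of the abstract can be illustrated with a small sketch. The following is a minimal, hypothetical illustration and not the authors' implementation: the observational data are designed once via propensity-score subclassification, and each bootstrap replicate then resamples units within the strata of that single design, so that no replicate needs to be re-designed. The simulated data, the true treatment effect of 2.0, the quintile subclassification, and all function names are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observational data: one confounder x drives both the
# (non-randomized) treatment assignment z and the outcome y.
# The true average treatment effect is 2.0 by construction.
n = 5000
x = rng.normal(size=n)
propensity = 1 / (1 + np.exp(-x))
z = rng.binomial(1, propensity)
y = 2.0 * z + x + rng.normal(size=n)

# Design step, performed ONCE: subclassify units into propensity-score
# quintiles (using the true propensity here, for simplicity).
strata = np.digitize(propensity, np.quantile(propensity, [0.2, 0.4, 0.6, 0.8]))

def stratified_ate(y, z, strata):
    """Subclassification estimator of the average treatment effect."""
    est, total = 0.0, len(y)
    for s in np.unique(strata):
        m = strata == s
        if z[m].sum() == 0 or (1 - z[m]).sum() == 0:
            continue  # skip strata without overlap between groups
        est += (y[m][z[m] == 1].mean() - y[m][z[m] == 0].mean()) * m.sum() / total
    return est

def designed_bootstrap(y, z, strata, B=200):
    """Resample units WITHIN each stratum of the original design,
    so no bootstrap replicate is re-designed."""
    ates = np.empty(B)
    for b in range(B):
        idx = np.concatenate([
            rng.choice(np.flatnonzero(strata == s),
                       size=int((strata == s).sum()), replace=True)
            for s in np.unique(strata)
        ])
        ates[b] = stratified_ate(y[idx], z[idx], strata[idx])
    return ates

est = stratified_ate(y, z, strata)
boot = designed_bootstrap(y, z, strata)
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"ATE estimate: {est:.2f}, 95% percentile CI: ({lo:.2f}, {hi:.2f})")
```

Because the strata are fixed after the single design step, each bootstrap replicate costs only a resampling pass and an estimator evaluation, whereas a standard nonparametric bootstrap would repeat the (potentially expensive) design on every replicate.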


Figures 1–6



Acknowledgements

We are grateful to two reviewers for many valuable comments that improved this paper.

Author information

Correspondence to Yumin Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The research is supported by the Purdue University ITaP Explanatory Modeling Project Grant.

This article is part of the topical collection “Special Issue: State of the art in research on design and analysis of experiments” guest edited by John Stufken, Abhyuday Mandal, and Rakhi Singh.


About this article


Cite this article

Zhang, Y., Sabbaghi, A. The Designed Bootstrap for Causal Inference in Big Observational Data. J Stat Theory Pract 15, 80 (2021). https://doi.org/10.1007/s42519-021-00213-z

