Elsevier

Journal of Econometrics

Volume 225, Issue 2, December 2021, Pages 175-199
Estimating dynamic treatment effects in event studies with heterogeneous treatment effects

https://doi.org/10.1016/j.jeconom.2020.09.006

Abstract

To estimate the dynamic effects of an absorbing treatment, researchers often use two-way fixed effects regressions that include leads and lags of the treatment. We show that in settings with variation in treatment timing across units, the coefficient on a given lead or lag can be contaminated by effects from other periods, and apparent pretrends can arise solely from treatment effects heterogeneity. We propose an alternative estimator that is free of contamination, and illustrate the relative shortcomings of two-way fixed effects regressions with leads and lags through an empirical application.

Introduction

Rich panel data has fueled a growing literature estimating treatment effects with two-way fixed effects regressions. This body of applied work has prompted a corresponding econometrics literature investigating the assumptions required for these regressions to yield causally interpretable estimates. For example, Athey and Imbens (2018), Borusyak and Jaravel (2017), Callaway and Sant’Anna (2020a), de Chaisemartin and D’Haultfœuille (2020) and Goodman-Bacon (2018) interpret the coefficient on the treatment status when there is treatment effects heterogeneity and variation in treatment timing. Researchers are often also interested in dynamic treatment effects, which they estimate by the coefficients $\mu_\ell$ associated with indicators for being $\ell$ periods relative to the treatment, in a specification that resembles the following:

$$Y_{i,t} = \alpha_i + \lambda_t + \sum_{\ell} \mu_\ell \mathbf{1}\{t - E_i = \ell\} + \upsilon_{i,t}. \tag{1}$$

Here $Y_{i,t}$ is the outcome of interest for unit $i$ at time $t$, $E_i$ is the time when unit $i$ initially receives the binary absorbing treatment, and $\alpha_i$ and $\lambda_t$ are the unit and time fixed effects. Units are categorized into different cohorts based on their initial treatment timing. The relative times $\ell = t - E_i$ included in (1) cover most of the possible relative periods, but may still exclude some periods.
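As a rough illustration of specification (1), the following Python sketch builds a small noise-free panel and estimates the relative period coefficients by OLS with unit and time dummies. The function name `twfe_event_study`, the toy cohorts, and the homogeneous effect path are assumptions made for this example and are not taken from the paper. Relative period -1 is excluded as the reference period.

```python
import numpy as np

def twfe_event_study(y, unit, time, event_time, rel_periods):
    """OLS of y on unit dummies, time dummies, and relative-period indicators.

    event_time holds E_i per observation (np.inf for never-treated units).
    Returns a dict mapping each included relative period to its coefficient.
    """
    units, times = np.unique(unit), np.unique(time)
    rel = time - event_time                      # -inf for never-treated units
    X = np.hstack([
        (unit[:, None] == units).astype(float),      # unit fixed effects
        (time[:, None] == times[1:]).astype(float),  # time fixed effects (first period dropped)
        (rel[:, None] == np.asarray(rel_periods, dtype=float)).astype(float),
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(rel_periods, coef[-len(rel_periods):]))

# Toy panel (hypothetical): 8 units, periods t = 0..5, cohorts E in {2, 3, 4, never}.
T = 5
E_by_unit = np.array([2, 2, 3, 3, 4, 4, np.inf, np.inf])
unit = np.repeat(np.arange(8), T + 1)
time = np.tile(np.arange(T + 1), 8)
event_time = E_by_unit[unit]
rel = time - event_time

alpha = np.linspace(0.0, 1.0, 8)[unit]           # unit effects
lam = 0.5 * time                                 # time effects
effect = np.where(rel >= 0, rel + 1.0, 0.0)      # same effect path for every cohort
y = alpha + lam + effect

# Include every observed relative period except the reference period -1.
mu = twfe_event_study(y, unit, time, event_time, [-4, -3, -2, 0, 1, 2, 3])
```

Because the simulated effects are identical across cohorts, the lag coefficients in `mu` recover the true path (1, 2, 3, 4) and the lead coefficients are zero. The problems analyzed in this paper arise once effects differ across cohorts.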

The first goal of this paper is to uncover potential pitfalls associated with using the estimates of the relative period coefficients $\mu_\ell$ as “reasonable” measures of dynamic treatment effects. We decompose $\mu_\ell$ to show it can be expressed as a linear combination of cohort-specific effects from both its own relative period $\ell$ and other relative periods; unless strong assumptions regarding treatment effects homogeneity hold, the terms that include treatment effects from other relative periods will not cancel out and will contaminate the estimate of $\mu_\ell$. Importantly, this demonstrates that the widespread practice of using estimates of treatment leads in (1) as a way of testing for parallel pretrends is problematic. Roth (2019), in his survey of the applied literature, notes that checking whether $\mu_\ell = 0$ for leads of treatment is a common test for pretrends. Our decomposition result implies that such a test would be invalid because the estimate of $\mu_\ell$ is affected by both pretrends and treatment effects heterogeneity; thus any test of $\mu_\ell = 0$ cannot accept or reject the existence of pretrends without further assumptions on treatment effects.

We show how to calculate the weights underlying the linear combination of treatment effects in $\mu_\ell$ using an auxiliary regression. This auxiliary regression depends only on the distribution of cohorts and the relative time indicators included in (1). Examining the weights allows researchers to gauge how large the amount of treatment effects heterogeneity needs to be for $\mu_\ell$ to be contaminated by treatment effects from other relative periods. Our publicly available Stata package eventstudyweights automates the estimation of these weights using the panel dataset underlying any given specification of (1).
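The auxiliary regression can be sketched in Python as follows. The sketch assumes that the weight placed on cohort $e$'s effect at relative period $\ell'$ in the coefficient $\mu_\ell$ is the coefficient on $\mathbf{1}\{t - E_i = \ell\}$ from regressing $\mathbf{1}\{E_i = e\}\cdot\mathbf{1}\{t - E_i = \ell'\}$ on the regressors of (1). This is a reading of the decomposition result, not a port of the eventstudyweights package; the function name `event_study_weights` and the toy panel are hypothetical.

```python
import numpy as np

def event_study_weights(unit, time, event_time, rel_periods, target):
    """Weights on each cohort-by-relative-period effect in the coefficient mu_target.

    For every treated cohort e and relative period l observed for it, regress
    1{E_i = e} * 1{t - E_i = l} on unit dummies, time dummies, and the included
    relative-period indicators, and keep the coefficient on the target indicator.
    """
    units, times = np.unique(unit), np.unique(time)
    rel = time - event_time
    X = np.hstack([
        (unit[:, None] == units).astype(float),
        (time[:, None] == times[1:]).astype(float),
        (rel[:, None] == np.asarray(rel_periods, dtype=float)).astype(float),
    ])
    col = X.shape[1] - len(rel_periods) + list(rel_periods).index(target)
    weights = {}
    for e in np.unique(event_time[np.isfinite(event_time)]):
        for l in np.unique(rel[event_time == e]):
            dep = ((event_time == e) & (rel == l)).astype(float)
            coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
            weights[(int(e), int(l))] = coef[col]
    return weights

# Toy panel (hypothetical): cohorts E in {2, 3, 4, never}, periods t = 0..5.
T = 5
E_by_unit = np.array([2, 2, 3, 3, 4, 4, np.inf, np.inf])
unit = np.repeat(np.arange(8), T + 1)
time = np.tile(np.arange(T + 1), 8)
event_time = E_by_unit[unit]

# Weights underlying mu_0 when only relative period -1 is excluded.
weights = event_study_weights(unit, time, event_time,
                              [-4, -3, -2, 0, 1, 2, 3], target=0)
own = sum(w for (e, l), w in weights.items() if l == 0)
other = sum(w for (e, l), w in weights.items() if l == 1)
excluded = sum(w for (e, l), w in weights.items() if l == -1)
```

In this toy panel the weights on the coefficient's own relative period sum to one across cohorts, the weights on another included period sum to zero, and the weights on the excluded period sum to minus one. The individual cohort-level weights need not be zero, which is how effects from other periods can enter the coefficient when they differ across cohorts.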

The second goal of this paper is to propose an alternative regression-based method that is more robust to treatment effects heterogeneity than regression (1). For dynamic treatment effects, researchers are usually interested in estimating some average of treatment effects from $\ell$ periods relative to the treatment. Our alternative method estimates the shares of cohorts as weights. These weights are more interpretable than the weights underlying regression (1) in the presence of treatment effects heterogeneity, and the resulting weighted average of treatment effects extends beyond a convex combination of treatment effects (Słoczyński, 2020). As discussed in Section 4.2, using the procedures of Callaway and Sant’Anna (2020a), our alternative method can also accommodate covariates.

We illustrate both our decomposition results and our alternative method via an empirical application, estimating the dynamic effects of a hospitalization. We follow Dobkin et al. (2018) in using the publicly available Health and Retirement Study (HRS) dataset to first estimate two-way fixed effects regressions. We then illustrate our alternative estimation method with this example. Among the outcomes studied by Dobkin et al. (2018), we focus on out-of-pocket medical spending and labor earnings. Our alternative method yields big-picture findings similar to those of the original paper, which uses two-way fixed effects regressions: the earnings decline due to hospitalization is substantial compared to the transitory out-of-pocket spending increase. However, the two-way fixed effects estimates sometimes fall outside the convex hull of the underlying effects. In contrast, estimates using our alternative method are, by construction, easy to interpret because they are weighted averages of the underlying effects, with weights corresponding to cohort shares.

The rest of the paper is organized as follows. In the next subsection, we review the theoretical literature. Section 2 formally introduces the event study design and discusses our definition in relation to the applied literature. Section 3 derives the estimands of two-way fixed effects regression, and introduces sufficient assumptions for them to be causally interpretable. Section 4 develops our alternative estimator. Section 5 illustrates our results using an empirical example and Section 6 concludes. All proofs are contained in Appendix B (Proofs of decomposition results) and Appendix C (Supplementary results related to IW estimator).

This paper makes two main contributions within an active literature on the causal interpretations of two-way fixed effects models in settings with staggered treatment adoption (Athey and Imbens, 2018; Borusyak and Jaravel, 2017; Callaway and Sant’Anna, 2020a; de Chaisemartin and D’Haultfœuille, 2020; Goodman-Bacon, 2018). Our paper is also related to the traditional literature analyzing non-separable panel and treatment effects models, e.g., Heckman et al. (1997, 1998), Blundell et al. (2004), Abadie (2005) and Chernozhukov et al. (2013).

The first main contribution of our paper is to interpret estimates from two-way fixed effects specifications when researchers include “dynamic” indicators for time relative to treatment and when treatment effects are heterogeneous across adoption cohorts. We derive our results for a general class of two-way fixed effects specifications where “dynamic” indicators can be flexibly specified as single relative periods $\ell$ or sets of relative periods $g$ (thus also capturing any “static” specification where all post-treatment indicators are collected in a single set). This class of specifications encompasses all specifications addressed by Athey and Imbens (2018), Borusyak and Jaravel (2017), Callaway and Sant’Anna (2020a), de Chaisemartin and D’Haultfœuille (2020) and Goodman-Bacon (2018).

As a building block for the causal interpretation of estimates, we define $CATT_{e,\ell}$, the cohort average treatment effect on the treated, as the cohort-specific average difference in outcomes relative to never being treated. Our choice of a “building block” is governed by the counterfactual and the type of heterogeneity of interest. This object coincides with the “group-time average treatment effect” studied by Callaway and Sant’Anna (2020a) and is more granular than the building block used by Goodman-Bacon (2018), which is an average of $CATT_{e,\ell}$ over some relative period range. Athey and Imbens (2018) consider an alternate counterfactual to never being treated: being treated at a different time. Borusyak and Jaravel (2017) implicitly assume away heterogeneity across cohorts within a relative period, so their building block reduces to ATT. de Chaisemartin and D’Haultfœuille (2020) allow for heterogeneous treatment paths within a cohort over time across “groups”; thus their building block is at the group level. We defer a discussion of assumptions underlying the causal interpretation of these building blocks to Section 2.

The second main contribution of our paper is to propose a simple regression-based alternative estimation strategy that produces a more sensible estimand than conventional two-way fixed effects models under heterogeneous treatment effects. Our procedure is most similar to Callaway and Sant’Anna (2020a), but has the following differences. First, in the setting where there is no never-treated group, our method uses the last cohort to be treated as a control group, whereas Callaway and Sant’Anna (2020a) use the set of not-yet-treated cohorts. Our method and theirs thus rely on different, but non-nested parallel trends assumptions. Second, our estimation method can be cast as a regression specification and thus may be more familiar to applied researchers. However, a third difference is that the procedure of Callaway and Sant’Anna (2020a) allows for conditioning on time-varying covariates. de Chaisemartin and D’Haultfœuille (2020) and Goodman-Bacon (2018) respectively propose alternative estimators and diagnostic tools for estimation of causal effects in staggered settings, but do not consider the estimation of the dynamic path of treatment effects as we do.
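The role of the last-treated cohort as a control group can be illustrated with a simple difference-in-differences comparison. The Python sketch below is an assumption-laden illustration rather than the regression-based procedure itself: it assumes a balanced panel, uses relative period -1 as the base period, and only uses calendar periods before the last cohort is treated. The function name `did_catt` and the simulated data are hypothetical.

```python
import numpy as np

def did_catt(y, time, event_time, e, l, control, base=-1):
    """Difference-in-differences estimate of the cohort-e effect at relative period l.

    Compares the change in mean outcomes between calendar periods e + base and
    e + l for cohort e with the same change for the control cohort, which must
    still be untreated in both periods.
    """
    def mean_at(cohort, t):
        return y[(event_time == cohort) & (time == t)].mean()

    change_treated = mean_at(e, e + l) - mean_at(e, e + base)
    change_control = mean_at(control, e + l) - mean_at(control, e + base)
    return change_treated - change_control

# Toy panel (hypothetical): no never-treated units; cohort 5 is treated last.
T = 5
E_by_unit = np.array([2, 2, 3, 3, 5, 5])
unit = np.repeat(np.arange(6), T + 1)
time = np.tile(np.arange(T + 1), 6)
event_time = E_by_unit[unit]
rel = time - event_time

# Heterogeneous effects: cohort 2 has path (l + 1), later cohorts 2 * (l + 1).
scale = np.where(event_time == 2, 1.0, 2.0)
effect = np.where(rel >= 0, scale * (rel + 1.0), 0.0)
y = np.linspace(0.0, 1.0, 6)[unit] + 0.5 * time + effect

catt_2_1 = did_catt(y, time, event_time, e=2, l=1, control=5)  # true effect: 2
catt_3_1 = did_catt(y, time, event_time, e=3, l=1, control=5)  # true effect: 4
```

In this noise-free example the comparison recovers each cohort's own effect at relative period 1 even though the effects differ across cohorts, because the control cohort is untreated in every calendar period used.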

Event studies design

In this section we first formalize the “event study design”. As discussed in Section 2.3, based on how this term is deployed in the empirical literature, an event study design is a staggered adoption design where units are treated at different times, and there may or may not be never-treated units. It also nests a difference-in-differences design, where units are either first treated at time $t_0$ or never treated.

Specifically, we consider a setting with a random sample of N units observed over T+

Estimators from linear two-way fixed effects regression

We consider a two-way fixed effects (FE) regression of the following form, estimated on a panel of $i = 1, \dots, N$ units for $t = 0, 1, \dots, T$ calendar time periods:

$$Y_{i,t} = \alpha_i + \lambda_t + \sum_{g \in G} \mu_g \mathbf{1}\{t - E_i \in g\} + \upsilon_{i,t}.$$

Here $Y_{i,t}$ is the outcome of interest for unit $i$ at time $t$, $E_i$ is the time for unit $i$ to initially receive a binary absorbing treatment, and $\alpha_i$ and $\lambda_t$ are the unit and time fixed effects. The set $G$ collects disjoint sets $g$ of relative periods $\ell \in [-T, T]$. We allow some relative periods to be excluded from the

Alternative estimation method

We propose a new estimation method that is robust to treatment effects heterogeneity. The goal of our method is to estimate a weighted average of $CATT_{e,\ell}$ for $\ell \in g$ with reasonable weights, namely weights that sum to one and are non-negative. In particular, we focus on the following weighted average of $CATT_{e,\ell}$, where the weights are shares of cohorts that experience at least $\ell$ periods relative to treatment, normalized by the size of $g$:

$$\nu_g = \frac{1}{|g|} \sum_{\ell \in g} \sum_{e} CATT_{e,\ell} \Pr\{E_i = e \mid E_i \in [-\ell, T - \ell]\}.$$

One can aggregate CATT
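The arithmetic of this weighted average can be illustrated with a short Python sketch. All numbers below (cohort sizes, effect values, the set $g$) are made up for the example, and the function name `iw_estimand` is hypothetical; the sketch only evaluates the formula for $\nu_g$ and does not implement an estimator.

```python
def iw_estimand(catt, cohort_counts, g, T):
    """Evaluate nu_g from cohort-specific effects and cohort sizes.

    catt maps (e, l) to the effect for cohort e at relative period l.
    cohort_counts maps each treated cohort e to its number of units.
    For each l in g, only cohorts with -l <= e <= T - l are observed at
    relative period l, and their shares are renormalized to sum to one.
    """
    total = 0.0
    for l in g:
        eligible = {e: n for e, n in cohort_counts.items() if -l <= e <= T - l}
        denom = sum(eligible.values())
        total += sum(catt[(e, l)] * n / denom for e, n in eligible.items())
    return total / len(g)

# Hypothetical inputs: periods t = 0..5, three cohorts, g = {2, 3}.
cohort_counts = {2: 20, 3: 30, 4: 50}
catt = {(2, 2): 1.0, (3, 2): 2.0, (2, 3): 3.0}

# l = 2: cohorts 2 and 3 are observed, with shares 0.4 and 0.6.
# l = 3: only cohort 2 is observed, with share 1.
nu = iw_estimand(catt, cohort_counts, g=[2, 3], T=5)  # 0.5 * (1.6 + 3.0) = 2.3
```

The weights within each relative period are non-negative and sum to one, so the result always lies within the range of the underlying cohort-specific effects.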

Empirical illustration

We illustrate our findings in the setting of Dobkin et al. (2018). Dobkin et al. (2018) study the economic consequences of hospitalization, which is a large source of economic risk for adults in the United States. To quantify these economic risks, in the first part of their analysis, Dobkin et al. (2018) leverage variation in the timing of hospitalization observed in the publicly-available dataset, Health and Retirement Study (HRS), which we describe in more detail in Section 5.1. Their

Conclusions

This paper analyzes the behavior of relative period coefficients $\mu_\ell$ on the indicator for being $\ell$ periods away from the treatment from two-way fixed effects regressions in settings with variation in treatment timing and treatment effects heterogeneity. For dynamic treatment effects, researchers are usually interested in estimating some average of treatment effects from $\ell$ periods relative to the treatment, and it is common to report the coefficient estimate $\hat{\mu}_\ell$ assuming that interpretation is

Acknowledgments

We are grateful to Isaiah Andrews, Amy Finkelstein, Anna Mikusheva, and Heidi Williams for their guidance and support. We thank Alberto Abadie, Junyuan Chen, Jonathan Cohen, Nathan Hendren, Peter Hull, Guido Imbens, Yunan Ji, Sylvia Klosin, Kevin Kainan Li, Paichen Li, Therese A. McCarty, Whitney Newey, James Poterba, Pedro H. C. Sant’Anna, Gergely Ujhelyi and Helen Willis for helpful discussions. This research was supported by the National Institute on Aging, United States of America, Grant

References (36)

  • Malani, Anup et al. (2015). Interpreting pre-trends as anticipation: Impact on estimated treatment effects from tort reform. J. Publ. Econ.
  • Sant’Anna, Pedro H.C. et al. (2020). Doubly robust difference-in-differences estimators. J. Econometrics
  • Abadie, Alberto (2005). Semiparametric difference-in-differences estimators. Rev. Econom. Stud.
  • Ashenfelter, Orley (1978). Estimating the effect of training programs on earnings. Rev. Econ. Stat.
  • Athey, Susan et al. (2018). Design-based Analysis in Difference-in-Differences Settings with Staggered Adoption. Working Paper
  • Bailey, Martha J. et al. (2015). The war on poverty’s experiment in public medicine: Community health centers and the mortality of older Americans. Amer. Econ. Rev.
  • Blundell, Richard et al. (2004). Evaluating the employment impact of a mandatory job search program. J. Eur. Econom. Assoc.
  • Borusyak, Kirill et al. (2017). Revisiting Event Study Designs. Working Paper
  • Bosch, Mariano et al. (2014). The trade-offs of welfare policies in labor markets with informal jobs: The case of the “seguro popular” program in Mexico. Amer. Econ. J.: Econ. Policy
  • Botosaru, Irene et al. (2018). Difference-in-differences when the treatment status is observed in only one period. J. Appl. Econometrics
  • Callaway, Brantly et al. (2020). Difference-in-Differences With Multiple Time Periods and an Application on the Minimum Wage and Employment. Working Paper
  • Callaway, Brantly et al. (2020). Did: Treatment effects with multiple periods and groups
  • de Chaisemartin, Clément et al. (2020). Two-way fixed effects estimators with heterogeneous treatment effects. Amer. Econ. Rev.
  • Chernozhukov, Victor et al. (2013). Average and quantile effects in nonseparable panel models. Econometrica
  • Chetty, Raj et al. (2014). Active vs. Passive decisions and crowd-out in retirement savings accounts: Evidence from Denmark. Q. J. Econ.
  • Deryugina, Tatyana (2017). The fiscal cost of hurricanes: Disaster aid versus social insurance. Amer. Econ. J.: Econ. Policy
  • Deschênes, Olivier et al. (2017). Defensive investments and the demand for air quality: Evidence from the NOx budget program. Amer. Econ. Rev.
  • Dobkin, Carlos et al. (2018). The economic consequences of hospital admissions. Amer. Econ. Rev.
    1

    The views expressed herein are solely those of the author, who is responsible for the content, and do not necessarily represent the views of Cornerstone Research.
