Identifying Behavioral Responses to Tax Reforms: New Insights and a New Approach

We revisit the identification of behavioral responses to tax reforms and develop a new approach for graphical validation and representation of treatment effects. We show that the standard estimation strategy relies on an assumption of constant trend differentials. In the context of income taxation, this implies that differences in income trends across the income distribution should remain constant in the absence of tax reforms. Similar to pre-trend validation of differences-in-differences studies, we can validate this assumption by comparing the evolution of income in untreated parts of the income distribution. We illustrate our new approach by studying several tax reforms in Denmark. (JEL: C14 H30 J22)

*We thank Amalie Jensen, Thomas Jørgensen, Henrik Kleven, Claus Thustrup Kreiner, Andreas Peichl, Søren Leth-Petersen, Ben Lockwood, Sebastian Siegloch, and Floris Zoutman for helpful comments and discussions. We also thank editor Juan Carlos Suarez Serrato and three anonymous referees for a very constructive revision process. We gratefully acknowledge support from the Danish Economic Policy Research Network (EPRN) and from the Center for Economic Behavior and Inequality (CEBI) at the University of Copenhagen, financed by grant #DNRF134 from the Danish National Research Foundation. Jakobsen also acknowledges support from the Independent Research Fund Denmark, Grant #0128-00007B.

†University of Oxford and CEBI. Email: katrine.jakobsen@economics.ox.ac.uk

‡University of Copenhagen and CEBI. Email: jes@econ.ku.dk


Introduction
Behavioral responses to taxes are key inputs in evaluations of tax distortions and the trade-off between equity and efficiency (Saez, 2001; Saez et al., 2012), and they serve as evidence on behavioral parameters in economic models more broadly (Chetty et al., 2011a,b).
While the large literature studying behavioral responses to taxation is diverse in nature, a common challenge is that treatment (e.g., marginal tax rates) is determined by the outcome of interest (e.g., taxable income) and is thus endogenous. To overcome this challenge, the standard estimation strategy isolates exogenous variation in treatment using tax reforms and assigns treatment status based only on pre-reform information (Gruber & Saez, 2002; Kleven & Schultz, 2014; Weber, 2014). 1 However, as this treatment assignment is a function of past outcomes with limited or no other sources of variation (e.g., spatial), almost any serial correlation in outcomes will violate the common trend assumption underlying differences-in-differences (DiD) studies. Serial correlation is a first-order issue in the empirical tax literature, as outcomes such as wealth and income are severely affected by secular trends and, in particular, mean reversion.
The additional challenge created by serial correlation is well known in panel data models, and this is reflected in the solutions developed by the empirical tax literature. Serial correlation is dealt with either by modeling it as an autoregressive process, which can be controlled for by including functions of past outcomes in the regressions (Gruber & Saez, 2002; Kleven & Schultz, 2014), or, inspired by Arellano & Bond (1991), by using further lags of pre-reform information to assign treatment status (Weber, 2014). However, by relying on these solutions, the empirical tax literature has diverged from modern reduced-form studies and lacks the ability to (graphically) validate identifying assumptions. Hence, researchers often find that estimation results are highly sensitive to the exact specification (see, e.g., Kopczuk, 2005; Neisser, 2018) with no tools for choosing between them.

1 The literature studying behavioral responses to income taxation often seeks to estimate the elasticity of taxable income (ETI) and, hence, is often referred to as the ETI literature. However, this naming essentially refers to the choice of outcome variable and not the empirical strategy, which is the main focus of our paper. Applying the standard estimation strategy to hours worked yields an estimate of the more traditional (intensive margin) labor supply elasticity. Indeed, many papers in the ETI literature also consider measures of income that come closer to pure labor supply responses (see, e.g., Kleven & Schultz, 2014).
In this paper, we revisit the standard estimation strategy to identify behavioral responses to tax reforms and develop a new reduced-form approach that allows for clear validation of identifying assumptions. 2 Cast in the context of income taxation, we show that, in essence, the standard estimation strategy relies on an assumption that any trend differences in income across the income distribution remain constant in the absence of tax reforms. The assumption of constant trend differentials is equivalent to the common trend assumption underlying DiD studies, and we show how the assumption can be validated econometrically and graphically, in a way similar to the comparison of pre-trends in DiD studies.
One way to see the assumption of constant trend differentials of the standard estimation strategy is to consider the canonical study of the US Tax Act of 1986 by Feldstein (1995). Feldstein (1995) employs a simple DiD and estimates the reform effects by comparing the changes in taxable income for high- and low-income individuals over a period in which the tax reform lowered marginal tax rates more for high-income individuals than for low-income individuals. However, with today's knowledge, this estimate is likely to be biased, as the underlying income trends for high- and low-income individuals may be very different. To correct for underlying differences in income trends, we can run the same DiD in a pre-period unaffected by tax changes and subtract the estimate from the reform DiD, thus in effect turning it into a DiDiD in time. This approach yields a causal estimate of the reform effect under the assumption that the trend differences estimated in the pre-reform DiD would remain constant in the absence of the reform. In its simplest form, this is what the standard estimation strategy does by controlling for past income, and we show how this insight extends to the use of further lags of pre-reform information to assign treatment status, as done by Weber (2014).

2 We lay out our new approach while remaining agnostic about the underlying anatomy of behavioral responses to tax reforms, including shifting across tax bases (Slemrod, 1995; Gordon & Slemrod, 1998; Pirttilä & Selin, 2011), anticipation effects and shifting across time (Goolsbee, 2000; Kreiner et al., 2016), the endogeneity of the responses to the design of the tax system (Slemrod & Kopczuk, 2002; Kopczuk, 2005; Fack & Landais, 2016), and the presence of optimization frictions (Chetty, 2012).

Our new approach enables a simple validation of the assumption of constant trend differentials. The key insight is that individuals treated similarly by a tax reform (e.g., individuals within a given tax bracket) are not a homogeneous group, but are drawn from a wider income range. Thus, within these groups we would expect differences in income trends due to, for example, mean reversion, but under the assumption of constant trend differentials we should observe no changes in these trend differentials within the untreated parts of the income distribution. Hence, we can validate the identifying assumption by non-parametrically comparing trend differentials in two periods: a reform period, where a reform changes tax rates differently for different groups, and a pre-reform period, where ideally the tax system was stable.
We illustrate our new approach using a number of tax reforms in Denmark with a par- Applying our new estimation approach to the 2009-10 tax reform, we find significant negative correlations between initial income and subsequent income growth, which is consistent with mean reversion being the dominant (but not necessarily the sole) feature of the underlying income process in both the pre-reform and reform periods. Next, comparing the pre-reform period to the reform period, we find that the income trend differentials remained stable for the untreated bottom part of the income distribution, while for the treated upper part of the income distribution we find significantly higher income growth.
The changes in trend differentials are strongly increasing in initial income and somewhat larger in the medium run than in the short run. Taken together, we see this as compelling evidence of behavioral responses to the 2009-10 tax reform.
The behavioral responses translate into an average elasticity of taxable income with respect to the marginal net-of-tax rate of 0.227, ranging from 0.1 in the middle of the income distribution to 0.5 at the top. 3 We find that most of the responses are driven by income shifting from pension contributions to taxable income induced by the reform. Using a broader income measure that is unaffected by the shifting of income, we find an elasticity of 0.016.
Turning to the 2004 reform, we find close to the same average elasticities as with the 2009-10 reform when we apply the standard approach. However, applying our new approach and inspecting the trend differentials across the income distribution we find no changes around the changes in tax treatment. Instead, the elasticity estimate is driven by changes in trend differentials well within the control group and is most likely unrelated to the reform. Once we account for these changes, we obtain a precisely estimated zero response for both taxable and broad income. This illustrates the importance of careful validation of the underlying identifying assumptions.
The potential biases that we address are not new. The problems of mean reversion and secular differences in income trends were highlighted already by Auten & Carroll (1999) and discussed extensively in the large literature that followed (see e.g. Saez et al., 2012).
Our contribution is to bring the empirical tax literature up to modern empirical standards by clarifying the assumptions underlying the standard estimation strategy and developing tools to (graphically and econometrically) validate these assumptions. In doing so, we extend the initial steps towards graphical validation of identifying assumptions taken by Weber (2014) and synthesize her work with the earlier approach by Gruber & Saez (2002).
Our paper also relates to the recent work by Kumar & Liang (2020), who focus on estimation and validation using variation in tax rates within narrowly defined income groups, where mean reversion and differential income trends are less of a direct concern. In our paper, we discuss the use of tax variation both within and between income groups, but focus on the latter more "stereotypical" variation created by, for example, changes in top marginal tax rates.
The rest of the paper is organized as follows. Section 2 revisits the standard estimation strategies of behavioral responses to taxation and outlines our new estimation approach.
Section 3 introduces the institutional setting and data used in our empirical application, while Section 4 presents graphical evidence on income responses to the 2004 and 2009-10 reforms. Finally, Section 5 concludes.

Estimating Behavioral Responses to Taxation
In this section, we develop our new approach for estimation of behavioral responses to taxation. We will do so in three steps. First, we set up a simple theoretical framework in the context of income taxation and show the basic difficulties in estimating behavioral responses to different types of tax variation. Second, we revisit and reinterpret the standard estimation strategies covering both the original approach by Gruber & Saez (2002) as well as the more recent work by Weber (2014). Finally, we lay out a new estimation approach that allows for graphical validation and identification of behavioral responses to tax reforms.

Theoretical Framework
To fix ideas, we start by setting up a simple model for the "supply" of taxable income. 4 Each individual $i$ at time $t$ maximizes the following quasi-linear utility function subject to a potentially non-linear budget set:

$$u(c_{it}, z_{it}) = c_{it} - \frac{n_{it}}{1 + 1/\varepsilon} \left( \frac{z_{it}}{n_{it}} \right)^{1 + 1/\varepsilon} \tag{1}$$

$$c_{it} = z_{it} - T_t(z_{it}; x_{it}) \tag{2}$$

where $c_{it}$ is consumption, $z_{it}$ is taxable income, and $x_{it}$ is a set of other variables, such as the number of children, marital status, or underlying components of taxable income, that may affect individuals' tax liability $T_t(z_{it}; x_{it})$ in addition to their taxable income. The last term of equation (1) captures the disutility associated with earning income, which is governed by the parameters $n_{it}$ and $\varepsilon$. These parameters can, as we show below, be interpreted as individuals' counterfactual income in the absence of taxation (potential income) and the elasticity of taxable income with respect to the marginal net-of-tax rate. 5 As in other studies of taxable income responses, we think of $z_{it}$ as being a function of a range of underlying margins, such as hours worked, choice between pecuniary and non-pecuniary job attributes, form and timing of compensation, tax avoidance, and evasion, all of which may be part of individuals' responses to tax changes. Similarly, $n_{it}$ captures all other fixed or time-varying factors that affect taxable income in addition to the marginal net-of-tax rate.

4 It is customary in the ETI literature to implicitly ignore the "demand" side of taxable income. In part this goes back to the idea of the ETI being a sufficient statistic for the computation of the deadweight loss (Feldstein, 1995, 1999), which rests on the assumption that the "price" of taxable income is fixed. If instead, for example, an increase in hours worked reduces the hourly wage rate, using the ETI will underestimate the deadweight loss.

Utility maximization yields the first-order condition

$$z_{it} = n_{it}\, \tau_{it}^{\varepsilon} \tag{3}$$

where $\tau_{it} = 1 - T'_t(z_{it}; x_{it})$ is the marginal net-of-tax rate. Rewritten as log differences, equation (3) becomes

$$\Delta \ln z_{it} = \varepsilon\, \Delta \ln \tau_{it} + \Delta \ln n_{it} \tag{4}$$

Econometrically, we are interested in the causal effect of a change in the marginal net-of-tax rate on taxable income captured by the elasticity $\varepsilon$ in equation (4). However, a naive estimation of equation (4) is unlikely to give causal estimates, as the marginal tax rate is a (deterministic) function of the dependent variable and, hence, of the error term (the change in potential income, $\Delta \ln n_{it}$).
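The endogeneity problem can be illustrated with a small simulation. This is a minimal sketch under stated assumptions rather than anything taken from the paper: the relation $\Delta \ln z = \varepsilon \Delta \ln \tau + \Delta \ln n$ is imposed directly, there is no tax reform, the two-bracket schedule and all parameter values are hypothetical, and the bracket is assigned on potential income to sidestep bunching at the kink.

```python
import math
import random

random.seed(0)

EPSILON = 0.2                      # true elasticity (assumed for the simulation)
KINK = 1.0                         # bracket threshold (hypothetical)
TAU_BELOW, TAU_ABOVE = 0.6, 0.4    # net-of-tax rates below / above the threshold

def net_of_tax_rate(potential_income):
    # Simplification: the bracket depends on potential income n,
    # which avoids modeling bunching at the kink.
    return TAU_BELOW if potential_income < KINK else TAU_ABOVE

d_ln_tau, d_ln_z = [], []
for _ in range(20_000):
    ln_n_prev = random.gauss(0.0, 0.5)
    ln_n_curr = ln_n_prev + random.gauss(0.0, 0.2)   # change in potential income
    tau_prev = net_of_tax_rate(math.exp(ln_n_prev))
    tau_curr = net_of_tax_rate(math.exp(ln_n_curr))
    change_tau = math.log(tau_curr) - math.log(tau_prev)
    d_ln_tau.append(change_tau)
    # Log-differenced first-order condition: tax response plus potential-income change
    d_ln_z.append(EPSILON * change_tau + (ln_n_curr - ln_n_prev))

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var

naive_estimate = ols_slope(d_ln_tau, d_ln_z)
print(f"true elasticity: {EPSILON}, naive OLS estimate: {naive_estimate:.3f}")
```

In this setup, individuals whose potential income rises across the threshold face a lower net-of-tax rate, so the regressor is negatively correlated with the error term and the naive estimate falls below the true elasticity.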
To break this endogeneity problem, researchers, starting with Auten & Carroll (1999), have employed a simulated instrumental variable (IV) strategy using tax reforms. 6 The basic idea in this approach is to use predicted changes in marginal net-of-tax rates ($\Delta \ln \tau^p_{it-k}$) driven only by changes in the tax system as an instrument for $\Delta \ln \tau_{it}$. Hence, we compute

$$\Delta \ln \tau^p_{it-k} = \ln\left[1 - T'_t(z_{it-k}; x_{it-k})\right] - \ln\left[1 - T'_{t-1}(z_{it-k}; x_{it-k})\right] \tag{5}$$

which compares individual marginal tax rates in periods $t$ and $t-1$ given the in-period tax system, but holding taxable income and other characteristics fixed at their $t-k$ levels.
With $k = 1$, we use information at the beginning of the period over which we consider the change in the tax system. This corresponds to the approach in most earlier studies (Auten & Carroll, 1999; Gruber & Saez, 2002; Kleven & Schultz, 2014). With $k > 1$, we use lagged information corresponding to the approach suggested by Weber (2014).
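The simulated instrument can be sketched in a few lines. The example below is illustrative only: the function names, the two-bracket schedules, and the income levels are hypothetical, and a real application would use a full tax simulator with the other characteristics $x$ as inputs.

```python
import math

def marginal_rate(income, schedule):
    """Marginal tax rate at `income` for a piecewise-linear schedule.

    `schedule` is a list of (lower_threshold, rate) pairs sorted by threshold.
    """
    rate = schedule[0][1]
    for threshold, bracket_rate in schedule:
        if income >= threshold:
            rate = bracket_rate
    return rate

def predicted_change(base_income, schedule_prev, schedule_curr):
    """Change in the log net-of-tax rate, holding income at its base-year level."""
    tau_prev = 1.0 - marginal_rate(base_income, schedule_prev)
    tau_curr = 1.0 - marginal_rate(base_income, schedule_curr)
    return math.log(tau_curr) - math.log(tau_prev)

# Hypothetical schedules: the reform cuts the top rate from 60% to 50%.
pre_reform = [(0.0, 0.40), (400_000.0, 0.60)]
post_reform = [(0.0, 0.40), (400_000.0, 0.50)]

low = predicted_change(300_000.0, pre_reform, post_reform)    # below the top bracket
high = predicted_change(500_000.0, pre_reform, post_reform)   # in the top bracket
print(f"predicted change, low income:  {low:.4f}")
print(f"predicted change, high income: {high:.4f}")
```

Because base-year income is held fixed, the predicted change reflects only the change in the schedule: it is zero below the top bracket and positive inside it.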
This basic IV strategy only yields causal estimates of $\varepsilon$ if the instrument given by the predicted changes in marginal net-of-tax rates is uncorrelated with the changes in potential income:

$$\operatorname{cov}\left(\Delta \ln \tau^p_{it-k}, \Delta \ln n_{it}\right) = 0 \tag{6}$$

Formally, this assumes that the instrument is mean independent (Angrist & Pischke, 2008).
In addition, we need the standard assumptions laid out by Angrist et al. (1996). In particular, we require the tax reform to affect taxable income only through the changes in marginal tax rates (the exclusion restriction) as well as a significant first-stage relationship between the simulated and actual changes in the marginal net-of-tax rate ($\operatorname{cov}(\Delta \ln \tau^p_{it-k}, \Delta \ln \tau_{it}) \neq 0$). The exclusion restriction holds by definition in the above theoretical framework, as individual behavior given by equation (3) depends only on the marginal net-of-tax rate. 7 However, in practice, the exclusion restriction is non-trivial, as tax reforms typically change a number of institutional details in addition to the marginal tax rates, and it must be evaluated case by case according to the specific tax reform. 8

Why the Basic IV Strategy is Unlikely to Work
To think about the independence assumption (6), it is useful to distinguish between two types of tax variation. Similar to Kumar & Liang (2020), we can break up the total simulated tax variation into variation between groups with different income levels (between-income tax variation) and variation within groups with the same income level (within-income tax variation):

$$\Delta \ln \tau^p_{it-k} = \underbrace{\Delta \ln \bar{\tau}^p_{zt-k}}_{\text{between-income}} + \underbrace{\left(\Delta \ln \tau^p_{it-k} - \Delta \ln \bar{\tau}^p_{zt-k}\right)}_{\text{within-income}} \tag{7}$$

where $\Delta \ln \bar{\tau}^p_{zt-k} = E\left(\Delta \ln \tau^p_{it-k} \mid z_{it-k}\right)$ is the average reform-driven change in the net-of-tax rate given initial ($k = 1$) or lagged initial ($k > 1$) income $z_{it-k}$.

7 Alternatively, we can think of $n_{it}$ as also capturing the effects of other institutional features on individual taxable income.

8 Finally, we require the stable unit treatment value assumption (SUTVA). This assumption is also non-trivial in the case of tax reforms due to, for example, spillover or general equilibrium effects. The identification of such effects requires other sources of variation than we consider here (see, e.g., Zidar, 2019).
The between-income variation is created by reforms that change marginal tax rates differently for different income groups (such as a reduction in a top tax), while the within-income variation is created by reform changes in marginal tax rates that differ among individuals with the same initial income level (such as tax credits for individuals with children or treatment of itemized deductions). We distinguish between these two types of tax variation because the empirical strategies differ in the two cases. Below we focus on between-income variation, which is the stereotypical tax variation considered in the literature (Saez et al., 2012; Weber, 2014), and we defer the treatment of within-income variation to Section 2.6.
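The split into between-income and within-income variation amounts to taking averages within initial-income groups and keeping the deviations. A minimal sketch, assuming initial income has already been discretized into bins (the function name, bin labels, and numbers are hypothetical):

```python
from collections import defaultdict

def decompose_tax_variation(initial_income_bins, predicted_changes):
    """Split each predicted change into a between-income and a within-income part."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for income_bin, change in zip(initial_income_bins, predicted_changes):
        totals[income_bin] += change
        counts[income_bin] += 1
    # Between-income part: average predicted change among individuals in the same bin
    between = [totals[b] / counts[b] for b in initial_income_bins]
    # Within-income part: individual deviation from that bin average
    within = [change - avg for change, avg in zip(predicted_changes, between)]
    return between, within

# Hypothetical data: two income bins; within each bin one individual receives an
# additional rate change (for example through a child-related credit).
bins = ["low", "low", "high", "high"]
changes = [0.00, 0.10, 0.20, 0.30]
between, within = decompose_tax_variation(bins, changes)
for b, total, bw, wi in zip(bins, changes, between, within):
    print(f"{b:>4}: total={total:.2f} between={bw:.2f} within={wi:+.2f}")
```

By construction the within-income part averages to zero in every bin, so a reform that treats everyone with the same initial income identically has no within-income variation at all.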
With reforms that only create between-income variation, equation (6) will most likely not hold. To see this, note that the instrument $\Delta \ln \tau^p_{it-k}$ is a function of two elements: the change in the tax schedule created by the reform and the individuals' (lagged) initial income, $z_{it-k}$, which in turn is correlated with initial potential income, $n_{it-k}$. Hence, if a tax reform changes marginal tax rates in a way that is correlated with income, and if the changes in potential income ($\Delta \ln n_{it}$) are correlated with the initial level ($n_{it-k}$), the instrument is not independent.
Most tax reforms do change marginal tax rates in a way that is correlated with income because most reforms aim at adjusting the balance between efficiency and redistribution of the tax system and, hence, adjust the progressivity of the tax schedule. 9 Similarly, there are at least two reasons why changes in potential income are correlated with initial income:
• Mean reversion: if part of $n_{it-k}$ is driven by temporary shocks, individuals with higher $z_{it-k}$ are more likely to experience a negative change in potential income. Hence, $E(\Delta \ln n_{it} \mid z_{it-k})$ is decreasing in $z_{it-k}$.
• Differential secular income trends: e.g., secular increases in inequality, where individuals with higher $z_{it-k}$ on average have larger growth in potential income for reasons other than tax reforms. In this case, $E(\Delta \ln n_{it} \mid z_{it-k})$ is increasing in $z_{it-k}$.

9 A prominent example of this is the US Tax Reform Act of 1986 (TRA86) studied by, for example, Feldstein (1995), which lowered marginal tax rates much more at the top of the income distribution than at the bottom.
These potential biases are well known in the literature and, in most studies with $k = 1$, mean reversion appears to be severe and the dominant source of bias. When the variation in marginal tax rates predominantly comes from larger reductions at the top of the income distribution, mean reversion biases the estimate downward, which is consistent with the negative elasticity estimates usually obtained in the literature when using the basic IV strategy (Gruber & Saez, 2002; Kopczuk, 2005; Kleven & Schultz, 2014; Giertz, 2015). By choosing $k > 1$, we are likely to reduce the bias created by mean reversion, but not by differential income trends, as any transitory component in potential income (by definition) will play out over time and, hence, not affect changes in potential income sufficiently far into the future (Weber, 2014).
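The role of $k$ can be illustrated with a simulated income process. This is a sketch under strong assumptions that are not from the paper: log potential income is a permanent component plus an i.i.d. transitory shock, there are no secular trends, and the standard deviations are arbitrary. With i.i.d. shocks a single extra lag already removes mean reversion; persistent shocks would require a larger $k$.

```python
import random

random.seed(1)

N = 20_000
SD_PERMANENT, SD_TRANSITORY = 0.5, 0.2   # assumed values

# Log potential income in periods t-2, t-1, t: permanent part plus i.i.d. shock
panel = []
for _ in range(N):
    permanent = random.gauss(0.0, SD_PERMANENT)
    panel.append([permanent + random.gauss(0.0, SD_TRANSITORY) for _ in range(3)])

def growth_gap(base_period):
    """Mean growth from t-1 to t for the top half minus the bottom half,
    with halves defined by income in `base_period` (0 = t-2, 1 = t-1)."""
    ranked = sorted(panel, key=lambda row: row[base_period])
    bottom, top = ranked[: N // 2], ranked[N // 2:]

    def mean_growth(rows):
        return sum(row[2] - row[1] for row in rows) / len(rows)

    return mean_growth(top) - mean_growth(bottom)

gap_k1 = growth_gap(base_period=1)   # k = 1: groups formed on income in t-1
gap_k2 = growth_gap(base_period=0)   # k = 2: groups formed on income in t-2
print(f"growth gap (top minus bottom), k=1: {gap_k1:+.4f}")
print(f"growth gap (top minus bottom), k=2: {gap_k2:+.4f}")
```

With $k = 1$ the high-income group contains a disproportionate share of individuals with a positive transitory shock in the base year, so their subsequent growth is lower; grouping on the earlier year breaks that link in this stylized process.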

Reinterpreting the Standard Estimation Strategy
To deal with the likely violation of independence in equation (6), researchers have, again dating back to Auten & Carroll (1999), included controls for initial income ($z_{it-k}$) in the estimation, which changes the assumption of independence to

$$\operatorname{cov}\left(\Delta \ln \tau^p_{it-k}, \Delta \ln n_{it} \mid z_{it-k}\right) = 0 \tag{8}$$

Looking only at a single reform period (changes from before to after a tax reform), this condition is fulfilled trivially in cases with only between-income variation. In these cases, $\Delta \ln \bar{\tau}^p_{zt-k}$ is a constant for a given level of initial income, and equation (8) thus holds regardless of the changes in potential income. However, controlling fully nonparametrically for initial income as implied in equation (8) also absorbs all identifying variation in the tax instrument in the first stage. 10

There are two ways to break this deadlock. We can abandon the non-parametric controls for initial income and assume a functional form of the relationship between $\Delta \ln n_{it}$ and $z_{it-k}$. Or we can employ more time periods to obtain variation in $\Delta \ln \bar{\tau}^p_{zt-k}$ conditional on $z_{it-k}$. Auten & Carroll (1999) only had one period available and, hence, they were only able to control for initial (log) income linearly. Most subsequent studies have had access to longer panels and have thus been able to use less parametric specifications.

10 The first stage equation corresponding to equation (8) is $\operatorname{cov}(\Delta \ln \tau^p_{it-k}, \Delta \ln \tau_{it} \mid z_{it-k})$, which is zero when $\Delta \ln \tau$
With longer panels, $\Delta \ln \bar{\tau}^p_{zt-k}$ only varies due to the time variation created by tax reforms. Hence, a sufficient condition for equation (8) to hold is that the dynamic process of potential income given initial income does not change systematically over time. Formally, we can write this condition as

$$E\left(\Delta \ln n_{it} \mid z_{it-k}\right) = g(z_{it-k}) + \delta_t \tag{9}$$

where $g(z_{it-k})$ is some time-independent function of initial income describing the dynamic income process, and $\delta_t$ is a common income growth rate. 11 Equation (9) is an assumption of constant trend differentials across the income distribution over time in the absence of tax reforms. In the absence of changes in the tax schedule, changes in taxable income are solely driven by changes in potential income, and equation (9) states that, relative to the overall growth in the economy ($\delta_t$), the differences in income growth across the distribution ($g(z_{it-k})$) should be constant. This is the key identifying assumption underlying the standard estimation strategy as employed by, for example, Gruber & Saez (2002) and Kleven & Schultz (2014).
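To see how constant trend differentials deliver identification, it can help to spell out the step from equations (4) and (9) to a comparison across periods. The following is a sketch that assumes the log-difference relation $\Delta \ln z_{it} = \varepsilon \Delta \ln \tau_{it} + \Delta \ln n_{it}$ with a constant elasticity; the period labels $r$ (reform) and $p$ (pre-reform) are notation introduced here for illustration only.

```latex
% Take expectations of equation (4) conditional on initial income and use equation (9):
\mathbb{E}\left[\Delta \ln z_{it} \mid z_{it-k}\right]
  = \varepsilon\, \mathbb{E}\left[\Delta \ln \tau_{it} \mid z_{it-k}\right]
    + g(z_{it-k}) + \delta_t .

% Compare a reform period r with a pre-reform period p at the same initial income z.
% The time-independent function g(z) cancels:
\mathbb{E}\left[\Delta \ln z_{ir} \mid z\right] - \mathbb{E}\left[\Delta \ln z_{ip} \mid z\right]
  = \varepsilon \left( \mathbb{E}\left[\Delta \ln \tau_{ir} \mid z\right]
      - \mathbb{E}\left[\Delta \ln \tau_{ip} \mid z\right] \right)
    + (\delta_r - \delta_p) .
```

Where tax rates do not change in either period, the first term on the right-hand side vanishes and the difference equals the constant $\delta_r - \delta_p$ at every income level. Any deviation from that constant among untreated incomes therefore signals a violation of equation (9).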

Graphical Validation and Identification of Behavioral Responses
With the above insight, the identification of income responses and the validation of the identifying assumption underlying equation (9) are straightforward. For identification, we compare the changes in income trends for the parts of the income distribution affected by tax changes to the changes in income trends for the untreated (or less treated) parts of the distribution, while for validation, we can compare the changes in income trends for different subgroups within the untreated parts of the distribution. Under the assumption of constant trend differentials, we should observe no changes in trend differentials within the untreated (or less treated) parts of the income distribution.

11 One way to think about equation (9) is to consider a general process of potential income, $\Delta \ln n_{it} = g_t(z_{it-k}, x_{it})$, where the change in potential income can be a function of initial income and other covariates ($x_{it}$), such as age, experience, and children, and where this function may change over time. In this setting, there are two sufficient conditions for equation (9) to hold. First, $g_t(z_{it-k}, x_{it})$ must be independent of time up to a constant: $g_t(z_{it-k}, x_{it}) = g(z_{it-k}, x_{it}) + \delta_t$. Second, the distribution of other (relevant) covariates conditional on initial income must be constant over time: $F_t(x_{it} \mid z_{it-k}) = F(x_{it} \mid z_{it-k})$. If the distribution of $x_{it}$ is not constant, we can control for these in the estimations as discussed in Section 2.6.
We illustrate our new approach in Figure 1, which shows the growth in income across the income distribution for two time periods: a pre-reform period, where the tax schedule remains stable, and a reform period, where the top tax rate is reduced. In both the pre-reform and the reform period, we draw the growth in income as a decreasing function of initial income ($z_{t-k}$). That is, income trends differ across the income distribution, with individuals at the top experiencing lower income growth, on average, than individuals with low initial income. This pattern is consistent with mean reversion being the dominant feature of the underlying income process. Importantly, we do not impose any functional form assumptions on the underlying income process, nor on the relative strength of mean reversion and differential secular income trends.
Next, we compare the trend differentials in the two periods, and for this purpose we divide the income distribution into two regions: an untreated validation region, where the comparison of trend differentials serves the same validation role as the comparison of pre-treatment trends in DiD studies, and a treated identification region, where we, under the identifying assumption, can interpret changes in trend differentials as behavioral responses to the reform. In our illustration, the trend differentials in the validation region follow the same pattern in the pre-reform period as in the reform period, while in the identification region the trend differentials for the reform period lie above those for the pre-reform period. Thus, in our illustration we would find positive reform effects with clear validation of the identifying assumption.
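The comparison of trend differentials can be mimicked on simulated data. The sketch below is illustrative only and does not use the paper's data: the income process, the location of the kink at the 70th percentile, the size of the response, and the noise level are all assumptions.

```python
import random

random.seed(2)

N_BINS = 10
KINK_RANK = 0.7   # hypothetical: the reform only affects ranks above this point

def simulate_period(n_obs, common_growth, reform):
    """Return mean income growth by decile of initial income rank."""
    sums, counts = [0.0] * N_BINS, [0] * N_BINS
    for _ in range(n_obs):
        rank = random.random()                   # initial income rank in [0, 1)
        growth = common_growth - 0.2 * rank      # mean reversion: decreasing in rank
        if reform and rank > KINK_RANK:
            # Assumed behavioral response, building up smoothly above the kink
            growth += 0.05 * (rank - KINK_RANK) / (1.0 - KINK_RANK)
        growth += random.gauss(0.0, 0.1)         # idiosyncratic noise
        decile = min(int(rank * N_BINS), N_BINS - 1)
        sums[decile] += growth
        counts[decile] += 1
    return [s / c for s, c in zip(sums, counts)]

pre = simulate_period(20_000, common_growth=0.02, reform=False)
post = simulate_period(20_000, common_growth=0.03, reform=True)

change = [b - a for a, b in zip(pre, post)]     # change in income growth by decile
validation = change[:7]                          # deciles entirely below the kink
baseline = sum(validation) / len(validation)     # change in common growth
excess = [c - baseline for c in change]          # reduced-form effect by decile
for decile, value in enumerate(excess, start=1):
    print(f"decile {decile:2d}: {value:+.4f}")
```

In the validation deciles the change in growth should be roughly flat, reflecting only the shift in common growth, while the top deciles show growth in excess of that baseline. Plotting `pre` and `post` against the decile would give a stylized version of the figure described above.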
Our graphical approach corresponds to the reduced form of the IV estimation, and hence the reform effect identified above is the intention-to-treat (ITT) effect of the reform, which relies solely on the assumption of independence given by equation (8). Given that the exclusion restriction also holds, we can obtain an estimate of the average elasticity of taxable income for the set of individuals who stay within their income bracket and, hence, are treated as prescribed by the tax reform (the compliers) by scaling the ITT with the corresponding first stage.
In Figure 1, we have drawn the reform effect as smoothly building up above the kink point, and there are two reasons why this is the empirically realistic scenario. First, with variation in potential income over time, individuals close to the kink point are more likely to move out of their tax bracket. This translates into a smaller first stage and a smaller ITT even with a constant elasticity across the population. Second, abandoning the assumption of a constant elasticity, there are reasons to expect smaller behavioral responses around kink points. For example, in the presence of optimization frictions, individuals may be unaware of the precise location of the kink point (Chetty, 2012; Rees-Jones & Taubinsky, 2020) and, hence, of whether their marginal tax rate has changed following the reform. This translates into a lower effective elasticity and, hence, a lower ITT close to the kink point. 12

A New Estimation Approach
Building on the discussion above, we lay out the following three-step estimation procedure, which we implement in Section 4:

1. Assignment of validation and identification regions: Compare predicted between-income changes in marginal net-of-tax rates across the income distribution in the pre-reform and reform period. Identify validation region(s), where the marginal tax rates are (close to) stable in both periods, and identification region(s), where marginal tax rates change significantly in the reform period only.

2. Comparison of trend differentials: Compare income trends across the income distribution in the pre-reform and reform periods and examine whether the assumption of constant trend differentials holds in the validation region(s). This can be done graphically as in Figure 1, and econometrically by estimating a flexible relationship between changes in taxable income and initial income in the two time periods and testing for significant changes in trend differentials in the validation region.

3. 2SLS estimation: Conditional on positive validation in step 2, we can obtain an estimate of the elasticity of taxable income by combining equations (4) and (9). This leads to the estimation equation

$$\Delta \ln z_{it} = \varepsilon\, \Delta \ln \tau_{it} + f(z_{it-k}) + \delta D^{reform}_{it} + u_{it} \tag{10}$$

where $f(z_{it-k})$ is a flexible control function of initial income and $D^{reform}_{it}$ is a dummy for the reform period, and where we instrument $\Delta \ln \tau_{it}$ with $\Delta \ln \bar{\tau}^p_{zt-k}$. 13 We can run this regression using either the entire sample or only particular parts of the validation and identification regions to investigate heterogeneity in the behavioral responses.
It is worth noting that as the regression in equation (10) fundamentally is the same as in the standard estimation strategy (Gruber & Saez, 2002;Kleven & Schultz, 2014), the estimated elasticities from our approach will not (necessarily) differ from the standard estimation strategy. However, without the validation provided by step 2, any aggregate elasticity from the standard estimation strategy may just as well be produced by changes in trend differentials within the validation region, which is a strong indication of violations of the identifying assumption. We illustrate this issue empirically in Section 4.
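The logic of the 2SLS step can be sketched in a stripped-down two-region, two-period case. The simplifications here are assumptions made for illustration: the control function is a single region dummy, the instrument is the region-by-reform interaction, and all parameter values are invented. In that just-identified case, 2SLS collapses to the ratio of a difference-in-differences in income growth (the reduced form) to a difference-in-differences in actual net-of-tax rate changes (the first stage).

```python
import random

random.seed(3)

EPSILON = 0.2            # true elasticity (assumed)
N_PER_CELL = 20_000

def simulate_cell(treated, reform):
    """Return (mean change in log income, mean actual change in log net-of-tax rate)."""
    trend = -0.05 if treated else 0.0      # trend differential: mean reversion at the top
    common = 0.03 if reform else 0.02      # common growth in the period
    sum_dz = sum_dtau = 0.0
    for _ in range(N_PER_CELL):
        # The reform raises the log net-of-tax rate by 0.2 for treated individuals
        # who stay in their bracket (assumed to be 80 percent of them).
        complies = treated and reform and random.random() < 0.8
        d_ln_tau = 0.2 if complies else 0.0
        sum_dtau += d_ln_tau
        sum_dz += EPSILON * d_ln_tau + trend + common + random.gauss(0.0, 0.1)
    return sum_dz / N_PER_CELL, sum_dtau / N_PER_CELL

cells = {(g, p): simulate_cell(g, p) for g in (False, True) for p in (False, True)}

def did(index):
    """Difference-in-differences across regions and periods (0 = income, 1 = tax rate)."""
    reform_gap = cells[(True, True)][index] - cells[(False, True)][index]
    pre_gap = cells[(True, False)][index] - cells[(False, False)][index]
    return reform_gap - pre_gap

reduced_form = did(0)    # intention-to-treat effect
first_stage = did(1)
elasticity = reduced_form / first_stage

# For comparison: a single-period DiD that ignores the pre-existing trend differential
single_period = (cells[(True, True)][0] - cells[(False, True)][0]) / (
    cells[(True, True)][1] - cells[(False, True)][1]
)
print(f"first stage: {first_stage:.3f}, reduced form: {reduced_form:.3f}")
print(f"elasticity with pre-period correction: {elasticity:.3f}")
print(f"elasticity from reform period only:    {single_period:.3f}")
```

In this setup the reform-period-only estimate is pulled below zero by mean reversion, whereas subtracting the pre-period gap recovers an estimate near the assumed elasticity. The recovery depends on the trend differential being identical in both periods, which is exactly what the validation step is meant to check.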

Extensions and Additional Considerations
Using Lagged Initial Income: In Figure 1 above, we illustrated our new approach in a setting where mean reversion was the dominant feature of the underlying income process, which most often is the relevant empirical setting when $k = 1$. With $k > 1$, mean reversion becomes less severe, and by picking a sufficiently high $k$ we can hope to fully eliminate this source of bias (Weber, 2014). Hence, with $k > 1$ and no secular income trends, the predicted tax changes $\Delta \ln \bar{\tau}^p_{zt-k}$ may fulfill the basic independence assumption (6), $\operatorname{cov}(\Delta \ln \tau^p_{it-k}, \Delta \ln n_{it}) = 0$, without the need to control for initial income. We can validate the basic independence assumption using the same approach as described in Section 2.4. Under independence, the relationship between $z_{it-k}$ and $E(\Delta \ln z_{it} \mid z_{it-k})$ should be flat in untreated parts of the income distribution, as illustrated in Figure A.I in the Online Appendix. In our empirical application in Section 4, we focus on the baseline setting with $k = 1$ and consider settings with $k > 1$ in Online Appendix B.

13 By using $\Delta \ln \bar{\tau}^p_{zt-k} = E(\Delta \ln \tau^p_{it-k} \mid z_{it-k})$ as the instrument instead of the actual predicted tax changes, $\Delta \ln \tau^p_{it-k}$, we only identify the elasticity from the between-income tax variation, which is the tax variation we can validate in our new approach.
Adding Additional Controls: Similarly to the standard estimation strategy, our new approach can incorporate additional controls such as changes in past income and more traditional controls for demographics etc. With additional controls, the assumption of constant trend differentials needs to hold conditional on the controls, which we can implement using the weighting strategy suggested by DiNardo et al. (1996), adjusting for changes in the distribution of x using inverse propensity weighting. 14 Formally, we would estimate $P(D^{reform}_{it} = 1 \mid x, z_{it-k})$ and reweight the observations in the pre-reform period by the implied propensity weights. We implement this reweighting strategy for our empirical application in Figure A.II in the Online Appendix. 15
Within-Income Tax Variation: While our new approach focuses on the use of between-income variation, most tax reforms create both between-income and within-income tax variation. With within-income tax variation we do not face the same immediate problem of differential income trends due to mean reversion or secular income trends, as we can identify behavioral responses by comparing individuals with similar initial income (Kumar & Liang, 2020). Hence, we can, with a few adaptations, analyze this type of tax variation in a DiD framework either by splitting the population directly on the observables that drive the within-income variation (e.g. the number of children) or by using a tax simulator to compute the within-income change in the marginal net-of-tax rate for each individual. Splitting the population directly on observables is arguably the most transparent and elegant estimation strategy, but it typically requires that the tax variation is a simple function of a few variables. 16 Using a tax simulator is a "one-size-fits-all" approach that works irrespective of the source of the tax variation, but only by maintaining an element of black box in the estimation, where it is not clear what part of the reform variation is driving the estimates. We lay out these points in Online Appendix C, where we also discuss related cases with dispersed between-income tax variation.
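To make the reweighting step under Adding Additional Controls concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the controls and initial income have been discretized into cells (the hypothetical `cell` variable), so that the propensity can be estimated by cell frequencies, and it uses the standard odds form of the DiNardo et al. (1996) weight, P/(1 - P), which is an assumption here rather than a formula taken from the text.

```python
import numpy as np

def dfl_weights(cell, reform):
    """Odds weights for pre-reform observations, estimated cell by cell.

    cell   : discrete cell label per observation (hypothetical: controls x income bin)
    reform : 1 if the observation belongs to the reform period, 0 if pre-reform
    Pre-reform observations get P(reform | cell) / P(pre-reform | cell);
    reform-period observations keep a weight of 1.
    """
    cell = np.asarray(cell)
    reform = np.asarray(reform, dtype=bool)
    w = np.ones(cell.shape[0], dtype=float)
    for c in np.unique(cell):
        in_cell = cell == c
        n_reform = np.sum(in_cell & reform)
        n_pre = np.sum(in_cell & ~reform)
        if n_pre > 0:
            # With cell frequencies, P / (1 - P) reduces to n_reform / n_pre.
            w[in_cell & ~reform] = n_reform / n_pre
    return w

# Toy data: the pre-reform sample is mostly cell "a", the reform sample mostly "b".
cell = np.array(["a", "a", "a", "b", "a", "b", "b"])
reform = np.array([0, 0, 0, 0, 1, 1, 1])
w = dfl_weights(cell, reform)

# After reweighting, the pre-reform cell shares match the reform-period shares.
pre = reform == 0
share_b_pre = w[pre & (cell == "b")].sum() / w[pre].sum()
share_b_reform = np.mean(cell[reform == 1] == "b")
```

With continuous controls, the cell frequencies would be replaced by a fitted propensity model; the weighting logic stays the same.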
Heterogeneous Elasticities: In practice, there may be a number of reasons why elasticities differ across the population. Behavioral responses to tax changes may, for example, be larger at the top of the income distribution due to income shifting, or they may be smaller close to tax thresholds due to optimization frictions such as imperfect knowledge of the exact location of the threshold etc. It is well-known that heterogeneous elasticities bias the elasticity estimates when there is no pure control group (with unchanged marginal tax rates), but only differences in treatment intensities. The same applies in our approach, but as we show in Section 4, it is often straightforward to investigate whether the elasticities are heterogeneous and, hence, to determine whether the estimates are likely to be biased.
Income Effects: Income effects are a particular reason why the (uncompensated) elasticities considered in our theoretical framework could vary across the population. Hence, as we lay out in Online Appendix D, we can use our new approach to investigate whether the heterogeneity in the elasticities follows the pattern predicted by the presence of income effects, similarly to the detection of heterogeneous elasticities discussed above. In the case of a top tax reduction, we would expect the income effect to be larger for high-income individuals, who experience a larger mechanical reduction in tax payments. In contrast, individuals right above the top tax threshold experience only a marginal mechanical tax reduction. Hence, given the presence of income effects, we would expect the largest behavioral responses among the individuals right above the top tax threshold, as these individuals almost exclusively respond to the change in the marginal tax rate (a positive substitution effect), while the behavioral responses of higher income individuals are more muted due to the negative income effect. However, the pattern of behavioral responses may be complicated by the presence of other sources of response heterogeneity, which make tax reforms in general less suited to identifying income effects compared to, for example, the study of lottery winnings (Imbens et al., 2001; Cesarini et al., 2017).
16 The large literature on the effects of the earned income tax credit (EITC) (Eissa & Liebman, 1996; Meyer & Rosenbaum, 2001; Kleven, 2020) is a prominent example of this approach.
Multiple Pre-Reform Periods: We can use more pre-reform periods to validate the assumption of constant trend differentials. In particular, additional pre-reform periods provide additional validation for the assumption of constant trend differentials further away from the validation region, where the comparison of trend differentials in the validation region is less informative. This is similar to the estimation of long-run treatment effects in DiD studies, where even small differences in pre-treatment trends become significant when extrapolated over long time periods (Kahn-Lang & Lang, 2020; Rambachan & Roth, 2020). The practical difficulty in using additional pre-reform periods is to find sufficiently long periods with a stable tax system. In our empirical application in Section 4, we consider changes in taxable income over 4-year periods. Hence, using two pre-reform periods ideally requires 8 years with no changes in the tax system. In Figure A.III in the Online Appendix, we consider 2-year periods, which allows us to use two pre-reform periods as validation.
Relationship with DiD Studies with Pre-Trend Corrections: In some DiD studies researchers correct for differential pre-trends by estimating a linear trend on pre-reform data and subtracting this from the post-reform estimates (see, e.g., Jakobsen et al., 2020). Hence, the identifying assumption in these studies is also one of constant trend differentials. However, this approach is unlikely to work in the case of tax reforms when we assign individuals to treatment and control groups based on taxable income, as mean reversion creates very different income trends going towards and away from the time of assignment. Moving away from the year of assignment, mean reversion tends to reduce the income of high-income individuals relative to low-income individuals, while income growth tends to be higher for the (to be) high-income individuals prior to assignment. Controlling for differential pre-trends in a standard DiD with this assignment of treatment and control groups will only exacerbate the bias created by mean reversion. This is the reason why we set up our new approach in differences instead of a standard DiD in levels.

Institutional Setting
In Denmark, individuals are taxed according to a dual tax system with generally higher taxes on labor income (and transfers) than on capital income. 17 The system operates with six income concepts summarized in Table 1. The income concepts are labor market income (LI), personal income (PI), capital income exclusive of income from stocks (CI), stock income (SI), itemized deductions (ID) and taxable income (TI) with LI, PI, and TI constituting the main tax bases for labor income. 18 In addition, we have added broad income (BI) to the table, which we use to study different margins of response in our empirical analyses.
Overall, these income concepts have remained stable over the time period 2000-12, which surrounds the 2004 and 2009-10 reforms we consider in our main application. Table 2 shows the tax bases and associated marginal tax rates before and after the two reforms. The tax system consists of a number of flat elements levied on the entire base with only minor allowances (labor market contribution, regional taxes, bottom tax) combined with progressive elements created by the income thresholds of the middle and top tax brackets. The system creates three overall tax brackets, which we label bottom, middle, and top.
The tax rates shown in the table are cumulative such that a taxpayer in the top tax bracket is subject to the sum of all tax rates (except the tax on stock income). Hence, a top taxpayer faced a marginal tax of 63.0 percent on labor income before the 2009-10 reform and 56.1 percent after. This corresponds to a change in the net-of-tax rate of 17 log points.
In contrast, the net-of-tax rate for a bottom taxpayer increases by only 3 log points, as the marginal tax rate falls from 42.6 to 40.9 percent. Figure 2 illustrates the development in the marginal tax rates on labor income (Panel A) and income thresholds (Panel B) for the three overall tax brackets for a typical taxpayer.
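The log-point figures quoted above follow directly from the marginal tax rates. As a small check, the sketch below computes the change in the log net-of-tax rate, ln(1 - t_after) - ln(1 - t_before), for the top and bottom brackets; the function name is illustrative only.

```python
import math

def net_of_tax_change_log_points(mtr_before: float, mtr_after: float) -> float:
    """Change in the log net-of-tax rate, expressed in log points (100 x log difference)."""
    return 100.0 * (math.log(1.0 - mtr_after) - math.log(1.0 - mtr_before))

# Marginal tax rates on labor income before and after the 2009-10 reform (from the text).
top = net_of_tax_change_log_points(0.630, 0.561)     # roughly 17 log points
bottom = net_of_tax_change_log_points(0.426, 0.409)  # roughly 3 log points
print(round(top, 1), round(bottom, 1))
```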

Tax Reforms
There are a number of important points to take away from this figure. First, in the period leading up to the 2009-10 reform, the income tax system was stable in terms of both the marginal tax rates in each tax bracket and the bracket thresholds. Second, the 2009-10 reform affected the tax system in two ways. In 2009, the middle tax threshold was raised to the same level as the top threshold, and in 2010 the middle tax bracket was abolished entirely. 19 Because the tax rates are cumulative, this affects all income above the (old) middle tax threshold, including income above the top tax threshold. Third, after the 2009-10 reform, the tax system is again relatively stable until 2012 with only minor changes in the top tax threshold. Taken together, these features come close to the stylized tax reform considered in Section 2, with a stable pre-period and a reform that generates substantial between-income tax variation through the initial threshold increase and subsequent abolition of the middle tax.
In addition, it is worth noting that the 2009-10 reform introduced a cap on certain deductible pension contributions. Prior to the 2009-10 reform, all contributions to employer-administered annuity pensions were deducted from personal income and hence tax exempt at the point of contribution. 20 With the reform, the government introduced a cap of DKK 100,000 (≈ USD 15,000) on expiring annuities (typically with payouts over 10 years). Many employers reacted to the cap by automatically shifting contributions from expiring to non-expiring annuities, but as shown by Andersen (2018), many of the affected taxpayers reduced overall pension contributions in response. In our analysis, this shifting of income will be captured as part of the behavioral responses to the 2009-10 reform, and for this reason, we consider income responses both with and without pension contributions in our empirical analysis in Section 4.
Compared to the 2009-10 reform, the 2004 reform comes after a more unstable pre-period with gradual declines in the marginal tax rates in all brackets and a gradual increase in the middle tax threshold. Instead of changing statutory tax rates, the main element of the 2004 reform is a significant increase in the middle tax threshold. Moving outside the 2000-12 window, the Danish tax system is affected by a number of other reforms. In Figure 2, we also highlight the tax reforms in 1987 and 1994, which we analyze in Online Appendix E.

Data
We use administrative data for the full population of Denmark since 1980. The data combine several administrative registers (linked at the individual level via personal identification numbers) that contain detailed information on labor market history, education, earnings, and demographics, with almost all income data being third-party reported (Kleven et al., 2011). However, our new approach does not rely on the availability of a rich set of control variables and is, therefore, implementable using data from tax authorities only. In addition, our new approach does not depend on the availability of large administrative data sets. In our empirical application in Section 4, we use the available statistical power to map out income trend differentials non-parametrically, but with less power we can implement our new approach in a much less granular manner as illustrated in Figure .
For each reform, we construct a pre-reform period leading up to the reform and a reform period spanning the reform. As our baseline, we consider four-year changes. For each period, we select individuals who are present in the tax data with positive income over the period considered and with initial income within a range around the tax variation created by the reform. For the 2009-10 reform we select individuals with personal income above DKK 250,000 (USD 37,000, 2019-level), while for the 2004 reform we include individuals with personal income starting already from DKK 200,000 (USD 29,000, 2019-level), as the main tax variation from this reform is located lower in the income distribution. However, the conclusions from our empirical analysis are not sensitive to the sample restrictions.
The sample restrictions leave us with over 2,000,000 individuals per period or approximately half of the Danish adult population, as shown in Table 3. In the table, we also show the descriptive statistics for the pre-reform and reform periods for the 2004 and 2009-10 reforms, respectively. These statistics are measured in the initial year of each period. Considering the 2004 reform, we see that the initial characteristics are stable from the pre-reform period (initial year 1999) to the reform period (initial year 2003). For the 2009-10 reform, we only include individuals with initial income above DKK 250,000, and hence, the samples are, on average, older, more likely to be married, and have higher income.

Graphical Evidence of Income Responses
In this section, we turn to the empirical implementation of our new approach for studying income responses to taxation using the 2004 and 2009-10 reforms described in Section 3. We start by analyzing the 2009-10 reform as this is the largest and most salient of the two reforms and changes marginal tax rates in a way that comes close to the stereotypical reform considered in Section 2.

The 2009-10 Tax Reform
We implement our new approach following the steps laid out in Section 2.5. First, in Figure 3, Panel A, we show the predicted changes in the marginal net-of-tax rate across the income distribution for two periods. 21 In this step, we define the validation and identification regions. In the pre-reform period from 2004 to 2008, where the tax schedule was close to stable, we see close to no changes in marginal net-of-tax rates both across the income distribution and, as illustrated by the P10-P90 range, within a given income level. In the reform period spanning 2008 to 2012, the abolition of the middle tax increased the marginal net-of-tax rate by approximately 17 log points for the highest income individuals, while the average changes for individuals with incomes below DKK 350,000 are less than 5 log points. 22
Second, in Panels B and C of Figure 3, we investigate the income trend differentials across the income distribution in the pre-reform (2004-08) and reform (2008-12) periods to validate the assumption of constant trend differentials. To construct Panel B, we run regression (11) separately for each period t, where $D^{inc}_{it-1}$ is a vector of 50 initial income bin dummies based on initial income measured in 2004 or 2008. We exclude the dummy for the DKK 300,000 income bin, and hence, the coefficients $\beta^{t}_{1}$ measure the differences in income growth (income trend differentials) between individuals with different initial income relative to the DKK 300,000 bin. Plotting $\beta^{t}_{1}$ for the two periods produces the empirical equivalent of our stylized Figure 1 in Section 2. We consider changes in income over the full 4-year period to allow time for potentially gradual behavioral responses to materialize and to avoid short-term income shifting affecting the estimates. In particular, we avoid capturing the anticipation effect of the 2010 abolition of the middle tax, which caused individuals to shift income from 2009 to 2010 as documented by Kreiner et al. (2016, 2017).
In Panel C we estimate the changes in trend differentials between the pre-reform and reform periods by running regression (12). In this estimation, we pool the two periods in Panel B and include a vector of income bin dummies $D^{inc}_{it-1}$ interacted with a dummy for the reform period $D^{reform}_{it}$. We exclude the interacted dummy for the DKK 300,000 income bin and, hence, the coefficients $\delta_{3}$ capture the changes in income trend differentials across the income distribution relative to the change at this income level.
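The two regressions described for Panels B and C can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the dependent variable is the change in log income and that the regressions contain only a constant and the bin dummies (the names `trend_differentials`, `bins`, and the toy numbers are hypothetical). Under that assumption the pooled, fully interacted regression is saturated, so its interaction coefficients equal the period-by-period differences computed here.

```python
import numpy as np

def trend_differentials(dlog_income, income_bin, ref_bin):
    """OLS of income growth on initial-income bin dummies, omitting ref_bin.

    Returns {bin: coefficient}, i.e. income growth in each bin relative to ref_bin.
    """
    y = np.asarray(dlog_income, dtype=float)
    bins = np.asarray(income_bin)
    others = [b for b in np.unique(bins).tolist() if b != ref_bin]
    X = np.column_stack([np.ones_like(y)] + [(bins == b).astype(float) for b in others])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(others, coef[1:]))

# Toy data: initial income bins in DKK 1,000 and income growth in two periods.
bins = [250, 250, 300, 300, 350, 350]
pre_growth = [0.10, 0.12, 0.08, 0.08, 0.02, 0.04]
reform_growth = [0.10, 0.12, 0.08, 0.08, 0.06, 0.08]

beta_pre = trend_differentials(pre_growth, bins, ref_bin=300)
beta_reform = trend_differentials(reform_growth, bins, ref_bin=300)

# Change in trend differentials between periods, relative to the reference bin.
delta = {b: beta_reform[b] - beta_pre[b] for b in beta_pre}
```

In the toy data the differential for the 250 bin is unchanged across periods (the validation pattern), while the 350 bin shows a positive change (the pattern expected in an identification region).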
We estimate the changes in trend differentials for different groups and income concepts: taxable income, which corresponds to the estimates from Panel B; excluding self-employed, which excludes all individuals with income from self-employment; and broad income, which considers changes in taxable income before deductions, primarily consisting of employer-administered pension contributions. From Panel B, we see the expected downward-sloping curves for both periods, which is consistent with mean reversion being the dominant (but not necessarily the sole) feature of the underlying income process. However, on top of this overall pattern, we see a marked difference between the pre-reform and reform periods. In the identification region, where individuals experience a large increase in the marginal net-of-tax rate, the growth rate of taxable income in the reform period lies significantly above the growth rate in the pre-reform period (taking into account changes in the overall income growth rate). In contrast, we see essentially the same trend differentials in the two periods in the validation region, where individuals were largely unaffected by the tax reform.
Panel C isolates these changes in income trend differentials and is consistent with the identifying assumption of constant trend differentials as we observe no systematic changes in the validation region. 24 For the individuals in the identification region, in contrast, we observe changes in trend differentials for taxable income that are strongly increasing in initial income, and assuming that the trend differentials across the entire income distribution would have remained constant in the absence of the reform, Panel C provides causal and non-parametric estimates of the taxable income responses to the 2009-10 tax reform. This assumption appears reasonable given the constant trend differentials observed in the untreated part of the income distribution.
23 In Figures A.V and A.VI in the Online Appendix we consider income changes over 2 and 3 years to show how the 4-year effects on taxable income build up over time.
24 A standard F-test of $\delta_{3} = 0$ for taxable income in the validation region yields a p-value of 0.0012. This finding of significant changes in trend differentials is driven by the dummy for incomes around DKK 340,000. Excluding this dummy yields a p-value of 0.1272.
Finally, in Panel D of Figure 3, we translate observed changes in trend differentials into income elasticities, but instead of estimating a single aggregate elasticity, we explore the heterogeneity across the income distribution by running a 2SLS local linear estimator centered at different points throughout the identification region.
Our implementation of the 2SLS local linear estimator follows a standard local linear regression. For each point of initial income in the identification region ($h \in \{400{,}000; 450{,}000; \ldots\}$), we separately estimate a weighted 2SLS whose second-stage equation includes $D^{inc}_{it-1}$, a vector of DKK 10,000 income bin dummies. 25 In this equation we include the marginal net-of-tax rate both by itself and interacted linearly with initial income to capture both the level and slope of the income elasticity centered on h. Both of these terms are endogenous variables ($\Delta \ln \tau_{it}$, $\Delta \ln \tau_{it} (z_{it-1} - h)$), and we instrument these with the corresponding predicted between-income changes in the net-of-tax rate ($\Delta \ln \tau^{p}_{zt-1}$, $\Delta \ln \tau^{p}_{zt-1} (z_{it-1} - h)$). For weights, we first assign individuals within the (less treated) validation region to a control group with a constant weight of 1 across all estimations. In the case of the 2009-10 reform, we select the entire validation region as illustrated by the shaded gray area in Panel D. For the treated individuals in the identification region we assign, separately for each estimation, triangular weights (w) within ± DKK 50,000 of h computed as $w = \max(50{,}000 - |z - h|, 0) / 50{,}000$.
25 The income bin dummies run from DKK 250,000 and are capped at DKK 1,500,000, where the data become too thin for separate estimation of dummies. Hence, our regressions on the whole sample include 125 income dummies. Using instead a finer grid of DKK 1,000 dummies (1,250 dummies) does not change the results.
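The weighting scheme can be written out in a few lines. The sketch below implements the triangular weight formula from the text and combines it with a constant weight of 1 for the control group; the boolean mask `in_validation` marking the validation region is a hypothetical input, since the exact region boundaries are read off the figure.

```python
import numpy as np

def triangular_weights(z, h, bandwidth=50_000.0):
    """Triangular kernel weights: max(bandwidth - |z - h|, 0) / bandwidth."""
    z = np.asarray(z, dtype=float)
    return np.maximum(bandwidth - np.abs(z - h), 0.0) / bandwidth

def estimation_weights(z, h, in_validation, bandwidth=50_000.0):
    """Weight 1 for control observations, triangular weights around h for the rest."""
    w = triangular_weights(z, h, bandwidth)
    return np.where(np.asarray(in_validation, dtype=bool), 1.0, w)

# Initial incomes in DKK; the first observation is assumed to lie in the validation region.
z = np.array([300_000, 400_000, 425_000, 450_000, 500_000])
in_validation = np.array([True, False, False, False, False])
w = estimation_weights(z, h=400_000, in_validation=in_validation)
```

Treated observations exactly at h get full weight, the weight falls linearly to zero at a distance of DKK 50,000, and the control group enters every local estimation with the same weight.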
Hence, with this weighting strategy we keep the control group constant and estimate heterogeneous elasticities by moving the treatment group up through the income distribution. Using this strategy, we estimate elasticities for taxable income that are increasing in initial income from below 0.1 for incomes around DKK 400,000 to more than 0.5 for the top of the income distribution. 26
Overall, Figure 3 provides compelling evidence of behavioral responses to the 2009-10 tax reform. However, the substantial effect on taxable income could be driven by a range of factors, from "real" labor responses reflecting unobserved effort, occupational choice, hours worked etc., to avoidance or evasion. To separate out some of these factors, we start by excluding self-employed individuals, who typically have more room to change behavior through, for example, tax planning and retained earnings in the firm (see, e.g., le Maire & Schjerning, 2013). Consistent with this, we find slightly smaller effects when dropping individuals with income from self-employment, but overall the estimated trend differentials in Panel C and elasticities in Panel D are very similar to those found for the whole population. Next, we consider a broader income measure defined as our taxable income measure (personal income in Table 1) before deductions, which primarily consist of employer-administered pension contributions as discussed in Section 3. Looking at broad income, we find significantly smaller responses across the whole income distribution. Hence, most of the behavioral responses to the 2009-10 reform are likely driven by income shifting from pension contributions to taxable income. The responses in broad income are marginally significant for initial income around DKK 450,000, but insignificant at the top of the distribution. 27
26 A finding of heterogeneous elasticities will normally invalidate the elasticity point estimates when a reform affects the entire income distribution with no pure control group but only differences in treatment intensities. Hence, in cases with no pure control groups, heterogeneity analyses such as the one above become particularly important as a way of validating the estimated elasticities. This is less of an issue in our setting, where the individuals in the validation region are only treated by small tax changes.
27 Our finding that most of the behavioral responses are driven by avoidance is consistent with the findings in other papers that the scope for avoidance has significant influence on the estimated elasticities (Fack & Landais, 2016; Doerrenberg et al., 2017; Neisser, 2018). The heterogeneity in the responses in taxable income to the 2009-10 reform is consistent with high-income individuals being more likely to be active savers and, hence, reacting more to the changed saving incentives (Chetty et al., 2014).

The 2004 Tax Reform
Next, we turn to the 2004 tax reform, which we analyze following the same three steps as above. As mentioned in Section 3, the main feature of the 2004 reform was an increase in the middle tax income threshold, and in Panel A of Figure 4, we see the largest changes in marginal net-of-tax rates for personal income between DKK 290,000 and DKK 360,000. 28 We define this interval as the identification region with validation regions both above and below.
In Panels B and C of Figure 4 we examine the income trend differentials across the income distribution for the pre-reform period (1999-2003) and the reform period (2003-07). 29 Panel B reveals the expected downward-sloping pattern consistent with mean reversion, but notably we observe essentially no changes in income trends across the identification region from the pre-reform to the reform period. The only changes in trend differentials that stand out from Panel B are for initial income between DKK 200,000 and DKK 225,000, which are difficult to reconcile with being responses to the changes in taxation. The same is visible in Panel C where, in addition to taxable income, we also consider the changes excluding self-employed and changes in broad income. For broad income, we find slightly higher income trends for the top of the income distribution (above DKK 500,000), but similar to the change in income trends at the bottom, these are difficult to reconcile with the changes in taxation.
In Panel D, we translate the estimated trend differentials in Panel C into elasticities using the local linear 2SLS estimator described in equations (13)-(14) above. However, as the reform creates only a narrow identification region, it is infeasible to explore heterogeneity within this region, so we only estimate a single point elasticity. A more relevant question when estimating elasticities from the 2004 reform is who to include in the control group. Based on the discussion above, and contrary to our analysis of the 2009-10 reform, we do not include the entire validation region. Instead, we select the gray income intervals immediately adjacent to the identification region in Panel D as control groups. In this way, we avoid letting the changes in trend differentials observed at the very top and bottom of the income distribution affect the estimates, and from this we obtain a precisely estimated zero for both taxable and broad income. This result is not necessarily surprising given that the 2004 reform was smaller and, hence, less likely to prompt individuals to respond (Chetty, 2012).
28 Similar to the abolition of the middle tax in the 2009-10 reform, the increased threshold also created within-income tax variation due to the joint taxation of couples.
29 It is worth noting that the reform period for the 2004 reform overlaps with the pre-period for the 2009-10 reform. Hence, strong behavioral responses to the 2004 reform risk confounding our analysis of the 2009-10 reform, in particular if they occur gradually. However, as we show below, this is not the case.

Results from the Standard Estimation Strategy
Above, we analyzed the 2004 and 2009-10 reforms using our new approach, which allowed us to inspect the changes in trend differentials graphically and, hence, to validate the identifying assumption of constant trend differentials in the absence of tax reforms. To highlight the importance of such validation exercises, we examine the two reforms using the standard estimation strategy. More specifically, we run the 2SLS regression in equation (15) separately for the 2004 and 2009-10 tax reforms, where $D^{inc}_{it-1}$ is a vector of DKK 10,000 income bin dummies. This equation contains one endogenous variable ($\Delta \ln \tau_{it}$), which we instrument with the corresponding predicted between-income changes in the marginal net-of-tax rate ($\Delta \ln \tau^{p}_{zt-1}$). We run the regressions on the same data and consider the same four-year periods and outcomes as in Figures 3 and 4 above. We present the results in Table 4.
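A just-identified 2SLS of this kind, with one endogenous regressor, one instrument, and dummy controls, can be sketched as below. The simulated data, the parameter values, and the inclusion of a reform-period dummy among the controls are all assumptions made for illustration; only the general structure (actual tax change instrumented by the predicted change, with income bin dummies as controls) follows the description in the text.

```python
import numpy as np

def two_stage_least_squares(y, x_endog, z_instr, controls):
    """Just-identified 2SLS with one endogenous regressor.

    Solves (Z'X) beta = Z'y, where X = [x_endog, controls] and
    Z = [z_instr, controls]. Returns the coefficient on x_endog.
    """
    y = np.asarray(y, dtype=float)
    W = np.asarray(controls, dtype=float)
    X = np.column_stack([np.asarray(x_endog, dtype=float), W])
    Z = np.column_stack([np.asarray(z_instr, dtype=float), W])
    beta = np.linalg.solve(Z.T @ X, Z.T @ y)
    return beta[0]

# Simulated data (all values hypothetical).
rng = np.random.default_rng(0)
n = 20_000
bins = rng.integers(0, 10, size=n)      # initial income bin
reform = rng.integers(0, 2, size=n)     # 1 = reform period, 0 = pre-reform period
predicted = 0.17 * reform * (bins >= 5) # predicted change in the log net-of-tax rate
v = rng.normal(0.0, 0.02, size=n)       # income-driven deviation from the prediction
actual = predicted + v                  # actual change, endogenous through v
u = 0.5 * v + rng.normal(0.0, 0.05, size=n)
true_elasticity = 0.2
dlog_income = 0.05 - 0.01 * bins + 0.02 * reform + true_elasticity * actual + u

# Controls: a full set of income bin dummies plus a reform-period dummy.
controls = np.column_stack(
    [(bins == b).astype(float) for b in range(10)] + [reform.astype(float)]
)
estimate = two_stage_least_squares(dlog_income, actual, predicted, controls)
```

In this simulation the instrument varies only between income bins and periods, so the estimate recovers the elasticity from between-income variation, up to sampling noise.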
Considering first the results for the 2009-10 reform in columns (4)-(6), we estimate an average elasticity of just above 0.2 for taxable income with (column 4) or without (column 5) self-employed individuals, while the elasticity for broad income is close to zero but still positive and significant. These estimates are well in line with the results from our graphical analysis above, which is not surprising given that we found that trend differentials were constant across the entire validation region in the case of the 2009-10 reform.
Next, considering the 2004 reform in columns (1)-(3), we find elasticities that are very similar to the results from the 2009-10 reform. However, in our graphical analysis in Figure 4 we saw no changes in trend differentials in connection with changes in tax treatment. Instead, the results we obtain from the standard estimation strategy are driven by the changes in trend differentials observed at the very top and bottom of the income distribution, which are less likely to reflect behavioral responses to the tax reform. Hence, in this case, the estimates from the standard approach are most likely biased.
In Online Appendix E, we analyze two additional tax reforms and essentially arrive at the same conclusion as for the 2004 and 2009-10 reforms. For a reform in 1987, we find support for the identifying assumption of constant trend differentials and positive responses for taxable income, while for the reform in 1994, we find clear violations of the identifying assumption. Summarizing the results from the standard approach for the four major Danish tax reforms since 1980, our point estimates are broadly consistent with the estimates found by Kleven & Schultz (2014). However, as highlighted by our new approach, we need to properly validate the assumption of constant trend differentials before assigning a causal interpretation to these estimates.

Conclusion
Behavioral responses to taxes are key inputs in the design of economic policy and serve as evidence on behavioral parameters in economic models more broadly. In this paper, we revisited the identification of behavioral responses to tax reforms and developed a new approach that allows for graphical validation of key identifying assumptions and representation of treatment effects.
Considering stereotypical tax reforms, which change tax rates in one part of the income distribution, while keeping them constant in the rest of the distribution, we show that the standard estimation strategy employed by, for example, Gruber & Saez (2002) and Kleven & Schultz (2014), in essence, relies on an assumption that trend differences in income across the income distribution remain constant in the absence of reforms. Similar to the pre-trend validation of differences-in-differences studies, this identifying assumption of constant trend differentials can be validated by comparing the evolution of income in untreated parts of the income distribution over time.
We illustrate the importance of our new validation approach by studying a number of tax reforms in Denmark, with the 2004 and 2009-10 reforms as our main applications. Analyzing both reforms through the lens of the standard estimation strategy, we find very similar average elasticities on the order of 0.2 for taxable income and 0.01 for broad income. However, comparing the trend differentials for the pre-reform and reform periods, it is clear that only the results from the 2009-10 reform are likely to be causal. For this reform, we find that the income trend differentials remained stable in the untreated bottom part of the income distribution, which is consistent with the identifying assumption, while for the treated upper part of the income distribution, we find significantly higher income growth that is strongly increasing in initial income and somewhat larger in the medium run than in the short run. In contrast, we find no changes in trend differentials around the changes in tax treatment created by the 2004 reform. Instead, the elasticity estimates for this reform are driven by changes in trend differentials well within the control group, which are most likely unrelated to the reform. Once we account for these changes, we obtain a precisely estimated zero for both taxable and broad income.

Figure 1 (x-axis: Initial Income; series: Reform Period, Pre-Reform Period)
Notes: The figure shows E (△ ln z it |z it−k−1 ), i.e., the changes in log income across the income distribution for two time periods: a pre-reform period, where the tax system remains stable and a reform period, where the top tax is reduced. The negative relationship between E (△ ln z it |z it−k−1 ) and z it−k−1 is consistent with mean reversion being the dominant -but not necessarily the only -feature of the underlying income process. Under the assumption that this pattern would have remained constant (relative to the overall growth in the economy) in absence of the reform, we can identify the reform effect from the differences between the reform and pre-reform periods for the population with initial income above the top tax threshold (the identification region). In contrast, we should observe no differences for the part of the population with initial income below the top tax threshold (the validation region). Thus, comparing the reform and pre-reform periods in this part of the income distribution acts as a placebo test for the validity of the identifying assumption. Notes: The figure shows the key features of the Danish tax system from 1980-2020. Panel A shows the marginal tax rates of the three (main) income tax brackets, while Panel B shows the income thresholds. The thresholds are deflated to 2019-levels using the implied wage indexes in the tax code. Hence, changes in the thresholds reflect active policy decisions. The figure highlights the four major reforms that we analyze. USD 1 ≈ DKK 6.8. Notes: Panel A shows the predicted changes in the log net-of-tax rate (τ p it−1 ) from 2004 to 2008 (the pre-reform period) and 2008 to 2012 (the reform period). The curves show the average changes within each income bin and the shaded areas show the 10th and 90th percentile ranges. Panel B shows the estimated income trend differentials using equation (11) for 2004-08 and 2008-12 relative to the average growth rate for incomes around DKK 300,000. 
Panel C shows the estimated changes in trend differentials based on equation (12) for different samples and income concepts. The use of different income concepts only affects the dependent variable (y-axis). Initial income always refers to personal income as defined in Table 1 and is measured pre-reform in 2004 and 2008. Panel D shows the implied elasticities over the identification region using the 2SLS local linear estimation described in equation (13). The gray shaded area illustrates the interval included as the control group in the elasticity estimations. Confidence bounds are based on robust standard errors. USD 1 ≈ DKK 6.8.

Notes: Panel A shows the predicted changes in the log net-of-tax rate ($\tau^p_{it-1}$) from 1999 to 2003 (the pre-reform period) and 2003 to 2007 (the reform period). The curves show the average changes within each income bin and the shaded areas show the 10th and 90th percentile ranges. Panel B shows the estimated income trend differentials using equation (11) for 1999-2003 and 2003-07 relative to the average growth rate for incomes around DKK 250,000. Panel C shows the estimated changes in trend differentials based on equation (12) for different samples and income concepts. The use of different income concepts only affects the dependent variable (y-axis). Initial income always refers to personal income as defined in Table 1 and is measured pre-reform in 1999 and 2003. Panel D shows the implied elasticities over the identification region using the 2SLS local linear estimation described in equation (13). The gray shaded area illustrates the interval included as the control group in the elasticity estimations. Confidence bounds are based on robust standard errors. USD 1 ≈ DKK 6.8.

a) Because labor market income enters the other tax bases net of the labor market contribution, the effective tax rate on labor income equals the statutory tax rate times (1 - the labor market contribution rate).
b) The Danish EITC is treated as an itemized deduction in taxable income (TI). Hence, the tax value of the earned income tax credit (EITC) is the EITC rate times the marginal tax on taxable income.
c) The regional tax includes municipal taxes and health contributions. The regional tax rate in the table is the average across all municipalities.
d) If the sum of all regional and national tax rates (excluding the stock income tax and the labor market contribution) exceeds the specified ceiling, the top tax is adjusted downward until the marginal tax rate equals the ceiling.
e) There have been a few minor changes to the tax bases over time, related to the treatment of capital income.

Periods listed in the table header: 1999-2003 and 2004-2008 (pre-reform), 2003-2007 and 2008-2012 (reform), each spanning three columns.

Notes: The table summarizes the results of the standard estimation strategy described in equation (15) for the 2004 and 2009-10 reforms. Positive validation refers to the graphical inspection of trend differentials in Figures 3 and 4. We found support for the identifying assumption of constant trend differentials only for the 2009-10 reform; hence, only for this reform are the estimates from the standard estimation strategy likely to be causal. Robust standard errors in parentheses.

Notes: The figure shows $E(\Delta \ln z_{it} \mid z_{it-1-k})$, i.e., the change in taxable income across the lagged income distribution for two time periods: a pre-reform period, where the tax system remains stable, and a reform period, where the top tax is reduced. The flat relationship between $E(\Delta \ln z_{it} \mid z_{it-1-k})$ and $z_{it-1-k}$ in the validation region is consistent with the identifying assumption in Weber (2014). That is, the use of further lags of income to construct the simulated instruments fully resolves the problem of mean reversion. Under this assumption, the reform effect can simply be estimated as the difference in income growth between treated and untreated parts of the income distribution.
However, with a pre-reform period available, the identifying assumption can be validated even further, as illustrated in the figure.

Notes: The figure replicates Panel C from Figures 3 and 4 in the main paper using weights to control for additional covariates as described in Section 2.6. Specifically, we follow DiNardo et al. (1996) and estimate $P(D^{reform}_{it} = 1 \mid x, z_{it-k})$ separately for each bin of $z_{it-k}$ as a logistic function of dummies for being married, being self-employed, 5-year age categories, and quintiles of the capital income distribution. We weight the observations in the pre-reform period by $P(D^{reform}_{it} = 1 \mid x, z_{it-k}) / P(D^{reform}_{it} = 0 \mid x, z_{it-k})$ when estimating the changes in trend differentials based on equation (12) in the main paper.

(13). Confidence bounds are based on robust standard errors. USD 1 ≈ DKK 6.8.
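To make the reweighting step concrete, the following minimal Python sketch computes weights of the form P/(1 - P) for pre-reform observations. It is an illustration under simplifying assumptions rather than the estimation code behind the figure: the propensity is approximated by cell frequencies within each (income bin, covariate cell) pair instead of the logistic specification, and the function name `dfl_weights` and the toy cells are hypothetical.

```python
from collections import Counter


def dfl_weights(pre_cells, reform_cells):
    """Return one weight per pre-reform observation.

    Each element of pre_cells / reform_cells is a hashable label for the
    (initial-income bin, covariate cell) an observation belongs to.
    Assumption: P(reform | cell) is estimated by the cell frequency in the
    pooled sample, which stands in for the logistic model in this sketch.
    """
    n_pre = Counter(pre_cells)
    n_reform = Counter(reform_cells)
    weights = []
    for cell in pre_cells:
        p_reform = n_reform[cell] / (n_reform[cell] + n_pre[cell])
        weights.append(p_reform / (1.0 - p_reform))  # P / (1 - P)
    return weights


# Hypothetical toy data: one income bin, two covariate cells.
pre = [("300k", "married")] * 2 + [("300k", "single")] * 2
reform = [("300k", "married")] * 3 + [("300k", "single")] * 1
w = dfl_weights(pre, reform)

# Weighted share of married observations in the pre-reform period,
# compared with the unweighted share in the reform period.
married_share_pre = sum(wi for wi, c in zip(w, pre) if c[1] == "married") / sum(w)
married_share_reform = sum(c[1] == "married" for c in reform) / len(reform)
print(married_share_pre, married_share_reform)
```

After reweighting, the covariate composition of the pre-reform period within the income bin matches that of the reform period, which is the purpose of the weights in this type of exercise.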

B Using Lagged Initial Income to Assign Treatment Status
In this section, we apply the Weber (2014) version of our new approach described in Section 2.6 to the tax reform in 2009-10. In Figure A.VII, we investigate the income trend differentials across the income distribution in the pre-reform (2004-08) and reform (2008-12) periods, similar to Panel B in Figure (A.V) in the main paper, but using further lags of taxable income ($z_{it-k}$) to group individuals. Naturally, this grouping requires additional years of data leading up to the pre-reform period.
Formally, we run a regression similar to equation (11) separately for each period $t$, where $D^{inc}_{it-k}$ is a vector of (lagged) initial income bin dummies measured in $t-k$ for $k \in \{1, 2, 3, 4\}$. Plotting $\beta^t_1$ for the two periods produces the empirical equivalent of our stylized Figure A.I. 30 Consistent with Weber (2014), the mean reversion problem becomes less pronounced when using further lags of initial income: the overall slope of both curves becomes flatter and, as Table A.I shows, the 2SLS estimates without income controls become less negative. However, even with a lag length of three years, we still find differences in income trends across the income distribution in our case.
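As an illustration of what such a bin-dummy regression delivers, the sketch below computes mean log-income growth by initial-income bin relative to a reference bin, and then the change in these differentials between two periods. It assumes that the regression is saturated in bin dummies with one reference bin omitted, in which case the coefficients equal differences in bin means. Equation (11) is not reproduced in this appendix, so the function names, bin width, and toy numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def trend_differentials(initial_income, dlog_income, bin_width, ref_bin):
    """Mean log-income growth per initial-income bin, relative to ref_bin.

    Regressing growth on a full set of bin dummies with ref_bin omitted
    returns the same numbers as the dummy coefficients.
    """
    groups = defaultdict(list)
    for z0, growth in zip(initial_income, dlog_income):
        bin_start = int(z0 // bin_width) * bin_width
        groups[bin_start].append(growth)
    ref_growth = mean(groups[ref_bin])
    return {b: mean(g) - ref_growth for b, g in sorted(groups.items())}


# Hypothetical toy data (DKK): the same initial incomes in both periods.
z0 = [260_000, 270_000, 310_000, 320_000, 410_000, 420_000]
growth_pre = [0.06, 0.04, 0.03, 0.01, -0.01, -0.03]
growth_reform = [0.06, 0.04, 0.03, 0.01, 0.02, 0.00]

pre = trend_differentials(z0, growth_pre, 50_000, ref_bin=300_000)
reform = trend_differentials(z0, growth_reform, 50_000, ref_bin=300_000)

# Change in trend differentials from the pre-reform to the reform period.
change = {b: reform[b] - pre[b] for b in pre}
print(change)
```

In this toy example, the change is zero in the two lower bins (the placebo pattern one would hope for in a validation region) and positive in the top bin.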
Looking at the regressions including income controls in Table A.I, where we use the pre-reform period to control for the (remaining) income trend differentials, we find positive elasticity estimates of the same order as in our baseline approach, but increasing in the lag length. Mechanically, this pattern emerges because the first-stage estimates fall at a faster rate than the reduced-form estimates. More conceptually, we can think of this pattern as being driven by a changing set of compliers. The compliers in these estimations are the individuals who stay within their income bracket and, hence, are treated as prescribed by the tax reform. By using further lags, we select a shrinking set of individuals who stay within their tax bracket over longer periods.

30 For each lag length ($k$), we select the sample as individuals who have taxable income above DKK 250,000 in both $t-1$ and $t-k$. In principle, selecting individuals based on their income in multiple years could in itself reduce the problem of mean reversion. In our case, the changing samples only have minor effects on the estimates.

Table 1. Confidence bounds are based on robust standard errors. USD 1 ≈ DKK 6.8.

Notes: The table summarizes the estimated elasticity of taxable income from the 2009-10 reform using the 2SLS estimator similar to equation (15) in the main paper, but using lags of initial taxable income ($z_{it-k}$) when computing the predicted changes in marginal net-of-tax rates. Panel A shows the estimated elasticities using only the reform period and no income controls. Panel B shows the estimated elasticities using both the pre-reform and reform periods and including controls for (lagged) initial income. For each lag length ($k$), we select the sample as individuals who have taxable income above DKK 250,000 in both $t-1$ and $t-k$. The changing sample only has minor effects on the estimates. Robust standard errors in parentheses.
Kleven, 2020) is a prominent example of this approach. Splitting the population directly on x it−k is arguably the most transparent and elegant estimation strategy, but it typically requires that the tax variation is a simple function of a few variables (as, e.g., in the case of the EITC). However, in many cases the within-income tax variation is created by changes in the treatment of various underlying income sources (e.g., capital income), specific itemized deductions, jointness in the taxation of households etc., and in these cases, the variation is likely to be too muddy to be applied in a standard DiD.
The second option uses a tax simulator to assign reform-driven changes in marginal net-of-tax rates to individuals both in the actual reform period and in pre-reform periods as placebo changes (i.e., assigning marginal net-of-tax rates as if the reform had been implemented in earlier periods) and compares the estimates from the actual reform to the placebo estimates from pre-reform periods. Using a tax simulator in this way is a "one-size-fits-all" approach that works irrespective of the source of the tax variation, but only by maintaining a black-box element in the estimation, where it is not clear what part of the reform variation is driving the estimates.

Disperse Between-Income Tax Variation
In cases where tax reforms create between-income variation in marginal tax rates that is sufficiently dispersed throughout the income distribution, the variation may come close to being uncorrelated with initial income. The extreme case is changes in marginal tax rates that are solely a function of income, but where the changes are distributed randomly across the income distribution in a way that fulfills the independence assumption (6) from the basic IV strategy. Hence, this source of variation allows for a DiD analysis similar to the case with within-income tax variation. One example of this is Saez (2003), who studies the large "bracket creeps" in the US in the late 1970s. In this case, treatment and control groups were distributed throughout the income distribution instead of being concentrated at the top, as in our example in the main text. 33 Determining whether a reform creates sufficiently dispersed between-income variation can be done similarly to the standard pre-trend validation in DiD studies. If the variation fulfills independence assumption (6), we should not see any differences in income trends between treated and untreated individuals in the period leading up to the reform.

33 In the case of Saez (2003), the changes in marginal tax rates turned out to be correlated with income. Thus, Saez (2003) still includes controls for initial income. However, the particular source of variation does enable Saez (2003) to replace the requirement of a panel with at least two periods with an assumption that the relationship between the change in income and initial income, $g(z_{it-1})$ in equation (9), is sufficiently smooth. Intuitively, Saez (2003) compares a group of individuals, who are pushed into another bracket by

Estimates Using Within-Income Tax Variation from the 2004 and 2009-10 Tax Reforms
Mirroring the between-income estimates presented in Section 4 of the main paper, we estimate income elasticities using the within-income tax variation created by the 2004 and 2009-10 tax reforms. We start by showing the median and selected inter-percentile ranges (Panels A and C) and the within-income tax variance (Panels B and D) of the changes in the marginal net-of-tax rates created by the two tax reforms. For the 2009-10 reform, the median changes closely follow the changes in statutory tax rates, which drive the between-income variation used in the main paper. Around this level, however, we see a dispersion of around 10 log points for income levels below DKK 500,000, with particularly concentrated within-income variation between DKK 350,000 and DKK 450,000.
Similarly, for the 2004 reform, the median changes also closely follow the changes in the statutory tax rates, which drive the between-income variation used in the main paper. 34 Next, in Figure A.IX and Table A.II, we show the 2SLS elasticity estimates for taxable income (Panels A and C) and broad income (Panels B and D).

33 (continued) high inflation with individuals with income just below and just above, who remain in their bracket. Thus, the counterfactual income growth of the treatment group can be estimated as a weighted average of the income growth of the two "control" groups.

34 The dispersion of the changes in net-of-tax rates for both the 2004 and 2009-10 tax reforms is predominantly created by the fact that the middle tax was based on couples' total income. Hence, differences in spousal incomes create differences in the income level at which individuals start paying the middle tax, and in whether they are affected by the increase in the middle tax threshold in the 2004 reform and the abolition of the tax in the 2009-10 reform. A more tailored estimation strategy would exploit this jointness more explicitly instead of the "one-size-fits-all" approach applied in this section, which also captures tax variation created by municipal tax changes and other minor sources.

Specifically, we run
using only a single period (e.g., the reform period 2008-12) and where we instrument the endogenous variable ($\ln \tau_{it}$) with the within-income tax variation $\Delta \ln \tau^p_{it-1} - \Delta \ln \tau^p_{zt-1}$. 35 In Figure A.IX, we explore heterogeneity across the income distribution by running the estimation separately for 10 equally sized groups of the population, while in Table A.II we estimate average elasticities. The exercise reveals significantly higher elasticities primarily for the lower income groups for the 2009-10 tax reform, while for the 2004 reform we find the highest elasticities for incomes around DKK 300,000. 36

To validate these estimates we implement the placebo exercise based on the tax simu-

35 In practice, we could just as well have instrumented with the total tax variation since, with only a single period, the saturated income controls (DKK 10,000 bins) soak up all between-income variation.

36 The 2SLS estimator implicitly weights heterogeneous treatment effects by their relative variance in treatment intensity and, hence, for the 2009-10 reform the estimator puts more weight on the relatively low estimates for incomes between DKK 350,000-450,000 when computing the aggregate elasticity.

37 In cases where the pre-period also contains variation in tax rates, the placebo exercise is only informative if the reform changes in marginal tax rates are uncorrelated with the pre-reform changes. Hence, in Table A.II we also include the results from a regression of the placebo reform changes in tax rates on the actual changes in tax rates in the pre-reform period. For the 2009-10 reform, we only find a very small estimate due to the stability of the tax system in the pre-reform period. For the 2004 reform, we find a negative estimate of -0.195.

For the 2004 reform, we find an elasticity for taxable income of -0.038 for the reform
period and -0.088 using the placebo period. Examining the elasticity estimates across the income distribution, we see that these negative estimates are driven by the top of the income distribution. Furthermore, we see that the reform estimates closely follow the placebo estimate except for incomes in the range of DKK 300,000-350,000.

Periods: 2003-2007, 2003-2007, 1999-2003, 1999-2003, 1999-2003
Periods: 2008-2012, 2008-2012, 2004-2008, 2004-2008, 2004-2008
Observations: 2,240,833; 2,240,832; 2,097,862; 2,097,861; 2,097,862

Notes: The table shows the details of the 2SLS estimates based on equation (18) for the whole sample in Figure A.IX. Robust standard errors in parentheses.
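The remark that, with a single period and saturated income-bin controls, instrumenting with the total or with the within-income tax variation gives the same estimate can be checked in a small example. The sketch below is an illustration only: it implements a just-identified 2SLS estimator in which the bin dummies are partialled out by within-bin demeaning (a Frisch-Waugh argument), and all function names and data are hypothetical.

```python
from collections import defaultdict


def demean_within_bins(values, bins):
    """Subtract the bin mean from each value (partials out bin dummies)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for v, b in zip(values, bins):
        totals[b] += v
        counts[b] += 1
    return [v - totals[b] / counts[b] for v, b in zip(values, bins)]


def iv_with_bin_controls(y, x, z, bins):
    """Just-identified 2SLS of y on x with instrument z and bin dummies."""
    y_t, x_t, z_t = (demean_within_bins(v, bins) for v in (y, x, z))
    numerator = sum(a * b for a, b in zip(z_t, y_t))    # reduced form
    denominator = sum(a * b for a, b in zip(z_t, x_t))  # first stage
    return numerator / denominator


# Hypothetical toy data: two income bins with three individuals each.
bins = [0, 0, 0, 1, 1, 1]
z_total = [0.00, 0.02, 0.04, 0.05, 0.08, 0.11]  # predicted change in log net-of-tax rate
x = [0.01, 0.01, 0.03, 0.02, 0.05, 0.05]        # actual change in log net-of-tax rate
bin_effect = {0: 0.03, 1: -0.01}                # income-trend differential by bin
y = [0.2 * xi + bin_effect[b] for xi, b in zip(x, bins)]  # elasticity of 0.2 by construction

z_within = demean_within_bins(z_total, bins)
est_total = iv_with_bin_controls(y, x, z_total, bins)
est_within = iv_with_bin_controls(y, x, z_within, bins)
print(est_total, est_within)
```

Both calls recover the elasticity built into the toy data, because the bin controls remove the between-bin component of the instrument in either case.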

D Estimating Income and Substitution Effects
In the analysis in the main paper, we disregarded the distinction between income and substitution effects of tax changes and simply sought to estimate what, in practice, are uncompensated elasticities. However, many papers in the empirical tax literature (including Gruber & Saez, 2002 and Kleven & Schultz, 2014) try to split the behavioral responses into income and substitution effects, and in this appendix, we revisit these efforts in light of our new approach.
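Most papers motivate the study of income effects by linearizing the (net-of) tax schedule using virtual income. In this case, let the uncompensated supply of taxable income ($z$) be defined as $z(\tau, \tilde{y})$, where $\tau$ is the marginal net-of-tax rate and $\tilde{y} \equiv z(1 - \tau) - T(z) + y$ is virtual income, with $T(z)$ being the tax function and $y$ being other (untaxed) sources of income. Following the derivations in Gruber & Saez (2002), we can write the uncompensated change in $z$ following a reform that changes $\tau$ and $\tilde{y}$ as

\[ \frac{dz}{z} = \varepsilon^u \frac{d\tau}{\tau} + \eta \frac{d\tilde{y}}{\tilde{y}}, \tag{19} \]

where $\varepsilon^u = \frac{\partial z}{\partial \tau} \frac{\tau}{z}$ is the uncompensated elasticity and $\eta = \frac{\partial z}{\partial \tilde{y}} \frac{\tilde{y}}{z}$ is the elasticity wrt. virtual income. Further, using the Slutsky equation $\varepsilon^u = \varepsilon^c + \eta \tau z / \tilde{y}$, we can rewrite equation (19) as

\[ \frac{dz}{z} = \varepsilon^c \frac{d\tau}{\tau} + \eta \frac{z \, d\tau + d\tilde{y}}{\tilde{y}} = \varepsilon^c \frac{d\tau}{\tau} - \eta \frac{dT(z)}{\tilde{y}}, \tag{20} \]

where $dT(z)$ is the mechanical effect of the reform on the individual's tax liability, so that $-dT(z) = z \, d\tau + d\tilde{y}$ is the mechanical effect on the individual's after-tax income.
This equation describes the behavioral responses we should expect to see across the income distribution depending on the "structural" parameters $\varepsilon^c$ and $\eta$.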
Next, let us consider a tax reform that reduces the top tax in a similar way to the reform analyzed in Section 2 of the main paper. As the reform only reduces the marginal tax rate above the top tax cutoff $K$, $d\tau > 0$ is constant for all top taxpayers and zero for everyone else. Thus, from equation (20) we should expect a positive and constant substitution effect for all top taxpayers. In contrast, $dT(z) = d(1 - \tau)(z - K)$, which is a decreasing function (as $d(1 - \tau) < 0$) of the part of the taxpayer's income that exceeds the top tax threshold. Thus, we should expect an income effect that starts at zero for taxpayers just above the top tax threshold and gradually becomes more negative for higher incomes. Combining the income and substitution effects, we obtain the largest predicted behavioral responses for individuals just above the top tax threshold, as these are subject only to the positive substitution effect. The effect declines for higher incomes due to the increasing income effect, as illustrated in Figure A.X.
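The predicted pattern can be traced out numerically. The sketch below assumes that the response takes the form $dz/z = \varepsilon^c \, d\tau/\tau - \eta \, dT(z)/\tilde{y}$ with a negative income elasticity $\eta$, that virtual income is the same constant for all top taxpayers, and that $d\tau/\tau$ is evaluated at the pre-reform net-of-tax rate. The function name and all parameter values are hypothetical and chosen only for illustration.

```python
def predicted_response(z, threshold, top_rate, rate_cut, eps_c, eta, virtual_income):
    """Predicted relative income change dz/z after a cut in the top tax rate.

    Assumed form: dz/z = eps_c * d_tau / tau - eta * dT(z) / virtual_income,
    where tau is the marginal net-of-tax rate above the threshold.
    """
    if z <= threshold:
        return 0.0  # untreated: no change in marginal rate or tax liability
    tau = 1.0 - top_rate                     # pre-reform net-of-tax rate
    d_tau = rate_cut                         # increase in the net-of-tax rate
    d_tax = -rate_cut * (z - threshold)      # mechanical change in tax liability
    substitution_effect = eps_c * d_tau / tau
    income_effect = -eta * d_tax / virtual_income
    return substitution_effect + income_effect


# Hypothetical parameters (amounts in DKK).
params = dict(threshold=400_000, top_rate=0.60, rate_cut=0.06,
              eps_c=0.10, eta=-0.05, virtual_income=100_000)
incomes = [300_000, 450_000, 500_000, 600_000]
responses = [predicted_response(z, **params) for z in incomes]

for z, r in zip(incomes, responses):
    print(f"{z:>9,d}  {r:+.4f}")
```

Under these assumptions, the response is zero below the threshold, largest just above it, and declining in income further up, which is the shape described for Figure A.X.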
Econometrically, the pattern in Figure A.X is what we would look for in the data, with the substitution effect being identified from the estimated treatment effect close to the top tax threshold and the income effect identified from the slope of the treatment effect above it. However, the presence of income effects is just one among many potential reasons why treatment effects may vary over the income distribution. Behavioral responses to tax changes may, for example, be larger at the top of the income distribution due to income shifting, or they may be smaller close to the tax threshold due to optimization frictions such as imperfect knowledge of the exact location of the threshold. Trying to fit these heterogeneous treatment effects into a framework with constant substitution and income elasticities is likely to yield misleading results. For example, blindly fitting the estimated elasticities from the 2009-10 reform in Figure 3 in the main paper to such a framework would yield a small substitution elasticity and a large but positive income elasticity. Hence, it is not surprising that the income effects estimated in the empirical tax literature do not align with the estimates using other sources of variation, such as, for example, lottery winnings (see, e.g., Imbens et al., 2001 and Cesarini et al., 2017).
Other tax reforms, such as changes in tax thresholds, may create variation that is better suited to separating income and substitution effects. However, estimating the heterogeneity of treatment effects non-parametrically, as we do in Figure 3, preserves the transparency of the estimation and avoids functional form assumptions of constant substitution and income elasticities. 38

38 In addition to other types of reforms, other estimators may also be better suited to distinguishing between income and substitution effects. For example, the bunching estimator uses only variation locally around a kink point and, hence, in principle identifies only the compensated behavioral responses (Saez, 2010; Kleven, 2016). However, the bunching estimator is not immune to bias from other sources of response heterogeneity (He et al., 2020).

Notes: The figure illustrates the predicted behavioral responses from a reduction in the top tax rate based on equation (20), assuming constant substitution ($\varepsilon^c$) and income ($\eta$) elasticities.

E Examining Additional Tax Reforms
In this appendix, we apply our new approach to two additional tax reforms in Denmark. 39 Figure 2 in the main paper shows the marginal tax rates on labor income over time, and in addition to the 2004 and 2009-10 tax reforms studied in the main paper, it highlights the 1987 and 1994 reforms. We analyze these two reforms in the same three steps, using the four key graphs and estimating income elasticities with the standard approach, similarly to our analysis of the 2004 and 2009-10 reforms in the main paper.

The 1987 Tax Reform
The 1987 tax reform changed the Danish tax system in multiple dimensions, most notably by introducing a dual tax system with lower tax rates on capital income (see also Gruber et al., forthcoming). In addition, the reform reduced the marginal tax rates on labor income for taxpayers in the top and middle tax brackets, while increasing them slightly in the bottom bracket, as illustrated in Figure 2 in the main paper. Hence, at first glance the reform variation resembles the variation created by the 2009-10 reform, and similar to our analysis in the main paper of the 2004 and 2009-10 reforms, we analyze the 1987 reform in four key graphs in Figure A.XI.
We start in Panel A by showing the distribution of changes in the marginal net-of-tax rates across the income distribution in the pre-reform (1982-1986) and reform (1986-1990) periods. 40 From this panel, we notice two features that make the 1987 reform less ideal.
First, the pre-period also contains considerable tax variation. Second, while both the within- and between-income tax variation is considerably larger in the reform period, the between-income variation is more dispersed across the income distribution, with less clearly defined untreated groups. To make progress, we define the validation region as taxable incomes below DKK 260,000, as this region experiences somewhat similar changes in marginal net-of-tax rates in the two periods. Consequently, we define the identification region as taxable income above DKK 260,000.

39 An overview of the Danish tax reforms since 1980 can be found on the homepage of the Ministry of Taxation (in Danish).

40 For the 1987 reform, we use legal taxable income as defined in Table 1 in the main paper as initial income. Taxable income was the main tax base for labor income in our two initial years (1982 and 1986). Personal income only became the main tax base for labor income after the reform.
Panel B plots the differences in income growth across the income distribution with the change for incomes around DKK 225,000 normalized to zero. Despite the less ideal tax variation, the graph reveals the expected downward sloping pattern, which is consistent with mean reversion. In the validation region, the observed income trends largely follow the same pattern in the reform period as in the pre-reform period, while for incomes above DKK 260,000 we observe significantly higher income growth compared to the pre-reform period. The same is visible for taxable income in Panel C, which shows the estimated changes in income trend differentials from 1982-86 to 1986-90. Looking at broad income, we observe the reverse, with significantly negative changes in income trends in the identification region.
In Panel D, we translate the estimated trend differentials in Panel C into elasticities using a local linear 2SLS estimator similar to the method used in Section 4 in the main paper. In contrast to the 2009-10 reform, we find more homogeneous elasticities for taxable income across the income distribution, but the implied elasticities for broad income are negative.

The 1994 Tax Reform
The 1994 reform followed the same direction as the 1987 reform by reducing marginal tax rates for labor income and widening tax bases. From Panel A in Figure A.XII, we see that, on average, the reform increased the marginal net-of-tax rate by around 12 log points relative to the pre-reform period for incomes above DKK 325,000, while the changes in marginal tax rates for income below were similar in the reform and pre-reform periods (keeping in mind the minor differences in treatment for individuals with income below DKK 300,000). 41 Hence, we define the validation region as personal income between DKK 250,000 and DKK 325,000, and the identification region as above DKK 350,000.
41 Note also that the changes in marginal tax rates were phased in over a number of years, as illustrated in Figure 2 in the main paper.

Panel B plots the income trend differentials across the income distribution with the change for incomes around DKK 300,000 normalized to zero. The graph reveals the expected downward sloping pattern consistent with mean reversion, but we find no evidence of an upward shift in income growth for individuals in the identification region.
In fact, we find the opposite, with the 1993-97 curve lying below the pre-reform 1989-93 curve. The same is visible in Panel C, which shows the changes in income trend differentials from 1989-93 to 1993-97. In this panel, we do find higher growth for broad income in the identification region, but only for individuals with relatively high initial income. In Panel D, we translate the estimated trend differentials in Panel C into elasticities using a local linear 2SLS estimator similar to the method used in Section 4 in the main paper. Consistent with Panel C, we find heterogeneous and negative estimates for taxable income and heterogeneous and positive estimates for broad income.
Overall, Panels B and C in Figure A.XII raise doubts about the validity of the identifying assumption of constant trend differentials in the case of the 1994 reform, as we see significant changes in trend differentials in the validation region, and as the overall pattern of changes in Panel C appears unrelated to the tax variation.

Results From the Standard Approach
Above, we have analyzed the 1987 and 1994 reforms using our new approach to validate the identifying assumptions by graphically inspecting the changes in trend differentials from the pre-reform to the reform periods. Parallel to the main paper, we also examine the reforms using the standard approach by running equation (15) in the main paper separately for the two reforms. Table A.III presents the results.
Considering first the results from the 1987 reform in columns (1)-(3), we estimate an average elasticity of 0.094 for taxable income with (column 1) and without (column 2) the self-employed. Considering a broader income measure, we find a negative average elasticity (column 3). Turning to the 1994 reform in columns (4)-(6), we find a negative estimate for taxable income with (column 4) and without (column 5) the self-employed. Considering a broad income measure, we find a positive elasticity estimate. However, these estimates are less likely to be causal given our inspection of the trend differentials above.

Notes: Panel A shows the predicted changes in the log net-of-tax rate ($\tau^p_{it-1}$) from 1982 to 1986 (the pre-reform period) and 1986 to 1990 (the reform period). The curves show the average changes within each income bin and the shaded areas show the 10th and 90th percentile ranges. Panel B shows the estimated income trend differentials using equation (11) for 1982-86 and 1986-90 relative to the growth rate for income around DKK 260,000. Panel C shows the estimated changes in trend differentials based on equation (12) for different samples and income concepts. The use of different income concepts only affects the dependent variable (y-axis). We always use taxable income as initial income (x-axis), as taxable income most accurately determines who is treated by the reform. Initial income is measured in 1982 and 1986 and thus is pre-reform. Panel D shows the implied elasticities over the income distribution estimated using the 2SLS local linear estimation described in equation (13). Confidence bounds are based on robust standard errors. USD 1 ≈ DKK 6.8.

Notes: Panel A shows the predicted changes in the log net-of-tax rate ($\tau^p_{it-1}$) from 1989 to 1993 (the pre-reform period) and 1993 to 1997 (the reform period). The curves show the average changes within each income bin and the shaded areas show the 10th and 90th percentile ranges.
Panel B shows the estimated income trend differentials using equation (11) for 1989-93 and 1993-97 relative to the growth rate for income around DKK 300,000. Panel C shows the estimated changes in trend differentials based on equation (12) for different samples and income concepts. The use of different income concepts only affects the dependent variable (y-axis). We always use taxable income as initial income (x-axis), as taxable income most accurately determines who is treated by the reform. Initial income is measured in 1989 and 1993 and thus is pre-reform. Panel D shows the implied elasticities over the income distribution estimated using the 2SLS local linear estimation described in equation (13). Confidence bounds are based on robust standard errors. USD 1 ≈ DKK 6.8.

Periods listed in the table header: 1982-1986 and 1989-1993 (pre-reform), 1986-1990 and 1993-1997 (reform), each spanning three columns.

Notes: The table summarizes the results from the standard estimation approach described in equation (15) for the 1987 and 1994 reforms. Robust standard errors in parentheses.