Treatment-effect identification without parallel paths

Imagine a region suffering from a widening income gap that becomes eligible for a generous transfer programme (the treatment). Imagine that a difference-in-differences (DD) analysis, i.e. a before-and-after comparison of the income-level difference, shows that the handicap has risen. Most observers would conclude that the policy is ineffective. But second thoughts are needed, because DD rests heavily on the validity of a key assumption: parallel paths in the absence of treatment, an assumption that is often violated. To cope with this problem, economists traditionally include polynomial (linear, quadratic...) trends among the regressors, and estimate the treatment effect as a one-off trend shift. In practice that strategy does not work very well because, inter alia, the estimation of the trend uses post-treatment data. What is needed is a method that i) uses pre-treatment observations to capture linear or non-linear trend differences, and ii) extrapolates these to compute the treatment effect. This paper shows how this can be achieved using a fully-flexible version of the canonical DD equation. It also contains an illustration using data on a 1994–2006 EU programme that was implemented in the Belgian province of Hainaut.
JEL: C21, R11, R15, O52


Introduction
This paper deals with how to properly evaluate the impact of convergence policies like Objective 1-Hainaut. At its core lies a methodological proposal. But, before turning to its full exposition, here are a few words about Objective 1 and the province of Hainaut in Belgium.
Objective 1-Hainaut is an example of a European-Union (EU)-funded transfer policy aimed at helping European regions reduce their socio-economic handicap. Such policies have a relatively long history. The underpinning idea was present in the preamble to the Treaty of Rome in 1957, and was further emphasised in the 1980s with the entry of Greece, Portugal and Spain. In 1987, with the Single European Act, the EU received explicit competence for undertaking a regional policy aimed at ensuring convergence. Over the decades, a growing political concern for the so-called "regional problem" has meant that a considerable, and increasing, amount of resources has been spent to mitigate regional income disparities.¹ Since the mid-1980s, the importance of EU development/convergence policies has not ceased to increase. In budgetary terms, the policies have grown from representing a mere 10% of the EU budget and 0.09% of the EU-15 GDP in 1980, to more than one third of the budget and around 0.37% of the EU GDP on average over the period 1998-2001 (Rodríguez-Pose & Fratesi, 2003). The policies have become, after the Common Agricultural Policy (CAP), the second largest policy area in the EU. Also, every recent step towards greater economic integration at EU level has been accompanied by measures aimed at supporting financially the lagging countries or regions. For instance, the decision in the Maastricht reform to create the Single European Currency was tied in with the establishment of the Cohesion Fund, to alleviate the burdens that transition to EMU would impose on the less developed territories.
After the reform, more than two thirds of all Structural Fund expenditure has been concentrated in the so-called Objective 1 regions. These are territories whose GDP per capita, measured in purchasing power standards (pps), is less than 75% of the EU average. In the 1990s, the list comprised 64 NUTS 2 regions² (Tondl, 2007), one of them being Hainaut in Wallonia/Belgium (Figure 1). The Belgian province benefited from Objective 1 money between 1994 and 1999, and from 2000 to 2006 it also benefited from the "phasing out" programme.
¹ The European Commission's focus on regional disparities has been paralleled by a renewed academic interest, both theoretical and empirical, in the economic analysis of growth and (non) convergence. From the work of Romer (1986, 1990) and Lucas (1988), a growing body of literature, known as 'new growth theories', has started to question the optimistic predictions of the traditional neoclassical model laid out by Solow (1956), which leaves little or no role to regional/convergence policy.
Yet, despite their rising macroeconomic importance, questions are being raised about the capacity of European development/convergence policies in general, and of policies targeted at Objective 1 regions in particular, to achieve greater economic and social cohesion and to reduce income gaps. These questions stem from rather mixed evidence about convergence following implementation (Magrini, 1999). In that context, it is somewhat surprising that there are very few ex post economic evaluation studies³ of the monetary benefits of Objective 1. More precisely, there are very few papers answering questions such as "what would be the level of income per head in region X had it not benefited from Objective 1 money?". Along the same line, and in contrast with what economists and econometricians have done to evaluate other types of policy interventions (higher minimum wages, employment subsidies, active labour-market or social policies…), very little work has been done using microdata, in a quasi-experimental setting, to evaluate the effectiveness of Objective 1 (or other EU policies aimed at fostering convergence across regions or countries). In a sense, this paper aims at filling that void. This said, at its core lies a methodological discussion of what can (or cannot) be achieved with the canonical difference-in-differences (DD) estimator, and how best to address its limitations.
DD is a statistical technique commonly used in microeconometrics (Angrist & Krueger, 1999) that mimics an experimental research design using observational data, by studying the different evolution of 'treated' vs 'control' groups in a (quasi) natural experiment. It calculates the effect of a treatment (e.g. Objective 1) on an outcome (e.g. income per capita) by comparing i) the average change over time in the outcome variable for the treatment group (e.g. the municipalities of Hainaut), to ii) the average change over time for the control group (e.g. the municipalities forming the rest of Belgium). But the validity of that method rests heavily on the parallel-path assumption: in the absence of treatment (and in particular before its inception) the (average) outcome-level difference between the treated and the control entities (municipalities hereafter) must be time-invariant, so that the observation of a statistically significant change of the pre-treatment difference after the treatment's inception can be ascribed to the latter. Key to this paper is the idea that, whenever data permit, one should go beyond the canonical DD model and the parallel-path assumption underpinning its capacity to properly identify a treatment's effect. It is also that this should be done not simply via the addition of a polynomial (linear, quadratic…) time trend to the canonical DD model, as most authors do.
² [...] countries for statistical purposes. The standard is developed and regulated by the EU, and thus only covers the member states of the EU in detail.
³ There are several macroeconomic models that have been used to assess the potential impact of EU funds on economic growth (e.g. the HERMIN model). All these models estimate positive growth effects from cohesion spending, but their size changes depending on the theoretical assumptions upon which the model is based.
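The canonical comparison described above can be illustrated with a minimal numerical sketch. The function name `dd1` and the income figures are hypothetical and serve only to show the mechanics (change for the treated minus change for the controls); they are not taken from the paper's data.

```python
import numpy as np

def dd1(treated_before, treated_after, control_before, control_after):
    """Canonical 2x2 difference-in-differences on group averages."""
    change_treated = np.mean(treated_after) - np.mean(treated_before)
    change_control = np.mean(control_after) - np.mean(control_before)
    return change_treated - change_control

# Hypothetical per-head incomes (euros) for a handful of municipalities.
treated_before = [9000.0, 9500.0, 10000.0]
treated_after = [9900.0, 10400.0, 10900.0]
control_before = [11000.0, 12000.0]
control_after = [12300.0, 13300.0]

effect = dd1(treated_before, treated_after, control_before, control_after)
print(effect)  # -400.0: the treated group gained 900, the controls 1300
```

A negative value only signals a causal loss if the two groups would have evolved in parallel without treatment, which is precisely the assumption questioned in this paper.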
A more promising avenue is to i) estimate the generalized, fully-flexible DD model proposed by Mora & Reggio (2012), and ii) use it to account (and correct) for the absence of parallel paths, using only pre-treatment observations. When data contain 2 or more pre-treatment periods, it is easy to verify whether parallel paths hold. And quite often they do not. As said above, what most authors do when confronted with that problem is to augment the canonical DD model (that contains a time dummy, a treatment dummy and the interaction between these two) with a polynomial (linear, quadratic…) time trend, and to estimate the treatment effect as a one-off shift of that trend (e.g. Friedberg, 1999; Autor, 2003; Besley & Burgess, 2004). In practice, that strategy does not work very well because, inter alia, the estimation of the trend uses post-treatment data. Wolfers (2006), for instance, explains that Friedberg's (1999) work on the legalisation of divorce is a case in point. Friedberg controls for diverging trends between treated and control US states using a sample that covers only one year before treatment and many years after. Her estimates of the state trends rely almost completely on post-treatment developments, and absorb most of the treatment's effect.
[...] DD[1] focuses on the evolution of outcome-level differences, DD[2] tracks the evolution of outcome-growth differences, and DD[3] that of outcome-acceleration differences. The other key feature of the Mora & Reggio framework is that it solves the problem identified by Wolfers (2006) with polynomial time-trend corrected DD. We show hereafter that this is because, unlike what is done by authors resorting to polynomial time-trend corrections, only pre-treatment observations are used to capture trend differences, and because the estimation of the treatment effect rests on a simple extrapolation of these pre-treatment trends.
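The point attributed to Wolfers can be illustrated with a stylised, noise-free simulation. It is a sketch under invented assumptions (one pre-treatment period, ten post-treatment periods, an effect that builds up linearly), not a replication of any cited study. With so little pre-treatment information, the fitted linear trend is identified from post-treatment data and soaks up the whole effect.

```python
import numpy as np

# Hypothetical treated-minus-control gap: one pre-treatment period (t = 0),
# then a treatment effect that builds up by 0.5 per period from t = 1 to t = 10.
t = np.arange(11, dtype=float)
post = (t >= 1).astype(float)
gap = 0.5 * t * post  # zero gap before treatment, growing effect afterwards

# Trend-corrected DD: gap_t = a + b * t + c * post_t, fitted on ALL periods.
X = np.column_stack([np.ones_like(t), t, post])
a, b, c = np.linalg.lstsq(X, gap, rcond=None)[0]

# The true effect at the last horizon is the gap change since t = 0.
true_effect_at_10 = gap[-1] - gap[0]
print(b, c, true_effect_at_10)
```

In this constructed case the estimated shift `c` is numerically zero and the trend slope `b` is 0.5, although the gap has moved by 5 units at the last horizon: the "trend" is nothing but the treatment effect.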
This said, the reader should be aware that the economic efficiency criteria associated with the different DD estimators vary dramatically. In the context of a deprived region receiving financial aid, using DD[1] as a treatment-evaluation method means a focus on the reduction of the initial income-level handicap of that region. Under DD[2] the requirements are intrinsically milder: efficiency exists as soon as one detects a reduction of the pre-treatment income-growth-rate handicap. And there is no paradox in DD[1] results being negative while those delivered by DD[2] are positive. That simply means that the initial income-level handicap has risen, but by less than it would have had the growth-rate handicap not been reduced (see Figure 2 for an illustration). By contrast, if even DD[2] shows no significant gains, then the policy has not been effective at all, as it has not even been able to reduce the pre-treatment growth-rate handicap.
In the case of Objective 1-Hainaut, using per-head income data and the rest of Wallonia or Belgium as a control group, our DD[1] results suggest a negative impact. But the analysis of pre-treatment data clearly shows that Parallel[1] does not hold before inception. We rather find statistically significant evidence of Parallel[2] (a constant growth-rate handicap before 1994). This is thus the assumption we retain for identifying Objective 1's causal impact. And when doing so, results change considerably, as our DD[2] estimates are positive and statistically significant. This is supportive of the idea that Objective 1 reduced the growth-rate handicap that affected Hainaut before 1994. In the absence of this correction, the income-level handicap increment (the one typically measured by DD[1]) would have been larger. At the 2010 horizon, we find that Hainaut experienced a rise of its income-level handicap compared to the rest of Belgium of 426 euros. But we find a statistically significant DD[2] of 491 euros. This means that, in the absence of the (positive) correction of the growth-rate handicap, the income-level handicap rise would have been 426 + 491 = 917 euros.
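The arithmetic linking the two estimates can be spelled out in a few lines. The sign convention (a negative DD[1] for a rising handicap) and the variable names are assumptions made for this sketch; the two figures are those quoted above.

```python
dd1_2010 = -426.0  # change in Hainaut's income-level gap (euros): the handicap rose
dd2_2010 = 491.0   # gain relative to the extrapolated pre-1994 growth-rate handicap

# DD[2] is the observed gap change minus the change predicted from pre-treatment
# trends, so the predicted (counterfactual) change is DD[1] - DD[2].
counterfactual_gap_change = dd1_2010 - dd2_2010
print(counterfactual_gap_change)  # -917.0
```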
The rest of the paper is divided into five sections. Section 1 exposes analytically the DD[1]/Parallel[1], DD[2]/Parallel[2]… DD[q]/Parallel[q] sequence, and how these estimators can be implemented using simple OLS estimates. Section 2 briefly discusses Objective 1-Hainaut, its particularities and the calendar of its implementation. Section 3 presents the dataset used in this paper and some descriptive statistics. Section 4 presents the main estimation results. Section 5 concludes.
[...] The advantages of such a specification are manifold. First, conditional on the availability of many pre-treatment periods in the data, the OLS-estimated coefficients can be used to compute a whole family of difference-in-differences estimators DD[p], where p=1, 2...q is the degree of parallelism underpinning identification. The canonical difference-in-differences model is DD[1], and rests on parallelism of degree 1 (Parallel[1]), meaning that outcome levels must stay parallel in the absence of treatment.⁵ Without Parallel[1], as depicted in Figure 2, [...] rests solely on pre-treatment observations.
⁵ If outcome level change per unit of time is "speed" (i.e. the 1st-order derivative of the outcome with respect to time), then Parallel[1] means stable outcome-level differences due to identical speeds.
⁶ If outcome growth-rate change per unit of time is "acceleration" (2nd-order derivative), then Parallel[2] means stable outcome growth-rate differences due to identical accelerations.
⁷ If outcome acceleration change per unit of time is "surge" (3rd-order derivative), then Parallel[3] corresponds to a situation where outcome acceleration differences remain stable due to identical surges.
⁸ The pattern of lagged effects is usually of substantive interest. We might, for example, believe that the treatment effect should grow or fade as time passes.
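How OLS coefficients can feed a family of DD[p] estimators may be sketched as follows. The design below (one dummy per period plus treated × period interactions, no separate constant) is one possible parameterisation chosen for this illustration, not necessarily the paper's own equation. The panel is invented, and reading DD[p] at t*+1 as the p-th difference of the estimated gap series is an assumption for p = 3, which the text describes only verbally.

```python
import numpy as np

periods = np.arange(5)          # t = 0..4; treatment starts after t* = 3
gap = np.array([-10.0, -12.0, -14.0, -16.0, -17.0])  # hypothetical true gaps

# Build a small balanced panel: 2 control and 2 treated units per period,
# with unit deviations that cancel out within each group.
rows = []
for t in periods:
    for treated in (0, 1):
        for dev in (-1.0, 1.0):
            y = 100.0 + 5.0 * t + treated * gap[t] + dev
            rows.append((t, treated, y))
t_idx, d, y = (np.array(col) for col in zip(*rows))

# Saturated design: one dummy per period, plus treated x period interactions.
time_dummies = (t_idx[:, None] == periods[None, :]).astype(float)
X = np.hstack([time_dummies, time_dummies * d[:, None]])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
gamma = coef[len(periods):]     # estimated treated-minus-control gap per period

# With adjacent periods, DD[p] at t*+1 is taken as the p-th difference of the
# gap series (hypothetical reading for p = 3).
dd = {p: np.diff(gamma, n=p)[-1] for p in (1, 2, 3)}
print({p: round(float(v), 6) for p, v in dd.items()})
```

In this constructed example the gap widens by 2 per period before treatment and by only 1 afterwards, so DD[1] is negative (about -1) while DD[2] is positive (about +1).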

Figure 2 - The inadequacy of the traditional difference-in-(level) differences estimator (DD[1]) in the presence of non-parallel paths
Consider the canonical DD[1]/Parallel[1] estimator, with just before-and-after observations t* and t*+1.⁹ The treatment effect corresponds to¹⁰,¹¹
$$DD[1]^{t^*+1;\,t^*} = \gamma^D_{t^*+1} - \gamma^D_{t^*}$$
⁹ Hereafter the range of periods used by the estimator appears as a superscript, as in $DD[p=1]^{t^*+1;\,t^*}$.
¹⁰ When estimating eq. [4] with only 2 periods (T=2), $\gamma^D_{t^*}$ is subsumed into the constant $\gamma^D$, and DD[1] is directly captured by the coefficient of the time × treatment interaction term.
¹¹ The treatment effect's standard error must account for the fact that it is a linear combination of estimated coefficients: its variance must include the covariance between the corresponding coefficients. That is done e.g. by the Stata test or lincom commands, which use the variance-covariance matrix of the estimated coefficients.
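The standard-error computation for a linear combination of coefficients can be sketched as below. The helper name `lincom` merely echoes the Stata command mentioned above and is a hypothetical stand-in, and the coefficient values and covariance matrix are invented for illustration.

```python
import numpy as np

def lincom(coef, vcov, weights):
    """Point estimate and standard error of the combination weights' coef."""
    coef, vcov, weights = (np.asarray(a, dtype=float) for a in (coef, vcov, weights))
    estimate = weights @ coef
    variance = weights @ vcov @ weights  # includes the covariance terms
    return estimate, np.sqrt(variance)

# Hypothetical estimates of the gap in t* and t*+1 and their covariance matrix.
gamma_hat = [-16.0, -17.0]
vcov_hat = [[4.0, 1.0],
            [1.0, 9.0]]

# Difference between the two gap coefficients -> weights (-1, +1).
estimate, se = lincom(gamma_hat, vcov_hat, [-1.0, 1.0])
print(estimate, se)  # variance = 4 + 9 - 2 * 1 = 11
```

Ignoring the off-diagonal term would give a variance of 13 instead of 11 here, which is why the full variance-covariance matrix is needed.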
[...] The point is that this can easily be achieved by computing¹²
$$DD[2]^{t^*+1;\,t^*-1} = \left(\gamma^D_{t^*+1} - \gamma^D_{t^*}\right) - \left(\gamma^D_{t^*} - \gamma^D_{t^*-1}\right) = DD[1]^{t^*+1;\,t^*} - DD[1]^{t^*;\,t^*-1}$$
or, said differently, the difference between the OLS-estimated observed post-treatment (t*+1) outcome-level handicap,¹³ i.e. $\gamma^D_{t^*+1}$, and the predicted one ($\gamma^D_{t^*} + DD[1]^{t^*;\,t^*-1}$), given the level handicap in t* and its expected rise due to the growth-rate difference between t*-1 and t* (see Figure 3 for the link between the algebra and the graphical representation). This prediction uses only regression coefficients driven by pre-treatment observations, a major difference with the traditional polynomial time-trend corrected method mentioned in the introduction.
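The observed-minus-predicted logic described in this paragraph can be written out as a short sketch. The function name `dd2` and the gap values are hypothetical; the three inputs stand for the estimated level handicaps in t*-1, t* and t*+1.

```python
def dd2(gamma_pre, gamma_star, gamma_post):
    """DD[2] from the estimated gaps in t*-1, t* and t*+1."""
    pre_trend = gamma_star - gamma_pre      # DD[1] over the pre-treatment pair
    predicted = gamma_star + pre_trend      # extrapolated gap in t*+1
    return gamma_post - predicted           # observed minus predicted

# Hypothetical income-level gaps (euros) of a treated region.
print(dd2(gamma_pre=-1400.0, gamma_star=-1600.0, gamma_post=-1700.0))  # 100.0
```

In this example the handicap still rises by 100 euros after treatment (a negative DD[1]), but by 100 euros less than the 200 euros extrapolated from the pre-treatment pair, hence the positive DD[2].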

Figure 3 -How difference-in-[growth-rate] differences (DD[2]/Parallel[2]) can cope with nonparallel paths
The $DD[2]^{t^*+1;\,t^*-1}$ estimator can be generalised to the case where one wants (or has the possibility) to calibrate Parallel[2] using more than 2 adjacent pre-treatment periods. Imagine one has v>2 pre-treatment and 1 post-treatment observations. The estimator becomes [...] with v≥p-1. One can also account for the possibility that treatment lasts more than one period or, alternatively, that its effects are lagged (i.e. it takes several periods for the treatment to deliver significant effects). In t*+s; s≥1, the difference between the observed level handicap and the expected one is [...]
¹² Again, when estimating eq. [4] with only 3 periods, $\gamma^D_{t^*-1}$ is subsumed into $\gamma^D$ and DD[2] is computed using only 2 coefficients.
¹³ Net of the initial handicap in t*-1: [...]
The ultimate generalisation is to assume Parallel[p=q]. As to data, the minimal requirement is to possess q pre-treatment observations, and one post-treatment observation at horizon t*+s; s≥1. The treatment effect can then be estimated using the OLS-estimated coefficients of the q-1 interaction terms D.I in eq. [1]. It is in fact equal to [...] Note that in eq. [6] the treatment effect remains computed as a difference between i) an observed difference in t*+s (i.e. $\gamma^D_{t^*+s}$) characterising the treated vs control entities and ii) a predicted difference, whose level is solely based on pre-treatment observations (i.e. [...]). Finally, note that when p=q=1 eq. [10] simplifies to [...] which is equivalent to eq. [...]
[...] During the first phase (1994–1999), the sums injected in the province's economy by both the EU and Belgian authorities (due to mandatory national co-financing) were relatively high at 2.43 billion euros (1994 nominal), representing a bit less than 5% of the province's GDP for each of the years from 1994 to 1999.¹⁵ Priorities ascribed to Objective 1-Hainaut were i) the improvement of the competitiveness of enterprises (e.g. R&D credits) (1/3 of the total), ii) the attractiveness of the region (e.g. through the cleaning up of old industrial sites) (1/4 of the total), iii) prospects for tourism and research facilities (1/5 each) (IMF, 2003).
It is also worth underlining that the treatment, in the form of financial support from the EU, did not stop completely in 1999. Beyond that point, the province benefited from the EU's Objective 1 "phasing out" programme (2000–2006), representing a total injection of an extra 2.22 billion euros (2000 nominal).
As to the control entities, we use three: the municipalities of the province of Liège, the rest of Wallonia and the rest of Belgium (Figure 4). A priori, we expected the province of Liège to be the best control territory for the implementation of the canonical DD[1] model. That province has many things in common with Hainaut. Although its economy was faring better in 1993 judging by the level of income (Figure 5), the province has also suffered from systematic deindustrialisation over the past [...]

Data, descriptive statistics
The data used in this paper consist of municipal-level taxable net¹⁶ income (all earnings¹⁷ minus professional and other deductible expenses) per head, provided by Statistics Belgium. These are available for each of Belgium's 589 municipalities (Table 1). [...] tend to favour this one because it corresponds relatively well to the goal assigned by EU decision makers to Objective 1, but also because it is likely to capture the (monetary) spillovers of the programme (e.g. beyond the direct benefits of net job creation or higher wages due to higher productivity, an enhanced capacity to attract wealthier residents…).

Econometric results
We first report the results for the canonical two-period (i.e. before and after) DD[1] model. [...] before the treatment was implemented.¹⁸ The after-treatment years are t*+s=2000 (immediately after the end of Objective 1) and 2007 (immediately after the end of the phasing-out period). Results (Table 2) [...] assumption.
We thus need to go beyond Parallel[1] in order to say something relevant about the true impact of Objective 1. Interestingly, as we possess many pre-treatment periods, we are able to assess the plausibility of Parallel[2] or Parallel[3] by estimating DD[2] or even DD[3], again prior to Objective 1's inception. Parallel[2] consists of assuming that Hainaut and its controls were experiencing different growth rates before 1994, but that the latter difference was stable/time-invariant. We are able to test the plausibility of that assumption by estimating DD[2] for the pre-treatment years, and verifying that it is close to zero. Figure 6 (red dashed lines) suggests that this was the case, at least between 1988 and 1993, for each of the three controls. The tentative conclusion is that Parallel[2] is a much more realistic description of the relative dynamics of Hainaut's income per head in the absence of Objective 1. And logically, the next steps of our econometric analysis will rest on DD[2]/Parallel[2].
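The pre-treatment check described above can be sketched as a placebo computation on the gap series. The helper name `placebo_dd2` and the numbers are hypothetical, and the sketch looks only at point values; in practice one would also test their statistical significance.

```python
import numpy as np

def placebo_dd2(gamma_pre):
    """DD[2] for every run of three adjacent pre-treatment periods."""
    return np.diff(np.asarray(gamma_pre, dtype=float), n=2)

# Hypothetical pre-treatment gaps: the handicap widens by a constant 200 per year.
gamma_pre = [-600.0, -800.0, -1000.0, -1200.0, -1400.0, -1600.0]

placebo_dd1 = np.diff(gamma_pre)      # all equal to -200: Parallel[1] fails
placebo = placebo_dd2(gamma_pre)      # all zero: consistent with Parallel[2]
print(placebo_dd1, placebo)
```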
The key results are displayed in Figure 7, and the underlying numbers can be found in Table 3.

Concluding remarks
The traditional difference-in-differences DD[1] model, and the parallel-paths Parallel[1] assumption on which it rests, seems to be particularly irrelevant in the case of Objective 1-Hainaut, and perhaps also for other EU rust-belt regions that became eligible for Objective 1. Remember that Hainaut got selected by the EU expressly because "it was suffering from a substantial deterioration of its economic and social situation". This statement hints at a development path that was not parallel to that of other EU or Belgian regions. We show in this paper that this was indeed the case before the introduction of Objective 1. [...] estimates are much more likely to lead to the conclusion that the treatment has been effective: all it takes is a small reduction of the pre-treatment growth-rate [acceleration, surge…] handicap to conclude that treatment has generated economic gains. And in the case of Hainaut, we show that this can happen against a background of a steadily rising income-level handicap, i.e. something that most people would probably interpret as an absence of convergence.