The impact of public transport expansions on informality: the case of the São Paulo Metropolitan Region

The São Paulo Metropolitan Region (SPMR) displays a strong core-periphery divide. Central areas concentrate the bulk of formal jobs while peripheral areas display high incidence of informal employment. This pattern is reinforced by a large deficit in urban transport provision. Against this background, we estimate the impact of expansions of the public transport system on local informality rates for the SPMR between 2000 and 2010. We compare the average changes in informality in areas which received new public transport infrastructure with the average changes in areas which were supposed to receive infrastructure according to official plans, but did not because of delays. After controlling for endogenous selection, we find that informality decreased on average 16 percent faster in areas receiving new public transport infrastructure compared to areas that faced project delays.


Introduction
Cities in developing and emerging economies display high levels of socio-economic segregation. Central areas with good accessibility concentrate the bulk of formal jobs, that is, jobs that are fairly remunerated, stable, secure, legally recognized and protected.
Lower-income peripheral areas, on the other hand, display limited accessibility and a high incidence of informal employment. Although the definition of informality varies from place to place, informal employment is generally characterized by contractual relations that do not comply with national labor laws. In its most common form, informal employment refers to workers who are not reported as such by their employers to the corresponding national authorities. Informal workers face more precarious conditions than formal workers: they may receive lower compensation, do not contribute to the pension system, have no record of job experience or opportunities for advancement, are not eligible for subsidies and leave (maternity, sickness, etc.), and have more difficulty accessing credit. For the economy at large, informal employment also implies losses, not only in tax revenues and through a heavy social protection burden, but also in terms of productivity.
This formal-informal division is reinforced by a suboptimal and skewed provision of urban public transport. Because of acute public transport deficits and the historical prioritization of individual over collective modes, a large segment of the lower-income population has to bear not only longer commuting distances, but also longer commuting times for the same distance travelled (Biderman, 2008). As a result, access to formal employment centers is constrained. Against this background, transport policies, and more specifically the expansion of public transport networks, can be seen as a means of reducing informality rates. To date, however, there are no estimates of the effect of improved accessibility on informality.
In this paper, we estimate the impact of public transport expansions on local informality for the case of the São Paulo Metropolitan Region (SPMR). With a population of 20 million inhabitants in 2010, the SPMR is Brazil's economic powerhouse, contributing approximately 20 percent of national GDP and concentrating 10 percent of the Brazilian population. Despite an expansion of formal employment that brought a sharp decrease of nearly 9 percentage points in the informality rate between 2000 and 2010, the SPMR still displays a particularly marked core-periphery split (Ramos, 2014). The region has faced serious mobility issues, partly related to unforeseen delays in public transport projects. One salient example is metro Line 4, which was conceived in the 1940s and included in the 1968 network plan, but was still under construction in 2015. We investigate whether public transport expansions undertaken between 2000 and 2010 led to reductions in informality rates in areas with improved network access relative to areas which faced project delays.
Theoretical work on the expected effect of public transport expansions on local informality rates is scarce and yields ambiguous predictions. On the one hand, consider a model where workers are either informal, saving on commuting costs by undertaking some of their productive activities at home, or formal, commuting every day to the city center. There, public transport expansions can lower the spatial compensation costs borne by formal firms and ultimately increase (local) formal job creation (Moreno-Monroy and Posada, 2014). On the other hand, consider a model where high-income and low-income workers choose between two options: commuting faster but at a higher cost by car, or more slowly but more cheaply by public transport. There, public transport expansions can lead to concentrations of lower-income workers around public transport access points (LeRoy and Sonstelie, 1983). The direction and magnitude of the impact remain an empirical question (Gibbons et al., 2012).
Estimating the impact of urban transport expansions is methodologically challenging.
In an urban system, residential and job choices are determined by multiple variables, one of which is access to public infrastructure. Furthermore, transport provision is not determined randomly; it is based on observable and unobservable attributes of the areas which are likely to be correlated with local informality rates. One strand of literature proposes addressing these issues with instrumental variables. An instrument that determines public transport expansions, but remains exogenous to informality, can provide a source of quasi-random variation through which the impacts can be estimated net of endogenous selection. As noted by Redding and Turner (2014), most existing works estimating the effect of changes in highway networks and railways on the distribution of economic activity have built such instruments in three ways. The first is (a combination of) past planned infrastructure (Duranton and Turner, 2012;Michaels, 2008;Baum-Snow, 2007;Baum-Snow et al., 2015;H. and Zhang, 2014;Mayer and Trevien, 2015). The second is historical route maps (Garcia-Lopez et al., 2015;Duranton and Turner, 2012;Volpe Martincus et al., 2013). The third is inconsequential placement of infrastructure, i.e., identifying places that received infrastructure for reasons other than explicit planning based on their characteristics (Chandra and Thompson, 2000;Faber, 2014;Mayer and Trevien, 2015).
Another strand of literature uses difference-in-differences methods to tackle the endogeneity of urban transport infrastructure allocation. The idea is to find "control areas" which would have experienced similar changes in outcomes as areas receiving transport infrastructure had they not received it. Related works have used this strategy to estimate the effect of subway networks and rail lines on real estate prices (Ahlfeldt et al., 2014;Billings, 2011;Gibbons and Machin, 2005) and poverty (Glaeser et al., 2008).
Our approach combines a difference-in-differences method with an instrumental variable strategy. We use a historical network plan for the SPMR as an instrument in order to identify the impact of public transport expansions on local informality rates between 2000 and 2010. The validity of our strategy relies on the correction for possible endogenous selection. It also relies on the choice of a "control group" against which to compare our "treatment group", i.e., the areas close to bus corridors and metro and railway stations opened between 2000 and 2010. In order to attribute the estimated impact to public transport expansions, we need to ensure not only that the chosen areas were in principle suitable for new transport infrastructure, but also that they were similar in terms of relevant characteristics. We include the pre-treatment values of relevant socio-economic variables as controls, and carefully construct our sample to include all areas that were preselected for transport project interventions within the same time-frame. One advantage of considering areas for which infrastructure plans were laid out but not implemented is that these areas are similar precisely with respect to characteristics relevant for the allocation of transport infrastructure. An additional advantage is that we can interpret the impacts as the "penalty" or cost of transport infrastructure project delays. We find this cost to be significant: in areas close to transport expansions, the average informality rate decreased 16 percent faster than in areas that should have received infrastructure but did not because of delays.
Our empirical application is connected to a large body of literature analyzing the reasons behind the existence and persistence of an urban informal sector in developing and emerging economies (Camacho et al., 2013;Ferreira and Robalino, 2010). The existence and persistence of an informal sector has been attributed mostly to institutional factors (Ferreira and Robalino, 2010;Perry et al., 2007), while the role of accessibility has not yet been considered. There is an extensive literature on the impact of transport infrastructure on outcomes such as property values (Baum-Snow and Kahn, 2000), sprawl (Burchfield et al., 2006) and poverty (Glaeser et al., 2008), but no work analyzing the impact of transport infrastructure on informality rates or on the quality of labor at large. For the particular case of improvements in public transport, the empirical literature on the Spatial Mismatch Hypothesis (SMH) offers some evidence of a positive and significant effect of public transport improvements on labor market outcomes in the US. 1 Kawabata (2003) finds an increase in the likelihood of working and in the number of hours worked for car-less individuals as a result of better job access by public transport. Holzer et al. (2003), based on data on hiring before and after the expansion of the railway system in San Francisco, find that hiring of Latinos increased near a new station. There is no available evidence on the effect of transport expansions on workers in cities in emerging and developing countries with a sizeable informal sector.

1 According to the SMH, adverse labor outcomes of minorities result from the spatial disconnection between low-skilled jobs and minorities' residences. US metropolitan areas experienced increased residential and job suburbanization in the second half of the 20th century. Minorities allegedly relocated at a slower pace than jobs because they faced discrimination in the housing market or were subject to zoning regulations. This led to a concentration of minorities in inner-city areas where low-skilled job creation was slow (Ihlanfeldt and Sjoquist, 1998).
Our empirical approach offers an alternative for overcoming the methodological challenges related to endogenous selection that empirical tests of the SMH face (Ihlanfeldt and Sjoquist, 1998). We consider our methodology an attractive alternative to Propensity Score Matching (PSM). Several papers have used PSM to estimate the impact of infrastructure on different outcomes. 2 The basic idea is to retrieve the causal effect of infrastructure changes by conditioning on the covariates that predict treatment assignment. In our case, we would have to correctly specify a transport infrastructure assignment model based on the characteristics of local areas that influence their likelihood of being chosen to receive infrastructure (i.e., their "program participation" probability). The problem is that the criteria used by planners to assign new infrastructure at a given moment in time are not known. This implies that the empirical specification of the determinants of transport infrastructure changes would be ad hoc and possibly driven by data availability. Under these circumstances, it would likely suffer from omitted variables and misspecification, invalidating the estimated causal effects. Our approach only requires one variable (the instrument) to significantly explain changes in transport infrastructure. At the same time, we exploit the fact that the planners' criteria are also "revealed" for pre-selected areas that eventually do not receive infrastructure projects in time.
The paper is organized as follows. Section 2 reviews theoretical predictions regarding the impact of transport expansions on informality rates. Section 3 presents some generalities of our area of study, a brief historical review of the evolution of the public transport system in the region, and data and definitions for the empirical analysis. Section 4 presents our empirical approach, detailing our identification challenges and proposed strategies. Section 5 discusses the results. Section 6 concludes.

Theoretical predictions
Existing theoretical models offer some insights on the effect of new transport infrastructure on variables such as income, productivity and employment levels. From the perspective of the firm, improved access could raise employment and productivity through lower input or labor costs, stronger agglomeration externalities or more efficient sorting. It could also lower them through higher commercial rents (Redding and Turner, 2014;Gibbons et al., 2012). The underlying models used to make these predictions do not consider how workers sort into different occupational statuses (e.g., formal and informal) and into locations within the city. The model of Moreno-Monroy and Posada (2014) takes a step in this direction by relating the informality rate to commuting costs. Here we summarize the setup and predictions of the model, and refer the reader to the original paper for details and derivations.
The model considers a linear, monocentric city with a unique Central Business District (CBD), where all formal firms locate. Formal and informal workers optimally decide to reside at any point between the center and the city fringe. In the formal sector, the hiring process is subject to search frictions (Pissarides, 2000). Formal workers commute to the CBD every day. In the informal sector, workers can undertake productive activities at the CBD or at home. The informal wage is assumed to be fixed, higher at the CBD than at home, but in any case lower than productivity in the formal sector. Besides the wage, informal workers receive a social protection transfer from the government. Unlike formal workers, informal workers optimally choose whether to commute to the CBD or stay at home, given the wage differential between the two locations and the level of commuting costs.
The urban land use equilibrium obtained after defining the bid-rents and instantaneous utilities for each type of worker yields a segmented city, where formal workers reside at the CBD and informal workers reside next to this area. In equilibrium, formal workers face higher urban costs because of higher commuting and rent costs. The formal wage is a function of the compensation that formal firms have to pay in order to induce unemployed workers to accept a job in the formal sector. This compensation depends on informal sector income, which, besides the informal wage, includes subsidies and commuting cost savings. The model yields the following general expression for the informality rate:

$$\mathit{Inf} = f(X, T), \qquad \frac{\partial \mathit{Inf}}{\partial T} > 0, \qquad (1)$$

where X includes parameters together with the formal sector output, the income of the unemployed, the income of informal workers (at home and at the CBD) and the population level, and T is the fixed commuting cost parameter. Holding all other variables and parameters constant, a decrease in commuting costs leads to a decrease in the informality rate. The reason is that the spatial compensation borne by formal firms becomes smaller, leading to more formal job creation.
The model describes a mechanism through which commuting cost reductions (in terms of time and/or money) lead to lower informality rates at the city level. Two qualifications are in order. The first is that, in reality, accessibility does not improve in all areas of the city simultaneously. Following the logic of the model, we would expect two different effects in areas receiving new infrastructure. The first is a direct effect of the expansion of the formal sector following a reduction in commuting costs: previously informal workers would now be able to find formal employment, leading to a reduction in local informality rates. The second is a displacement effect: formal workers commuting daily to the CBD would be willing to pay for better accessibility and would outbid informal workers in areas with new infrastructure. Note that the strength of these effects decreases monotonically with distance to the CBD, so that the displacement effect is stronger the closer the area is to the CBD.
The second qualification is that we are not considering different modes of transportation. The model of LeRoy and Sonstelie (1983) and the empirical application of Glaeser et al. (2008) offer some insights as to how the inclusion of a second mode could change the predictions described above. Consider a linear city model with two transportation modes (public transport, which is cheaper but slower, and cars, which are more expensive but faster) and two income groups (the rich and the poor). There, the optimal car trip length and the distribution of rich and poor within the city depend on the cost of cars relative to income. Glaeser et al. (2008) show that, for appropriate values of the income elasticity of housing demand and other parameters, local poverty rates can increase as a result of local improvements in public transport. This happens because the rich, who value their time more highly, prefer car commuting, while the poor seek proximity to the cheaper mode, public transport. In our case, if informal workers value their time less than formal workers, the prediction is that local informality rates could increase in areas near new public transport access points. Note, however, that this effect is likely to dissipate over time if informal workers find formal jobs once they search from their new, better-connected location.
The highly stylized models discussed in this section yield contradictory predictions regarding the impact of public transport expansions on local informality rates. As with the impact of transport on other economic outcomes, establishing the direction of the effect remains an empirical question (Gibbons et al., 2012).

General facts
The São Paulo Metropolitan Region (SPMR) hosted nearly 20 million inhabitants in 2010. The formation of the city is the most eloquent example of the rapid urbanization that Brazil experienced during the last century: until 1950, the city displayed average annual growth rates above 4.5 percent. After the 1950s, the city experienced its most intense expansion, driven by the placement of industrial parks. This led to a persistent spatial reconfiguration closely related to a strongly monocentric organization. In the 1960s, during the military dictatorship, the city government made efforts to reorganize urban space through the construction of extensive social housing complexes on the East side of the city and through large-scale transport projects such as the metro (Ramalhoso, 2013). However, in the following decades, a vast suburban peripheral belt occupied by the poorer and less educated population was formed through a process of unplanned centrifugal expansion. The accompanying extensification of urban land use was associated with a sub-market of informal land allotment. This sub-market combined the strategic behavior of informal developers seeking cheap, undeveloped land with the permissiveness of the state (Rolnik, 1997). Today, the continuous urbanized area extends over more than 2,000 km², including areas in 30 different municipalities.
During the last decades, the SPMR made the transition from an industrial to a service-based economy. After the 1970s, the industrial sector rapidly lost relative importance to the tertiary sector through productive restructuring. This reflected both the decline of the SPMR's relative weight in national industry and a profound internal organizational and technological transformation (Diniz and Diniz, 2007). In 2010, the tertiary sector was responsible for more than 75 percent of total output. 3 In terms of employment, this sector accounted for 61 percent of jobs, while the industrial sector accounted for 23 percent. 4 The overall unemployment and informality rates in Brazil fell considerably between

Evolution of the mass public transport system in the SPMR
The history of the mass public transport system of the SPMR dates back to the opening of the São Paulo Railway in 1867, linked to the transportation needs of growing agricultural exports. In the following decades, several railway lines were built to connect São Paulo with the rest of the country and with surrounding rural areas. Today, six railway lines are part of the city's integrated transport system. Between 2000 and 2010, a total of 18 stations were built or reopened, in an effort to use the existing railway infrastructure to expand the capacity of the urban rail system. Many of these improvements, and the modernization of the trains, were already planned in the 1980s but only executed more than two decades later (Kiyoto, 2013). 6 The first official plan for the metro network dates back to 1968, when São Paulo was an established metropolis of around seven million inhabitants. The basic network plan,

Data sources and definitions
In order to construct the informality rate by area in each period, we aggregate the number of informal workers in each area, and divide it by the total number of workers (i.e., the sum of formal and informal workers). In order to determine the status of each worker, we use micro-data from the 2000 and 2010 Demographic Census of Brazil on: occupational status and type of employment (on the main job), and whether the person made contributions to social security. 8 A worker is classified as informal if he or she is an unregistered employee (empregado sem carteira assinada), or a self-employed individual not contributing to social security, or an employer not contributing to social security (Jonasson, 2011;Henley et al., 2009). A formal worker, by contrast, is a registered employee (empregado com carteira assinada), or self-employed individual contributing to social security, or an employer contributing to social security. As explained by Jonasson (2011)
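The classification rule above can be expressed as a small function. The sketch below is illustrative only. The status labels, the `Worker` record and its optional `weight` field are hypothetical stand-ins for the Census variables; the actual Census codes and person weights would have to be mapped onto them. Workers whose status the rule does not cover are rejected rather than classified.

```python
from dataclasses import dataclass

# Hypothetical labels; the actual Census 2000/2010 job-status codes
# would have to be mapped onto these strings.
REGISTERED_EMPLOYEE = "registered_employee"      # empregado com carteira assinada
UNREGISTERED_EMPLOYEE = "unregistered_employee"  # empregado sem carteira assinada
SELF_EMPLOYED = "self_employed"
EMPLOYER = "employer"


@dataclass
class Worker:
    area: str
    status: str
    contributes_social_security: bool
    weight: float = 1.0  # assumed field for the Census person weight


def is_informal(w: Worker) -> bool:
    """Classification rule described in the text (main job only)."""
    if w.status == UNREGISTERED_EMPLOYEE:
        return True
    if w.status == REGISTERED_EMPLOYEE:
        return False
    if w.status in (SELF_EMPLOYED, EMPLOYER):
        # Self-employed workers and employers are informal only if they do not contribute.
        return not w.contributes_social_security
    raise ValueError(f"status not covered by the formal/informal rule: {w.status}")


def informality_rates(workers):
    """Weighted informal workers divided by (formal + informal) workers, by area."""
    informal, total = {}, {}
    for w in workers:
        total[w.area] = total.get(w.area, 0.0) + w.weight
        informal[w.area] = informal.get(w.area, 0.0) + w.weight * is_informal(w)
    return {area: informal[area] / total[area] for area in total}


sample = [
    Worker("A", UNREGISTERED_EMPLOYEE, False),
    Worker("A", REGISTERED_EMPLOYEE, True),
    Worker("A", SELF_EMPLOYED, True),
    Worker("A", EMPLOYER, False),
    Worker("B", SELF_EMPLOYED, False),
    Worker("B", REGISTERED_EMPLOYEE, False),
]
print(informality_rates(sample))  # {'A': 0.5, 'B': 0.5}
```

Note that a registered employee counts as formal whatever the contribution flag says, which is why area B's second worker is formal.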

Empirical strategy
Our aim is to estimate the impact of public transport expansions on informality. To do so, we compare the average changes in informality rates in areas which received new transport infrastructure with the average changes in areas which did not. The latter were supposed to receive new transport infrastructure according to official plans, but did not for diverse reasons. In terms of the difference-in-differences strategy that we will implement, our sample is composed of areas which, according to official plans laid out in the 1990s, were supposed to have new transport infrastructure by the end period. We split this sample into areas which effectively received the new infrastructure (the treatment group) and areas which did not (the control group).

8 Given the structural nature of the changes we aim to measure, it would have been desirable to extend our period of analysis to include previous decades. Unfortunately, the 1991 Census does not include information on job status by area.
Coming back to equation 1, a general structural equation describing the informality rate could be expressed as:

$$\mathit{Inf}_{it} = \alpha_t + T_{it}\delta + X_{it}\beta + U_i + e_{it}, \qquad (2)$$

where $\mathit{Inf}_{it}$ is the informality rate in area $i$ and year $t$, and $\alpha_t$ is a period effect. $T_{it}$ is a vector of treatment variables which supposedly have a (causal) effect on the informality rate, and $X_{it}$ is a matrix of observed control variables. $U_i$ is a vector of unobserved components influencing the informality rate, and $e_{it}$ is an error term. Assume there are only two periods and that no area has received treatment in the base period (i.e., the treatment variable equals 1 only in the post-treatment period). First differencing equation (2) then yields:

$$\Delta \mathit{Inf}_i = X_i\gamma + \delta\,\Delta T_i + \varepsilon_i. \qquad (3)$$

First differencing cancels out unobserved time-invariant fixed effects, as well as time-invariant observable controls that are uncorrelated with $T$. Thus, $X_i$ includes a vector of ones and the initial values of the controls. 9 Under the condition that treatment is fully randomized, an OLS estimate of $\delta$ can be interpreted as the "intention to treat" (ITT) effect, given that some people may not make use of the new infrastructure (Gibbons et al., 2012).
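To illustrate why first differencing helps, the following simulation generates a two-period panel in which treatment is correlated with a time-invariant unobservable. All parameter values are hypothetical, and the unobservable is called `u` here, standing in for U_i. A cross-sectional regression in the post-treatment period picks up the confounding. Regressing the change in informality on a constant, the initial control value and the treatment indicator recovers the true effect. This is a sketch of the logic, not the authors' estimation code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 2000, -0.05          # hypothetical number of areas and true effect

x0 = rng.normal(size=n)         # pre-treatment control, e.g. initial income
u = rng.normal(size=n)          # unobserved time-invariant component U_i
# Treatment is correlated with u, so a levels regression is confounded.
treated = (u + rng.normal(size=n) > 0).astype(float)

# Period 0: nobody treated. Period 1: a period effect, a time-varying effect
# of the initial control, and the treatment effect delta.
inf0 = 0.40 + 0.02 * x0 + 0.1 * u + rng.normal(scale=0.02, size=n)
inf1 = 0.30 + 0.01 * x0 + delta * treated + 0.1 * u + rng.normal(scale=0.02, size=n)


def ols(y, X):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


X = np.column_stack([np.ones(n), x0, treated])
delta_levels = ols(inf1, X)[2]     # period-1 cross-section: biased by omitted u
delta_fd = ols(inf1 - inf0, X)[2]  # first differences: u cancels out
print(f"levels: {delta_levels:.3f}  first difference: {delta_fd:.3f}  true: {delta}")
```

The constant and the initial value `x0` enter the differenced regression because the period effect and the effect of `x0` change between periods.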
A fundamental issue with our identification strategy is the need to select areas which are similar in terms of relevant characteristics, but which differ in their level of treatment.
How comparable are our treatment and control groups? All the areas in the sample were officially considered suitable for transport projects. They therefore share similar characteristics precisely in terms of the variables relevant for the allocation of transport infrastructure (e.g., unmet demand for mass transport, soil quality, distance to the existing network). However, this does not guarantee that the treatment and control groups have the same joint distribution of observables and unobservables, as required for estimating ITT effects. One option to re-balance the treatment and control groups is to control for a series of relevant observable area attributes.
We include the pre-treatment values of relevant socio-economic variables interacted with a time dummy (which can consequently be seen as exogenous to treatment).

9 We do not include the first difference of controls that are likely to be correlated with ∆T, as this would render the estimates of δ inconsistent (Baum-Snow and Ferreira, 2014).

Identification and econometric issues
Even after selecting an appropriate control group and controlling for relevant observables, the possibility that there is selection into treatment remains. Selection is likely to happen for two reasons. The first is that people who will be impacted by future public transport expansions may be actively involved in the project decision making. The second is that the selection procedure follows a predefined logic where areas with certain characteristics are preferred over others (e.g., central areas or areas that are closer to the existing network may be preferred by planners). The presence of endogenous selection into treatment means that OLS estimates are biased.
These concerns can potentially be addressed with an Instrumental Variables (IV) strategy.
Coming back to equation 3, we can think of $\Delta T$ as an endogenous binary treatment variable stemming from an unobserved latent variable $\Delta T^*$. This latent variable is, in turn, specified as a linear function of an exogenous covariate (instrument) $z$ and a random component $\mu_i$:

$$\Delta T_i^* = \theta z_i + \mu_i, \qquad (4)$$

and the observed treatment decision rule is $\Delta T_i = 1$ if $\Delta T_i^* > 0$ and $\Delta T_i = 0$ otherwise. $\varepsilon_i$ and $\mu_i$ are bivariate normal, and the correlation between these two terms is given by $\rho$.
By substituting the decision rule into equation 3, we can also express the model as a switching regression with two regimes, treatment and non-treatment (Quandt, 1972):

$$\Delta \mathit{Inf}_i = \begin{cases} X_i\gamma + \delta + \varepsilon_i & \text{if } \theta z_i + \mu_i > 0 \quad \text{(treatment)},\\ X_i\gamma + \varepsilon_i & \text{if } \theta z_i + \mu_i \le 0 \quad \text{(non-treatment)}, \end{cases}$$

where $\gamma$ is the coefficient vector on $X_i$ in equation 3. If the allocation of treatment is not randomized, $\varepsilon_i$ and $\mu_i$ may be correlated. Under endogenous selection into treatment, $\varepsilon_i$ and $\mu_i$ may each be correlated with an unobserved variable driving selection. This results in correlation between $\varepsilon_i$ and $\mu_i$ through that third variable.
It is possible to derive a joint density function of $\Delta \mathit{Inf}_i$ and $\Delta T_i$, a likelihood function of the model represented by equations 3 and 4, and an efficient Maximum Likelihood estimator (Maddala, 1983). For consistency, the instrument z has to meet two conditions. First, it must explain changes in public transport (i.e., have power in a first-stage regression with dependent variable ∆T). Second, it must satisfy the exclusion restriction: conditional on other controls, it affects the outcome only through the measure of public transport expansions.
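For concreteness, here is a sketch of the log-likelihood of this type of endogenous binary treatment model, written in its own notation. The outcome is y = Xβ + δd + ε and the treatment is d = 1[Wθ + μ > 0], with sd(μ) normalized to one and corr(ε, μ) = ρ. This is our reading of the standard form of the model, not a transcription of the paper's derivation. It is evaluated on simulated data with made-up parameters. Maximizing it would require a numerical optimizer such as `scipy.optimize.minimize`, which the sketch does not call.

```python
import math
import numpy as np

# Standard normal CDF written with erfc for better accuracy in the tails.
norm_cdf = np.vectorize(lambda v: 0.5 * math.erfc(-v / math.sqrt(2.0)))


def log_norm_pdf(v):
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * v ** 2


def treatment_loglik(beta, delta, theta, sigma, rho, y, X, d, W):
    """Log-likelihood of y = X @ beta + delta * d + eps, d = 1[W @ theta + mu > 0],
    with (eps, mu) bivariate normal, sd(eps) = sigma, sd(mu) = 1, corr = rho."""
    eps = y - X @ beta - delta * d
    # mu | eps ~ N(rho * eps / sigma, 1 - rho**2), which gives P(d = 1 | eps):
    index = (W @ theta + rho * eps / sigma) / math.sqrt(1.0 - rho ** 2)
    p_d = np.where(d == 1, norm_cdf(index), norm_cdf(-index))
    p_d = np.clip(p_d, 1e-300, 1.0)  # guard against log(0)
    return float(np.sum(log_norm_pdf(eps / sigma) - math.log(sigma) + np.log(p_d)))


# Simulated data with hypothetical parameters.
rng = np.random.default_rng(1)
n, rho, sigma = 4000, 0.5, 0.2
x, z = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), z])
beta, theta, delta = np.array([-0.10, 0.02]), np.array([0.0, 1.0]), -0.05
mu = rng.normal(size=n)
eps = sigma * (rho * mu + math.sqrt(1 - rho ** 2) * rng.normal(size=n))
d = (W @ theta + mu > 0).astype(float)
y = X @ beta + delta * d + eps

ll_true = treatment_loglik(beta, delta, theta, sigma, rho, y, X, d, W)
ll_wrong_rho = treatment_loglik(beta, delta, theta, sigma, -rho, y, X, d, W)
print(ll_true > ll_wrong_rho)
```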
The correlation term ρ captures the endogeneity bias. If the null hypothesis that ε i and µ i are uncorrelated is rejected, OLS estimates are biased, and the sign of ρ indicates whether they are biased upward or downward. It is difficult to pinpoint conceptually the direction of the endogeneity bias, as transport infrastructure provision is guided by multiple, overlapping criteria. For instance, planners could prioritize central, rich areas where informality rates are improving because they concentrate jobs. But if infrastructure provision is part of a poverty reduction strategy, they could also favour lagging areas where informality rates are worsening (Mayer and Trevien, 2015).
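A small simulation can check the link between the sign of ρ and the direction of the OLS bias. The sketch below uses hypothetical parameters, a treatment rule driven by an instrument `z` and an error `mu`, and no controls. The true effect is negative. A positive correlation between the outcome and selection errors pushes the OLS estimate up, a negative one pushes it down, and zero correlation leaves it approximately unbiased.

```python
import numpy as np


def ols_bias(rho, n=20000, delta=-0.05, sigma=0.2, seed=2):
    """OLS estimate of delta minus its true value when treatment d is
    selected on mu and corr(eps, mu) = rho. All values are hypothetical."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                  # excluded instrument
    mu = rng.normal(size=n)                 # selection error
    eps = sigma * (rho * mu + np.sqrt(1 - rho ** 2) * rng.normal(size=n))
    d = (0.5 * z + mu > 0).astype(float)    # treatment decision rule
    y = delta * d + eps                     # outcome (change in informality)
    X = np.column_stack([np.ones(n), d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1] - delta


for rho in (-0.6, 0.0, 0.6):
    print(f"rho = {rho:+.1f}: OLS bias = {ols_bias(rho):+.3f}")
```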

Variables and sample
Our dependent variable is the informality rate growth between 2000 and 2010, ap- To build the control group, we carefully analyzed official transport infrastructure plans (such as PITU 2020, released in 1999) and news reports about public transport project delays in the period 2000-2010. We also used information on bids for bus corridor construction works which were supposed to be finalized by 2010 but which in reality had not started by 2010. Using the summary statistics for this sample (not shown), we check for outliers. These are defined as observations displaying significantly different maximum or minimum values of income or population in the two groups (i.e., twice the maximum value or half the minimum value). Given our limited sample, we exclude outliers only if their inclusion significantly affects the results. In the final sample, outliers in the population variables did not have a significant effect on the estimations. On the other hand, some observations with significantly higher values of income per capita in one of the groups did affect the results. In order to ensure that the differential impact across the two groups is not driven by initial differences in income per capita, we exclude them from our final sample. Table 1 shows the summary statistics for the final sample. The mean values of the informality rate for the two groups are broadly in line with the aforementioned global changes for the SPMR. Although all the areas in our sample experienced a decrease in the informality rate, there is significant spatial variation across areas. Figure 4 displays the location of the treatment and control groups.
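The outlier screen can be written as a comparison across groups. The function below implements one possible reading of the rule in the text: an observation in one group is flagged if it exceeds twice the maximum, or falls below half the minimum, of the other group. This interpretation and the income figures are assumptions. As in the text, flagging is only a first step, since whether a flagged area is dropped depends on whether it affects the results.

```python
def flag_outliers(group_a, group_b):
    """Flag observations in each group that exceed twice the other group's
    maximum or fall below half the other group's minimum (assumed reading)."""
    def flags(own, other):
        upper, lower = 2 * max(other), 0.5 * min(other)
        return [v > upper or v < lower for v in own]
    return flags(group_a, group_b), flags(group_b, group_a)


# Hypothetical income per capita in treated and control areas.
treated_income = [800, 1200, 5000]
control_income = [700, 1000, 2000]
print(flag_outliers(treated_income, control_income))
# ([False, False, True], [False, False, False])
```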

Instrument and estimator
Following the most recent developments in the econometric estimation of transport change impacts (Redding and Turner, 2014;Baum-Snow and Ferreira, 2014), we use a historical plan as an instrument for transport access changes. In particular, we construct a variable defined as the km of line of the plan outlined by the HMD consortium in 1968 that cross the area (Figure 1). The idea is that initial network plans can predict future network developments but are exogenous to changes in informality rates. The exogeneity argument relies on the changes in the size and structure of the city between 1968 and 2000, which planners in the 1960s could not have foreseen. What is required in particular is that the 1968 plan was not designed to anticipate the change in the informality rate between 2000 and 2010. The 1968 plan was made in a context of high economic growth and strong planning during the military dictatorship, mostly to satisfy the immediate transport demands of existing central and high-density residential areas (Kiyoto, 2013). Ramalhoso (2013) discusses how the HMD consortium acknowledged that it had based its plan on "natural commuting trends" because it lacked a general urban plan for the city (a plan that was eventually delivered in 1969). It seems plausible, then, that urban transport planners at the time could not foresee the pattern of urban occupation driven by massive rural-urban migration waves in the following decades. Nor could they foresee the emergence and evolution of a segmented labor market.
Still, one could argue that the historical network plan is correlated with third variables that may be related to the current distribution of the informality rate, such as distance to the center, ruggedness of the terrain and the size of the areas. We include these variables as additional controls, together with population lags for 1991 and 2000. Ideally, the significance of the instrument z in the first-stage regression should not be affected by the inclusion of these controls, as in Garcia-Lopez (2012).
We proxy distance to the center by the Euclidean distance between each area's centroid and the geographical coordinates of the main center of the urban agglomeration, identified as the place with the highest employment density and the largest number of jobs among all the AEPs (Ramos, 2014); the ruggedness of the terrain by the standard deviation of terrain altitude, based on the digital elevation model derived from the Shuttle Radar Topography Mission (Biderman and Ramos, 2013); and the total size of the area by the area computed from the geometric attribute of the georeferenced polygon data.
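The three geography controls, and the natural-log transformation applied to them, can be sketched as follows. This assumes areas are simple polygons in projected coordinates, the main center is given as a point, and the DEM altitudes falling inside each area are already extracted; the population standard deviation (`statistics.pstdev`) is our choice, since the text does not specify which variant is used. Function names and numbers are hypothetical.

```python
import math
import statistics


def shoelace_area(poly):
    """Signed area of a simple polygon [(x, y), ...] in projected units."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        s += x1 * y2 - x2 * y1
    return s / 2.0


def centroid(poly):
    """Area-weighted centroid of a simple (non-self-intersecting) polygon."""
    a = shoelace_area(poly)
    cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        cross = x1 * y2 - x2 * y1
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (6.0 * a), cy / (6.0 * a)


def geography_controls(poly, center, elevations):
    """Log distance to the main center, log ruggedness and log area of one area."""
    cx, cy = centroid(poly)
    distance = math.hypot(cx - center[0], cy - center[1])  # straight-line distance
    ruggedness = statistics.pstdev(elevations)  # SD of DEM altitudes inside the area
    size = abs(shoelace_area(poly))
    return {
        "ln_distance": math.log(distance),
        "ln_ruggedness": math.log(ruggedness),
        "ln_area": math.log(size),
    }


# Hypothetical area (km), employment center at the origin, DEM altitudes (m).
area = [(2.0, 3.0), (4.0, 3.0), (4.0, 5.0), (2.0, 5.0)]
controls = geography_controls(area, (0.0, 0.0), [760.0, 770.0, 780.0, 790.0])
print(controls)
```

Here the centroid is (3, 4), so the distance to the origin is 5 km and the area is 4 km².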
All geography controls are transformed using natural logarithms to improve normality. Maddala (1983) derives the likelihood function of the model represented by equations 3 and 4, as well as a Maximum Likelihood (ML) estimator of ∆T that is more efficient than a two-step estimator. We initially use the ML estimator provided by the Stata command treatreg and, given the likely presence of heterogeneity, the robust variance estimator. Note that, in principle, the instrument z in this model refers to a treatment allocation rule: treatment is given if z is below some threshold and withheld otherwise. As it is difficult to find an observable treatment allocation rule in our case, we treat the historical plan as such a rule. There is a risk, however, that equation 4 is misspecified if the distributional assumption of joint normality of ε_i and µ_i does not hold, in which case the ML estimator is inconsistent. We therefore also obtain two-step consistent estimates. If these are close to the ML estimates (though, as expected, less efficient), we can conclude that the restriction imposed by the distributional assumption is not problematic.
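To make the two-step alternative concrete: in the standard two-step procedure for this class of models, a probit for the treatment is estimated first, and the outcome equation is then estimated by OLS with an added control-function (hazard) term. The sketch below computes that term. It assumes treatment occurs when a fitted probit index plus a standard normal error µ_i is positive (the sign convention of equation 4 may differ), and `selection_hazard` is a hypothetical name. Under these assumptions, the second-step coefficient on the term estimates ρσ_ε, so its sign mirrors that of ρ.

```python
import math


def norm_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def selection_hazard(index, treated):
    """Second-step control-function term for an endogenous binary treatment.

    index   : fitted first-step probit index for the area
    treated : 1 if the area received new infrastructure, 0 otherwise
    Assumes treatment occurs when the index plus a N(0, 1) error is positive.
    """
    if treated:
        return norm_pdf(index) / norm_cdf(index)          # E[mu | T = 1]
    return -norm_pdf(index) / (1.0 - norm_cdf(index))     # E[mu | T = 0]


# Hypothetical fitted probit indices for two treated and two control areas.
for idx, t in [(0.8, 1), (0.1, 1), (-0.3, 0), (-1.2, 0)]:
    print(t, round(selection_hazard(idx, t), 4))
```

The term is positive for treated areas and negative for untreated ones, which is how it absorbs the correlation between selection and the outcome error.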

OLS
We first estimate equation 3 by OLS. Table 2 presents the results. Column (1) shows the results using the full sample. As explained earlier, these estimates are likely to be biased because we are comparing the treatment group with the very heterogeneous group of areas that were unconnected in 2000 and remained so in 2010. Columns (2) and (3) show the OLS results for the restricted control group after controlling for initial conditions and geography. None of the geography controls is statistically significant at the 10 percent level (not reported). The effect of public transport expansions on informality rates, conditional on initial conditions and/or geography, is negative but small and statistically insignificant. As discussed earlier, we suspect that the OLS estimates are biased because of selection, so we turn our attention to the endogenous treatment results.

As explained before, the validity of the proposed instrument relies on its relevance and on its exogeneity with respect to the outcome variable. To assess its relevance, we estimate equation 4 using a probit model. Table 3 shows the results. As can be seen in Column (1), the instrument is positive and statistically significant, indicating that areas that were part of the original 1968 network plan were more likely to receive infrastructure between 2000 and 2010. The instrument seems to be relevant, as by itself it explains 11 percent of the variation in the treatment variable. By adding initial and geography controls, we test whether the relevance of the instrument survives the inclusion of factors that could affect both the changes in transport access and the historical network plan. This seems to be the case, as the magnitude and significance of the point estimates associated with the historical plan are barely affected by the inclusion of initial or geography controls.

Endogenous treatment effects
We now turn to the results of the endogenous treatment-effects model. We estimate equation 3 using the ML estimator described in section 4.4.2. Our instrument list includes, besides the historical network plan variable, other exogenous covariates if appropriate (initial controls or initial and geography controls). The results are shown in Table 4.
Before analyzing the results, we discuss the validity of the model and the approach. We begin by assessing the goodness of fit of the models. The p-value of the Wald test that all coefficients in the regression are jointly zero supports the relevance of the covariates used. We then assess the appropriateness of the endogenous treatment model.
The p-value of the Wald test of independence of equations 3 and 4 is shown at the bottom of Table 4. The null hypothesis is that ρ is equal to zero or, in other words, that equations 3 and 4 are independent. We can reject the null hypothesis at the 5 percent level for all specifications, which points to selection bias in the OLS estimations.

We now turn to the discussion of the main results. The estimated "intention to treat" (ITT) effect, which indicates the impact of public transport expansions net of the observed selection bias, is given by the coefficient associated with the public transport expansions dummy (δ in equation 3). As can be seen in Table 4, this coefficient varies between -0.15 and -0.17. Initial and geography controls do not seem to influence the magnitude and significance of the treatment variable, and they are not statistically significant in the full specification. Recall, however, that these estimates may be biased downwards because we do not account for the initial value of the informality rate. Table A1 in the appendix shows that, as expected, a regression in which the dependent variable is the natural log of the informality rate in 2010 and the natural log of the initial (2000) informality rate is on the right-hand side produces estimates of δ that are significant and smaller in absolute value than those in Table 4, but only by a small margin. The estimated impact is thus between -0.147 and -0.167 for the specification without controls, which we prefer because it allows a more straightforward interpretation of the effects and because none of the controls is statistically significant in the full specification.
According to the results, the OLS estimates underestimate the impact of transport access on changes in informality rates. The endogenous treatment-effect estimates imply that, net of endogenous selection and keeping other things equal, areas that received new transport infrastructure between 2000 and 2010 reduced their average informality rate 16 percent faster than areas that were supposed to receive infrastructure but did not because of delays. The estimated impact may seem large, but it has to be interpreted relative to the changes observed over the period. Between 2000 and 2010, the decrease in informality rates in our sample varied between 12 and 63 percent, with the average area experiencing a decrease of 36 percent (see Table 1). These are substantial changes. Our estimates suggest that the average area in the control group would have had an informality rate 4 percentage points lower had it received new urban transport infrastructure.
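The definition of the dependent variable is truncated in this section, but the Table A1 specification (log of the 2010 rate with the log of the initial rate on the right-hand side) suggests that the main outcome is a log change. Under that assumption, a coefficient δ is measured in log points, and the exact proportional difference it implies is 100(e^δ - 1) percent. For coefficients of the size reported here the two readings are close, as this small sketch shows; `log_points_to_percent` is a hypothetical helper.

```python
import math


def log_points_to_percent(delta):
    """Exact percentage difference implied by a coefficient on a log outcome."""
    return 100.0 * (math.exp(delta) - 1.0)


# Range of preferred estimates reported in the text.
for delta in (-0.147, -0.16, -0.167):
    print(f"delta = {delta:+.3f}: {100 * delta:+.1f} log points, "
          f"{log_points_to_percent(delta):+.1f} percent")
```

For δ = -0.16 the exact conversion gives roughly -14.8 percent, so the log-point reading of about 16 percent is an approximation of the same order.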
We can also draw some conclusions from the comparison between the OLS and endogenous treatment estimates. As can be seen at the bottom of Table 4, ρ is estimated to be positive, which means that the OLS estimates are biased upwards. Since δ is negative in the estimations, the ML estimator yields a smaller point estimate of δ than the OLS estimator (a larger negative number). The usual expectation is that IV estimates should be smaller than OLS estimates (Wooldridge, 2002), but studies using historical plans to estimate the impact of highways on population density have found IV estimates both larger (Baum-Snow, 2007; Duranton and Turner, 2012; Baum-Snow et al., 2015) and smaller than OLS estimates (Garcia-Lopez et al., 2015). Redding and Turner (2014) suggest that comparing OLS and IV estimates gives implicit evidence on the underlying transport infrastructure allocation process. In our case, IV (negative) estimates smaller than OLS (negative) estimates would suggest that urban transport allocation is biased towards areas experiencing a smaller decrease in informality rates. Alternatively, there could be unobserved or omitted variables associated both with decreases in the informality rate and with fewer public transport expansions (Duranton and Turner, 2012).
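The link between the sign of ρ and the direction of the OLS bias can be illustrated with a small simulation of a generic endogenous-treatment setup. This is a sketch with hypothetical parameter values (δ = -0.16, ρ = 0.5), not a replication of the paper's estimates, and it uses the simple IV (Wald) ratio rather than the ML estimator in treatreg. With ρ > 0, areas that select into treatment also have larger outcome errors, so OLS is pulled upwards, towards or even past zero, while the instrument-based estimate stays close to the true δ.

```python
import math
import random


def simulate(n=20000, delta=-0.16, rho=0.5, gamma=1.0, seed=1):
    """Return (OLS, IV) estimates of delta from one simulated sample.

    Outcome:   y = delta * T + eps
    Selection: T = 1 if gamma * z + mu > 0, with corr(eps, mu) = rho
    """
    rng = random.Random(seed)
    z, t, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)                      # instrument
        mu = rng.gauss(0.0, 1.0)                      # selection error
        eps = rho * mu + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        ti = 1.0 if gamma * zi + mu > 0 else 0.0
        z.append(zi)
        t.append(ti)
        y.append(delta * ti + eps)

    def cov(a, b):
        ma, mb = sum(a) / n, sum(b) / n
        return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

    ols = cov(t, y) / cov(t, t)  # OLS slope of y on T (with intercept)
    iv = cov(z, y) / cov(z, t)   # simple IV (Wald) estimator using z
    return ols, iv


ols, iv = simulate()
print(f"true delta = -0.16, OLS = {ols:.3f}, IV = {iv:.3f}")
```

Setting `rho=0.0` removes the correlation between the two errors, and OLS and IV then estimate the same δ.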
The ITT estimate provides a lower bound on the average treatment effect. Note, however, that this interpretation is not valid if the assumption of homogeneous response to treatment does not hold (Blundell and Costa-Dias, 2008). We check for a possible heterogeneous response to treatment in the next section.

Robustness checks
To assess the robustness of the results, we conduct a series of checks. First, we experiment with other proxies for our control variables and also include additional control variables in equation 3. Second, we consider the possibility that the area of influence of the new infrastructure is larger, and create 100 and 200 meter buffers around the stations and bus corridors to construct a new sample, with which we re-estimate equation 3. Third, we consider the presence of heterogeneous effects. The new infrastructure could have a different impact on poorer areas than on richer areas (Boarnet, 2007), or on areas closer to the CBD than on areas further away. If this is the case, the ML estimates could be biased. Fourth, we perform a "placebo test": we run the ML regressions using a different control group, consisting of areas considered for future public transport expansions in the PITU 2025 official plan released in 2006 but not expected to receive new infrastructure by 2010. We expect the coefficient associated with transport expansions to be statistically insignificant.
We first replace the proxies for the initial level of economic development and population. We estimate equation 3 using the percentage of the population older than 10 with basic education (i.e., at least 7 years of schooling) instead of the log of income per capita, and population density instead of total population. These changes do not affect the magnitude, significance or validity of the estimates of δ discussed previously. We also include an additional variable measuring the initial demand for public transport, proxied by the number of collective trips as a percentage of total trips originating in the area in 1997, which we construct using data from the 1997 origin-destination survey for the SPMR. As an alternative proxy, we use data from the 2000 Population Census on the number of cars owned per household.
These variables are not significant in the regressions with and without controls, while the estimates of δ remain unaffected.
By using buffers around stations and corridors, we aim to rule out the possibility that areas in the control group do in fact receive treatment, which would invalidate the interpretation of our ITT estimates. As can be seen in Table A3 in the appendix, the endogenous treatment-effect estimates with and without geography controls hold for the samples with 100 and 200 meter buffers.
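The text does not spell out how the buffered samples were built, so the following is one plausible sketch of the underlying check: flag a control area as potentially exposed if any station (a single point) or bus corridor (a polyline) lies inside it or within the buffer radius of its boundary. It assumes projected coordinates in meters; all function names and coordinates are hypothetical.

```python
import math


def point_in_polygon(p, poly):
    """Ray casting: True if point p = (x, y) lies inside the simple polygon."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (projected meters)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    if seg2 == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def orient(a, b, c):
    """Twice the signed area of triangle abc (sign gives the turn direction)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segment_distance(a, b, c, d):
    """Distance between segments ab and cd (0 if they properly cross)."""
    if orient(a, b, c) * orient(a, b, d) < 0 and orient(c, d, a) * orient(c, d, b) < 0:
        return 0.0
    return min(point_segment_distance(a, c, d), point_segment_distance(b, c, d),
               point_segment_distance(c, a, b), point_segment_distance(d, a, b))


def area_within_buffer(poly, features, radius):
    """True if any station (one vertex) or corridor (several vertices)
    lies inside the area or within `radius` meters of its boundary."""
    edges = list(zip(poly, poly[1:] + poly[:1]))
    for feature in features:
        if any(point_in_polygon(v, poly) for v in feature):
            return True
        segs = list(zip(feature, feature[1:])) or [(feature[0], feature[0])]
        if any(segment_distance(a, b, c, d) <= radius
               for a, b in segs for c, d in edges):
            return True
    return False


# Hypothetical 1 km x 1 km control area and a station 150 m east of it.
control_area = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0), (0.0, 1000.0)]
station = [(1150.0, 500.0)]
for r in (100.0, 200.0):
    print(r, area_within_buffer(control_area, [station], r))
```

In this toy case the area is untouched at 100 meters but counts as exposed at 200 meters, which is the kind of reclassification the buffered samples are meant to capture.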
Next, we consider the presence of a heterogeneous response to treatment. We estimate a binary treatment model with idiosyncratic average effect that controls for selection on unobservables (IV estimation) and for heterogeneous effects, using the user-written Stata command ivtreatreg developed by Cerulli (2012). The command returns probit two-stage least squares (TSLS) estimates, in which the predicted probabilities from estimating equation 4 are used as instruments for the treatment variable (Cerulli, 2012). We consider heterogeneous response to treatment with respect to 1) initial conditions and 2) geography. Table A4 in the appendix shows the estimation results. The TSLS estimator yields similar but less precise point estimates of the effect of public transport expansions, as expected given that the ML estimator is more efficient than the TSLS estimator. The point estimates associated with the additional heterogeneous treatment-response terms (denoted by the suffix ws) are not statistically significant, indicating that heterogeneity in the response to treatment may not be a major concern.
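To show what the heterogeneity terms look like: in Cerulli's formulation of this model, as we understand it (an assumption to check against the ivtreatreg documentation), the outcome equation adds interactions between the treatment dummy and demeaned covariates. The coefficient on the dummy then remains the average treatment effect (ATE), and an area's own effect is ATE + (x - mean(x))β. The sketch builds these terms and the effect on the treated they imply. The names `ws_terms` and `implied_atet`, the `_ws` naming and all numbers are hypothetical. Insignificant coefficients on the ws terms mean that β cannot be distinguished from zero, so the effect on the treated stays close to the ATE.

```python
def ws_terms(treat, covariates):
    """Interact the treatment dummy with each demeaned covariate.

    treat      : list of 0/1 treatment indicators
    covariates : dict mapping a covariate name to its list of values
    Returns a dict mapping "<name>_ws" to the values T_i * (x_i - mean(x)).
    """
    out = {}
    for name, xs in covariates.items():
        m = sum(xs) / len(xs)
        out[name + "_ws"] = [t * (x - m) for t, x in zip(treat, xs)]
    return out


def implied_atet(ate, betas, treat, covariates):
    """ATE on the treated implied by ATE(x) = ATE + (x - mean(x)) * beta."""
    n_treated = sum(treat)
    shift = 0.0
    for name, xs in covariates.items():
        m = sum(xs) / len(xs)
        shift += betas[name] * sum(t * (x - m) for t, x in zip(treat, xs)) / n_treated
    return ate + shift


# Hypothetical data: two treated and two control areas, one covariate.
treat = [1, 1, 0, 0]
covariates = {"ln_income": [1.0, 2.0, 3.0, 4.0]}
print(ws_terms(treat, covariates))  # {'ln_income_ws': [-1.5, -0.5, 0.0, 0.0]}
print(implied_atet(-0.16, {"ln_income": 0.02}, treat, covariates))
```

Demeaning the covariates before interacting them is what keeps the coefficient on the treatment dummy interpretable as the ATE.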
Lastly, Table A5 in the Appendix shows the results of the placebo test. As expected, the impact of transport access in ML regressions with and without controls is no longer significant when we use a new control group.

Conclusions
We have estimated the impact of public transport expansions on informality for the São Paulo Metropolitan Region. We measure local informality rates using individual-level data from the 2000 and 2010 population censuses. We identify areas that received new transport infrastructure (bus lines and/or train or metro stations) between 2000 and 2010, and compare their informality rates with those of areas that were supposed to receive infrastructure in the same period but ultimately did not because of project delays. To circumvent possible endogenous selection issues, we instrument public transport expansions with a variable based on a 1968 network plan. The results suggest that endogenous selection is indeed a valid concern. According to our preferred estimates, informality rates decreased on average 16 percent faster in areas receiving new public transport infrastructure than in areas that faced project delays.
These results are robust to specification changes, alternative control variables, different distance buffers, the use of a different estimator and the allowance for heterogeneous effects.
In this paper we have provided a first approach to the study of the effects of public transport expansions on labor market outcomes. By considering project delays, we have given a meaningful interpretation to difference-in-differences estimations and, at the same time, included in our sample areas that are in principle comparable. Our study suffers, however, from a series of limitations.
First of all, it is necessary to stress that the estimates apply to the case of the selected sample in the SPMR, and cannot be readily applied to other cities, or even to different zones within the SPMR.
Second, the unavailability of finer geographical detail in the labor-market data means that we cannot establish the true spatial range of the estimated effects. It also prevents us from analyzing the effects of metro/train stations and bus corridors separately.
Third, more detailed information on individual residential and work choices would allow a more meaningful interpretation of the estimated impact. It would be desirable to know, for instance, which part of the estimated impact is due to the reallocation of already formally employed workers, and which part is due to an improvement in the job quality of local residents. Another key missing element is information on workers' modal choices.
We would like to know, for instance, what choices workers make when faced with better-quality jobs, better transport access and the possibility of switching to cars. Future studies will hopefully address these kinds of questions.