Does farm structure affect rural household incomes? Evidence from Tanzania

Many African countries have recently experienced rapid growth in the numbers of medium- and large-scale farms. These developments have generated considerable speculation about the impacts of farmland concentration and inequality on smallholder households and communities. This study exploits inter-district variation in farm landholding patterns in Tanzania, together with nationally representative household panel survey data, to determine how differences in localized farmland structure affect rural household incomes. Because farm structure is a multifaceted concept, five alternative indicators of farmland structure are defined for 142 districts in Tanzania: (i) the Gini coefficient; (ii) skewness; (iii) coefficient of variation; (iv) share of controlled farmland under medium-scale farms; and (v) share of controlled farmland under large farms. These alternative farm structure variables are included in models of rural household income to test their effects after controlling for available household and community covariates. The study highlights four main findings. First, most indicators of farmland concentration are positively associated with rural household incomes. Second, household incomes from farm and non-farm sources are positively and significantly associated with the share of land in the district controlled by farms in the 5–10 and 5–20 ha categories. Third, these positive spillover benefits are smaller and less statistically significant in districts with a relatively high share of farmland controlled by farms over 20 ha in size. Fourth, poor rural households are least able to capture the positive spillovers generated by medium-scale farms and by concentrated farmland patterns.


Introduction
This study is motivated by the need to better understand the impacts of changing farm structure in Africa. As medium- and large-scale farms have acquired substantial amounts of farmland in Africa since 2000 (Deininger and Byerlee, 2011; Schoneveld, 2014; Jayne et al., 2014a), important questions arise about the impacts of localized farmland inequality and size distribution patterns on smallholder households and rural communities. This study traces its origins to a longstanding strand of the development economics literature arguing that the scale and distribution of farmland holdings tend to influence local demand patterns, factor markets, and the associated multiplier effects of agricultural growth (Johnston and Mellor, 1961; Mellor, 1976; Vollrath, 2007; Eastwood et al., 2010). Most of the evidence undergirding this literature is drawn from Asia and Latin America; to our knowledge, there is virtually no applied evidence testing the relationship between localized farm distribution patterns and rural household incomes in Africa.
To address these issues, this study exploits inter-district variation in farmland distribution patterns in Tanzania, using nationally representative household panel survey data, to determine the impact of these patterns on rural household incomes and other characteristics. Our key research question is whether the concentration of farmland under medium- and large-scale farms influences the economic outcomes of households within the same areas. We are motivated by the longstanding recognition that farm structure may influence the pace of income growth (Johnston and Mellor, 1961; Johnston and Kilby, 1975; Vollrath, 2007) and by evidence that farm structure is changing in many African countries. A stylized fact from Asia's agricultural development experience is that relatively uniform land distribution patterns may stimulate rural development more effectively than highly concentrated landholding patterns. Smallholders have high marginal propensities to consume and spend their money in the local rural economy, thereby stimulating growth linkages between farm and non-farm sectors (Mellor, 1976). If a few large-scale farmers dominate production and spend their money outside the local rural economy, then local growth multipliers may be weaker than in areas with more egalitarian land distributions (Johnston and Kilby, 1975).
There is also countervailing evidence, much of it more recent and from Africa, indicating that large farms may attract public and private investments that give surrounding smallholders improved access to markets and services. For example, Sitko et al. (2018) found that large-scale traders have invested in crop buying operations in parts of Africa with a high concentration of medium-scale farms in response to the surplus potential of such farms, thereby improving market access conditions for nearby smallholders too. Other recent studies have found some evidence that smallholder farm households benefit indirectly from being located close to large farms (Deininger and Xia, 2016; Lay et al., 2018).
Identifying the spillover effects of large farms on smallholder farms is complicated because these spillovers may depend on the scale, number, and socio-demographic characteristics of nearby "large" farms. Smallholder households may interact differently with nearby commercialized farms of 5-10 ha than with much larger farms for many reasons, not least because smallholders and the operators of 5-10 ha farms tend to share common social, ethnic or familial connections. Many medium-scale investor farmers go back to their home rural areas to acquire land. Large farms in the region are commonly owned and/or operated by individuals from outside the local community. The size and strength of spillover effects between smallholder farms and large farms may therefore depend on the size and characteristics of the large farms; this is an important unresolved empirical question.
To address this question, we examine the case of Tanzania, one of several African countries where the numbers of medium- and large-scale farms have grown rapidly in recent years (Schoneveld, 2014). We assemble data from the 2009 Tanzanian Agricultural Sample Census (ASC), as well as the 2009, 2011 and 2013 rounds of the Tanzanian National Panel Survey (NPS). We construct several indicators of farmland structure for rural Tanzania using the ASC, which is statistically representative at the district level. Our evidence indicates that certain forms of farm structure, e.g., a high share of farms between 5 and 10 ha, have positive impacts on rural household incomes in the district. Other forms of farm structure, including those with a high concentration of farms larger than 20 ha, have smaller and in some cases even negative impacts on the incomes of smallholder households.
Our primary question, whether the local structure of land ownership matters for rural growth, is important for several reasons. First, changes in farm structure are occurring rapidly in many sub-Saharan African countries, with a major trend being one of increasing land concentration driven by growing numbers of medium- and large-scale farms (Jayne et al., 2014; Sitko and Chamberlin, 2016; Anseeuw et al., 2016). These studies suggest a de facto move towards greater land concentration. However, land concentration may be occurring in different ways. The contribution of our study is to emphasize the multi-dimensional nature of farm structure and land concentration, to demonstrate that alternative indicators of land concentration that emphasize different dimensions are often poorly correlated with one another, and, most importantly, to show how these alternative indicators of farm structure exert varying and in some cases very strong influences on the farm and non-farm incomes of rural households in the vicinity. The study concludes that if land distribution patterns matter for rural transformation, as our findings strongly indicate, then researchers and policy makers may need to understand more accurately how farm structure is changing under alternative land tenure systems and to consider more explicitly the impact of farm structure on development outcomes and policy objectives.
The remainder of the paper is structured as follows: Section 2 expands on the theoretical relationships between farm structure and economic development outcomes in agrarian areas. Section 3 describes the data used in this study, and Section 4 discusses the challenges of empirically addressing our research question. Descriptive and econometric results are presented in Sections 5 and 6, respectively, followed by conclusions in Section 7.

Definition
"Farm structure" is a multidimensional concept that incorporates both the distribution of farm sizes and the inequality of farm landholdings (e.g., Stanton, 1991). To illustrate this distinction, consider two districts, one in which all farms are 10 ha in size and another in which all farms are 1 ha in size. The farm size distributions of these two districts would be very different, but most measures of land inequality would be the same: in both districts, the Gini coefficient would be zero. We consider that both terms, inequality and the distribution of farm sizes, are subsumed under the broader term "farm structure", and that both may matter for rural household incomes.

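The two-district example can be checked numerically. The sketch below assumes the standard mean-absolute-difference definition of the Gini coefficient (the text does not state which estimator is used) and hypothetical districts of 50 equal-sized farms each. Both districts have a Gini of zero even though their mean farm sizes differ tenfold, which is why a single inequality index cannot describe farm structure on its own.

```python
def gini(sizes):
    """Gini coefficient from the mean absolute difference:
    G = sum_i sum_j |x_i - x_j| / (2 * n**2 * mean)."""
    n = len(sizes)
    mean_size = sum(sizes) / n
    total_abs_diff = sum(abs(a - b) for a in sizes for b in sizes)
    return total_abs_diff / (2 * n * n * mean_size)


def mean(sizes):
    return sum(sizes) / len(sizes)


# Hypothetical districts from the text: 50 farms each, all of equal size.
district_10ha = [10.0] * 50
district_1ha = [1.0] * 50

print(gini(district_10ha), gini(district_1ha))  # 0.0 0.0
print(mean(district_10ha), mean(district_1ha))  # 10.0 1.0
```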
Core theoretical perspectives
There are two competing ways in which we might think about the relationship between farm structure and household income growth. The first is rooted in the seminal work of Johnston and Kilby (1975) and Mellor (1976), who emphasized the importance of growth multipliers as drivers of rural development. The core idea is that, because the propensity to spend additional income on local goods and services is greatest for smallholder households, broad-based agricultural growth engenders virtuous cycles in which income gains by smallholders are recycled through local farm and non-farm economies. Broad-based agricultural growth tends to generate greater second-round expenditures on local non-tradable goods and services in rural areas and towns. In contrast, if agricultural productivity and household income gains are concentrated within relatively few households (as might be the case in areas where a few large farms have a disproportionate share of land and production), then growth multipliers from agricultural surplus may be more limited than under more egalitarian land distributions. Empirical work by Deininger and Squire (1998) and Vollrath (2007) supports this idea, providing evidence that relatively egalitarian national-level land distribution patterns are associated with more broadly based agricultural productivity growth, and higher rates of growth, than more concentrated land distributions.
Other ways of thinking about the influence of land concentration on smallholder welfare posit different channels that are nonetheless consistent with the above framework. For example, Berry and Cline (1979), using national-level data, found that the relative underutilization of agricultural land increases with the degree of inequality in land distribution. Sitko and Jayne (2014) describe similar findings for farm-level data from Zambia: larger farms had lower shares of land being used for cropping or other intensive productive activities than smaller holdings. Due to ethnic and social differences, large-scale and small-scale farmers may have little social interaction, minimizing potential synergies from learning, cooperation, and economic exchange, which could be important avenues through which productivity and income gains and spillovers are realized. Also at the country level, Binswanger and Deininger (1997), Engerman and Sokoloff (1997), and Sokoloff and Engerman (2000) discuss ways in which land inequality may be associated with institutional control. In particular, land concentration (inequality) is often associated with an elite class of rural landholders that wields political power; this power often limits the ability of non-elites to participate in political systems or to benefit from public institutions and programs such as crop marketing boards, input promotion programs, and education, which may condition household income. These processes may play out at national, regional and local levels. For example, if large farms dominate in a particular area, they may influence the nature of local supply chains, such that input providers, commodity traders and other service providers are oriented towards supporting larger farms in ways that are less accessible to smaller producers.
A countervailing hypothesis is that larger farms (at least under some conditions) may generate important spillover benefits for smallholders operating in their vicinity. The surplus production of relatively large farms may attract private investment in crop buying, storage, transport, input supply and finance into rural areas, providing spillover benefits to all households in those areas (Collier and Dercon, 2014). The political clout of large farmers may also attract state investment in infrastructure development, which would also benefit all farms in an area (von Braun and Meinzen-Dick, 2009; Deininger and Xia, 2016). The introduction of new production technologies may facilitate technological spillovers via knowledge transfers and increased access to agricultural technologies (Kleemann et al., 2013; Rakotoarisoa, 2011). 1 Direct linkages between large and small farmers may also exist, e.g. out-grower schemes, contract farming arrangements, and the generation of wage employment opportunities. Direct or indirect service provision may also characterize linkages across farm size categories; for example, van der Westhuizen et al. (2018) find that small-scale farmers in Tanzania are substantially more likely to rent tractors in areas with a high concentration of medium-scale farms. Sitko et al. (2018) also found that large-scale grain traders tend to locate their buying operations in areas with a high concentration of medium and large farms, and that small-scale farms in the vicinity also benefit from the improved market access conditions that these large grain buyers provide. Knowledge transfers from large-scale to nearby small-scale farmers may also be important in some cases (De Schutter, 2011; Mujenja and Wonani, 2012). To the extent that such positive spillovers exist, land concentration may promote income growth across all households in a shared location.
As it stands, the evidence base for either positive or negative impacts of large farm spillovers on nearby smallholders remains weak. Lay, Nolte and Sipangule (2018) find some evidence for localized positive productivity spillovers of large agricultural investments to nearby smallholders in Zambia. 2 Deininger and Xia (2016) find that large-scale investments produce short-run positive and negative effects on nearby smallholder farms in Mozambique. These studies identify spillover effects through the number of large-scale farms within a certain radius of a smallholder household, or through whether there are any large farms within the locality. While this may be a reasonable approach for addressing specific kinds of questions, we believe that the effects of land concentration, and spillover effects from particular kinds of farms, may be more comprehensively understood by constructing measures that represent the concentration of large, medium, and small farms within a given area, as reflected in various measures of localized farm size distributions. 3 A key premise of this paper should now be clear: different aspects of farm structure cannot be fully captured in a single indicator, such as the Gini coefficient or the number of farms of a certain size in a given area. Conceptually, the pathways by which large farms may influence the behavior and welfare of nearby smallholder households may only incidentally be related to standard land inequality measures. Because farm structure is a multidimensional concept, empirical analysis seeking to understand the effects of alternative land distribution patterns on local growth patterns must consider alternative dimensions of farm structure.

Model of the determinants of gross income per full-time equivalent
We may generalize the above ideas as follows. Let us start with a farm-level production function:

$$Y_{ijt} = f\left(X_{ijt}, C_{jt}, G_{jt}\right) + \varepsilon_{ijt} \tag{1}$$

where Y is gross income per full-time equivalent (FTE) for farmer i in community j at time t; X is a vector of household-level characteristics; C is a vector of local geographic context characteristics; G is a measure of access to local public and private stocks of capital, information, services and other resources in community j; and ε is an idiosyncratic error term, which may be heteroskedastic and clustered at the household level. If we accept that (unobservable) access to local public and private resource endowments is conditioned by the (observable) localized distribution of land control, i.e.

$$G_{jt} = g\left(I_{jt}, Z_{jt}\right) \tag{2}$$

where I is a measure of farmland structure in community j at time t, and Z is a vector of other factors which influence G, then we may rewrite an estimable production function as:

$$Y_{ijt} = \alpha + X_{ijt}\beta + C_{jt}\gamma + \delta I_{jt} + Z_{jt}\theta + \varepsilon_{ijt} \tag{3}$$
where δ is an estimable coefficient on observable farm structure. If δ < 0, then the net effect on smallholder household incomes of more concentrated land distribution patterns, or of a larger share of farmland controlled by medium or large farms, is negative. If δ > 0, then positive spillovers dominate the relationship.

Data sources
Data used in this analysis come from two main sources. Data on household per-FTE gross income measures, along with other household- and community-level controls, were constructed from the Tanzanian National Panel Survey (NPS), available for three waves (2009, 2011, and 2013). 4 Given the nature of our research question, we are interested in estimating impacts on households in rural areas. However, the census definitions of "urban" SEAs in this sample are not urban in the conventional sense of being primarily composed of town dwellers without agricultural land and farming activities. Therefore, we also include households in "urban" SEAs with population densities of less than 500 persons per square kilometer. (As a robustness check, we compare results with models estimated on samples that include only rural SEAs.) After dropping households that appear in only a single wave, we have an unbalanced panel of 7450 observations across the three waves.
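The sample rule above amounts to a two-step filter, sketched minimally below. The record fields (`hh`, `wave`, `urban`, `density`) and the example households are hypothetical, and the sketch applies the density filter before dropping single-wave households, an ordering the text does not state explicitly. The `rural_only` flag mirrors the robustness check on rural SEAs only.

```python
from collections import defaultdict

DENSITY_CUTOFF = 500.0  # persons per square kilometer, as stated in the text


def in_sample(rec, rural_only=False):
    """Keep rural SEAs and, unless rural_only is set, 'urban' SEAs below the cutoff."""
    if not rec["urban"]:
        return True
    return (not rural_only) and rec["density"] < DENSITY_CUTOFF


def build_panel(records, rural_only=False):
    """Apply the SEA filter, then drop households observed in only one wave."""
    kept = [r for r in records if in_sample(r, rural_only)]
    waves = defaultdict(set)
    for r in kept:
        waves[r["hh"]].add(r["wave"])
    return [r for r in kept if len(waves[r["hh"]]) > 1]


# Hypothetical household-wave records (field names are illustrative only).
records = [
    {"hh": "A", "wave": 2009, "urban": False, "density": 80.0},
    {"hh": "A", "wave": 2011, "urban": False, "density": 80.0},
    {"hh": "A", "wave": 2013, "urban": False, "density": 85.0},
    {"hh": "B", "wave": 2009, "urban": True, "density": 900.0},
    {"hh": "B", "wave": 2011, "urban": True, "density": 950.0},
    {"hh": "C", "wave": 2009, "urban": True, "density": 300.0},
    {"hh": "C", "wave": 2013, "urban": True, "density": 320.0},
    {"hh": "D", "wave": 2011, "urban": False, "density": 60.0},
]

panel = build_panel(records)
print(len(panel), sorted({r["hh"] for r in panel}))  # 5 ['A', 'C']
```

Counting distinct waves per household, rather than raw records, keeps a duplicated record within one wave from letting a single-wave household slip through.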
While the NPS is considered to be statistically representative of households in rural Tanzania, it may not have sufficient observations to represent the full range of farm sizes found in Tanzania (Christiaensen and Demery, 2018). We therefore constructed indicators of farm structure from the 2009 Tanzanian Agricultural Sample Census (ASC). The ASC contains data on 52,635 rural agricultural households randomly selected from the prior census, as well as all 1006 farms categorized as large scale 5 that were identified in the country at the time. The ASC enables district-level inference, but the large-farm component contains only regional identifiers. We thus faced a choice: either construct measures of farm structure at the more localized district level (n = 142) and omit the large-farm module, or construct measures at the much larger region level (n = 26) that include the large-farm module. To inform this choice, we constructed the Gini coefficient and other measures of farm structure at the region level, first including the large-farm module and then excluding it. As shown below, alternative measures of land concentration differ in their sensitivity to whether the large-farm module is included. Because the district-level indicators enable us to examine the relationship between farm structure/inequality and household incomes at a more disaggregated geographic level, the analysis in this paper focuses on these results. However, to address potential bias, the models that use district-level concentration measures also include a dummy indicator for regions that are sensitive to exclusion of the ASC large-farm component (described further below). We also estimated models using region-level land concentration measures for comparison; these results are shown in the appendix.

1 Such knowledge transfer from large to small farmers may take place directly, e.g. via technical assistance, formal and informal training and/or service provision, or indirectly, e.g. via learning-by-doing.
2 Empirical evidence is somewhat limited. Some literature uses firm-level data (Javorcik, 2004; Görg and Greenaway, 2004) and does not focus on agriculture. At least two studies have provided evidence in support of large-scale land-based investments contributing to infrastructural improvements in the investment locations (Mujenja and Wonani, 2012; FAO, 2012).
3 In addition to the several studies cited here, we also found a few studies examining the impacts of large-scale farm investments on local communities using qualitative case study approaches (e.g. Cotula et al., 2009; Anseeuw et al., 2012).

Variables measuring farm structure and land concentration
Our exogenous variables of main interest pertain to farm structure and land concentration, which we compute for each of Tanzania's 142 districts with a population density of less than 500 persons per square kilometer. Household landholdings are defined in this analysis as all the land controlled by the household (including land rented in), whether cultivated, fallow, undeveloped, under pasture, planted with trees or other permanent crops, or in any other use. For every household in the sample, the total landholding size is constructed as the sum of plot-level records. Of the many possible measures, we use five: (i) the Gini coefficient; (ii) skewness (the third standardized moment); (iii) the coefficient of variation (standard deviation/mean); (iv) the share of operated farmland on farms between 5 and 10 ha; and (v) the share of operated farmland on farms over 10 ha. Each of these variables measures a different aspect of farm structure: some emphasize the importance of specific scales of farm operation, while others focus more on the degree of concentration of landholdings. Because of this, we do not necessarily expect these indicators to be highly correlated. Fig. 1 shows stylized landscapes that represent alternative farm size configurations of a constant total area. Concentration metrics are calculated for each and shown in the figure. For the most part, these correspond with an intuitive understanding of concentration: as we progress from the upper left, through the upper right and lower left, to the lower right, values of most metrics generally increase.
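A sketch of the five indicators for one district's landholdings follows. It rests on conventions the text does not specify, so treat them as assumptions: household records are unweighted (ASC sampling weights are ignored), skewness and the CV use population moments (divisor n), and size bins are half-open, so a farm of exactly 10 ha falls in the "over 10 ha" group. The Gini uses the standard sorted-rank formula; the landholding values are hypothetical.

```python
import math


def gini(sizes):
    """Gini via the sorted-rank formula: 2*sum(i*x_(i)) / (n*sum(x)) - (n+1)/n."""
    xs = sorted(sizes)
    n = len(xs)
    ranked = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * ranked / (n * sum(xs)) - (n + 1) / n


def _mean_sd(sizes):
    """Mean and population standard deviation (divisor n)."""
    n = len(sizes)
    m = sum(sizes) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in sizes) / n)


def skewness(sizes):
    """Third standardized moment."""
    m, sd = _mean_sd(sizes)
    return sum(((x - m) / sd) ** 3 for x in sizes) / len(sizes)


def coefficient_of_variation(sizes):
    """Standard deviation divided by the mean."""
    m, sd = _mean_sd(sizes)
    return sd / m


def land_share(sizes, lower, upper=math.inf):
    """Share of total land held by farms with lower <= size < upper."""
    return sum(x for x in sizes if lower <= x < upper) / sum(sizes)


def farm_structure(sizes):
    return {
        "gini": gini(sizes),
        "skewness": skewness(sizes),
        "cv": coefficient_of_variation(sizes),
        "share_5_10": land_share(sizes, 5, 10),
        "share_over_10": land_share(sizes, 10),
    }


# Hypothetical landholdings (ha) for one district.
district = [0.5, 1.5, 2.0, 6.0, 12.0]
for name, value in farm_structure(district).items():
    print(f"{name}: {value:.3f}")
```

The last two indicators depend only on how much land sits in particular size bins, while the first three summarize the shape of the whole distribution; this is one reason the text expects them to be imperfectly correlated.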

Outcome variables
Our key outcome variables of interest are per-FTE (full-time equivalent) income measures. These are constructed by dividing household-level earnings by the household sum of individual-level full-time equivalent values, calculated from labor-allocation data recorded in the NPS. Four main categories of income are considered: farm, non-farm, agricultural wage, and total income. Farm income includes income from crop production and livestock income (which includes the value of sales of live animals, value of slaughtered animals, and value of production of milk, eggs, honey, hides and skins). Non-farm income includes income from non-farm business activities and off-farm wage employment (excluding agricultural wage labor). Agricultural wage income is included as a separate category. Finally, total income is the sum of farm, agricultural wage, and non-farm income.
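The aggregation just described can be sketched as follows. This is a minimal illustration under assumptions: the function names and argument names are hypothetical, household FTEs are taken as already computed, and the handling of households with zero FTEs (here, an error) is not specified in the text.

```python
def farm_income(crop, livestock_sales, slaughter_value, livestock_products):
    """Crop income plus livestock income (live-animal sales, slaughtered animals,
    and products such as milk, eggs, honey, hides and skins)."""
    return crop + livestock_sales + slaughter_value + livestock_products


def per_fte_income(farm, nonfarm, ag_wage, household_fte):
    """Divide each income category, and their sum, by household FTEs."""
    if household_fte <= 0:
        raise ValueError("household FTE must be positive")
    total = farm + ag_wage + nonfarm
    return {
        "farm": farm / household_fte,
        "nonfarm": nonfarm / household_fte,
        "ag_wage": ag_wage / household_fte,
        "total": total / household_fte,
    }


# Hypothetical household (real 2010 USD per year, FTEs already computed).
farm = farm_income(crop=250.0, livestock_sales=60.0, slaughter_value=20.0, livestock_products=30.0)
print(per_fte_income(farm, nonfarm=200.0, ag_wage=40.0, household_fte=2.0))
# {'farm': 180.0, 'nonfarm': 100.0, 'ag_wage': 20.0, 'total': 300.0}
```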

Full-time equivalents
To calculate FTEs, we add up the hours an individual household member reports allocating to on-farm activities, non-farm business activities, and wage employment activities. For any given individual, the total hours per week spent working cannot exceed 112 (=16 h * 7 days). If the amount reported across all categories exceeds this amount, we scale hours in each category proportionally, such that 112 h per week is not exceeded. All monetary values were converted to real 2010 USD.
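The proportional cap can be sketched for one individual as shown below. The activity labels are hypothetical, and so is the `fte` helper: the text does not state how many weekly hours make up one FTE, so `hours_per_fte` is left as an explicit argument rather than given a value.

```python
MAX_WEEKLY_HOURS = 16 * 7  # 112 hours, the cap stated in the text


def cap_weekly_hours(hours):
    """Scale each activity's weekly hours proportionally so the total is <= 112."""
    total = sum(hours.values())
    if total <= MAX_WEEKLY_HOURS:
        return dict(hours)
    factor = MAX_WEEKLY_HOURS / total
    return {activity: h * factor for activity, h in hours.items()}


def fte(weekly_hours, hours_per_fte):
    """Convert capped weekly hours to FTEs; hours_per_fte is a hypothetical parameter."""
    return weekly_hours / hours_per_fte


reported = {"on_farm": 100.0, "business": 60.0, "wage": 40.0}  # 200 h reported
capped = cap_weekly_hours(reported)
print({k: round(v, 1) for k, v in capped.items()})  # {'on_farm': 56.0, 'business': 33.6, 'wage': 22.4}
```

Scaling proportionally preserves each activity's share of reported time (here 50/30/20 percent), so the cap changes the level of labor input but not its allocation across activities.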

Implementation challenges
There are several important challenges in estimating our model of interest. These include data quality constraints, an inherent arbitrariness in defining measures of land concentration, and endogeneity issues in ascertaining impacts of land concentration on growth outcomes.

Data quality issues
The Tanzanian National Panel Surveys, described above, are limited in how information on per-FTE gross income was collected. First, for wage income, the amount of time an individual worked over the previous 12 months was recorded only in the last two waves. Furthermore, while the second and third waves asked about both primary and secondary jobs, the first wave asked only about the primary job. Thus, we were not able to use the first wave of the NPS (2008) in this analysis.
Even though total time worked for wage income over the previous year was nominally recorded (via three questions: "During the last 12 months, for how many months did you work in this job?", "During the last 12 months, how many weeks per month do you usually work in this job?" and "During the last 12 months, how many hours per week do you usually work in this job?"), the informal nature of much wage employment in Tanzania (perhaps particularly in rural areas) implies a high degree of variability, which may not be captured well by such averaging questions. As a consequence, our income data are somewhat noisy, as are the per-FTE gross income measures based on them. Measurement error in the dependent variable does not bias coefficient estimates, provided it is uncorrelated with the explanatory variables, but it may inflate their standard errors (Wooldridge, 2010). To check the sensitivity of estimation results to such noise, we also estimate regression models for dependent variables that are not normalized by FTEs (i.e. household-level income measures). These results differ little from those of the per-FTE models, and we therefore report results from the per-FTE models.
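The three recall questions imply that annual hours in a job are the product of months worked, weeks per month and hours per week, summed over the primary and secondary jobs that the later waves record. A minimal sketch with a hypothetical worker:

```python
def annual_job_hours(months, weeks_per_month, hours_per_week):
    """Hours in one job over the last 12 months, from the three recall questions."""
    return months * weeks_per_month * hours_per_week


# Hypothetical worker with a primary and a secondary job.
jobs = [
    {"months": 6, "weeks_per_month": 4, "hours_per_week": 30},  # primary job
    {"months": 3, "weeks_per_month": 2, "hours_per_week": 10},  # secondary job
]
total_hours = sum(annual_job_hours(**job) for job in jobs)
print(total_hours)       # 780
print(total_hours / 52)  # 15.0 average hours per week over the year
```

Because each factor is a "usual" value, a worker who alternates between weeks of long hours and weeks with no work is summarized by a single product, which is the source of noise the text describes.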

Endogeneity concerns
In estimating our model of interest, equation (3), there are several endogeneity concerns. The first of these is that localized farm structure and per-FTE gross income may be jointly driven by unobserved factors. For example, if land concentration is associated with commercial land investments that target areas with favorable agro-ecological or market access conditions, we may get upwardly biased coefficient estimates. We take three steps to control for this. First, we include available agro-ecological and market access controls in our regression models. Agro-ecological zones were identified for survey locations from the HarvestChoice database maintained by IFPRI. 6 Market access variables, calculated by the World Bank, are based on estimated travel time to the nearest road or market town. Second, we include year and year*zone dummy variables to control for unobserved time-constant and time-varying regional effects that are not otherwise captured by our controls.
(footnote continued) … production). See National Bureau of Statistics (2012: p11) for more details.
While the first and second steps above can control for unobserved regional effects, there remains the issue of unobserved heterogeneity at the household level. To address this, our third step is to exploit the panel nature of the data by incorporating the Mundlak-Chamberlain (MC) device (Mundlak, 1978; Chamberlain, 1984) into our models, giving an estimator that Wooldridge (2010) refers to as the Correlated Random Effects model. The MC device adds household-level averages of all time-varying components of the model as regressors in order to control for unobserved time-constant heterogeneity, under the assumption that any correlation between this heterogeneity and the regressors operates through these time averages. While we cannot fully eliminate all potential sources of endogeneity, we believe that these three steps go as far as the available data allow.
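The data step behind the Mundlak-Chamberlain device is to append, for each time-varying regressor, the household's mean over the waves in which it is observed; the augmented rows then enter a pooled or random-effects regression, which is not shown here. A minimal sketch, with a hypothetical variable name (`hh_size`):

```python
from collections import defaultdict


def add_household_means(rows, hh_key, time_varying):
    """Append the household-level mean of each time-varying regressor (Mundlak device).

    In an unbalanced panel the mean is taken over the waves in which the
    household is actually observed."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        hh = row[hh_key]
        counts[hh] += 1
        for var in time_varying:
            sums[hh][var] += row[var]
    augmented = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for var in time_varying:
            new_row[f"{var}_hhmean"] = sums[row[hh_key]][var] / counts[row[hh_key]]
        augmented.append(new_row)
    return augmented


# Hypothetical rows: household size varies over waves.
rows = [
    {"hh": "A", "wave": 2009, "hh_size": 4},
    {"hh": "A", "wave": 2011, "hh_size": 6},
    {"hh": "B", "wave": 2011, "hh_size": 3},
    {"hh": "B", "wave": 2013, "hh_size": 3},
]
for r in add_household_means(rows, "hh", ["hh_size"]):
    print(r["hh"], r["wave"], r["hh_size"], r["hh_size_hhmean"])
```

Only time-varying regressors are averaged: for a variable that is constant within a household, the mean equals the variable itself and would be perfectly collinear with it.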

Attrition
A third concern is potential attrition bias arising from the use of panel data. We test for this and reject the null hypothesis of no attrition bias in our models. We therefore define and implement attrition weights in all of the regression models, following the methods described in Baulch and Quisumbing (2011). 7
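A minimal inverse-probability sketch of attrition weighting follows. It assumes that predicted probabilities of remaining in the panel have already been obtained from a separate attrition model (e.g. a probit on baseline characteristics) that is not shown. The exact weighting in Baulch and Quisumbing (2011) may differ from this simplest 1/p form, for example by using a ratio of predicted probabilities from restricted and unrestricted models; the function name is hypothetical.

```python
def attrition_weights(p_retained, normalize=True):
    """Inverse-probability weights 1/p for households that remain in the panel.

    p_retained are predicted probabilities of staying in the panel, assumed to
    come from a separate attrition model that is not shown here."""
    if any(not 0.0 < p <= 1.0 for p in p_retained):
        raise ValueError("retention probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in p_retained]
    if normalize:
        # Rescale so the weights average 1, leaving the effective sample size comparable.
        mean_w = sum(weights) / len(weights)
        weights = [w / mean_w for w in weights]
    return weights


print(attrition_weights([0.5, 0.8, 1.0], normalize=False))  # [2.0, 1.25, 1.0]
```

Households that resemble those likely to drop out receive larger weights, so the retained sample stands in for the attritors.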

Farm size structure
The Tanzanian farm sector is dominated by smallholdings, as elsewhere in the region. However, measures of farm structure are sensitive to the choice of dataset. Table 1 shows distributions of farm sizes across the country, using the NPS and ASC for 2009. Three observations are highlighted here. First, the distribution of farm holdings in the NPS is sensitive to whether landless households are included in the analysis, at least at very low percentiles of the distribution. Less than one percent of rural Tanzanian households are found to be landless. Second, and more importantly for our analysis, farm sizes in the NPS and ASC start to diverge at the upper end of the farm size distribution. At the 95th percentile, the ASC shows farm size to be 8.1 ha compared to 6.8 ha for the NPS. At the 99th percentile, the ASC shows farm size to be 33.8 or 31.7 percent higher than that of the NPS, depending on whether the ASC's large-scale module is included. Third, the distribution of farm sizes for the ASC up to the 99th percentile is virtually the same regardless of whether the large-scale farm module is included. As mentioned earlier, we can define district-level measures of farm structure using the ASC only if the large-scale module is excluded, which fortunately has little bearing on most indicators of farm structure. Given the much larger sample size of the ASC and its statistical representativeness at the district level, we prefer it to the NPS for constructing indicators of farm structure and land concentration.

7 In practical terms, we find that whether or not we use attrition weights, our coefficient estimates change very little, and the overall analytical conclusions are the same. Nonetheless, because we reject the null hypothesis of zero attrition bias in many of our model specifications, we report results from the weighted models in this paper. Unweighted model results are also available from the authors upon request.

Income trends, by farm size category
rose by a small amount between 2009 and 2013: by 0.8% for the < 2 ha landholding category, which constitutes roughly 56% of the rural population; 0.2% for households in the 2-5 ha category; 2.0% for 5-10 ha farms, but negative for the largest category (although with the caveat that the number of farms in this category in the NPS is relatively small). With the exception of agricultural wage income, incomes grew faster for farms over 5 ha than for the majority of farms below 2 ha. Overall, the non-farm and agricultural wage income categories experienced the most growth. This is consistent with other analyses showing a shift in Tanzania's employment patterns from farm to off-farm and non-farm sources of income in recent years (Yeboah and Jayne, 2018).

Land concentration measures
To evaluate the sensitivity of land concentration indicators to the choice of landholding dataset, we constructed land concentration measures at the national level from both the NPS and ASC for 2009 (Table 3). Comparing measures constructed using the small-farm component of the ASC with measures from the NPS, we find that they differ substantially in some respects and very little in others. As expected, when the large-scale farm component of the ASC is included, some measures are substantially higher, e.g. skewness, CV and the share of land under farms of 10 or more hectares, although other measures are very similar to those based only on the small-farm portion (Gini and the share of land in farms of 5-10 ha).
To further explore the comparability of alternative farm structure measures, we construct correlation matrices for alternative indicators at the region level (Table 4). Alternative indicators correlate imperfectly with one another. This is unsurprising, given that they emphasize different aspects of farm structure. In any case, the results in Tables 3 and 4 point to the need to evaluate the robustness of our results to the choice of farm structure indicator.
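A correlation matrix of this kind can be sketched as pairwise correlations across regions. The sketch assumes Pearson correlations (the text does not say whether Pearson or rank correlations are used), and the indicator values below are made up purely for illustration; they are not the paper's data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(indicators):
    """Pairwise correlations for a dict mapping indicator name -> regional values."""
    names = list(indicators)
    return {(a, b): pearson(indicators[a], indicators[b]) for a in names for b in names}


# Made-up values for four regions, purely for illustration.
indicators = {
    "gini": [0.45, 0.52, 0.60, 0.48],
    "cv": [1.1, 1.9, 1.4, 1.2],
    "share_5_10": [0.20, 0.15, 0.30, 0.25],
}
matrix = correlation_matrix(indicators)
for (a, b), r in matrix.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: {r:+.2f}")
```

An indicator with no variation across regions would make the denominator zero; in that case the correlation is undefined rather than zero.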
Given the possible distortion of district-level land concentration measures from the ASC which are not able to include the large farm component, we evaluate the correlation of measures constructed at the regional level (which does permit inclusion of the large-farm module). Comparing such region-level measures constructed with and without the large farm sample, we find that most regions do not vary much. As an example, comparisons of Gini coefficients are shown in Fig. 2. Nonetheless, to account for potential biases in our regression work relying on district-level concentration measures, we include a dummy   variable to identify regions where the inclusion of the large farm component results in differences in the more sensitive land concentration measures.

Baseline specification
Coefficient estimates for land concentration measures from our baseline regression specifications are shown in Table 5 (the full set of results is shown in Appendix Table A1). There are four dependent variables: (a) agricultural per-FTE gross income, (b) non-agricultural per-FTE gross income, (c) off-farm agricultural per-FTE wage income, and (d) total per-FTE gross income. All of these dependent variables are transformed using the inverse hyperbolic sine (MacKinnon and Magee, 1990). Because this function approximates a logarithm over most of its domain, the coefficient estimates can be interpreted as in a log-level specification. For each dependent variable, the specifications differ only in the choice of farm structure measure: (i) Gini coefficient, (ii) skewness, (iii) coefficient of variation, (iv) share of land in farms of 5-10 ha, (v) share of land in farms of > 10 ha, and (vi) shares of land under farms of 5-10 ha and > 10 ha, entered jointly. Initial testing indicated that panel attrition was influenced by variables in our models, so all specifications use inverse probability weights to correct for the probability of household attrition. Weighted and unweighted regressions nonetheless yield very similar coefficient estimates. Finally, we estimate heteroskedasticity-consistent standard errors clustered at the household level. These allow for non-constant variance and for correlation of errors within households across survey rounds.
The main analytical conclusion is as follows. The impacts on any particular income type depend heavily on which measure of land concentration is used, but the overall impacts of more concentrated landholding patterns on farm, non-farm and total per-FTE gross income are positive for most measures. The impact of the share of land in medium-scale farms (between 5 and 10 ha) is particularly pronounced and highly significant. The higher the share of district farmland held by farms of 5 to 10 ha, the greater the impact on farm, non-farm and total incomes among households within the district. Estimated impacts on agricultural wage income, however, are not significant in any of the specifications. This positive contribution of medium-scale farms does not appear to carry over to farms > 10 ha. In fact, after controlling for the share of land in 5-10 ha farms, a higher share of district farmland under farms over 10 ha is negatively associated with incomes, although this result is not significant at conventional levels. This suggests that commercial operations over 10 ha in our sample may not generate the same kinds of positive spillover effects as medium-scale farms of 5-10 ha. 8 This finding is consistent with the idea that income multipliers are smaller when very large farms hold a relatively large portion of the land under production in a localized farm-based economy.
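The log-level reading of IHS coefficients rests on the identity asinh(y) = ln(y + sqrt(y^2 + 1)), which is close to ln(2y) once y is large. Because asinh and ln(2y) differ by a near-constant for large y, differences in asinh(y) closely track differences in ln(y). The following is a minimal sketch using NumPy's `np.arcsinh`. The income levels are hypothetical, and the helper names are illustrative, not taken from the study.

```python
import numpy as np


def ihs(y):
    """Inverse hyperbolic sine, asinh(y) = ln(y + sqrt(y**2 + 1)).

    Unlike ln(y), it is defined at y = 0 (asinh(0) = 0), which matters for
    income components that many households report as zero.
    """
    return np.arcsinh(np.asarray(y, dtype=float))


def ihs_log_gap(y):
    """asinh(y) - ln(2y): positive for y > 0 and shrinking towards 0 as y grows."""
    y = np.asarray(y, dtype=float)
    return ihs(y) - np.log(2.0 * y)


# Hypothetical income levels (e.g. TSh per FTE)
levels = np.array([1.0, 10.0, 1_000.0, 500_000.0])
print(ihs_log_gap(levels))  # noticeable gap at y = 1, negligible at y = 500,000

# A 10% rise in a large income changes asinh(y) by almost exactly ln(1.1),
# which is why coefficients can be read like those of a log-level model.
y0 = 900_000.0
print(ihs(1.1 * y0) - ihs(y0), np.log(1.1))
```

The approximation is poor only for very small positive values. Per-FTE incomes measured in shillings mostly lie far above that range, apart from exact zeros, which asinh maps to 0.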
As mentioned previously, our district-level land concentration measures omit the large-farm component of the ASC. To address this concern, we include a dummy indicator for regions where the regional landholding coefficient of variation (CV) is sensitive to exclusion of the large-farm component. Other land concentration measures are much less sensitive to whether this component is included in their calculation. The dummy takes a value of 1 for all regions with greater than average changes in CV. Interestingly, this dummy is significant and negative in the farm and total income models (panels a and d) and significant and positive in the agricultural wage model (panel c). One interpretation is that commercial farm operations may generate some agricultural wage opportunities but reduce farm and total incomes through weaker multiplier effects. We would expect this of larger farming operations, which invest smaller shares of their income in locally traded goods and services. While this result is interesting, we note that the other coefficient estimates are not particularly sensitive to inclusion of this control. Furthermore, we estimated models using regional-level concentration measures that include the large-farm component (presented in Appendix Table A4). 9 Comparing these estimation results, we find little substantive difference, which provides some reassurance about the robustness of the overall results to alternative data choices. One difference is worth noting. The models with regional-level land concentration measures find positive impacts of medium-scale farm concentration on agricultural wages, but negative impacts of the Gini coefficient and of the concentration of land in farms of 10+ ha.
This suggests that our assessments of the impacts of larger farms on agricultural wage income should be treated with more caution than those for other income categories. The other categories generally have more participants and may be observed with greater precision.
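The construction of the sensitivity dummy can be sketched as follows. This is a minimal illustration, not the study's code. It assumes that the "change in CV" is the absolute difference between the CV computed without and with the large-farm component. It also assumes that "average" is the unweighted mean of those changes across regions. The function names and the three regions are hypothetical.

```python
import numpy as np


def cv(holdings):
    """Coefficient of variation (population standard deviation / mean)."""
    x = np.asarray(holdings, dtype=float)
    return x.std() / x.mean()


def cv_sensitivity_dummy(small_only, small_plus_large):
    """Flag regions whose CV changes more than the cross-region average.

    small_only[r] and small_plus_large[r] hold region r's landholdings (ha)
    without and with the large-farm component, respectively.
    """
    changes = np.array(
        [abs(cv(b) - cv(a)) for a, b in zip(small_only, small_plus_large)]
    )
    return (changes > changes.mean()).astype(int), changes


# Three hypothetical regions; only the second gains a large farm.
small = [[1, 2, 3], [1, 2, 3], [2, 4, 6]]
with_large = [[1, 2, 3], [1, 2, 3, 500], [2, 4, 6]]
dummy, changes = cv_sensitivity_dummy(small, with_large)
print(dummy)  # [0 1 0]
```

Using a mean-based cutoff means the number of flagged regions depends on how skewed the CV changes are. A median cutoff would always flag roughly half the regions. The text specifies "greater than average", so the sketch uses the mean.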
We also ran models with alternative definitions of land concentration (e.g. share of land in 5-20 ha farms, in 20+ ha farms, etc.) to evaluate whether any additional structural insights might be gained. For all of the alternative definitions we checked, the same basic results were obtained. The coefficient on the share of land in the "medium-scale" category was positive and significant, but decreased in magnitude as the category was widened (e.g. the coefficient on the share of land under 5-20 ha farms is smaller than that on the share under 5-10 ha farms). The coefficient on the share of land in the largest category (whether > 10, > 20 or > 50 ha) was invariably insignificant. A graphical presentation of coefficient estimates from these alternative models is given in Fig. 3 (with full estimation results reported in Appendix Table A4). These results suggest that we should not be overly dogmatic in defining "medium-scale" farms by any particular size range. Categories broader than 5-10 ha yield broadly similar results, although some precision is lost as the range of farm sizes included in the category widens.
Fig. 2. Regional Gini coefficients on landholding size: regional Gini from the ASC (small farms only) compared with regional Gini from the ASC (small and large farms).
Given the prevalence of gendered differences in land access in sub-Saharan Africa (Doss et al., 2013), we considered the possibility that land concentration may be correlated with poorer access to land by women. In that case, gendered land access would be an omitted variable correlated with, and biasing, the coefficient on land concentration. To address this concern, we first note that all specifications include a female-head dummy, which is significant only in the total income models.
Furthermore, when we specify a model which interacts the female-head dummy with the land concentration measures, the coefficient estimate for the interaction term is not significantly different from that of the non-interacted term. This provides some reassurance that a gendered access story is not driving our results.
To test the robustness of these findings, we estimated a large number of alternative specifications. First, to allay concerns about potentially noisy per-FTE income measures, we also estimated models using farm-level income measures (i.e. not normalized by household FTEs for any given income category) and per-capita income measures as dependent variables. Coefficient estimates from these models are very similar to those from the per-FTE models. We also estimated models using a sample restricted to households in enumeration areas classified as "rural" by the Tanzanian National Bureau of Statistics. This addresses the possibility that our results are influenced by households that are not truly rural, and the results are again very similar. Table 6 shows results from specifications that interact district land concentration measures with household productive asset categories, defined as terciles of the sample distribution. For conciseness, we show only the models in which the dependent variable is agricultural per-FTE income; full estimation results are shown in Appendix Table A2. Impact estimates vary across land concentration measures and their interactions, but a general pattern emerges: the positive impacts of land concentration are larger for the top two asset terciles. Relatively poor households benefit the least from localized land concentration (and in some cases are worse off). In contrast, the majority of households, and especially the wealthiest third, tend to benefit significantly.
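The asset-tercile interactions can be built as in the following minimal sketch. It assumes terciles are cut at the 1/3 and 2/3 sample quantiles of a single asset index, with observations on a cut point assigned to the lower tercile. The function name, asset index and district Gini value are hypothetical, and the sketch does not use the study's actual asset definition.

```python
import numpy as np


def tercile_interactions(assets, concentration):
    """Asset-tercile dummies and concentration x tercile interaction terms.

    Terciles are cut at the 1/3 and 2/3 sample quantiles of `assets`;
    0 = bottom, 1 = middle, 2 = top tercile.
    """
    assets = np.asarray(assets, dtype=float)
    concentration = np.asarray(concentration, dtype=float)
    cuts = np.quantile(assets, [1 / 3, 2 / 3])
    tercile = np.digitize(assets, cuts, right=True)
    dummies = np.eye(3)[tercile]  # one-hot: each row has exactly one 1
    # Column k equals the concentration measure for tercile-k households, else 0
    interactions = dummies * concentration[:, None]
    return tercile, dummies, interactions


assets = np.arange(1, 10)          # hypothetical asset index, 9 households
gini_district = np.full(9, 0.45)   # hypothetical district Gini
tercile, dummies, inter = tercile_interactions(assets, gini_district)
print(tercile)  # [0 0 0 1 1 1 2 2 2]
```

The three interaction columns sum to the concentration measure itself. If the uninteracted measure is also in the model, one interaction (e.g. the bottom tercile's) must be dropped to avoid perfect collinearity. The effect for tercile k is then the main coefficient plus the tercile-k interaction coefficient.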

Interacting concentration with household asset dummies
These differential wealth-related impacts on farm income indicate that spillover effects from land concentration are not equally accessible to all farms within an area. Sitko, Burke and Jayne (2018) found that medium-scale farms tend to attract private traders and improve market access conditions for nearby smallholders. Even so, relatively poor households may be unable to benefit if they cannot produce a farm surplus in the first place. Similarly, van der Westhuizen et al. (2018) found that medium-scale farmers tend to rent out their tractors to small-scale farms in their areas, but relatively poor households may not be able to afford such services. We consider these interpretations quite plausible in light of other evidence, but our data allow us only to surmise which specific channels of spillover may be at work. These results are similar to those of models (not shown here) that use farm size categories in place of household asset categories. This is not surprising given the high correlation between asset wealth and landholding size.
Fig. 3 notes: Full estimation results are reported in Appendix Table A4 (groupings a-d in the figure correspond to the specifications in the table). The dependent variable in these models is the inverse hyperbolic sine transformed per-FTE gross total income, measured in 2010 Tanzanian shillings.
Table 6 notes: Full estimation results are reported in Appendix Table A2, panel a. Robust p-values in parentheses; *** p < 0.01, ** p < 0.05, * p < 0.1.

Exploring spillover pathways
Given the indications of some positive spillovers in the previous results, it is of interest to know whether any particular spillover pathways are identifiable in our data. As a simple way to explore this, we estimate augmented versions of the baseline model that include, individually and jointly, the following additional controls: (a) the district share of households with non-farm wages; (b) the district share of households with agricultural wages; (c) the district share of households with rented or borrowed farming equipment (e.g. tractor, oxen, harrow); (d) the district share of households that purchased fertilizer; (e) the district share of households that received extension advice; and (f) the district share of households that received market information from large farmers. To the extent that these indicators are significant, and that their inclusion diminishes the size and significance of the medium-scale farm concentration variable, we may gain insights into the corresponding spillover pathways. Results are summarized in Table 7 (with full estimation results presented in Appendix Table A3).
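The logic of this check is that a candidate pathway variable that carries part of the spillover should, once added as a control, shrink the medium-scale coefficient. The following sketch illustrates that logic with plain OLS on synthetic data. All variables and coefficients are invented for illustration. It does not reproduce the study's panel CRE estimator, attrition weights or clustered errors.

```python
import numpy as np


def ols_coefs(y, X):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(0)
n = 5_000
# Synthetic district share of land in 5-10 ha farms
share_5_10 = rng.uniform(0.0, 0.3, n)
# Synthetic pathway (e.g. share buying fertilizer), partly driven by share_5_10
pathway = 0.5 * share_5_10 + rng.normal(0.0, 0.05, n)
# Synthetic outcome: direct effect 1.0, plus 2.0 working through the pathway
y = 1.0 * share_5_10 + 2.0 * pathway + rng.normal(0.0, 0.1, n)

b_base = ols_coefs(y, share_5_10)[1]                               # about 2.0
b_aug = ols_coefs(y, np.column_stack([share_5_10, pathway]))[1]    # about 1.0
print(b_base, b_aug, 1 - b_aug / b_base)  # roughly half the effect runs via the pathway
```

In the study, the medium-scale coefficient changes little when such controls are added. This is the pattern one would expect if the included indicators capture little of the actual spillover channel. A small drop does not by itself rule out mediation through channels that the indicators measure poorly.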
For farm income (panel a), we see that district agricultural wage market participation is negatively associated with farm income (possibly reflecting impacts on own-farm labor supply), and that the share of households purchasing fertilizer is positively associated with farm income. However, the coefficient estimate on the share of land in 5-10 ha farms is reduced by only a small amount upon inclusion of these controls, and is fairly consistent across specifications, suggesting that there are other channels of positive spillovers (or that the indicators included only capture some of the salient variation).
Unsurprisingly, the district share of households with non-farm income (which may reflect non-farm economic vibrancy) is a positive and significant determinant of non-farm per-FTE income (panel b). Likewise, agricultural wage market participation is a positive and significant determinant of agricultural wage per-FTE income (panel c). Otherwise, the coefficient estimate for 5-10 ha farms is robust to these additional controls. For total income (panel d), the district shares of households with non-farm and farm income are significant regressors, but again the estimated spillover impact of 5-10 ha farms is largely unaffected. In sum, the processes by which medium-scale farms may be driving income improvements in surrounding areas appear to be diffuse. They extend beyond the simple indicators of input, service and labor markets that we control for here.

Simulated impacts of changes in farmland distributions
To understand the magnitude of these impacts, we simulate the effect of a change in the farm structure variables on total per-FTE income, using the baseline results in Table 5. We consider an increase in land concentration from the 25th to the 75th percentile (for example, the 25th and 75th percentiles of the farmland Gini coefficient in our sample are 0.41 and 0.53, respectively). 10 Results are shown in Table 8. Such an increase in land concentration, as measured by the Gini coefficient, is associated with an average gain of 140,000 TSh in total per-FTE gross income (about 100 USD), ceteris paribus. This is equivalent to increasing the sample mean of total per-FTE gross income by 15%. The estimated impact of increasing the share of land under medium-scale farms from the 25th to the 75th percentile is even larger: a gain of 217,000 TSh, or 23% of the sample mean. We reiterate that the model contains numerous granular controls for agro-ecological, rainfall, population density and market access conditions, and their interactions with year dummies. We cannot rule out that the magnitude of these simulation results is affected to some degree by unobserved time-varying heterogeneity correlated with district-level farm structure. Even so, these granular controls and year interaction terms, combined with CRE estimation, provide fairly strong evidence that differences in localized farm structure independently exert major effects on economic activity in an area. This is suggested by the very strong estimated effects on rural households' farm and non-farm incomes. We consider these findings very plausible in light of growing evidence that medium-scale farms tend to be more commercialized than small-scale farms in both input market participation and output market sales. Such farms are also attracting private investments in a variety of ways that improve market access conditions for all rural residents in an area.
Anecdotal evidence also suggests that medium-scale farms are a major source of cash expenditures in the local non-farm economy, potentially increasing the demand for a wide variety of goods and services locally (e.g., Poulton, 2018).
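The reported magnitudes can be checked with simple arithmetic, using the 1 USD ≈ 1450 TSh exchange rate given in the simulation table notes. The following is a minimal sketch. It assumes that the percentages are relative to a common sample mean, and it includes a helper for the exact percent change implied by a log-level coefficient over the interquartile move. The coefficient passed to that helper is a hypothetical value, not an estimate from Table 5. The study's simulation may instead average predicted differences across households, so this is a consistency check rather than a replication.

```python
import math

TSH_PER_USD_2010 = 1450.0  # exchange rate stated in the simulation table notes


def pct_change_log_level(beta, x_p25, x_p75):
    """Percent change in y when x moves from x_p25 to x_p75 in a log-level model."""
    return math.exp(beta * (x_p75 - x_p25)) - 1.0


# Arithmetic check on the reported simulation figures
gini_gain_tsh = 140_000.0
implied_mean_tsh = gini_gain_tsh / 0.15          # sample mean implied by "15%"
medium_share_pct = 217_000.0 / implied_mean_tsh  # should come out near 23%
gini_gain_usd = gini_gain_tsh / TSH_PER_USD_2010  # "about 100 USD"
print(implied_mean_tsh, medium_share_pct, gini_gain_usd)

# Hypothetical coefficient of 1.0 (NOT a Table 5 estimate) applied to the Gini IQR
print(pct_change_log_level(1.0, 0.41, 0.53))
```

The implied sample mean of roughly 933,000 TSh per FTE makes the 15% and 23% figures mutually consistent. The 140,000 TSh gain converts to roughly 97 USD.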

Policy implications
Our findings suggest that localized heterogeneity in farm structure may be conducive to rural development in smallholder-dominated agricultural production systems. This appears to be particularly true when such heterogeneity includes medium-scale farms in the 5-10 ha range. Such farms may have stronger multiplier effects than larger commercial farms, and may also have greater direct linkages (e.g. through wage employment, equipment rental, service provision) than either larger or small-scale farms.
Our findings for Tanzania may be viewed as somewhat in contrast to a widely accepted stylized fact from the Asian literature on agricultural development and structural transformation. That literature holds that relatively unconcentrated, unimodal patterns of farmland distribution tend to generate the greatest localized multiplier effects, non-farm employment growth and economic transformation from agricultural growth. We do not wish to challenge this literature or assert that it does not apply to Africa, especially not on the basis of one case study. We do, however, raise the following points, which may reduce the apparent inconsistency between our results and the Asian literature stressing the developmental superiority of a unimodal, smallholder-led development approach. First, the Asian literature is largely derived from areas with irrigation or water control that allow two or three production seasons per year on the same plots. These conditions generate much greater farm commercialization and associated multiplier effects from relatively small farms than is possible on farms relying entirely on rainfall with one growing season per year. The latter is the case in much of Tanzania and of Africa more generally. Differences in soil, rainfall and water control conditions drive differences in crop productivity and surplus output per land unit. If we accept that these differences may influence the scale of farm production capable of driving economic transformation, then it is plausible that the farm scales and farm size distributions required to achieve these outcomes in sub-Saharan Africa differ from those in much of Asia. Second, we find that farms in the 5-10 ha range generate the strongest indirect effects on the incomes of surrounding households. These farms are not much bigger than the "commercialized smallholder farms" that Mellor (2014) and others identify as the dynamic change agents in Ethiopian agriculture.
District-level shares of medium-scale farms in the 5-10 ha range are positively correlated with measures of land inequality such as the Gini coefficient and CV of landholdings. This may explain why we find a positive effect of land concentration on rural household incomes in the area. However, this does not mean that extreme forms of land inequality are favorable for economic development. Consistent with the Asian unimodal vs. bimodal literature, we find some evidence that a higher concentration of district-level farmland under farms of over 20 ha adversely affects rural household incomes in a given district. We therefore do not view the results of this study as challenging the longstanding evidence from Asia in favor of an inclusive, smallholder-led approach to development. Rather, they call for a more nuanced discussion in Africa about the farm scales associated with agricultural commercialization, factor market participation and economic transformation, in light of fundamental differences in farm output per land unit per year. Additional evidence from sub-Saharan Africa will deepen understanding of the role of alternative farm size distributions in promoting transformation dynamics in Africa. It will also help establish whether the farm sizes and farmland distributions that generated the greatest indirect benefits to surrounding rural households in Asia may differ in Africa, where over 95% of agricultural land remains under rainfed production. The findings of this study indicate that localized differences in farmland inequality and size distributions are associated with major differences in rural household incomes. This does not, however, imply that land redistribution programs are warranted to alter farm structure in a particular direction.
As stressed above, much more evidence from other sub-Saharan African countries is needed to better understand whether robust cross-country evidence emerges in support of spillover effects resulting from particular farm scales or farmland structure variables. Moreover, it will be important to take account of evidence in related literatures on land tenure security, investment and land productivity, and broader livelihood effects resulting from struggles over the institutions governing access to land (e.g., Boone, 2014;Lawry et al., 2014;Melesse and Bulte, 2015), which identify important rural development effects resulting from land tenure programs that may alter farm structure in ways not considered in this analysis.

Conclusions
This paper is motivated by the need to better understand the impacts of changing farm structure in Africa. Recent research has documented the rapid pace of new land acquisitions by foreign large-scale interests (Deininger and Byerlee, 2011) and by medium-scale African farmers . Tanzania in particular has experienced a rapid increase in land controlled by medium- and large-scale farms in recent years (Schoneveld, 2014; Jayne et al., 2016). These changes in farm structure and composition have generated much speculation and debate about the impacts on smallholder households and rural communities, but with very little hard evidence to guide land policies and programs. Recent analysis has addressed the potential for spillover effects of large farms on technology adoption and yields of nearby smallholder farms. The broader impacts on household incomes have yet to be explored, however, including impacts disaggregated by farm and non-farm sources and how they may differ according to the gender and wealth characteristics of the household. Moreover, the few studies on this topic tend to identify the effects of large farms based only on proximity to a given smallholder household. They do not consider potential differences in impacts between large-scale and medium-scale farms or the degree of concentration of such farms within a given locality. This study addresses these issues by characterizing the structure of farm operations at the district level in Tanzania. We characterize structure both in terms of the relative importance of small, medium and large farms and by various indicators of farmland concentration in the locality. We estimate whether different kinds of farm structure are beneficial or detrimental to rural development as measured by rural household incomes, disaggregating the effects on income from own farming, agricultural wages and non-farm sources.
Farm structure indicators are derived from a unique Agricultural Sample Census carried out in Tanzania in 2009, which is statistically representative for the country's 142 districts. 11
Table 8 notes: Values in columns (a), (b) and (c) are in thousands of real 2010 Tanzanian shillings (TSh); in 2010, 1 USD ≈ 1450 TSh. These simulation results are based on the baseline regression specifications shown in Table 5 and Appendix Table A1.
11 The number of districts has since increased: as of the 2012 census, there were 169 districts in Tanzania.
Our first observation is that the alternative indicators of farm structure chosen for this study differ considerably from one another. Some indicators emphasize the relative importance of different scales of farm operation in the locality, while others focus on the degree to which farmland is concentrated or unequally distributed. These observations reinforce a point that should already be well accepted: farm or "agrarian" structure is a multifaceted concept, and specific indicators of it may not be highly correlated with one another. For this reason, estimated impacts of farm structure are likely to be highly sensitive to the choice of indicator.
Guided by these findings, we estimate several models of household income per full-time labor equivalent using panel estimation techniques, based on five alternative indicators of farm structure, including (i) the Gini coefficient; (ii) skewness; (iii) coefficient of variation; (iv) share of controlled farmland under 5-10 ha farms; and (v) share of controlled farmland under large farms (defined here as farms of 10 or more hectares).
The study highlights four main findings. First, farmland concentration is generally positively associated with rural household incomes, after controlling for other geographical and household-level factors. Second, household incomes from farm and non-farm sources (excluding agricultural wages) are particularly positively and significantly associated with the share of land in the district under farms of 5-10 ha. Third, these positive spillover benefits are smaller and less statistically significant in districts with a relatively high share of farmland under farms over 10 ha in size. Our econometric results do not identify why medium-scale farms appear to generate greater spillover effects for local communities than relatively larger farms. Recent published studies do, however, offer some clues. Most medium-scale farmers come from the same social and ethnic backgrounds as small-scale farmers and tend to have more extensive social interactions with the local community than do most large-scale farms (Sitko and Jayne, 2014). Fourth, poor rural households are least able to capture the positive spillovers generated by medium-scale farms and by concentrated farmland patterns. The greatest benefits to household income were enjoyed by households in the upper two-thirds of the wealth distribution, which still includes the majority of rural households. We speculate that poor rural households are less able to afford to take advantage of the improved access to markets and services that medium-scale farms tend to provide. This aligns with our finding that agricultural wages do not appear to be a primary channel of influence. However, more detailed research is needed on the pathways by which medium- and large-scale farms affect household and broader local community welfare. 12
The analysis presented in this paper contributes to our understanding, both conceptually and empirically in a particular African context, of how various dimensions of farm structure may uniquely influence rural development trajectories. Our analysis has demonstrated the value of structurally interpretable measures of farmland distribution (e.g. the local share of land under 5-10 ha farms) as compared with more generic measures of distribution such as the Gini coefficient.
More generally, our study underscores the importance of good data on land distributions in developing countries (see also Lowder et al., 2016). The recent bounty of nationally representative data available through initiatives such as the World Bank's LSMS-ISA program is certainly to be applauded. Nonetheless, we would advocate even greater investment in expanding the sampling frame, both to ensure adequate representation of larger farms and to enable more spatially disaggregated measures of farm structure and concentration. Furthermore, more detailed data collection on localized tenure arrangements, settlement histories and the modalities of land acquisition should help clarify policy linkages with land distribution outcomes. Although this will require increased investment in data collection, the analytical payoffs stand to be substantial. This is particularly so in countries where medium- and large-scale land acquisitions are known to be taking place. Our analysis suggests that small-to-medium-scale local investors, rather than large-scale foreign investors, are key players in the farmland dynamics being experienced in Tanzania. They appear to act in ways that are consistent with expected patterns of structural transformation. But much empirical work remains to be done to flesh out our understanding of such dynamics, and their impacts, in Tanzania and elsewhere in the region. Ultimately, the scope for research to inform and guide African states will depend on how accurately researchers and policymakers are able to monitor and evaluate the changes taking place on the ground.