Reducing the maize yield gap in Ethiopia: Decomposition and policy simulation

Maize is an important staple crop in Ethiopia. Reducing the yield gap, the difference between actual and (water-limited) potential yield, has wide implications for food security and policy. In this paper we combine stochastic frontier analysis of household survey data with agronomic information on (water-limited) potential yield to decompose the maize yield gap in Ethiopia and highlight policy solutions to reduce it. Our analysis suggests that lack of access to advanced technologies makes up the largest component of the maize yield gap, but market imperfections, economic constraints and management constraints are also important determinants. Potentially, maize production could be increased almost fivefold if all these constraints were addressed simultaneously and the yield gap were fully closed. The paper also reveals measurement issues in the national household survey (LSMS-ISA), a key source of information for scientists to assess agricultural policies in Ethiopia and other African countries. A comparison with results from a crop model suggests a large number of unrealistic values for key maize input and output variables. Combining economic and agronomic approaches is therefore not only useful to identify policies to reduce maize yield gaps, but also to assess and improve the quality of the databases on which recommendations are based.


Introduction
Maize is critical for food security in Ethiopia. More than 9 million smallholders grow maize on about 2 million ha (14% of total land area in Ethiopia) and around 88% of their production is used for food consumption (Abate et al., 2015). In terms of calorie intake, maize is the most important staple crop for the rural Ethiopian population (Berhane et al., 2011).
Over the last two decades, the maize sector in Ethiopia has experienced an unprecedented transformation. Maize yields have doubled from around 1.6 t/ha in 1990 to more than 3.7 t/ha in recent years, the highest level in sub-Saharan Africa after South Africa (FAO, 2019). Important causes for the increased productivity include increased availability and use of modern inputs (e.g. modern varieties and fertilizer), better extension services and increasing demand (Abate et al., 2015).
Despite the recent progress in productivity, yield levels in Ethiopia are still very low relative to what they could be. According to the Global Yield Gap Atlas (GYGA, 2019), the water-limited yield potential of maize in Ethiopia is on average 12.5 t/ha, implying that farmers realize only around 30% of that potential. This is in contrast with for example Latin American countries which are able to reach around 45% of potential maize yield (GYGA, 2019).
Increasing maize yield and reducing the yield gap are essential to ensure future food security in Ethiopia. In a recent paper Ethiopia's capacity to feed itself by 2050 was analyzed (Van Ittersum et al., 2016). The analysis showed that the country needs to continue the current observed increase in cereal yield (of which maize makes up the largest share) to maintain its present self-sufficiency rate of 95% in 2050, as by then the population will have probably more than doubled and consumption per capita levels have increased in line with a higher projected income level. This would be equivalent to a yield increase to around 50% of the water-limited potential yield of cereals. If the yield level stays at the present level, Ethiopia will only be able to produce 40% of its cereal needs in 2050, which is a potential risk for food security.
In order to propose policies that help to reduce the maize yield gap in Ethiopia, it is important to quantify the key contributing factors. The main aim of this paper is to provide a detailed analysis of the maize yield gap in Ethiopia, identify its main causes and highlight policy options that contribute to reducing the yield gap. We start by presenting a framework to decompose the conventional yield gap into four components, which capture major causes of below-potential production (Van Dijk et al., 2017). Based on a literature review, we link the four yield gap components with key policy solutions that have been proposed to increase the crop yield of smallholders in developing countries (e.g. credit provision, agricultural R&D and extension services). Next, the framework is operationalized by combining stochastic frontier analysis (Coelli et al., 2005) with information on potential yield from crop model simulations. This ensures that both the economic and agronomic features of the yield gap are adequately captured (Sumberg, 2012). Our main data sources include a nationally representative household survey, which was thoroughly screened for outliers, and information on potential maize yield from the Global Yield Gap Atlas (GYGA, 2019). Finally, the yield gap analysis is used as a basis for a policy simulation that compares the impact of policy options to close the yield gap on total maize production in Ethiopia. The results of our study can be used to inform targeted policy decisions and assess potential entry points to increase maize productivity in Ethiopia.

Conceptual framework
The conventional yield gap is defined as the difference between (water-limited) potential yield and actual farmers' yield (Fischer, 2015; Lobell et al., 2009). Potential yield is the maximum yield that can be produced on a parcel of land given agroclimatic conditions, assuming that water and nutrients are non-limiting, and pests and diseases are effectively controlled. Water-limited potential yield (Yw) is defined similarly to potential yield, but crop growth is also limited by water supply, and hence influenced by soil type and field topography. Using insights from agronomy and agricultural economics, Van Dijk et al. (2017; also see Silva et al., 2017) distinguish three additional yield levels that can be used to decompose the conventional total yield gap.
The conceptual framework is illustrated in Fig. 1. The theoretical yield response function presents the relationship between yield and inputs under perfect crop management, use of advanced technologies and a constant agro-ecological environment. The maximum of the function equals potential yield (or water-limited potential yield in case of rainfed crops, such as maize in Ethiopia). The function can be estimated using crop models, highest yields at agricultural research stations and highest yield in farmer contests (Lobell et al., 2009). Again assuming constant agro-ecological conditions, the frontier response function shows best-practice yield at each level of input and reflects the best-management practices and technology that are available in a certain region.¹ In combination with additional information on input (f) and output (m) prices and detailed agronomic information on optimal nitrogen application and seed rate, five yield levels can be derived: (1) actual yield (Ya); (2) technical efficient yield (Yte), which reflects the best-practice yield for a given amount of inputs; (3) economic yield (Ye), which is the yield level when input and output combinations are profit maximizing; (4) feasible yield (Yf), which measures the best-practice yield if there were no economic constraints; and (5) potential yield (Yp) or water-limited potential yield (Yw), depending on whether crops are irrigated or rainfed.
Combining the different yield levels results in four yield gap components (Fig. 2): (1) the technical efficiency yield gap (TEYg), which measures crop management inefficiencies in production; (2) the allocative yield gap (AYg), which captures the suboptimal allocation of resources; (3) the economic yield gap (EYg), which reflects economic constraints; and (4) the technology yield gap (TYg), which captures lack of access to (advanced) technologies. Together these four components add up to the total yield gap (Yg):

$$ Yg = Yw - Ya = TEYg + AYg + EYg + TYg $$

Fig. 1. Conceptual framework to decompose the yield gap. It identifies five different yield levels (Y): actual yield (Ya); technical efficient yield (Yte); economic yield (Ye); feasible yield (Yf); and (water-limited) potential yield (Yw or Yp) with associated input levels (x), resulting in four yield gap components: technical efficiency yield gap (TEYg); allocative yield gap (AYg); economic yield gap (EYg); and technology yield gap (TYg) that together add up to the total yield gap (Yg). f = input price; m = output price. Source: Van Dijk et al. (2017), also see Silva et al. (2017).

¹ In practice, farmers will be located in areas that are characterized by a wide range of agro-climatic conditions. In the empirical illustration below, we use climate zone specific estimates of water-limited potential yield and control for differences in agro-ecological conditions in the estimation of the yield response curve.
Dividing both sides by Yg gives the contribution of each part to the total yield gap. We refer to Van Dijk et al. (2017) for more information on the conceptual framework. Below we explain how the framework is operationalized for our case-study on Ethiopia.
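The arithmetic of the decomposition can be sketched in a few lines of Python. This is an illustration only: it assumes that each component is the difference between two successive yield levels in Fig. 1 (for example TEYg = Yte - Ya), and the class name, function name and yield values are hypothetical rather than taken from the data.

```python
from dataclasses import dataclass


@dataclass
class YieldLevels:
    """Yield levels for one plot or climate zone, all in t/ha."""
    ya: float   # actual yield
    yte: float  # technical efficient yield
    ye: float   # economic yield
    yf: float   # feasible yield
    yw: float   # water-limited potential yield


def decompose(levels: YieldLevels) -> dict:
    """Return the four yield gap components, the total gap and the shares."""
    # Assumption: each gap is the difference between successive yield levels.
    gaps = {
        "TEYg": levels.yte - levels.ya,
        "AYg": levels.ye - levels.yte,
        "EYg": levels.yf - levels.ye,
        "TYg": levels.yw - levels.yf,
    }
    total = levels.yw - levels.ya
    if total <= 0:
        raise ValueError("potential yield must exceed actual yield")
    # Dividing each component by Yg gives its contribution to the total gap.
    shares = {name: gap / total for name, gap in gaps.items()}
    return {"gaps": gaps, "total": total, "shares": shares}


# Hypothetical yield levels, chosen only to illustrate the arithmetic.
example = YieldLevels(ya=2.0, yte=3.0, ye=5.0, yf=8.0, yw=12.0)
result = decompose(example)
for name, share in result["shares"].items():
    print(f"{name}: {share:.0%}")
```

By construction the four shares sum to one, which is a convenient check when the yield levels come from different sources.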

Policies to reduce the yield gap
Several (sets of) policy options have been proposed to increase the productivity of smallholders (World Bank, 2008). Policy solutions that are frequently proposed to improve the performance of small-scale farmers include providing credit and insurance, investing in agricultural R&D, improving extension services and providing input subsidies. In the remainder of this section, we link these and other smallholder policy options that have been proposed in the literature to the yield gaps that result from the decomposition.
The main causes for the technical efficiency yield gap are gaps in knowledge, information and skills (e.g. the appropriate use, combination and timing of inputs, including for instance crop protection measures), which prevent farmers from reaching best practice.² The relevance of this gap is supported by a review study, which found that crop management constraints contributed to around 23-29% of the yield gap in sub-Saharan Africa (Waddington et al., 2010). Similarly, a meta-analysis reported an average technical efficiency of 68% in Africa (Ogundari, 2014). Both studies indicate there is ample room to increase yield by addressing knowledge gaps, resulting in a reduction of the technical efficiency yield gap.
Extension services are the main policy instrument to close the technical efficiency yield gap. The core objective of providing extension services is to address the knowledge and information gap of farmers by offering technical education and sharing information on new technologies, use of inputs and the prevention of pests and diseases (Evenson, 2001). Investment in farmer education is another strategy that will contribute to closing the technical efficiency gap.
The existence of the allocative yield gap indicates that farmers choose input-output combinations below the profit-maximizing level, which suggests the existence of poorly functioning (agricultural) markets (see Stiglitz (1989) and Dillon and Barrett (2017) for a discussion of the causes of missing and poorly functioning markets in developing countries). Factors that explain this outcome operate at the demand and supply side and can be categorized as knowledge constraints, financial constraints, risk issues and information asymmetries (Kelly et al., 2003; Poulton et al., 2006).
A number of policy options have been proposed to address these broad constraints. First, apart from contributing to more efficient use of inputs, extension services will also help farmers to overcome knowledge constraints by informing them about input use and new technologies. Second, limited financial services to deal with credit and risk constraints are frequently mentioned as a key source of market failure in African factor markets, resulting in low input use and output (Karlan et al., 2014; Poulton et al., 1998). Policy solutions that have been offered to deal with credit and risk problems in rural areas are support of micro-credit arrangements, weather insurance and mobile banking (Triki and Faye, 2013). Third, the promotion of rural agro-dealer networks can stimulate the use of modern inputs by improving the technical knowledge and managerial skills of traders and potentially offering credit and guarantees that solve capital constraints (Kelly et al., 2003). Fourth, support for market information systems that strengthen the public dissemination of input and output prices will help farmers and traders to operate more efficiently and improve the functioning of input and output markets (Aker and Fafchamps, 2015). Finally, 'smart' input subsidy policies have been proposed as a solution to overcome externalities associated with learning and other risk issues related to input use (Morris et al., 2007).
The economic yield gap captures the economic constraints that prevent farmers from using the (often large) amount of inputs that are needed to reach the (water-limited) potential yield level. It is unlikely that the economic yield gap can be closed completely, as this would require very low relative input/output prices, which are unrealistic in practice. Jayne et al. (2003) present a detailed study on the marketing costs of fertilizer in Ethiopia, Kenya and Zambia, and show that costs can be reduced by 11 to 18% by means of a combination of investments in infrastructure, reducing port fees and addressing uncertainties in government input distribution programs. Similarly, Minten et al. (2013) found that high transportation and transaction costs play an important role in explaining the limited use of improved seeds and chemical fertilizer in Ethiopia. Broader macro-level policies directed at the improvement of (rural road) infrastructure, the streamlining of regulations and trade barriers and better governance will reduce transportation and transaction costs in the economy (Antle, 1983; Jayne et al., 2010) and contribute to closing the economic yield gap.

² Note that the size of the technical efficiency yield gap will also be affected by other factors than farmer knowledge and experience. Several authors have pointed out that the farmer's choice of inputs is endogenous to a wide number of factors, in particular weather, which is not perfectly predictable (Amsler and Prokhorov, 2016). Failing to control for endogeneity might result in biased technical efficiency estimates.
The main cause of the technology yield gap is (the lack of) access to and availability of appropriate and advanced technologies for smallholders in sub-Saharan Africa (Rosegrant et al., 2014). Closing the technology yield gap demands investment in strategic and applied agricultural R&D to facilitate the diffusion and adoption of advanced technologies, such as the development of new improved varieties that are adapted to local conditions (Evenson, 2001; Pardey et al., 2006). The use of new technologies will increase the response to nitrogen at given inputs and shift the frontier response curve upwards in the direction of the theoretical yield response curve. The high internal rate of return that is often found in agricultural R&D impact assessments confirms the importance of investment in R&D. A recent meta-analysis found a median internal rate of return of 35% for agricultural R&D in sub-Saharan Africa (Pardey et al., 2016).

Fig. 2 summarizes the major causes for the various yield gaps as well as the potential policies that contribute to closing them. Although we link the policies only to one yield gap, we emphasize that in practice many of the proposed policies will contribute to closing multiple gaps through second-order effects (e.g. an upward shift of the frontier curve, followed by a shift over the curve or vice versa). For example, investment in rural roads will decrease the costs of fertilizer and decrease relative prices, contributing to closing the economic yield gap. At the same time, better infrastructure will also increase access to financial services and market information, resulting in a smaller allocative yield gap, and facilitate the diffusion of technologies and stimulate farmer-to-farmer knowledge exchange, thereby contributing to closing the technical efficiency gap. The impact of smart subsidies is similar. Subsidies will reduce the risk of using fertilizers and stimulate learning, resulting in closing the allocative yield gap. At the same time, lower fertilizer prices will reduce relative prices and reduce the economic yield gap. The adoption of advanced technologies will also have an impact on multiple yield gaps. On the one hand, it will result in higher yields at given input levels, thereby closing the technology yield gap. On the other hand, it will lead to higher marginal yield response rates that will increase the profitability of inputs, resulting in intensified production and closing of the economic yield gap.

Methods
In order to decompose the yield gap, we need information on the five yield levels depicted in Fig. 1. Plot-level actual yield and (water-limited) potential yield can be taken directly from household surveys and crop model output respectively, whereas technical efficient yield, economic yield and feasible yield require the estimation of the frontier yield response curve. There are several methods for estimating this curve, including data envelopment analysis, corrected ordinary least squares and stochastic frontiers maximum likelihood (Coelli et al., 2005). We followed the latter method (Aigner et al., 1977; Meeusen and van Den Broeck, 1977), which involves specification of the production technology and a composite error term that reflects both statistical error in the model and an asymmetric inefficiency term.
The basic stochastic frontier model is:

$$ y_i = x_i \beta + \varepsilon_i, \qquad \varepsilon_i = v_i - u_i $$

where $y_i$ is the log of the actual maize yield on plot $i$, $x_i$ is the log of the inputs, including nitrogen, household labor and seed, and $\beta$ is the vector of input coefficients. The composite error term $\varepsilon_i$ includes a truncated normal inefficiency term $u_i$ and a statistical error term $v_i$, where $u$ and $v$ are independent. Given a suitable form for the production technology, the parameters of the stochastic frontier model can be estimated using maximum likelihood estimation. The most common functional forms in production and yield gap analysis are the Cobb-Douglas and translog production function (Henderson et al., 2016; Neumann et al., 2010). The translog is the more flexible of the two and includes squared terms and interactions of the main inputs (e.g. nitrogen rates, seed rates and labor), which are not present in the basic Cobb-Douglas model. Model selection can be done on the basis of a likelihood ratio (LR) test.

We included a number of spatially explicit variables $W_i$ to control for the impact of differences in agro-climatic conditions, including growing degree days, aridity index, temperature seasonality, soil acidity, soil organic carbon content and slope. We also added a variable for farm size and dummy variables for animal traction (e.g. use of oxen), improved seeds, manure, sole maize plots and nitrogen use.³ The latter controls for the relatively large number of plots that have zero application rates (Battese, 1997). This results in the following translog frontier yield response function:

$$ y_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \frac{1}{2} \sum_{k=1}^{K} \sum_{l=1}^{K} \gamma_{kl} x_{ik} x_{il} + \sum_{r} \theta_r W_{ir} + v_i - u_i \tag{6} $$

where $\beta$, $\gamma$ and $\theta$ are the coefficients of the $K$ main inputs, their squared terms and interaction effects and the environmental variables, respectively. The technical efficiency term is defined as:

$$ TE_i = \frac{\exp(y_i)}{\exp(yte_i)} = \exp(-u_i) \tag{7} $$

which measures the yield $y_i$ of plot $i$ relative to the yield of a fully efficient plot that is located on the frontier yield response function $yte_i$, assuming the same combination of inputs and similar environmental conditions (Coelli et al., 2005). We used Eqs. (6) and (7) to estimate the technical efficiency yield gap and the technical efficient yield for each plot.
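The step from the inefficiency term to the technical efficient yield and the technical efficiency yield gap can be sketched as follows. This is a minimal illustration, assuming TE = exp(-u) and, in levels, Yte = Ya / TE. The function names and numbers are hypothetical, and in practice u is not observed directly but has to be predicted from the fitted frontier model.

```python
import math


def technical_efficiency(u: float) -> float:
    """TE = exp(-u) for a non-negative inefficiency term u."""
    if u < 0:
        raise ValueError("inefficiency term u must be non-negative")
    return math.exp(-u)


def technical_efficient_yield(actual_yield: float, te: float) -> float:
    """Scale actual yield (t/ha) up to the frontier: Yte = Ya / TE."""
    if not 0 < te <= 1:
        raise ValueError("TE must lie in (0, 1]")
    return actual_yield / te


# Hypothetical plot: actual yield of 2.0 t/ha and an inefficiency term of 0.5.
ya = 2.0
te = technical_efficiency(0.5)
yte = technical_efficient_yield(ya, te)
teyg = yte - ya  # technical efficiency yield gap in t/ha
print(f"TE = {te:.2f}, Yte = {yte:.2f} t/ha, TEYg = {teyg:.2f} t/ha")
```

A plot on the frontier has u = 0, so TE = 1 and the technical efficiency yield gap is zero.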
To estimate the economically optimal nitrogen level and associated economic yield, we needed to find the point where the relative input/output price is equal to the marginal physical productivity (MPP), i.e. the slope of the frontier yield response curve. Similar to other studies (e.g. Burke et al., 2017; Liverpool-Tasie et al., 2016), we only assessed the yield response to nitrogen and assumed the other inputs (e.g. seeds, labor and animal traction) are constant. This is a valid assumption in the short run, when it can be assumed that production factors such as land and assets are fixed, but is less plausible in the long run, when farmers may decide to adjust other inputs (e.g. land and equipment) to maximize profit. For each plot the optimal nitrogen level is found by numerically solving the following equation for $x_{i1}$ (Jauregui and Sain, 1992):

$$ MPP_i(x_{i1}, x_{ij}) = \frac{f}{m} $$

where $x_{i1}$ is nitrogen input, $x_{ij}$ are the remaining inputs, $f$ is the nitrogen price and $m$ is the maize price. Evaluating the production function at Xe gives the economic yield.

We used Eq. (6) to calculate the feasible yield, which represents the maximum yield that can be reached on a plot with available technologies and best-practice management, assuming no economic constraints (e.g. inputs have zero costs). To estimate the feasible yield, we made the following assumptions: (1) nitrogen application rates and planting density are at (or near) the levels that are needed to reach the water-limited potential yield (S10 Table). Planting density is translated into seed rates, using a thousand seed rate of 0.5 kg/ha (MacRobert et al., 2014); (2) all farmers use hybrid seeds and apply organic manure; and (3) animal traction and an additional 50% of household labor is required to facilitate the application of the additional inputs. These values were combined with Eq. (6) to estimate the feasible yield for all plots.
Finally, in combination with spatial information on the water-limited potential yield for maize in Ethiopia, the technical efficiency, allocative, economic and technology yield gap were calculated as defined in Fig. 2 (also see S8 Table).
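The numerical search for the economically optimal nitrogen level described above can be sketched as follows. This is a sketch under stated assumptions: a hypothetical quadratic response function stands in for the estimated translog frontier, the solver is plain bisection, and all names and parameter values are invented for illustration.

```python
from typing import Callable


def marginal_product(yield_fn: Callable[[float], float], n: float,
                     h: float = 1e-4) -> float:
    """Central-difference slope of the response curve (kg maize per kg N)."""
    return (yield_fn(n + h) - yield_fn(n - h)) / (2 * h)


def optimal_nitrogen(yield_fn: Callable[[float], float], price_ratio: float,
                     lo: float = 0.0, hi: float = 500.0,
                     tol: float = 1e-6) -> float:
    """Solve MPP(n) = f/m by bisection, assuming MPP decreases on [lo, hi]."""
    def excess(n: float) -> float:
        return marginal_product(yield_fn, n) - price_ratio

    if excess(lo) <= 0:
        return lo  # corner solution: the first kg of N is already unprofitable
    if excess(hi) >= 0:
        return hi  # corner solution: N is still profitable at the upper bound
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def response(n: float) -> float:
    """Hypothetical yield response in kg/ha; not the estimated frontier."""
    return 2000 + 30 * n - 0.06 * n ** 2


# Hypothetical price ratio: 1 kg of nitrogen costs as much as 6 kg of maize.
n_star = optimal_nitrogen(response, price_ratio=6.0)
print(f"optimal N = {n_star:.1f} kg/ha, economic yield = {response(n_star):.0f} kg/ha")
```

For the quadratic stand-in the optimum can be checked analytically: 30 - 0.12 N = 6 gives N = 200 kg/ha, which the bisection reproduces.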

Data
The main data sources for this study are the second and third waves of the Living Standards Measurement Study - Integrated Surveys on Agriculture (LSMS-ISA) Ethiopian Socioeconomic Survey.⁴ These surveys cover the years 2013-2014 and 2015-2016 and were implemented by the Central Statistical Agency of Ethiopia and supported by the World Bank. The survey recorded key production inputs and outputs at plot, crop and household level, including seed, fertilizer, labor and production rates. The LSMS-ISA also includes a community survey, which, among others, includes information on market prices of key food crops, including maize. Household GPS coordinates were recorded with a small offset, making it possible to link households to other data sources. Unfortunately, individual plots were not tracked over time, which makes it impossible to use panel approaches. For this reason we decided to use a pooled data sample.
Over the course of the analysis it became apparent that several essential variables, in particular fertilizer use and crop yield, suffered from serious measurement errors in both waves of the LSMS-ISA survey. We used strict exclusion criteria that resulted in a decrease of observations from 6708 in the raw LSMS-ISA sample to 3824 observations in our final sample. S1 Text provides a detailed explanation on the criteria used to clean the data and summary statistics for the full and cleaned samples.
As part of the LSMS-ISA survey, detailed information was gathered on the harvest of each plot-crop combination and plot size was measured using GPS, making it possible to calculate the maize yield on each plot. Several studies have pointed out that the measurement of crop yield is fraught with difficulties and estimates are particularly sensitive to the definition (Reynolds et al., 2015) and measurement (Carletto et al., 2015) of plot area. For 32% of the plots farmers reported that only a fraction of the area was planted or harvested for maize, with the remainder being used for other crops. Using the full GPS-measured plot area as the denominator would therefore introduce a downward bias in our measurement of yields. To correct for this we combined the farmers' estimated percentage of harvested area with GPS data on plot size to calculate the area that was actually used to grow maize and used this as the denominator in our yield calculations (S2 Text).
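The area correction can be illustrated with a short sketch. The function name, argument names and plot values below are hypothetical; the actual variable definitions and cleaning rules are those described in S2 Text.

```python
def maize_yield(harvest_kg: float, gps_area_ha: float,
                maize_share_pct: float = 100.0) -> float:
    """Maize yield in t/ha, using only the part of the plot under maize."""
    if gps_area_ha <= 0:
        raise ValueError("plot area must be positive")
    if not 0 < maize_share_pct <= 100:
        raise ValueError("maize share must lie in (0, 100]")
    # Area actually used for maize = GPS plot area x farmer-reported share.
    maize_area_ha = gps_area_ha * maize_share_pct / 100
    return harvest_kg / 1000 / maize_area_ha


# Hypothetical plot: 600 kg harvested from a 0.5 ha plot, 60% under maize.
naive = maize_yield(600, 0.5)          # full GPS area as denominator
corrected = maize_yield(600, 0.5, 60)  # maize area as denominator
print(f"naive: {naive:.2f} t/ha, corrected: {corrected:.2f} t/ha")
```

In this example the uncorrected figure understates the yield, which is the downward bias the correction is meant to remove.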
To calculate the relative nitrogen-maize prices for each plot, we derived farm gate prices for maize and urea (S3 Text). Although three types of fertilizer are used in Ethiopia, we base our nitrogen prices on urea only, as diammonium phosphate (DAP) and nitrogen-phosphorus-sulfur (NPS) are multinutrient fertilizers and their prices do not adequately reflect the cost of nitrogen (Flynn, 2003).
In addition to a collection of climate and geographical variables that accompany the LSMS-ISA surveys, we augmented the data set with granular data from the Africa Soil Information Service (AfSIS, www.africasoils.net) and the Global Yield Gap Atlas (GYGA, 2019). AfSIS provides soil quality maps for Africa at 250 m spatial resolution and various depths based on thousands of sampling locations (Hengl et al., 2015). We used AfSIS data to derive the soil organic carbon stock and pH for the top 30 cm soil layer. The GYGA uses crop simulation models combined with local-specific observed weather data to estimate (water-limited) potential yield for a large number of countries (S4 Text). The water-limited potential yield acts as a cap on the highest yield levels that can be attained in Ethiopia from an agronomic standpoint. To extrapolate location-specific weather information, the GYGA uses a zonation scheme based on a combination of growing degree days, aridity index and temperature seasonality (Van Wart et al., 2013). We used the same variables to control for climatic effects on yield in our estimations. The two other variables from GYGA that we used as input in our model are the optimal planting density (plants/ha) and the minimum nitrogen requirements (kg/ha) to reach the water-limited potential yield (see also Ten Berge et al., 2019).
The GYGA presents water-limited potential yield values for 14 climate zones that overlap with the major maize growing areas, accounting for 70% of total maize production in Ethiopia (Fig. 3). As a substantial number of smallholders in the LSMS-ISA survey are located outside these areas, we can only apply the decomposition analysis to 2415 out of the 6708 plot observations in the LSMS-ISA (Fig. 3).

Frontier yield response estimation
We began by estimating a stochastic frontiers model in translog form including all of our environmental and dummy variables. In the full translog form the squared terms of the main inputs as well as several interaction terms were not significant. In order to produce a more parsimonious model, we decided to drop the squared terms (Liu and Myers, 2009). We used a likelihood ratio test to compare the reduced translog function and the full translog function and did not find a significant difference between the two (see S5 Text). Similarly, a likelihood ratio test pointed out that the reduced translog model is preferred over the even more parsimonious Cobb-Douglas model where all the interaction terms are dropped.
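The likelihood ratio comparison of nested frontier specifications can be sketched as follows. The log-likelihood values and the number of restrictions are invented for illustration (the actual test results are reported in S5 Text), and the sketch compares the statistic with tabulated 5% chi-square critical values instead of computing a p-value.

```python
# Upper 5% critical values of the chi-square distribution, by degrees of freedom.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def lr_test(loglik_restricted: float, loglik_full: float,
            n_restrictions: int) -> tuple:
    """Return the LR statistic and whether the restricted model is rejected at 5%."""
    if n_restrictions not in CHI2_CRIT_5PCT:
        raise ValueError("no tabulated critical value for this many restrictions")
    if loglik_full < loglik_restricted:
        raise ValueError("the full model cannot fit worse than a nested model")
    stat = 2 * (loglik_full - loglik_restricted)
    return stat, stat > CHI2_CRIT_5PCT[n_restrictions]


# Hypothetical log-likelihoods: a reduced model with 3 dropped terms vs. the full model.
stat, rejected = lr_test(loglik_restricted=-3050.0, loglik_full=-3047.5,
                         n_restrictions=3)
print(f"LR = {stat:.2f}, reject reduced model: {rejected}")
```

A statistic below the critical value means the dropped terms do not significantly improve the fit, which supports the more parsimonious specification.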
The resulting parameters of the stochastic frontiers model are shown in Table 1. Apart from coefficients for seed, labor and their interaction effect, all main input coefficients were significant at the 5% level. The use of nitrogen and improved seeds increased yields while the dummy for fertilizer use had a negative effect. The latter suggests that fertilizer is more likely to be used on plots after the soil has been depleted (Burke et al., 2017). The dummy variables for manure and oxen were not significant. Soil quality also had an impact on productivity. Yields were relatively lower on more acidic soils (pH lower than 5.5) and semi-neutral (pH between 5.5 and 7.0) plots in comparison to more calcareous ones (pH higher than 7.0), while soil organic carbon content (SOC) had a positive effect. In line with expectations, of the environmental variables, growing degree days (GDD) and the aridity index (AI) were significant and positive, whereas slope and temperature seasonality (TS) had significantly negative effects on maize yield. In line with most of the literature, we found an inverse relationship between farm size (harvested area) and productivity (Eastwood et al., 2010). Finally, the results showed that there was no difference in yield between sole and intercropped plots.

Fig. 4a shows the five yield levels that can be distinguished in our framework for the 13 major climate zones where maize is grown in Ethiopia as well as the total average. Average actual yield was between 1.3 and 2.3 t/ha for the period 2013-2016. The technical efficiency of all maize plots was on average 52%, which resulted in a technical efficient yield of 2.4-3.7 t/ha. The economic yield was in the range of 3.8-8.0 t/ha. To achieve the economic optimum yield, nitrogen application rates would have to increase substantially from an average of 7-124 kg/ha to the economic optimal level of 137-262 kg/ha (S9 Table). The difference between the technical efficiency yield and the economic yield points towards the existence of unrealized profit opportunities. The feasible yield level was around 4.9-12.0 t/ha, which indicates that yield could be increased substantially if farmers were able to use more inputs. A large part of the difference between the economic yield and the feasible yield was caused by different fertilizer input levels. If fertilizers had been available at no cost, farmers would have had an incentive to increase nitrogen application rates to 127-498 kg/ha, the (minimum) nitrogen requirement to reach the water-limited potential yield (S10 Table). Other factors that explained the difference between the economic yield and the feasible yield are the increase of other inputs (i.e. oxen, labor and hybrid seeds) that were also needed to realize the feasible yield levels. Finally, the water-limited potential yield in Ethiopia was 6.3-18.1 t/ha. The gap with the feasible yield shows that farmers could have increased their yield even further if they had adopted advanced technologies and farm management practices.

The various yield levels can be combined to decompose the total yield gap into the four aforementioned components, depicted in Fig. 4b. Considering all climate zones, we find that the technology gap makes up the largest component of the yield gap (41%), followed by the allocative yield gap (26%), the economic yield gap (19%) and the technical efficiency yield gap (14%). However, a close look at the individual climate zone results suggests a number of different patterns.

⁴ The first wave of the LSMS-ISA did not cover crop harvests for the majority of plots, making it unsuitable for yield gap analysis.

M. van Dijk, et al. Agricultural Systems 183 (2020) 102828

Yield levels and yield gap decomposition
The yield gap shares for climate zones with the highest water-limited potential yield (6701, 6801, 6501, 7801, 7501 and 7701) are similar to the average climate zone distribution, while the zones with the lowest yield (7201, 6301, 5801, 7401 and 5501) show a different pattern.⁵ In these zones, the technology yield gap component is the smallest (5-12%) and the allocative yield gap (39-62%) by far makes up the largest part. The other two yield gap components are broadly similar in size to those in the other climate zones, although the technical efficiency yield gap seems somewhat larger. The size of the allocative yield gap can largely be explained by the relatively higher maize prices, and hence, lower relative prices in these climate zones (S3 Fig. 1). This, in turn, points towards larger, although unrealized, opportunities for farmers to increase profits by purchasing more fertilizer. Finally, with a very large technology yield gap share and a very low allocative yield gap share, the pattern for climate zone 5701 is distinct from all the other regions. This climate zone is somewhat of an anomaly as it does not belong to the main maize belt areas and contains the lowest number of plots, nearly all of which are characterized by nitrogen application rates that are close to the economic optimum level (S9 Table).

Fig. 3. Average actual maize yield and water-limited potential maize yield (Yw). Actual yield represents the yield for each survey location averaged over the 2013 and 2015 survey waves using plot size as weight. Water-limited potential yield estimates are only available for climate zones that overlap with major maize growing areas (colored areas). 2415 out of the 6708 LSMS-ISA observations are located in these zones. Climate zone 8601 does not contain any of the LSMS-ISA observations and is therefore excluded from the analysis. Average yield from LSMS-ISA, water-limited potential yield from GYGA (2019).

Simulating closure of the yield gap
The yield gap decomposition can be used for a 'what if' simulation that shows by how much maize production could be increased if the maize yield gap(s) were closed in all climate zones (Fig. 5). For comparative purposes, we also simulate the effect on maize production of fully implementing the extension services' recommendations. According to subnational statistics, national maize production in Ethiopia averaged 5.1 million tons over the period 2013-2016 (S6 Text). If we suppose that, for example because of access to more and better extension services, all farmers are able to achieve the technical efficient yield, maize production will increase by 2.7 million tons. The implementation of policies that improve the operation of maize and fertilizer markets, for example through better access to finance, price information and smart subsidy programs, has the potential to increase maize production by 4.7 million tons.
We can use our framework to assess the impact if all farmers implemented the recommendations described in the extension services' guidelines (Ethiopian Institute of Agricultural Research, 2002) regarding seed and fertilizer use. More specifically, these include applying 25 kg/ha of hybrid seeds, 150 kg/ha of DAP and 200 kg/ha of urea. Using the nitrogen contents of these fertilizers (18% for DAP and 46% for urea), this translates to 0.18 × 150 + 0.46 × 200 = 119 kg N/ha. Adopting the extension services' recommendations results in 5.6 million tons of additional maize production.
If we assume that there are no economic constraints and farmers are able to use the optimal agronomic level of inputs, maize production could increase by 3.6 million tons. Finally, the technology yield gap might be closed by investment in agricultural R&D that supports the assimilation and adoption of advanced technologies by farmers in Ethiopia. Such policies have the potential to increase maize production over all 13 climate zones by 8.1 million tons. Closing the entire rainfed yield gap has the potential to expand maize output to around 24.2 million tons, close to five times the present maize production volume.
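The arithmetic behind the simulation can be checked with a short sketch. It takes the baseline production and the four component gains quoted in the text (all in million tons) and accumulates them in the order in which the gaps are discussed. The extension-recommendation scenario (5.6 million tons) is left out because it is presented as a separate comparison. The variable and function names are illustrative assumptions, not part of the original analysis.

```python
# Figures in million tons, taken from the text of this section.
BASELINE = 5.1  # average national maize production, 2013-2016
GAINS = [
    ("technical efficiency gap closed", 2.7),
    ("allocative gap closed", 4.7),
    ("economic gap closed", 3.6),
    ("technology gap closed", 8.1),
]


def cumulative_production(baseline, gains):
    """Return the production level reached after closing each gap in turn."""
    total = baseline
    path = []
    for label, gain in gains:
        total += gain
        path.append((label, round(total, 1)))
    return path


path = cumulative_production(BASELINE, GAINS)
final_production = path[-1][1]
expansion_factor = final_production / BASELINE

for label, level in path:
    print(f"{label}: {level} million tons")
print(f"expansion factor: {expansion_factor:.2f}")
```

Adding the four gains to the baseline gives the 24.2 million tons reported for full closure of the yield gap, a factor of roughly 4.7 relative to current production.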

Policy implications
Our decomposition analysis suggests that the difference between actual and water-limited potential yield is caused by a combination of constraints. It therefore requires a combined and targeted agricultural policy strategy to reduce the yield gaps.
Expansion and improvement of the quality of extension services to farmers is regarded as a key success factor behind the rapid increase in yield over the past decade (Abate et al., 2015). Others, however, have identified a number of constraints that hamper the effectiveness of the extension system in Ethiopia, including lack of resources and limited skills as well as the need for a more farmer-driven and market oriented focus (Davis et al., 2010). Policies that tackle these issues and improve the effectiveness of extension services will contribute to closing the technical efficiency gap. Further, although a thorough evaluation of the extension services system is beyond the scope of this study, we used our analytic framework to simulate the implementation of maize input recommendations of extension services. We found that the adoption of the standard input package would result in closing a large part of the yield gap. Nonetheless, the recommended fertilizer application rate is much higher than what is actually used by farmers. The observation of a large allocative yield gap suggests that a combination of knowledge, financial, risk and information related constraints prevent maize farmers from using more inputs. This implies that without additional policies to tackle these constraints, there will be no incentives for farmers to follow the fertilizer guidelines provided by the extension agents.
The finding of a large allocative yield gap is in line with the existing literature on seed, fertilizer and credit markets in Ethiopia (Alemu and Tripp, 2010; Rashid et al., 2013; Spielman et al., 2010), which indicates problems with the timely delivery, packaging and quality of seed and fertilizer as well as limited availability and high interest rates of rural credit. Most of these issues have been attributed to public sector institutions, which dominate the agricultural input and credit markets in Ethiopia. As suggested by these authors, policies to liberalize seed, fertilizer and credit markets and increase private sector participation are needed to tackle these issues, thereby contributing to closing the allocative yield gap.
Economic constraints are responsible for 19% of the yield gap, which is very close to the 20% gap that is used as reference for the economic yield gap in many yield gap papers (Cassman, 1999; Cassman et al., 2003; Fischer, 2015). Investment in road infrastructure, support for national production of fertilizer and reforms to reduce transaction costs have the potential to narrow the economic yield gap in Ethiopia. However, as agricultural production will always be economically constrained, the economic yield gap can never be closed completely.

Finally, closing the technology gap requires the diffusion of advanced technologies (e.g. precision agriculture and improved protection of the crop against weeds, pests and diseases) and an increase in the use of improved varieties (e.g. double cob maize and hybrid maize varieties) tailored to Ethiopian agro-ecological conditions. Our sample indicates that only 26% of Ethiopian smallholders use improved seeds (S7 Table 1). To develop the necessary 'absorptive capacity' (Cohen and Levinthal, 1990) to 'assimilate' and 'learn' new technologies, policies should be directed at improving the agricultural system of innovation (World Bank, 2006) and increasing investment in agricultural R&D, which is among the lowest compared to other African countries (Beintema and Stads, 2017).

Profitability of fertilizer use in Ethiopia
A recent body of literature has investigated the extent to which small-scale farmers in Africa are using economically optimal levels of nitrogen fertilizers (Burke et al., 2017; Jayne and Rashid, 2013; Koussoubé and Nauges, 2016; Liverpool-Tasie et al., 2016; Sheahan et al., 2013). The standard approach in these studies is to estimate the economic optimal nitrogen level and calculate the value cost ratio (VCR), which is defined as the ratio between the marginal physical product (MPP) and the relative price. A VCR greater than one indicates that it is profitable for farmers to increase the use of nitrogen under perfect market conditions. It is often assumed that the maize revenue must be at least 1.5 times the costs (i.e. a VCR of 1.5 or larger) for farmers to overcome the additional costs caused by knowledge, credit, risk and information constraints that characterize African markets (Jayne and Rashid, 2013). As our estimation of the economic yield is directly related to this literature, it is relevant to compare our results with studies that investigate fertilizer profitability in Ethiopia.
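As a minimal sketch of the VCR definition given above, the snippet below divides the MPP (kg maize per kg N) by the relative price, taken here to be the nitrogen price divided by the maize price. The function names and the price values are hypothetical and serve only to illustrate the calculation; the 1.5 threshold is the rule of thumb mentioned in the text.

```python
def value_cost_ratio(mpp, nitrogen_price, maize_price):
    """VCR = MPP / (nitrogen price / maize price).

    mpp is in kg maize per kg N; both prices are per kg in the same currency.
    """
    if nitrogen_price <= 0 or maize_price <= 0:
        raise ValueError("prices must be positive")
    relative_price = nitrogen_price / maize_price
    return mpp / relative_price


def clears_threshold(vcr, threshold=1.5):
    """Check whether the VCR meets the rule-of-thumb threshold from the text."""
    return vcr >= threshold


# Hypothetical prices (currency units per kg), for illustration only.
vcr = value_cost_ratio(mpp=11.2, nitrogen_price=28.0, maize_price=5.0)
print(f"VCR = {vcr:.2f}, clears 1.5 threshold: {clears_threshold(vcr)}")
```

A VCR between 1 and 1.5 would be profitable under perfect market conditions but would not clear the threshold that allows for knowledge, credit, risk and information constraints.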
We found MPP values of around 5.6-38.6 kg maize/kg N (S9 Table), which are higher than the results presented in Rashid et al. (2013) and Minten et al. (2013), who reported MPP values for maize in the range of 4.1-12.1 kg maize/kg N and 11-12 kg maize/kg N, respectively. The larger values we found are consistent with the use of a frontier yield response curve, which reflects the MPP of best-practice farmers. Such farmers are likely to show a higher yield response than the average farmers who are used as a reference in the other studies. Combining the MPPs with information on relative nitrogen and maize prices resulted in VCR values of around 1.0-6.8, which are similar to the findings of Rashid et al. (2013), who found a VCR for maize in the range of 1.7-5.3. The profitability analysis confirms our finding of a substantial allocative yield gap in the present sample covering the 2013/2014 and 2015/2016 seasons, which suggests that there were considerable but unrealized opportunities for smallholders to increase profits by applying more fertilizer.

Methodological and data limitations
The analysis in this paper suffers from a number of limitations, which should be taken into account when interpreting the results. First, we discovered serious measurement problems in the LSMS-ISA data, most importantly, (1) measurement errors related to the use of fertilizer and (2) the presence of a large number of outliers, with values exceeding plausible agronomic values of maize cultivation in Ethiopia as reflected in the Global Yield Gap Atlas (GYGA, 2019).
The average maize yield in our final sample (1.8 t/ha) after removing a large number of outliers is notably lower than the 3.3-3.7 t/ha reported by FAO (2019) for the period 2013-2016. The accumulation of a number of factors is probably responsible for this finding, including differences in sampling (LSMS-ISA focuses on smallholders only, while FAO (2019) represents all farmers), measurement issues and our data cleaning procedure, in particular the exclusion of observations for which values exceed agronomic boundaries. To assess the impact of our sample selection on the yield gap analysis, we conducted a sensitivity analysis in which we used a broader sample based on less strict criteria (see S7 Text). In comparison to the results presented above, the allocative yield gap is somewhat smaller but overall results are very similar.
The LSMS-ISA datasets are a key source of information for the study of agriculture, nutrition and health issues in Africa (Christiaensen and Demery, 2018). It is therefore important that the data is reliable and of the best possible quality, notwithstanding the well-known problems that exist with the collection of household survey data (Tasciotti and Wagner, 2017) and the measurement of agricultural production (Reynolds et al., 2015). How is it possible that the agricultural input and output variables of many plots exceed realistic agronomic boundary values? What is the cause of the fertilizer measurement errors? Are the LSMS-ISA surveys for other countries affected by the same issues? More in-depth analysis and validation of the data is needed to answer these questions.
Second, data limitations prevented us from adequately accounting for the impact of all inputs and costs in the estimation of the frontier yield response function and yield levels, potentially leading to a bias in the yield gap decomposition. We were only able to include dummy variables for the use of oxen and the use of manure, which are at best partial proxies for the use of agricultural capital (e.g. animal traction and farm equipment) and organic fertilizer, respectively. This is probably also the reason why both variables were not significant in the estimation of the frontier yield response function. Missing information on important inputs may result in the mismeasurement of technical efficient yield and allocative yield as well as related yield gaps. Due to lack of data, we were also not able to incorporate farm-gate fertilizer transport costs in our profitability analysis. Accounting for these costs would probably result in a lower estimate of the economic optimal fertilizer level (Liverpool-Tasie et al., 2016; Sheahan et al., 2013), a lower allocative yield gap and a higher economic yield gap.
Third, the analysis considered only yield gaps in maize crop production. A large part of agriculture in sub-Saharan Africa and Ethiopia can be described as mixed crop-livestock farming systems (Thornton and Herrero, 2015). In such systems, crops and livestock are raised on the same farm and interact in various ways. Livestock provides manure to fertilize the field and draft power to work the land while crop residues are used to feed livestock (Herrero et al., 2010). Mixed systems also produce a wider range of outputs, including crops, meat and dairy products. All of this will have an impact on the measurement and decomposition of the yield gap, which is disregarded in our analysis. One way forward to improve the estimation of yield gaps in our study would be to adopt the approach of Henderson et al. (2016), who combine multioutput distance functions and stochastic frontier analysis to assess yield gaps in sub-Saharan African mixed farming systems.
Lastly, we used a basic stochastic frontier production framework to estimate the frontier yield response curve. The analysis does not take into account the unobserved heterogeneity between plots and farmers and the endogenous choice of inputs (Shee and Stefanou, 2015) that might bias the estimation of the yield response curve. Recent studies that assess optimal fertilizer use (e.g. Liverpool-Tasie et al., 2016; Burke et al., 2017) apply a combination of panel and instrumental variable approaches to control for these issues. Comparable methods for stochastic frontier analysis are currently under development but not yet readily available for applied research (Amsler and Prokhorov, 2016). In any case, these approaches would be difficult to apply in our case where plots cannot be tracked over time and the panel is not balanced.

Conclusions
In this paper we combine stochastic frontier analysis with agronomic information on (water-limited) potential yield to decompose the maize yield gap in Ethiopia. The results of the analysis are used to identify relevant policies to reduce the yield gap and simulate the potential increase in national maize production if such policies were successfully implemented and the yield gap could be fully closed. The analysis indicates that the technology gap (i.e. lack of access to advanced technologies) makes up the largest component of the yield gap, followed by the allocative yield gap (i.e. market imperfections), the economic yield gap (i.e. economic constraints to increase input use) and the technical efficiency gap (i.e. inefficiencies in production caused by crop management constraints). Tackling these yield gap components demands an agricultural strategy that combines various policies, including improved extension services, investment in road infrastructure, reduction of transaction costs, liberalization of input and credit markets, and technology policies.
The research suffers from a number of limitations, which may have affected the results. In particular, we encountered several measurement issues in the World Bank LSMS-ISA household survey, which is also used by many other researchers to conduct agricultural sector research in Africa. A comparison with agronomic potential values from the Global Yield Gap Atlas (GYGA), based on crop model simulations, revealed a large number of unrealistic values related to key maize input and output variables. More research is needed to investigate the origin of these potential errors and, where possible, correct them.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.