Assessing the productivity and profitability of the Solar Market Garden

Abstract Successful scale-up of any development project requires a deep understanding of the real-world economics of the intervention, and compelling evidence that such an investment would be worthwhile. This cost-benefit estimate is typically assessed in two ways: (a) by comparing the coefficient of impact along some margin measured in an impact evaluation (β̂) to the unit implementation cost of the project, and/or (b) by conducting adoption studies, where autonomous adoption is assumed to indicate that the adopter has deemed the investment worthwhile (i.e., financially sustainable). However, these two techniques can be particularly difficult to apply to development engineering projects that are large at the unit scale (or are group-based) and to projects that may have impacts on many margins or outcomes at once. Here we present the framework for, and analysis from, a field monitoring campaign in the interim evaluation period for community-scale solar-powered irrigation systems (Solar Market Gardens, or SMGs) in northeast Benin, West Africa. We used this interim monitoring to directly construct a cost-benefit analysis (CBA), and to document the pathways of impact actually at play for a project hypothesized ex ante to have potential economic, food security, and gender impacts. We monitored all garden activity at the individual and group level for most of the dry season, including total production, sales, home consumption, input use, marketing, and labor (a key factor often overlooked when considering the cost of agricultural development projects). By combining production and sales data with cost information, we show that the most productive agricultural groups using the system only in the dry season would be profitable in a full cost-recovery model with no economies of scale, but that lower-performing groups would not; we also show that many plausible scale-up models and financing mechanisms would be profitable.
We then show how this type of monitoring can complement impact evaluation by elucidating different pathways of impact that could be used to understand heterogeneity in outcomes among beneficiaries. We document variance both within and between groups across numerous potential pathways of impact for the SMG; the heterogeneity in intraclass correlation coefficients (ICCs) across these indicators highlights the importance of understanding the causal chain(s), especially for cross-sectoral development engineering projects like the SMG. We conclude by discussing how this monitoring effort fits into the larger evaluation of the SMG, and how such data have been used to adaptively refine the project, improving the likelihood of successful scale-up.


Introduction
The challenges of scaling up development interventions that have shown promise at a very local scale are well documented in the recent literature. Although many potential pitfalls may thwart scale-up efforts, at minimum, the decision to undertake scale-up of a development engineering project typically requires convincing evidence that the intervention is financially sustainable for stakeholders. Both potential investors and implementers need to understand the average costs and benefits of the intervention. This cost-benefit analysis (CBA) is typically approached in one of three ways. (a) The coefficient of impact along some margin, as measured in an impact evaluation, can be compared to the unit implementation cost of the project. (b) Adoption studies can be conducted, in which the rate of autonomous adoption is assumed to represent the proportion of households that find the intervention to be net financially beneficial. (c) Finally, a direct calculation of CBA, and heterogeneity therein, can be estimated with detailed monitoring in situ. While this third option does not stand in for rigorous evaluation, it can be extremely valuable for development engineering projects that are of a large unit scale, are group-based, or may have impacts across many margins. When the unit scale is large, the number of units in a pilot intervention is typically small, so an adoption study is infeasible; this is all the more the case if the intervention is group-based. If the project may have impact on many margins, a pilot evaluation may not be adequately sized or designed to capture the various benefits to compare to costs. For such projects, we make the case that focused interim monitoring (i.e., between the baseline and follow-up of an evaluation) can be used to estimate real-world costs and benefits, and heterogeneity therein, for beneficiaries, as well as to get an understanding of the anticipated pathways of impact, such that both the intervention itself and the final evaluation plan can be adjusted for maximum effectiveness.
We use the Solar Market Garden project in Benin, West Africa, as an example to illustrate how interim monitoring can be used to directly estimate CBA among project beneficiaries, understand the most important impact pathways at play, and feed back to help refine the project. In particular, we emphasize how pathways documented in interim monitoring might be used to refine heterogeneity analysis in the full evaluation after follow-up data are collected.

Project context & background
The Solar Market Garden (hereafter SMG) is an agricultural technology and management package designed for women's agricultural groups engaged in hand-watered horticultural production in sub-Saharan Africa. It was designed as a renewable-energy-based version of the African Market Garden originally implemented by the International Crops Research Institute for the Semi-Arid Tropics (ICRISAT) in Niamey, Niger and used in projects across West Africa (Woltering et al., 2011a,b; The World Bank, 2005, 2008; Burney et al., 2013). The SMG consists of a solar photovoltaic water pumping system, a conventional gravity-fed drip irrigation system, and trainings and technical support; it is designed to help individual farmers or farming groups scale up production of nutritious and high-value micronutrient crops in rural regions with pronounced dry seasons and abundant insolation. In the project described here, each SMG is shared by members of farming groups; the system is theoretically cost-competitive with other technologies because the costs of energy and materials for water access and distribution are shared among farmers in the group (typically 30-40), who operate individual plots connected to the pumping and distribution system. Groups can additionally economize on input purchases and marketing costs.
The SMG (designed by the Solar Electric Light Fund, www.self.org) was first tested in the Kalalé district of northern Benin in 2007-2008. Kalalé is a very poor district in a poor country; the median household in the district survives below the global extreme poverty line of (then) $1.25 per person per day. Most of the economic activity in the district (population ∼170,000) is tied to agricultural production, including livestock. The district is located in the northern portion of the Sudanian agroecological zone, at the border of the southern Sahel. The climate is monsoonal, with the rainy season beginning in May and ending in early October. Agricultural production (cotton, corn, sorghum, yams, cassava, groundnuts, soy) is almost entirely rain-fed, and thus confined to the rainy season. Very little land (with the exception of a small amount of flood-recession agriculture) is irrigated, and access to micronutrient crops is drastically reduced in the dry season. The district is largely food insecure across indicators: total caloric intake is low, diet diversity and access to protein and micronutrients are low, and children suffer from high rates of stunting and wasting. In addition, rates of iron-deficiency anemia among women of child-bearing age and children are among the highest in the country (Food and Agriculture Organization of the United Nations (FAO), 2011; Alaofe et al., 2017).
The SMG consists of a directly-coupled photovoltaic water pump that moves water to a large (∼25 m³) concrete reservoir. The outlet from the concrete reservoir, at around 1 m head, then feeds a low-pressure regulated drip irrigation system. The average garden size is 0.5 ha, divided into approximately 40 parallel beds of 120 m² each. Although pumps and gardens are sized based on the sustainable recharge rate of the borewell (or sustainable withdrawal rate for a surface source), beyond that constraint the systems can be passively self-regulating, with pumping power designed to match evapotranspiration (ET) needs. Potential evapotranspiration (PET) varies over the course of the year, but peaks at 7-8 mm per day in the second half of the dry season. This corresponds to daily needs for the half-hectare plots, when accounting for spaces between beds, of around 30-35 m³ per day at peak. Details of the 11 SMGs are presented in Fig. 1 and Table 1. In all cases, land has been both traditionally allocated by village leaders and legally titled, and the gardens are run by village women's groups.
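As a rough check on the sizing figures above, daily irrigation demand can be approximated as PET depth multiplied by irrigated area. The sketch below is illustrative only: the function name and the `planted_fraction` parameter are assumptions introduced here, not part of the project's design tools, and the text does not state what fraction of the half hectare is actually planted.

```python
def daily_water_need_m3(pet_mm_per_day, area_m2, planted_fraction=1.0):
    """Approximate daily irrigation volume in cubic metres.

    1 mm of water over 1 m^2 is 1 litre, so depth (mm) x area (m^2) gives
    litres; dividing by 1000 converts to m^3. `planted_fraction` is a
    hypothetical adjustment for paths and spaces between beds.
    """
    return pet_mm_per_day * area_m2 * planted_fraction / 1000.0


HALF_HECTARE_M2 = 5000  # 0.5 ha garden

# Gross demand at the peak PET of 7-8 mm, before any allowance for unplanted space
low = daily_water_need_m3(7, HALF_HECTARE_M2)
high = daily_water_need_m3(8, HALF_HECTARE_M2)
print(low, high)  # 35.0 40.0
```

Gross demand over the full half hectare comes to 35-40 m³ per day; the 30-35 m³ quoted above is lower because it allows for the unplanted space between beds.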
The SMG feasibility study consisted of installation of three systems in two villages (one village had two women's groups). To test the hypothesis that the SMG would positively impact multiple dimensions of food security for project beneficiaries, the SMG collaboration used a before-after measurement design based on matched-pair villages (drawing from the same commune, the two villages closest to the test villages in terms of size, demographics, women's group activity, water source, and proximity to markets were chosen as comparators). Data were collected using detailed household surveys for all women's group members and a random representative sample of non-member households in both treatment and comparison villages. This research design enabled disentanglement of the effects of being in a women's group (irrespective of village), and being in a treatment village (irrespective of participation in the project) from the true project impact. Household surveys collected detailed production, consumption, income, assets, and expenditure data, as well as information on access to services, self-reported health measures, and involvement in local organizations. The household survey was conducted at baseline (November 2007) and after one year (November 2008) for all households. From the data, detailed consumption expenditure (CE) and various food security metrics were constructed; impact was assessed using a difference-in-differences approach. Results from the baseline and follow-up surveys of the pilot phase showed significant improvements in food security for beneficiaries (Burney et al., 2010; Burney and Naylor, 2012; Alaofe et al., 2016). Based on the results of these early studies, funding was secured for an expanded pilot and SMG evaluation for 8 new villages, with groundwork for selection taking place between 2010 and 2012. A full baseline was conducted in 8 new treatment villages and 8 comparison villages in early 2014, just after installation of the SMGs.
In this expanded pilot, the selection was randomized among candidate villages that met qualifying thresholds. In addition to collecting the detailed household data described above, baseline and follow-up surveys also included a biometric module. Women ages 18-49 and all children under 5 were measured (height, weight) and tested for anemia (iron deficiency). This survey (including biometrics) was repeated at the end of the dry season in 2015 as a follow-up study, with full analysis of results forthcoming.
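For readers less familiar with the before-after, treatment-comparison logic used in the feasibility study, a minimal sketch of a two-group difference-in-differences estimate follows. The function name and all numbers are hypothetical, not project data, and this simplest form ignores the additional member versus non-member contrast that the study design also exploits.

```python
def difference_in_differences(treat_pre, treat_post, comp_pre, comp_post):
    """Change in the treatment group minus change in the comparison group."""
    return (treat_post - treat_pre) - (comp_post - comp_pre)


# Hypothetical group means of some food security indicator (not project data)
effect = difference_in_differences(
    treat_pre=10.0, treat_post=14.0, comp_pre=10.5, comp_post=11.5
)
print(effect)  # 3.0
```

Subtracting the comparison group's change removes any trend common to both sets of villages, so the remainder is attributed to the intervention under the usual parallel-trends assumption.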

Research design and rationale for interim monitoring
Importantly, the initial feasibility study described above pointed to multiple possible impact pathways across the spectrum of economic wellbeing, food and nutrition security, and women's empowerment, for different groups of potential beneficiaries. For example, young children of beneficiary families might stand to benefit most nutritionally from increased and year-round consumption of vegetables grown in the SMGs; older children might additionally benefit from increased income at the household level to pay school fees, or from a large increase in returns to labor that enabled them to go to school instead of working in the fields; women participant farmers might gain economically and in terms of empowerment by controlling a new set of resources, in addition to direct nutritional benefits from consumption. Non-beneficiary households in treatment villages might benefit from year-round availability of new products. Each of these domains could be represented by many indicators: for example, food and nutrition security would include.
The detailed household surveys conducted by our collaboration, the bread and butter of impact evaluation, are key for understanding changes in consumption, well-being, and health at the household and individual levels. In theory, the profits made by women's group members should be reflected in these numbers (that is, the change in consumption, expenditures, savings, assets, etc., should reflect new earnings). However, numerous interim variables (for example, direct measures of crop production and yields, the profits made, the crops and quantities that are being consumed at home, how sales and marketing are taking place, how prices are changing around the district over the dry season, and how much time farmers spend engaged in garden-related labor) are critical to fully understanding the causal pathways of impact and sources of variation across different impact metrics, which often differ dramatically across regions (e.g., Katz, 1995). These data also serve as a valuable cross-check for information collected via household surveys. Most importantly, they can be used in conjunction with cost data to assess the profitability of the SMG system as a whole, and to refine the business model(s) for implementation elsewhere.
To better understand these short time-scale dynamics, we conducted the study described below from December 2013 through April 2014. The goal was to appropriately and thoroughly sample garden production, sales, and consumption at all SMG sites over the course of the dry season. The full picture of garden production can then be used to refine the project economic model for sustainable implementation and scale-up.

Methods
The core idea behind this study was to track all SMG-related activity, quantifying everything produced in the SMGs and all inputs used to produce it for a representative amount of time, in order to understand the monetary value of that production and how much was being used for different purposes (home consumption, sales over the back fence, sales in a local or regional market, in-kind gifts, etc.). From such data, we are able to derive yields and understand the relative profitability of different crops in different areas, track total volume of production and prices of different products over the course of the dry season, and understand the way that the SMGs feed the local network of markets in Kalalé. This is of particular interest because we want to be alert to any signs of market saturation and falling prices. In addition, we are keenly interested in the variance in performance both between and within garden groups, and the relative engagement of group members in different villages. Ultimately, these data can be wrapped into a more comprehensive understanding of the benefits (profits and value of home consumption) enjoyed by SMG beneficiaries. These benefit data can then be combined with data on the cost side to understand the full spectrum of possibilities for sustainable implementation. Rather than having one data point for benefits and one for costs, we are able to use the full suite of data across gardens to understand high and low bounds, between- and within-group variation in different dimensions, and the overall likelihood of profitability. We are also able to explore different subsidy/implementation schemes, from full payback through various levels of support.
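One common way to summarize between- versus within-group variation in an indicator is an intraclass correlation coefficient (ICC). The text does not specify which ICC estimator was used, so the following is only a sketch of the one-way ANOVA form, ICC(1), under the simplifying assumption of equal-sized groups (the actual garden groups vary in size). The function name and the toy values are hypothetical.

```python
def icc_oneway(groups):
    """One-way ANOVA intraclass correlation, ICC(1), for equal-sized groups.

    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between- and within-group mean squares and k is the group size.
    """
    n_groups = len(groups)
    k = len(groups[0]) if groups else 0
    if n_groups < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least two groups of equal size >= 2")

    grand_mean = sum(sum(g) for g in groups) / (n_groups * k)
    group_means = [sum(g) / k for g in groups]

    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n_groups * (k - 1))

    denom = ms_between + (k - 1) * ms_within
    if denom == 0:
        raise ValueError("no variation in the data")
    return (ms_between - ms_within) / denom


# Toy values: all variation lies between groups, so the ICC is 1
print(icc_oneway([[1.0, 1.0], [3.0, 3.0]]))  # 1.0
```

An ICC near 1 indicates that members of the same group resemble each other far more than members of different groups; values near zero or below indicate that most variation is among individuals within groups.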
To collect these data, we undertook a comprehensive 5-month survey between December 2013 and April 2014, with each garden surveyed on a rolling basis for one week each month. Each group was assigned an internal enumerator, most often a member of the group herself who was able to read and write, who was responsible for monitoring everyone going in and out of the garden, and all produce leaving the garden, during the survey period each month. Enumerators were equipped with standardized survey sheets and a scale to weigh produce, and were present at the garden from sunrise to sunset. In addition to the internal enumerators, a team of external enumerators was assigned to the gardens (each external enumerator was assigned 2-3 gardens); these individuals helped to oversee data collection, conduct some quality control, and fill in if the internal enumerator needed to step away during the day. Finally, quality control monitors (each one assigned to several gardens) would randomly visit each garden during the survey period for additional oversight and to correct any breaches of protocol.
During each monthly survey period, the quality control (QC) monitors also completed one hour's worth of parallel data collection at each garden to verify that enumerators were following protocol. Quality control visits were conducted randomly, but QC monitors were instructed to visit their assigned sites at different times of day (e.g., lunch hour, Friday prayer time, early morning, closing) to avoid any systematic oversight problems.
As shown in Fig. 1, the 11 garden sites are spread across the commune of Kalalé. Each site was surveyed for 7 consecutive days each month. To facilitate data collection, and to preserve the anonymity of garden group members (who may be sensitive about economic information being recorded), each garden group member was given a numeric ID tag (with the file containing the names matched to each number kept separately from other data). No one was allowed into the garden without her tag; all activity was recorded associated with the ID number of the person doing the work and/or harvesting the crops. All analysis was conducted with the anonymized data, and no individual ID numbers are presented in this report as an additional safeguard.

Survey and sampling
During the survey months, all produce from a given garden was weighed, assessed (for ripeness, quality), and valued (price per kilogram). Typically, enumerators stationed themselves just inside or outside the garden gate to be able to track the comings and goings. For everything harvested, the enumerator asked the garden group member what she planned to do with the produce (consume it at home, sell, etc.); if she intended to sell it, she was asked where the intended point of sale would be, and who would be doing the selling. In addition, all activity at the garden was tracked, from arrival and departure times to watering, composting, weeding, fertilization, and application of pesticides. These data are used to give an estimate of the labor inputs into the garden. Finally, water consumption was also tracked, to better understand the direct connection to the solar-powered pumps, modeled evapotranspiration, and borewell yield. (This full technical water analysis is not presented in this report, although basic findings are summarized.) Gardens were surveyed on a random rolling basis to avoid any systematic seasonal effects (for example, sampling one garden always at the peak of production, with another always surveyed just after planting). The week-long survey period each month also guaranteed that each village was always surveyed over one market day (when presence at the garden would likely be lower), and over each day of the week, again avoiding any systematic calendar effects. Because each garden was surveyed for 7 days at a randomly assigned time each month, the production data collected are assumed to be representative, and the total over the entire survey period is assumed to represent 35/151 (total number of days surveyed/total number of days December-April) of production. Values for the full cost-benefit analysis and sustainability analysis sections are scaled accordingly.
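The 35/151 scaling described above amounts to a single multiplication. A minimal sketch follows; the constant and function names are illustrative, and the day counts assume a non-leap February, as in 2014.

```python
SURVEY_DAYS = 5 * 7                    # 7 surveyed days in each of 5 months
SEASON_DAYS = 31 + 31 + 28 + 31 + 30   # December 2013 through April 2014


def scale_to_season(sampled_total):
    """Scale a total observed on surveyed days up to the full December-April period."""
    return sampled_total * SEASON_DAYS / SURVEY_DAYS


print(SURVEY_DAYS, SEASON_DAYS)             # 35 151
print(round(SEASON_DAYS / SURVEY_DAYS, 2))  # 4.31
```

Any sampled total (weight, value, or labor hours) is therefore multiplied by roughly 4.3 to approximate the full dry season, under the stated assumption that surveyed weeks are representative.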
To be able to calculate yields, the area planted with each crop needed to be measured. Each month, external enumerators completed a "garden snapshot" form (see Appendix), which allowed them to mark out, in meter-long increments, the areas planted with different crops. These snapshot images were manually entered and merged with the main survey data for each month to be able to calculate approximate yields for each crop. It is worth noting that most of the women's groups planted all beds "in parallel" (i.e., they had similar areas allocated in the same order to the same crops), making the snapshot process easier. However, this was not entirely true for all gardens, and many of the communal plots in each garden were planted in different configurations (e.g., with leftover seeds). The yield data are thus likely not totally accurate. We have, from the snapshots, the possibility of yield analysis at the plot level, where these configuration differences have been accounted for, but that is beyond the scope of the present analysis.

Data cleaning and analysis
Data were collected by hand by the enumerators on the standardized data entry sheets and then scanned to PDF. Data were converted from PDF format to Excel versions of the entry form by a bilingual (French/English) data analyst hired through oDesk. The analyst was briefed on the goals of the project, the structure of the data, and common mistakes that might take place in transcription due to the nature of the data (local names for crops, for example). From the digital spreadsheet files, data were imported to Stata for analysis. Misspellings, ID miscodings, and other mistakes were corrected to the extent possible, where necessary by rechecking the original handwritten files.

Summary
Over the 5 months of data collection, our study accumulated 22,190 individual observations (an individual harvesting a particular crop) across the 11 women's groups. For each observation, the individual's unique ID number was recorded, along with the type of crop harvested, the harvested weight, several quality indicators, the intended use of the crop, whether the crop was being harvested for seed production, the value of the harvested crop (either the intended sale price or the hypothetical sale price), the intended seller (the group member or someone else), and the intended sale location.
Broadly, the quality of data collection was outstanding. Of these observations, almost none were suspicious in terms of quality (e.g., an extra '0' that makes a value an obvious outlier). Spelling mistakes and use of local language names for crops were easily corrected during data cleaning. We do not exclude the possibility of measurement error, which is discussed in greater detail below. In addition, we note that the monetary values assigned to harvested crops were not verified. That is, women were asked how much they planned to sell produce for if they were going to sell it at the market, or how much they could hypothetically sell it for if they planned to consume it at home. As such, values could be either inflated or deflated. However, as described in detail below, a great number of sales actually happen 'over the back fence' at the garden itself. These prices are not systematically higher or lower (though interesting heterogeneity is discussed below), leading us to believe that the reported prices are credible.
The biggest issue with data collection, entry, and transcription seems to have been incorrect copying of the ID number (either on site or when entering data). Most of these errors are correctable through process of elimination (e.g., the closest number that actually exists in the women's group) or by checking the paper record again. Regardless, the loss of this 2.1% of ID numbers does not affect the aggregate analysis, as all price and sales data were still recorded. These missing identifiers do affect the analysis of within-group variation, as it is impossible to attribute the production sampled in these observations to a particular individual. The implications of these missing identifiers for conclusions drawn about group performance are discussed in greater detail below.
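The process-of-elimination correction mentioned above could be sketched as follows. This is only one reading of "the closest number that actually exists in the women's group" (numeric closeness); the function name, the tie-handling rule, and the example IDs are assumptions introduced for illustration, not the procedure actually coded for the study.

```python
def nearest_valid_id(recorded_id, valid_ids):
    """Return the single closest valid ID, or None if there is no unique match.

    Ties are treated as ambiguous and returned as None, so that they can be
    resolved by checking the paper record instead of guessing.
    """
    candidates = sorted(set(valid_ids), key=lambda v: abs(v - recorded_id))
    if not candidates:
        return None
    if len(candidates) > 1 and (
        abs(candidates[0] - recorded_id) == abs(candidates[1] - recorded_id)
    ):
        return None
    return candidates[0]


# Hypothetical roster of ID tags for one group
roster = [101, 115, 120]
print(nearest_valid_id(117, roster))  # 115
```

An ID that already appears in the roster is returned unchanged, since its distance to itself is zero.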

Total production
Over the 5 survey months, the 11 operating SMGs produced 27.7 metric tons of produce for both consumption in the home and sales in local and district markets. The total value of production was 20 million FCFA, or around $40,000 at an exchange rate of $1 USD = 500 FCFA (the value used for these analyses). Across SMGs, approximately 429 individuals (mostly women, but a few men included) were directly involved in vegetable production (i.e., they were assigned some area within a given SMG). In the data, we see around 400 individual IDs represented. Although there is tremendous variation in performance and profit within groups (and across all individuals), this total production is equivalent to about $100 per person (the scaled value for the entire dry season is thus $140).
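The currency conversion and per-person figure above can be reproduced with simple arithmetic. The sketch below assumes that the per-person value divides by the roughly 400 IDs seen in the data; that choice of denominator is an assumption made here for illustration, and the variable names are likewise illustrative.

```python
FCFA_PER_USD = 500             # exchange rate used in these analyses
total_value_fcfa = 20_000_000  # total value of production over the survey
ids_in_data = 400              # approximate number of individual IDs observed

total_value_usd = total_value_fcfa / FCFA_PER_USD
per_person_usd = total_value_usd / ids_in_data
print(total_value_usd, per_person_usd)  # 40000.0 100.0
```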
As shown in Fig. 2, this production varied substantially by garden, although the overall size of gardens does not vary as much (see the discussions of yields and water use below). The average production is 10.8 tons over the five months, and the average value of production (in highly localized prices) is 1.82 million FCFA, or approximately $3640 USD. As seen in Fig. 2, total production and total profit are not entirely correlated, indicating significant dispersion in pricing across Kalalé, even when accounting for the fraction consumed at home. Angaradebou is, by a significant margin, the garden reaping the highest profits, while Bessassi 1 has the highest reported production during the survey. For both metrics, Dunkassa, Kidaroukperou, and Basso are the weakest performers. It is worth noting that these relative rankings run counter to the instincts of the project team. That is, by all other accounts, these three villages at the bottom (and in particular Dunkassa and Kidaroukperou) are extremely well organized, well trained, and appear to be highly productive. Some of this discrepancy may be explained by relative abundances of water in different villages (and thus differences in irrigable areas), which is discussed in greater detail below.
There is also tremendous variation in production and value across crops. The top crops are dossi, amaranth, tomato, lettuce, okra, cabbage, eggplant, moringa, papaya, wario, hot pepper, carrot, onion, cucumber, and assorted other greens. Garden members also produced a wide array of other, smaller local crops, as shown in Fig. 2. Again, it is worth noting that the production and value rankings by crop are not perfectly correlated, as some crops are more highly valued. In particular, over this dry season, certain gardens were able to fetch much higher relative prices for dossi (a popular green). This highlights the importance of cropping calendar planning at the garden level.

Crop use
The SMG production goes to a number of uses. Most of the produce (around 75%) is sold, either at the garden itself to consumers who know they can buy on location, or at markets in the greater Kalalé region. The next biggest share of the produce is consumed at home (an average of 16.7% across villages). Another roughly 4% is donated, often to the local elementary school (or someone preparing food for the students) or given as gifts to other families. A small amount is thrown out or used as feed. Again, there is wide variation across villages, as shown in Fig. 3. There was also some variation over time in crop use, particularly for Angaradebou. The group consumed a much higher percentage of produce (pulling up the average for the entire sample) in December, but then began selling more as the season progressed. This could be indicative of two non-mutually-exclusive phenomena: deep food insecurity and a lack of marketing skills at the outset.
In addition to consumption and sales of crops, we see tremendous variation in seed production across gardens. Fig. 3 shows total production and value of production across gardens and across crops. It should be pointed out that the weight is not the weight of the seeds themselves, but rather of the full biomass (e.g., amaranth stalk or tomato) from which the seeds would then be harvested. The values are those that the group member estimated the seeds could be sold for (estimating by eye alone the amount of seeds to be recovered from the plants). As such, there is some uncertainty associated with the values. Nevertheless, the profitability of seed production is something that a number of farmers, particularly in Kidaroukperou, have seized upon.

Crop quality
Enumerators were given a reference sheet (see Appendix) with numerous quality descriptions to code harvested produce. The vast majority of harvested produce was described as "robust" and "ripe" (defined as picked at the appropriate time), and this was true across gardens and months. Less than 1% of harvested products were coded as not consumable or only partially consumable. However, around 6% of total produce was coded as having some sort of visible defect (yellowed, perforated, dried, etc.) that would not render the product inedible but could negatively affect marketing. The number of these 'blemish' codings varies dramatically by garden, but much of this variation may be due to enumerator effects, so it is hard to draw any conclusions from the combination of relatively few 'bad' codings and between-enumerator differences in quality judgment.

Productivity and returns to land
To assess overall crop productivity (yield, or weight of production per unit area of land) and returns to land (here calculated simply as value of production per unit area of land), we merged daily garden data with the garden snapshot data (see Appendix) to understand the area base for harvested crops. The garden snapshots indicated what was currently planted in each garden during the month, with an assessment of what phase of growth the plants were in. While plants may have been at different maturity points in different gardens at different points in time, the sampling strategy was designed to catch each garden at different phases of campaigns so that such errors would be random.
The infrequent nature of the garden snapshots resulted in some missing yield and returns to land data due to asynchronous measurements: what was recorded at the time of the snapshot might not have accurately represented what was in the garden at a different time of the month (for certain gardens in certain months). Moreover, several of the high-value crops (moringa, papaya) are trees that have been planted at the ends of crop beds and thus have effectively 'zero' footprint in the irrigated area of the gardens, which produces nonsensical yield or returns to land values. This is not problematic, in that the main goal of the per unit area data is to compare the gardens on equal area bases, and to compare productivity and profitability of different crops that use substantial areas in the garden. (It is worth noting, though, that none of this analysis takes into account the nutritional value of the crops, which may not correspond with market prices. The same is true for cultural values of different crops.) As discussed in greater detail below, the gardens do not all have equivalent water access, and so the surface areas have been modified to align more closely with water availability in a given location. Table 2 shows the per-area production and value of production for the 5-month period of the survey.
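The per-area calculations described in this section, including the handling of crops with no irrigated footprint, might look like the following sketch. The function name and all records are hypothetical, not survey data; the actual analysis was carried out in Stata, and the text does not say how zero-area crops were coded there.

```python
def per_area(total, area_m2):
    """Return a total per square metre, or None when no planted area is recorded."""
    if area_m2 is None or area_m2 <= 0:
        return None  # e.g. trees at bed ends: a per-area value would be meaningless
    return total / area_m2


# Hypothetical records: (crop, harvested kg, value in FCFA, planted area in m^2)
records = [
    ("tomato", 120.0, 36000.0, 60.0),
    ("moringa", 15.0, 9000.0, 0.0),
]

for crop, kg, fcfa, area in records:
    crop_yield = per_area(kg, area)         # kg per m^2
    returns_to_land = per_area(fcfa, area)  # FCFA per m^2
    print(crop, crop_yield, returns_to_land)
```

Returning None rather than dividing by a tiny or zero area keeps tree crops out of per-area comparisons while leaving their weight and value in the garden totals.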

Prices and marketing
Average unit prices (FCFA/kg) by garden are shown in Table 3. On average, the irrigated fruits and vegetables are worth 332 FCFA/kg, or about $0.66/kg at the time of the survey. These values (and returns to land) are 2-3 times the values for staple crop production (sorghum and corn), and similar to those for yams. Across villages, what immediately stands out is that the most remote villages (Angaradebou, Gbessakperou, and Peonga) generally fetch the highest average prices. As discussed further below, their marketing strategies are quite different, and so this may simply indicate the deeper lack of micronutrient crops in more rural areas. Certain crops also routinely (across gardens and types) fetch higher prices than others. The list of all crops grown (surveyed), with the number of survey observations over the 5-month study, is shown in Table 4. Among the crops with fairly high production, the traditional greens (such as dossi, crin-crin, and wario) stand out. In addition, hot peppers, okra, carrots, and onions stand out among the non-leafy vegetables. That these crops, traditionally produced locally (with the exception of carrots, which have been very successful), are popular is not surprising, but the prices fetched are. This indicates tremendous potential for gardens without necessarily focusing on new varietals, but rather simply maximizing production of already-accepted crops. Also notable are the prices for seeds produced by garden members. Carrot and moringa, two new crops introduced by this project, are in very high demand by other smallholders, and the farmers able to produce and sell seeds did so at high prices. (Note that the per kg unit is likely not well measured; the weight measure here often refers to the mass of the plant that would then have seeds harvested from it. These prices are thus very conservative lower bounds on the true seed prices, but we do not have a good measure of the per-gram cost of actual seeds.) Prices show substantial variation across crops and across garden groups.
Average price per kilogram of major garden products (excluding seeds) is shown for the major crops across gardens in Table 4. In addition, Fig. S2 shows variation in price per kg of key crops across gardens, averaged over the study period. The horizontal line for each box gives the median price, the box represents the interquartile range (25th to 75th percentile), and the whiskers give the inner 95% range. Outliers have been removed from the plot (but not from the calculations); they are likely the product of a price 'floor' on small quantities being extrapolated. That is, one could not sell a small quantity of vegetables for less than a certain amount (e.g., 25 FCFA), so when that sale is extrapolated to a per-kilogram basis, the unitary price appears very large. There is some variation across gardens for given products, which is likely an indication of both distinctive and very local preferences (e.g., dossi vs. amaranth) and evidence of the utter lack of market connectivity in the region. As discussed below, women's group members (either as individuals or in groups) do not appear to be arbitraging (or attempting to arbitrage) price differentials in different markets.
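The price-floor effect described above can be illustrated with a short calculation. This is only a sketch: the 25 FCFA floor is the example given in the text, while the sale quantities and the function name are hypothetical and chosen purely to show how the extrapolated unit price grows as the quantity shrinks.

```python
def unit_price_fcfa_per_kg(sale_amount_fcfa: float, quantity_kg: float) -> float:
    """Extrapolate a single recorded sale to a per-kilogram price."""
    if quantity_kg <= 0:
        raise ValueError("quantity_kg must be positive")
    return sale_amount_fcfa / quantity_kg


PRICE_FLOOR_FCFA = 25.0  # example minimum sale amount quoted in the text

# Hypothetical sale quantities (kg), not survey data.
for quantity_kg in (0.5, 0.1, 0.05):
    price = unit_price_fcfa_per_kg(PRICE_FLOOR_FCFA, quantity_kg)
    print(f"{quantity_kg:.2f} kg sold at the floor -> {price:.0f} FCFA/kg")
```

Under these assumed quantities, the smallest sale extrapolates to a unit price above the 332 FCFA/kg survey average even though only 25 FCFA changed hands, which is why such points appear as outliers.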

Seasonality
There is substantial variation in production and prices over time. Figs. S3 through S5 show district-wide changes in total production and average price over the course of the dry season. We see effectively no correlation between prices and production; that is, there is no evidence at this point that oversupply or saturation of any kind is driving prices down. This does not deny the possibility, however, that women are unable to make sales as anticipated (i.e., at the price recorded) and end up selling for less, or not selling at all.
There are no strong trends across gardens in terms of consumption in the home, with the exception of Angaradebou, which shows a strong decline in the amount of food consumed in the home over the course of the dry season. It remains unclear whether the strong consumption rate at the beginning is reflective of deep initial food insecurity, a lack of marketing skills and strategy, or both. A number of gardens do show improving trends (a combination of production and marketing) over the course of the dry season: both Bessassi gardens, as well as Derassi, Dunkassa, and Peonga, all exhibit significant upward trends in production value.

Between- and within-group variation
As shown in Fig. 2, there is substantial variation across gardens in overall production and value of production, though these figures may be somewhat misleading given the different garden sizes. The distribution of value of production is shown in Fig. S6, along with the distribution of returns to land. Of interest is the shape of the distribution for returns to land, in that the gardens are divided into higher and lower performers. Nevertheless the variance is lower (as makes sense), indicating that value of production is scaling with garden size.
Within groups there is also significant variation, as shown in Fig. S7, and the between-group differences are still visible. The low peak for Basso is due to the fact that the group is much larger and most women have half-plots. Taking this into consideration, at an individual level the women of Basso look much more like those of the average village. Some of the variance here is due to low sampling of many individuals: some were surveyed only 1-2 times during the entire period, others hundreds of times. More work is required to understand this variance in participation (or sampling), as the plots of under-sampled women do not show signs of neglect. It may be that within groups individuals have negotiated arrangements to lend or effectively give their plots to other group members.

Water use
As seen in Table 1, the different gardens vary tremendously in their natural water allotments (or the natural recharge rates of the borewells). Each site has a reservoir with a functional storage volume of around 25 m^3 (the volume of water above the outlet in the 1.8 m radius reservoir). This amount of water is equivalent to 5 mm on half a hectare. The irrigated area of the gardens is actually smaller due to spacing between beds and walkways. In practice, the groups turn on the pump in the morning and let it operate all day; they open the valve to the irrigation system when the water level reaches an upper limit in the reservoir. The pump continues to operate while the valve is open; the women then apply roughly half of the water to each of the upper and lower halves of the garden (each garden has effectively two separate drip systems).
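The equivalence between the storage volume and a depth of water can be checked with a few lines of arithmetic. The sketch below uses the 25 m^3 volume, the half-hectare area, and the 1.8 m radius from the text; the function name is hypothetical, and treating the reservoir as a simple cylinder is an assumption made only to back out the implied water column above the outlet.

```python
import math

HECTARE_M2 = 10_000.0


def equivalent_depth_mm(volume_m3: float, area_m2: float) -> float:
    """Depth of water (mm) if volume_m3 were spread evenly over area_m2."""
    # 1 m^3 spread over 1 m^2 is a 1000 mm layer.
    return volume_m3 * 1000.0 / area_m2


storage_m3 = 25.0            # functional storage volume from the text
area_m2 = 0.5 * HECTARE_M2   # half a hectare
print(f"Equivalent depth: {equivalent_depth_mm(storage_m3, area_m2):.1f} mm")

# Assumption: cylindrical reservoir, so volume = pi * r^2 * height.
radius_m = 1.8
water_column_m = storage_m3 / (math.pi * radius_m ** 2)
print(f"Implied water column above the outlet: {water_column_m:.2f} m")
```

Because the irrigated area is smaller than the nominal garden area, the same function gives a larger depth per fill when called with the actual irrigated area.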
Ideally, each garden should get 30-35 cubic meters of water on the hottest, driest days (7-8 mm equivalent Potential Evapotranspiration). However, this is not possible in all locations; the hydrology of northern Benin is extremely heterogeneous. This is most obvious in April, the driest part of the dry season, and the time when recharge rates are slowest in the region. As shown in the table, the water use from sampling during April varied tremendously, as measured by counters on the pumps, with some gardens pumping barely more than half of the ideal. The irrigated areas have been adjusted for most gardens to more closely align with water availability, but there is still wide variation in the effective application rate (mm/day).
This difference in water availability nevertheless does not seem to have any significant relationship with yields or returns to land, indicating that other factors explain these differences (management, marketing, etc.). Fig. S8 shows no relationship between yields and water availability. The relative lack of influence of water availability on these output metrics may be a function of the fact that many of the crops grown in the highest quantities and sold for the highest prices are local varietals that may have lower water requirements than the fairly generic evapotranspiration calculations assume; it may also indicate a significant yield gap that could be closed in the future.

Labor
Accounting for labor is of critical importance for development engineering, as many technologies designed for developing communities in effect assume infinite or costless labor supply (Feder et al., 1985; Lee, 2005; Foster and Rosenzweig, 2010). Over the course of the study period we tracked entry and exit of each group member, and the tasks they undertook while present. There is a large dispersion in how often members of the different village garden groups come to their respective gardens over the course of the study period: many individuals were sampled only a handful of times, while others were sampled more than 100 times (meaning they came to the gardens nearly 3 times per day on every survey day). The average over all groups was 13.6 visits over the study period (35 survey days in total per garden), with each visit lasting an average of 99 min, again with significant variation within and between groups. Table 5 shows the distributions of labor, in terms of number of visits, time spent at the garden, and number of helpers (usually children) accompanying the group member. Table S1 gives an idea of the types of tasks members undertook at the gardens, although it is important to note that the survey period began after preparations and first planting and ended before the end of the harvest and rainy season activities. The distribution of tasks is not representative of the entire dry season or the entire year; it is, however, representative of peak dry season activity. This is important as many agricultural development technologies seek to extend the production season or help kickstart hungry season economic activity.
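As a rough back-of-envelope check on the labor figures above, the averages can be combined into an approximate labor input per member. This is a sketch, not the authors' calculation: multiplying the mean number of visits by the mean visit duration equals the mean total time only if the two are uncorrelated across members, and the variable and function names are hypothetical.

```python
MEAN_VISITS = 13.6          # average recorded visits per member (from the text)
MEAN_VISIT_MINUTES = 99.0   # average duration of a visit (from the text)
SURVEY_DAYS = 35            # survey days per garden (from the text)


def approx_hours(visits: float, minutes_per_visit: float) -> float:
    """Approximate total hours, assuming visit count and duration are uncorrelated."""
    return visits * minutes_per_visit / 60.0


def visits_per_survey_day(visits: float, survey_days: int = SURVEY_DAYS) -> float:
    """Average number of recorded visits per survey day."""
    return visits / survey_days


print(f"Approximate hours per member: {approx_hours(MEAN_VISITS, MEAN_VISIT_MINUTES):.1f}")
print(f"Average member, visits per survey day: {visits_per_survey_day(MEAN_VISITS):.2f}")
print(f"Member with 100 visits, visits per survey day: {visits_per_survey_day(100):.2f}")
```

The last line reproduces the "nearly 3 times per day" figure for the most frequently observed members.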

Marketing
The upper panel of Fig. 4 shows the breakdown of sales locations by garden. There are two main distinctions here relevant for understanding the economics of the project: sales in the garden versus at an established market, and sales in the local (same village) market versus one further afield. The majority of all transactions for all but one (Gbessakperou) of the women's groups take place in the garden, with strong variation across groups. There is some dispersion between prices fetched at the garden and prices at market, on average, as shown in the lower panel of Fig. 4. Nevertheless, these differences are not substantial or systematic, indicating that, at least at present, selling at the garden is not hurting the farmers in terms of fetching good prices.
Almost all sales that take place in a market take place within the same village, or the next closest. However, there is tremendous variation in marketing between groups. The farmers from Derassi, Dunkassa, and Kourel sold products exclusively within their village markets (not a single sale elsewhere). Farmers from a few other groups made all sales locally with the exception of a few sales (Peonga: Boa (3) and Gbessakperou (3); Kalale: Parakou (5); Gbessakperou: Boro (2); Basso: Kalale (8) and Neganzi (1)). A few groups split their sales between markets: farmers from Bessassi sell locally, in Kalale, and occasionally in Danganzi and Basso; farmers from Kidaroukperou sell locally and in Kalale, with one excursion to Gando Baka. The exception to this trend by far is the group from Angaradebou, who sold produce in all of the following markets: Angaradebou, Badaria, Boa, Bouca, Derassi, Djega I, Dunkassa, Gando Baka, Kakatinin, Korodji, Matcher, Nikki, and Peonga. Of all of the sales data, only 21 sales were made in Nikki, Parakou, and Zambara, which are larger markets that might have better pricing. These data, however, indicate that the women who made these sales took fairly large quantities and expected to make several thousand FCFA. Women who made sales in Bouca (the largest village in Kalale, with no SMG) also made high overall sales. These data are too sparse to draw strong conclusions but do point to the role of more extensive marketing in future SMG success.

Economic sustainability analysis
To better understand the scenarios for sustainable implementation of the SMG, we incorporate the survey data above, along with information about installation costs and group financial management, into profitability and investment analyses.

Cost-benefit analysis
The SMG project in Kalalé was structured to have all of the capital outlays and 2 years of technical assistance granted to recipient villages, but to have agricultural groups be self-sufficient (and pay for all costs) thereafter. This is one model, but a key goal of the SMG project is to develop a business model (or suite of business models) that could be implemented sustainably in different financing scenarios. As a baseline for this analysis, we used the costs from the expansion of the SMG project to 8 new villages, along with input and maintenance costs from the first campaigns of these gardens, to conduct financial and investment analyses. The cost breakdowns are shown in Table 6, and the full detailed worksheet is included as Supporting Information.
The cost distributions assume two full production campaigns, and full cost coverage for all parts of the system at conservative lifetimes. (This could thus be construed either as women's agricultural groups paying back the cost of equipment over the lifetime of the equipment, or as a model in which the initial capital outlay is donated but the groups save for autonomous replacement on timelines for each component of the system.) Costs for annual production campaigns include inputs and agricultural and solar technician support at 100% of the level of the SMG project in Benin. The full breakdown of annual (two-campaign) per-garden costs is: inputs (577,500 FCFA), technical support and maintenance (1,775,000 FCFA), and amortization of equipment (1,256,950 FCFA), or a total of 3,609,450 FCFA per year.
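The annual cost total and the relative weight of each component can be verified directly from the three figures above. The sketch below uses only numbers stated in the paper (the three cost components, plus the 500 FCFA/USD exchange rate and 40-member group size given as assumptions in the Table 6 caption); the dictionary keys and function name are hypothetical.

```python
# Annual (two-campaign) per-garden costs in FCFA, as stated in the text.
ANNUAL_COSTS_FCFA = {
    "inputs": 577_500,
    "technical support and maintenance": 1_775_000,
    "equipment amortization": 1_256_950,
}
FCFA_PER_USD = 500   # exchange rate assumed in the Table 6 caption
GROUP_SIZE = 40      # members per garden assumed in the Table 6 caption


def cost_shares(costs: dict) -> dict:
    """Fraction of the total contributed by each cost component."""
    total = sum(costs.values())
    return {name: value / total for name, value in costs.items()}


total_fcfa = sum(ANNUAL_COSTS_FCFA.values())
print(f"Total: {total_fcfa:,} FCFA (about ${total_fcfa / FCFA_PER_USD:,.0f})")
print(f"Per member: {total_fcfa / GROUP_SIZE:,.0f} FCFA")
for name, share in cost_shares(ANNUAL_COSTS_FCFA).items():
    print(f"{name}: {share:.1%}")
```

On these figures, technical support and maintenance is the largest single component, at roughly half of the annual total.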
To harmonize the profits period with the cost period (two full campaigns is longer than the survey period for this study by around two months, as noted above), we scale total value of production and sales by 7/5. (It is possible to, and some gardens do, operate the SMG for a third campaign, depending on the other agricultural activities of the farmers, but we aim here for the most conservative estimate possible, with highest costs and lowest profits.) These values are shown in Fig. 5. The annual cost for the full repayment scheme as outlined above is indicated by the uppermost red line. In this scenario only the two most productive gardens have reached a level of production that exceeds the annual costs, but that includes the value of home consumption. (This would nevertheless be relevant, for example, for a commercial farmer who wanted to purchase or finance a system.) None of the current SMG gardens meets that value on sales alone.

Table 6. Solar Market Garden costs, derived from 2013 SMG installation in 8 villages in Kalalé, Benin. Table represents maximal cost structure; percentages paid by project were varied to produce different cost estimates presented in Fig. 5. Core assumptions here are a 30-week production season (inclusive of preparation), 40 women or farmers per group/garden, and an exchange rate of 500 FCFA/USD (representative of 2013). We base costs on imported PV systems and drip irrigation components, and thus include transport and clearing for these items.

Fig. 5. Benefits (total value of production and sales) across villages (bars), along with costs (red lines) for different financing schemes. The uppermost line indicates a scenario in which the entire system is paid for (or a replacement is saved for) according to lifetimes of individual components, and women's groups pay for all technician maintenance, training, and inputs. The lowermost line is a scenario in which the entire system is paid for (or a replacement is saved for) according to lifetimes of individual components, and women's groups pay for all inputs, but no technician salaries. The middle line is a scenario in which groups pay for 50% of equipment and technician salaries for maintenance and training, and all inputs. Scenarios are presented so as to be conservative: equipment lifetimes for large items besides PV modules are at the low end (i.e., components have already lasted longer in pilot villages), production is assumed to stay at year 1 levels (i.e., no learning curve), costs assume seeds are always purchased instead of produced, and production is assumed to be 2 campaigns per year (i.e., only dry-season production). Investment analysis using Angaradebou, and assuming very modest improvements in production and low levels of continued technical support, results in a total payback time of 5 years and an internal rate of return of 25%. Using a rate of 10% (the value typically used by the World Bank), the system has a net present value (NPV) of $25,836 over 20 years. A worksheet of costs, input details, and investment analysis is included as Supporting Information.
The main cost driver is not the PV pumping equipment (since most of it is rather long-lived, shared among women, etc.), but rather the technician salaries for maintenance and training, and, to a lesser extent, annual inputs. In a scenario in which women's groups pay for all equipment and annual inputs, but no technician salaries, the average garden is profitable (if just barely) on sales alone. (This assumes that farmers would still consume the same percentage under this scheme, which may not be true.) The most profitable garden (Angaradebou) exceeds the value on sales alone by 1,400,000 FCFA. In a garden group of 35 women, that would be equivalent to 40,000 FCFA profits per year in addition to the produce consumed at home, or about $80/year at the time of the survey. The middle line in Fig. 5 represents a scenario in which groups pay for half of equipment costs and technician salaries, as well as 100% of annual inputs. In this scenario, the average garden is right near the break-even point on sales.
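The three financing scenarios and the per-member profit figure can be sketched from the cost components reported earlier. The annual costs below are implied by those components rather than read off Fig. 5, and the sketch assumes that the "technical support and maintenance" line corresponds to the technician salaries that are waived or halved in the scenarios; the function and variable names are hypothetical.

```python
# Annual per-garden cost components (FCFA) as stated in the text.
INPUTS_FCFA = 577_500
TECH_SUPPORT_FCFA = 1_775_000   # assumed to correspond to technician salaries
AMORTIZATION_FCFA = 1_256_950
FCFA_PER_USD = 500              # exchange rate from the Table 6 caption


def annual_cost(equipment_share: float, technician_share: float,
                inputs_share: float = 1.0) -> float:
    """Annual cost borne by the group when it pays the given share of each component."""
    return (equipment_share * AMORTIZATION_FCFA
            + technician_share * TECH_SUPPORT_FCFA
            + inputs_share * INPUTS_FCFA)


scenarios = {
    "full cost recovery (upper line)": annual_cost(1.0, 1.0),
    "half equipment and technicians (middle line)": annual_cost(0.5, 0.5),
    "no technician salaries (lower line)": annual_cost(1.0, 0.0),
}
for name, cost in scenarios.items():
    print(f"{name}: {cost:,.0f} FCFA")

# Per-member profit for Angaradebou in the no-technician scenario.
surplus_fcfa = 1_400_000   # sales above cost, from the text
members = 35               # group size used in the text
per_member_fcfa = surplus_fcfa / members
print(f"Per member: {per_member_fcfa:,.0f} FCFA (${per_member_fcfa / FCFA_PER_USD:.0f})")
```

Varying the three share arguments is a simple way to explore other cost-sharing arrangements between a project and the garden groups.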
Although for clarity we do not show them all in Fig. 5, many realistic scenarios look very promising. As one example, if garden groups reduce input costs by half (e.g., by producing their own seeds, as the existing SMGs have done), extend lifetimes of drip irrigation kits to 8 years (the actual lifetime of the pilot drip kits installed in 2007), are able to purchase all equipment in-country (no transport and shipping costs; this is now possible), and share agricultural assistance and technical assistance costs (like a government or NGO-style extension program, as opposed to fully paid by only SMG groups), the annualized cost of the SMG is reduced to 1,370,350 FCFA, or $2741. All SMG groups surpass this amount, with no assumptions about improving productivity, etc. For a garden group like Angaradebou, this would amount to $114 per person in net profits from sales, or $180 in total value of production. Adding a third campaign would raise profits to $200 per person (sales alone).
For longer-run investment analysis, we assume that gardens may have a learning curve, and improve their production and profitability over time, through higher yields, creation of more deliberate crop calendars, better marketing, etc. Keeping all non-assistance costs the same, and keeping the same conservative assumptions about lifetimes, but allowing for a modest increase in profits (profits slowly ramp up over 7 years to 125% of year 1) and assuming that only basic maintenance and extension exist past year 3 (the SMG Benin model), we calculate an internal rate of return of 25% (over 20 years). This is in line with other estimates for the region (You et al., 2011) and more than twice the typical standard for World Bank investments. At a 10% discount rate, the net present value of the SMG is $26,152 (again over 20 years). The SMG cost workbook included as Supporting Information allows for exploration of alternative scenarios, financing mechanisms, etc.
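For readers who want to reproduce this kind of investment analysis outside the workbook, the sketch below shows standard NPV and IRR calculations. It is not the authors' worksheet: the cash flows are hypothetical placeholders, the linear shape of the profit ramp is an assumption (the text says only that profits ramp up slowly over 7 years to 125% of year 1), and the function names are illustrative.

```python
def npv(rate: float, cash_flows: list) -> float:
    """Net present value; cash_flows[0] occurs at time 0 (undiscounted)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: list, lo: float = -0.99, hi: float = 10.0,
        tol: float = 1e-9) -> float:
    """Internal rate of return by bisection; assumes one sign change in NPV on [lo, hi]."""
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("NPV does not change sign on the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2.0


def ramp_multiplier(year: int, ramp_years: int = 7, final: float = 1.25) -> float:
    """Profit multiplier relative to year 1, rising linearly (an assumption) to `final`."""
    progress = min(year - 1, ramp_years - 1) / (ramp_years - 1)
    return 1.0 + (final - 1.0) * progress


# Hypothetical inputs for illustration only (not values from the SMG workbook).
initial_outlay_usd = 10_000.0
year1_net_benefit_usd = 2_000.0
flows = [-initial_outlay_usd] + [
    year1_net_benefit_usd * ramp_multiplier(year) for year in range(1, 21)
]
print(f"NPV at 10% over 20 years: ${npv(0.10, flows):,.0f}")
print(f"IRR over 20 years: {irr(flows):.1%}")
```

Replacing the placeholder outlay and net-benefit series with the per-year costs and benefits from the workbook would be needed to compare against the 25% IRR and the NPV reported above.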

Scale-up analysis
When scaling up a project, and particularly when designing such a scale-up to be evaluable, an estimate of the anticipated treatment effect is necessary to properly power the study. Credible estimates of treatment effects in projects like the SMG project (either renewable energy projects or agricultural technology projects) are often sorely lacking, both because there are multiple plausible pathways of impact with presumably different strengths, and because such projects are often by design clustered (whether around a technology or a geographic unit) and the relevant intraclass correlation coefficients (ICCs) are often not known. The ICC is a critical design parameter for a properly-powered evaluation of a clustered intervention: the effective sample size of a clustered randomized controlled trial is somewhere between J, the number of clusters, and N, the total number of individuals (where N = nJ and n is the average number of individuals in a cluster). If the variance between clusters (here, village garden groups), \(\sigma_b^2\), is small compared to the variance within clusters, \(\sigma_w^2\) (i.e., clusters are very similar to each other), then the effective sample size is closer to N. But if the opposite is true and the between-cluster variance is not small relative to the within-cluster variance (i.e., the clusters are quite different), the effective sample size is closer to J, and many more clusters are required to adequately power the study. This intuition behind the Effective Sample Size (ESS) can be written in terms of the ICC (\(\rho\)) and the Design Effect (DE):

\[ \mathrm{ESS} = \frac{N}{\mathrm{DE}} = \frac{nJ}{1 + (n - 1)\rho} \tag{1} \]

Table 7 shows the ICCs for different quantities measured in this study. They range from 0.06 to 0.26, meaning that the effective sample size for different metrics would be reduced by anywhere from a factor of 3.3 to a factor of 11.1 (assuming 40 group members per SMG). The most critical result is the fact that ICCs are not identical for these metrics, across the same set of individuals and SMG gardens/villages.
This is particularly important for development engineering projects or technologies like the SMG because there are often numerous pathways that might be activated through access to or purchase of such a system. The SMG Benin project is aimed at improving food security, but there are many dimensions to food security: individuals and households might benefit from consumption of their own production, they might benefit from sales of products that allow them to make other purchases, and they might benefit from improved returns to land, labor, or water, or from overall labor savings. Broader effects might result from increased quantities of high-value nutritious crops available in local markets. Household agriculture and food security interventions have been designed around one or more of these pathways without actually understanding the relative importance of each, or the relevant design effects. This study provides critical results that can be used to more credibly design such programs, or to create financial services for private purchasers, with a better knowledge of the correlation structure across numerous metrics.
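The reduction factors quoted for the ICC range in Table 7 follow from the standard design-effect formula for equal-sized clusters, DE = 1 + (n - 1) * ICC, with the ICC defined as the between-cluster share of total variance. The sketch below reproduces the factors of 3.3 and 11.1 for 40 members per garden; the number of clusters used for the effective sample size is a hypothetical value, and the function names are illustrative.

```python
def icc_from_variances(var_between: float, var_within: float) -> float:
    """Intraclass correlation: between-cluster variance as a share of total variance."""
    return var_between / (var_between + var_within)


def design_effect(icc: float, cluster_size: float) -> float:
    """Standard design effect for equal cluster sizes: 1 + (n - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_clusters: int, cluster_size: float, icc: float) -> float:
    """Total sample size N = n * J deflated by the design effect."""
    return n_clusters * cluster_size / design_effect(icc, cluster_size)


MEMBERS_PER_GARDEN = 40   # group size assumed in the text
N_GARDENS = 10            # hypothetical number of clusters, for illustration only

for icc in (0.06, 0.26):  # lowest and highest ICCs reported in the text
    de = design_effect(icc, MEMBERS_PER_GARDEN)
    ess = effective_sample_size(N_GARDENS, MEMBERS_PER_GARDEN, icc)
    print(f"ICC={icc:.2f}: design effect {de:.1f}, "
          f"effective sample size {ess:.0f} of {N_GARDENS * MEMBERS_PER_GARDEN}")
```

Because the design effect differs by outcome, a study powered on the outcome with the lowest ICC could be substantially underpowered for outcomes with higher ICCs.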

Conclusions
Here we present the results of an interim evaluation of the Solar Market Garden project in Kalalé, Benin. The detailed nature of the SMG survey provides important information on the dynamics of the SMG functionality and the pathways of impact that would not be captured in a standard experimental or quasi-experimental before/after research design. We show that, in conservative cost-benefit analyses, the most productive SMGs are profitable, and investment analysis shows that the SMG exceeds standard World Bank criteria. However, there is significant variation in performance both within and between SMG gardens/villages; a key task for the full evaluation of the SMG project will be to try to understand the drivers and implications of this variance. This work also presents a set of methods and tools that could easily be adapted for other agricultural or rural energy projects. Although there are some important limitations to our study (for example, we did not capture the full year of activity but rather only the dry season; the external validity of the ICCs we measure is unknown), it nevertheless represents an important step forward in better project evaluation and in understanding the interim dynamics (the how and why) of whether development engineering projects succeed or not.