Rethinking monitoring in smallholder carbon payments for ecosystem service schemes: devolve monitoring, understand accuracy and identify co-benefits

Monitoring is a key aspect of payments for ecosystem services (PES) schemes, providing a basis for payments. PES monitoring however presents challenges, including in balancing technical accuracy with cost, local equity and legitimacy. This is particularly true in smallholder carbon PES, where managers have limited resources and capacity. Here we explore ways to improve monitoring in smallholder projects. We looked at two well-established projects in Uganda and Mexico, and appraised five monitoring methodologies: two remote sensing and three field measurement approaches. Each methodology varied in data resolution, methodological complexity and degree of local participation. We collected quantitative and qualitative information on four aspects of performance: accuracy; costs; local equity; and local legitimacy. We show that methodologies with greater data resolution and local participation performed better in all four aspects, while greater methodological complexity was not associated with significantly improved performance. We conclude that monitoring in smallholder and other types of PES may be improved through: 1) devolving analyses to the local level; 2) communicating to stakeholders a distinction between ‘applied’ and ‘scientific’ accuracy; and 3) documenting and communicating the diverse functions of monitoring, referred to here as co-benefits – a contrast to simple ‘monitor and pay’ conceptions of PES.


Introduction
Payments for Ecosystem Service (PES) schemes are increasingly advocated (Milder et al., 2010), although questions remain about their conceptual validity (Kosoy and Corbera, 2010; Kronenberg and Hubacek, 2013) and technical feasibility (Guerry et al., 2015; Naeem et al., 2015). PES are conceptualised as payments to providers of ecosystem services, conditional on delivery of an ecosystem service, often resulting from maintaining a particular land use. Monitoring ensures conditionality, with providers only paid when they satisfy contractual land use conditions (Corbera et al., 2007; Sommerville et al., 2009; Fisher, 2013). Early conceptions of PES present it as a pure economic incentive focused solely on the technical monitoring of ecosystem service delivery to trigger payments (Ferraro and Kiss, 2002; Ferraro, 2011; Benson and Jafry, 2013). However, it is argued that, in practice, managing trade-offs in monitoring, and optimising monitoring to be accurate, cost-efficient and locally effective, remains a key challenge (Fisher, 2013; Meijaard et al., 2014; Naeem et al., 2015). Through this paper we contribute to a more nuanced appreciation of these terms and consider options through which they may be promoted.
One of the origins of this challenge is the need for reductionist approaches to monitoring in complex landscapes. Smallholder carbon PES (SCPES), where farmers plant trees to sequester carbon, exemplifies this challenge, given the need for such schemes to deal with diverse smallholders in diverse landscapes. SCPES projects deploy different remote sensing, activity-based and field ecology measurement methods to monitor impacts of land use, although links between land management and ecosystem service provision are often uncertain (Ascough et al., 2008; Fisher et al., 2009; Meijaard et al., 2014). Attempting to overcome this uncertainty, monitoring often becomes complex (i.e. dependent on complex technologies and technical expertise) and costly (Baker et al., 2010; Meijaard et al., 2014), so becoming geared towards an external technical audience, and less comprehensible to local actors (Peskett et al., 2011; Fisher, 2013; Leach and Scoones, 2013; Lovell, 2015).
This drift towards complex monitoring may create trade-offs between accuracy, costs, equity and legitimacy. The aforementioned compromise between technical complexity and local transparency is one example of a broader trade-off between perceived accuracy on the one hand, and local equity in distributional outcomes (i.e. how monitoring affects participating smallholder income) (Brown, 2003) and legitimacy in decision making (i.e. whether monitoring decisions are perceived as fair and acceptable to smallholders) on the other (Adger et al., 2003). Similarly, more complex monitoring may increase costs, reducing SCPES revenues to existing providers and lowering incentives for potential new providers (Berry and Ryan, 2013). Yet, conversely, if simpler monitoring is less precise, this may also lower revenues: due to the principle of conservativeness, less precise monitoring will intentionally underestimate the provision of ecosystem services to ensure that certified services are not false (Hamburg, 2000), in turn reducing services available for sale (Berry and Ryan, 2013; Watson et al., 2013). Understanding how to manage trade-offs in local legitimacy, local equity, cost and accuracy is thus integral to improving the success of PES schemes. This is particularly true for smallholder and community carbon projects in the tropics, which have very limited resources and capacity, and the growing number of REDD+ projects searching for robust and cost-effective monitoring (Chhatre et al., 2012; Torres and Skutsch, 2015; Bayrak and Marafa, 2016).
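The effect of the conservativeness principle on saleable services can be illustrated with a minimal sketch. The discount rule used here (reducing the central estimate in proportion to its relative uncertainty) and all numbers are illustrative assumptions, not the rule applied by any particular standard or by the projects studied.

```python
def creditable_removals(estimate, relative_uncertainty):
    """Conservative estimate: discount the central estimate by its relative uncertainty.

    The proportional discount is an assumed, simplified rule for illustration only.
    """
    if not 0.0 <= relative_uncertainty <= 1.0:
        raise ValueError("relative_uncertainty must be between 0 and 1")
    return estimate * (1.0 - relative_uncertainty)


# Hypothetical plot with the same central estimate under two monitoring approaches
precise = creditable_removals(5.0, 0.10)  # more precise monitoring
coarse = creditable_removals(5.0, 0.40)   # less precise monitoring
print(precise, coarse)
```

Under this assumed rule, the less precise approach yields fewer certifiable removals from the same plot, which is the revenue effect described above.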
The available literature on smallholder and community forestry, and ecological monitoring, provides some preliminary insights on managing trade-offs between accuracy, costs, equity and legitimacy. For example, a wide literature suggests that local participation does not itself guarantee local equity in carbon and other types of PES schemes, with equity in outcomes also being heavily dependent on local context, as well as individual financial, human, natural, physical and social capital (Brown, 2003; Corbera and Brown, 2008; Peskett et al., 2011; Fisher, 2013; Pascual et al., 2014; Calvet-Mir et al., 2015; Hendrickson and Corbera, 2015; Kariuki and Birner, 2016). On monitoring specifically, local participation in monitoring can change or perpetuate existing land and resource access arrangements, and so have varied (positive or negative) impacts on local equity, justice and social change (Van Rijsoort and Jinfeng, 2005; Petheram and Campbell, 2010; Osborne, 2011; Funder et al., 2013; Hendrickson and Corbera, 2015). For example, Staddon et al. (2014, 2015) argue that, even in participatory community monitoring, external 'scientific' approaches dominate and local elites can continue to benefit disproportionately. Another potential trade-off with regards to costs and accuracy is that there are divergent views on whether increased methodological complexity and cost (to both farmers and intermediaries) should necessarily result in more robust monitoring. The large literature on particular PES-related methodologies (Brown, 2002; Wollenberg et al., 2012; Geijzendorffer and Roche, 2013; Porras et al., 2013; de Araujo Barbosa et al., 2015; Bustamante et al., 2016) generally assumes that more complex monitoring will be more accurate (e.g. see the 'Tier' approach in IPCC, 2006). This issue is illustrated by Baker et al. (2010), Cacho and Lipper (2006) and Meijaard et al. (2014), all of whom point to the problem of the complexity and cost of technology and expertise in carbon PES.
Studies have begun to question whether the relationship between complexity and accuracy is linear by showing that the relationship does not hold within methods such as field measurement (Danielsen et al., 2008; Danielsen et al., 2013; Brofeldt et al., 2014) and remote sensing (Hill et al., 2013; Mitchard et al., 2014). Additionally, field tests have shown that field ecology measurements (one step in overall monitoring) by local community members can be less costly to projects (in terms of labour) than, and similarly accurate to, those taken by technical intermediaries (Holck, 2007; Danielsen et al., 2008; Brofeldt et al., 2014). The literature therefore also suggests potential to manage trade-offs in PES monitoring through rationalising methodological complexity, costs to smallholders and intermediaries, and perceptions of accuracy.
Another example of a trade-off stems from the fact that perceptions, expectations, assumptions and methods of monitoring vary depending on who demands the monitoring (Meijaard et al., 2014), which may in turn affect how trade-offs in PES monitoring should be managed. For example, carbon PES monitoring is generally an upwardly accountable process, targeted towards a technical audience, and subsequently buyers (Fisher, 2013). Yet, as discussed above, there is apparent disagreement amongst stakeholders on what represents robust or fair monitoring. Additionally, carbon PES is increasingly claimed to be associated with a range of environmental and social 'co-benefits', where other outcomes (in addition to carbon sequestration) are targeted and achieved through a single carbon project (Anderson and Zerriffi, 2012). This may lead to local stakeholders perceiving a project and its benefits differently to external stakeholders. Understanding how monitoring is perceived by different stakeholders (Table 2), and addressing any apparent misconceptions, is thus also integral to achieving accuracy, local equity and legitimacy in PES monitoring.
Our aim is therefore to examine the accuracy, cost, equity and legitimacy performance of five monitoring methodologies of varying complexity (Table 1) used to measure carbon sequestration in smallholder forestry interventions, and the perceptions of these methodologies amongst four key actors: smallholders, local intermediaries, technical experts, and buyers (Table 2). We draw lessons for PES monitoring from two case studies of agroforestry SCPES projects in Uganda and Mexico, which have sold certified carbon offsets for the voluntary carbon market since 2003 and 1997 respectively. The two projects provide good examples because, while SCPES (and these two projects in particular) provide some of the oldest examples of PES, research on specific smallholder monitoring methodologies is limited to general aspects of conditionality (Fisher, 2013) and specific technological aspects (Rosenstock et al., 2013; Seebauer, 2014). Additionally, smallholders tend to be a poorer socioeconomic group who collectively safeguard a wide range of ecosystem services from landscapes, and so may increasingly be targeted by PES (Milder et al., 2010; Daw et al., 2011). Finally, with limited economies of scale, optimisation of monitoring is particularly pertinent to smallholder projects to keep costs down (Rosenstock et al., 2013). Although PES design (and therefore monitoring) will differ with the scale, technological context and objectives of the project (Farley and Costanza, 2010), lessons from SCPES schemes may be valuable for PES more generally.
The three research questions framing this study are:
1. How does the choice of monitoring methodology affect perceptions of local equity in outcomes, and legitimacy in decision making?
2. How do costs and accuracy vary with the complexity of the monitoring methodology?
3. How do perceptions and expectations of monitoring vary amongst different actors?
In answering these questions we discuss how data resolution, personal interaction, local labour, and potential PES income are key mechanisms for optimising monitoring in our cases. We then elaborate on why PES monitoring may benefit from ecosystem service analyses by local (as opposed to external) actors, better communication of uncertainty and accuracy to stakeholders, and greater recognition of the diverse social functions of monitoring (in contrast to narrow conceptions of PES as simple 'monitor and pay' interventions).

Study Sites
Our cases, 'Scolel'te' in Mexico and 'Trees for Global Benefits' in Uganda, sell carbon offsets certified by the Plan Vivo Standard (Plan Vivo, 2013). Both projects involve smallholders, defined as landholders reliant on household land and labour (Plan Vivo, 2013). Scolel'te has been active in the Mexican state of Chiapas since 1994 (certified since 1997), is administered by the local intermediary AMBIO, and currently supports over 1200 smallholders. Trees for Global Benefits has been running in southwest Uganda since 2003 and is administered by Ecotrust Uganda, engaging over 4800 smallholders. The comparative maturity of these projects provided research respondents with an unusually long duration of experience of being monitored in PES.
We applied five monitoring methodologies to the same 31 agroforestry plots in 2015 to estimate with each methodology the change in carbon since each plot joined the project. Ten of the plots were in Mexico, and 21 in Uganda (Fig. 1). The plots ranged from 0.5 to 2.5 ha and had similar management regimes: planting of even-aged, mainly medium-growing native tree species (approximately 400 trees/ha) in the first three years after joining the program, with a selective thinning regime, and intercropping in early years until canopy closure. Plots had been in the program for varying amounts of time (4 to 14 years). Plots were selected at random from a pool of available smallholders, stratified to get the full breadth of plot ages. Given that we investigated variation between the monitoring methodologies rather than between plots, all plots were treated as being of one population to which the monitoring methodologies could be applied.
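A stratified random selection of this kind can be sketched as follows. This is a minimal illustration only: the age bins, the pool of plots, the number drawn per stratum and the function name `stratified_sample` are all hypothetical and do not reproduce the study's actual sampling frame.

```python
import random
from collections import defaultdict


def stratified_sample(plots, n_per_stratum, age_bins, seed=0):
    """Draw up to n_per_stratum plots at random from each age stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for plot_id, age in plots:
        # Assign each plot to the first age bin that contains its age
        for low, high in age_bins:
            if low <= age <= high:
                strata[(low, high)].append(plot_id)
                break
    sample = []
    for age_bin in age_bins:
        members = strata[age_bin]
        k = min(n_per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical pool: (plot id, years in the program), with ages spanning 4 to 14
pool = [(f"plot{i:02d}", 4 + i % 11) for i in range(44)]
bins = [(4, 7), (8, 11), (12, 14)]  # assumed age strata
chosen = stratified_sample(pool, n_per_stratum=3, age_bins=bins)
print(chosen)
```

Sampling within each age bin, rather than from the pool as a whole, is what guarantees that young, middle-aged and old plots all appear in the sample.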

Carbon Offset Monitoring Methodologies
We define a monitoring methodology as the combination of a 'data source' (e.g. field measurements or remote sensing spectral data) and 'analysis method' (e.g. modelling). We selected five methodologies (Table 1) based on feasibility in our cases and their representativeness of the spectrum of methods recommended for forestry projects (IPCC, 2006): two using remote sensing and three using field measurement. Fig. 2 describes the data, analyses and workflow for each methodology (see Appendix 1 for further information). Basic field measurement typifies current monitoring in the projects. At the outset, methodologies were differentiated based on data resolution, methodological complexity (i.e. technological sophistication) and relative level of local participation in data collection/analysis (Table 3). The former two attributes continue to be the criteria for model differentiation under the IPCC's land emissions modelling approach (IPCC, 2006; Smith et al., 2014), while the latter is akin to other studies on participatory monitoring (Danielsen et al., 2008). The data sources for both remote sensing methodologies were WorldView-2 multispectral and pan-chromatic imagery (1.85 m and 0.46 m resolution, respectively) donated by the DigitalGlobe Foundation, and GPS boundaries collected by field technicians. Across field measurement methodologies, data sources for each plot were the GPS field location, stocking density of trees, a systematic random sample of tree diameter, and historical land use information for the plot from semi-structured interviews with the owner (see Stakeholders section below).

Table 1 Monitoring methodologies assessed in the study.
No. | Name | Description
RS1 | External remote sensing | An external specialist without local knowledge conducts a dot-grid analysis of canopy cover using flexible advanced spectral patterns, and relates results to a basic regional-level modelling of CO2e at different canopy proportions (modelling done with the SHAMBA GHG model; see Woollen et al., 2014).
RS2 | Participatory remote sensing | The same as external remote sensing except using only rigid basic spectral patterns and carried out by a local field technician with knowledge of the area and plots.
FM3 | Basic field measurement | A local technician looks up the stocking density of trees with DBH > 5 cm against the basic regional-level SHAMBA modelling of CO2e at different stocking densities.
FM4 | Intermediate field measurement | A local technician inputs plot-level data on location, tree stocking density, species and growth rates, and regional-level data on other land use into the SHAMBA model.
FM5 | Advanced field measurement | A local technician inputs detailed plot-level data for all variables (tree growth, stocking density and land use) into the SHAMBA model.

For data analysis, emission modelling was conducted using the Small-Holder Agriculture Monitoring and Baseline Assessment (SHAMBA) methodology developed by the University of Edinburgh to estimate emissions from agriculture and tree planting on a per hectare basis. SHAMBA was selected above other potential models (e.g. Hillier et al., 2011) as the only model explicitly designed for tropical smallholder projects (Berry and Ryan, 2013). The model consists of three sub-models: a RothC module for soil; a stock and flow woody and crop biomass model, similar to the DALEC model (Williams et al., 2005); and simpler approaches for non-CO2 greenhouse gas emissions related to biomass burning, N fixation and the use of fertilisers (model and documentation freely available online: http://bit.ly/1t58lFd). SHAMBA contains databases of emission factors, tree allometry, climate and soil information so that it can be used by non-specialists (Berry and Ryan, 2013). Given that carbon analyses usually require costly specialist knowledge (Cacho and Lipper, 2006; Meijaard et al., 2014), the use of SHAMBA implies a lower cost for carbon analyses generally; this is a key assumption in our analyses.

Stakeholders
We identified four main stakeholder groups: smallholders, local intermediaries, technical experts and buyers (Table 2). These groupings are noted in other studies (Corbera and Brown, 2008; Peskett et al., 2011). Thirty-one smallholders were selected using random sampling, stratified by plot age, and six intermediaries were selected purposively based on their roles in the projects. Seventy-five buyers were engaged through a targeted online survey, based on their interaction with Plan Vivo. The empirical methods employed were semi-structured interviews (on both monitoring experiences, and past land use on the plots) and observation for all stakeholders, except for buyers, for whom we relied on a questionnaire. We also conducted a joint workshop with intermediaries from each case to further investigate intermediary perceptions. We then employed a thematic analysis (see Vaismoradi et al., 2013) to organise perspectives from different stakeholders under the key aspects of performance (see below).

Performance of Monitoring Methodologies: Definitions and Methods of Comparison
The monitoring methodologies were scored across the four key aspects of performance apparent in the existing literature (see Introduction):
• Statistical accuracy: We define statistical accuracy as agreement between the GHG estimate from the methodology and the 'true' value. In the absence of a known truth and given the infeasibility of advanced field measurements (e.g. destructive sampling of trees or longitudinal soil samples, see Milori et al., 2012), we accepted results from advanced field measurement (FM5) as a benchmark for the 'maximum conservative estimate' of GHG removals for each plot, and as a proxy for true accuracy. We measured statistical accuracy of other tools as the average difference of results from this benchmark, in tCO2e ha⁻¹ year⁻¹.
• Cost performance: using semi-structured interviews, a literature review, and Cacho and Lipper's (2006) cost categories, we investigated 'cost performance' in implementation (lower cost means better cost performance) for intermediaries and smallholders. The analysis was based on real costs for FM3 (as it resembles current monitoring), and we used a review of component costs (e.g. remote sensing imagery; local labour costs) and semi-structured interviews with stakeholders to estimate if the other methodologies would likely be higher or lower.
• Perceived local equity and legitimacy: through workshops with intermediaries and structured surveys with smallholders, conducted by two researchers (assisted by a primary field assistant in each country), we sought perceptions from six intermediaries and 31 smallholders on local distributional equity (monitoring effects on participating smallholder income) (Brown, 2003) and legitimacy in decision making (whether monitoring decisions are perceived as fair and acceptable to smallholders) (Adger et al., 2003). To avoid bias from asking hypothetical questions, we began interviews with a discussion of the legitimacy and equity aspects of the current monitoring regime with which respondents had experience (and which approximates FM3). We then followed this with a discussion on what would happen if the three key aspects of monitoring tools (i.e. data resolution, complexity and local participation) varied from the current level (e.g. 'how would you feel if calculations of the amount of carbon in your trees were more/less complex than they are already?'; 'how would you feel if field visits were more/less frequent?'; 'how would you feel if the monitoring technician measured fewer/more trees?'). We viewed that the practical difference between FM4 and FM5 was minimal in these criteria, so these were assessed together.
• Perceived accuracy: we focused on the roles of technical intermediaries and buyers, given their dominant role in shaping monitoring (Fisher, 2013). For technical intermediaries, following Drescher et al. (2013), three technical experts with practical experience in GHG modelling independently conducted a pedigree analysis (Risbey et al., 2001) on sources of error and strength of method. Buyer perceptions were based on an online survey of 75 buyers asking 'which of the monitoring methodologies is most accurate?' along with a short description of each methodology.
Scores for all aspects were converted into ordinal scores of performance (very poor, poor, moderate, good, very good) relative to each other, with the exception of statistical accuracy, which is reported in tCO2e ha⁻¹ year⁻¹. All scoring results were compiled in Table 3, akin to Danielsen et al. (2008). These main results are supplemented by Table 4 and Table 2, respectively describing the 'success factors' and use of information by different stakeholders.
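The statistical accuracy measure defined above (average difference from the FM5 benchmark) can be sketched as below. The per-plot values are hypothetical rather than study data, the difference is assumed to be signed (estimate minus benchmark), and the confidence interval uses a simple normal approximation, which may differ from the interval calculation used for Table 3.

```python
import math
import statistics


def mean_deviation_from_benchmark(estimates, benchmark):
    """Mean per-plot difference (estimate - benchmark) and an approximate 95% CI half-width."""
    diffs = [e - b for e, b in zip(estimates, benchmark)]
    mean = statistics.mean(diffs)
    # Normal approximation; an assumption made for this sketch
    half_width = 1.96 * statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, half_width


# Hypothetical per-plot removals in tCO2e per ha per year (not study data)
fm5 = [5.0, 6.2, 4.1, 7.3, 5.6]  # benchmark: advanced field measurement
fm3 = [3.9, 4.0, 4.0, 4.1, 4.0]  # basic field measurement, regional-level lookup
mean, half_width = mean_deviation_from_benchmark(fm3, fm5)
print(f"mean deviation {mean:.2f} +/- {half_width:.2f} tCO2e/ha/yr")
```

A negative mean indicates that the methodology underestimates removals relative to the benchmark, i.e. that it is the more conservative of the two.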

Results
Overall, there was variation between monitoring tools across all four aspects of performance (Table 3). Methodologies using regional-level data recognised less of the natural variability in GHG removals between plots than methodologies using plot-level data (intermediate and advanced field measurement; FM4 and FM5) (Fig. 3). This ability to recognise variability appeared to underpin improved performance in accuracy, equity and legitimacy (discussed further below).
Another main area of difference was the diversity of uses of monitoring information between stakeholders (Table 2). We believe that this variation underpinned the results on perceived local equity and legitimacy, which we consider are by definition related to a stakeholder's existing expectations of monitoring.

Does the Choice of Monitoring Methodology Affect Local Perceptions of Equity?
Perceived local equity varied amongst methodologies, with field measurement methods assessed as more equitable than remote sensing methods (Table 3). Intermediaries perceived that field measurement approaches would have lower costs and higher accuracy (and so higher ecosystem service estimates), leading to more equitable (higher) income for smallholders (Table 4). Smallholder assessments mainly agreed (Table 4), but gave higher scores for plot-level methodologies (intermediate and advanced field measurement) due to their relative ability to allow heterogeneous land use and to recognise over-performance. Some intermediary views were qualified by the perspective that, in their case, SCPES financial flows in isolation are currently insufficient to fully cover the cost of monitoring (and embedded activities such as capacity building). According to this view, in this case, no monitoring approach can currently provide truly equitable outcomes unless relying on supplementary grant (non PES-generated) funding.
Table note: Quantitative results for participatory remote sensing were not completed due to a small sample size.
Beyond our immediate remit of equity in outcomes, on the related topic of local equity of participants' access to PES schemes (Brown, 2003), our results suggest that monitoring reforms may have minimal impact on improving carbon PES access for smallholders with less land or lower social capital. Intermediaries stated that changes to monitoring would not impact who could be involved in the scheme, with risks to smallholder food security and livelihoods being maintained as grounds for exclusion regardless of monitoring methodology. Relatedly, field observations and other research (Fisher, 2013) suggested that farmers were mainly recruited through social groups and institutional arrangements (e.g. NGO membership, community ties) that pre-dated the project. Reforms to monitoring would not necessarily impact these separate issues of local equity in access.
Additionally, discussions with local community technicians revealed equity considerations in devolving coordination and data collection to local technicians or volunteers. In both cases there were varying accounts from technicians as to whether their remuneration covered their opportunity costs. While this was not fully investigated by the study, it does raise the issue of how costs are shared within intermediary organisations (e.g. between 'headquarters' staff and local technicians or volunteers). For example, the increased field data collection implied by FM5 would imply increased time costs for local technicians.

Does the Choice of Monitoring Methodology Affect Perceptions of Local Legitimacy?
For legitimacy, field measurement approaches once again performed better than remote sensing (Table 3). Intermediaries and smallholders attributed this to: a) local participation through field visits, with reduced opportunities for smallholder participation in decision making under remote sensing; b) local understanding of data collection, with simple measurements such as tree diameter viewed as more legitimate and locally accessible than satellite data (Table 4); and c) better opportunities to deliver agroforestry extension services (a monitoring 'co-benefit') through field measurement approaches, with respondents citing a lack of public extension services in the study areas.
Table notes: (i) This is a deviation from the FM5 benchmark, rather than an absolute value. Therefore the CI can exceed the deviation. (•) Includes field visits.
Some intermediaries stated that transparency was less important for the 'data analysis' part of the methodologies. They suggested that, in the case of agroforestry monitoring, all available analyses are too opaque to be truly understood by smallholders and many intermediaries, and so all will continue to fail in this aspect of legitimacy. This was supported by the smallholder semi-structured interviews, which predominantly reported a poor understanding of the analysis method from even the most basic methodology (basic field measurement; FM3) and reported only an occasional desire to better understand the analysis.

Do Costs Vary With the Complexity of the Monitoring Methodology?
Given minimal time requirements for smallholders in all of the examined methodologies, farmer cost performance was largely even, except for external remote sensing, which was marginally lower. For intermediaries, while the least complex methodology (basic field measurement) had the best cost performance, this criterion was not closely coupled with complexity for other methodologies (Table 3). Instead, costs appeared related to data collection and specialist knowledge rather than to complexity per se (Table 4). Remote sensing methodologies were more costly due to the additional cost of high-resolution remote sensing data (although DigitalGlobe Foundation donated the data for this study) and the relative expense of specialist analysts. Specialist costs for GHG modelling in all tools were assessed to be lower due to the use of SHAMBA, a model designed for rapid automated plot-level modelling at the local intermediary level (Berry and Ryan, 2013), thus decoupling higher complexity from monitoring cost. If a different GHG accounting model were used, the analyses of cost outcomes would vary.

Does Accuracy Vary With the Complexity of the Monitoring Methodology?
All methodologies produced conservative statistical accuracy results relative to similar field studies. Average estimates from our tested methodologies ranged from 2.6 to 6.5 tCO2e ha⁻¹ year⁻¹, which are conservative relative to estimates for similar interventions in Eastern Africa: 4.6 to 10.1 tCO2e ha⁻¹ year⁻¹ (Kimaro et al., 2011) and 6.1 to 13.6 tCO2e ha⁻¹ year⁻¹ (Nyadzi et al., 2003) (figures scaled to comparable stocking densities). Similarly, in Mexico, de Jong et al. (1995) found an average sequestration of 14.29 tCO2e ha⁻¹ year⁻¹ for the same interventions and locality. Thus, despite varying statistical accuracy between the methodologies, all methodologies can be said to produce conservative results on average.
Statistical accuracy improved with complexity for field measurement methodologies, though with diminishing improvements in accuracy from medium to high complexity (Table 3). Table 3 shows that intermediate field measurement was routinely much closer than basic field measurement to the benchmark of advanced field measurement. This can be explained by the only point of differentiation between these two methodologies: the use of plot- rather than region-level monitoring information in intermediate field measurement for the most sensitive model input variables (plot location, tree stocking density, species and growth rates, see sensitivity analysis: Ryan et al., 2014). Advanced field measurement gained only marginally on intermediate field measurement by using plot-level input data for every variable, including less sensitive variables on fire, fertiliser and crop GHG sources.
External remote sensing was rated as equally complex as the benchmark methodology (advanced field measurement), yet it had the lowest statistical accuracy (Table 3). The technical review suggested this was due to: a) data resolution in data collection, where field boundaries had a minimum error of up to 6% on a 1 hectare plot (i.e. ± 3 m) and (even with high-quality WorldView-2 products) individual tree canopies were hard to distinguish; and b) data analysis, where regional- rather than plot-level ecosystem service estimates were used. It can be assumed that participatory remote sensing would share the same limitations.
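One possible reading of the boundary error figure is sketched below, assuming a square 1 ha plot (100 m sides) and interpreting the 3 m as an error in side length. Both the plot geometry and this interpretation are assumptions made for illustration; the technical review may have derived the figure differently.

```python
def relative_area_error(side_m, side_error_m):
    """Relative change in the area of a square plot when its side length is overestimated."""
    true_area = side_m ** 2
    measured_area = (side_m + side_error_m) ** 2
    return (measured_area - true_area) / true_area


# Assumed geometry: square 1 ha plot, sides overestimated by 3 m
err = relative_area_error(100.0, 3.0)
print(f"relative area error: {err:.1%}")
```

Because the relative area error scales roughly as twice the side error divided by the side length, the same absolute boundary error matters proportionally more on the smaller plots in the sample.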
The technical and buyer perspectives on accuracy broadly aligned with the statistical accuracy results, with increasingly complex field measurement approaches scoring higher (Table 4). For buyers, follow up questions indicated a perception of greater accuracy from methodologies that include field visits.
Importantly for reconciling different stakeholders' expectations of monitoring, the buyer survey indicated that purchasing decisions of many buyers of agroforestry carbon offsets are in fact influenced more by price and the presence of social co-benefits, such as capacity building, than by perceptions of robustness or accuracy.

Do Perceptions and Expectations of Monitoring Vary Amongst Different Actors?
Stakeholder groups had diverse uses of monitoring information (Table 2). Smallholders were concerned with how monitoring can support income and skills, local intermediaries and technical actors focused on project management, while buyers used information to select and communicate about ecosystem service investments. A key novel aspect to come from smallholder and local intermediary participants was the focus on agroforestry extension as a concurrent aspect of monitoring visits, which was a key factor in the stronger legitimacy scoring of field measurement methodologies. While an agrosilvicultural extension co-benefit was apparent for smallholders, intermediaries and technical actors still appeared to have the greatest use of current monitoring information. This corroborates the finding that carbon PES monitoring regimes evolve to serve a technical rather than local audience (Peskett et al., 2011; Fisher, 2013).

Discussion
There are many monitoring methodologies available to different PES schemes, and the effectiveness of a methodology will vary depending on the context of each project and the demands of key stakeholders (Meijaard et al., 2014). Following the testing (via two cases) of a set of methodologies which span the spectrum of those recommended for forest projects (IPCC, 2006), we elaborate three conclusions that may help to optimise monitoring in PES schemes generally: 1) devolving plot-level field measurement and analysis functions to the local level can provide a 'quadruple win' by improving all aspects of monitoring performance; 2) the monitoring demands of stakeholders could be rationalised by communicating that simpler methodologies can have the same 'applied accuracy' as more complex methodologies, where estimates are conservative; and 3) documenting and communicating about the broader functions of PES, including monitoring, may support more effective project design and may show that co-benefits are better rewarded by the market.

Local Data Analysis: A Quadruple-win?
Our findings build on existing research on the importance of local actors in project monitoring and administration (Brofeldt et al., 2014; Calvet-Mir et al., 2015) by suggesting that, in our cases, accuracy, costs, local equity and legitimacy may all be supported through devolving, not only data collection, but also data analysis, to the local level. Based on our results, we propose that local participation and data resolution are more important than methodological complexity in improving monitoring in SCPES schemes. We then suggest that the devolution of analytical capacity to local actors (such as technicians employed by local intermediaries) could strengthen local participation and provide a cost-effective means of maintaining or increasing data resolution. Fig. 4 presents a conceptual model derived from our results of the pathways by which the methodological attributes of data resolution, local participation and methodological complexity can positively influence the performance aspects of accuracy, costs, local equity and legitimacy.
For local participation, the first pathway is through participation supporting increased field presence of the project and, in turn, promoting perceived legitimacy. We found that methodologies involving higher levels of field visits and personal interaction gave smallholders greater scope for communication and negotiation over project decisions. This supports the broader notion that personal communication, mediation and translation are key to effectively managing the 'boundary' between technical knowledge and practice for sustainable development (Cash et al., 2003). A second pathway is that local participation may include utilising local labour, which we found to reduce costs relative to using external staff. This increased potential local income and, subsequently, perceived local equity. This agrees with existing findings on the cost advantages of using local labour for data collection, after the initial costs of training are met (Brofeldt et al., 2014). Local labour costs will vary significantly between projects, but given the generally high cost of external expertise (Meijaard et al., 2014) local labour may broadly be a more attractive prospect. We discuss below the equity impacts of this at the intra-community level (given that this labour constitutes local technicians and not necessarily members of the wider community).
For data resolution, we suggest three main pathways by which it impacts monitoring performance, all of which stem from our finding that improved data resolution improves accuracy. First, improved data resolution reduced the number of GHG credits lost due to uncertainty: coarser data creates ecosystem service estimates with greater uncertainty and, because conservatism dictates the use of the lower bound of the estimate, lower ecosystem service values for each smallholder. Thus high data resolution can increase potential income and improve local perceived equity. Second, plot-level (i.e. high resolution) estimates can improve both perceived local equity and legitimacy through recognising (and potentially rewarding) over-performance. If a smallholder exceeds performance targets (e.g. greater growth rates), under current approaches using rigid ex-ante estimates (including tiered payment schemes) and without a new plot-level analysis, the intermediary would only know that the smallholder had exceeded expectations, but not by how much. If a smallholder had innovated outside the prescribed intervention (e.g. planted different, more successful trees) the intermediary knows even less: all they know is that the smallholder has not followed the prescribed regional-level land use. Plot-level data analysis would recognise over-performance in both scenarios, which was perceived to improve equity. Finally, we suggest that plot-level data resolution improves local equity through allowing for a greater diversity of land uses. Regional-level ecosystem service estimates during project design lead to smallholders being contractually locked into a relatively specific land use, thus limiting responses to natural variability of social and ecological conditions and potentially inhibiting their adaptive capacity (Anderson and Zerriffi, 2012; van de Sand, 2012; van de Sand et al., 2014).
Plot-level data resolution in monitoring (and design) may allow for greater land use flexibility in projects. This aligns with existing research in our Mexico case study suggesting that more flexible land use design and contractual conditions may better reflect the diversity of local social conditions (Costedoat et al., 2016) and farmer aspirations (Otto, 2016). It also agrees with the broader assessment of environmental monitoring by Danielsen et al. (2010) which suggests that local involvement improves outcomes and the decision making at smaller (i.e. village) scales.
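The first data resolution pathway (credits lost to uncertainty) can be illustrated numerically. The sketch below assumes a normally distributed estimate with a known standard error and a scheme that credits the lower bound of a two-sided 95% confidence interval; the function name and the example values are hypothetical and are not taken from the study's data.

```python
from statistics import NormalDist

def creditable_co2e(estimate_t, std_error_t, confidence=0.95):
    """Lower bound of a two-sided confidence interval, floored at zero.

    Assumes a normal sampling distribution (an illustrative assumption).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return max(0.0, estimate_t - z * std_error_t)

# Hypothetical plot: same central estimate (tCO2e), different uncertainty.
regional = creditable_co2e(estimate_t=100.0, std_error_t=20.0)   # coarse, regional-level data
plot_level = creditable_co2e(estimate_t=100.0, std_error_t=5.0)  # finer, plot-level data
print(round(regional, 1), round(plot_level, 1))
```

With the same central estimate, the wider uncertainty of the coarse data leaves the smallholder with fewer creditable units, which is the mechanism by which higher data resolution can raise potential income.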
Broadly, we suggest that improved local participation and data resolution can be supported through maintaining the role of local actors in data collection (Brofeldt et al., 2014) and further devolving data analysis functions to the local level. In addition to increasing legitimacy through maintaining field visits, enabling local actors to conduct analyses may present a cost-effective way for PES schemes to move from coarse regional-level analyses to plot-level calculations, so increasing accuracy and flexibility. We suggest that field measurement approaches are more suited to this approach: given the apparent technical and cost barriers of remote sensing analyses in smallholder agroforestry monitoring, field measurement analyses by local technicians may be the only currently feasible way to routinely obtain and analyse high resolution plot-level data. In any case, even if the technical and cost barriers to remote sensing in smallholder PES could be overcome, the local equity and legitimacy issues would remain.
Empowering local technicians and distinguishing between under- and over-performing smallholders does of course come with its challenges. Benefiting local technicians is not the same as benefiting the wider community, or indeed marginalised members within the community. Thus devolution of monitoring to the local level does not necessarily increase local equity in access to PES income for those currently excluded from schemes, and does not necessarily prevent (and in some cases may enable) elite capture (Funder et al., 2013; Staddon et al., 2015). On over-performance, intermediaries in our cases suggested that, given the opaque nature of carbon PES analyses, communities may not understand how over-performance is calculated, leading to perceptions of inequity. Schemes would need to consider this in the design of monitoring regimes, such as through cross-community verification and benefit-sharing mechanisms, elements of which already exist in our cases.
Aside from these challenges, the question is: how can PES analyses be devolved? We see two parts to this solution. The first involves PES certification standards moving their rules and processes 'closer to the farm', where plot- instead of regional-level project designs and ecosystem service estimates are encouraged. This would be a major reform for the majority of agroforestry projects associated with the main PES certification standards (VCS, 2011; Plan Vivo, 2013; The Gold Standard, 2014), and represents a change of direction from the increasing use of regional 'hands-off' approaches to ecosystem service analyses, such as those that fully rely on remote sensing (see de Araujo Barbosa et al., 2015; Dong et al., 2015), although remote sensing with local participation may be appropriate in non-smallholder projects (see Bustamante et al., 2016). More broadly, where such plot-level analyses are designed to provide useful information back to farmers (e.g. on how to improve the carbon performance of their land use, or on over-performance to then use in negotiations with intermediaries), they could support adaptive decision making and management amongst PES smallholders (Stringer et al., 2006; Davies et al., 2015). This could allow PES certification bodies to begin to address criticisms that they are fundamentally 'upwardly' accountable (Fisher, 2013) and that they perpetuate existing power asymmetries (Kosoy and Corbera, 2010; Kronenberg and Hubacek, 2013).
The second part of the solution is technical. Existing studies (Brofeldt et al., 2014) already show that, with appropriate training, local actors can conduct robust data collection, and innovations are already underway to support data collection in areas with variable infrastructure and literacy (Hartung et al., 2010; Stevens et al., 2013). We suggest that technical innovations can also put previously complex ecosystem service analyses at the fingertips of local non-specialists. Tools are being developed for smallholder plot-level carbon analyses, such as the SHAMBA model used in this study and the Cool Farm Tool (Hillier et al., 2011), but formal applications to date rely on external expertise for analysis (e.g. Cool Farm Alliance, 2011). We therefore argue a need for further development of such technologies to truly devolve PES analysis to the local level.
Such technical innovations should, however, be treated critically. We highlight two key issues. First, while automation may present an opportunity, intermediaries in our cases who had trialed new technologies reported issues with technological reliability (e.g. data access; system crashes), user issues (e.g. difficulties in correcting data entry mistakes) and backward-compatibility with older (e.g. paper based) systems. Second, care would need to be taken that the efficiencies implied by newer devolved analysis tools (such as the lower costs associated with the use of SHAMBA in this study) do in fact result in better cost outcomes locally, and do not simply transfer costs from intermediaries onto local technicians or volunteers. PES schemes should thus be critical in their decision to adopt new technologies, ensuring that new systems are reliable and the costs justified (Newman et al., 2012); new technology may not always be the best solution.

Clarifying 'Accuracy' in PES
Our results point to the need for more nuanced understandings amongst PES stakeholders of the relationship between methodological complexity and accuracy, and to distinguish between 'scientific' and 'applied' accuracy where conservatism is a key principle of an ecosystem service certification standard.
First, our results show diminishing returns to accuracy for increasing complexity, thus moderating the simple assumption that complexity implies accuracy. Diminishing returns are well established in the statistical design of environmental monitoring (e.g. power analysis and sampling, see Caughlan and Oakley, 2001). However, the association between technological complexity and accuracy is less clear and, at least theoretically, a simpler model could provide a similarly or more accurate result (Brooks and Tobias, 1996;Young et al., 1996). Yet there appears to be a bias in PES schemes towards more technologically complex monitoring, partly to 'prove' conservativeness (Baker et al., 2010;Meijaard et al., 2014). Our results suggest that accuracy and complexity can be balanced in carbon monitoring through improved data-resolution and focusing on the key carbon pools and fluxes. Theoretical approaches to focusing data collection and analysis in environmental management already exist (see for example Runge et al., 2011). Finding practical ways to apply this in PES design so that schemes can assess the marginal accuracy gains from increased monitoring complexity could ensure that monitoring complexity per se does not become an end goal. Second, our results illuminate a potential misunderstanding in the practical application of the complexity/accuracy assumption in the precise-or-conservative approach. As elaborated in the results section, the estimates from all of the tested methodologies were conservative relative to other research studies. Not only this, but carbon PES schemes would employ the lower bound of the 95% confidence interval, leading to further conservatism. So in practice, while 'accuracy' varies amongst the methodologies, carbon units certified by any of these methodologies would all be equally 'real' or 'robust' as they are all conservative. 
According to the complexity/accuracy assumption, PES schemes using less sophisticated monitoring may be viewed as less robust, whereas in fact they may simply be more conservative (and so equally robust).
This leads us to propose a separation in PES accounting between 'scientific' accuracy, which is concerned with certitude (i.e. the proximity of the estimated value to the true value), and 'applied' accuracy, which is concerned with conservatism (i.e. the probability that the estimated value is lower than the true value). To use the language of risk and uncertainty, scientific accuracy is concerned with reducing statistical uncertainty in estimates as much as possible, whereas applied accuracy is about managing the risks of the (inevitable) residual statistical uncertainty by ensuring that a lower (conservative) value is used in practice. This is similar to the argument already made in the ecosystem service and natural resource management map-making literature which proposes that, due to inherent imprecisions in mapping geographical features, there is a need to distinguish between (sometimes overwrought) claims to scientific accuracy on one hand, and the practical 'usability' of simpler analyses on the other: simple maps may be just as informative (McCall, 2006; Vorstius and Spray, 2015). This distinction is important in the context of competition between PES certification organisations because confusion of the two concepts could drive PES certification organisations to increase monitoring complexity to distinguish their certified carbon credits and remain competitive in the market, when such complexity is perhaps not justified, and in fact can reduce benefits locally by driving up costs and reducing transparency. PES certification organisations may benefit from better communicating this distinction to technical project staff, project auditors, credit resellers and buyers.
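The distinction can be made concrete with a small calculation. The sketch below is illustrative only: it assumes unbiased, normally distributed estimates with a known standard error, and a scheme that credits the lower bound of a two-sided 95% confidence interval. The function names and the two standard errors are hypothetical.

```python
import math
from statistics import NormalDist

def scientific_error(std_error):
    """Expected absolute distance between an unbiased normal estimate and the true value."""
    return std_error * math.sqrt(2 / math.pi)

def applied_accuracy(std_error, confidence=0.95):
    """Probability that the credited value (lower CI bound) lies below the true value."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # credited = estimate - z * se, so credited < true  <=>  estimate - true < z * se
    return NormalDist(mu=0.0, sigma=std_error).cdf(z * std_error)

# Hypothetical standard errors (tCO2e) for a simpler and a more complex methodology.
for name, se in [("simple", 20.0), ("complex", 5.0)]:
    print(name, round(scientific_error(se), 1), round(applied_accuracy(se), 3))
```

Under these assumptions the two methodologies differ in scientific accuracy (the simpler one has the larger expected error) but have the same applied accuracy, because the standard error cancels out once the lower bound is used. The cost of the simpler methodology is a lower credited value, not a less robust one.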

Payments for Ecosystem Services: More Than Payments
Our results demonstrate diverse stakeholder purposes and demands in SCPES monitoring, which contrasts with early, narrow conceptions of PES as an institutionally simple economic incentive (Ferraro, 2001). We suggest that recognising the broader functions of monitoring could increase PES success and income, as well as deliver on the ethical obligation to hold monitoring 'downwardly' accountable to smallholders.
Simple conceptions of PES envisage a basic program design relative to other interventions, with a focus solely on administering contracts and conditional payments for conservation, presented as immune from broader 'non-environmental objectives' (Ferraro, 2001;Ferraro, 2011). Our study presents two pieces of evidence which build on existing critiques arguing that PES schemes necessarily evolve to encompass broader social considerations in practice (Fisher, 2013;Pascual et al., 2014). This is particularly true for smallholder and community projects.
First, in both of our cases, monitoring was interwoven with agroforestry extension services. Such capacity building 'co-benefits' have been demonstrated in development aid projects (Wanvoeke et al., 2015), and our results extend this into the field of PES. Both smallholders and intermediaries considered monitoring and extension linked, contributing greatly to perceived legitimacy. Given the persistent lack of public agricultural extension services following their withdrawal in the 1980s (at least partly due to governments seeking economic efficiencies through privatisation) (Haug, 1999; Benson and Jafry, 2013), we suggest that intermediaries found the need to provide extension support to smallholders, paid for by effectively levying a charge on smallholders' ecosystem service credit income. Following the simple economic efficiency logic of the narrow PES conception it may be argued that the ideal approach would be for intermediaries to forgo extension functions and the associated levy, instead permitting smallholders to decide if and how they source such services. Our cases would suggest it is a circular and self-defeating prospect in practice in some contexts: in the name of economic efficiency, externalising on to smallholders the responsibility and risk of sourcing extension services, services which smallholders have little prospect of sourcing as public extension was itself withdrawn in the name of economic efficiency. This example demonstrates that PES schemes, including monitoring, evolve to encompass concerns beyond that of a simple economic incentive.
The second, and related, way in which our study illustrates the broader functions of PES monitoring is through the expectations of buyers with regard to co-benefits, including smallholder capacity building (e.g. through agrosilvicultural extension). Far from the narrower payments-for-conservation view of PES (Ferraro, 2011), the buyer survey indicated that purchasing decisions of many buyers of agroforestry carbon offsets are in fact influenced by the presence of social co-benefits, such as capacity building. This may indicate that ecosystem service buyers' willingness to pay for social co-benefits is filling the void in agricultural extension funding created by the retreat of the state. It also suggests that PES intermediaries may gain from better articulating to buyers the diverse functions of their monitoring as the surveyed buyers indicated a willingness to pay for such co-benefits. Better documenting and communicating extension efforts could thus benefit several different actors and capitalise on buyers' interests in local benefits.
Overall, we argue that accepting at the outset a broader conception of PES could improve project success through better understanding the needs and desires of all stakeholders, ensuring that these broader functions are costed in the project design, and by articulating to buyers the value of co-benefits. However, this will not transcend the known limitations of the PES concept in addressing broader developmental issues such as empowerment and poverty reduction (Fisher et al., 2013), or indeed the known limits of any project-based initiatives to challenge wider structures of marginalisation (Hickey and Mohan, 2005). In agreement with the results of other studies (Corbera and Brown, 2008; Staddon et al., 2015), the monitoring processes in our cases were observed to be embedded within pre-existing institutional frameworks. Depending on implementation, this could reinforce existing inequities or bring about some new element of social change. As argued by Anderson and Zerriffi (2012), if intermediaries are seeking to achieve and market social co-benefits through their PES scheme, they should be realistic about the scheme's potential for social change, given its design.
Given the above proposed roles for devolved data analysis, and improved understandings of accuracy and the diverse functions of monitoring, we summarise below (Box 1) seven issues for consideration in the design of smallholder schemes, and for PES more generally.

Conclusion
PES is one approach in the bigger environment and development intervention toolbox. Where PES is deemed a useful approach, and given its continued and increasing popularity, it is important to optimise PES implementation, including monitoring, to maximise overall benefits for socioecological systems.
Through two cases of SCPES monitoring in Mexico and Uganda, this study suggests that trade-offs between accuracy, costs, local equity and legitimacy can be effectively balanced in PES monitoring, and that all of these aspects may be supported through improving data resolution and participation. We suggest three key measures: supporting local actors to conduct plot-level ecosystem service analyses; clarifying understandings of accuracy; and explicitly valuing the myriad roles monitoring plays for different stakeholders in PES schemes. In doing so, we have framed seven key issues to consider when designing PES monitoring (Box 1).
While our findings stem from two cases of smallholder carbon schemes, the findings may be relevant for PES or land-use schemes with different scales of operation (e.g. larger REDD+ projects) and ecological objectives (e.g. water and/or biodiversity PES). Balancing methodological complexity and accuracy with costs and transparency can be seen to be a general issue faced by the majority of PES and other land-use schemes. Our findings on the importance of data resolution and participation in managing this trade-off may serve to inform the design of such schemes.

Acknowledgements
This paper developed from a research project funded by the Ecosystem Services for Poverty Alleviation (ESPA) research programme, grant number NE/L001578/1. The ESPA programme is funded by the Department for International Development (DFID), the Economic and Social Research Council (ESRC) and the Natural Environment Research Council (NERC). Authors would like to thank the following people for their assistance in the research: the participating smallholders from Scolel'te and Trees for Global Benefits projects; Emerenciano Rivera, Nicolás Pérez, Teresa Böttcher, Sotero Quechulpa, Elsa Esquivel and other staff from Cooperativa AMBIO in Mexico; Lydia Kuganyirwa, Innocent Byamukama, Bernadette Kabonesa, Obed Muhindo, Pauline Nantongo and other staff from ECOTRUST Uganda; Chris Stephenson, Matteo Bigoni and Eva Schoof from the Plan Vivo Foundation; Brian Barban at the International Institute for Environment and Development; Paris Kazis, Iain McNicol, Emily Woollen, Sam Bowers, Yaqing Gou, Nicolas Berry and Andy Cross at the University of Edinburgh; and attendees at the international workshops in Kampala and Sigtuna. In addition, the paper benefited greatly from the comments of five anonymous reviewers. Any errors, omissions, or interpretations remain the responsibility of the authors. Authorship assignment follows the first-last-author-emphasis norm (Tscharntke et al., 2007).

Appendix 1. Monitoring Tool Descriptions
For this study, a monitoring tool consists of two parts: 1) the data sources; 2) the method of analysis (see Fig. 2 in manuscript).
Box 1 Summary of issues to consider in designing PES monitoring.
• Empowering local technicians to conduct PES analyses by moving the rules and processes of PES certification 'closer to the farm', and by continuing to develop technologies for their use;
• Ensuring that new technologies are in fact the best option: new technologies may not always be appropriate;
• Where analyses are devolved, introducing safeguards to protect against dominance/capture by local elites;
• Ensuring that the data collected and analysed is appropriate through maximising marginal accuracy gains (and so maximising certified PES credits) from complexity (e.g. moving to plot-level data for key variables) while resisting further complexity;
• Understanding and communicating to stakeholders a separation in PES accounting between 'scientific' and 'applied' accuracy: less accurate PES monitoring can be just as robust in practice, once conservatism is applied;
• Better documenting and communicating extension efforts and other co-benefits to: 1) avoid at the outset misconceptions that PES is 'simple'; and 2) potentially improve PES sales by capitalising on buyers' interests in local benefits; and
• Being realistic about the potential of PES, including monitoring and technical interaction, to support social change.
Five monitoring tools were implemented, collectively using seven data sources. The various data sources are described below.
Data 1: Satellite data. WorldView-2 panchromatic (0.5 m) and 8-band multispectral (2 m) resolution was donated by The DigitalGlobe Foundation.
Data 2: GPS boundaries. On visiting each plot, the smallholder was handed a GPS (model Garmin eTrex 10 or 20) and they tracked the boundary around the plot. This produced a GPS Exchange file (.gpx) for use in analysis.
Data 3: Canopy vs stem diameter survey. Crown and stem diameter were measured by field technicians for a random mix of species (in Uganda n = 49; in Mexico n = 31) on random plots, stratified by DBH size (bins of 10 cm, from DBH > 5 cm). A linear regression was then conducted to find the relationship between single tree crown and stem diameter for each country. Data was entered manually into monitoring databases.
Data 4: Intervention-level literature review and expert opinion. Intervention-level data and assumptions were generated for each country through a review of relevant literature (project design documents, technical specifications, feasibility studies, peer-review literature, grey literature). Where values could not be found in the literature, the expert opinion of local field technicians was used. Data was entered manually into monitoring databases.
Data 5: Plot-level tree inventory. A field technician collected location, tree stocking density, species and tree growth (DBH at 1.3 m where DBH > 5 cm) on each plot. Stocking density only included trees with DBH > 5 cm. Species and tree growth data were collected for trees selected using systematic sampling (every 10th tree with DBH > 5 cm). Data was entered manually into monitoring databases.
Data 6: Plot-level smallholder interview. The field technician interviewed the smallholder about land management on their plot, including on a counterfactual baseline scenario. Data was collected on: tree thinning and mortality; crops, yields and residues; synthetic and organic fertiliser use; and fire occurrence. Data was entered manually into monitoring databases.
Data 7: Basic modelling. There were two parts to this modelling:
a) An external specialist used the data from the intervention-level literature review (Data 4) in a hybrid process-empirical model (SHAMBA) (model and documentation freely available online: http://bit.ly/1t58lFd) to do regional-level modelling of CO2e at different stocking densities.
b) The regional crown/stem relationship (Data 3) was then used to relate the stocking density CO2e estimates in the first step to a canopy proportion.
FM1: External remote sensing and CO2e quantification. An external technical specialist without local knowledge of the area quantified CO2e through first conducting a dot-grid analysis of canopy cover using WorldView-2 satellite data (Data 1) and plot GPS boundaries collected by the smallholder (Data 2). This canopy proportion was then related to the basic intervention-level modelling of CO2e at different canopy proportions (Data 7).
FM2: Participatory remote sensing and CO2e quantification. This analysis was the same as Analysis 1 except that the dot-grid analysis was led by a local field technician with knowledge of the area and plots.
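The appendix does not spell out the dot-grid procedure, so the following is only a generic sketch of how a dot-grid canopy estimate works: a regular grid of dots is laid over the image, and the canopy proportion is the share of dots inside the plot boundary that fall on canopy. The pre-classified boolean inputs, the function name and the toy image are all hypothetical; in practice the classification of each dot may be done visually by the analyst.

```python
def dot_grid_canopy_fraction(canopy_mask, inside_plot, spacing=2):
    """Fraction of grid dots inside the plot that fall on canopy.

    canopy_mask and inside_plot are equally sized 2D lists of booleans
    (hypothetical pre-classified inputs); spacing is the dot interval in pixels.
    """
    total = 0
    hits = 0
    for r in range(0, len(canopy_mask), spacing):
        for c in range(0, len(canopy_mask[r]), spacing):
            if inside_plot[r][c]:  # only count dots within the plot boundary
                total += 1
                if canopy_mask[r][c]:
                    hits += 1
    return hits / total if total else 0.0

# Toy 4x4 image: the whole image lies inside the plot, the left half is canopy.
canopy = [[True, True, False, False] for _ in range(4)]
plot = [[True] * 4 for _ in range(4)]
print(dot_grid_canopy_fraction(canopy, plot, spacing=1))
```

A denser grid (smaller spacing) gives a finer estimate at the cost of more dots to classify. The resulting proportion is the quantity that would then be related to the intervention-level CO2e modelling (Data 7).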
RS3: Basic CO2e modelling and quantification. Analysis 3 related the stocking density of trees with DBH > 5 cm (Data 5) to the basic intervention-level modelling of CO2e at different stocking densities (Data 7).
RS4: Intermediate CO2e modelling and quantification. An external specialist used plot-level data from the tree inventory (Data 5; location, tree stocking density, species and growth rates) and intervention-level data from the literature review (Data 4) for all other inputs. This data was input into SHAMBA to assess CO2e at the time of monitoring.
RS5: Advanced CO2e modelling and quantification. An external specialist used only plot-level data (Data 5 and Data 6). This data was input into SHAMBA to assess CO2e at the time of monitoring.
NUSAP protocol/questions.