Selection biases and spillovers from collective conservation incentives in the Peruvian Amazon

Payments for ecosystem services are becoming popular components in strategies to conserve ecosystems and biodiversity, but their effectiveness remains poorly documented. Here we present counterfactual-based evidence on the conservation outcomes of the pilot stage of Peru’s National Forest Conservation Program (NFCP). The NFCP provides direct payments to indigenous communities in the Amazon, conditional on avoided deforestation and the adoption of sustainable production systems. Using a spatially explicit quasi-experimental evaluation design, we show that the payment scheme has achieved only small conservation impacts in terms of avoided deforestation. Counter-intuitively, these materialized largely on land not enrolled for conservation, due to spillover effects. Conservation effects on contracted land were negligible because communities were not chosen according to high deforestation threats, and they self-enrolled low-pressure forest areas for conservation. Occasional unsanctioned contract non-compliance contributed to these outcomes. We highlight implications for the design and implementation of up-scaled national conservation programs. Methodologically, we demonstrate the important role of choosing the appropriate spatial scale in evaluating area-based conservation measures.


Introduction
Payments for ecosystem services (PES) are voluntary transactions between service users and providers, conditional on natural resource management rules that generate off-site services [1]. PES may be more direct and cost-effective than traditional conservation tools, such as integrated conservation and development projects (ICDP), and have thus become a popular policy instrument [2][3][4]. Existing PES schemes often target hydrological services, carbon sequestration, and landscape beauty [5]. Payments for reducing emissions from deforestation and forest degradation (REDD+), which together are the second largest source of emissions globally [6], could become an important climate change mitigation strategy [7]. The Paris Agreement encourages developing countries to implement results-based payments such as REDD+ to preserve forests and secure non-carbon co-benefits.
And yet, how effective are PES in practice? Many scholars have scrutinized the environmental and social outcomes of PES [8], but few counterfactual-based evaluations exist [9][10][11]. Early results suggested mixed evidence [12]; more research is needed to understand why outcomes differ across programs and sites [5,10,13,14]. Understanding the role of intervention contexts versus scheme design in determining conservation outcomes is an important research gap [12]. This study makes two contributions to address this gap. First, we focus on collective rather than individual PES contracts, designed to conserve community-owned forests, a common institutional arrangement in tropical forests. Second, we provide PES impact estimates at both community and sub-community scales to better account for intra-community spillover effects.
In addition, we contribute methodologically to the conservation impact evaluation literature by estimating effects at two different spatial scales, namely, at the scale of polygons of different sizes, defined by the boundaries of communities, and at the scale of grid cells of 225 ha each, located within the communities' polygons. Avelino et al [15] demonstrated a scale effect on impact estimates, resulting from the loss of heterogeneity and variation when moving to higher aggregation levels (i.e. spatial aggregation bias). Few forest conservation evaluations have taken this potential source of bias into account [16,17], and it thus deserves further scrutiny.
We estimate the environmental impact of a collective PES scheme in Peru, run by the National Forest Conservation Program (henceforth NFCP) in indigenous communities enrolled between 2011 and 2013, using remotely sensed deforestation data from 2001 to 2015 [18][19][20][21]. We use spatial matching techniques [22] to control for self-selection bias and post-matching regression analyses to eliminate unobserved time-invariant heterogeneity [23]. Our findings indicate statistically significant but small conservation effects. These accrue outside of self-enrolled community conservation areas, which we attribute to economic and behavioral mechanisms.

NFCP background
In 2012, 51% of total greenhouse gas emissions in Peru originated from deforestation in the Amazon [24], primarily driven by shifting agriculture [25], gold mining [26], and cash-crop plantations such as oil palm [27] and coca (Erythroxylum spp) [28]. Estimates of deforestation suggested an increasing trend [18], with an average of 160 000 ha per year between 2011 and 2016 [21]. As a contribution to climate change mitigation, the Government communicated to the United Nations Framework Convention on Climate Change a target of zero deforestation by 2021 [29]. In 2010, the Peruvian Ministry of Environment created the NFCP 'to contribute to the conservation of tropical forests and the generation of income for the most vulnerable, poor and marginalized peoples' [30] (author's translation). The NFCP seeks to: (i) map forestlands, (ii) promote sustainable production systems, and (iii) strengthen forest conservation capacities [30]. Given the government's lack of experience in paying cash to landholders for not deforesting, conditional 'projects' had to be implemented to provide local compensatory benefits, while also striving to 'green' local livelihoods. This collective PES-cum-ICDP intervention, intended to align conservation with poverty alleviation goals, was piloted in selected Amazon indigenous communities [31], which are among the poorest population groups in Peru [32]. From the approximately 1300 titled native communities [32] controlling roughly 12 million hectares of forests (figure 1), 50 communities were enrolled between 2011 and 2013 for the pilot phase (table 1). These communities were selected non-randomly, using criteria ranging from forest conditions to accessibility indices [33], which were applied at two spatial-administrative levels: first, at the province level (the second highest sub-national political unit in Peru), and second, at the community level.
The logic behind this approach was to first prioritize the provinces with the highest threats of deforestation and then to select communities within them. However, those criteria were ultimately not implemented consistently and transparently (SM-Targeting available online at stacks.iop.org/ERL/14/045004/mmedia), leaving room for discretionary targeting decisions. Together with the fact that communities voluntarily decide to participate (SM-Engagement), institutional selection created a de facto source of adverse selection bias [34]: as we show under Results, communities with historically higher deforestation were underrepresented in the NFCP.
The NFCP provides collective payments of 10.00 Peruvian Soles (1.00 Peruvian Sol ∼ USD 0.29 in 2015) per hectare of enrolled forest per year under five-year contracts, supplemented by technical assistance. The payment is publicly funded and is conditional upon (i) its spending on a collectively agreed investment plan to finance forest-friendly production (e.g. agroforestry, aquaculture, and small animal husbandry), community forest patrolling, and public services or infrastructure, and (ii) the maintenance of forest cover in 'conservation forest zones' (CFZ) that communities define themselves. This community self-selection of land constitutes a second source of adverse selection bias [34]. The remaining land, i.e. 'other use zones' (OUZ), is not subject to land use restrictions and typically contains homesteads, agricultural fields, and secondary as well as primary forest remnants (figure 2). Our empirical strategy seeks to measure the NFCP's impact during its five initial years (2011-2015) on deforestation in the community lands as a whole, and in both the CFZ (primary effect) and OUZ (spillovers). We use the term 'spillover' to denote that the NFCP's intervention to avoid deforestation, targeted at the enrolled CFZ, may also have indirect yet measurable impacts on the unenrolled subareas of the treated communities (OUZ).
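To make the payment rule concrete, the following minimal sketch computes the annual and full-contract transfer for a single community. The rate, exchange rate, and contract length come from the text; the 1500 ha CFZ is only an illustrative size (it equals the minimum CFZ polygon size mentioned under Empirical approach), and the function name is hypothetical.

```python
# Payment parameters stated in the text.
PEN_PER_HA_YEAR = 10.00    # Peruvian Soles per enrolled hectare per year
USD_PER_PEN_2015 = 0.29    # approximate 2015 exchange rate
CONTRACT_YEARS = 5


def annual_payment(cfz_ha: float) -> dict:
    """Annual collective payment for a CFZ of `cfz_ha` hectares (hypothetical helper)."""
    pen = cfz_ha * PEN_PER_HA_YEAR
    return {"pen": pen, "usd": pen * USD_PER_PEN_2015}


# Illustrative CFZ size only; not a figure for any specific community.
example = annual_payment(1500.0)
contract_total_pen = example["pen"] * CONTRACT_YEARS
```

Under these assumptions, a 1500 ha CFZ yields 15 000 Soles (roughly USD 4350) per year, or 75 000 Soles over a five-year contract, before any sanctions for non-compliance.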

Expected impact channels
The NFCP's main strategy follows a PES-cum-ICDP (payments combined with productive change) logic (see figure S1 available online at stacks.iop.org/ERL/14/045004/mmedia), assuming implicitly that capital and technical constraints prevent the adoption of sustainable land use systems [35]. Payments and assistance thus enable 'integrated projects' to provide income streams and compensate for the opportunity costs of avoiding deforestation within CFZ. Economic theory, however, suggests that communities will define areas with low opportunity costs as CFZ [36]: being largely unsuitable for agricultural use, and thus unthreatened, their formal 'protection' provides little if any reduction in deforestation. Moreover, income generation from projects requires access to markets and qualified technical assistance, the lack of which has often dampened the success of ICDP [35]. In addition, community forest patrolling could reduce land tenure insecurity and deforestation [37]. However, we expect little forest impact here, due to under-funding of this component (only 3-4 paid-for patrolling rounds annually). Similarly, social investments (e.g. improving school infrastructure) are in-kind payments with disputed linkages to deforestation [38], which we expect to have only a marginal influence on land-use decisions. We consider two rival explanations of conservation outcomes in CFZ and, by way of spillover effects, in OUZ. First, project implementation (e.g. tree-crop plantations, alternative livelihood investments) will directly increase labor demand in the short run, thus reducing labor available for traditional land-use activities and mitigating deforestation pressures. Second, awareness in participating communities of the NFCP's forest monitoring could produce short-run behavioral changes: in order to avoid upfront conflicts and please implementers, communities consciously curb deforestation, i.e. the so-called Hawthorne effect [39].
Data limitations prevent us from explicitly testing for these alternative impacts, but insights below from a dynamic treatment effect analysis provide some supporting evidence.

Data
We created two datasets, one for community polygons and one for cells. The first comprises 992 communities (50 treated and 942 non-treated), for which we could gather geographical, biophysical, and socioeconomic attributes (table S2 in the SM). The second only includes cells that partially or fully overlap with polygons representing community lands, including those in CFZ and OUZ (figure 2), and that present a deforestation risk larger than 1% (SM-Forest risk model). We developed a deforestation risk model because the distribution of observed annual deforestation is skewed towards zero; the model lets us focus only on cells with a risk of more than 1% of having been deforested between 2001 and 2010. We fit a logistic regression model, where the outcome is measured as the fraction of a cell deforested between 2001 and 2010, and covariates include: population, number of houses, area of coca plantations, number of population centers, community area, slope, precipitation, distance to protected areas, forest loss density in 2010, distance to population centers within communities, internal distance to the community's boundaries, share of forest area in 2010, spatially lagged biomass, and distance to deforestation outside communities in 2010. Using the fitted values of the model, we discard all cells with a fitted value smaller than or equal to 0.01. This trimming reduced the share of cells with no deforestation in any year between 2001 and 2015, among both all cells and treated cells, to only 11%. Only the trimmed cell dataset is used for impact evaluation.
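The trimming step can be sketched as follows. This is not the authors' code: it assumes a fractional logit fitted by Newton's method, uses two hypothetical standardized covariates, and draws a synthetic outcome; only the 0.01 cut-off is taken from the text.

```python
import numpy as np


def fit_fractional_logit(X, y, n_iter=50, ridge=1e-8):
    """Fit a logit model to a fractional outcome y in [0, 1] by Newton's method."""
    Xd = np.column_stack([np.ones(len(X)), X])           # add an intercept
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (y - p)                             # quasi-likelihood score
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd + ridge * np.eye(Xd.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def predict_risk(beta, X):
    """Fitted deforestation risk for each cell."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))


# Synthetic data: 2000 cells, two hypothetical standardized covariates.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
true_p = 1.0 / (1.0 + np.exp(-(-4.0 + 1.0 * X[:, 0] - 0.5 * X[:, 1])))
y = rng.binomial(2500, true_p) / 2500      # fraction of a cell's pixels deforested

beta = fit_fractional_logit(X, y)
risk = predict_risk(beta, X)
keep = risk > 0.01                          # discard cells with fitted risk <= 0.01
```

Trimming on fitted risk rather than on observed deforestation keeps low-risk treated and untreated cells out of the comparison on the same pre-treatment basis.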
We define the outcome variable as the total area (ha) of annual gross deforestation within community lands for both units of analysis. We used a deforestation dataset, covering 15 years (2001-2015), within the Peruvian Amazon [18][19][20][21], and define deforestation as the complete removal of forest cover from a Landsat pixel [18].

Empirical approach
We use a quasi-experimental approach that combines matching and double difference regression models to estimate the average treatment effect on the treated (ATT). As a pre-processing approach, matching reduces the selection bias due to the non-random selection of enrolled communities and CFZ, an approach that is increasingly used to measure conservation outcomes [36][40][41][42][43][44][45], including in the context of the Peruvian Amazon [32,45,46]. We measure effects at different spatial scales in both enrolled and non-enrolled community lands (SM-Treated units). We start by comparing outcomes at the scale of decision units using spatial polygons and cells (225 ha each) of treated and untreated communities (SM-Cells and table S1). We chose an aggregated area of 225 ha because the minimum sizes of (i) treated community polygons and (ii) CFZ-polygons are 2700 ha and 1500 ha, respectively. Thus, there are at least approximately 12 and 6 units covering each of these polygons, respectively. We believe that aggregating the outcome variable from 0.09 to 225 ha improves the representation of the land use decision unit and reduces potential bias arising from spatial autocorrelation [15,47].
This approach, however, aggregates over intra-community effects, given that payments restrict deforestation only in CFZ. Avoided CFZ deforestation could be partially outweighed by added OUZ deforestation (negative spillover, i.e. leakage). Alternatively, participants could shift resources to fulfill the NFCP's rules and thereby reduce deforestation in OUZ (positive spillover). Thus we also estimate the ATT in CFZ and OUZ cells separately. For this, we model at the cell level which areas of non-participating communities would most likely have been selected as CFZ and OUZ, from which to draw a control group (SM-Modeling untreated CFZ and OUZ).
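The cell geometry can be checked with a few lines of arithmetic. The sketch below assumes the 0.09 ha unit is a 30 m × 30 m Landsat pixel and that cells are square blocks of pixels; the toy raster and the helper name are hypothetical.

```python
import numpy as np

PIXEL_SIDE_M = 30        # assumed Landsat pixel side; 30 m x 30 m = 0.09 ha
CELL_HA = 225.0          # cell size used in the text

pixel_ha = PIXEL_SIDE_M ** 2 / 10_000                   # 0.09 ha
pixels_per_cell = round(CELL_HA / pixel_ha)             # 2500 pixels
pixels_per_side = int(round(pixels_per_cell ** 0.5))    # 50 pixels, i.e. a 1.5 km side


def aggregate_to_cells(loss_pixels, block=pixels_per_side, pixel_area_ha=pixel_ha):
    """Sum a binary pixel raster (1 = deforested) into square cells, in hectares."""
    rows, cols = loss_pixels.shape
    if rows % block or cols % block:
        raise ValueError("raster must tile evenly into cells")
    blocks = loss_pixels.reshape(rows // block, block, cols // block, block)
    return blocks.sum(axis=(1, 3)) * pixel_area_ha


# Toy raster covering 2 x 2 cells, with 100 deforested pixels in the top-left cell.
raster = np.zeros((100, 100), dtype=int)
raster[:10, :10] = 1
cell_ha = aggregate_to_cells(raster)

min_cells_community = 2700 / CELL_HA    # 12 cells in the smallest treated community
min_cells_cfz = 1500 / CELL_HA          # about 6.7, i.e. at least 6 whole cells
```

Each 225 ha cell therefore pools 2500 pixel-level observations, which is what smooths out pixel-to-pixel spatial autocorrelation in the outcome.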
We use one-to-one nearest neighbor matching with replacement to find a control group from the pool of untreated communities and cells (SM-Matching), using the genetic algorithm [48] of the R Match function for polygons and the Mahalanobis distance for cells, with exact matching on the Department identifier in both cases. We use a set of covariates from a pre-treatment data set of geo-biophysical, land-use and land-cover, infrastructure, and socioeconomic variables (table S2 and Covariates in the SM). These covariates include: (i) biophysical: elevation† (m), slope† (°), above ground live woody biomass†† (Mg/ha), temperature† (°C), precipitation† (mm), distance to rivers† (m); (ii) infrastructure: distance to roads† (m), accessibility†† (index), distance to district capitals† (m), distance to population centers† (m); (iii) land use/land cover: forest cover area in 2010†† (%), deforestation density in 2010† (ha km⁻²), community's total area††† (ha), distance to deforestation outside communities††† (m), distance to protected areas†† (m), internal distance to community's boundary† (m), deforestation risk†; (iv) spatial lags††† for: deforestation (ha), forest cover area in 2010 (%), slope (°), above ground live woody biomass (Mg/ha), elevation (m); (v) socioeconomic: density of coca plantations†† (ha km⁻²), years passed since communal land was titled††† (years), population†† (persons), number of houses††† (houses), access to drinking water††† (%), access to electricity† (%), population centers within a community†† (centers), per capita income†† (PEN), human development index† (index), total poverty†† (%), and extreme poverty††† (%). These covariates are likely to affect both the selection of participating communities and deforestation, but are not affected by the treatment [53].
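The post-matching outcome models for communities (equation (1)) and cells (equation (2)) are panel regressions. One form consistent with the definitions that follow is sketched here; the additive layout and the pairing of δ, γ, and λ with particular terms are assumptions rather than a verbatim specification.

```latex
\begin{align}
Y_{ctd}  &= \beta D_{ct}  + (X'_{c}\, t)\,\delta + \gamma\,\mathit{Outpres}_{it}
            + \varphi_t + \alpha_c + \omega_d + u_{cdt} \tag{1}\\
Y_{ictd} &= \beta D_{ict} + (X'_{i}\, t)\,\delta + \gamma\,\mathit{Outpres}_{it}
            + WZ'\lambda + \varphi_t + \alpha_i + \omega_d + u_{icdt} \tag{2}
\end{align}
```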
In equations (1) and (2), $Y_{ctd}$ is the annual deforestation in each community $c$, in year $t$, located in department $d$ (the highest subnational political-administrative unit in Peru). $D_{ct}$ is the treatment indicator at the community level, turning positive in year $t$ when a community is enrolled (equation (1)). Enrolled communities receive a treatment indicator larger than 0 and up to 1 in the year of enrollment, depending on the number of treated months during that year. Thereafter, the indicator is 1 if the community is still treated. Similarly, $Y_{ictd}$ is the annual deforestation in each cell $i$, located in community $c$, in year $t$, in Department $d$. $D_{ict}$ indicates treatment at the cell level using the treated share of cell $i$ in year $t$ (equation (2)). Cells within communities have a share of one, and cells at the margin of community borders have a share between 0 and 1. $X'$ is the vector of time-invariant pre-treatment characteristics, such as slope. These variables can influence the deforestation trend, and are therefore interacted with the time indicator $t$. $\mathit{Outpres}_{it}$ is the average distance from community $i$ in year $t$ to deforestation patches (>1 ha) located outside communities, and represents external deforestation pressure. In equation (2), $WZ'$ represents a vector of spatially lagged covariates weighted by a standardized queen contiguity matrix ($W$) [22]. $\delta$, $\gamma$, and $\lambda$ are coefficients to be estimated.
Year fixed effects, denoted by $\varphi_t$, control for yearly factors influencing all units of analysis equally, such as policy changes to the Peruvian forestry rules [32]. Individual fixed effects ($\alpha_c$ or $\alpha_i$) represent the individual unobserved time-invariant heterogeneity (e.g. soil quality). $\omega_d$ is introduced to capture department-specific forest conservation efforts. Finally, $u_{cdt}$ or $u_{icdt}$ denote the idiosyncratic errors [54].
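The fractional treatment indicator can be illustrated with a small helper. The function name, the convention that the enrollment month itself counts as treated, and the handling of exits are assumptions made for this sketch.

```python
def treatment_intensity(year, enroll_year, enroll_month, exit_year=None):
    """Share of `year` during which a community is enrolled, between 0 and 1.

    Assumptions: the enrollment month counts as a treated month, and a community
    that leaves the program is treated through `exit_year` and untreated afterwards.
    """
    if year < enroll_year:
        return 0.0
    if exit_year is not None and year > exit_year:
        return 0.0
    if year == enroll_year:
        return (12 - enroll_month + 1) / 12
    return 1.0


# A hypothetical community enrolled in October 2012: 3 of 12 months treated.
path = [treatment_intensity(y, 2012, 10) for y in range(2011, 2016)]
```

For this hypothetical community the indicator path over 2011-2015 is 0, 0.25, 1, 1, 1, so the enrollment year contributes only a quarter of a full treatment year.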
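The spatially lagged covariates in equation (2) can be sketched for a regular grid of cells. The example assumes that "standardized" means row-standardized, so each lag is the average over a cell's queen neighbors (cells sharing an edge or a corner); the grid size and covariate values are hypothetical.

```python
import numpy as np


def queen_weights(n_rows, n_cols):
    """Row-standardized queen contiguity matrix for a regular grid (row-major order)."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        W[i, rr * n_cols + cc] = 1.0
    row_sums = W.sum(axis=1, keepdims=True)
    # Divide each row by its neighbor count; isolated cells keep a zero row.
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)


W = queen_weights(3, 3)
slope = np.arange(9, dtype=float)    # hypothetical covariate values for cells 0..8
lagged_slope = W @ slope             # average of each cell's neighbors
```

In this 3 × 3 example the center cell has eight neighbors and a lagged value of 4.0, while a corner cell averages over only three neighbors.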
Taking first differences (FD) of equations (1) and (2) eliminates all time-invariant unobserved heterogeneity ($\alpha_c$ and $\alpha_i$), which could have biased our estimates (SM-Specification Tests). The coefficient $\beta$ represents the ATT on annual deforestation changes for all years after treatment. For communities, we estimate the FD of equation (1), denoted equation (3); for cells, we estimate the FD of equation (2), denoted equation (4).
By comparing fixed and random effects models using the Hausman test [55], we rejected the null hypothesis (p<0.01) in all cases, thus indicating that a fixed effects model was more appropriate. Given that treatment assignment occurs at a higher level than the cell, namely, at the community level, we cluster standard errors of $\beta$ at the level of the community polygon to avoid inconsistent variance-covariance matrices due to heteroscedasticity and autocorrelation in the error terms [56,57] (see SM-Specification tests).
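A first-difference estimator with cluster-robust standard errors can be sketched in plain NumPy. This is an illustration on synthetic data, not the authors' specification: it omits the covariate and spatial-lag terms, applies no small-sample correction to the sandwich estimator, and all numbers (200 units, staggered enrollment, a true effect of -0.4) are made up.

```python
import numpy as np


def first_difference_att(y, d, cluster_ids):
    """OLS on first differences with period dummies and cluster-robust SEs.

    y, d: arrays of shape (n_units, n_years); cluster_ids: shape (n_units,).
    Returns (beta_hat, clustered_se) for the treatment coefficient.
    """
    dy = np.diff(y, axis=1)            # differencing removes unit fixed effects
    dd = np.diff(d, axis=1)
    n_units, n_t = dy.shape
    period_dummies = np.tile(np.eye(n_t), (n_units, 1))    # differenced year effects
    X = np.column_stack([dd.ravel(), period_dummies])
    Y = dy.ravel()
    g = np.repeat(cluster_ids, n_t)
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ Y
    resid = Y - X @ coef
    meat = np.zeros((X.shape[1], X.shape[1]))
    for cid in np.unique(g):
        score = X[g == cid].T @ resid[g == cid]             # summed score per cluster
        meat += np.outer(score, score)
    V = XtX_inv @ meat @ XtX_inv                            # sandwich estimator
    return coef[0], np.sqrt(V[0, 0])


rng = np.random.default_rng(1)
n_units, n_years = 200, 15                                  # e.g. years 2001-2015
alpha = rng.normal(5.0, 2.0, size=(n_units, 1))             # unit fixed effects
phi = np.linspace(0.0, 1.5, n_years)                        # common year effects
d = np.zeros((n_units, n_years))
d[:25, 10:] = 1.0                                           # enrolled in year index 10
d[25:50, 12:] = 1.0                                         # enrolled in year index 12
true_beta = -0.4
y = alpha + phi + true_beta * d + rng.normal(0.0, 0.1, size=(n_units, n_years))

beta_hat, se = first_difference_att(y, d, cluster_ids=np.arange(n_units))
```

Here each unit is its own cluster, as for community polygons; for the cell-level model, `cluster_ids` would instead carry the identifier of the community containing each cell.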
In addition to the overall effect across 2011-2015, we also estimate the effect of the NFCP over the years after enrollment by replacing the treatment variable in equations (3) and (4) with five new treatment variables, each one representing year zero through year four after enrollment. The estimated coefficients for each of these variables are interpreted as the average effect of the NFCP after t years of enrollment [58].
The models used are presented in SM-Effect over time.
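The five time-since-enrollment treatment variables can be built as a set of event-time indicators. The sketch below ignores partial-year treatment intensity and differencing, and the function name is hypothetical.

```python
import numpy as np


def event_time_dummies(years, enroll_year, max_lag=4):
    """One column per year since enrollment (0..max_lag).

    Entry (t, k) is 1 when years[t] - enroll_year == k. Never-enrolled units
    (enroll_year=None) get all zeros.
    """
    years = np.asarray(years)
    if enroll_year is None:
        return np.zeros((len(years), max_lag + 1))
    lags = years - enroll_year
    return np.column_stack([(lags == k).astype(float) for k in range(max_lag + 1)])


years = np.arange(2010, 2016)
D = event_time_dummies(years, enroll_year=2012)    # hypothetical enrollment year
```

For a community enrolled in 2012, the row for 2012 activates the year-zero indicator and the row for 2015 activates the year-three indicator; the coefficient on each column is then read as the effect after that many years of enrollment.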

Matching
After matching, balance between treated and non-treated groups improved in almost all covariates for all levels of analysis (tables S3-S6 in SM). We use the normalized difference, the mean difference in the standardized empirical-QQ plot (as proposed by [59]), and the mean difference between treated and control groups to assess covariate balance after matching. For a few time-invariant covariates, normalized differences remained above the rule of thumb of 35% even after matching [60]. This potential source of bias is addressed by using the fixed effects model to estimate treatment effects [61].
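The Mahalanobis variant of the matching step and the normalized-difference balance check can be sketched as follows. The study itself relies on the R Match function; this NumPy stand-in does not implement the genetic algorithm, uses a pooled covariance matrix, and adopts one common definition of the normalized difference, all of which are assumptions. The toy data are hypothetical.

```python
import numpy as np


def mahalanobis_match(X_t, X_c, dept_t, dept_c):
    """One-to-one nearest-neighbor matching with replacement, exact on department.

    Returns, for each treated unit, the index of its matched control (-1 if the
    department has no control units).
    """
    X_t, X_c = np.asarray(X_t, float), np.asarray(X_c, float)
    dept_t, dept_c = np.asarray(dept_t), np.asarray(dept_c)
    pooled_cov = np.atleast_2d(np.cov(np.vstack([X_t, X_c]).T))
    S_inv = np.linalg.pinv(pooled_cov)
    matches = []
    for x, dept in zip(X_t, dept_t):
        candidates = np.flatnonzero(dept_c == dept)
        if candidates.size == 0:
            matches.append(-1)
            continue
        diff = X_c[candidates] - x
        dist2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)   # squared distances
        matches.append(int(candidates[np.argmin(dist2)]))
    return np.array(matches)


def normalized_difference(x_t, x_c):
    """Difference in means scaled by the root of the average group variance."""
    x_t, x_c = np.asarray(x_t, float), np.asarray(x_c, float)
    return (x_t.mean() - x_c.mean()) / np.sqrt((x_t.var(ddof=1) + x_c.var(ddof=1)) / 2)


# Toy example: two treated units, four controls, two covariates, two departments.
X_t = [[1.0, 2.0], [3.0, 1.0]]
dept_t = ["A", "B"]
X_c = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [1.0, 2.0]]
dept_c = ["A", "A", "A", "B"]
matched = mahalanobis_match(X_t, X_c, dept_t, dept_c)
nd = normalized_difference([2.0, 4.0], [0.0, 2.0])
```

The second treated unit is matched to the only control in department B even though an identical control exists in department A, which is what exact matching on the Department identifier enforces.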

Main results
Annual averages of deforestation between 2001 and 2015 of all treated communities, non-treated communities, and the matched control group increase over time (figure 3). The NFCP predominantly selected communities with relatively lower deforestation threats. Figure S2 in the SM confirms this by comparing three alternative measures of pre-treatment deforestation levels among the 50 treated communities and the top-50 non-treated communities. Trends were similar at the cell level, where treated cells, CFZ-cells and OUZ-cells, and their corresponding matched control groups, exhibited increased deforestation (figure 3). Non-zero, but low levels of deforestation in the CFZ of treated communities (figure S3 in the SM) also suggest mild levels of non-compliance in most cases. In eight communities, however, the percentage of the deforested area relative to the area of the CFZ exceeds the threshold of 0.3% [62], above which a community is supposed to be expelled from the NFCP. Nonetheless, of these eight communities, only two were expelled in 2014; the rest remained enrolled. Another eight communities were also expelled (table 1), but for reasons unrelated to deforestation in their CFZ (e.g. non-compliance with their investment plan).
In addition, we observe that there are substantial differences between the CFZ and the OUZ of participating communities regarding characteristics that could affect both the selection of an area as a CFZ and deforestation outcomes. These include: slope, elevation, distance to rivers, distance to population centers within the community, deforestation previous to the start of the NFCP (2010) and deforestation risk (table 2).
At the community scale, the ATT of the NFCP is statistically significant and negative when estimated on polygons (table 3, column 1; equation (3)), but insignificant when estimated on cells (table 3, column 2; equation (4)). However, our results in columns 1 and 2 might have been affected by the independent effects that occurred within CFZ and OUZ, leading to a relatively less precise ATT [15]. Therefore, in columns 3 and 4 we present the independent effects of the NFCP within CFZ and OUZ, respectively. Assessing intra-community effects (equation (4)), we only find statistically significant and negative effects in OUZ, not in CFZ (table 3, column 5). This implies that the NFCP may have avoided deforestation within OUZ-cells by an average of 0.4 ± 0.2 ha per year (mean ± SE) in every subsequent year after treatment. This estimate represents a total of 557 ha (considering SE: 59-1056 ha) of avoided deforestation between 2011 and 2015, corresponding to a 5.8% reduction (0.61%-11.1%).
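The reported totals and percentages imply a counterfactual level of OUZ deforestation, which the following back-of-the-envelope check recovers. It assumes that each percentage is avoided deforestation divided by the no-program counterfactual; that reading is an interpretation, not a definition given in the text.

```python
# Figures reported in the text for OUZ cells, 2011-2015.
avoided_ha = 557.0
avoided_range_ha = (59.0, 1056.0)
reduction_share = 0.058
reduction_range = (0.0061, 0.111)

# Assumption: reduction share = avoided / counterfactual deforestation.
implied_counterfactual_ha = avoided_ha / reduction_share
implied_bounds_ha = tuple(a / r for a, r in zip(avoided_range_ha, reduction_range))
```

All three ratios land between roughly 9500 and 9700 ha, so the point estimate and its bounds are mutually consistent with a counterfactual of about 9600 ha of OUZ deforestation over the five years.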

Conservation effects over time
In addition to the overall effect for each scale of analysis, we also explore how the effect evolves over time (figure 4). When analyzing community effects, we find statistically significant and negative ATTs only in the first year after enrollment at the polygon (−8.5 ± 3.5 ha) and cell scales (−0.27 ± 0.13 ha). When analyzing intra-community effects at the cell scale, we only find significant and negative ATTs in the second (−0.21 ± 0.1 ha) and first years after enrollment (−0.45 ± 0.17 ha) for CFZ and OUZ cells, respectively.
These results do not change our previous conclusions regarding overall effects [58], but provide additional clues to understand the potential impact channels. The fact that significant effects throughout are only present in the initial years, dissipating thereafter, suggests that the NFCP might have induced a behavioral change, but probably only for a short period.

Discussion
We provide new evidence on the effectiveness of a collective PES-cum-ICDP scheme in indigenous communities in the Peruvian Amazon. We assess impacts at two different spatial scales, using a quasi-experimental approach with a 15-year panel of deforestation. We show that the use of polygons or cells affects the significance of the ATT. This effect has only recently received attention in conservation policy evaluation and appears to stem from well-known challenges in geospatial statistics [15]. Avelino et al [15] found that conservation effects in Mexico increase and become less precise with aggregation and that large units of analysis could generate biased estimates when treatment is coarsely measured. Hence, we believe that measuring the effect of the NFCP using highly aggregated units of analysis, such as polygons (mean = 12 000 ha), which are defined as treated mainly with a binary variable, could bias the ATT. Consequently, it is key that we also estimate the NFCP impact at the cell level for the entire community, so as to have a 'second opinion' at a lower level of aggregation. In doing so, we found no statistically significant result at the cell level and thus conservatively conclude that there is no robust evidence for NFCP impact if the entire community is considered. Further analyses including a broader spectrum of spatial scales and larger numbers of treated communities may be needed to explore which spatial scale is more suitable.
To explain our finding, we note that deforestation in participating communities (as well as within their separate CFZ and OUZ) has not been halted after the start of the NFCP. This indicates partial non-compliance and failures in the NFCP's monitoring and enforcement capacity. Such deficient enforcement of conditionality is not uncommon in PES schemes around the world [63]. Given that communities' conservation agreements with the NFCP only include a subset of community land, we also explored impacts in CFZ and OUZ separately. Counterintuitively, we only find a small but significant conservation effect in the non-contracted OUZ.
We attribute the program's lack of impact in CFZ to adverse selection at two levels: first, the targeting of communities with already low deforestation rates (i.e. an adverse administrative selection), and second, self-selection bias that allowed communities to enroll largely unthreatened forests. This is a problematic issue in many conservation programs [12,44,64] and can be avoided by adopting appropriate targeting criteria [65] and enrolling total community area [66].
Even if the NFCP had adopted and appropriately implemented targeting criteria, we do not expect that this would have per se led to a much better outcome. Ultimately, adverse self-selection bias at the community level is likely to be the main reason for the low effectiveness of the NFCP.
Why, then, would we find a negative (pro-conservation) ATT in non-contract areas? We point to two potential causal mechanisms that may complement each other (figure S1 in SM). First, participating community members know that the NFCP wants to see forest conservation within the whole community area. When a program has recently started, this might produce a so-called 'honeymoon' or 'Hawthorne' effect [39]: communities adopt short-lived conservation measures to honor the goals of their contract partners, but this effect dissipates over time. Second, in the short run, increased labor demand for project implementation may have mitigated recurrent deforestation pressures, e.g. the opening up of new agricultural fields. Most communities (N=36) invested the bulk of their received payments in adopting agroforestry on abandoned lands, usually a labor-intensive task [67,68] implemented in their OUZ. If deforestation was constrained temporarily through this mechanism, it was a direct result of NFCP transfers and could thus be labelled a positive economic spillover effect.
To address the above-mentioned sources of bias in program design, the NFCP should adapt and test additional selection criteria, giving more weight to the targeted enrollment of threatened forests at the community level. Otherwise, adverse selection biases will continue to jeopardize conservation outcomes [65]. Specifically, we propose the following measures for an up-scaled program design: (i) pre-target communities with higher deforestation threats, (ii) offer voluntary PES contracts that cover the whole community area, and (iii) ensure the conditionality of payments.
Earlier impact assessment work suggests that the trade-off between boosting additionality of enrolling the highest threatened communities and the opportunity costs of implementing the NFCP in such communities is manageable [31]. However, some high-threat communities may decline participation due to negative (real or perceived) welfare effects [34], so future research should also explore motivations of participation, adopt both monetary and non-monetary approaches to cost-benefit analyses [69,70], and consider program implementation costs [63].
Notably, a PES scheme that effectively curbs forest loss in indigenous communities could affect traditional productive and cultural activities or jeopardize food security [71]. We thus stress that participation in the scheme must remain genuinely voluntary and emphasize that redesigning the scheme as suggested above does not interfere with use and access rights. It merely ensures that communities are being flexibly compensated according to how much deforestation they are able and willing to avoid.
In closing, we recognize that, in addition to the small conservation effects we found, the NFCP may have delivered other important benefits to recipient communities, e.g. in terms of social services and economic development that may justify the program's average annual budget of 3.9 million USD. Many public services remain precarious in Amazon indigenous communities [72]. However, from a conservation point of view, we point to a large potential for boosting impacts through improved design and implementation.

Acknowledgments
We thank the Robert Bosch Foundation, the German Academic Exchange Service (DAAD), the Federal Ministry of Education and Research (BMBF), and Norad for financial support; the GIZ-CBC project for financial and technical support; the NFCP for logistics support and provision of databases; the UNODC for providing data on coca cultivation areas; A Calderón for research assistance; and J Miranda, J Schielein, and H Rosa for helpful suggestions. All tabular datasets and codes for matching and regression analyses are available from the corresponding author upon request.