Evaluating the impact of private land conservation with synthetic control design

Programs to protect biodiversity on private land are increasingly being used worldwide. To understand the efficacy of such programs, it is important to determine their impact: the difference between the program's outcome and what would have happened without the program. Typically, these programs are evaluated by estimating the average program‐level impact, which readily allows comparisons between programs or regions, but masks important heterogeneity in impact across the individual conservation interventions. We used synthetic control design, statistical matching, and time‐series data to estimate the impact of individual protected areas over time and combined individual‐level impacts to estimate program‐level impact with a meta‐analytic approach. We applied the method to private protected areas governed by conservation covenants (legally binding on‐title agreements to protect biodiversity) in the Goldfields region of Victoria, Australia using woody vegetation cover as our outcome variable. We compared our results with traditional approaches to estimating program‐level impact based on a subset of covenants that were the same age. Our results showed an overall program‐level impact of a 0.3–0.8% increase in woody vegetation cover per year. However, there was significant heterogeneity in the temporal pattern of impact for individual covenants, ranging from −4 to +7% change in woody vegetation cover per year. Results of our approach were consistent with results based on traditional approaches to estimating program‐level impact. Our study provides a transparent and robust workflow to estimate individual and program‐level impacts of private protected areas.


INTRODUCTION
Significant losses of biodiversity occur on private land, which covers a large portion of terrestrial ecosystems and contains substantial amounts of threatened biodiversity (Stolton et al., 2014).To address this, private protected areas (PPAs) are gaining prominence as a conservation instrument (Fitzsimons, 2015;Mitchell et al., 2018) and are being used in conjunction with government-owned protected areas to meet national protected area targets.The International Union for the Conservation of Nature defines PPAs as protected areas under private governance, where that governance is controlled by entities, for example, individuals, nongovernmental organizations, private companies, or research entities (Mitchell et al., 2018).As of 2018, over 30 million ha of PPAs have been established worldwide (Palfrey et al., 2022).Given the growing importance of PPAs in protecting biodiversity, it is crucial to understand the impact they are having on reducing biodiversity loss.Impact is measured as the difference between observed biodiversity outcomes with protection and the counterfactual outcomes.Evidence of impact informs policy and practice, improves the design and outcomes of PPAs, holds stakeholders accountable, and allows for effective use of resources.The small number of studies examining the impact of PPAs tend to shows they reduce vegetation loss in, for example, South Africa (Shumba et al., 2020) and the United States (Nolte et al., 2019).
Typically, conservation programs, including protected areas, are evaluated by estimating their average impact (Ribas et al., 2020).The average impact is useful because it generalizes the overall impact across the population of PPAs, making it easy to communicate to stakeholders, and it is less susceptible to bias due to averaging the effects of individual differences within the units in the 2 comparison groups.However, average impact masks important heterogeneity in the response of the individual units to protection, which could be due to the differences in the PPA design (e.g., with and without fencing), landholder compliance to the policy and management activities (e.g., restoring ecosystems, weed control), or the location (e.g., established in areas of high background clearing).In these situations, evaluating impact at the individual unit level is useful because it helps identify units that are performing well and those that need improvement.This helps stakeholders understand differences and adjust their management accordingly, including developing tailored management plans and making informed decisions about resource allocation for PPAs.Sometimes, it may be important to estimate impact at the individual level, such as providing evidence for biodiversity or carbon credits assigned to landholders.Thus, there are a range of benefits to evaluating the impact of PPAs at the program and individual levels.
To understand the effectiveness of PPAs at a program or an individual level, a formal process of impact evaluation can be undertaken.Under a counterfactual framing, impact of PPAs can be estimated by comparing the observed outcomes of the PPAs with a hypothetical scenario estimating what the outcome would have been without protection (Morgan & Winship, 2014;Rubin, 2005).This counterfactual is estimated by finding unprotected locations (referred to as control sites or untreated units) that are as similar as possible to the PPAs (treated units) in terms of relevant characteristics that also influence the outcome (Stuart, 2010).However, the location of PPAs is driven by a range of factors; conservation organizations target areas with conservation value, whereas landholders have a range of motivations to implement PPAs on their properties, from being sympathetic to conservation to being driven by financial incentives (Mitchell et al., 2018;Selinske et al., 2015).When these criteria create systematic differences between the PPAs and control locations, and if they also affect the outcome, they are referred to as confounders.Confounders distort the measure of impact attributed to protection.Identifying and controlling variables that cause confounding is a challenging task and various methods have been developed to address this issue (Ferraro & Hanauer, 2014;VanderWeele, 2019).In conservation science, there has been significant progress in the use of a range of statistical approaches for reducing confounding effects, notably the conventional approaches of matching and difference-in-difference (DID) methods (Baylis et al., 2016;Ferraro & Hanauer, 2014), which are not suitable for estimating individual-level impact (i.e., for individual PPAs) because they were designed to estimate impact at the program level.There are also ways to control for confounding effects that could benefit conservation evaluations.Causal inference science is rapidly advancing with cutting-edge designs that use statistics, machine learning, and predictive modeling techniques to calculate average, conditional average (Athey & Wager, 2019), and individual-level impact estimates (Bolton et al., 2022), while offering greater potential for dealing with the confounding effects (Morgan & Winship, 2014;Sharma & Kiciman, 2020).One such powerful approach that is useful for estimating the impact of individual units (when time-series data are available) is synthetic control design.In this approach the weighted combination of control units (the synthetic control) provides a better comparison for the treated units compared with any single control unit alone (Abadie & Gardeazabal, 2003).Over the years, other data-driven modeling frameworks have been developed to construct synthetic controls (Abadie & Cattaneo, 2021), including time-series analysis (Brodersen et al., 2015;Xu, 2017), factor models, and matrix completion methods (Athey et al., 2021).
Synthetic control design uses the preintervention time series of the outcome variable, which can capture the influence of time-varying confounders and thus reduce bias (Abadie & Cattaneo, 2021).The assumption is that units with similar observed and unobserved determinants of the outcome variable will produce similar trajectories of the outcome variable over time.This is particularly advantageous in impact evaluation in conservation, where sociopolitical and ecological factors interact in a myriad of ways to drive the outcomes (Adams et al., 2019), and many of these factors are difficult to capture in the model.For example, with PPAs, a landholder's attitudes and behaviors can be an important variable, but gathering this data is difficult and expensive, so it is often an unobserved confounder-a potential covariate that cannot be factored in the model.However, a similar preintervention trajectory of vegetation cover for the PPAs and their synthetic control may indicate similar landholder behavior.Although it is difficult to verify this assumption, it is relatively more reasonable to assume this in synthetic control designs because they use a weighted combination of trajectories to generate a trajectory that is nearly identical to that of the intervention unit, as opposed to DID designs, which are based on the assumption that trajectories are identical (i.e., the parallel trend assumption).Additionally, because synthetic control designs require the use of time-series data, they inherently capture the temporal trends, which provides the potential to understand when and for how long the effects last.Temporal trends can also be used to analyze the change in the impact coinciding with policy or climatic changes.For example, they can be used to determine when, over the course of their establishment, PPAs had an impact.
We developed a flexible and transparent workflow in which synthetic control design is used to evaluate the program and the unit-level impact of conservation covenants, a type of PPA, on changes in woody vegetation cover in the Goldfields region of Victoria, Australia.Woody vegetation cover is used as the outcome variable because it is readily available as time-series data and is commonly used as a surrogate for biodiversity (Grantham et al., 2010).Conservation covenants are legally binding perpetual agreements, similar to conservation easements in the United States, that impose restrictions on land use to protect biodiversity, including reducing habitat loss and degradation and supporting active land management activities, such as restoration and revegetation (Fitzsimons & Wescott, 2001).
The Goldfields region contains covenants that are some of the oldest PPAs in Australia (up to 30 years old).Over 90% of these covenants were voluntarily implemented by landholders; however, it is likely there is some variation in the approach landholders take in managing their covenants (D.Robinson, personal communication).Therefore, they can be considered a class of intervention rather than a single homogeneous intervention.The use of synthetic control design is an appropriate method to analyze their impact due to this variability.We constructed a synthetic control unit with untreated units that we matched to each covenant based on a set of covariates and the preintervention relationship between their respective time series of woody vegetation cover change.We used the synthetic control design to explore how the impact changed over time from when each covenant was implemented.We also compared our results with more traditional approaches used to measure average impact.

Study area
The Goldfields region north of Melbourne in the state of Victoria, Australia (Figure 1a).Victoria has one of the highest levels of land conversion among Australian states; approximately 66% of native vegetation has been converted since European colonization (Department of Environment & Primary Industries, 2013).The Goldfields region is among the most heavily modified areas in the state due to forest clearing (mainly box ironbark) during the gold rush years of 1851-1870 (Department of Environment & Primary Industries, 2013).However, the extent of woody vegetation is now increasing in the Goldfields (Geddes et al., 2011;Lunt et al., 2010).Our previous analyses of the woody vegetation cover data for the region show an increase in woody vegetation cover on private land parcels in the region (Figure 1b).Extensive areas of cleared land have regenerated naturally.In some areas the rate of regeneration is much faster than intentional plantings or where fencing schemes have been implemented (Geddes et al., 2011).These changes have been attributed to a range of factors, including changing land use, from agricultural to rural residential development, and changes in disturbance regimes (Lunt et al., 2010).
The Goldfields region has among the oldest and highest densities of covenants in Australia, and the covenanting trend is increasing (Figure 1c).As of 2020 when we undertook these analyses, there were 204 covenants in the region.We restricted our analyses to covenants established after 2000 so that there would be sufficient preintervention time series of woody vege- tation cover to estimate counterfactuals via the synthetic control method.We also removed covenants <5 years old because sufficient time may not have elapsed for an effect to be detectable.We also removed covenants (and potential control sites) that had over 10% of their woody vegetation burned by wildfire during the study period (Appendix S1).We removed fire-affected parcels because it is difficult to model the dynamic nature of the response of vegetation to fire.In total we analyzed 149 covenants.Data on private land conservation covenants were supplied by Trust for Nature, a statutory body that undertakes private land conservation in Victoria (Trust for Nature, 2022).

Modeling approach
The synthetic control designs we implemented require timeseries data of the outcome variable for the treated and the untreated units.The time-series data used for the outcome variable in our analyses depicted the annual presence of woody vegetation at yearly intervals from 1988 to 2020.These data were derived from the National Forest and Sparse Vegetation Dataset, which contains the Landsat satellite imagery from 1988 to 2020 at 25-m resolution (Department of Industry, 2020) (Appendix S1).Covenants may cover all or a part of a parcel.In our study, over 82% of the covenants occupied the full parcel on which they were located.For the covenants that occupied a proportion of the parcel on which they occurred, we used the covenanted proportion of the parcel as the treated unit so to identify control parcels, we searched for parcels similar to the boundary of the covenant, not the boundary of the parcel on which the covenant occurred.Private land boundaries came from the Victorian Land Use Information System 2006/2007 data set, which is the Victorian Government's Cadastral Area Boundary data set (Appendix S1).From this data set, we selected only private land parcels >1 ha because all covenants included were at least 1 ha.Combining these data with the woody vegetation cover data, we determined the percentage of each private land parcel covered in woody vegetation and produced a time series for each parcel depicting the percentage of woody vegetation cover annually from 1988 to 2020.
Because our outcome data were the percentages of parcels (or covenants) covered by woody vegetation (i.e., bounded between 0% and 100%), we restricted the counterfactual prediction (and its error distribution) to be within this range.To do this, we transformed the outcome data before prediction of the counterfactual to a logit scale and then back transformed to the original scale after the prediction (Hyndman & Athanasopoulos, 2021).
To control for the potential of spatial spillover effects from the covenants, land parcels within 1 km of each covenant were excluded from selection as control parcels.These spillover effects are often complex and may vary depending on the specific characteristics and location of each protected parcel, making them difficult to evaluate (Pfaff & Robalino, 2017).Because we lacked data to track landholder ownership and assess these effects, we opted to remove the adjacent parcels.This approach is similar to other Australian studies (Sacre et al., 2020).We also did not select as controls parcels affected by wildfire.Although planned fires for fuel reduction occur mostly on public land, those on private land are small and not easily detectable through Landsat imagery, so they do not affect the outcome variable.However, wildfires are significant disturbance events that could significantly affect the outcome variable.We therefore removed them from the analyses.After these exclusions, there were approximately 45,500 parcels that could be used as control parcels (described in Appendix S1).
For the impact evaluation we found a set of matched control parcels for each covenant (the matched group) that had similar preintervention trend and covariate values; identified visualized diagnostics of preintervention outcome trends (i.e., manually checking whether the covenants and their controls had similar trends) to check the relationship of the covenant outcome and its matched controls; used the outcome time series of each parcel in the matched group in a Bayesian structural time-series (BSTS) model to generate a synthetic control and counterfactual outcomes over time; used the difference between the outcome of synthetic control and the observed outcome of the covenant to estimate impact over time; determined the point estimate of annual impact for each covenant; and aggregated individual impacts to derive a program-level estimate of impact with metaanalysis (Figure 2).Using time series for the impact evaluation allowed us to generate a plot of continuous impact changes over time for each year the covenant was implemented.

Matching to determine control groups for each covenant
Because there were large untreated units in our study area, we screened the units to limit the control pool to high-quality matches that had characteristics similar to the treated units (Abadie, 2021).Constructing a control group before using the time series to construct the synthetic control offered additional control for confounding effects.For example, the preintervention time series of woody vegetation in a parcel with a rocky soil type (which limits vegetation growth) may look similar to a parcel with a more fertile soil type (which supports good vegetation growth) but has not had suitable climatic conditions supportive of growth.If a covenant is established in the parcel with fertile soil and if it coincides with good climatic conditions, then using the parcel with rocky soil type in constructing the synthetic control will bias the results because the postintervention difference will be affected by climate.By matching the soil type before constructing the synthetic control there is explicit control of the confounding effects due to a difference in soil type.
Thus, as a preceding step to constructing the synthetic control, we used statistical matching to develop a control pool for each covenant with land parcels that had similar biophysical characteristics.We matched the static biophysical characteristics and the preintervention outcome trend of woody vegetation cover.We developed a theory of change model to examine the factors that may influence covenant enrolment and vegetation cover based on the peer-reviewed and gray literature, consultation with colleagues working in private land conservation in the study region, and our own expert judgement.We included the biophysical variables slope, elevation, vegetation density, distance to roads, depth to hard rock, soil organic carbon, soil water availability, density of rivers, and area of land parcels that determined the land's suitability for different land uses, for example, agriculture or development (Appendix S1 contains data sources and Appendix S2 has details on the theory of change).Other unobserved covariates, such as landholder characteristics, are discussed in theory of change model (Appendix S2).
We used nearest neighbor matching with the Mahalanobis distance method to determine the control groups (Stuart, 2010).Nearest neighbor matching selected control land parcels for each covenant that had the smallest distance from covenant's set of covariate's values, where distance is measured as the scale-free Euclidean distance in the case of the Mahalanobis distance.To select multiple controls within the area of common support (i.e., range of values or conditions) for each covenant, we set a caliper size on the Mahalanobis distance.The caliper sizes determine the quality of the match.Low caliper values give better matches but fewer controls to use for synthetic control.Although there is no set rule for selecting appropriate caliper size, our aim was to find a balance between having a good match and having enough parcels in the control pools.To explore this, we ran the model with 4 different caliper sizes on the standardized mean difference of each covariate (i.e., 0.6, 0.8, 1.0, 1.2) to test the sensitivity of our estimates.

Constructing a synthetic control and estimating impact
After completing the matching, we had our covenant parcel with a control group of covenants and their respective time series of the proportion of woody vegetation.Each covenant and its respective control were modelled to construct a synthetic control with a BSTS framework with (SSG, using) the package CausalImpact in R (Brodersen et al., 2015).A BSTS is a statespace model based on the notion that the time series observed (y t ) is an incomplete and random function of some unobserved process called the state space (Petris, 2010).We used this Bayesian framework because it is flexible when modeling complex time-series patterns and is more informative because it uses the probability distribution to represent uncertainty in the estimates.Moreover, the model assumptions and outputs are easy to understand (e.g., in Figure 2, the similar preintervention trend of the treated and the control units and the depiction of the evolution of the counterfactual are intuitive).
The approach combines information from the trend of the pretreatment time series of the covenant, the time series of the controls before the treatment, and prior knowledge about the model parameters to construct a synthetic control.Structural time-series models can have different forms.We used a local linear trend model.The data-generating process in these models is divided into observation equations and process equations.The observation equation represents how the state is realized by the observed data: where observed data comprise the pretreatment trend of the woody vegetation cover at the location where the covenant will occur.The treated series at time t (y t ) comes from an unobserved variable called the state component (u t ), the regression component  ′ t x t , and a noise term ( t ).The regression component captures the linear relationship between the outcome series of covenants and control parcels, where  is the regression coefficient and x t is the vector of values from controls at time t.In Bayesian analysis, the spike-and-slab prior is a commonly used method for variable selection (Ishwaran & Rao, 2005).It involves assigning a probability that centers around 0 (referred to as the spike) to determine whether the coefficients of vari-ables should be included in or excluded from the regression.If a variable is included, a Gaussian-shaped prior is used to regularize the model (called the slab).This technique helps identify which controls to use and their allocated weights, preventing model overfitting (Brodersen et al., 2015).We used static regression because our matched data sets showed stable relationships between the treated and the control units before the intervention year, and we assumed this relationship would be stable in the postintervention period.The noise term ( t ) is assumed to be normally distributed with mean 0 and variance of  2  .Beta distribution is a suitable model to capture the random behavior in bounded data, such as proportion or percentages; however, at the time of this study there was no option to specify such distribution in the CausalImpact R package 1.2.7.To model the error terms as a normal distribution, we applied a scaled logit transformation to the outcome data.
The process equation represents the temporal evolution of that state: Equation 2 shows the local linear trend model, where the state variable u t +1 is a function of local level (u t ), trend ( t ), and a noise term which is normally distributed with mean 0 and variance  2  .Equation 3shows the trend  t +1 is a function of trend in the previous time step ( t ) and the noise term ( t ) which is normally distributed with mean 0 and variance  2  .In the BSTS model, the state space is modelled instead of the observed data, thus the parameter of interest is the variance ( 2  ) of the error term ( t ) in the local linear trend model.Because it is a Bayesian framework, a prior should be specified for this term.We used the default value of 0.01, as suggested when there is a strong relationship between the treated and the control units (Brodersen et al., 2015).The strength of the preintervention relationship was established through matching and confirmed visually (e.g., Figure 2, step 2).
The posterior distribution of the synthetic control was then predicted by combining the preintervention treated outcome and postintervention outcome of the control series.The difference in the observed time series of the covenant and the predicted time series of the counterfactual, yielded the impact time series of the intervention.

Aggregating individual effects with meta-analysis
Although the impact estimates for each covenant are useful, it is often of interest to get an overall effect of the program.In such cases, once one has the estimates of impact for each individual covenant, one can synthesize those estimates to determine the aggregate impact of all covenants in the region (i.e., the average treatment effect on the treated) with meta-analysis.Meta-analysis is commonly used to synthesize an effect size from a series of separate studies (Borenstein et al., 2021).It is used to calculate a weighted average of the effects of each treatment by assigning higher weights to effects with lower uncertainty.Thus, in our case, in which we have effect estimates from multiple covenants with their own uncertainties, this meta-analysis approach is an appropriate method to aggregate the effects.Because the covenants were implemented in different years, we used a normalized estimate of impact per year (i.e., total impact the covenant had during its lifetime divided by the age of the covenant) to estimate an average impact per year.In aggregating the effects this way, we may have lost details about the patterns of impact, but it still provided a meaningful overview of the program because the aggregation was based on the assumption that impact accrues over time.Because our covenants were not completely homogenous and differed in subtle ways, we used a random-effects meta-analysis, which is based on the assumption that there is a distribution of true effect sizes rather than a single true effect size.Thus, the random-effect model estimates the mean of the distribution of the true effect.
We used the Bayesian approach for the meta-analysis and assumed that the observed effect size of a single covenant ( θc ) follows a normal distribution (N) of the true effect size (  c ) and variance ( 2 c ) (Equation 4).
Further, the true effect size of each covenant (  c ) is only a sample from a distribution of true population effect size which has a mean () and variance ( 2 ) that result from the heterogeneity in the covenants (Equation 5).The mean of this distribution () is the program-level effect of the covenants: Equation 6 indicates that  and  2 have a prior distribution p(.) and Equation 7constrains the variance between covenants to be >0: and To prevent the model from overfitting and achieve better convergence, we specified weakly informative conservative priors.This is generally considered a good approach in Bayesian meta-analysis (Williams et al. [preprint], 2018).Specifically, we assumed a normal distribution for µ and a half-Cauchy (HC) distribution for  2 .We conservatively assumed that the prior distribution for true effect has a mean of 0 and variance of 4,  ∼ N (0, 4) and that the heterogeneity in effects is  2 ∼ HC(0, 0.5).
Given that we had some idea of the distribution of the effect from estimates of individual covenants, it is a reasonable assumption that indicates there is a 95% probability that the true effect (i.e., vegetation cover change per year) lies somewhere between −4% to 4%.However, we also checked the sensitivity of the estimates by using 2 different priors for the effect:  ∼ N (0, 2) and  ∼ N (0, 10).Similarly, the positive distribution of  2 with peak of 0 and scaling of 0.5 is a conservative estimate, signifying that there is a higher probability of small differences in the effects of different covenants, but it is also likely that there are huge differences (the heavy tails controlled by the scaling parameter).Once we ascertained the posterior distribution, we calculated the mean of the distribution as the effect size and the highest posterior density (HDI) of the distribution as the 95% credible interval (Hespanhol et al., 2019).Bayesian meta-analysis was implemented using the brms package in R (Bürkner, 2017).Model convergence was checked using Rhat statistic, trace pots, and posterior predictive checks (Appendix S3).Data and code to run the analyses and generate the results of this paper have been provided (see Appendix S6).
To compare the results from our approach with more traditional matching methods, we conducted 2 separate analyses to estimate the average impact: standard matching with mean difference and matching with difference in difference (details in Appendix S5).However, because matching is rarely used to analyze time-series data, we used only static covariates for matching (i.e., without matching on the preintervention trend).Because these methods were used to estimate an average treatment effect, we assumed that all the interventions were homogeneous, which implies the covenants were of the same age and involved the same landholder actions.Thus, to implement this method we applied it to a subset of the covenants that were of similar age.Because the largest number of covenants was implemented from 2010 to 2012 (n = 44), we analyzed these covenants as a homogenous group.We also applied our synthetic control and meta-analytic approach to these same 44 covenants for comparison.

RESULTS
We present the results of 4 models with the different caliper values (0.6, 0.8, 1.0, 1.2) used during the matching process to develop a donor pool of controls to be used to construct the synthetic control.Low caliper size yielded high-quality matches for better synthetic controls but fewer controls with which to construct the synthetic control, and in some cases no controls to estimate impact.In contrast, high caliper values produced a large control pool, enabling construction of a synthetic control from a larger number of covenants but potentially inducing bias due to reduction in the similarity of units in their covariate space.

Covenant-level impact
The results from individual covenants showed there was a heterogeneous response to changes in woody vegetation cover in FIGURE 3 (a-f) Variation in impact estimates across 6 sample conservation covenants from the Goldfields region based on models with a caliper size of 0.6.Plots in each panel are as follows: A: the proportion of woody vegetation cover in the covenanted area (black line) and each covenanted area's matched controls (blue lines) used to generate the synthetic control; B: the proportion of woody vegetation cover in the covenanted area (solid black line), the counterfactual prediction (dotted blue line) and the uncertainty of the prediction (yellow); C: the impact (dotted red line) and 95% credible interval (shown in yellow).The vertical dotted line shows the time the covenant was implemented.
the presence of a covenant.An example of the variation of covenant responses is in Figure 3. Impact estimates varied with time across the pool of covenants and ranged from significant positive impact (Figure 3a) to negative impact (Figure 3f).The outcome and the impact in terms of woody vegetation cover differed.For example, woody vegetation cover in the covenant improved, but there was no impact because the woody vegetation cover also improved in the synthetic control (Figure 3f).
The results also vary because the matching caliper size was increased, demonstrating that the estimates of effects were sensitive to the control pool used to construct the synthetic control (data on all covenants across all models in Appendix S7).
Impact varied with time since the covenant was implemented (Figure 4).Although many individual effects were not statistically significant, more effects were above zero than below and the majority of uncertainty bounds were above the zero line.This, combined with the fact that the mean estimate tended to be positive, provides some evidence that an aggregate effect may be increasing over time.In our comparison of covenants of different ages, the impact per year for all covenants showed that for all 4 caliper values there were more covenants that had positive effects than negative effects (Figure 5).The distribution of impacts (Figure 5, right panel) also indicated the density of the impact was concentrated over zero.

Aggregate-level impact
The meta-analytic Bayesian analysis showed that the covenants increased the proportion of woody vegetation cover per year (Figure 6).The mean effect (percent change in woody vegetation cover) per year was 0.83% (CI 0.39-1.25)for a caliper value of 0.6, 0.61% (CI 0.29-0.97)for a caliper value of 0.8, 0.51% (CI 0.25-0.77)for a caliper value of 1.0, and 0.28% (CI 0.07-0.46)for a caliper value of 1.2.The 95% credible interval of all distributions excluded 0, indicating statistically significant overall positive program impact, when averaged over all covenants in the region.The Rhat statistic for all models was 1, signifying model convergence.Other diagnostics for model convergence are in Appendix S3.Changing the variance in the effect priors did not affect the estimates (Appendix S4).The average impact of a subset of covenants of similar age (n = 44) determined with BSTS and mean difference and differencein-difference matching was similar.The BSTS and matching with mean difference produced similar positive mean estimates, whereas matching and difference in difference generated negative effect sizes, although they were within error bounds of the results of the other 2 models.

DISCUSSION
Our impact evaluation method has the potential to provide a more robust and nuanced understanding of impact compared with existing approaches.Over the last 30 years, the study area has seen a general increase in woody vegetation extent (Figure 1).In fact, the extent of woody growth in this region is recorded as one of the largest examples of natural regeneration in Australia (Geddes et al., 2011).By combining statistical matching with a synthetic control design, we isolated the impact that covenanting is having from other factors contributing to change.In our approach, the first of its kind in protected area evaluation, we estimated the temporal evolution of impact for each individual covenant, providing richer information on their performance than is available in traditional approaches to impact evaluation, such as matching or difference-in-difference methods (Wauchope et al., 2021).
We found a wide variation in the temporal patterns of impact across covenants (Figure 5).On average, more covenants had a positive impact than no or negative impact.The differences in the impact estimates for individual covenants may be partially explained by the difference in the levels of support that each covenant received (e.g., stewardship programs) (Trust for Nature, 2017).For example, a covenant may have performed  better because it received additional planting or fencing support, whereas poorly performing covenants may be the result of negligence by landholders.At the program level, the aggregate effect of all covenants in the region improved vegetation cover (Figure 6).Thus, our results provide evidence that the PLC program resulted in a positive contribution to increasing woody vegetation cover, consistent with results from previous studies (Nolte et al., 2019;Shumba et al., 2020).For example, in 2019, Nolte et al. (2019) found that in Massachusetts (USA), private land parcels protected by voluntary conservation agreements reduced forest loss by around 0.02% per year compared with controls.Many of the studies examining the impact of area-based conservation programs have been conducted in locations that have ongoing background biodiversity losses, and they show a positive effect of protection by reducing these losses (Geldmann et al., 2013).In contrast, our study area had a background trend of increasing woody vegetation cover, and the covenants studied showed an overall positive impact by increasing this gain in cover over what was observed in the control sites.
Our study makes an important methodological contribution to evaluation of the impact of conservation interventions such as PPAs.Previous studies on protected area impact have relied on estimating point estimates of average program impact and were based on assumptions regarding confounding covariates (Ribas et al., 2020).Our approach estimates individual covenantlevel and program-level impact, and relies on more reasonable assumptions on confounding than conventional approaches.In our comparison of our approach with traditional approaches, we found the meta-analytic estimate of program-level effect was consistent with traditional approaches.Our method provides a more comprehensive set of results, including a plot for each covenant showing the trend over time of covenant impact and associated uncertainties, the trend of woody vegetation outcome within the covenant, and the trend of woody vegetation in the set of matched controls and the constructed synthetic control.Despite impacts being the primary focus, outcomes can also offer complementary information.For instance, a decline in the vegetation extent (the outcome) on covenanted land can signal a deficiency in protection and raise concerns, even if there is a positive impact, due to a decrease in vegetation extent in the matched control sites.Given that freely available time-series data on woody vegetation cover is now available globally (Hansen et al., 2013;Wulder et al., 2019), our method is applicable to multiple locations around the world, provided that appropriate data on covariates are also available.Finally, our approach is transparent and reproducible because all the R packages we used are open source and the source code and data to reproduce our analyses are in Appendix S.
There are several limitations of our study that should be mentioned.First, even though we tried to minimize the differences in the covenants and controls by integrating matching and synthetic control, some biases likely remained.For example, we assumed that we controlled for landholder behavior by matching the preintervention trend in woody vegetation cover, but this assumption may not always hold because landholder behavior may change in a myriad of ways in response to policy or economic factors.In other cases, landholders may simply retain vegetation on their property with plans to clear it when the opportunity costs make it economically viable Others may keep vegetation solely for its recreational or aesthetic value.Changes in land ownership could also result in changing motivations for keeping or clearing vegetation.If the landholder's behavior is not reflected in the preintervention trend, then this assumption is violated.Indeed, in an impact estimation based on observational data, we assumed all biases had been controlled for, an assumption that is difficult to test.Because the true effects are unknown, researchers can never determine how many confounding factors have been controlled or the degree of bias in a specific analysis.Second, for many of the covenants included in our study, the time over which they have been in effect is rela-tively short (e.g., 35 covenants were <10 years old).This period may be insufficient for capturing impact because ecosystem responses may have considerable temporal lags.Future studies could explore the use of forecasting methods to complement our approach.
Other limitations are due to the data on woody vegetation cover, which is used as our outcome variable.There may have been measurement or classification errors in detecting vegetation from satellite imagery (Alix-Garcia & Millimet, 2022) or loss of information due to the binary classification of woody or nonwoody (in reality the outcome may have been a continuous gradient of woody cover) (Pettorelli et al., 2016).Further, our reliance on satellite data means we are not able to determine whether the detected increase in woody vegetation cover was intended (e.g., active restoration of native species) or unintended (e.g., woody encroachment).There are invasive woody species in the study area, and in this case a covenant restricting vegetation cover can be an intended effect, while an increase in cover could result from a woody weed species or the replacement of native grasslands (Veldman et al., 2015).Another limitation is that our data showed only the extent of woody vegetation, not its quality.This means the detected improvements give only a partial understanding of the covenant's effectiveness because we could not tell if vegetation conditions improved, for example, due to weed removal.A more detailed analysis to examine this would require extensive field work recording vegetation conditions at covenants and control sites.
Protected areas are typically evaluated using average impact estimates, which provide a broader overview of a program's performance.However, our approach has limitations in terms of understanding the performance of individual units in the program.Moreover, evaluators should utilize recent advances in impact methods, which are more robust and flexible than widely used matching techniques.We propose a framework that uses the synthetic control design to provide defensible estimates of individual-level impacts, distribution of their uncertainties, and a complementary set of outcome metrics.By implementing a meta-analytic technique, we were able to aggregate individual-level estimates to derive the overall program-level impact.Through our framework we showed that although the covenants in the Goldfields improved vegetation cover on average, there was large heterogeneity across the individual covenants.However, we believe our approach shows considerable promise for application to other conservation schemes, such as examining impacts to obtain more robust estimates for biodiversity or carbon credits.Stakeholders using our approach will still need to exercise caution when interpreting results because some level of bias will likely remain because of the range of assumptions required for the analysis of potential unobserved covariates.
Open access publishing facilitated by RMIT University, as part of the Wiley -RMIT University agreement via the Council of Australian University Librarians.

FIGURE 1
FIGURE 1The Goldfields region of Victoria showing (a) distribution of conservation covenants and private land, (b) increase in woody vegetation cover on all private land parcels from 1988 to 2020, and (c) temporal trend of covenanting (black, number of covenants; red, total area of covenants).

FIGURE 2
FIGURE 2 Workflow to estimate the impact of individual covenants and the aggregate impact of all covenants in the Goldfields region of Victoria.

FIGURE 4
FIGURE 4 Temporal distribution of the impact of each conservation covenant from the time since the intervention year for 4 different caliper values used in matching (red lines, individual impacts; gray shading, uncertainty for each individual effect; yellow lines, mean impact of all covenants).

FIGURE 5
FIGURE 5Distribution of impact per year for each of the studied conservation covenants across the 4 caliper value models arranged in a descending order of impact based on the model with caliper value of 1.2 (green dots, positive impact; orange dots, negative impact; corresponding lines, uncertainty for point impacts; n*, number of statistically significant effects).Each covenant has a fixed x-axis location across all plots, so the way the impact estimates for an individual covenant change with different caliper values can be determined by examining how estimates vary in the same vertical line.The histogram shows the density of the impact estimates.

FIGURE 6
FIGURE 6 (a) Outcomes of the meta-analytic Bayesian analysis to examine the distribution of annual impact over all included covenants in the Goldfields region.The posterior distribution of impact per year is shown for the 4 caliper sizes.(b) Results of analyses of a subset of covenants of the same age (n = 44) across the different caliper sizes comparing our method (Bayesian structural time series [BSTS]) with traditional matching methods used to calculate impact (DID, difference in difference).