A global analysis of management capacity and ecological outcomes in terrestrial protected areas

Protecting important sites is a key strategy for halting the loss of biodiversity. However, our understanding of the relationship between management inputs and biodiversity outcomes in protected areas (PAs) remains weak. Here, we examine biodiversity outcomes using species population trends in PAs derived from the Living Planet Database in relation to management data derived from the Management Effectiveness Tracking Tool (METT) database for 217 population time‐series from 73 PAs. We found a positive relationship between our METT‐based scores for Capacity and Resources and changes in vertebrate abundance, consistent with the hypothesis that PAs require adequate resourcing to halt biodiversity loss. Additionally, PA age was negatively correlated with trends for the mammal subsets and PA size negatively correlated with population trends in the global subset. Our study highlights the paucity of appropriate data for rigorous testing of the role of management in maintaining species populations across multiple sites, and describes ways to improve our understanding of PA performance.


KEYWORDS
living planet database, management effectiveness tracking tool (METT), protected area management effectiveness (PAME), vertebrate population trends, world database on protected areas (WDPA)

INTRODUCTION
Setting aside land for the protection of nature is a key global strategy for halting the current loss of biodiversity (Convention on Biological Diversity, 2010; Gaston, Jackson, Cantu-Salazar, & Cruz-Pinon, 2008). This has resulted in a still-expanding global network of Protected Areas (PAs), now covering ca. 15% of the terrestrial surface of the Earth and 4% of the global ocean (UNEP-WCMC and IUCN, 2016). However, the extent to which PAs are safeguarding biodiversity is debated (Baillie, Joppa, & Robinson, 2016; Pringle, 2017). The importance of protecting the right places cannot be overstated. Informed by tools such as systematic conservation planning (Margules & Pressey, 2000) and the global standard for Key Biodiversity Areas (UNEP-WCMC and IUCN, 2016), considerable research has focused on understanding which areas of land (Eken et al., 2004) and sea (Klein et al., 2015) to protect. However, we also need to know if existing PAs are working to reduce threats and to understand what management systems and interventions make PAs most effective (Ferraro & Hanauer, 2015; Pressey, Visconti, & Ferraro, 2015).
Several studies have considered the relative effectiveness of PAs in reducing forest loss, generally finding that PAs have lower rates of deforestation than similar but unprotected areas (Geldmann et al., 2013). However, while deforestation data sets permit powerful analyses of changes in forest cover inside versus outside PA boundaries, they have significant limitations. They shed limited light on changes in other dimensions of forest biodiversity (e.g., empty forest syndrome, Redford, 1992) and, of course, say nothing about nonforest biomes. Moreover, few studies have investigated associations between the habitat performance and management quality of PAs, with most finding no relationship.
Here, we approach the question of whether PA management quality impacts biodiversity outcomes using data on changes in native species populations (Barnes, Craigie, & Harrison, 2016; Mace, Collen, Fuller, & Boakes, 2010). Existing studies of the relationship between species population trends and management of the PA have either used detailed case studies from one or a few sites (Geldmann et al., 2013), most recently 15 PAs from 14 countries (Beaudrot et al., 2016), or have relied on structured questionnaires (Bruner, Gullison, Rice, & da Fonseca, 2001) and interviews with experts (Laurance, Carolina Useche, & Rendeiro, 2012). In this article, we bring together the database on the Management Effectiveness Tracking Tool (METT) and the Living Planet Database (LPD), which are the largest global quantitative data sets on management inputs and time-series of animal populations, respectively. The LPD contains 5,956 vertebrate (predominantly mammal and bird) population time-series within 1,736 PAs around the world (Collen et al., 2009). Using the LPD, previous work showed that species population trends inside PAs are correlated with country-level socioeconomic factors such as the Human Development Index (HDI). However, these results do not address links between populations and actions undertaken inside PAs. The METT offers a potentially valuable resource for tackling this, by capturing an array of information on procedural elements related to the quality of management in the PAs (Mascia et al., 2014). The METT has been championed by organizations like the International Union for the Conservation of Nature (IUCN), the Global Environmental Facility (GEF), and WWF, and applied to >2,000 PAs across the world.
Based on PA names and IDs, we matched LPD time-series within PAs to our METT database, to test the hypothesis that better site-level management (e.g., in terms of staffing, management plans, stakeholder involvement) leads to more positive vertebrate trends inside PAs. To account for the fact that the ability of PAs to deliver conservation outcomes also depends on other contextual factors, we include these in our model. Understanding how management actions and institutional arrangements link to the state of biodiversity inside PAs has major implications for our ability to address the challenges defined in the Aichi Targets, particularly Target 11, which calls for PAs to be effectively and equitably managed (Convention on Biological Diversity, 2010). Based on our results, we also highlight how the paucity of direct data on changes in biodiversity constrains our understanding of the performance of PAs globally, and we highlight a path forward to address this challenge.

Monitoring and enforcement systems
Description: Relates to the appropriateness of the legal framework, the capacity to enforce, and the understanding of the biological and procedural conditions of the PA.
Source: Ostrom (1990)
Rationale: Speaks to the extent to which the legal framework governing the PA is appropriate for PA managers and other law-enforcement personnel to address and mitigate threats and noncompliance with PA rules and regulations. Improved knowledge and understanding of PA conditions, across ecological, procedural, and threats, allows management to be informed and responsive.
METT questions: PA regulations (2); Law enforcement (3); Resource inventory (9); Research (10); Monitoring and evaluation (30)

Decision-making arrangements
Description: Relates to the mechanisms for involving relevant stakeholders in and around the PA as well as the influence of these groups on management decisions.
Source: Ostrom (1990)
Rationale: Including a diversity of stakeholders increases the likelihood that management will be better suited to the local social and ecological context, and enhances the perceived legitimacy of the PA and compliance.
METT questions: Education program (20); State and comm Neighbors (21); Indigenous people (22); Local communities (23)

Note: Full description of all METT questions is found in Table S1. Numbers in parentheses indicate the question order in the original METT score card. Some questions (numbers: 11, 13, 24, 25, 26, 27, 28, and 29) are not included as these were not used in the analysis (see SI for full documentation).

PA management data
We compiled a data set of METT assessments from 1,988 PAs. The METT is a questionnaire, usually completed as a group exercise involving park managers and other stakeholders. The METT collects information on objectives of, threats to, and designation of the PA as well as evaluating the adequacy of 30 procedural elements of PA management (Stolton et al., 2007, and see SI for details). Our analysis focused on these 30 questions (see Table S1 for full list), each of which is answered with a score from 0 = inadequate or nonexisting to 3 = adequate or fully implemented. However, some of these attributes, which may be important for PA success on other performance metrics (e.g., delivery of equity), cannot reasonably be expected to deliver improved biodiversity outcomes (Mascia et al., 2014). To address this, we used Ostrom's (1990) framework for governance of common pool resources and the IUCN World Commission on Protected Areas (WCPA) management effectiveness framework (Hockings, Stolton, & Dudley, 2000) to group the METT questions into four dimensions (Table 1). Both frameworks have been developed to understand the diversity and complexity of procedural elements contributing to successful conservation interventions. Our four categories were: (1) Design and Planning, relating to the legal status, design, and identification of objectives of the PA; (2) Capacity and Resources, covering the adequacy of staffing, budgets, and equipment; (3) Monitoring and Enforcement Systems, summarizing the effectiveness of monitoring and law enforcement; and (4) Decision-Making Arrangements, reflecting the engagement of local stakeholders in management decisions. For each of these four dimensions, we calculated a composite score based on Geldmann et al. (2015), which corrects for missing information within the individual dimensions. Each dimension was standardized from 0 (absent from the PA) to 100 (considered to be sufficient to achieve PA objectives).
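As an illustration of how a 0 to 100 dimension score can tolerate unanswered questions, the sketch below rescales only the answered items of a dimension. The function name and the specific handling of missing answers are assumptions made for illustration; the correction actually used follows Geldmann et al. (2015) and may differ.

```python
from typing import Optional, Sequence

MAX_ITEM_SCORE = 3  # METT items are scored from 0 (inadequate) to 3 (adequate)


def dimension_score(answers: Sequence[Optional[int]]) -> Optional[float]:
    """Rescale the answered items of one management dimension to 0-100.

    Unanswered items (None) are skipped, so the denominator reflects only
    the questions that were actually scored. This is an assumed stand-in
    for the missing-data correction, not the published procedure.
    """
    answered = [a for a in answers if a is not None]
    if not answered:
        return None  # no information at all for this dimension
    if any(a < 0 or a > MAX_ITEM_SCORE for a in answered):
        raise ValueError("METT item scores must lie between 0 and 3")
    return 100.0 * sum(answered) / (MAX_ITEM_SCORE * len(answered))


# Hypothetical answers for one dimension of one PA, with one missing item:
# (3 + 2 + 1) out of a possible 9 points.
example = dimension_score([3, 2, None, 1])
```

Dividing by the number of answered items, rather than by the full item count, keeps a PA from being penalized for questions that were left blank.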
METT assessments were conducted between 2003 and 2014. For PAs with multiple assessments over time, we used the first (i.e., oldest) assessment to increase alignment with the LPD data.
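The rule of keeping only the oldest assessment per PA can be sketched as follows. The record layout (PA identifier, assessment year, score) is hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, Tuple


def first_assessments(
    records: Iterable[Tuple[str, int, float]]
) -> Dict[str, Tuple[int, float]]:
    """Keep the earliest (year, score) pair for each PA.

    Each record is a hypothetical (pa_id, year, score) tuple.
    """
    earliest: Dict[str, Tuple[int, float]] = {}
    for pa_id, year, score in records:
        # Replace the stored assessment only if this one is older.
        if pa_id not in earliest or year < earliest[pa_id][0]:
            earliest[pa_id] = (year, score)
    return earliest


# Hypothetical example: PA "A" was assessed twice, PA "B" once.
selected = first_assessments(
    [("A", 2008, 50.0), ("A", 2004, 40.0), ("B", 2010, 70.0)]
)
```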

Species population trends
We obtained species population trends from the LPD (Living Planet Database 2016), which uses data collated from published scientific literature, online databases, large-scale monitoring schemes (e.g., Pan-European Common Bird Monitoring Scheme), and gray literature (Loh et al., 2005). We used available terrestrial and freshwater species data for the sites, including birds, mammals, and reptiles. There was no compelling reason to separate or restrict these taxa in the analysis, as PAs aim to protect all species and all species are subject to a range of stressors. We only considered time-series within PAs that were added to the database before February 15, 2016. For all population time-series, we calculated the annual rate of change (i.e., the slope) by fitting a generalized linear regression model (GLM) with a log-link function, following previously published methods. However, whereas that earlier work calculated slopes based on data from 1970, we took a more restrictive approach, using only time-series with a minimum of three observations between 1990 and 2012, to better align the timescales of the LPD and METT data (see SI).
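A minimal sketch of the slope calculation is given below. It fits a two-parameter GLM with a log link by Newton-Raphson, keeping only observations between 1990 and 2012 and requiring at least three of them. The Poisson error family, the function name, and the defaults are assumptions made for illustration; the text specifies only a GLM with a log-link function.

```python
import math
from typing import Optional, Sequence


def annual_rate_of_change(
    years: Sequence[int],
    counts: Sequence[float],
    start: int = 1990,
    end: int = 2012,
    min_obs: int = 3,
    max_iter: int = 50,
    tol: float = 1e-10,
) -> Optional[float]:
    """Slope b of log(mu) = a + b * year, fitted by Newton-Raphson.

    Assumes a Poisson error family (an illustrative choice). Returns None
    when the series has too few observations in the window or cannot be fitted.
    """
    pairs = [(y, c) for y, c in zip(years, counts) if start <= y <= end]
    if len(pairs) < min_obs:
        return None
    mean_year = sum(y for y, _ in pairs) / len(pairs)
    t = [y - mean_year for y, _ in pairs]  # centre years for numerical stability
    obs = [c for _, c in pairs]
    mean_count = sum(obs) / len(obs)
    if mean_count <= 0:
        return None
    a, b = math.log(mean_count), 0.0  # start from a flat trend
    for _ in range(max_iter):
        mu = [math.exp(a + b * ti) for ti in t]
        # Score vector (gradient of the Poisson log-likelihood).
        g_a = sum(o - m for o, m in zip(obs, mu))
        g_b = sum((o - m) * ti for o, m, ti in zip(obs, mu, t))
        # Fisher information matrix entries.
        h_aa = sum(mu)
        h_ab = sum(m * ti for m, ti in zip(mu, t))
        h_bb = sum(m * ti * ti for m, ti in zip(mu, t))
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 0:
            return None  # e.g., all observations fall in the same year
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return b


# Hypothetical series growing at exactly 5% per year on the log scale.
demo_years = list(range(1990, 2000))
demo_counts = [100.0 * math.exp(0.05 * (y - 1990)) for y in demo_years]
demo_slope = annual_rate_of_change(demo_years, demo_counts)
```

Because the link is logarithmic, the slope is a rate on the log scale, so a value of 0.05 corresponds to roughly 5% growth per year.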

Contextual factors
The ability of PAs to deliver conservation outcomes can be expected to depend not only on how they are managed, but also on several social and economic contextual factors (Table S2). We identify four aspects related to the location of the included PAs, which we hypothesized to affect the performance of the PAs: (1) PA attributes (Gray et al., 2016; Woodroffe & Ginsberg, 1998), (2) human pressures (Geldmann, Joppa, & Burgess, 2014), (3) socioeconomic context, and (4) landscape structure (Joppa & Pfaff, 2011). For site attributes, we used date of establishment and size of each PA, obtained from the World Database on Protected Areas (WDPA; IUCN and UNEP-WCMC, 2015). For human pressures, we calculated the mean Human Influence Index (HII) inside the PA (Sanderson et al., 2002). We represented socioeconomic context with Gross Domestic Product (GDP), and the national-level HDI for 2005 and 2000, respectively (UNDP, 2011); and landscape structure by mean elevation of the PA (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005). To account for possible species-level effects, we used log of the body mass of the species, as this can be related to both conservation significance (Smith, Veríssimo, Isaac, & Jones, 2012) and vulnerability to threats (Brook & Bowman, 2005).

Statistical analysis
We assessed predictors of variation in the slopes of the individual LPD time-series using a mixed-effect model (GLM). We added country and taxonomic class as random effects, to account for PA-, country-, and taxonomic-level effects not captured by the contextual data (Zuur, Ieno, Walker, Saveliev, & Smith, 2009). As the four management dimensions were observed to be collinear, we never tested them together. Instead, we constructed four different base models, each with population trend as the dependent variable, and with one of the management dimensions, our random effects, and the following independent variables: (1) year of establishment, (2) size of PA, (3) HII, (4) HDI, (5) GDP, (6) mean elevation, and (7) species body mass. Pooled models were run that included all vertebrate taxa (i.e., birds, mammals, and reptiles), as well as separate models for mammals alone, the only group with enough data to run a separate model. Model selection was based on the Akaike information criterion (AIC) after assessing all possible model configurations. As our objective was to investigate the contribution of management, we always retained the METT management variable, regardless of the effect size.
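The model-selection step can be sketched as an enumeration of every subset of the seven contextual variables, with the management dimension always retained, followed by ranking on AIC. The variable names and the `fit` callable are hypothetical placeholders; fitting the mixed-effects model itself is left to whatever statistical software is used.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# Hypothetical column names for the seven fixed-effect covariates.
CONTEXT_VARS = (
    "year_established", "pa_size", "hii", "hdi",
    "gdp", "mean_elevation", "log_body_mass",
)


def candidate_models(
    management_var: str, context_vars: Sequence[str] = CONTEXT_VARS
) -> List[Tuple[str, ...]]:
    """All fixed-effect sets that keep the management variable.

    Random effects are assumed to be present in every model and are
    therefore not enumerated here.
    """
    models = []
    for k in range(len(context_vars) + 1):
        for subset in combinations(context_vars, k):
            models.append((management_var,) + subset)
    return models


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def select_by_aic(
    models: Sequence[Tuple[str, ...]],
    fit: Callable[[Tuple[str, ...]], Tuple[float, int]],
) -> Tuple[str, ...]:
    """Return the model with the lowest AIC.

    `fit` is a hypothetical callable returning (log_likelihood, n_params).
    """
    return min(models, key=lambda m: aic(*fit(m)))


models = candidate_models("capacity_resources")
```

With seven optional covariates, each of the four base models yields 2^7 = 128 candidate configurations.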

Data coverage
The total overlap between the 5,956 population time-series from 1,736 PAs and the 1,988 METT assessments comprised data on 217 populations from 73 terrestrial PAs in 29 countries (Figure 1, Tables S7 and S8). PA sizes ranged from 0.12 km² (Islotes de Punihuil, Chile) to 50,991 km² (Namib-Naukluft, Namibia; median = 1,579 km²). Our sample was biased toward older, larger PAs (Figure S5). The population time-series were predominantly from Africa (n = 94, 43.3%) and Asia (n = 93, 42.9%), followed by Latin America and the Caribbean (n = 19, 8.8%), and Eastern Europe (n = 11, 5.1%). Our sample contained no PAs from North America, Western Europe, or Australia, as the METT has not been frequently used in those areas. The data set was dominated by mammals (n = 145, 66.8%) and birds (n = 61, 28.1%), while population data for reptiles (n = 11, 5.1%) were much more sparse.
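The reported shares can be recomputed directly from the counts given above; the short check below does this for both the regional and the taxonomic breakdown. The dictionary and function names are illustrative only.

```python
from typing import Dict

# Counts of population time-series as reported in the text.
counts_by_region = {
    "Africa": 94,
    "Asia": 93,
    "Latin America and the Caribbean": 19,
    "Eastern Europe": 11,
}
counts_by_class = {"mammals": 145, "birds": 61, "reptiles": 11}


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage share of each group, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


region_shares = shares(counts_by_region)
class_shares = shares(counts_by_class)
```

Both breakdowns sum to the 217 time-series in the final sample.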

Role of management
We found a positive relationship between aggregate scores for Capacity and Resources and vertebrate population trends in both the model for all taxa and the model for mammals only (Figures 2B and 3B). In neither the all-taxon nor the mammals-only model did we find a relationship between the METT scores for Monitoring and Enforcement Systems, Decision-Making Arrangements, or Design and Planning and population trends. For all models considering mammal-only population trends, as well as the all-taxon model with Capacity and Resources, more recently established PAs experienced more positive population growth than did older ones (Figure 2 and Figure S1). For all all-taxon models except the one with Capacity and Resources, smaller PAs experienced more positive population growth than larger ones (Figure 2). We found no relationship between population trends and HII, GDP, HDI, mean elevation, or body size in any of our models (Tables S3-S6).

FIGURE 1 PAs with overlap between METT and LPD. Pie charts show the distribution of time-series between taxa, as indicated by color for the 73 PAs

Management capacities and resources
Capacity and Resources (which includes adequacy of staff, budgets, and available equipment) was the only dimension of PA management that was associated with positive changes in populations in our models. Although our analyses are correlational, this finding is consistent with the idea that having enough PA staff with appropriate training and budgets is important in delivering a functional global PA system (Leader-Williams & Albon, 1988; Smith, Muir, Walpole, Balmford, & Leader-Williams, 2003). For example, in the Karoo National Park, South Africa, we find adequate budget and staff numbers to be associated with increasing mammal populations following reintroductions, a pattern also validated by changes in natural vegetation cover (Kraaij & Milton, 2006). We do not take our results to imply that local stakeholder engagement (Oldekop, Holmes, Harris, & Evans, 2016), monitoring and enforcement (Jachmann, 2008), or planning (Pressey et al., 2015) are not important in ensuring effective PAs, but rather that their relative importance may be related to other performance measures (e.g., equity and economic benefits, or species and ecological representation). However, particularly for monitoring and enforcement, we had expected to find a positive relationship. Indeed, capacity and resources are unlikely to make the greatest impact unless some of these are devoted to enforcement (Hilborn et al., 2006; Jachmann, 2008). Our results align with evidence-based calls for conservation strategies to include increased funding for management of the existing PA portfolio (Bonham et al., 2014; Gill et al., 2017; Waldron et al., 2017). While the PA coverage of the Earth continues to grow (UNEP-WCMC and IUCN, 2016), funding for management has not kept up (Watson, Dudley, Segan, & Hockings, 2014).
Knowledge of conservation spending, at global and even national levels, is extremely limited, but all syntheses thus far show major shortcomings, indicating that many PAs are underfunded (Balmford, Gaston, Blyth, James, & Kapos, 2003; McCarthy et al., 2012; Miller, Agrawal, & Roberts, 2013; Waldron et al., 2013; Waldron et al., 2017).

PA age and size
For the mammal models and the all-taxon model with Capacity and Resources, younger PAs saw greater increases in populations than older ones. Similar results were found for alpha diversity in a sample of 359 PAs across the globe (Gray et al., 2016), while it has been shown that alpha diversity was greater in older marine PAs (Edgar et al., 2014). We are not aware of any existing work testing any relationship between species population trends and PA age, but suggest several nonexclusive explanations for further testing. Older PAs are often located in more pristine areas where wildlife populations with or without protection are under less pressure (Joppa & Pfaff, 2009). Conversely, newer PAs may be established in locations under higher pressure, or to address observed declines of particular target species. Where the latter is the case, these are perhaps also more likely to be better resourced. In both cases this could lead to newer PAs experiencing more positive present-day population trends, with older ones supporting stable populations closer to carrying capacity.

For the all-taxon models, except the one containing Capacity and Resources, smaller PAs had more positive population changes than larger ones. This was surprising as ecological theory suggests that PA size is important for viable populations (Walston et al., 2010). However, there is evidence that larger parks can lead to dilution of resources, higher risk of encroachment, and decreased detection of threats, so that increased size may not always result in increased populations (Barnes, Craigie, Dudley, & Hockings, 2016).

[Figure caption, partial] ... resources, (C) monitoring and enforcement systems, or (D) decision-making arrangements. The x-axis shows the standardized parameter estimates (mean = 0, SD = 1) of the population slope for the standardized explanatory variables (mean = 0, SD = 1). Dark gray shows models with all-taxon while light gray shows models with only mammal populations. All error bars are 90% confidence intervals
While we think the LPD and WDPA are strong candidate data sets for testing such hypotheses related to size and age, this was outside the scope of our analysis, where the pruning based on available METT data restricts, and potentially biases (Figure S6), what would otherwise be an ideal data set for addressing such questions.

Evaluating PA impacts requires more and better data
We conducted the largest terrestrial analysis linking PA management input to changes in terrestrial vertebrate populations. Our two data sets contain site-specific information from thousands of PAs. Despite this, the overlap consists of just 73 PAs and, even within these, there are challenges with temporal disjunction (Figures S9 and S10). We identify four overarching limitations to be addressed for expanding and strengthening such analyses in the future. First, the limited overlap between the METT and the LPD data sets shows a lack of coordination between the collection of data on PA interventions and outcomes. Second, our final model accounts for only a relatively small fraction of the variance observed in the data. Managers would be ill-advised to use the patterns we report to guide funding in specific PAs or to discount the role of planning, enforcement, or the involvement of local stakeholders in specific PA management. Third, the LPD also has shortcomings in its ability to assess PA performance, because positive slopes are not a direct measure of conservation success but only of growing populations. Similarly, the LPD lacks information on equivalent population trends outside PAs, making it impossible to fully discount the effect of location and history of the PA in the trends obtained. Fourth, while the wide application of the METT makes it a potentially powerful tool for understanding conservation inputs and outputs, it is not without its limitations. Lack of uniformly applied guidelines for the implementation of METT assessments can lead to individual PAs interpreting similar conditions differently (Cook & Hockings, 2011). Similarly, PA managers and other stakeholders may have different agendas, which can lead to both deflated and inflated scores (Cook & Hockings, 2011; Geldmann et al., 2015). Furthermore, METT assessments often rely on existing available knowledge, which is often insufficient (Mascia et al., 2014).
However, these issues have been shown to be less pronounced for measures related to planning, input, and processes, which is the part of the assessment we use to generate the four management dimensions, compared to outputs and outcomes (Cook, Carter, & Hockings, 2014;Mascia et al., 2014).

Moving forward
Overall, our study highlights the need to better understand if, how, and when management capacity and resources improve PA performance. Enhancing our ability to answer questions at a large scale will require collecting data for interventions as well as changes in biodiversity, both inside and outside PAs, in a standardized way that allows for comparing across regions and interventions. Such data do not currently exist for any global sample of PAs, but need to be created if the relationship between PA management quality and the impact on species outcomes is to be fully measured. Thus, we need to move beyond current, often opportunistic data collection activities to ensure that the resources already invested in monitoring schemes contribute to a greater whole. Reliance on ad hoc data collection has greatly impeded our ability to fully assess to what extent PAs have had an impact on the persistence of biodiversity. It has been suggested that large funding bodies such as the GEF could be potential leaders in the field, having the financial strength to develop and implement a more coherent monitoring system (Craigie, Barnes, Geldmann, & Woodley, 2015). However, as our results indicate, current efforts to collect such data have been spatially biased and may lack credibility. We propose three steps that will need to be addressed to ensure that future data on management effectiveness can be more useful for assessing the performance of PAs and tracking progress toward policy targets. One, the assessment process on the ground needs to be better streamlined across PAs and over time to ensure comparability. This will require detailed guidelines and trained independent evaluators to participate in the process. Two, better systems for collation of data, after the assessments have been conducted, both at the site and institutional levels, are required to ensure that data are available for analysis.
Three, counterfactual thinking should be integrated into the assessments. This last step will involve assessment of conditions in a comparable nonprotected area and gathering contextual data for the individual PAs as part of the management assessment. This will not be easy and, to succeed, will require the participation of PA agencies, NGOs, and researchers, as well as buy-in from governments. We need to learn, for example, from the medical field, which has developed standardized methods and databases to ensure a strong evidence-based approach to health problems (Pullin & Knight, 2001). Without such coordinated and standardized efforts, our understanding of what makes PAs effective will continue to rely primarily on small-scale studies with variable designs, or else on limited correlational studies such as our own (Geldmann et al., 2013). A similar need has been identified for the marine realm (Gill et al., 2017) and for studies quantifying threats to biodiversity. To achieve the Aichi Targets, specifically Targets 11 (PAs) and 12 (threatened species), and Sustainable Development Goal 15 (terrestrial biodiversity), we need to understand how to most effectively protect biodiversity. Our results support the argument (Pringle, 2017) that establishment is not enough, and that investments in the PAs after their establishment are key to halting biodiversity loss.