The role of scale in designing protected area systems to conserve poorly known species

Systematic conservation planning has a substantial theoretical underpinning that allows optimization of tradeoffs between biodiversity conservation and other socioeconomic goals. However, this theory assumes perfect spatial information about the locations of biodiversity features (e.g., species distributions). In practice, planners represent well-known taxa and other biodiversity "surrogates" in protected area systems, hoping that unmapped species will also be conserved. However, empirical research finds that surrogates predict species presence imperfectly, and sometimes rather poorly, at scales relevant to planning, and existing theory provides no further guidance. We developed new theory, explicitly incorporating aspects of spatial scale, for the representation problem when the locations of species distributions are unknown. Using probability theory and simulated and real species distributions, we found that the probability of adequately representing an unmapped species in a protected area system will be low unless the total fraction of the region being protected is larger than the species representation target. Furthermore, successful conservation depended critically on the relative sizes of the species distribution and of the individual protected areas; fewer, larger protected areas allowed the entire species distribution to fall into an unprotected gap. This scale-dependence varied with the configuration of the protected area system, with the conservation objective most likely to be attained if the individual protected areas were hyperdispersed (evenly spaced across the planning region).
Using these results, we developed three design principles for representing unmapped species in protected areas: (1) The fraction of the region placed in protected areas should be substantially larger than the species-level representation target; (2) Individual protected areas must be at least one to two orders of magnitude smaller than the unmapped species' distribution; and (3) Protected areas should be evenly dispersed over geographic space. We also performed preliminary investigations of the effects of surrogates and socio-economic cost data on the probability of adequately representing unmapped species, finding that the primary effect of surrogates may simply be to promote hyperdispersion of protected areas across the planning region, and that seeking to minimize opportunity costs gives poorer conservation results than random protected area placement.


INTRODUCTION
Habitat conversion and overharvesting are primary threats to terrestrial and marine biodiversity (Ehrlich 1981, Jackson et al. 2001), and are often ameliorated by implementing protected area systems (Soulé 1991, Bruner et al. 2001, Sodhi et al. 2004). Early protected areas were sited at locations with low potential for commercial or subsistence use; these failed to conserve the biodiversity that was most threatened by human activities (Pressey 1994, Pressey et al. 2002, Possingham et al. 2006, Margules and Sarkar 2007). Design principles (representation, adequacy, comprehensiveness, complementarity, etc.) were developed in the late 1990s and early 2000s to ensure that protected areas represent a range of biodiversity (Possingham et al. 2006, Margules and Sarkar 2007), and were subsequently linked to the principle of cost-effectiveness (meeting conservation goals for a minimum cost; Adamowicz 2005, Carwardine et al. 2008a, b), which now constitutes the standard approach to protected area planning (Pressey 2000, Moilanen et al. 2009b).
Designing systems of protected areas to conserve biodiversity is an inherently spatial problem: which locations will be included in the protected area system, and which will not? Likewise, the species and ecological communities that we seek to conserve are heterogeneously located in space. To the extent that co-location of protected areas and species will lead to the conservation of the latter, the question becomes: will putting a protected area in a certain location lead to overlap with the species of interest? This would seem to require knowing where the species occurs (henceforth, the "species distribution"), and early approaches assumed that the presence (or abundance) of each species in each potential protected parcel of land or ocean was known, or could be reliably inferred from other sorts of spatial data (Moilanen et al. 2009a). The typical objective was to maximize the number of species with a target number or fraction of occurrences represented in the protected area system, subject to a budget constraint (or the inverse: minimize the budget required to adequately represent all species; Moilanen et al. 2009a). This approach continues to the present day in developing protected area systems for well-studied taxa such as terrestrial vertebrates (Venter et al. 2014). As specified, this is not a spatial problem per se: one simply needs a list of species presences or abundances within each potential protected area. Explicit spatial considerations come into play as second-order considerations, when comparing potential systems that perform similarly on the primary objective. First, all else being equal, low fragmentation is considered valuable, on both ecological and enforcement grounds (Fahrig 2002). Although conceptualized as a spatial configuration problem, drawing on the landscape ecology literature, the anti-fragmentation criterion has spatial scale impacts, prioritizing large protected areas over small protected areas.
Second, the dispersal biology of particular species is used to set size and spacing guidelines, especially in marine protected areas: individual protected areas should be large enough so that most resident individuals will not leave the protected area during their normal day-to-day movements, and the spacing between protected areas should be small enough that individuals undertaking directed dispersal can reach one from the next (Moffitt et al. 2011). Finally, a combination of large distance between individual protected areas and large overall extent of the planning area is considered valuable to combat spatially contagious threats and disturbances that are not controlled by the protected status, such as hurricanes, oil spills, and the spread of disease (Allison et al. 2003, Game et al. 2008). Of course, the primary conceit (that we know which species will be represented in a given protected area system) is often untrue: for most species, we have at best only a broad delineation of geographic distribution and some ideas about habitat affinities, and for many species we have few (or no) reliable sightings, as location records tend to be concentrated in certain geographic areas (Margules and Austin 1994) and taxonomic groups (May and Harvey 2009). Collecting more data on species locations takes time, and delaying conservation action to acquire more information can result in net conservation losses if habitat conversion is ongoing (e.g., Grantham et al. 2008, 2009). The solution that is widely advocated and used is to instead ensure adequate representation of "surrogates": species whose spatial distribution is well known, along with biophysical habitat variables that have been mapped (Margules and Sarkar 2007).
A great deal of effort has gone into examining the spatial co-occurrence between one group of species and another, or between habitat variables and particular species, as well as evaluating the extent to which a protected area system designed with a particular set of surrogates adequately represents a target species that was not included in the design. The general conclusions are that many surrogates have some degree of predictive power, but that the prediction is rarely perfect (e.g., Rodrigues and Brooks 2007). If extensive presence data for a species are available, then integrating those with mapped surrogates via a species distribution model (e.g., Guisan and Zimmermann 2000) can improve predictions (for that species) relative to using the surrogates alone (Rodrigues and Brooks 2007, Franklin 2010). However, spatial predictions even from these models are limited by the set of mapped environmental variables; for most species, a subregion that is homogeneous with respect to the surrogates is actually heterogeneous in other (unmeasured) variables that affect the species' abundance or probability of occurrence. Furthermore, it is difficult to know a priori which surrogates will be useful in a particular planning context; even in broadly similar ecosystems, the performance of a particular surrogate set can vary tremendously between nearby geographic regions (Grantham et al. 2010). Finally, surrogates can only be tested on species for which we already have good knowledge of the spatial distribution; this is a biased sample of biodiversity, and there is no particular reason to expect that a surrogate that is effective for a well-known species will perform equally well for a little-known species (which will tend to be rare, small and cryptic).
This brings us back to the spatial nature of the problem. If we knew exactly where the species could be found, then we could apply spatial optimization techniques to place protected areas in locations that would be guaranteed to meet our conservation goals while minimizing other costs of conservation. Lacking good spatial information on a species, however, such optimization is impossible. Instead, the placement of protected areas becomes a spatial sampling exercise. In this context, considerations of spatial scale, namely extent (size of the planning region), grain (size of individual protected areas), and interval (spacing between individual protected areas), are critical, and their choice, relative to the spatial scale of the species distribution of interest, will strongly affect the expected success of the protected area system. In practice, these scale variables are often set arbitrarily: the extent is set by political boundaries, and the grain is set either by existing spatial subdivisions of the region (e.g., land parcels, watersheds) or by computational constraints on the total number of planning units; given these, the average interval is set by political or budgetary constraints on the total area in reserves, which controls the mean spatial intensity of the protected area system.
Our goal is to develop a theoretical framework that explicitly engages with these scaling issues, and to provide spatial configuration guidelines to improve the chances that a protected area system will represent species whose distributions are poorly known, have not been modeled, and may not be correlated with existing surrogates. The fundamental challenge is that both species distributions and protected areas are spatially autocorrelated; that is, if the hectare next door is within a particular species' distribution (or in a protected area), then there is a better than average chance that the hectare right here is also within the species' distribution (or protected area). These scale and configuration issues have been studied in the context of deploying point sample locations to detect the presence of a species (typically modeled as having a circular species distribution; e.g., Nicholson 1993, Berec et al. 2015); in many contexts, a regular lattice sample configuration maximizes the detection probability, and the detection probability increases with sample intensity in predictable ways. However, the overlap of two autocorrelated patterns has not previously been analyzed; we combine formal analysis of idealized spatial patterns with simulations of hypothetical and realistic species distributions and planning scenarios. For mathematical simplicity, we focus here on species about which we have no spatial information. We also primarily assume that we have no useful surrogates for the species; this could represent a true lack of relevant surrogates, or a focus on a sub-region that is uniform with respect to the surrogates we have mapped. We then ask, for a hypothetical species with a certain distribution size, what is the probability that we get lucky and adequately represent it in a protected area system?
It might seem that this probability could be rather low in a world in which devoting 20% of the land or ocean to conservation is considered a stretch goal, but we will show that attention to spatial scale and configuration can greatly improve the odds.

MATHEMATICAL ANALYSIS
There are a number of ways that conservation "success" has been defined in designing protected area systems. Early theoretical models simply sought to have at least one occurrence of a species represented within the system, essentially allowing all computations to be binary (Saetersdal et al. 1993, Haight and Snyder 2009). More recent applications, such as those employing Marxan in real-life planning situations, have a threshold amount of the species distribution that they strive to include in the system; the Marxan optimization function is set up so that little value is obtained below the threshold, and little additional value is obtained for going over the threshold (Ball et al. 2009). The ultimate measure of conservation value would be a quantitative model of extinction risk; a reasonable proxy (that avoids complex dynamic modeling) might be an increasing function of the fraction of the species distribution or suitable habitat included in the system (e.g., Lindenmayer and Possingham 1996). In the work that follows we use the threshold criterion as implemented in Marxan, but we expect that many of the qualitative results will apply to other value functions as well. In particular, we assume that our conservation objective is to incorporate a fraction c of a species' distribution into the protected area system; we call this objective "adequate representation" of the species.
First and foremost, the probability of achieving adequate representation depends on r, the fraction of the planning region in reserves. If the planning region matches the species' distribution, and the species is found everywhere within the planning region, then exactly r of the distribution will be incorporated in the reserves, and the conservation objective will be met with probability one if r > c, and will never be met if r < c (Fig. 1, heavy line).
Suppose that, instead of being found everywhere in the planning region, individuals of species i are in $N_i$ point locations, randomly distributed across the planning region. The purpose here is primarily heuristic, but this might be a reasonable approximation for species that are specialized to rare, small habitat patches (such as serpentine outcrops) that are not revealed in our environmental data layers. Then each occurrence of species i has a probability r of being covered by a protected area and, in expectation, a fraction r of each species will be included in the protected area system. However, the variance in the fraction conserved depends on $N_i$. The probability that the conservation objective for species i will be met is described by the upper tail of the binomial cumulative distribution function with $N_i$ trials and success probability r:

$$\Pr(\text{adequate representation}) = \sum_{k=\lceil c N_i \rceil}^{N_i} \binom{N_i}{k}\, r^{k} (1-r)^{N_i-k}.$$

If r > c, then this will approach one as $N_i$ gets large. However, setting r = c gives a chance of success of only about 50% even when $N_i$ is large; and even if $N_i$ = 50, the probability of meeting the adequate representation objective only approaches one if r = 2c (Fig. 1).

Fig. 1. Probability of representing 20% of a species' occurrences within a protected area system (c = 0.2), for a species that occurs in $N_i$ random locations within the region, derived from binomial sampling theory. The step function represents the outcome if the species is found everywhere in the planning region (effectively, the limit as $N_i$ goes to infinity).
If the species locations are clustered in a subregion of the landscape, and the individual protected areas are large, then it is intuitively clear that protection may well be all-or-nothing: if a protected area happens to be in the subregion, it will cover much of the species distribution, and if it does not, it will not. In expectation, fraction r of the species will be protected; but the adequate representation target will only be met with probability r (effectively, $N_i$ = 1 in Fig. 1).
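The point-occurrence model above can be explored numerically. The following is a minimal sketch, assuming that "adequate representation" means at least ⌈c·N_i⌉ of the N_i occurrences fall in protected areas; the function name `prob_adequate_points` and the example values of N_i are illustrative assumptions, not part of the original analysis.

```python
from math import ceil, comb


def prob_adequate_points(n_points: int, r: float, c: float) -> float:
    """Upper binomial tail: P(at least ceil(c * n_points) occurrences are protected)."""
    # Small tolerance so products such as 0.1 * 30 do not round up past an exact integer.
    k_min = ceil(c * n_points - 1e-12)
    return sum(
        comb(n_points, k) * r**k * (1 - r) ** (n_points - k)
        for k in range(k_min, n_points + 1)
    )


if __name__ == "__main__":
    c = 0.2  # representation target, as in Fig. 1
    for n_points in (1, 10, 50, 200):  # illustrative values, not taken from the paper
        row = [round(prob_adequate_points(n_points, r, c), 3) for r in (0.2, 0.3, 0.4)]
        print(n_points, row)
```

With a single occurrence the function returns r itself, which is the all-or-nothing case described for clustered species and large protected areas.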
So far, we have ignored explicit spatial considerations by imagining that the species occupies a set of point locations. Similar approximations could be made by assuming that individual protected areas are points. However, in most real conservation contexts, both protected areas and species distributions constitute one or more contiguous areas, with characteristics of area and shape. Here we assume that the planning region has been divided into identically sized planning units, each of which is a candidate for protection, and that the species of interest occupies some number of these planning units.
We define the following:
$A_{PR}$: area of the planning region
$A_{PU}$: area of a planning unit, or individual candidate protected area
$A_{Si}$: area occupied by species i
$s_i = A_{Si}/A_{PR}$: fraction of the planning region occupied by species i
$N_{PU}$: number of planning units in the planning region
$N_{Si}$: number of planning units that overlap the species distribution
r: fraction of the planning region to be put into protected areas
c: fraction of the species distribution that is desired to be in protected areas
$N_{PA}$: number of planning units that are given protected status
$N_{PSi}$: number of planning units within the species distribution that are given protected status
$N_{ci}$: number of protected planning units that must be within the species distribution to meet the conservation objective

For simplicity, we assume that the planning region contains an integer number of planning units ($A_{PR}/A_{PU} = N_{PU}$ is an integer), and that the protected area target can be met with an integer number of planning units ($rA_{PR}/A_{PU} = N_{PA}$ is an integer). The species distribution need not encompass an integer number of planning units; thus, $N_{ci} = \lceil cN_{Si} \rceil$, where $\lceil x \rceil$ is the ceiling function (the smallest integer greater than or equal to x).
The question is, if we choose $N_{PA}$ planning units at random to be protected areas, what is the probability that at least $N_{ci}$ of them are from the set of $N_{Si}$ planning units that overlap the species distribution? This is a stochastic process described by the hypergeometric distribution. Thus, the probability of obtaining a particular number of protected areas, $N_{PSi}$, within the species distribution is

$$\Pr(N_{PSi} = k) = \frac{\binom{N_{Si}}{k}\binom{N_{PU}-N_{Si}}{N_{PA}-k}}{\binom{N_{PU}}{N_{PA}}}.$$

If the species distribution is much smaller than an individual planning unit, and we make the approximation that the distribution is entirely within a single planning unit (that is, we ignore the cases where the distribution is divided by a planning unit boundary), then $N_{Si} = 1$ and the adequate representation goal is only achieved if $N_{PSi} = 1$. The probability of this occurring is

$$\Pr(N_{PSi} = 1) = \frac{N_{PA}}{N_{PU}} = r.$$

Notice that this is identical to the point distribution model with the species concentrated at a single point.

At the other extreme, when the species distribution is substantially larger than the individual planning unit, then we can ignore the marginal contributions of planning units that fall on the boundary of the species distribution. Here, the probability of attaining the conservation goal comes from the upper tail of the hypergeometric cumulative distribution function:

$$\Pr(N_{PSi} \ge N_{ci}) = \sum_{k=N_{ci}}^{\min(N_{Si},\,N_{PA})} \frac{\binom{N_{Si}}{k}\binom{N_{PU}-N_{Si}}{N_{PA}-k}}{\binom{N_{PU}}{N_{PA}}}.$$

Unfortunately, this does not have a closed form solution. However, we can gain some insight from the formulas for the mean and variance of the hypergeometric distribution. In particular, taking $N_{Si} = A_{Si}/A_{PU}$ (so that $N_{Si}/N_{PU} = s_i$), we can use them to find that the expectation and variance of the fraction of the species distribution in protected areas are

$$E\!\left[\frac{N_{PSi}}{N_{Si}}\right] = r, \qquad \mathrm{Var}\!\left[\frac{N_{PSi}}{N_{Si}}\right] \approx \frac{r(1-r)(1-s_i)}{N_{Si}} = r(1-r)\,\frac{1-s_i}{s_i}\,\frac{A_{PU}}{A_{PR}}$$

(the approximation is simply to replace $N_{PU} - 1$ with $N_{PU}$ in the formula for the hypergeometric variance, which will be inconsequential for most reasonably sized planning problems). This is valuable because, if r > c (which it must be if we are to have a reasonable chance of success), then the probability of meeting the adequate representation target decreases as the variance in the fraction reserved increases. Lower values of $s_i$ (smaller absolute distribution size, relative to the planning region) make conservation harder; of more interest, increasing the area of individual planning units (without changing the planning region; i.e., using fewer planning units and, ultimately, since the planning unit sets the scale of the individual protected areas, having fewer larger protected areas) also makes adequate representation more difficult (Fig. 2). The reverse is true if r < c, but the adequate representation probability is always small in that case.
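Although the hypergeometric upper tail has no closed form, it is easy to evaluate exactly for a given planning problem. The sketch below assumes whole planning units only (boundary units are ignored, as in the approximation above); the function names and the example sizes are hypothetical choices for illustration.

```python
from math import ceil, comb


def prob_adequate_units(n_pu: int, n_pa: int, n_s: int, c: float) -> float:
    """P(at least ceil(c * n_s) of the n_s occupied units are among n_pa protected units)."""
    n_c = ceil(c * n_s - 1e-12)  # tolerance guards against floating-point round-up
    total = comb(n_pu, n_pa)     # number of equally likely protected area systems
    favorable = sum(
        comb(n_s, k) * comb(n_pu - n_s, n_pa - k)
        for k in range(n_c, min(n_s, n_pa) + 1)
    )
    return favorable / total


def var_fraction_protected(n_pu: int, n_pa: int, n_s: int) -> float:
    """Approximate variance of the protected fraction, replacing n_pu - 1 with n_pu."""
    r = n_pa / n_pu
    s = n_s / n_pu
    return r * (1 - r) * (1 - s) / n_s


if __name__ == "__main__":
    n_pu, n_pa, c = 10_000, 2_000, 0.1  # r = 0.2; sizes chosen only for illustration
    for n_s in (1, 4, 16, 64, 256):
        p = prob_adequate_units(n_pu, n_pa, n_s, c)
        v = var_fraction_protected(n_pu, n_pa, n_s)
        print(f"N_Si={n_s:4d}  P(adequate)={p:.4f}  Var(fraction)={v:.5f}")
```

Holding the planning region fixed, a smaller value of `n_s` corresponds either to a smaller species distribution or to larger planning units; both raise the variance and lower the probability of adequate representation when r > c.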
Of course, the planning units that overlap the boundaries of the species distribution cannot really be ignored. Properly incorporating them requires using a multitype hypergeometric distribution, with a type for each category of overlap between the planning unit and the species distribution. The probability distribution of types depends on the shapes of the planning units and of the species distribution; rather than extend formal theory into this messy scenario we turn to simulations.

Square species distributions
To examine the interaction between compact species distributions and protected area configuration while maintaining a high level of abstraction, we simulated species having square distributions, randomly located within the planning region. Because spatial sampling theory suggests that a hyperdispersed sampling design (e.g., a square lattice of sample locations) best captures spatial variability in a landscape (Quenouille 1949, Das 1950, Bellhouse 1977, Olea 1984), we constructed protected area systems in one of two ways: selecting planning units at random to protect, and maximizing hyperdispersion by placing protected areas with equal spacing between them. We used r = 0.2 and c = 0.1 throughout all of the simulations.
For the random configuration, we set up a planning region of $10^4$ square planning units, each with unit area. For each of the species distribution areas in the set $\{2^{-16}, 2^{-15}, 2^{-14}, \ldots, 2^{10}\}$ units, we repeated the following algorithm $10^4$ times:
1. place a square species distribution of specified size randomly in the planning region, with the entire species distribution within the region;
2. randomly select fraction r of planning units to put in protected areas (thus, each individual protected area has unit area); and
3. calculate the fraction of the species distribution in protected areas.
For the hyperdispersed configuration, we set up a system where each protected area is a square with unit area, evenly spaced in a grid such that fraction r of the total region is protected. For each of the species distribution areas in the set $\{2^{-16}, 2^{-15}, 2^{-14}, \ldots, 2^{10}\}$ units, we repeated the following algorithm $10^4$ times:
1. place a square species distribution of specified size randomly in the planning region, with the entire species distribution within the region; and
2. calculate the fraction of the species distribution in protected areas.
For each set of simulations, we calculated the fraction of replicates in which at least c of the species distribution was included in protected areas; this was interpreted as being the probability of achieving the adequate representation objective for that species.
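The two simulation procedures above can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the function names are hypothetical, the replicate counts are kept small for speed, and the hyperdispersed case assumes an unbounded square lattice with period $\sqrt{1/r}$ (so edge effects of a finite region are ignored).

```python
import math
import random


def simulate_random(area, r, c, reps, rng, n=100):
    """Random configuration: protect a random fraction r of an n x n grid of unit cells."""
    side = math.sqrt(area)
    cells = [(i, j) for i in range(n) for j in range(n)]
    hits = 0
    for _ in range(reps):
        protected = set(rng.sample(cells, round(r * n * n)))
        # Place the square distribution entirely inside the planning region.
        x0, y0 = rng.uniform(0, n - side), rng.uniform(0, n - side)
        covered = 0.0
        for i in range(int(x0), min(int(x0 + side), n - 1) + 1):
            wx = min(x0 + side, i + 1) - max(x0, i)  # overlap width with column i
            for j in range(int(y0), min(int(y0 + side), n - 1) + 1):
                if (i, j) in protected:
                    wy = min(y0 + side, j + 1) - max(y0, j)
                    covered += wx * wy
        hits += covered / area >= c
    return hits / reps


def covered_length(lo, hi, period):
    """Length of [lo, hi] lying inside the intervals [k * period, k * period + 1]."""
    total, k = 0.0, math.floor(lo / period)
    while k * period < hi:
        total += max(0.0, min(hi, k * period + 1.0) - max(lo, k * period))
        k += 1
    return total


def simulate_lattice(area, r, c, reps, rng):
    """Hyperdispersed configuration: unit squares on a square lattice covering fraction r."""
    side, period = math.sqrt(area), math.sqrt(1.0 / r)
    hits = 0
    for _ in range(reps):
        # The lattice is periodic, so a uniform offset within one period suffices.
        x0, y0 = rng.uniform(0, period), rng.uniform(0, period)
        # Protected set is a product of 1-D interval unions, so overlap area factorizes.
        covered = covered_length(x0, x0 + side, period) * covered_length(y0, y0 + side, period)
        hits += covered / area >= c
    return hits / reps


if __name__ == "__main__":
    rng = random.Random(0)
    for exponent in (-10, -2, 2, 4, 10):  # a subset of the areas used in the text
        area = 2.0**exponent
        p_rand = simulate_random(area, 0.2, 0.1, 200, rng)
        p_latt = simulate_lattice(area, 0.2, 0.1, 200, rng)
        print(f"area=2^{exponent}: random={p_rand:.2f}, lattice={p_latt:.2f}")
```

Each function returns the fraction of replicates in which at least c of the species distribution is protected, which is the quantity interpreted above as the adequate representation probability.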

Realistic species distributions
Real species distributions are not squares: they have irregular shapes, and are often separated into disjoint areas. There has been relatively little systematic study of species distribution shapes; the only generalization is that they tend to be wider in directions perpendicular to the dominant climate gradient, relative to their width along the gradient. On continental scales, this often results in east-west bands, whereas on mountains we see distributions forming elevation bands (Gaston 2003).
Rather than inventing an algorithm to simulate realistic distributions, we used a collection of actual species distributions: range maps of the 1281 mammal species that live in Africa (IUCN 2012). To simulate the process of conservation planning in the region occupied by these species, we divided the continent into squares with areas of 625, 2500, or 10000 km² (25, 50, or 100 km on a side); excluding water bodies, cultivated terrestrial areas, and artificial surfaces (based on 300-m resolution remote sensing data from the European Space Agency Environmental Satellite's imaging spectrometer; Bicheron et al. 2008), these formed the planning units.
With each of the three sets of planning units, we performed two simulations. First, we selected a fraction r = 0.2 of the planning units at random, to represent a protected area system that was spatially random relative to species locations. For each of 100 replicate simulations, we evaluated whether each species had a fraction of its distribution equal to or greater than the conservation target (c = 0.1) included in protected areas. We then calculated the species representation probabilities as the fraction of the 100 replicates in which the representation target was met.
Second, we enforced a geographically hyperdispersed protected area configuration by dividing the planning region into 1485 grid cells, each 1.33 degrees of latitude or longitude on a side. Using the 2500 km² planning units, we selected planning units so as to ensure that fraction r = 0.2 of each grid cell was included in the protected area system. We implemented this in the conservation planning software Marxan (Ball et al. 2009), treating each grid cell as a conservation feature and setting the boundary length modifier to zero; the cost function was set to area. The resulting protected area systems were biased towards those planning units on the boundaries between grid cells, but were otherwise randomly located within each grid cell. Different random initializations led to different protected area systems that all met the geographic representation targets; we performed 100 replicates and analyzed them as for the random systems.
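The idea of enforcing equal representation of every grid cell can be illustrated without Marxan by a simple stratified random draw. The sketch below is a stand-in technique, not the Marxan procedure described above: it assumes each planning unit belongs wholly to one grid cell (whereas real planning units can straddle cell boundaries), and the function name and the toy 30 x 30 layout are hypothetical.

```python
import random
from collections import defaultdict


def stratified_select(unit_to_cell, r, rng):
    """Pick about a fraction r of the planning units within every grid cell."""
    units_by_cell = defaultdict(list)
    for unit, cell in unit_to_cell.items():
        units_by_cell[cell].append(unit)
    selected = set()
    for units in units_by_cell.values():
        # Rounding means the per-cell fraction is only approximately r in general.
        selected.update(rng.sample(units, round(r * len(units))))
    return selected


if __name__ == "__main__":
    # Toy layout: 30 x 30 planning units grouped into 5 x 5 blocks (36 grid cells).
    unit_to_cell = {(i, j): (i // 5, j // 5) for i in range(30) for j in range(30)}
    chosen = stratified_select(unit_to_cell, 0.2, random.Random(0))
    print(len(chosen), "of", len(unit_to_cell), "planning units selected")
```

Unlike the Marxan runs, this draw has no bias towards units on grid cell boundaries, so it only approximates the configurations analyzed in the text.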

SIMULATION RESULTS
In the simulations with square species distributions, the probability of conservation success was one when the species distribution was large relative to the individual planning units. Since r = 2c, this is not unexpected. However, for both random and hyperdispersed protected area configurations, the probability of meeting the conservation objective declined as the species distribution area declined relative to the planning unit area, asymptoting to r (Fig. 3A). This can be understood as the species distribution becoming so small that it can fit into the gap between individual protected areas.
There is a range of scales over which the probability of meeting the conservation objective is greater with the hyperdispersed than with the random protected area configuration (Fig. 3A). Although both systems have the same mean gap size, the random configuration will have some gaps that are larger than average, whereas the gap sizes are identical in the hyperdispersed configuration. Thus, a species with a distribution that is somewhat larger than the mean gap size is guaranteed to overlap some protected areas in the hyperdispersed configuration, but may fall into a gap in the random configuration.
When applied to realistic species distributions, random protected area systems show a similar dependence of representation probability on the relative size of the species distribution and the individual protected areas (Fig. 3B). The pattern is identical for the three different sizes of protected area (compare the loess curves in Fig. 3B). The asymptote appears to be nearly 50% larger than r. The scatter around the trend exhibited by individual species can be accounted for mostly by binomial sampling error: with 100 replicates, the sampling standard deviation is 0.126 when the representation probability is 0.2 or 0.8 and is 0.158 when the representation probability is 0.5.
Using representation of latitude-longitude blocks to approximate a hyperdispersed protected area configuration substantially improves the adequate representation probability of realistic species distributions, with some species being guaranteed adequate representation with distributions up to three orders of magnitude smaller than under random protected area configurations (Fig. 3C). The hyperdispersed configuration also outperforms the random configuration in expectation, across a wide distribution of scales (compare dotted curves in Fig. 3D).
Across all scales where adequate representation is not guaranteed for the square species distributions, the adequate representation probability is higher for realistic species distributions than for square species distributions (Fig. 3D).

DISCUSSION
Through mathematical and simulation analyses, we have shown that the scale and configuration of protected area systems affect their ability to represent species that were not included as explicit targets in the planning process. Using a metric of conservation success that is commonly employed in systematic conservation planning (include a target proportion, c, of the species' distribution in protected areas), we found that, when the protected areas are spatially random with respect to the location of the species, substantially more than c of the planning region needs to be protected in order to have a high certainty of meeting the conservation target. Furthermore, we found that, even when the fraction of the region being protected is twice as large as the target protection level for a given species, individual protected areas need to be one to two orders of magnitude smaller than the species distribution in order to have a high probability of including the target fraction of the species distribution in the protected area system. Finally, we found that hyperdispersed protected areas could reliably represent species with distributions an order of magnitude smaller than those represented by random protected area configurations.
Previous work on spatial sampling in relation to detecting species with compact distributions has examined the probability that one or more point samples fall within the species distribution (typically modeled as having a circular species distribution; e.g., Nicholson 1993, Berec et al. 2015). Our research extends this work by examining sample units that are themselves compact regions of nonzero area, and evaluating not just the probability of any overlap between the samples and the species distribution, but of overlap above a target proportion of the species distribution. As in the prior models, we find that a hyperdispersed distribution (ideally, a uniformly spaced array) of sample units maximizes the probability of achieving the overlap target, and that the representation probability increases abruptly as the mean gap between samples decreases below the scale of the species distribution. However, with non-point samples, the probability of achieving the overlap target depends not only on the intensity and configuration of the samples, but also on the fraction of the total area being sampled (Fig. 2). This relationship, in turn, depends critically on the intensity (number of sampling units), with a very large number of sample units being required to ensure representation without setting the total sample fraction to be substantially larger than the target overlap fraction.
These results come from analysis of an extreme scenario in which we have no information about the spatial location of a species. If the species distribution has been mapped, of course, the map can be incorporated directly into the conservation planning objective function. If a mapped variable can serve as a more or less reliable surrogate for the species, then the discrepancy between the total area in protection and the ensured representation of the species will be reduced. Overall, surrogate effectiveness is variable and somewhat unpredictable (Rodrigues and Brooks 2007, Grantham et al. 2010); thus, treating surrogates as if they were perfect substitutes for species distributions is risky.
Another way of thinking about the problem is that, having taken various surrogates into account, an area having uniform characteristics from the point of view of the surrogates may still be heterogeneous from the perspective of the species; the conservation planner must treat the actual species distribution as unknown within that part of the map.
Where there are sufficient records of a particular species' occurrences to construct a species distribution model (SDM), that model will typically prove to be a more effective surrogate for that species than direct use of the underlying environmental variables (Rodrigues and Brooks 2007). Nevertheless, SDMs are still imperfect representations of actual species distributions, both because of omitted explanatory variables and because of small or non-systematic spatial samples of species presence and abundance (Dennis and Thomas 2000, Kadmon et al. 2004, Barry and Elith 2006, Fourcade et al. 2014). The standard measure of SDM accuracy conflates errors of omission and commission; the latter (in which a location believed to be suitable for a species is actually not) are particularly problematic for conservation planning, especially given that such errors tend to be spatially autocorrelated (e.g., Franklin et al. 2009). Quantifications of these errors (Heikkinen et al. 2006, Elith and Leathwick 2009) will be needed to make reliable predictions for conservation planning.
The scale-dependence of the conservation effectiveness of protected area systems challenges current conservation planning practice, because planning units are often chosen for computational or administrative convenience, rather than after explicit consideration of ecological scales. Furthermore, this scale-dependence may explain some of the inconsistencies in evaluations of surrogates, many of which compare surrogate-based systems with random systems (Rodrigues and Brooks 2007). The latter can perform very well or very poorly, depending on the relative size of planning units and species distributions; accounting for this scale dependence may lead to a better understanding of surrogate effectiveness.
Surrogates are not located randomly across the landscape, and species- and habitat-based surrogates are often nested within bioregional or subregional boundaries. To our knowledge, the spatial pattern of conservation surrogates has not been studied; but our collective experience suggests that a comprehensive set of surrogates tends to divide the planning region into relatively small, spatially localized subregions. To the extent that this is true, the use of surrogates may be enforcing a hyperdispersed spatial arrangement of protected areas; our results show that, over some spatial scales, this could lead to improved conservation performance when compared to random protected area configurations, even if the surrogates have little biological relationship to the species distributions. In an analysis of why the Great Barrier Reef Marine Park rezoning was effective at protecting a set of conservation features that were undescribed at the time, Bridge et al. (2015) found that ensuring representation within each of a number of geographically defined subregions was just as effective as ensuring representation within each of the ecologically defined bioregions used in the actual rezoning. We further explored this issue with the African mammal range data, constructing surrogates using 29 'natural' landcover types (including various classes of forest, shrub, herbaceous, grassland, aquatic, and deserts; Bicheron et al. 2008), nested within 123 ecoregions (Olson et al. 2001), for a total of 1875 conservation targets. We used Marxan as in the geographically hyperdispersed simulation, seeking to include 20% of the area in each target in the protected area system. This approach performed better than random protected area configurations, but no better than, and sometimes worse than, hyperdispersed protected area configurations constructed with the latitude-longitude grid (Fig. 4).
Most conservation planning exercises use better-crafted surrogates than we used in this example, but this result does suggest that the appropriate null model for testing surrogate effectiveness, even after accounting for scale, should be geographically hyperdispersed, rather than random, protected area configurations.
Our finding that hyperdispersion of protected areas in geographic space is effective in the absence of spatial data on species distributions is reminiscent of prior work showing that environmental diversity (ED) can be maximized by choosing reserves to be evenly spaced across environment space (Faith and Walker 1996, Faith 2003). In regions with strong environmental gradients, the two results may be equivalent. However, many species distributions depend on both the environment (through niche constraints) and on geography (through biogeographic constraints and extinction-colonization dynamics). Thus, two geographically distant locations might have similar environmental conditions (and thus be considered redundant under ED), yet harbor very different communities of species. It may well turn out that the ideal protected area system will be hyperdispersed in both environment space and geographic space. We repeated the exercise in Fig. 4 using landcover as a surrogate without nesting it within bioregion; removing the geographical constraint substantially lowered the adequate representation probability for many species. An important direction for future research will be to extend this to sets of surrogates that more completely describe the relevant environmental space.
Fig. 4. The adequate representation probability (c = 0.1) of realistic species distributions, as a function of the ratio of protected area size to distribution size, using a surrogate (represent r = 0.2 of each landcover type within each ecoregion). Circles represent individual species; the dark green curve is a loess fit to the points. The red and blue dotted curves show the performance of random and hyperdispersed protected area systems applied to the same species distributions (Fig. 3).
The issues raised by our results have some superficial similarities to the SLOSS (single large or several small) debate of the 1970s (Diamond 1976, Diamond and May 1976, Simberloff and Abele 1976a). The primary consideration in that debate revolved around species number-area relationships and the spatial scale of species turnover (beta diversity), which would determine whether several disjoint protected areas might represent more species than a single large protected area with the same total areal extent (Simberloff and Abele 1982). Although the pattern and scale of beta diversity emerge from the pattern and scale of individual species distributions (together with the community interactions that determine patterns of co-occurrence), our results pertain even to a single focal species.
These results suggest that adequate representation of a species in a protected area system requires that the individual protected areas be at least one to two orders of magnitude smaller than the unmapped species' distribution, which for narrow endemics is quite small. Some such species are associated with identifiable surrogates, such as ecotones (Rouget et al. 2003, Kark et al. 2007), evolutionary refugia, and special features such as seamounts and pinnacles (Green et al. 2007). However, we are not aware of extensive empirical studies to show what fraction of narrow-range endemics are associated with such features. Interestingly, while conservation biologists tend to bemoan the fact that most protected areas are small (e.g., Gurd et al. 2001), a large number of small protected areas may be just what is needed for these narrow-range species.
In contrast, many flagship species have low densities and individuals with large home ranges; small protected areas scattered across the species distribution would not provide them adequate long-term conservation. Conservation planning often focuses on such species, seeking to make individual protected areas large; it is hoped that protecting these flagship or umbrella species will benefit other species as well (Roberge and Angelstam 2004). However, our results show that such a system will not effectively conserve narrow-distribution endemic species with unknown distributions, unless they exhibit strong ecological associations with the flagship species. Resolving this dilemma will probably require a multiscale approach (Boyd et al. 2008), with a few large protected areas to conserve area-demanding species, together with a spectrum of smaller protected areas of variable size and connectivity to gain a representative sample of ecological and geographic spaces occupied by endemics. Some preliminary simulations, however, suggest that this will expand the total area that needs to be under protection: the large protected areas represent so few of the narrow-range species that the total area requirements for small protected areas are nearly as large as if there had been no large protected areas at all.
Formally, our analysis focused on the size of individual protected areas, relative to species distributions. In practice, however, it is the size of the gaps between protected areas that matters. This means that, even when individual protected areas are small, if they are clustered in certain subsets of the planning region then there will be large gaps into which a substantial number of unlucky species may fall. Although we would not intentionally build a system with this feature, it can be an outcome of well-intentioned policy. Current best practice often seeks to design protected area systems that meet a target level of representation for the least possible socioeconomic cost (Carwardine et al. 2008b, Klein et al. 2008), reflecting an effort to get the greatest return on investment, given limited resources for conservation. However, if planners do not have distribution data for all species, this may lead to perverse conservation outcomes, as some species may preferentially use habitat that is also valuable to humans, and thus be underrepresented in a least-cost protected area system. This is particularly common in the establishment of marine protected areas, which often systematically withhold protection from heavily fished locations, either through explicit design (Klein et al. 2008) or as the outcome of information asymmetries among stakeholders (Lynch 2006). This outcome represents the minimum conservation benefit that could be obtained by putting a given fraction of the seascape in protected areas. Even when biodiversity and humans have different reasons for spatial preferences, the fact that both cost maps and species distributions are spatially autocorrelated means that some ''unlucky'' species are concentrated in regions of high economic value.
If these unlucky species have not been mapped, we will fail to conserve them when we exclude high-value locations from the system, essentially replicating historical conservation efforts that tended to concentrate protected areas in high elevation ''rock and ice'' sites and away from agriculturally productive ecosystems (Scott et al. 2001, Pressey et al. 2002). We illustrate this challenge by aiming to represent African mammals in low-cost areas, using agricultural land value data (Naidoo and Iwamura 2007) as an opportunity cost as in Carwardine et al. (2008a). We retained the geographic representation goal, seeking to reserve 20% of the land in each 1.33 degree grid cell, but set Marxan's objective function to minimize the total opportunity cost. The resulting protected area systems had a performance that was comparable to placing protected areas at random across the whole continent (Fig. 5), and performed less well than geographic representation targets alone. Furthermore, a substantial number of species had a zero probability of being adequately represented, in sharp contrast to our main results. While this analysis assumed no information about the location of species, preventing optimization, we expect that an optimization approach based on imperfect surrogates would produce qualitatively similar results: locations that appear identical from the perspective of the surrogates will differ in unmeasured characteristics that create heterogeneity in value both to the species and to humans. Local stakeholders often have a much more spatially resolved map of socio-economic value than biologists do of conservation value; whether this is incorporated systematically in the planning process or used to recommend post hoc adjustments to the plan, the result can be the systematic exclusion of certain species from the protected area system.
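The logic of a least-cost selection under a geographic representation constraint can be sketched as follows. This is a simplified greedy stand-in, not Marxan (which searches for low-cost solutions by simulated annealing) and not the code used for Fig. 5: it assumes a rectangular raster of opportunity costs, square geographic blocks, and a fixed fraction of cells to protect in each block. The function name, the cost values, and the tie-breaking rule are hypothetical.

```python
def least_cost_per_block(cost, block, fraction):
    """Protect the cheapest `fraction` of cells within each block x block tile.

    cost: list of rows of opportunity costs (hypothetical units).
    Returns a grid of booleans with the same shape (True = protected).
    Ties in cost are broken by row, then column.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    protected = [[False] * n_cols for _ in range(n_rows)]
    for r0 in range(0, n_rows, block):
        for c0 in range(0, n_cols, block):
            # Collect the cells of this tile (tiles at the edge may be smaller).
            cells = sorted((cost[r][c], r, c)
                           for r in range(r0, min(r0 + block, n_rows))
                           for c in range(c0, min(c0 + block, n_cols)))
            k = round(fraction * len(cells))  # number of cells to protect here
            for _, r, c in cells[:k]:
                protected[r][c] = True
    return protected

# Illustrative 4 x 4 cost raster, 2 x 2 blocks, 25% protected per block.
cost = [[5, 1, 2, 9],
        [3, 4, 8, 7],
        [6, 7, 3, 2],
        [2, 9, 1, 0]]
system = least_cost_per_block(cost, block=2, fraction=0.25)
for row in system:
    print("".join("#" if cell else "." for cell in row))
```

Because cost surfaces are spatially autocorrelated, a rule of this kind tends to leave contiguous high-value areas unprotected within each block, which is where the ''unlucky'' species described above would be concentrated.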
Fig. 5. The adequate representation probability (c = 0.1) of realistic species distributions, as a function of the ratio of protected area size to distribution size, by protecting the r = 0.2 least-cost locations within each degree of latitude and longitude. Circles represent individual species; the orange curve is a loess fit to the points. The red and blue dashed curves show the performance of random and hyperdispersed protected area systems applied to the same species distributions (Fig. 3).
Our results suggest three principles for designing protected area systems to conserve species whose spatial distribution is unmapped or that are poorly represented by mapped surrogates. First, if the conservation goal is to include a stated fraction of the species distribution in the protected area system, then the total fraction of the region in protected areas must be substantially larger than the species-specific conservation target. A more quantitative statement of this principle will depend on the configuration and scale of both the protected areas and the species distributions, but our analyses here suggest that a factor of two may be a good rule of thumb.
Second, adequate representation in a protected area system requires that the gaps between individual protected areas be no larger than the species distribution. If the protected areas are randomly distributed with respect to the species, and the protected area fraction is twice the species representation target, then this requires that the individual protected areas be at least one to two orders of magnitude smaller than the unmapped species' distribution.
Third, at any scale of protected area size and spacing, a geographically hyperdispersed configuration that minimizes the variance in gap size maximizes the performance of the protected area system in representing unmapped species. In particular, when developing protected area systems for regions in which biodiversity has not been well studied, such as remote forests, a data-free grid-based configuration may be just as effective as a design based on expensive mapping of a small number of surrogates, and will certainly outperform a cost-based analysis that results in an aggregated protected area configuration. Where there are spatial data relevant to some species but not others, techniques developed for spatially balanced sampling (e.g., Stevens and Olsen 2004) may help create a configuration that maximizes the chance of also protecting unmapped species.
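As a rough worked example, the three principles can be combined into a back-of-envelope calculation. The sketch below is an assumption-laden illustration rather than a planning tool: it takes the factor-of-two rule of thumb for the total protected fraction, a size ratio of 100 (the stricter end of the "one to two orders of magnitude" range), and a regular square grid with one protected area per grid cell. The function name, default values, and units are hypothetical.

```python
import math

def design_guidelines(target, distribution_area,
                      fraction_factor=2.0, size_ratio=100.0):
    """Return (total protected fraction, maximum protected-area size,
    grid spacing) for an unmapped species.

    target: species-level representation target (e.g., 0.1).
    distribution_area: area of the smallest distribution of concern (km^2).
    fraction_factor: assumed multiplier from principle 1 (rule of thumb: 2).
    size_ratio: assumed distribution-to-protected-area size ratio (10 to 100).
    """
    # Principle 1: protect substantially more than the target fraction.
    total_fraction = min(1.0, fraction_factor * target)
    # Principle 2: individual protected areas much smaller than the distribution.
    max_pa_area = distribution_area / size_ratio
    # Principle 3: even spacing. On a square grid with one protected area per
    # cell, the protected fraction is area / spacing**2, so solve for spacing.
    grid_spacing = math.sqrt(max_pa_area / total_fraction)
    return total_fraction, max_pa_area, grid_spacing

fraction, pa_area, spacing = design_guidelines(target=0.1, distribution_area=5000.0)
print(f"protect {fraction:.0%} of the region in areas of at most "
      f"{pa_area:.0f} km^2, spaced about {spacing:.1f} km apart")
```

For a 10% target and a 5000 km^2 distribution, these assumptions give a 20% protected fraction in areas of at most 50 km^2 on a grid of roughly 16 km spacing, so the gaps between protected areas are far smaller than the distribution itself.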
In this work we have developed the basic underpinnings of a spatially explicit theory of protected area system design. In the interests of mathematical simplicity, we have focused on a single measure of conservation effectiveness (include a specified proportion of the species distribution in protected areas), ignored patterns of co-occurrences between species, and left out potential negative influences of fragmentation on wide-ranging species. These are all areas for future work, as is theory on how to incorporate imperfect spatial information (such as that provided by SDMs). In addition, we need additional theory on how various aspects of species distribution shape affect the probability of a protected area system adequately representing the species; this can be coupled with empirical research on patterns of species distribution shapes across taxa and environments. Finally, an important open question is whether the insights generated by this theory can help us design protected area systems that can account for uncertainty in future species distributions under climate change.