A comparison of population viability measures

Abstract The viability of populations can be quantified with several measures, such as the probability of extinction, the mean time to extinction, or the population size. While conservation management decisions can be based on these measures, it has not yet been explored systematically whether different viability measures rank species and scenarios similarly and whether one viability measure can be converted into another to compare studies. To address this challenge, we conducted a quantitative comparison of eight viability measures based on the simulated population dynamics of more than 4500 virtual species. We (a) compared the ranking of scenarios based on different viability measures, (b) assessed direct correlations between the measures, and (c) explored whether parameters in the simulation models can alter the relationship between pairs of viability measures. We found that viability measures ranked species similarly. Despite this, direct correlations between the different measures were often weak and could not be generalized. This can be explained by the loss of information due to the aggregation of raw data into a single number, the effect of model parameters on the relationship between viability measures, and because distributions, such as the probability of extinction over time, cannot be ranked objectively. Similar scenario rankings by different viability measures show that the choice of the viability metric in many cases does not alter which population is regarded as more viable or which management option is the best. However, the more two scenarios or populations differ, the more likely it becomes that different measures produce different rankings. We thus recommend that PVA studies publish raw simulation data, which not only describes all risks and opportunities to the reader but also facilitates meta‐analyses of PVA studies.


| INTRODUCTION
Population-viability analyses (PVAs) are broadly used in ecology to assess the potential development of populations over time, to characterize their current status and future development, and to suggest effective conservation interventions (Beissinger & McCullough, 2002). Even though PVAs have been criticized for being too imprecise or of low quality (Chaudhary & Oli, 2020, 2021; Morrison et al., 2016), they are still considered a helpful tool in conservation biology (Brook, 2000; Brook et al., 2002), in particular, to evaluate the status, threats, and management options for populations (Lacy, 2019). Soulé (1987) defined viability as the minimum conditions for long-term persistence and adaptation of populations, following the concept of the minimum viable population (Shaffer, 1981). According to Soulé (1987), population viability involves a range of properties beyond persistence, including genetic properties, individual vigor, fertility, and fecundity. Various measures exist to quantify population viability, among them the mean time to extinction and the probability of extinction. Multiple attempts have been made to improve existing viability measures or to introduce new measures, e.g., the expected minimum population size N min (t) (McCarthy & Thompson, 2001) or the intrinsic mean time to extinction T m (Grimm & Wissel, 2004). However, no measure has been proposed, or broadly adopted, that can successfully be applied to a broad range of questions of conservation practitioners or to compare different viability studies. The lack of a unifying measure probably results from the complexity of the viability concept and reflects the multifaceted nature of extinction risk, as well as the diversity of questions that PVA is used to answer.
Viability measures can be roughly categorized into three classes, namely probabilistic measures, time measures, and population-size measures.
(1) Probabilistic measures, especially the probability of extinction P 0 (t), were the earliest and most widely used class of viability measures. Probabilities of extinction, quasi-extinction (Ginzburg et al., 1982), or the risk of decline focus on the likelihood of extinction or falling below critical population-size thresholds within a defined time horizon. Therefore, they require setting population size (N) and time thresholds (t), thus incorporating a subjective decision into viability assessment. (2) Time measures, especially the mean time to extinction, are frequently used as well (Foley, 1994;Reed et al., 2002). They highlight the temporal component of viability and the crucial role of population survival. The importance of time measures was underlined by the development of the intrinsic mean time to extinction T m (Grimm & Wissel, 2004), which considers the skewness of the distribution of extinction times (Ludwig, 1996) and the probability of reaching the established phase. (3) Examples of population-size measures are the expected average population size N E (t) and the expected minimum population size N min (t) (McCarthy & Thompson, 2001). In addition to these three rough categories, the population growth rate λ is not a traditional viability measure but a very important population property which is clearly related to viability, as it describes the trend of a population size (declining, stable, or increasing) (Lande, 1993). We therefore regard it here as a viability measure as well.
Together, over 20 different viability measures have been used in the literature (Pe'er et al., 2013), demonstrating the multidimensionality of the viability concept. It has been argued (e.g., Burgman et al. (1993)) that different viability measures might address different questions. For instance, the probability of extinction can help decision-makers assess how necessary it is to act, while the mean time to extinction might be suitable to assess how urgent an intervention may be. Choosing the best of several conservation actions could be done based on the expected population size at a given time (e.g., 10 years) after an intervention was taken.
Attempts to compare the results from multiple PVA studies that used different viability measures concluded that quantitative comparisons and generalizations remain virtually impossible, and little progress has been made over time (Burgman & Possingham, 2000; Crone et al., 2011; Naujokaitis-Lewis et al., 2009; Pe'er et al., 2013; Shaffer et al., 2002). There remains a need to assess different measures in terms of their consistency and suitability for different purposes. Ideally, such an assessment could guide the choice of viability measures and help mitigate the risk that the choice of a certain measure over another may affect the outcomes (e.g., in terms of the proposed intervention). Furthermore, it would be useful to identify the quantitative relationships between viability measures, in order to advance potential attempts for integration and quantitative analyses across studies to foster generalizations. If different viability measures ranked the same set of populations, species, or scenarios differently, this would complicate decision-making in nature conservation. By contrast, a consistent ranking of viability measures would enhance comparability.
This study compares eight viability measures: probability of extinction P 0 (t), risk of decline to a threshold population size P N (t), probability of quasi-extinction P QE,N (t) (Ginzburg et al., 1982), mean expected population size N E (t), expected minimum population size N min (t) (McCarthy & Thompson, 2001), expected/mean time to extinction T E , intrinsic mean time to extinction T m (Grimm & Wissel, 2004), and population growth rate λ. These eight viability measures were chosen because they are commonly used in the literature or proposed to be key measures for extracting important information from PVA simulations (Grimm & Wissel, 2004;IUCN, 2012).
We put emphasis on measures that represent the classes mentioned above, namely probabilistic, time, and population-size measures, as well as the growth rate as a measure to characterize populations' trajectory over time.
To evaluate the differences between these measures we simu-

| Simulating virtual species
Viability measures are computed from modeled population-size time series. Thus, we first parametrized the agent-based model RangeShifter (Bocedi et al., 2014) to simulate populations. The model allows a detailed parameterization that fits the life histories of a wide variety of species. For the parameterization, we used a published dataset that covers the parameters of 4574 virtual mammals.
This dataset was created to cover the diversity of sizes and life histories of real animals while accounting for the collinearity of different characteristics (Santini et al., 2016). The simulated species vary with respect to body mass, sexual maturity age, litters per year, litter size, home range area, population density, dispersal distance, and annual survival rate. All species were simulated with 100 repetitions for 100 years on three artificial fractal habitat maps. The habitat maps were created with RangeShifter (65 × 65 cells, Hurst exponent = 0.1) with 5%, 10%, and 20% habitat cover to reflect landscapes of different suitability to the species. This resulted in 13,722 scenarios. To assess whether the parameters of the simulation model can affect the relationship between two viability measures, we created three additional sets of scenarios: In each set, we varied either the carrying capacity of habitat patches, the mean dispersal distance, or the fraction of habitat patches in the map, while not changing any of the other parameters.
All RangeShifter parametrization files and outputs can be found in Appendix S1.

| Computing viability measures
For each simulated scenario, we calculated eight viability measures in the following way:
1. Probability of extinction P 0 (t): the share of simulation runs in which an extinction (population size = 0) occurred within the specified time horizon t.
2. Risk of decline P N (t): the proportion of simulation runs in which the population size was equal to or lower than a population-size threshold N after the specified time horizon t.
3. Probability of quasi-extinction P QE,N (t) (Ginzburg et al., 1982): the fraction of simulation runs in which the population size dropped at least once below a population-size threshold N within the specified time horizon t.
4. Expected population size N E (t), also referred to as the mean population size, was calculated as the average population size of all simulation runs at time t.
5. Expected minimum population size N min (t) was obtained by calculating the mean of every simulation run's minimum population size within the time horizon t (related to the concept of the minimum viable population (Gilpin & Soulé, 1986)).
6. Intrinsic mean time to extinction T m was calculated from the probability of extinction over time, as the inverse slope of the linear regression through the tail of the −ln (1 − P 0 ) graph (Grimm & Wissel, 2004).
7. The mean time to extinction T E was extrapolated from the mean population size and the growth rate λ (intercept at N E (t) = 0). This allowed us to compute T E even when not all simulation runs led to extinction within the simulated time frame.
8. The growth rate λ was calculated as the slope of the linear regression line of the mean population-size time series.
The viability measures 1-5 require further specifications of a time horizon (t) and/or a population-size (N) threshold. We used 25, 50, 75, and 100 years as time horizons and population-size thresholds of 1%, 5%, and 10% of the initial population size.
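As an illustration of the definitions above, the following Python sketch computes the measures from a matrix of simulated population sizes. It is not the authors' implementation (the study used RangeShifter output and R). The array layout (`runs[i, y]` is the size of run `i` in year `y`, with year 0 as the initial size), the function names, the per-run threshold relative to each run's initial size, the choice of the last half of the usable points as the "tail" for T m, and the reading of T E as the x-intercept of the regression line through the mean time series are all assumptions made for this example.

```python
import numpy as np

def viability_measures(runs, t, threshold_frac=0.05):
    """Measures 1-5, 7, and 8 for one scenario (hypothetical helper).

    runs: array of shape (n_runs, n_years + 1); column 0 is the initial size.
    t: time horizon in years; threshold_frac: threshold N as a fraction
    of each run's initial population size (an assumption of this sketch).
    """
    runs = np.asarray(runs, dtype=float)
    window = runs[:, : t + 1]                  # years 0..t
    run_min = window.min(axis=1)               # minimum size of each run
    n_crit = threshold_frac * runs[:, 0]       # per-run threshold N
    years = np.arange(runs.shape[1])
    # Linear regression through the mean population-size time series
    slope, intercept = np.polyfit(years, runs.mean(axis=0), 1)
    return {
        "P0": float((run_min == 0).mean()),           # extinct within t
        "PN": float((runs[:, t] <= n_crit).mean()),   # at or below N at time t
        "PQE": float((run_min < n_crit).mean()),      # below N at least once
        "NE": float(runs[:, t].mean()),               # mean size at time t
        "Nmin": float(run_min.mean()),                # mean of per-run minima
        "lambda": float(slope),                       # regression slope
        # Assumed reading of T E: where the regression line reaches zero
        "TE": float(-intercept / slope) if slope < 0 else float("inf"),
    }

def intrinsic_mean_time(runs, tail_frac=0.5):
    """T m as the inverse slope of a regression through the tail of -ln(1 - P0(t))."""
    runs = np.asarray(runs, dtype=float)
    # P0(t): share of runs that have reached size 0 at or before year t
    p0 = (np.minimum.accumulate(runs, axis=1) == 0).mean(axis=0)
    years = np.arange(runs.shape[1])
    usable = (p0 > 0) & (p0 < 1)               # avoid log(0) and flat start
    if usable.sum() < 3:
        return float("nan")
    x, y = years[usable], -np.log(1 - p0[usable])
    start = min(int(len(x) * (1 - tail_frac)), len(x) - 2)
    slope, _ = np.polyfit(x[start:], y[start:], 1)
    return float(1.0 / slope) if slope > 0 else float("nan")
```

A call such as `viability_measures(runs, t=100, threshold_frac=0.05)` would then correspond to one combination of time horizon and population-size threshold from the grid described above.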

| Comparing viability measures
In this study, we (a) compared scenario rankings to find out if viability Kendall rank correlation coefficients to compare if different measures resulted in a similar ranking. Ties were handled by assigning the same rank and skipping one level (e.g., two species with rank 1 were followed by a rank of 3).
Second, we explored direct correlations and mathematical relationships between different viability measures that might allow for converting one measure into another. To do so, we fitted various linear and nonlinear models using the nls2 package (Baty et al., 2015; R Core Team, 2015) in R (R Core Team, 2015).
Lastly, for a more detailed assessment of the relationship between two viability measures, we explored if changes in single model parameters altered these relationships. In particular, we changed the carrying capacity per habitat patch and the mean dispersal distance (negative exponential dispersal kernel), and we used different habitat maps with different fractions of habitat (Appendix S2). If single model parameters caused changes in the relationship between viability measures, it would indicate that fixed functional relationships between viability measures might not exist. We thus computed the probability of extinction and the expected population size for all 100 years and plotted these values against each other for each parameter setting.
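The rank comparison described in this section can be sketched in a few lines of Python. The tie handling follows the rule stated above (tied scenarios share a rank and the next rank is skipped). The text does not say which variant of Kendall's coefficient was used, so the tau-b variant below, the function names, and the example values are assumptions for illustration only.

```python
from itertools import combinations

def competition_rank(values, higher_is_better=True):
    """Rank 1 = most viable; ties share a rank and the next rank is skipped."""
    ordered = sorted(values, reverse=higher_is_better)
    return [ordered.index(v) + 1 for v in values]

def kendall_tau_b(x, y):
    """Kendall rank correlation with the tau-b correction for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue                 # tied in both rankings: not counted
        elif dx == 0:
            ties_x += 1              # tied only in the first ranking
        elif dy == 0:
            ties_y += 1              # tied only in the second ranking
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = ((concordant + discordant + ties_x)
             * (concordant + discordant + ties_y)) ** 0.5
    return (concordant - discordant) / denom if denom else float("nan")

# Hypothetical values for four scenarios
p0_100 = [0.0, 0.0, 0.3, 0.9]    # probability of extinction: lower is better
ne_100 = [500, 800, 200, 50]     # expected population size: higher is better

rank_p0 = competition_rank(p0_100, higher_is_better=False)
rank_ne = competition_rank(ne_100, higher_is_better=True)
tau = kendall_tau_b(rank_p0, rank_ne)
```

In this toy example the two scenarios without extinctions are tied under P 0 (100) but separated by N E (100), which is the kind of disagreement that lowers the rank correlation without reversing any pair.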

| Viability rankings
The computed viability measures show that each measure only worked for a fraction of all scenarios (Figure 1). For example, the population-size measures N E (100) and N min (100)  rate and probability of extinction got as low as 0.08 (Figure 2). Most species and scenarios were ranked similarly by the different viability measures, and the relationship between rankings was often approximately linear, but the ranking was rarely exactly the same for two measures. The growth rate λ was a notable exception to this trend because its rankings differed greatly from all other rankings (Figure 2).

| Functional relationships between viability measures
The quantitative relationship between viability measures was in some cases linear but more often nonlinear (Figure 3, Table 1).
Often, these functions describe asymptotes, for example, the probability of extinction approaches zero at high population sizes ( Figure 3a). This is related to the same issue described above, that

FIGURE 3
Relationships between different viability measures. For some measures, the functional relationship can roughly be described with adapted reciprocal functions (e.g., a, b, k, l), logistic functions (e.g., d, e), or simple linear functions (e.g., h), as shown in Table 1. However, significant variance and heteroscedasticity rendered even these relationships mostly useless to reliably calculate one measure from another.

and T E , which showed a breakpoint (Figure 3f), which is an artifact related to the length of the modeled time period (100 years), while the relationship before this breakpoint was approximately linear (Table 1). A positive growth rate always corresponded to a P 0 (100) of zero. On the other hand, even strongly negative growth rates were sometimes linked to a P 0 (100) of zero if the population size was very large. Lastly, some measures showed a distinct relationship when one or both measures were log-transformed (e.g., Figure 3e,f). On a log-log scale, even very coarse correlations can look meaningful, but in practice, this will hardly be useful to compute one measure from another because it hides the large variance (e.g., N E vs. T E ).

| The effect of model parameters on relationships between viability measures
We found that changing parameters in the simulation model altered the relationship between viability measures. In particular, changing the carrying capacity, mean dispersal distance, or the habitat map altered the relationship between P 0 and N E (Figure 4).
For example, at a given P 0 , N E increased with increasing carrying capacity (Figure 4a) and with decreasing mean dispersal distance (Figure 4b). This dependence was slightly weaker when considering the change in P 0 at a given N E (Figure 4a-c). We also note that there were threshold behaviors, such as a decrease in the maximum possible P 0 with decreasing mean dispersal distances (Figure 4b) and a complete absence of extinctions when the proportion of suitable habitat exceeded about 10% (Figure 4c). This means that the same population size can correspond to different probabilities of extinction, which likely also partly explains the low correlation strength between different viability measures (Figure 3). Consequently, we did not find any universal relationship between viability measures that would not be sensitive to simulation model parameters.

| DISCUSSION
Our systematic comparison of eight different population-viability measures across different scenarios and species showed three main results: First, all viability measures, except the growth rate λ, ranked the population viability of the simulated species similarly but not identically. Second, we found rough correlations but no fixed relationships between viability measures that would allow the conversion of one measure into another. Third, species and scenario parameters of the simulation model (including the habitat map) altered the relationship between any two viability measures.
Consequently, it appears to be impossible to compute one viability measure directly from another one. At best, functional relationships between two measures could be approximated for very similar scenarios. Hereafter, we outline the causes and implications of these findings and discuss whether a single number can represent viability.

| The relationships between viability measures
Our result that different viability measures rank species or scenarios similarly, and that at least some viability measures correlate, indicates that most measures are based on a similar concept of viability.
As a result, identifying the best management option for a population seems to be robust with respect to the choice of the viability measure. By contrast, some scenario rankings were not identical and it was not possible to determine fixed relationships between viability measures. Thus, there are cases where the choice of the viability measure will affect which management option is considered the best for a population or which population is deemed more viable.
Furthermore, our results imply that two studies that reported two different viability measures cannot directly be compared by converting one measure into the other.
The relationships between viability measures seem to depend on species traits, carrying capacity, and habitat configuration. For example, increasing the species trait dispersal distance reduced the population size N E at a given extinction probability P 0 . This may be due to more intra- and interspecific interactions when species cover greater distances. This explanation is in line with our observation of decreasing maximum possible values of P 0 with decreasing dispersal distances. Furthermore, carrying capacity and the proportion of suitable habitat modified the relationship between N E and P 0 in an intuitive way, i.e., N E increased with increasing carrying capacity and P 0 became zero beyond a 10% threshold of habitat suitability. These are interesting theoretical interdependencies, but conservation scientists may often not have enough species trait and habitat data to assess these dependencies in detail. Thus, a pragmatic recommendation for conservation scientists, especially when supporting on-the-ground measures for population management, would be to choose (several) viability measures that show the least dependence on traits. In our case, P 0 should, for example, be chosen over N E because it was relatively less affected by differences in the traits we varied (Figure 4). However, such dependencies may not always be as straightforward nor as intuitive as in our study. Moreover, relationships of viability measures may respond differently to distinct trait syndromes, such as fast versus slow pace-of-life syndromes (Healy et al., 2019), and these dependencies may be subject to spatial or temporal variability. Taken together, this calls for more research into the trait dependence of viability measures and the relationships between viability measures.

TABLE 1
Approximated functional relationships between selected viability measures as shown in Figure 3.
Viability measure 1 Viability measure 2 Approximated functional relationship

The lack of fixed relationships between viability measures can be explained by how viability measures process raw data. First, many viability measures are only based on a data subset, for example, N E (100) only requires the population size of all scenario repetitions after 100 years but discards all other information. Similarly, P 0 (100) only evaluates the fraction of simulation runs that went extinct within 100 years. Second, each viability measure aggregates the data in a unique way into a single number. Of course, the goal of a viability measure is exactly this, to describe viability in a single number, but this necessarily entails a loss of information regarding the underlying data distribution. A useful analogy is the computation of mean and median: Both can be calculated for the same distributions and both values will correlate when computed for a number of datasets. However, it is arguably not very meaningful to compute the mean from the median and vice versa. The same effect applies to viability measures. Each modeled scenario will result in a unique population-size frequency distribution over time (Figure 5). These 3D distributions are characterized by different means, skewness, kurtosis, and how these characteristics change over time. Viability measures intend to summarize all the information from these distributions into a single number, but from this number, one cannot reconstruct the original distribution.
Consequently, one cannot accurately calculate one viability measure from another.
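A toy example makes the information loss concrete. The two sets of final population sizes below are invented for illustration and do not come from the simulations. They share the same mean and median, yet they imply different probabilities of extinction, so neither summary can be recovered from the other. The sketch assumes that a final size of zero marks an extinct run.

```python
from statistics import mean, median

def summarize(final_sizes):
    """Collapse the final population sizes of all runs into single numbers."""
    return {
        "NE": mean(final_sizes),                  # expected population size
        "median": median(final_sizes),
        # Share of runs that ended at size 0 (assumed to mean extinction)
        "P0": sum(n == 0 for n in final_sizes) / len(final_sizes),
    }

# Hypothetical final sizes of four runs in two scenarios
stable = summarize([50, 50, 50, 50])      # every run ends at 50
bimodal = summarize([0, 0, 100, 100])     # half extinct, half thriving
```

Both scenarios yield an N E of 50, but only the second contains extinctions, which mirrors the observation that the same population size can correspond to different probabilities of extinction.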

| Can a single number describe viability?
Given that there are many ways to aggregate raw population-viability data into a single number and that they all entail an information loss (Table 2), it seems questionable whether viability can or should be expressed as a single number. However, if viability is not expressed as a single number, it is also not possible to objectively rank different scenarios because only single numbers, not distributions, can be ranked at all. Thus, if we want to rank scenarios to support management decisions, what would be the most suitable single-number viability measure (acknowledging that none of them will be perfect)?

FIGURE 4
The relationship between the expected population size (N E ) and the probability of extinction (P 0 ) depends on scenario parameters. Here, all RangeShifter parameters were kept constant, except (a) the carrying capacity per habitat patch (K2), (b) the mean dispersal distance (meanDistI), and (c) the map with different fractions of suitable habitats in the landscape (scenarios that are not plotted showed no extinctions).

FIGURE 5
Three examples of probability distributions (P) of population sizes (N) over time t. The distributions show (a) a population that stabilizes very early at a high level and also shows a high variance, (b) a population whose size first decreases but then stabilizes with a low variance at a certain population size, and (c) a population that declines and where some simulation runs already led to extinction.
Extinction and survival are at the core of the viability concept.
At first sight, this might imply that population sizes or growth rates are nonideal proxies of viability, because, by definition, a single surviving individual is sufficient to prevent the extinction of a species. However, population sizes and growth rates do affect viability via their effect on the occurrence and timing of extinctions.
Nevertheless, measures related to population size or growth rate capture viability less explicitly than measures related to extinction probability. Thus, the extinction probability distribution over time, P 0 (t), is the most fundamental description of viability.
Ranking scenarios by P 0 (t) requires aggregating a distribution into a single number. The probability of extinction at one (more or less arbitrary) point in time, e.g., 100 years, is one way to summarize the P 0 (t) distribution. Time measures like T E and T m are another way to summarize the P 0 (t) distribution into a single number. But T E has been criticized because the P 0 (t) distribution is often right-skewed (Grimm & Wissel, 2004; Ludwig, 1996), and T m only works in stable environments (because if the environment changes in the simulated time period, the tail of the −ln (1 − P 0 ) graph will not be linear, as required by T m ).
Aggregating the P 0 (t) distribution into a single number essentially means that the risks at different time periods are weighted against each other. This weighting is subjective and depends on a person's risk affinity. For example, would you trade a 1% higher extinction risk at time t for a 1.1% lower extinction risk at time t + 1? What about a 2%, 5%, or 10% lower extinction risk at t + 1? While some pairs of P 0 (t) distributions reflect clear differences in viability, distributions that, for example, mostly differ by lower or higher variance cannot be ranked objectively (Figure 6). These idiosyncrasies of P 0 (t) distributions may be due to stochasticity effects on population dynamics (Melbourne & Hastings, 2008), as well as the typically right-skewed extinction-time distributions.
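The point that some pairs of P 0 (t) distributions can be ranked without any weighting while others cannot may be illustrated with a simple dominance check. The sketch below is an illustration only: the function name and the three curves are hypothetical, and the check treats one scenario as more viable only if its extinction probability is never higher at any time step.

```python
def compare_extinction_curves(p_first, p_second):
    """Compare two P0(t) curves sampled at the same time steps."""
    first_never_worse = all(a <= b for a, b in zip(p_first, p_second))
    second_never_worse = all(b <= a for a, b in zip(p_first, p_second))
    if first_never_worse and second_never_worse:
        return "equal"
    if first_never_worse:
        return "first more viable"
    if second_never_worse:
        return "second more viable"
    # The curves cross: any ranking needs a subjective weighting over time
    return "not rankable without weighting"

# Hypothetical cumulative extinction probabilities at four time steps
curve_a = [0.0, 0.2, 0.4, 0.6]
curve_b = [0.0, 0.1, 0.2, 0.3]   # lower than A at every time step
curve_c = [0.0, 0.3, 0.4, 0.5]   # higher than A early, lower than A late

a_vs_b = compare_extinction_curves(curve_a, curve_b)
a_vs_c = compare_extinction_curves(curve_a, curve_c)
```

Here the first comparison has a clear answer, whereas the second depends on how early and late risks are traded off, which is exactly the subjective step that any single-number summary has to take.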
While measures like the P 0 (100) or the (intrinsic) mean time to extinction can be seen as established conventions on how to summarize the P 0 (t) distribution into a single number, the inherent subjectivity of this process poses a problem to any viability ranking and to any comparisons of populations, species or scenarios.
Conservation scientists who support population management need to be clear about how a viability measure deals with probabilities, risks, and chances. This further supports that conservation scientists should assemble and report the raw simulated population-size time series to facilitate the comparison of different studies because

TABLE 2
Advantages and disadvantages of the analyzed viability measures.

Probabilistic measures

P 0 (t): probability of extinction
Advantages:
• focuses on population survival
Disadvantages:
• extinctions need to happen within the modeled time horizon
• requires defining a time horizon
• only returns meaningful values (0 < P 0 (t) < 1) for a fraction of all scenarios

P N (t): risk of decline
Advantages:
• incorporates that small population sizes are almost certainly doomed (extinction vortex (Gilpin & Soulé, 1986))
Disadvantages:
• requires defining a time horizon
• requires population-size threshold
• only returns meaningful values (0 < P N (t) < 1) for a fraction of all scenarios

P QE,N (t): probability of quasi-extinction
Advantages:
• gives even more weight to the extinction vortex than the risk of decline

studies publish raw simulated population-size time series because they have many benefits not only for theory but also for conservation practice: First and foremost, raw population-size time series are the basis of a thorough probabilistic analysis including the possibility to determine all viability measures presented here; second, they make all risks and chances of the different analyses transparent, and finally, they allow for comprehensive and valid comparisons between studies, facilitating meta-analyses of studies that assess population viability.

ACKNOWLEDGMENTS
GP is funded by the German Centre for Integrative Biodiversity Research (

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are parameters of virtual species that are available in Santini et al. (2016)

Guy Pe'er https://orcid.org/0000-0002-7090-0560

FIGURE 6
The probability of extinction distributions of three scenarios show that ranking population viability based on distributions is not always easy. It is easy to conclude that scenarios A and C show lower viability than scenario B because the probability of extinction is higher in A and C than in B at each point in time. However, the comparison of scenarios A and C is more difficult: Scenario C offers the chance of longer survival than scenario A but also bears the risk of an earlier extinction. Arguably, these risks and chances cannot objectively be weighed against each other to conclude that one scenario is more viable than the other.