Evaluation of species distribution models for estimating animal dark diversity

Aim Applying wide and effective sampling of animal communities is rarely possible due to the associated costs and the use of techniques that are not always ecient. Thus, many areas have a faunistic hidden diversity we denote Animal Dark Diversity (ADD), dened as the diversity that is present but not yet detected plus the diversity dened by Pärtel et al. (2011) that is not (yet) present despite the area’s favourable habitat conditions. We evaluated different species distribution model types (SDM techniques) on the basis of three requirements for ADD estimate reliability: 1) estimated spatial patterns of ADD do not differ signicantly from other SDM techniques; 2) good predictive performances; and 3) low overtting. eleven using biomod2 package in the R software environment. We tted the various SDM techniques to the data for each species and compared the resulting ADD estimates for the two animal groups under three threshold types. The results demonstrated that estimated ADD spatial patterns vary signicantly between SDM techniques and depend on the threshold type. They also showed that SDM techniques with overtting tend to generate smaller ADD sizes, thus reducing the possible species presence estimates. Among the SDMs studied, the ensemble models delivered ADD geographic patterns more like the other techniques while also presenting a high predictive performance for both faunal groups. However, the Ensemble Model Committee Average (ECA) performed much better on the sensitivity metric than all other techniques under any of the thresholds tested. In addition, ECA stood out clearly from the other ensemble model techniques in displaying low-medium overtting.


Introduction
Understanding the geographical distribution patterns of animal species communities is a key objective in ecology and for conservation policies implementation. However, in many cases, determining such patterns is rendered impossible by the di culties involved in monitoring the different taxa at large scales (Margules et al., 2002). The consequent undersampling of some areas has meant that there are regions for which animal diversity is still unknown.
Also, in some cases species diversity is hidden owing to the application of no-suitable sampling techniques that are unable to detect certain species' presence (Bailey et al., 2014;MacKenzie et al., 2018;Pärtel, 2014). Nevertheless, there is another type of hidden diversity called "dark diversity," and de ned by Pärtel et al. (2011) as the biodiversity that is absent from a site (true absence) yet could potentially inhabit it given the site's favourable environmental conditions. This concept does not apply to fauna, however, since true absence in the case of animals is never 100% certain due to detection problems (MacKenzie et al., 2018).
In some regions there are thus three speci c hidden diversity types, each one forming a part of the area's overall hidden diversity. Since the rst two cited types are typical of animal inventories, we can group them under the heading of Animal Dark Diversity (ADD), de ned as the diversity present in a given area but has not yet detected plus the dark diversity as de ned by Pärtel and col. The distribution patterns of these two ADD types could be determined through extensive monitoring and the implementation of better sampling techniques, but this would involve a considerable cost in resources, both human and material. An alternative approach would be to estimate the ADD distribution patterns using mathematical tools as proxies to determine a site's potential biodiversity. Good estimates of ADD would enable potential and as yet unknown high biodiversity areas for which conservation and management plans are needed to be identi ed and prioritized. Such estimates could also be used to pursue other conservation goals (e.g., to identify relatively high ADD areas regarding conservation problems, rare species, invasive species, etc.).
Studies have shown that the methods proposed for estimating dark diversity have performed well and could be successfully applied to the ADD concept.
However, the main focus of these analyses has been to evaluate estimation techniques based on species co-occurrence information (e.g., Brown et al., 2019;Carmona & Pärtel, 2021;Lewis et al., 2016;Ronk et al., 2016). Other methods, grounded upon species' environmental a nities, have received little attention (Carmona & Pärtel, 2021). Furthermore, approaches relying on species co-occurrence have generally been applied to taxonomic plant groups where good knowledge of species distribution usually exists. However, when complete information on each species' distribution is not available, environmental a nities methods such as species distribution models (SDMs) appear to be more effective (Beissinger et al., 2016;Wilkinson et al., 2019).
Indeed, SDMs are an important tool in ecology, biogeography and conservation (Peterson et al., 2011), allowing researchers to explore the relationships between species' geographical occurrence probabilities given certain environmental variables ). There are various modelling techniques for SDM development, ranging from traditional statistics models such as generalized linear models (McCullagh, 1984) and modern methods based on machine learning such as random forest (Breiman, 2001) to formulations based on a combination of different individual modelling techniques known as ensemble models (Araujo & New, 2007).
One way to estimate ADD using SDM techniques is by transforming the estimated presence probability of each species derived from its individual SDM into binary data (presence/absence) and then stacking the predictions. The ADD at each site is then given by the number of absent species according to the data inputted to calibrate and test the SDMs, whose results indicate presences. This procedure is analogous to the dark diversity estimation method using SDMs applied by Ronk et al. (2016).
The ADD size so estimated will depend on the threshold type used to transform the occurrence probabilities given that the false positive count of each SDM constructed for each species varies with the probabilities transformation method employed (Fielding & Bell, 1997;Jiménez-Valverde & Lobo, 2007). ADD sizes will also depend on the degree of over tting of the SDM technique. An over tted model is one that has incorporated much residual variation as part of its structure (Burnham & Anderson, 2002) and will therefore predict very well with training data but not with testing data (Mutasa et al., 2020). Thus, SDM techniques with a high degree of over tting will lead to very low ADD sizes given that the number of false positives predicted from training data (the larger part of the data from which ADD predictions will be made) will be low compared to those SDM techniques with less over tting.
Although there are several SDM techniques that can be used to estimate ADD, there are no previous studies in which ADD estimates-or dark density estimates generally-based on different techniques have been analysed and compared. Consequently, more work is needed on how SDM technique choice can produce different spatial distributions of ADD in geographic space and whether these differences vary across several threshold types. In order to be optimal, an SDM technique must not only demonstrate good predictive performance but also produce ADD spatial pattern estimates that do not signi cantly depart from those obtained with other SDM techniques, since otherwise its results would simply be the particular set of estimates that particular technique happens to generate.
Furthermore, the optimal technique should not produce models with signi cant over tting as this would exclude many false positives due to its inclusion of a high degree of residual variation. Therefore, we should suggest that it is desirable for an SDM technique to deliver the highest number of false positives, as long as it accurately re ects the species' environmental suitability (i.e., the technique has good predictive performance).
Bats and moths are two faunal groups with great biodiversity but due to their characteristics (nocturnal habits, elusive, cryptic, and rare species, different methodologies of study, etc) usually are studied using SDMs to understand their distribution patterns (e.g. Lisón & Calvo, 2013) and resolve many ecological issues. Therefore, they are a good model to explore and evaluate the implementation of ADD in these groups. In light of the foregoing, our aim in this study is to compare and assess ADD estimates derived from a range of SDM techniques using different threshold types while taking account of the degree of over tting and its effects. Speci cally, we propose to (1) compare ADD geographic distribution pattern estimates obtained from different SDM techniques and threshold types; (2) evaluate the techniques' predictive capacities and degrees of over tting and show how the latter is related to ADD size estimates; and (3) discuss the ability of these techniques to predict ADD in biogeography and conservation biology contexts.

Material And Methods
Bat and moth distribution data and environmental variables Our study was conducted using presence/absence distribution data on bat and moth (quadri ne Noctuoidea) species in the Iberian Peninsula nested in 10x10 km 2 UTM cells (6169 total cells). The bat distribution dataset was constructed from the distribution maps in "Atlas y Libro Rojo de los Mamíferos Terrestres de España" (Gisbert & Palomo, 2007) and "Atlas de dos morcegos de Portugal" (Rainho et al., 2013). These data were updated with new records at the same resolution from personal surveys and references (see Lisón et al., 2015;Lisón & Sánchez-Fernández, 2017). As regards the Eptesicus serotinus and E. isabellinus species of bats, we considered the distribution of the latter to be limited to the south and south-eastern regions of the Iberian peninsula (Andalusia and Murcia) (Gisbert & Palomo, 2007;Lisón, 2015), with the former species deemed to be found in the remaining areas. In fact, however, there is some overlap between the two areas (Santos et al., 2014).
Faunistic data on quadri ne Noctuoidea, as de ned by (Yela et al., 2011;Zahiri et al., 2011), were derived from the FAUNOCIB database, which is part of the GeoBrink platform (Yela et al., 2011). The database contains some 170,000 published and validated faunistic records of the 850 species of Euteliidae, Erebidae, Nolidae and Noctuidae known to be present in the Iberian Peninsula and the Balearic Islands. Data on rare species (less than 40 records) were not included given that for estimating their geographical distribution, models types other than those used here are recommended (Breiner et al., 2015).
Data on rare species (less than 40 records) were not included except for one bat species with 33 records, which we decided to include due to the small number of bat species when compared to the number of moth species. However, these presence records still allow us to an appropriate use of SDM, since when the presence records are higher than 30, models accuracy is often good (Hernandez et al., 2006;Stockwell & Peterson, 2002;Wisz et al., 2008).
Land-use variables were obtained following the procedure due to Lisón et al. (2015) and Lisón & Sánchez-Fernández (2017). We employed the land-use map provide by the CORINE Land Cover Project (see www.eea.europa.eu and https://land.copernicus.eu/pan-european/corine-land-cover/clc-2006). Land-use cover was reclassi ed from the 44 initial categories to 16 de nitive ones (see Lisón & Sánchez-Fernández, 2017). For each such category, we calculated the percentage of surface area included in each cell. In addition, we extracted 44 environmental and geographical variables from BIOMOD for use with each cell following (Barbosa et al., 2012). The variables used to model are available in Supplementary Material Table S1.

SDM modelling speci cations
We evaluated eleven different SDM techniques, of which eight were individual ones and the remaining three were ensemble model types that included the other eight (see Table S2). The individual techniques are varied and include both classical and modern models (Table S2). The ensemble models were the ones that are most popular among SDM users (Hao et al., 2019(Hao et al., , 2020. All of the SDM techniques were studied using the biomod2 package implemented in the R software environment . The package was chosen due to the variety of SDM techniques it supports and because it seems to be the most popular among users who prefer ensemble models for estimating species presence (Hao et al., 2019).
With each SDM technique we used biomod2's predetermined hyperparameters. Although their con guration could affect predictive performance (Hao et al., 2020), indications are that biomod2 users tend to employ the default hyperparameter settings (Hao et al., 2019) so to maintain consistency with existing studies we adopted the same practice. In any case, the default values have proven to give good results (e.g., Hao et al., 2020).
We t the SDMs to each species through cross-validation using a randomly selected 80% of the data for calibration (i.e., tting or training) and the remaining 20% for validation (i.e., testing). We repeated this step ten times and then averaged the estimated probabilities for each UTM cell to obtain a single probability and thus reduce the variance in the nal results.

Animal dark diversity (ADD) estimation and spatial patterns comparison
We estimated the ADD for each UTM cell under each SDM technique and threshold type to be the number of species estimated as present by the models when the input dataset indicated absences for them. To binarize the SDM presences/absences, three threshold types were tested: MaxKappa (Monserud & Leemans, 1992), which maximizes Cohen's kappa coe cient; MaxTSS (Allouche et al., 2006), which maximizes the True Skills Statistic; and MeanProb (Cramer, 2003), which is the average of the presence probability estimates. Following Scherrer et al. (2018), who in turn based themselves on Liu et al. (2005) and Nenzén & Araújo (2011), the three may be respectively classi ed by technique as single-index-based, sensitivity and speci city combined, and predicted-probabilitiesbased. The threshold values were determined using the optimal.thresholds function of the PresenceAbsence package in R (Freeman & Moisen, 2008). We tested ve thousand cut-off thresholds with MaxKappa and MaxTSS to nd the optimal value.
The ADD spatial patterns generated by all combinations of SDM technique and threshold type for each animal group were checked for correlation using Spearman's coe cient (ρ). The results were considered to indicate high correlation when ρ was ≥ 0.7.

SDM predictive performance and over tting evaluation
The SDM techniques' predictive performance was evaluated using three metrics: area under the receiver operating characteristic curves (AUC), sensitivity (i.e., proportion of correctly predicted presences or true positive rate) and speci city (i.e., proportion of correctly predicted absences or true negative rate). AUC shows how well a model discriminates species presences from absences. It measures the quality of the SDM's predictions irrespective of a single cut-off threshold. Sensitivity and speci city depend on a single cut-off threshold, so they were calculated using the cut-off threshold values used to make the ADD estimates.
SDM technique over tting was evaluated by the faunal group as the difference, under each metric, between the predictive performance calculated on the calibration data and that calculated on the testing data. Since the predictive performance of an over tted SDM technique is expected to be better with the calibration data, the most over tted techniques were considered those for which the predictive performance differences were the greatest. The relationship between over tting and ADD size for each faunal group was assessed using scatter plots. For each SDM technique, the expected value of ADD in a UTM cell was plotted by threshold type against the mean over tting value calculated by the AUC. A trend line obtained using simple linear regression was added to the graph of each plot.

Results
Data were analysed for 25 species of bats and 352 species of moths. With ten calibrations per species and technique, this meant that the number of SDMs tted was 41470, of which 2750 were for bats and 38720 for moths. For each SDM technique, 3770 models were calibrated. Presence estimates stacked by SDM technique and threshold type resulted in 66 maps of ADD geographical distributions, 33 for each faunal group (see Figure 1, Supplementary Material Figure S1, S2, S3, S4 and S5).
In general terms, the ADD size estimates varied from one UTM cell to the next depending on the SDM technique and threshold type (Figures. 2, 3 and 4). The MaxKappa threshold yielded smaller sizes, particularly in the case of moths (Figure 2 and 4).
For both animal groups, when MaxKappa and MaxTSS thresholds were used, the SDM technique that generated the highest ADD size estimates was GLM whereas with the MeanProb threshold, the highest estimates were produced by the ECA ensemble method (Figure 2 and 4). At the other extreme, when using MaxKappa and MaxTS thresholds the lowest estimates were obtained by the RF technique, while with the MeanProb threshold, this was the case for both the RF and FDA techniques on bats and FDA only on moths (Figure 2 and 4).
The correlation analyses of the different SDM techniques showed that for both animal groups, the geographic distribution patterns of the ADD sizes estimated by the ensemble models were more closely correlated with those of the other techniques than the latter were with each other (Figure 3). The estimated patterns of several of the techniques were highly correlated when MaxTSS and MeanProb thresholds were used (ρ> 0.7; Fig. 3), but with the MaxKappa threshold, only the ensemble models were highly correlated in both animal groups. RF was the technique that estimated the most particular ADD patterns when MaxKappa and MaxTSS thresholds were used, but with MeanProb this changed drastically, its estimates in this case being highly correlated with those of the other techniques, particularly for moths.
Most of the SDM techniques managed a good general predictive performance with the testing data (AUC> 0.7) for both animal groups (Table 1). RF achieved the best AUC values for bats, followed in declining order by EMP, EWMP and ECA. However, RF also exhibited the highest over tting for bats on this metric (Table 1). ECA scored the highest AUC value for moths, with EMP and EWMP in second and third spots, whereas for this faunal group RF again displayed the most over tting together with GBM (Table 1).
GLM was the individual SDM technique with the least over tting for both animal groups while among ensemble models, ECA's over tting was the lowest by a clear margin. Also, ECA was respectively 0.05 points above and 0.08 points below the values for the techniques with the lowest and highest over tting in bats, and 0.01 points below and 0.08 points above the lowest and highest over tting techniques in the case of moths. Thus, on this criterion ECA is medium-low for bats and low for moths.
ECM performed very well on the sensitivity metric under any threshold type. It was the third best on over tting with all threshold types except MeanProb, where it was the best of all, but on their performance the three ensemble models varied greatly. GLM and FDA presented the best over tting on sensitivity under MaxKappa, and GLM and CTA did the same under MaxTSS for both faunal groups, while with the MeanProb threshold, GLM and CTA were the best only for bats (Table 2).
Regarding speci city, under MaxKappa, RF obtained the best values for both animal groups while under MaxTSS, RF performed the best for bats, and CTA did so for moths. However, RF was also the technique with the highest over tting on this metric for bats under any threshold type (Table 2). When MeanProb was used, the best speci city was achieved by FDA technique for both bats and moths. GLM obtained the best over tting under any threshold type for either animal group (Table 2).
Finally, the results showed a clearly negative relationship between the mean over tting of SDM techniques and their expected ADD values in a UTM cell under the MaxKappa and MaxTSS thresholds (Figure 4). However, under the same two types, there were some differences regarding which techniques were associated with a larger expected ADD size concerning its over tting (Figure 4). Under MeanProb, by contrast, no relationship was observed between mean over tting and expected ADD values (Figure 4).

Discussion
In this study, we identi ed an optimal SDM technique for making ADD estimates as one that meets three key requirements: 1) its geographic distribution patterns do not signi cantly differ from those of other techniques; 2) it has good predictive performance; and 3) it displays a low degree of over tting.
As regards the rst requirement, the ensemble models tested in this study were the best option. They proved to signi cantly reduce the differences among individual SDM techniques in the ADD estimates under all of the threshold types tested. Those differences may be explained by the fact that each one of them produces a "noise" and a "signal" between presence probability estimates and environmental variables relationships as a result of its optimization methods (Araujo & New, 2007;Pearson et al., 2006). However, ensemble models can disjoint such noise to some degree, which improves their prediction quality (Araujo & New, 2007;Dormann et al., 2018) and consequently also the reliability of their estimated ADD patterns.
Considering now the SDM techniques with better general predictive performances (on AUC values), our results showed that ensemble models were better in both faunal groups, a nding that is consistent with other studies (e.g., Bouska et al., 2015;Farhadinia et al., 2015;Fletcher et al., 2016;Folmer et al., 2016;Hao et al., 2020;Marmion et al., 2009). ECA displayed a higher level of sensitivity than the other SDM techniques under any threshold type tested here but did not do so on speci city. This means that ECA did not explain species absences any better than the other techniques. Nevertheless, since we do not know if the species are truly absent in those areas, a technique that explains animal species absences better is not, in fact, the most appropriate. A more judicious choice would be a technique that is biased towards indicating presences, even if it means some cost in speci city. For our study this meant that the most suitable technique in terms of predictive performance on ADD estimates was the ECA ensemble model.
On over tting as measured by AUC and sensitivity, ECA's results were noticeably lower than the other ensemble models. This result, plus the technique's impressive predictive performance (especially on sensitivity) and the fact that its ADD spatial pattern estimates did not differ signi cantly from those of the other SDM techniques all point to ECA as the best option for making ADD estimates under any of the threshold types we tested.
Our results further showed that SDM techniques with low over tting values tended to estimate higher ADD sizes. Therefore, over tting should be given due consideration in judging a technique's ADD estimates, for if it is not, much information on possible species presences in certain geographic regions could be lost. In other words, predictive performance should not be the only factor in the evaluation of SDM techniques for estimating ADD. To illustrate this point, recall that for bats, our results showed RF as having the best AUC on the testing data, which might at rst suggest it was the best ADD estimation technique. However, it was also found that RF had a perfect classi cation on training data, with large differences between predictions on testing data (i.e., high over tting). Other studies have also reported this problem with RF, and have advised that SDM technique over tting should be investigated before deciding which model is best (Carlson et al., 2016). Yet others have declared RF to be better than techniques such as ensemble models, but precisely in those cases over tting was not taken into account (e.g., Balestrieri  In most instances where an SDM is used, binarization of the presences probabilities is implemented with the MaxTSS threshold (Scherrer et al., 2018), possibly due to its balancing of sensitivity and speci city in the estimates regardless of species prevalence (Jiménez-Valverde & Lobo, 2007). However, when estimating ADD, thresholds such as MaxKappa that are biased towards better estimates for speci city could be a good alternative since they tend to transform the probabilities with higher values into presences, particularly when species prevalence is low (Jiménez-Valverde & Lobo, 2007). As a result, MaxKappa tends to reveal the most probable areas for species presence, thus reducing ADD estimate uncertainty. On the other hand, it can be extremely biased towards improving speci city, as was the case in our study for moths, leading to very low ADD size estimates. By contrast, the MeanProb threshold tends to obtain better estimates on sensitivity (Cramer, 2003), which is consistent with our results. It therefore has a tendency to transform probabilities with low values into presences, which is not ideal for ADD estimation given that the nal estimates will then be characterized by considerable uncertainty.
In our view, thresholds should be used that transform high probabilities into presences, as this will lead to good ADD estimates. However, they must still be controlled using a tool that determines the threshold cut-offs so that only, say, those probabilities above the 75th percentile of their probability distribution are transformed into presences.
The Animal Dark Diversity (ADD) could be a tool with numerous applications in biogeography and conservation biology. Our results showed that it is possible to get good ADD estimates when ensemble models are implemented, taking into account the threshold used. The ADD areas could be used as a proxy of interesting areas where is possible to nd species never recorded before. The ADD maps could be implemented into planning monitoring campaigns of species which are determinant, especially those with conservation problems, and which need intensive surveys for characteristic habits (nocturnal, cryptic, elusive species, etc.). The possibility to apply this novel concept through SDMs will help the ecologists and decision-makers to decide where they should focus their efforts, reducing the conservation costs.

Final Remarks
Understanding biodiversity patterns across different geographical areas is necessary in order to implement effective conservation plans, especially in the context of rapid evolution due to climate change. Our results suggest that the ECA ensemble model is the best option for animal dark diversity estimates, but the key conclusion is that whichever SDM technique is chosen, it must possess good predictive performance (especially in sensitivity) and low over tting while generating animal dark diversity geographic pattern estimates that do not differ signi cantly from those generated by SDM techniques. Furthermore, to reduce uncertainty in the estimates, the threshold type should be one that transforms high presence probabilities into binary information. To conclude, the ndings presented here should help improve animal dark diversity estimation for applications in conservation biology.  Figure 1 Geographic distribution of estimated Animal Dark Diversity by each SDM technique for bats in the Iberian Peninsula using the MaxTSS threshold type (see Figs. S1, S2, S3, S4 and S5 for ADD estimates using the MaxKappa and MeanProb thresholds for both bats and moths).