The devil is in the detail: Environmental variables frequently used for habitat suitability modeling lack information for forest‐dwelling bats in Germany

Abstract In response to the pressing challenges of the ongoing biodiversity crisis, the protection of endangered species and their habitats, as well as the monitoring of invasive species are crucial. Habitat suitability modeling (HSM) is often treated as the silver bullet to address these challenges, commonly relying on generic variables sourced from widely available datasets. However, for species with high habitat requirements, or for modeling the suitability of habitats within the geographic range of a species, variables at a coarse level of detail may fall short. Consequently, there is potential value in considering the incorporation of more targeted data, which may extend beyond readily available land cover and climate datasets. In this study, we investigate the impact of incorporating targeted land cover variables (specifically tree species composition) and vertical structure information (derived from LiDAR data) on HSM outcomes for three forest specialist bat species (Barbastella barbastellus, Myotis bechsteinii, and Plecotus auritus) in Rhineland‐Palatinate, Germany, compared to commonly utilized environmental variables, such as generic land‐cover classifications (e.g., Corine Land Cover) and climate variables (e.g., Bioclim). The integration of targeted variables enhanced the performance of habitat suitability models for all three bat species. Furthermore, our results showed a high difference in the distribution maps that resulted from using different levels of detail in environmental variables. This underscores the importance of making the effort to generate the appropriate variables, rather than simply relying on commonly used ones, and the necessity of exercising caution when using habitat models as a tool to inform conservation strategies and spatial planning efforts.


| INTRODUC TI ON
Species distribution modeling (SDM) and habitat suitability modeling (HSM) are commonly utilized for mapping species across diverse ecological fields (Franklin, 1995;Guisan & Zimmermann, 2000).For targeted nature conservation measures in particular, but also for the planning of interventions in the environment, it is paramount to map out valuable habitats for endangered species as precisely as possible.However, recent research has shown that models often exhibit poor performance in accurately predicting species distribution (Lee-Yaw et al., 2022) and should therefore be treated with caution.
In addition to climatic factors, human intervention in the landscape, such as creation and expanding of settlement, or agricultural activities, alters the environment, creates dispersal limitations, and disturbs biotic interactions to the extent that only a small portion of the area remains suitable as habitat for the species (Chauvier et al., 2021;Titeux et al., 2016).Land cover data have been employed in previous studies to incorporate environments shaped by human activities into the models.Typically, datasets such as the Corine Land Cover (CLC), which is readily available for all of Europe, are used, as it provides remote sensing-derived land cover classes on a large scale (Abdi, 2013;Edman et al., 2011;Garrote et al., 2020;Posillico et al., 2004).However, the land use classes are only represented at a relatively coarse level of detail (e.g., three classes for forest).As the assumption behind HSM requires that environmental variables reflect the ecological niche of the target species (Araújo & Guisan, 2006), this level of detail is likely to be insufficient for precise modeling of habitat suitability for most species (e.g., Bradley & Fleishman, 2008;Cánibe et al., 2022;Gottwald et al., 2017;Mortelliti et al., 2007).Hence, HSM might require the utilization of more targeted classes in contrast to relying on widely available and therefore commonly utilized large-scale land cover datasets.
In this study, we tested this hypothesis by comparing the influence of environmental variables that are representing the same information, but at different levels of detail (e.g., tree species composition vs. mixed, coniferous, and deciduous forest classes) on model performance and the resulting HSM maps for three forest-dwelling bat species (Barbastella barbastellus, Myotis bechsteinii, and Plecotus auritus) in Rhineland-Palatinate, Germany.A customized and targeted land cover dataset that provides forest habitat information at the tree species level for the entire federal state of Rhineland-Palatinate, Germany (Bald et al., 2024) was used, in combination with high-information remote sensing datasets such as light detection and ranging (LiDAR), also available for the entire federal state of Rhineland-Palatinate, to optimally reflect the ecological requirements of the target species.
We compared the results with those obtained when using more generalized spatial variables at a coarser level of detail that are typically used for HSM such as the CLC dataset.
For both variable sets, the same state-of-the-art modeling workflow in Maxent (Phillips et al., 2006(Phillips et al., , 2017;;Phillips & Dudík, 2008) that incorporates an automated variable selection and model tuning approach based on spatial cross-validation was employed (Bald et al., 2023).Spatial cross-validation was chosen due to recent studies highlighting its importance for model training, variable selection, and tuning, to prevent inflated performance metrics and overly complex models (Kattenborn et al., 2022;Meyer et al., 2018Meyer et al., , 2019;;Ploton et al., 2020;Roberts et al., 2017).
The objective of this study is to compare two distinct modeling approaches, one employing targeted variables, tailored to reflect the ecological requirements of the target species (Araújo & Guisan, 2006), and the other utilizing generalized variables commonly used in HSM, these are hereafter referred to as targeted variables modeling (TVM) approach and generalized variables modeling (GVM) approach.

| MATERIAL S AND ME THODS
All data pre-and postprocessing were done in R version 4.1.2.For processing raster data, the R packages terra (version 1.7.3, Hijmans, 2023) and raster (version 3.6.3, Hijmans, 2022) were used.Additionally, the R package sf (version 1.0.7,Pebesma, 2018) was employed for processing vector data.The habitat suitability models were created with the software extension spatialMaxent (Bald et al., 2023).

| Study area
The federal state of Rhineland-Palatinate, covering a total area of 19,858 km 2 in southwestern Germany, served as the study area (Figure 1).With approximately 42% of its area covered by forests (BMEL, 2018), Rhineland-Palatinate is acknowledged as one of the regions in Germany with the highest abundance of forests.The low mountain ranges are largely characterized by spruce and beech, while Rhineland-Palatinate also features a large amount of oak and pine forests (PEFC, 2015).Rhineland-Palatinate is home to 22 species of bats, including the forest-dwelling Bechstein's bat (Myotis bechsteinii), which has its main distribution in Germany and is listed as critically endangered on the German Red List (IUCN, 2016a;Meinig et al., 2020).The barbastelle bat (Barbastella barbastellus) is also classified as critically endangered and the brown long-eared bat (Plecotus auritus) as endangered (IUCN, 2016b(IUCN, , 2023;;Meinig et al., 2020).
Focusing on forest-dwelling bat species, the study area was confined to Rhineland-Palatinate's forested regions, defined by a forest

| Presence-only data
The presence-only data (occurrence records) used in this study consisted of the locations of roosting trees frequented by female individuals of the three species during their reproductive period, serving as their summer habitat, collected between 2014 and 2022.138 presence points were available for Barbastella barbastellus, 366 for Myotis bechsteinii, and 151 for Plecotus auritus.The data were collected by the BFL (Büro für Faunistik und Landschaftsökologie, Rümmelsheim; BFL, 2023) as part of a data query addressing all stakeholders engaged in bat conservation initiatives within Rhineland-Palatinate. Due to the close proximity of some roost trees, the points were thinned using the R package spThin (version 0.2.0,Aiello-Lammens et al., 2015) to ensure a minimum distance of 50 m between them.The spThin algorithm is based on the concept of nearest neighbor distance (Aiello-Lammens et al., 2015).This process aims to have no more than one record per raster pixel (50 m × 50 m) and therefore eliminates data duplicates.It resulted in 135 data points for Barbastella barbastellus, 285 points for Myotis bechsteinii, and 132 points for Plecotus auritus.For modeling purposes, these points were manually divided into seven spatial clusters, with each cluster serving as one cross-validation fold in the modeling process (see Section 2.5), each representing a distinct spatial area (Figure 2).We opted for seven spatial folds to allow for the removal of two folds for testing purposes, while still enabling us to conduct five fold spatial cross-validation.

| Background points
Maxent is a presence-only modeling method, but requires the inclusion of artificial background points for model training and validation (Phillips et al., 2006).These background points were distributed randomly across the entire study area, aimed at representing the whole spectrum of environmental predictor variables.In numerous studies, the number of background points used was 10,000 (e.g., Bandara et al., 2022;Kafash et al., 2022;Wright et al., 2021), which is the default setting for Maxent (Phillips et al., 2006).However, recent studies have shown that higher numbers of background points may be necessary for large study areas (Renner et al., 2015;Warton & Shepherd, 2010).As our study area covers 19,858 km 2 , we decided to use a total of 50,000 background points.They were created using the function randomPoints() from the R package dismo (version 1.3.9;Hijmans, Phillips, et al., 2022).

| Environmental variables
A total of 754 environmental variables were used in this study, which involves a comparative analysis of two modeling approaches for each of the three bat species considered: one utilizing generalized variables and the other employing targeted variables (Figure 3).
The full set of environmental variables consisted of eight different variable groups (Figure 3): Spectral remote sensing data from a Sentinel-2 time series (ESA), indices derived from airborne LiDAR data (GeoBasis-DE/LVermGeoRP, 2021), climate data (DWD Climate Data Center [CDC] 2014[CDC] , 2023a[CDC] , 2023b[CDC] , 2023c[CDC] , 2023d, 2023e, 2023f, 2023g), 2023e, 2023f, 2023g), bioclimatic data (Fick & Hijmans, 2017), a global canopy height map produced with GEDI data (Potapov et al., 2021), indices derived from a tree species map (TSM; covering the six most important tree species in the study area; Bald et al., 2024;  (OpenStreetMap, 2023).Detailed information on the acquisition and processing of these variables is provided in the Appendix.The variables from different groups were incorporated into one or both of the two modeling approaches.In both modeling approaches, all variable groups were covered (Table 1), even though realistically, not all variable groups are used for most habitat modeling (e.g., often only climate data are used).Furthermore, we do not claim completeness regarding the variable groups used here.Depending on the species, other variable groups may also be of importance.

| Targeted environmental variables
For the TVM the complete Sentinel-2 time series, totaling 203 variables ("Sentinel-2 data" in the Appendix) was employed.Furthermore, all 29 LiDAR indices ("LIDAR data" in the Appendix), 103 climate variables ("Climatic variables" in the Appendix), 19 bioclimatic variables ("Bioclimatic variables" in the Appendix), 265 tree species map variables ("Tree-species-map" in the Appendix), and the distance to water variable ("Distance to water" in the Appendix) were utilized.
The data selected here as target variables were chosen to be more detailed than commonly used variables.However, this does not mean that other more detailed variables that were not considered here could not yield similar or even better results.For example, data on prey availability, such as maps of insect abundance (e.g., Partridge et al., 2020;Sierro & Arlettaz, 1997), may also be valuable.
In total, 620 variables were generated for the TVM.Pearson correlation analysis was performed to identify variables that demonstrated high similarity.Given that the modeling process utilized only the background points and not the complete environmental rasters, the correlation analysis was also conducted on the background points.Variables displaying a correlation exceeding 0.7 were excluded from further analysis to mitigate multicollinearity issues (Dormann et al., 2013;Zurell et al., 2020).This resulted in 111 variables that were used for the TVM ("Initial variable stack used for the targeted-variables modeling approaches" in the Appendix).
F I G U R E 2 Spatial folds delineating occurrence records for each bat species.Roosting sites of three bat species in Rhineland-Palatinate, Germany, partitioned into spatial folds for validation and testing.Each fold encompasses multiple roosting sites.

| Generalized environmental variables
The GVM approach involved a selection of variables commonly found in existing literature.Specifically, all 19 bioclimatic variables ("Bioclimatic variables" in the Appendix), and all 103 climate variables ("Climatic variables" in the Appendix) were incorporated, as they were among the most commonly used variables for SDM (e.g., Amini Tehrani et al., 2020;Bandara et al., 2022;Bellamy et al., 2020;Cable et al., 2021;Luo et al., 2020;Raman et al., 2020;True et al., 2021;Wright et al., 2021).If variables reflecting the three-dimensional forest structure were utilized, these often involve canopy height data (e.g., Amini Tehrani et al., 2020;Raman et al., 2020).Therefore, the global forest canopy height product ("Global forest canopy height product" in the Appendix) was used as a predictor variable (Potapov ), these inputs were processed to generate indices.The resulting variables were divided into a generalized and a more targeted stack of variables.These variables served as input for the modeling, where two models were created for each bat species.Variables that were part of the generalized variables modeling approach are displayed in orange.Variables that were part of the targeted variables modeling approach are displayed in pink.Variables can be part of both the generalized variables and the targeted variables modeling approach.et al., 2021).In terms of spectral data, normalized difference vegetation index (NDVI) often predominates for habitat modeling (e.g., Amini Tehrani et al., 2020;Kafash et al., 2022;Luo et al., 2020).
Therefore, all NDVI variables from the time series (VI10 January, March, April, May, June, September, and October; "Sentinel-2 data" in the Appendix) were included.Instead of class indices derived from the tree species map, the indices based on the CLC dataset ("Corine Land Cover" in the Appendix) were integrated.While the application of land cover information (e.g., Amini Tehrani et al., 2020;Bandara et al., 2022;Bellamy et al., 2020;Cable et al., 2021;Luo et al., 2020;Thomas et al., 2021;Wright et al., 2021) and/or landscape information via Fragstats or the R package landscape metrics is common in habitat modeling (e.g., Cable et al., 2021;Neubaum & Aagaard, 2022;Thomas et al., 2021), the highest resolution for forest classes typically extends no further than the categories provided by CLC, namely broadleaf, coniferous, or mixed forest (e.g., Amini Tehrani et al., 2020;Andersen et al., 2022;Bandara et al., 2022;Cable et al., 2021;Thomas et al., 2021;True et al., 2021).Therefore, the GVM approach incorporates all indices derived from the CLC dataset, amounting to 133 indices (compared to 265 for the tree species map variables with six classes for forest).Finally, the variable "distance to water" ("Distance to water" in the Appendix) was included, as similar variables based on vector datasets were commonly integrated (e.g., Amini Tehrani et al., 2020;Cable et al., 2021;Luo et al., 2020).The complete raster stack consisted of 263 variables.A Pearson correlation on all background points was calculated and variables with a correlation higher than 0.7 were removed.This resulted in 65 environmental variables used for the generalized variables habitat modeling ("Initial variable stack used for the generalized-variables modeling approaches" in the Appendix).
Two comprehensive benchmark studies by Valavi et al. (2022Valavi et al. ( , 2023) ) found that ensemble models (such as biomod2; Thuiller et al., 2009) only achieve satisfactory results when each model within the ensemble is meticulously tuned, validated, and tested.Another top performing method in these two benchmark studies was Maxent (Phillips et al., 2006(Phillips et al., , 2017;;Phillips & Dudík, 2008), which is one of the most widely adopted techniques for modeling species distribution (Guillera-Arroita et al., 2015).The varying performance of Maxent in such comparisons is likely due to the fact that Maxent only achieves good results when thorough tuning is conducted for each species (Merow et al., 2013;Morales et al., 2017).
SDM models frequently exhibit poor performance on spatially independent test data (Lee-Yaw et al., 2022), often due to the neglect of properties inherent in spatial data (Bahn & McGill, 2012).For example, numerous studies have highlighted the importance of spatial validation, to avoid inflated evaluation metrics and suboptimal parameter selection (Bahn & McGill, 2012;Bald et al., 2023;Meyer et al., 2018Meyer et al., , 2019;;Ploton et al., 2020).Furthermore, hyperparameters of the models should be tuned for each species (Merow et al., 2013;Morales et al., 2017), using spatial validation to select the right parameters.The same applies to variable selection; it has been shown that using too many variables can lead to bias in the results (Irving et al., 2020;Wang et al., 2016;Williams et al., 2012).Moreover, better results were achieved when variables were selected automatically rather than manually by the modeler (Meyer et al., 2019;Yiwen et al., 2016).Additionally, variable selection must be based on a spatial validation method to generate reliable results (Bald et al., 2023;Meyer et al., 2018Meyer et al., , 2019)).As these functionalities are not yet all available for the aforementioned high-performing ensemble models without substantial complexity, the software spatial-Maxent (version 1.0.0;Bald et al., 2023) was used in this study.spatial-Maxent is an extension for Maxent version 3.4.4(Phillips et al., 2006) and allows the utilization of Maxent with a spatial cross-validation, a variable selection and tuning of feature classes and regularization multiplier.Furthermore, spatialMaxent achieved better modeling results on a benchmark dataset of over 200 species than a more standardized version of Maxent, indicating that direct comparison between the two is not feasible (Bald et al., 2023).
The implemented variable selection was a forward variable selection (Meyer et al., 2019).In forward variable selection, the selection process begins by training a model for all possible combinations of two variables.The best two variables are selected, and one more model with one additional variable is trained for each remaining variable.The model with the best three-variable combinations is retained.This process is repeated until no further improvement is observed by adding more variables.For additional details on forward variable selection, refer to Meyer et al. (2019).
Using spatialMaxent a total of six models were calculated, two for each bat species, one with the targeted variables and one with the generalized variables, including forward variable selection, forward feature selection and regularization multiplier tuning in steps of 0.5 from 0.5 to 7 and spatial cross-validation.

| Model comparison
In this study, two modeling approaches were compared: one employing targeted variables (see Section 2.4.1), and the other utilizing generalized variables (see Section 2.4.2).For each of the three bat species analyzed, two models were developed employing identical modeling workflows (see Section 2.5), differing only in their input variables.
The TVM approach incorporated 111 variables, while the GVM approach comprised 64.Employing an automated variable selection method ensured that only the essential variables were retained (Meyer et al., 2018(Meyer et al., , 2019)), therefore, the difference in the initial variable count should not influence the comparability of the models.

| Evaluation
To evaluate the models, forward fold metric estimation (ffme; Bald et al., 2023), a variant of nested cross-validation (Schratz et al., 2019) was employed.Two out of the seven spatial folds were consistently utilized for testing the models, while the remaining five folds were used for spatial five fold cross-validation.This process was iterated for all possible combinations of two test and five training folds, resulting in the calculation of 21 models (7!/((7 − 2)! * 2!)) for model testing for each species and modeling approach.For all 21 models, several evaluation metrics were calculated and the mean value across all 21 models was used to assess the performance.To determine the best modeling approaches numerous metrics were utilized, aiming to minimize uncertainties associated with each individual metric (Lobo et al., 2008).In metrics requiring absence points, an equivalent number of background points as available presence points were randomly sampled.Following the recommendation of Yackulic et al. (2013), these metrics are labeled as presence-only (PO), such as MAE PO for mean absolute error.Zurell, 2020).Another commonly used metric in SDM is Kappa PO , which can be influenced by species prevalence; as an alternative, the true skill statistic (TSS PO ) has been suggested (Allouche et al., 2006).
We employ both metrics here to provide a comprehensive overview, and both were also calculated using the R package mecofun (version  et al., 2002).Therefore, this metric was also included and computed using the R package ecospat (version 3.5; Broennimann et al., 2023).
Finally, we included three additional metrics, namely, Sensitivity PO , Specificity PO , and percent correctly classified (PCC PO ) as they are commonly used in SDM according to Mouton et al. (2010).They were also calculated using the R package mecofun (version 0.1.1;Zurell, 2020).
To compare the habitat suitability maps of the TVM approach with the maps of the GVM approach, we utilized two metrics.The Pearson correlation, calculated with the layerCor() function from the terra R package (Hijmans, 2023), and the root mean square error (RMSE) were applied to assess map differences.Additionally, habitat suitability maps were transformed into presence-absence maps using the evalSDM() function from the R package mecofun (version 0.1.1;Zurell, 2020), with threshold selection based on maximizing the TSS PO across all folds.We recognize that the method for generating presence-absence maps can influence results (Wilson et al., 2005).By consistently applying the same threshold selection method across all models, we ensure a valid comparison, aligning with our emphasis on only comparing the TVM approach and the GVM approach.

| Performance of TVM and GVM approaches
For all three bat species researched in this study, the TVM approach consistently outperformed GVM approach across the majority of calculated metrics (Figure 5; Table 2).For Barbastella barbastellus and Plecotus auritus, the TVM approach performed better on all metrics (Figure 5a,c).For Barbastella barbastellus, the largest disparity among metrics was evident in COR PO , with a mean value of 0.65 for the TVM approach compared to 0.49 for the GVM approach (Figure 5a).For Plecotus auritus, the greatest difference was observed in the mean AUC PRG-PO metric, recording 0.66 for the TVM approach and 0.38 for the GVM approach (Figure 5c).The smallest difference between metrics for all bat species was found in mean MAE PO , where the difference between modeling approaches was less than 0.1 (Figure 5a-c).
For the species Myotis bechsteinii, superior outcomes were evident for all metrics in the TVM approach, except for CBI (Figure 5b).The mean CBI was 0.92 for the GVM approach and 0.82 for the TVM approach.
Among the targeted variable models of all the bat species, Barbastella barbastellus performed the best, followed by Myotis bechsteinii, and Plecotus auritus ranked the lowest (Figure 5d).In the generalized variable models, the pattern was similar, with only CBI for Myotis bechsteinii being better than the other two species (Figure 5e).

| Contribution of variable groups
A close analysis of the selected variables showed that, within the TVM approach, variables derived from the tree species map variable group make a notable contribution (Figure 6), collectively accounting for over 40% in each model.The targeted variable models of Plecotus auritus and Myotis bechsteinii had one of the tree species map variables among the first two most important variables ("Variable importance of all three bat species and two modeling approaches" in the Appendix, Figure A1).The targeted variable model of Barbastella barbastellus selected a variable from the tree species map variable group as the third most important variable ("Variable importance of all three bat species and two modeling approaches" in the Appendix, Figure A1).For all three bat species, the tree species map variables had also the highest number of selected variables.
The models of Barbastella barbastellus and Myotis bechsteinii selected over ten variables from the tree species map variables, while only nine were chosen by the targeted variable model of Plecotus auritus.For the targeted variable model of Barbastella barbastellus and Plecotus auritus, the second most important variables belonged to the LiDAR variables.For both species, the LiDAR variables exhibited a contribution exceeding 20%, and were the second most influential variable group based on the number of selected variables, three variables were selected for Barbastella barbastellus, and four variables for Plecotus auritus.The targeted variable model for Barbastella barbastellus identified a variable from the bioclimatic category as the most crucial, although this was the only one selected from these variables group.In the targeted variable model for Myotis bechsteinii, the Sentinel-2 variable group emerged as the second most important, with a total of five variables selected, collectively contributing 22.6%.Subsequently, a LiDAR variable, a climate variable, and the distance-to-water variable were also chosen.
In the GVM approach, it becomes apparent that the noteworthy contribution of the tree species map variable group in the TVM approach and its frequent selection could not be replaced by the variables of the CLC variable group (Figure 6).Instead, the bioclimatic variables contributed the highest for Barbastella barbastellus (58.4%) and Plecotus auritus (65.5%), while for Myotis bechsteinii, the Sentinel-2 variables group was the most important one (32.4%).However, the  Figures (d) and (e) provide a cross-species comparison between targeted variables and generalized variables modeling approaches.The MAE PO is presented at a reversed scale to enhance readability.Note that comparing metrics to each other is not feasible, as they are on different scales.Only comparisons between different modeling approaches and species are possible for each metric.The axis ranges from 0 to 1 in steps of 0.2, as shown in (a).The direct comparison of modeling approaches for each bat species shows that, with the exception of one metric (CBI, Myotis bechsteinii), the targeted variables modeling approach consistently outperforms the generalized variables modeling approach (a-c).cluded in all generalized variable models.The contribution in all generalized variable models showed that bioclimatic variables were not very often selected but with a very strong contribution, similar to the distance to water variable.The CLC variable group, similar to the tree species map variable group in the TVM approach, was frequently selected, but with lower contributions that often lagged behind those of the Sentinel-2 and climate variables.

| Comparisons of modeling approaches for presence-absence and habitat suitability maps
In this section, we undertake two comparisons.First, we analyze the agreement between the habitat suitability maps produced by the TVM approach with those generated by the GVM approach, examining each bat species individually.Furthermore, we compare the suitable areas of the presence-absence maps that were generated through thresholding the habitat suitability maps, specifically evaluating the differences between the TVM approach and the GVM approach.
When comparing the habitat suitability maps of the two modeling approaches directly (Figure 7 In the habitat suitability maps, thresholding was applied to generate presence-absence maps (refer to Section 2.7).Upon examining the suitable areas delineated in the presence-absence maps (Figure 8), it becomes evident that, for the TVM approach, fewer areas were determined as suitable compared to the GVM approach.
The largest difference in the predicted suitable area is for Myotis bechsteinii, where the disparity in suitable area between the TVM approach (2133.26km 2 ) and the GVM approach (2677.59km 2 ) is 544.33 km 2 .Both modeling approaches predict the largest suitable area for Plecotus auritus compared to the other two bat species, with 2370.13 km 2 for the TVM approach and 2755.48 km 2 for the GVM TA B L E 2 Mean results of all metrics across all 21 forward fold metric estimation folds for all models.Note: The value of the higher performing modeling approach for each species is indicated in bold.
F I G U R E 6 Variable contribution analysis.Each bat species and modeling approach are represented with two plots.The right plot illustrates the cumulative percentage contribution of variables within each variable group.The left plot displays the frequency distribution (number of variables) within each variable group.For the left plot the sum of the percent contribution of the variable group to the model are on the y-axis and the variable groups on the x-axis.For the right plot the frequency, how often a variable from the variable group was chosen by the model is on the y-axis and the variable group on the x-axis.The variable groups are bioclimatic variables, climatic variables, Corine Land Cover (CLC) variables, distance to water variable, global canopy height variable, light detection and ranging (LiDAR) variables, Sentinel-2 variables, and tree species map (TSM) variables.
| 13 of 27 approach.The smallest difference in suitable area between the TVM approach (1498.45km 2 ) and the GVM approach (1716.75km 2 ) is observed for Barbastella barbastellus, with a difference of 218.31 km 2 , and this species also had the smallest overall predicted area in both modeling approaches.

| DISCUSS ION
To evaluate how variables at a high level of detail affect model performance and resulting HSM maps, we contrasted the performance of targeted variables containing information on forest structure (LiDAR), and tree species composition with environmental variables that reflect the same landscape features with less detail on the landscape composition, such as CLC dataset or a canopy height model.
The large contribution of tree species map based variables, along with structural variables (LiDAR; Barbastella barbastellus and Plecotus auritus) and spectral remote sensing time series variables (Myotis bechsteinii), underscores the importance of targeted variables that reflect local species habitat conditions.That remote sensing variables contribute to improved results in HSM has previously been addressed (Leitão et al., 2019;Leyequien et al., 2007;Randin et al., 2020;Regos et al., 2022).However, often the focus is on the utilization of spectral remote sensing data (e.g., Amini Tehrani et al., 2020;Kafash et al., 2022;Luo et al., 2020).In contrast, the frequent selection of LiDAR variables by the TVM approach suggests that structural data enhances model performance compared to relying solely on spectral data or canopy height information.
Given the widespread availability of LiDAR data, particularly in Europe, their inclusion should be more frequently considered in HSM.The importance of high-information spectral remote sensing data (Oeser et al., 2020) was further confirmed in our study (second most important variable group for Myotis bechsteinii).
The GVM selected CLC variables for all three bat species but their lower contribution, compared to tree species map variables in the TVM, indicates a limited ability to capture nuanced habitat requirements essential for precise HSM.This is also supported by Mortelliti et al. (2007) who previously highlighted the inadequacy of forest classes in the CLC dataset for accurately describing the habitat of specialized forest-dwelling species.However, the modeling of species with a broad ecological niche may not necessarily benefit from the utilization of targeted land-cover variables.
The large differences in the habitat suitability maps, with a highest agreement in Pearson correlation between TVM and GVM of 0.45, further emphasize the need to proceed with caution when selecting environmental variables for HSM.These findings are consistent with prior research, showing discrepancies in model outcomes with varying variables (Bucklin et al., 2014;Syphard & Franklin, 2009).
Nonetheless, the predicted suitable areas align with species niche width, designating Barbastella barbastellus as the most threatened as well as most habitat-demanding and Plecotus auritus as the least demanding among the three bat species.The GVM consistently predicted larger suitable areas than the TVM.This aligns with recent studies indicating an overestimation of species' ranges compared to their actual AOH (Jetz et al., 2008;Rondinini et al., 2005).The observed limitations of GVM in accurately delineating AOH emphasize the need for more targeted approaches to habitat suitability mapping, as bias introduced by the overestimation of the AOH can heavily influence conservation outcomes (Jetz et al., 2008).
The results of HSM are always associated with high uncertainties and should therefore be treated like a hypothesis (Jarnevich et al., 2015).Nevertheless, they have potential as informative tools for habitat conservation and management strategies.For example, the maps created in this study could be utilized in spatial planning endeavors, such as determining suitable locations for wind farms and infrastructure projects (Gaultier et al., 2020;Roscioni et al., 2013).Before commencing on-site evaluations of such projects, the locations can be chosen to minimize habitat suitability in the project area.Subsequently, an on-site investigation can be conducted.This approach can enhance the likelihood that none of the three endangered species are found during the on-site assessment and thus shorten planning time and costs (Roscioni et al., 2013).Additionally, our findings have implications for forest management.LiDAR data were often selected by the models as important variables, indicating that forests with high structural complexity are favored by all three bat species (Figure 6; Froidevaux et al., 2016).Furthermore, these three forest-dwelling bat species demonstrate a preference for deciduous forests, with a high proportion of oak ("Variable importance of all three bat species and two modeling approaches" in Appendix).Such findings highlight the necessity of integrating these considerations into statewide forest management planning, particularly when decisions concern areas with high habitat suitability, as there is a risk of further habitat loss for these species.
Further research, testing alternative modeling methods beyond Maxent, will be important to reinforce and generalize the findings of this study.In summary, the integration of targeted vari-

| 3 of 27 BALD
et al. mask derived from the Copernicus high-resolution forest cover map (2018) with a spatial resolution of 10 m (EEA, 2022).The forest cover map was used to mask all environmental variables, which were resampled to 50 m spatial resolution as a compromise between fine spatial resolution and number of raster cells to be processed.All environmental variables were resampled to or calculated based on the forest mask.
Figure 4), indices derived from CLC forest classes (EEA, 2018; Figure 4), and a raster showing the distance of each pixel to water in the study area F I G U R E 1 Study area.(a) Location of the study area, the federal state of Rhineland-Palatinate, in Europe, Germany.(b) Forest cover of the study area (dark shading indicates forest cover and light shading indicates no forest cover).Data: EEA (2022) and OpenStreetMap (2023).

F
I G U R E 4 Tree species map and Corine Land Cover.(a) The six most common tree species in the study area are beech, Douglas fir, oak, spruce, pine, and other deciduous trees (Bald et al., 2024).(b) CLC forest classes used in the study: deciduous forest, coniferous forest, and mixed forest (EEA, 2018).| 7 of 27 BALD et al.

0. 1
.1;Zurell, 2020).In their benchmark SDM study,Valavi et al. (2022) also used the area under the precision-recall-gain curve (AUC PRG-PO ) and Pearson correlation between observed and predicted values (COR PO ).Therefore, we have also used these metrics for our study and calculated the AUC PRG-PO with the function calc_auprg() from the R package prg (version 0.5.1;Kull & Flach, 2023), as well as Pearson correlation using the R function cor() (R Core Team, 2023).There is only one commonly used metric available for evaluating the performance of SDM solely on presence data: the continuous Boyce index (CBI; Boyce

F
Comparison of model performance.Mean test results for all three bat species using both targeted variables and generalized variables modeling approaches.Figures (a) to (c) present individual bat species' metric results, comparing the two modeling approaches.
), Plecotus auritus exhibited the highest agreement between the two maps, with a Pearson correlation of 0.45.The maps of Myotis bechsteinii ranked at a similar level, as indicated by a Pearson correlation of 0.42.Barbastella barbastellus performs least favorably in this regard, with a Pearson correlation of 0.36.However, examining the RMSE values reveals that the differences of the two modeling approaches is higher for Myotis bechsteinii, with an RMSE value of 0.33, while Barbastella barbastellus and Plecotus auritus have RMSE values of 0.18 and 0.21, respectively.
Konowalik and NHamner & Frasco, 2018)detailed are in the generalized variables variable group.Variables with a higher level of detail in the targeted variables variable group.Detailed description of all variables in Appendix.Assignment of variable groups to respective modeling approaches at different levels of detail.Konowalik and Nosol (2021)recommend using the area under the receiver operating characteristics curve (AUC ROC-PO ) and the mean absolute error (MAE PO ) for selecting the model with the best performance.Therefore, the MAE PO was calculated using R package Metrics(version 0.1.4;Hamner&Frasco, 2018)and the AUC ROC-PO with the function evalSDM() from the R package mecofun (version 0.1.1; TA B L E 1

Species Modeling approach AUC ROC-PO TSS PO Kappa PO Sensitivity PO Specitivity PO PCC PO CBI AUC PRG-PO COR PO MAE PO
Habitat suitability maps.Maps showing the habitat suitability of the forested areas for all three bat species and the targeted variables (left) and generalized variables (right) modeling approach.Values close to 0 (blue) indicate low habitat suitability while values closer to 1 (red) indicate high habitat suitability.Study area is in white.
was shown that the choice of environmental variables heavily influences habitat suitability maps.This result underlines the necessity of selecting spatial variables with the best possible approximation to the ecological requirements of the target species, even if this can involve a considerable effort in data preparation.Despite the great potential of HSM for species conservation inF I G U R E 7 Variable importance of all three bat species and two modeling approaches (targeted variables (left) and generalized variables (right)).Variable importance in percent.Color of the variable bar plot indicates from which data source the variable was derived.The variable groups are bioclimatic variables, climatic variables, Corine Land Cover (CLC) variables, distance to water variable, global canopy height variable, light detection and ranging (LiDAR) variables, Sentinel-2 variables, and tree species map (TSM) variables.