Abstract
Context
Species distribution models are widely used in ecology. The selection of environmental variables is a critical step in SDMs, nowadays compounded by the increasing availability of environmental data.
Objectives
To evaluate the interaction between the grain size and the binary (presence or absence of water) or proportional (proportion of water within the cell) representation of the water cover variable when modeling water bird species distribution.
Methods
eBird occurrence data with an average number of records of 880,270 per species across the North American continent were used for analysis. Models (via Random Forest) were fitted for 57 water bird species, for two seasons (breeding vs. non-breeding), at four grains (1 km2 to 2500 km2) and using water cover as a proportional or binary variable.
Results
The models’ performances were not affected by the type of the adopted water cover variable (proportional or binary) but a significant decrease was observed in the importance of the water cover variable when used in a binary form. This was especially pronounced at coarser grains and during the breeding season. Binary representation of water cover is useful at finer grain sizes (i.e., 1 km2).
Conclusions
At more detailed grains (i.e., 1 km2), the simple presence or absence of a certain land-cover type can be a realistic descriptor of species occurrence. This is particularly advantageous when collecting habitat data in the field as simply recording the presence of a habitat is significantly less time-consuming than recording its total area. For models using coarser grains, we recommend using proportional land-cover variables.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
Introduction
The relationship between species occurrence and their environment is fundamental to ecology (Cadotte et al. 2011; Wisz et al. 2013; Schmeller et al. 2018; Young et al. 2023) and its importance is growing in the face of the ongoing global change (Butchart et al. 2010; Barnosky et al. 2011; Carlson et al. 2022). Understanding species’ responses to the environment is intrinsically linked to the possibility of predicting species distribution patterns and, hence, is useful for their conservation and management. Species distribution models (SDMs) are widely used for such purposes (e.g. Václavík et al. 2012; Cohen et al. 2016; Ellis‐Soto et al. 2021; Lindegren et al. 2022; Mohammadi et al. 2022; Cogliati et al. 2023).
Even though SDMs are now commonly adopted, ecologists still face challenges. These are in particular related to the quality of the input data, which can significantly impact the fitted models (Araújo et al. 2019; Gábor et al. 2020; Bazzichetto et al. 2023; Smith et al. 2023; Wang and Jackson 2023). Such challenges include, among other issues, the selection of the appropriate scale/grain (Miguet et al. 2016; Wunderlich et al. 2022; Zarzo‐Arias et al. 2022) and environmental variables (Williams et al. 2012; Moudrý et al. 2019; Smith and Santos 2020).
The selection of environmental variables is a critical step in SDMs, nowadays compounded by the increasing availability of environmental data (e.g., Cord et al. 2013, 2014; Šímová et al. 2019; Howard et al. 2020; Moudrý et al. 2023a). This is especially true for land-cover data, the availability and quality of which increases with the number of remote sensing data (e.g. Prošek et al. 2020; Karra et al. 2021; Zhang et al. 2021; Hopkins et al. 2022). Land cover types are among the most common predictors that enter the SDMs, especially in the context of land-cover changes (Coppée et al. 2022; Peng et al. 2022). Land cover type, which typically describes the habitat availability within a spatial unit, is commonly included as a continuous variable—for example, as the area or proportion of a specific land-cover type within the study area (e.g., Moudrý and Šímová 2013; Rose et al. 2020; Koma et al. 2022). The underlying rationale is based on an assumption that the probability of occurrence of a species increases continuously with the increase in land cover (or habitat) area within a given spatial unit. This could be attributed to the fact that larger habitat areas sustain larger populations due to their higher carrying capacity and shelter and food availability, so that populations are less susceptible to stochastic extinctions, competition, predation and inbreeding depression (Hanski 1999; Lande et al. 2003). In addition, larger habitat areas are bigger targets for colonizing individuals from the surrounding habitat matrix (Buckley and Knedlhans 1986), thereby increasing the probability of rescue effects following extinction events (Brown and Kodric-Brown 1977). However, in some cases, the mere presence or absence of a habitat may be more important than the total habitat area. In a recent study, Gábor et al. (2022a) demonstrated that for species specializing in rare habitats, such as water bodies in Central Europe, the amount of habitat within a spatial unit is sometimes less important than its simple presence or absence. This is related to the concept of critical habitat area (Andren 1994; Fahrig 2001; Melo et al. 2018), which hypothesizes that there is a threshold in habitat amount below which a species cannot survive; for example, loons (order Gaviiformes) are physiologically constrained from foraging on land and require water habitat for survival (see Fig. 1). While there are limited examples of such an approach in SDMs (e.g., Pearson et al. 2004; Romero et al. 2013; Zhang and Vincent 2018), in a recent study, Gábor et al. (2022a) demonstrated that for species specializing in rare habitats, such as water bodies in Central Europe, the amount of habitat within a spatial unit is sometimes less important than its simple presence or absence. This could be particularly advantageous when collecting habitat data in the field as simply recording the presence of a habitat is significantly less time-consuming than recording its total area.
It is well-recognized that SDMs are grain-dependent (Elith and Leathwick 2009) and empirical evidence has shown that species exhibit stronger responses to their environment at certain grains than others (see reviews by Miguet et al. (2016), Moudrý et al. (2023b)). Therefore, the choice of the grain constitutes an important part of the modeling process as it can affect the ability to detect the response (Mertes and Jetz 2018; Luebert et al. 2022; Wunderlich et al. 2022; Lu and Jetz 2023). In addition, the choice of the grain is often determined by data availability rather than study goals. This results in large variability of grains adopted in existing studies, from a few meters (e.g., Bazzichetto et al. 2018; Lecours et al. 2020; Casanelles-Abella et al. 2022; Stark and Fridley 2022) to many kilometers (e.g., Kleisner et al. 2017; Norberg et al. 2019; Zarzo-Arias et al. 2022). The grain size is also essential when selecting environmental variables (Pearson and Dawson 2003; Moudrý et al. 2023b), and it is likely to be critical when deciding whether to use land-cover as a binary or continuous variable when modelling species occurrence. However, this has never been thoroughly tested.
We hypothesize that at finer grains, binary variables are sufficient to fit a useful model, as a single habitat type is more likely to be dominant or completely absent. However, at coarse grains, the presence of habitat alone cannot be enough to sustain a viable population that requires a certain percentage of habitat in a given area, and the variables representing the habitat proportion may play a greater role. Further, habitat specialists may only require the presence of a given habitat type during winter or migration but, require that habitat to be the dominant type on the landscape during breeding season (Zuckerberg et al. 2016), emphasizing the importance of understanding seasonal moderators on the selection of environmental variables.
Therefore, in this study, we examined the interaction between the grain size and the representation of a land-cover variable (binary or continuous) when modeling species distribution. We used citizen science data downloaded from eBird to fit species distribution models for 57 water bird species in North America. The eBird dataset consists of extensive and often spatially detailed occurrence data (an average number of records per species in this study is 880,270 records), making it ideal to model species distributions down to fine spatial grains and across large spatial extents. For each species, we fit models with water cover represented as a proportional or binary variable and using multiple grains (1 × 1 km, 5 × 5 km, 10 × 10 km, and 50 × 50 km). In addition, we fit models for both breeding and non-breeding seasons because many of these species occupy highly distinct ranges between these seasons, with specialization to landcover type often greatest during breeding season (Zuckerberg et al. 2016). Specifically, we address the following questions: (1) Do models built with binary land-cover variables perform equally well or even better than those built with proportional variables? (2) Is the model performance affected by the adopted gain? (3) Does breeding and non-breeding season affect model performance? (4) Do grain and season affect the usability of the binary water cover variable?
Material and methods
Modeling region and species selection
We modeled species distributions across the North American continent, in a box between 179.99° W, 42.68° W, 10.53° N, and 87.11° N that included Canada, the United States, and the majority of Central America. Although we only modeled species native to the United States and Canada and only present the biodiversity estimates for this region, modeling each species’ entire continental range was required to ensure accurate predictions.
Based on the American Birding Association birding codes (2008) updated to the Clements bird taxonomy as of 2021, we compiled a list of 197 water-associated native bird species that breed or overwinter in the United States and Canada each year (Clements 2007). These codes are widely used to distinguish regularly occurring species (code 1 or 2 species, which we use) from vagrants occurring irregularly (code 3+). We excluded species with almost entirely marine ranges or with insufficient data points and those for which records are only available for part of the season (i.e., breeding, non-breeding). We also avoided modeling Hawaiian endemics because these species were restricted to islands that were smaller than some of our spatial grain sizes.
Species data acquisition and filtering
We gathered data from eBird, a global citizen science initiative in which users submit checklists containing bird observations, widely used for understanding species distributions (Sullivan et al. 2014). eBird provides vast amounts of avian biodiversity data and the number of records used in this study is literally unique, ranging from tens of thousands to units of millions, with an average of 880,270 records per species (see Table A1 in Supplementary material). Users can indicate whether or not all observed species were recorded in the checklist (“complete” checklists), allowing for absence inference and presence-absence modeling. Users also indicate the level of effort involved in each observation by providing the distance traveled, time spent birding, and the number of observers (hereafter, effort indicators).
We compiled all eBird data for all species separately during the breeding season (June–August; June–July for shorebirds or Charadriiformes, which migrate early) and non-breeding season (December–February). We removed subspecies information from all checklists and summarized all data at the species level. Following established eBird data modeling protocols, we initially applied several filters to the data to reduce bias and improve data quality (Johnston et al. 2019; Kelling et al. 2018). First, we eliminated checklists with extremely long durations (> 3 h), large numbers of observers (> 5), or protocols other than “stationary” or “traveling,” as these are incomparable with the majority of eBird’s data. Second, to reduce observational positional error, we eliminated checklists covering more than 1 km because they are likely to result in greater spatial uncertainty (Gábor et al. 2022b, 2023). Third, data prior to 2004 (< 0.1% of points) were removed because there is insufficient data from earlier years to adequately control for long-term temporal trends in regional bird abundances, as recommended by eBird (Fink et al. 2010, 2020). Finally, data from users with fewer than five contributions were removed to reduce bias (e.g., false absence) caused by inexperienced birders.
Before modeling, the data was further filtered at the species level. Checklists were limited to those falling within a 200 km buffer of the species’ seasonal expert range boundary to limit overprediction outside of the species’ range extent (via Cornell’s spatial boundaries—https://ebird.org/science/status-and-trends/download-data, accessed July 2021). When modeling shorebirds (order Charadriiformes), we excluded August data because many species are already migrating long distances by this time. We limited checklists to one per 5 km grid cell per week to reduce site selection and temporal bias in data collection. To reduce the imbalance between presence and absence points, we utilized “case–control sampling” repeating this filtering for populations of checklists where the species is present and absent as recommended by eBird’s best practices (Fink et al. 2018). We ended up modeling 57 species (see Table A1 in Supplementary material). R 4.1.0 was used to complete all data compilation, analyses, and visualizations (R Core Team 2021). Coordinates, polygons, and grids used in the study operated under a conical equal area projection. For spatial geoprocessing, the raster (Hijmans et al. 2015), rgdal (Bivand et al. 2015), and sf (Pebesma 2018) packages were used.
Environmental data
In total, we considered 12 land-cover variables, 4 topographic/habitat variables, and 4 climatic variables (Table 1). We obtained land-cover data from the European Space Agency (Climate Change Initiative; “ESA. Land Cover CCI Product User Guide Version 2. Technical Report” 2017) at a 300 m resolution. This product provides proportional land-cover variables (i.e. they provide the proportion of the area covered by a particular land-cover type) and includes the following land-cover types: mixed forest, mosaic, shrubland, grassland, lichens/mosses, sparse, flooded/freshwater, flooded/saltwater, flooded/shrub, urban, barren, and ice. We coarsened land-cover variables to a grain of 1 km by mean-aggregating the percentages of individual land-cover types.
To examine how water cover (flooded/freshwater category) influences model performance, we summarized this variable in five ways: as a proportional representation and as a binary variable classified using thresholds of 1%, 10%, 20%, or 50%. For example, when using the 1% threshold, any cell with water cover of 1% or more is considered to contain water in the binary representation and any cell with less than 1% is not.
While our primary focus in this study was on the influence of water cover on model performance and output, each SDM included several classes of covariates to account for numerous other factors that influence species distributions. Our topographic/habitat suite of variables included the mean elevation (from EarthEnv; Robinson et al. 2014), mean enhanced vegetation index (EVI; MODIS; https://lpdaac.usgs.gov/products/mod11a1v006/), topographic wetness index (TWI; hydroSHEDS; Marthews et al. 2015), and terrain roughness index (TRI; EarthEnv). Climatic covariates included the mean annual temperature (bio1), mean annual precipitation (bio12), precipitation seasonality (bio15; all bio variables from CHELSA v2.1; Karger et al. 2021), and intra-annual variation in cloud cover (EarthEnv). We selected environmental variables to reduce collinearity (Fig. A1), although collinearity is less of a concern within random forest (Farrell et al. 2019). All variables (topographic, climatic, and land-cover) that were available at finer resolution were spatially aggregated to 1 km using the mean value of all cells falling within the 1 × 1 km cell. We also used temporal covariates in our models, such as year, date, and time of day, to account for temporal variability in bird activity and long-term population trends. Finally, as covariates, we included all effort indicators (distance travelled, duration of checklist, and number of observers).
Coarsening the spatial grain
To test the role of representing the water cover variable as binary or proportional in different spatial grains, we coarsened the spatial grain of our analyses by aggregating all land-cover, topographic, and climatic variables. We aggregated from 1 × 1 km to 5 × 5 km, 10 × 10 km, and 50 × 50 km (hereafter ‘test grains’) using the means of all 1 km cells within the larger cell. The test grains were chosen based on their frequent use in published SDM studies (see review by Moudrý et al. 2023b). The species observation data were subsequently associated to the each coarser resolution grids. This approach leads to the fact that multiple species records may occur in any cell of the coarser grid. Following Guisan et al. (2007), we decided to keep the sample size constant. If any coarse cell contained more than one species record, all these records were retained rather than reducing these to single record per cell. This allowed us to keep the sample size (i.e. presence: absence ratio) consistent between all grains and avoid mistaking the effects of change in resolution for those caused by the change in sample size and prevalence (Leroy et al. 2018). Temporal and effort covariates were not scaled because they were assigned at the checklist (point) level.
Model fitting and evaluation
We used Random Forest to model the breeding and non-breeding distributions of each species separately. Random Forest is a machine learning method designed to analyse large datasets with many covariates and is frequently found to produce the most accurate SDMs (e.g., Mi et al. 2017; Valavi et al. 2023). Random forests are adaptable, automatically adjusting to complex, nonlinear relationships, and consider high-order interactions between environmental variables (Evans et al. 2011). Random Forest models were conducted in R version 4.1.0 using the ranger package (Wright et al. 2018).
We randomly divided the data into training and testing. The testing set was used for model validation. Models were parameterized to 100 trees and 7 threads. We compiled metrics of predictive model performance including the area under the ROC curve (AUC), true skill statistic (TSS), sensitivity, and specificity. We masked portions of the predictions that fell outside of the buffered range extent when estimating the predictive performance. To diagnose overfitting, we examined test-set calibration plots for each model. To compare the effect of binary and proportional water cover variables, we assessed their relative importance. Overall, we fit models for each of the 57 analyzed species during each season (breeding vs. non-breeding) at four test grains (1 × 1 km to 5 × 5 km, 10 × 10 km, and 50 × 50 km) and five types of water cover representation (proportional, and binary using four thresholds), resulting in 2280 separate models. The computational time for our models was approximately a month using high-performance computing tools, which vividly demonstrates the volume of scenarios we ran to test our hypothesis.
Results
All performance metrics followed similar patterns, hence we focused on TSS for simplicity (see Figs. A2, A3 in Supplementary material for trends in AUC, sensitivity, specificity). In general, SDMs fitted at the finest grain (1 × 1 km) performed very well (breeding TSS = 0.8; non-breeding TSS = 0.79; Fig. 2). The decrease in model performance was observed between original models at 1 × 1 km grain and models at 5 × 5 km grain. Further coarsening of grain resulted in minimal additional decrease in models performance (Fig. 2). The comparison of model performance (TSS) between models fitted with water cover as a proportional variable and as a binary variable revealed minimal differences in model performance at all grains (Fig. 3). At 1 × 1 km grain, the performance decreased with the increasing threshold used to derive the binary water cover variable. At coarser grains, the threshold had no effect on model performance (Fig. A3). The results showed that the choice of water cover representation mattered most in species with fewer observations, which are generally less common (Fig. 3B). The drop in model performance was more pronounced for non-breeding season models (Fig. 3B), while TSS dropped on average by 0.12 from 0.82 (1 × 1 km) to 0.7 (50 × 50 kms) for breeding season models; the drop for non-breeding season models was about 0.2. In the models fitted using binary water cover representation, the importance of the water cover variable significantly decreased compared to the models using its proportional representation (Fig. 4). The importance of the water cover variable decreased with coarsening the grain and increasing the threshold. For example, when models were fitted using a 1 × 1 km grain and a 1% threshold, the mean drop in water cover importance was approx. 20% compared to the proportional representation, whereas for models fitted using the 50 × 50 km grain and 50% threshold, the drop was over 80% (Fig. 4).
Discussion
To fully utilize the potential of the fact that the availability of data suitable for SDM keeps increasing, we must understand how to process and use these data effectively. This includes determining the optimal grain size and appropriate selection of environmental variables. In this study, we evaluated the usefulness of binary land-cover variables for SDMs at multiple grains. We computed species distribution models for 57 water bird species in North America across several grain sizes (i.e., from 1 km2 up to 2500 km2) for breeding and non-breeding seasons. We used the proportional and binary representations of land-cover variables derived with various thresholds (1%, 10%, 20%, 50%).
Our results show that the use of binary land-cover variables should be approached with caution. Models created with binary variables performed equally well as those with proportional variables (Fig. 2), but there was a significant decrease in the importance of the water cover variable (Fig. 4). The use of binary water cover variable often led to the situation when this variable was not identified as the most important land-cover variable for water bird species which is, to say the least, unexpected. In addition, we observed that with coarser grain and greater thresholds used to derive the binary water cover variable, the water cover became less important, with a drop of up to 90% at the coarsest grain used compared to the model using the proportional water cover variable (Fig. 4). This contrasts with the study by Gábor et al. (2022a) who showed that for water bird species, models at approximately 10 × 10 km grain using binary predictors performed better than models with proportional predictors. The contrasting results may be attributed to the fact that in our study, the modeling algorithms were allowed to choose the best combination of environmental variables, i.e., the lower predictive power of the binary form of the water cover variable is compensated for by another variable. For example, flooded tree or shrubland habitats often occur near water cover and may become predictive of a species occurrence. In contrast, Gábor et al. (2022a) used only water cover variables for model fitting and, hence, their models did not have the option to select variables that would provide a closer fit.
Our results show that as the grain size becomes coarser, the model performance decreases, which is consistent with prior studies (Guisan et al. 2007; Seo et al. 2009; Chauvier et al. 2022; Zarzo-Arias et al. 2022) and supports the common recommendation to first try the most detailed grain that the data allows (Moudrý et al. 2023b). However, it’s important to note that a finer grain may not always be better, as organisms respond to their environment more strongly at some grains than at others. These grains have been referred to as ‘response grains’ or ‘ecological scales’ and depend on the species ecology (e.g., its home-range; Mertes and Jetz 2018). Consequently, the choice of grain in models can strongly influence our ability to detect and measure species’ response to the environment. The grain at which a species is expected to respond to the environment should be always considered (Manzoor et al. 2018; Wunderlich et al. 2022; Moudrý et al. 2023b).
Importantly, our results suggest that the grain size can impact the applicability of binary land-cover variables. The large decline in the importance of the binary water cover variable relative to its proportional representation observed at 25 km2 resolution and coarser suggests that at grains coarser than 1 km2, the binary representation of water cover is not applicable. Therefore, the proportion or total perimeter of water bodies (e.g., Virkkala et al. 2005; Moudrý and Šímová 2013) should be preferred at coarser grains. However, at a 1 km2 grain, the decrease in the importance of the water cover variable was relatively small and models using binary predictors (presence/absence of the habitat) might be useful.
The threshold for deriving the binary water cover variable is another important aspect to consider. For example, a 1% threshold used in this study requires a water area equivalent to 2 American football fields at a 1 × 1 km resolution, 185 at a 10 × 10 km resolution, and 4638 at a 50 × 50 km resolution. For comparison, the area of the lake of the Great Salt Lake (Utah, USA) is 456 football fields. Thus, it is apparent that especially at coarser grains, the use of an inappropriate threshold can lead to a substantial drop in the number of cells identified as containing water cover, even though many of them provide enough carrying capacity for smaller water bird species populations (Hanski 1999; Melo et al. 2018). Indeed, our results show that using higher than 1% threshold leads to a decline in the importance of the binary water cover variable compared to its proportional representation, making it unusable.
Despite our expectations, we did not find any significant difference in model performance or in importance of the water cover variable between breeding and non-breeding seasons. We expected that models would perform better during the breeding season as water is more critical to the biology of many birds during this season. This expectation was based on the fact that birds have offspring that often cannot leave the water or fly, and many species remain stationary at a specific nesting site.
Conclusions
The appropriate selection of environmental variables and grain size in species distribution modeling is of considerable importance. In this study, we demonstrated that (1) the performance of the models was not significantly affected by the type of the adopted water cover variable (proportional or binary) but a significant decrease was observed in the importance of the water cover variable when used in a binary form, (2) the performance of the models was affected by the adopted grain, with models at a 1 km2 grain performing considerably better than models at coarser grains, (3) models for the breeding and non-breading season performed equally well, and (4) the importance of the binary water cover variable declined relative to its proportional representation with coarser grains and during the breeding season. We highlight that the use of binary land-cover variables should be approached with caution. Binary representation of water cover might be useful at finer grain sizes (i.e., 1 km2), which is a promising result as at more detailed grains, the simple presence or absence of a certain land-cover type can be a realistic descriptor and can save time in fieldwork. However, binary representation of water cover is unsuitable for models at grains coarser than 1 km2. For such grains, the use of proportional land-cover variables produces more reliable models.
Data availability
Following our methods, bird occurrences and environmental data can be downloaded and processed as we did. R scripts to download data and develop species distribution models are available via a repository (https://anonymous.4open.science/r/SDM-binary-land-cover-variables2).
References
Andren H (1994) Effects of habitat fragmentation on birds and mammals in landscapes with different proportions of suitable habitat: a review. Oikos 355–366
Araújo MB, Anderson RP, Márcia Barbosa A, Beale CM, Dormann CF, Early R, Rahbek C (2019) Standards for distribution models in biodiversity assessments. Sci Adv 5(1):eaat4858
Association A.B. (2008) American Birding Association Checklist: Birds of the Continental United States and Canada
Barnosky AD, Matzke N, Tomiya S, Wogan GO, Swartz B, Quental TB, Ferrer EA (2011) Has the Earth’s sixth mass extinction already arrived? Nature 471(7336):51–57
Bazzichetto M, Malavasi M, Barták V, Acosta ATR, Moudrý V, Carranza ML (2018) Modeling plant invasion on Mediterranean coastal landscapes: an integrative approach using remotely sensed data. Landsc Urban Plan 171:98–106
Bazzichetto M, Lenoir J, Da Re D, Tordoni E, Rocchini D, Malavasi M, Sperandii MG (2023) Sampling strategy matters to accurately estimate response curves’ parameters in species distribution models. Glob Ecol Biogeogr 32(10):1717–1729
Bivand R, Keitt T, Rowlingson B, Pebesma E, Sumner M, Hijmans R, Bivand MR (2015) Package ‘rgdal’. Bindings for the Geospatial Data Abstraction Library. https://cran.r-project.org/web/packages/rgdal/index.html. Accessed 15 Oct 2017, 172
Brown JH, Kodric-Brown A (1977) Turnover rates in insular biogeography: effect of immigration on extinction. Ecology 58(2):445–449
Buckley RC, Knedlhans SB (1986) Beachcomber biogeography: interception of dispersing propagules by islands. J Biogeogr 13(1):69–70
Butchart SH, Walpole M, Collen B, Van Strien A, Scharlemann JP, Almond RE, Watson R (2010) Global biodiversity: indicators of recent declines. Science 328(5982):1164–1168
Cadotte MW, Carscadden K, Mirotchnick N (2011) Beyond species: functional diversity and the maintenance of ecological processes and services. J Appl Ecol 48(5):1079–1087
Carlson CJ, Albery GF, Merow C, Trisos CH, Zipfel CM, Eskew EA, Bansal S (2022) Climate change increases cross-species viral transmission risk. Nature 607(7919):555–562
Casanelles-Abella J, Müller S, Keller A, Aleixo C, Alós Orti M, Chiron F, Moretti M (2022) How wild bees find a way in European cities: pollen metabarcoding unravels multiple feeding strategies and their effects on distribution patterns in four wild bee species. J Appl Ecol 59(2):457–470
Chauvier Y, Descombes P, Guéguen M, Boulangeat L, Thuiller W, Zimmermann NE (2022) Resolution in species distribution models shapes spatial patterns of plant multifaceted diversity. Ecography 2022(10):e05973
Clements JF (2007) Clements checklist of birds of the world. Associates/Cornell University Press, Comstock Pub
Cogliati M, Arikan-Akdagli S, Barac A, Bostanaru AC, Brito S, Çerikçioğlu N, Brandão J (2023) Environmental and bioclimatic factors influencing yeasts and molds distribution along European shores. Sci Total Environ 859:160132
Cohen JM, Civitello DJ, Brace AJ, Feichtinger EM, Ortega CN, Richardson JC, Rohr JR (2016) Spatial scale modulates the strength of ecological processes driving disease distributions. Proc Natl Acad Sci 113(24):E3359–E3364
Coppée T, Paquet JY, Titeux N, Dufrêne M (2022) Temporal transferability of species abundance models to study the changes of breeding bird species based on land cover changes. Ecol Model 473:110136
Cord AF, Meentemeyer RK, Leitão PJ, Václavík T (2013) Modelling species distributions with remote sensing data: bridging disciplinary perspectives. J Biogeogr 40(12):2226–2227
Cord AF, Klein D, Mora F, Dech S (2014) Comparing the suitability of classified land cover data and remote sensing variables for modeling distribution patterns of plants. Ecol Model 272:129–140
Elith J, Leathwick JR (2009) Species distribution models: ecological explanation and prediction across space and time. Annu Rev Ecol Evol Syst 40:677–697
Ellis-Soto D, Merow C, Amatulli G, Parra JL, Jetz W (2021) Continental-scale 1 km hummingbird diversity derived from fusing point records with lateral and elevational expert information. Ecography 44(4):640–652
ESA (2017) Land cover CCI product user guide version 2. Technical Report
Evans JS, Murphy MA, Holden ZA, Cushman SA (2011) Modeling species distribution and change using random forest. In: Predictive species and habitat modeling in landscape ecology: concepts and applications, pp 139–159
Fahrig L (2001) How much habitat is enough? Biol Conserv 100(1):65–74
Farrell A, Wang G, Rush SA, Martin JA, Belant JL, Butler AB, Godwin D (2019) Machine learning of large-scale spatial distributions of wild turkeys with high-dimensional environmental data. Ecol Evol 9(10):5938–5949
Fink D, Hochachka WM, Zuckerberg B, Winkler DW, Shaby B, Munson MA, Kelling S (2010) Spatiotemporal exploratory models for broad-scale survey data. Ecol Appl 20(8):2131–2147
Fink D, Auer T, Ruiz-Gutierrez V, Hochachka WM, Johnston A, La Sorte FA, Kelling S (2018) Modeling avian full annual cycle distribution and population trends with citizen science data. bioRxiv 251868
Fink D, Auer T, Johnston A, Ruiz-Gutierrez V, Hochachka WM, Kelling S (2020) Modeling avian full annual cycle distribution and population trends with citizen science data. Ecol Appl 30(3):e02056
Gábor L, Moudrý V, Lecours V, Malavasi M, Barták V, Fogl M, Václavík T (2020) The effect of positional error on fine scale species distribution models increases for specialist species. Ecography 43(2):256–269
Gábor L, Šímová P, Keil P, Zarzo-Arias A, Marsh CJ, Rocchini D, Moudrý V (2022a) Habitats as predictors in species distribution models: shall we use continuous or binary data? Ecography 2022(7):e06022
Gábor L, Jetz W, Lu M, Rocchini D, Cord A, Malavasi M, Moudrý V (2022b) Positional errors in species distribution modelling are not overcome by the coarser grains of analysis. Methods Ecol Evol 13(10):2289–2302
Gábor L, Jetz W, Zarzo‐Arias A, Winner K, Yanco S, Pinkert S, Moudrý V (2023) Species distribution models affected by positional uncertainty in species occurrences can still be ecologically interpretable. Ecography e06358
Guisan A, Graham CH, Elith J, Huettmann F, NCEAS Species Distribution Modelling Group (2007) Sensitivity of predictive species distribution models to change in grain size. Divers Distrib 13(3):332–340
Hanski I (1999) Metapopulation ecology. Oxford University Press, Oxford
Hijmans RJ, Van Etten J, Cheng J, Mattiuzzi M, Sumner M, Greenberg JA, Hijmans MRJ (2015) Package ‘raster’. R package 734:473
Hopkins LM, Hallman TA, Kilbride J, Robinson WD, Hutchinson RA (2022) A comparison of remotely sensed environmental predictors for avian distributions. Landsc Ecol 37(4):997–1016
Howard C, Stephens PA, Pearce-Higgins JW, Gregory RD, Butchart SH, Willis SG (2020) Disentangling the relative roles of climate and land cover change in driving the long-term population trends of European migratory birds. Divers Distrib 26(11):1442–1455
Johnston A, Hochachka W, Strimas-Mackey M, Gutierrez VR, Robinson O, Miller E, Fink D (2019) Best practices for making reliable inferences from citizen science data: case study using eBird to estimate species distributions. BioRxiv 574392
Karger DN, Wilson AM, Mahony C, Zimmermann NE (2021) Global daily 1 km land surface precipitation based on cloud cover-informed downscaling. Sci Data 8(1):307
Karra K, Kontgis C, Statman-Weil Z, Mazzariello JC, Mathis M, Brumby SP (2021) Global land use/land cover with Sentinel 2 and deep learning. In: 2021 IEEE international geoscience and remote sensing symposium IGARSS, pp 4704–4707. IEEE
Kelling S, Johnston A, Fink D, Ruiz-Gutierrez V, Bonney R, Bonn A, Guralnick R (2018) Finding the signal in the noise of Citizen science observations. bioRxiv 326314
Kleisner KM, Fogarty MJ, McGee S, Hare JA, Moret S, Perretti CT, Saba VS (2017) Marine species distribution shifts on the US Northeast Continental Shelf under continued ocean warming. Prog Oceanogr 153:24–36
Koma Z, Seijmonsbergen AC, Grootes MW, Nattino F, Groot J, Sierdsema H, Kissling WD (2022) Better together? Assessing different remote sensing products for predicting habitat suitability of wetland birds. Divers Distrib 28(4):685–699
Lande R, Engen S, Saether B-E (2003) Stochastic population dynamics in ecology and conservation. Oxford University Press, Oxford
Lecours V, Gábor L, Edinger E, Devillers R (2020) Fine-scale habitat characterization of The Gully, the Flemish Cap, and the Orphan Knoll, Northwest Atlantic, with a focus on cold-water corals. In: Seafloor geomorphology as benthic habitat. Elsevier, pp 735–751
Leroy B, Delsol R, Hugueny B, Meynard CN, Barhoumi C, Barbet-Massin M, Bellard C (2018) Without quality presence–absence data, discrimination metrics such as TSS can be misleading measures of model performance. J Biogeogr 45(9):1994–2002
Lindegren M, van Deurs M, Maureaud A, Thorson JT, Bekkevold D (2022) A spatial statistical approach for identifying population structuring of marine fish species: European sprat as a case study. ICES J Mar Sci 79(2):423–434
Lu M, Jetz W (2023) Scale-sensitivity in the measurement and interpretation of environmental niches. Trends Ecol Evol
Luebert F, Fuentes-Castillo T, Pliscoff P, García N, Román MJ, Vera D, Scherson RA (2022) Geographic patterns of vascular plant diversity and endemism using different taxonomic and spatial units. Diversity 14(4):271
Manzoor SA, Griffiths G, Lukac M (2018) Species distribution model transferability and model grain size–finer may not always be better. Sci Rep 8(1):7168
Marthews TR, Dadson SJ, Lehner B, Abele S, Gedney N (2015) High-resolution global topographic index values for use in large-scale hydrological modelling. Hydrol Earth Syst Sci 19(1):91–104
Melo I, Ochoa-Quintero JM, de Oliveira Roque F, Dalsgaard B (2018) A review of threshold responses of birds to landscape changes across the world. J Field Ornithol 89(4):303–314
Mertes K, Jetz W (2018) Disentangling scale dependencies in species environmental niches and distributions. Ecography 41(10):1604–1615
Mi C, Huettmann F, Guo Y, Han X, Wen L (2017) Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence. PeerJ 5:e2849
Miguet P, Jackson HB, Jackson ND, Martin AE, Fahrig L (2016) What determines the spatial extent of landscape effects on species? Landsc Ecol 31:1177–1194
Mohammadi A, Almasieh K, Nayeri D, Adibi MA, Wan HY (2022) Comparison of habitat suitability and connectivity modelling for three carnivores of conservation concern in an Iranian montane landscape. Landsc Ecol 37(2):411–430
Moudrý V, Šímová P (2013) Relative importance of climate, topography, and habitats for breeding wetland birds with different latitudinal distributions in the Czech Republic. Appl Geogr 44:165–171
Moudrý V, Lecours V, Malavasi M, Misiuk B, Gábor L, Gdulová K, Wild J (2019) Potential pitfalls in rescaling digital terrain model-derived attributes for ecological studies. Ecol Inform 54:100987
Moudrý V, Cord AF, Gábor L, Laurin GV, Barták V, Gdulová K, Wild J (2023a) Vegetation structure derived from airborne laser scanning to assess species distribution and habitat suitability: the way forward. Divers Distrib 29(1):39–50
Moudrý V, Keil P, Gábor L, Lecours V, Zarzo-Arias A, Barták V, Šímová P (2023b) Scale mismatches between predictor and response variables in species distribution modelling: a review of practices for appropriate grain selection. Prog Phys Geographys 47(3):467–482
Norberg A, Abrego N, Blanchet FG, Adler FR, Anderson BJ, Anttila J, Ovaskainen O (2019) A comprehensive evaluation of predictive performance of 33 species distribution models at species and community levels. Ecol Monogr 89(3):e01370
Pearson RG, Dawson TP (2003) Predicting the impacts of climate change on the distribution of species: are bioclimate envelope models useful? Glob Ecol Biogeogr 12(5):361–371
Pearson RG, Dawson TP, Liu C (2004) Modelling species distributions in Britain: a hierarchical integration of climate and land-cover data. Ecography 27(3):285–298
Pebesma EJ (2018) Simple features for R: standardized support for spatial vector data. R J 10:439
Peng S, Zhang J, Zhang X, Li Y, Liu Y, Wang Z (2022) Conservation of woody species in China under future climate and land-cover changes. J Appl Ecol 59(1):141–152
Prošek J, Gdulová K, Barták V, Vojar J, Solský M, Rocchini D, Moudrý V (2020) Integration of hyperspectral and LiDAR data for mapping small water bodies. Int J Appl Earth Obs Geoinf 92:102181
R Core Team (2021) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Robinson N, Regetz J, Guralnick RP (2014) EarthEnv-DEM90: a nearly-global, void-free, multi-scale smoothed, 90m digital elevation model from fused ASTER and SRTM data. ISPRS J Photogramm Remote Sens 87:57–67
Romero D, Olivero J, Real R (2013) Comparative assessment of different methods for using land-cover variables for distribution modelling of Salamandra salamandra longirotris. Environ Conserv 40(1):48–59
Rose JP, Halstead BJ, Fisher RN (2020) Integrating multiple data sources and multi-scale land-cover data to model the distribution of a declining amphibian. Biol Conserv 241:108374
Schmeller DS, Weatherdon LV, Loyau A, Bondeau A, Brotons L, Brummitt N, Regan EC (2018) A suite of essential biodiversity variables for detecting critical biodiversity change. Biol Rev 93(1):55–71
Seo C, Thorne JH, Hannah L, Thuiller W (2009) Scale effects in species distribution models: implications for conservation planning under climate change. Biol Lett 5(1):39–43
Šímová P, Moudrý V, Komárek J, Hrach K, Fortin MJ (2019) Fine scale waterbody data improve prediction of waterbird occurrence despite coarse species data. Ecography 42(3):511–520
Smith AB, Santos MJ (2020) Testing the ability of species distribution models to infer variable importance. Ecography 43(12):1801–1813
Smith AB, Murphy SJ, Henderson D, Erickson KD (2023) Including imprecisely georeferenced specimens improves accuracy of species distribution models and estimates of niche breadth. Glob Ecol Biogeogr 32(3):342–355
Stark JR, Fridley JD (2022) Microclimate-based species distribution models in complex forested terrain indicate widespread cryptic refugia under climate change. Glob Ecol Biogeogr 31(3):562–575
Sullivan BL, Aycrigg JL, Barry JH, Bonney RE, Bruns N, Cooper CB, Kelling S (2014) The eBird enterprise: an integrated approach to development and application of citizen science. Biol Conserv 169:31–40
Václavík T, Kupfer JA, Meentemeyer RK (2012) Accounting for multi-scale spatial autocorrelation improves performance of invasive species distribution modelling (iSDM). J Biogeogr 39(1):42–55
Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G (2023) Flexible species distribution modelling methods perform well on spatially separated testing data. Glob Ecol Biogeogr 32(3):369–383
Virkkala R, Luoto M, Heikkinen RK, Leikola N (2005) Distribution patterns of boreal marshland birds: modelling the relationships to land cover and climate. J Biogeogr 32(11):1957–1970
Wang L, Jackson DA (2023) Effects of sample size, data quality, and species response in environmental space on modeling species distributions. Landsc Ecol 38(12):4009–4031
Williams KJ, Belbin L, Austin MP, Stein JL, Ferrier S (2012) Which environmental variables should I use in my biodiversity model? Int J Geogr Inf Sci 26(11):2009–2047
Wisz MS, Pottier J, Kissling WD, Pellissier L, Lenoir J, Damgaard CF, Svenning JC (2013) The role of biotic interactions in shaping distributions and realised assemblages of species: implications for species distribution modelling. Biol Rev 88(1):15–30
Wright MN, Wager S, Probst P, Wright MMN (2018) Package ‘ranger.’
Wunderlich RF, Mukhtar H, Lin YP (2022) Comprehensively evaluating the performance of species distribution models across clades and resolutions: choosing the right tool for the job. Landsc Ecol 37(8):2045–2063
Young MA, Critchell K, Miller AD, Treml EA, Sams M, Carvalho R, Ierodiaconou D (2023) Mapping the impacts of multiple stressors on the decline in kelps along the coast of Victoria. Aust Divers Distrib 29(1):199–220
Zarzo-Arias A, Penteriani V, Gábor L, Šímová P, Grattarola F, Moudrý V (2022) Importance of data selection and filtering in species distribution models: a case study on the Cantabrian brown bear. Ecosphere 13(12):e4284
Zhang X, Vincent AC (2018) Predicting distributions, habitat preferences and associated conservation implications for a genus of rare fishes, seahorses (Hippocampus spp.). Divers Distrib 24(7):1005–1017
Zhang X, Liu L, Chen X, Gao Y, Xie S, Mi J (2021) GLC_FCS30: global land-cover product with fine classification system at 30 m using time-series Landsat imagery. Earth Syst Sci Data 13(6):2753–2776
Zuckerberg B, Fink D, La Sorte FA, Hochachka WM, Kelling S (2016) Novel seasonal land cover associations for eastern North American forest birds identified through dynamic species distribution modelling. Divers Distrib 22(6):717–730
Funding
Funding was provided by Fulbright Commission, Czech Republic (Grant no. Fulbright-Masaryk Scholarship).
Author information
Authors and Affiliations
Contributions
The authors confirm contribution to the paper as follows: study conception and design: LG, JC, VM, WJ; analysis and interpretation of results: LG, JC; draft manuscript preparation: LG, JC, VM, WJ; All authors reviewed the results and approved the final version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gábor, L., Cohen, J., Moudrý, V. et al. Assessing the applicability of binary land-cover variables to species distribution models across multiple grains. Landsc Ecol 39, 66 (2024). https://doi.org/10.1007/s10980-024-01813-3
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10980-024-01813-3