When is variable importance estimation in species distribution modelling affected by spatial correlation?

In this way, this study helps to clarify in a systematic and controlled approach how to make proper inferences about variable importance in species distribution models.


Introduction
Species distribution modelling methods have been used frequently by ecologists to define the geographic ranges of species, but also to infer the factors determining the realised niche of the species (Peterson and Soberón 2012). When making these inferences researchers should be aware that models based on statistical correlations do not necessarily uncover causal mechanisms. However, these models can still help to formulate relevant hypothesis that can further be backed up by experimental investigation in controlled settings (Austin et al. 1990, Austin 2002, Meineri et al. 2012, Berdugo et al. 2019. Besides, explanatory models that capture relevant relationships with their model definition allow to transfer the modelled niche characteristics across spatiotemporal frames (Duque-Lazo et al. 2016).
To create models that capture relevant relationships, many studies reduce the total list of possible explanatory variables to a shorter array of the most important variables. This is usually accomplished by using measures of variable importance that compute a ranking of all the possible species-environment relationships. This is also referred to as a sensitivity analysis of a model to different predictors (Wei et al. 2015). One of the most popular measures used is the model-independent variable importance. This measure basically quantifies how a certain input, or absence of it (from a pool of candidate variables), affects the output of a model either in its accuracy or in terms of increase/decrease of uncertainty (Thuiller et al. 2009, Wei et al. 2015. Beyond this, the use of p-values, standardised coefficients and partial response curves can also help in comparison of variable influence across models (Naimi and Araújo 2016).
As species distribution modelling can be used to estimate the importance of species-environment relationships, it is important to understand their sensitivity to deviations from the statistical assumptions behind different models (Austin 2002. One of the confounding aspects, often neglected in ecology, is that of spatial autocorrelation in explanatory variables. As clarified first by Lennon (2000), and later explored by many other studies (Segurado et al. 2006, Beale et al. 2007, Veloz 2009), pseudo replication is a prominent problem when spatial autocorrelation occurs, because nearer observations have more similar values. In statistical terms, the true degree of freedom is actually lower than the number of observations when pseudo-replication occurs.
Spatial autocorrelation in explanatory variables can cause a biased estimation of significance values of variables included in the models (Legendre 1993, Lennon 2000. However, Diniz-Filho et al. (2003) and Hawkins et al. (2007) argued that the observed inflation in importance for spatially autocorrelated variables is not necessarily spurious. This depends in part on the spatial scale of the study. This has been tested with real data (Diniz-Filho et al. 2003), which did not allow to compare estimated variable importance with the true (unknown) importance of these variables in fitted models. Also, the chance exists that small-scale mechanisms will be ignored when these correlate with variables that change over larger distances. Then the larger scale processes can be selected as more important for a species. In such cases, important ecological interpretations can be lost (Segurado andAraújo 2004, De Knegt et al. 2010).
A second main factor that influences ecological causal mechanisms, is the variation in how species respond to changes in environmental conditions. Species responses to environmental conditions rarely result from one independent ecological process (Austin 2002, Oksanen andMichin 2002). Therefore, even though in theory one would expect a unimodal response curve for most environmental conditions (following the niche theory by Hutchinson 1957), most estimated response curves from observations take different shapes, from linear to asymmetric or skewed curves (Rydgren et al. 2003). This linearity or skewness can also be a result of limited sampling along environmental gradients. Then we are observing only a part of these curves, sometimes referred to as 'truncated responses' (Austin and Nicholls 1997). Such variations can also bias the estimation of variable importance from correlative models because they don't show the impact of the full range of a variable on the occurrence of a species (Austin 2007).
Response curves also define the location of the species niche (i.e. conditions that are most preferred by a species) along environmental gradients. The occurrence of these environmental conditions is not necessarily homogeneously distributed over the extent of a studied area. Therefore the suitable ranges of certain environmental conditions might be more or less prevalent in a study area. This difference can further affect the variable importance from correlative models (Austin 1987, Rydgren et al. 2003. The effects of different kinds of response curves on model accuracy, complexity and overfitting have been studied before (Meynard and Quinn 2007, Santika and Hutchinson 2009, Bell and Schlaepfer 2016), yet how this affects variable importance estimations in the presence of spatial autocorrelation, has to our knowledge not yet been explored.
Therefore, this study aims to investigate how model-independent variable importance estimations are influenced by the effects of 1) spatial autocorrelation in predictor variables and 2) types of species response curves.

Material and methods
We used generated data to test the relationship between levels of autocorrelation in environmental predictors and their estimated variable importance in species distribution models. This so-called 'virtual species' distribution modelling is an efficient method developed by Hirzel et al. (2001) to evaluate the impact of statistical properties on distribution models. This involves the use of a set of 'virtual' species in a computergenerated environment under controlled settings like defined species responses, levels of autocorrelation and sampling schemes. The true distribution of the virtual species, the levels of autocorrelation in the environmental variables and the importance of the variables in explaining the distribution are known and controlled here, and thus, the characteristics of models and related statistical measures can be properly investigated without interference of confounding variables (Miller 2014, Naimi 2015.
We generated the data for various scenarios using the following five main steps as illustrated in Fig. 1.
Firstly four artificial landscapes of explanatory variables per species are generated with different levels of SAC. Secondly different response curves to these four environmental variables are defined to convert these environmental maps into 'partial suitability' maps. These maps are combined into final species habitat suitability maps by making combinations of these partial suitability maps. Thirdly, the presence-absence distribution of the virtual species is determined using the species habitat suitability as a probability. Fourthly, various species distribution models per combination of input maps and response types are fitted using sampled presence/absence observations. Lastly, the variable importance is then calculated for each explanatory variable and compared between the different scenarios. Scenario's that were considered differed in the responses to environmental conditions by the species (linear versus unimodal), the coincidence of favourable conditions with prevailing environmental conditions (either dominant or rare), and the amount of spatial correlation in the 'background variables', relative to the 'variables of interest'. Because we combine all four landscapes with equal weights, we know that their relative importance for the fitted models should be the same as well. We can compare this against the estimated variable importance that can be inferred from fitted SDMs. All computations are performed in R statistical software (<www.r-project.org>). We will explain these steps in more detail below.

Simulating the explanatory variables
Landscapes of environmental variables were simulated on a grid of 100 by 100 pixels. Four variables were created per scenario. One variable, the variable of interest (VOI), had a different level of spatial autocorrelation from the other three variables, the background variables, which were kept constant. The autocorrelation in an environmental variable is controlled by specifying the range parameter in a variogram model (below), which is expressed as a percentage of the width of the landscape (100 pixels in our case) to allow for scale-free interpretation of the results. We created scenario's where the spatial autocorrelation in the background was kept at a minimal level (range of 0% of the extent; so no autocorrelation at all) and scenario's where the spatial autocorrelation in the background was at an intermediate level (range of 12.5% of the extent). In this way, we can assess the effect of spatial autocorrelation on the changes in relative importance of the variable of interest. To assess the robustness of our findings, every scenario that was considered was replicated 20 times so that we could look at mean and spread in variable importance.
The explanatory variables are simulated as unconditional Gaussian random fields with a certain covariance structure , Beguería and Pueyo 2009, Nychka et al. 2017. Initially for all the variables, a covariance matrix is defined where the correlation between two pixels is where D ij is the Euclidean distance between pixels x [i] and x [j] and θ is the range of autocorrelation expressed as a percentage of the total extent and is hereafter referred to as the 'level of SAC' (spatial autocorrelation). This simulates an autocorrelated surface with variograms as shown in Fig. 2a. The covariance function defines stationary and isotropic autocorrelation. This helps in isolating the effect of SAC and shape of the response curve on variable importance and excludes possible confounding conditions like anisotropy and non-stationarity that are likely to interfere when using real data. The level of SAC ranged between 4 from 0% to 30% of the extent in 25 steps of 1.2%. The SAC in the realized landscapes saturated at levels higher than 30% when estimated with Moran's I for various random samples from the explanatory variable as illustrated in Fig. 2b.
To make different realisations of the environmental variables with the same level of spatial autocorrelation, each covariance matrix was then multiplied with 20 realisations of a Gaussian random error derived from rnorm ~ (0,1). This led to 20 replicates of each environmental variable with the same level of SAC. These replicates were used to derive confidence intervals for the estimated variable importance.

Simulating virtual species
To simulate the virtual species, response curves were defined for each of the four explanatory variables. We defined the response curves for two cases, one in which the species niche overlapped with the dominant environmental condition in the study area, and another where the species niche has no overlap. In the latter case, optimal conditions for that species are at the edge of an environmental gradient while this environmental condition follows a normal distribution. These different cases resulted in systematically different prevalences. We corrected for this in the sampling of the landscapes to make results comparable (Sampling the area for presenceabsence points).
Further, two types of response curves were defined (Naimi et al. 2011): monotonic (linearly increasing), and unimodal responses. Earlier studies argued against the simplistic use of unimodal curves to model species responses (Austin 2002) and advocated the use of differently shaped curves defined by the beta function. For this study, we assume that a unimodal response can approximate a case where the entire range of an environmental condition is sampled. When we truncate such a curve (i.e. sample an incomplete range in terms of its relevance for a species), we get intermediate forms like monotonically increasing or decreasing responses. Hence, the two response curve shapes were chosen here to represent the two extremes (unimodal and linear) across a gradient.
These different response curves were combined in various ways to define four types of species: 1) a species that has a monotonic (positive linear) response to both the background and the variable of interest (the 'linear' species); 2) a species that has a unimodal response to both the background and the variable of interest (the 'unimodal' species); 3) a species that has a unimodal response to two of the background variables and a monotonic response to the other two variables including the variable of interest (the 'linear-mixed' species); and finally 4) a species that has a monotonic response to two of the background variables and a unimodal response to the other two variables including the variable of interest (the 'unimodal-mixed' species).

Simulating the habitat suitability
For all 20 realisations of each species type, and at a given level of SAC, a habitat suitability for the corresponding species is defined as a simple unweighted addition of each explanatory variable's suitability level as defined by the response curve (Naimi 2015) so that we maintain an equal importance for all four variables. The summed probabilities are rescaled to the range [0,1], so the final suitability map mimics a probability distribution map with each cell containing a probability of having the species. The suitable habitat is then converted to a presence-absence map using a logistic probability transformation. The main parameters that define this conversion are alpha (slope) of 0.05, and beta (threshold) equal to 0.5.

Sampling the area for presence-absence points
Independent random samples (200 points) were taken from each of the presence-absence maps to create a training dataset and an additional 200 points were sampled to make a validation dataset. This implies a sampling density of 2% of all the pixels in the simulated landscape. To ensure that the differing species prevalence (Supporting information) that occurs as a result of 1) included randomness in the data and 5 2) the systematic difference between the different cases of niche overlap with the dominant conditions does not bias the results, a prevalence constraint of 25% is used when sampling the presence-absence datasets. A difference in prevalence, both in the sampling or in the initial presence-absence map, does not necessarily affect the relative variable importance estimations within a model, but the constraint is maintained to aid comparison between the two cases of niche overlap (see Supporting information for the details). Besides the sampling density of 2%, also samples with a density of 0.05% (50 points) and 3.5% (350 points) were taken to analyse the effect of sampling density.

Modelling methods
The study incorporates seven commonly used species distribution modelling methods including generalized linear models (GLM), generalized additive models (GAM) and five machine learning methods (Supporting information). We fitted each model using the default settings as provided by the various modelling packages. We tested different models to assess whether the impact of SAC and response type on variable importance estimations varies across these models. All models were run using the BIOMOD and SDM package in R (Thuiller et al. 2009, Naimi andAraújo 2016).

Model evaluation and variable importance estimation
To evaluate whether there were differences in performance between the various models, we calculated the area under the receiver-operation curve (AUC) of each model. AUC is regarded as a threshold independent measure that assesses the discriminatory power of the model in separating presences from absences (Thuiller 2003, Luoto et al. 2005, Allouche et al. 2006, Meynard and Quinn 2007. AUC can be considered as a good tool to compare the different models in this study since the sample size was constant and the prevalence is maintained at 25% (Hanberry and He 2013). The variable importance was estimated using a model independent method as formulated by Thuiller et al. (2009), in which the Pearson's correlation (r) is calculated between the predicted values using the original dataset and predictions where one of the variables is randomized. A strong correlation indicates that there is no effect of the randomisation for that variable, and hence, the variable is not important. To facilitate interpretation, variable importance is expressed as (1 − r). Because all the habitat suitability maps were generated using an equal weight for each variable, a difference in variable importance between the background environmental conditions and the variable of interest (for which the level of SAC was changed) indicates the effect of SAC on the variable importance.
The randomization of a variable to determine its importance was implemented by taking values from a different spatial realisation of that environmental variable but with the same level of SAC (one of the other 20 replicates) than used in training the model. In this way, the spatial structure of the environmental variable was preserved, so the estimation of variable importance would not be a function of an accidental change in the level of SAC by the randomisation process.

Results
All the models, except BRT and ANN, performed well in all cases with high AUC values between 0.80 and 0.95, implying high discriminatory power (Fig. 3). This was the same across the spectrum of autocorrelation levels for all species types and both scenarios (overlapping or non-overlapping niches with dominant environmental conditions), with a marginal increase in AUC as the autocorrelation increase for a few cases. Residual SAC was significant in BRT, ANN and Maxent, and increases as percentage SAC of the explanatory variables increased (Supporting information for detailed estimates of AUC and residual SAC).
Results for most of the modelling techniques were similar, therefore, only the results for GLM and RF for both cases of niche overlap with environmental conditions and 0% background SAC are presented in the main article. All other combinations can be found in the Supporting information. The results from scenarios of varying sampling densities produced no statistically significant differences, and thus have not been further included.

Species with linear and unimodal responses
When all four variables have the same level of SAC, the results show the same variable importance of 0.25 for each variable. The estimated variable importance increases when the level of SAC increases, but the impact of this increase and the overall importance of a variable depends highly on the shape of the response curve (Fig. 4). Under conditions where the species niche is dominant in the study area ( Fig. 4a-b, e-f ), the species with the linear response shows beyond a level of SAC of 5-10% significantly different variable importances for the VOI compared to the background variables. For species with a unimodal response no such pattern is evident. Both these patterns can also be seen also in the machine learning methods (Supporting information).
For the case where the species niche is not overlapping with the dominant environmental conditions in the study area (Fig. 4c, d, g-h), a similar pattern of the higher variable importance is observed for the VOI in the linear species (Fig. 4c-d), although the absolute increase in the variable importance is much smaller compared to the case where the niche is overlapping with the dominant environmental condition. In addition, no effect is observed for the unimodal species (Fig. 4g-h). These patterns can be seen in all the other models (Supporting information), though the patterns in BRT and ANN are much more erratic with wider confidence intervals.

Species with mixed responses
For the species with the mixed linear and unimodal responses to the environmental conditions, when the niche is overlapping with the dominant environmental condition in the study area ( Fig. 5a-b, e-f ), the conditions with the unimodal responses were generally more important than the conditions with the linear responses. This can be best observed at the 0% relative SAC band where the unimodal condition is always estimated to be more important than the linear condition. Further, in the machine learning methods, a higher level of SAC for the VOI can compensate the bias in the importance values due to the shape of the response curve (Fig. 5f ). This implies that a highly autocorrelated linear response variable can gain importance and match up to the unimodal response variable. These results were consistent across all the modelling methods, except BRT (Supporting information), where the higher variance in the results points towards an inconclusive effect of SAC on the estimation of variable importance.
For the case where the species niche is not overlapping with the dominant environmental condition in the study area, the effect of bias due to the shape of the response curves is not seen in the case of the mixed unimodal species. There is only a slight increase in importance with increasing SAC in the VOI in the mixed linear species (Fig. 5g-h).
The results from the analyses with 12.5% background SAC (Supporting information) were qualitatively similar to what Figure 3. AUC values, for the two cases and four different species considering a background level SAC of 12.5%. VOI = 0% SAC. VOI = 12.5% SAC. VOI = 30% SAC. 7 is presented above. This highlights the fact that the impact of SAC on variable importance depends on relative differences in SAC among environmental variables. In cases of high SAC lower relative SAC levels result in lower variable importance.

Discussion
The effect of spatial autocorrelation One of the main objectives in this study was to quantify the effects of spatial autocorrelation in explanatory variables on model-independent variable importance estimates. An inflating effect of spatial autocorrelation on the importance estimates was noticed, conditioned by the type of species response. The inflation in the importance values was clear and significant when the response of species to the environmental gradients was linear, but it was insignificant for more complex responses like the unimodal response. This implies a response specific bias. When spatial autocorrelation increases, the spread around the mean value of an environmental condition increases in the sample (compare Fig. 6a-b with Fig. 6c-d). For unimodal responses, this means that the estimated response curve is much wider over the spatially auto correlated sample set (though the 'imposed' Figure 4. Variable Importance estimates from GLM and RF for the linear and unimodal species (at 95% confidence interval of the 20 realisations) for both cases of niche overlap with a background SAC 0%. Supporting information for the graphs for the other models. The red line indicates the VOI, the grey lines indicate the back-ground variables. Figure 5. Variable importance estimates from GLM and RF for the mixed unimodal and mixed linear species (at 95% confidence interval of the 20 realisations) for both cases of niche overlap with a background SAC 0%. The Supporting information present graphs for the other models. The red line indicates the VOI, the grey lines indicate the back-ground variables.
8 niche conditions are the same) and hence less discriminating (compare Fig. 6b with Fig. 6d). The linear response is defined only by its slope which is unaffected by this increase in spread in the sample (compare Fig. 6a with Fig. 6c). Therefore, for linear responses to environmental conditions an increasing level of SAC can display an increasing bias in the variable estimation, while for unimodal responses this bias can be compensated by the increasing width of the estimated species response curve to that environmental condition (regardless of the true (i.e. imposed by us in this study) width of that curve). This could also explains why in many studies with real datasets the results show an inconsistent effect of autocorrelation in the landscape on variable importance estimations (Bini et al. 2009).
The response curve specific behaviour could also be a result of the binomial distribution of the response data (presence or absence) which is a simplification of the underlying probability. This makes the model more sensitive to the input characteristics of the environmental conditions. This sensitivity might be less prominent in cases of continuous response datasets (e.g. with normal or Poisson distributions; Ripley andVenables 2002, Dormann et al. 2007). Nevertheless, it is known that also with continuous responses, spatially autocorrelated datasets affect a model's performance (Rocha et al. 2017(Rocha et al. , 2018, although the effect on variable importance estimations in such cases has not been investigated as we did in this study.

The higher importance of unimodal over linear responses
This study found an inherent bias for all the modelling methods to yield higher importance estimates for environmental conditions with unimodal responses. This is in agreement with the results of the studies by Meynard and Quinn (2007) and Santika and Hutchinson (2009). This can be explained by Liebig's law of the minimum, where the most constraining factor gains more importance (Huston 2002, Austin 2007. Unimodal responses describe the full range of suitable conditions, indicating upper and lower suitability ranges for an environmental condition. Linear responses are often a truncated version of unimodal responses, not able to indicate the constraints on one side of the environmental gradient. Therefore, unimodal responses are presumably better in constraining the suitability envelope for a given environmental variable. This study showed that the effect of higher variable importance estimates for unimodal responses should only be expected in cases where the species niche overlaps with the dominant environmental conditions within the sampled geographic extent. The absence of similar patterns in cases where the species niche is less prevalent in the sampled extent implies that there can be other reasons at play. One alternative explanation can be that the situation where the overlap of the Gaussian distribution of the environmental variable with the species niche is a rare phenomenon that inflates the variable importance for that environmental condition. Therefore, it is important to analyse the distribution of the sampled values with respect to the shape of the response curve (assuming this is known beforehand, which is often not the case). This finding further highlights the need to understand the scale of the sampled environmental variable when analysing a snapshot of the spatial extent (De Knegt et al. 2010). Mismatches in the scale between different variables (linear 'truncated' versus unimodal) can affect the estimated importance of environmental conditions that aid the description of the niche of a species with models.

Differing levels of niche overlap
The effect of an inflated importance is exaggerated when a species niche is overlapping with dominant environmental conditions in a study area. Therefore, common species may be more affected by inflation due to the spatial autocorrelation in environmental variables compared to species that prefer less rare environmental conditions and hence may be found in rare and restricted parts of a landscape. In such cases, it is possible that spatial autocorrelation in the explanatory variables is responsible for the differences in variable importance, as such evidence has been observed in some studies where specialist and generalist species were compared in the same geographical extent and resolution (Peers et al. 2012). When suitable environmental conditions are dominant in a study area for a generalist species, it can be more susceptible to pseudo replications due to the presence of spatial autocorrelation, and this is less the case for species that prefer rare conditions. Therefore, for species that are found at rare environmental conditions, the variable importance estimates are more consistent. The same response-specific bias exists, but the magnitude of the inflation is smaller than in the species that prefer more commonly occurring environmental conditions (compare Fig. 3b, d).

Relative scale of environmental variables
As predictor variables are increasingly being derived from remotely sensed imagery, the interplay between resolution, geographical extent and species responses to them, becomes more prominent. Spatial autocorrelation can be one of the outcomes of the interaction between these three parameters. At any extent of observations, there will always be multiple factors that contribute to the occurrence of a species at multiple scales (Levy 1992). The broader the scale of a variable (like climatic factors), the more autocorrelated in space it will be. Species responses to such variables will usually be truncated, since not enough of the specific environmental range (e.g. temperature ranges) is captured to identify a species optimum. Therefore, the relationship ends up being monotonic (e.g. the warmer, the better; Guo 2014). In cases where truncated responses are used in relation to other types of response curves (that are not constraining enough to gain statistical importance), the red shift will pose as a significant issue, and true ecological mechanisms will not be captured (Austin et al. 2010). This is possibly why multiple studies have identified climatic variables as being important in variable importance rankings (Yu et al. 2017). In other cases, using explanatory variables that are relevant but at different scales, the responses that are covering the full range (dominant conditions overlap with unimodal response) can be more important than the truncated ones ( Fig. 5a-b). Therefore, care must be taken in the choice of explanatory variables in terms of resolution and scale of the data (Atkinson 1993, De Knegt et al. 2010. Ideally, to override this bias, variables of a similar spatial scale must be sampled over an extent that captures the entire niche requirements of all the variables. Therefore, distribution modelling on larger scales (for example on a global extent) are less likely to be flawed (Elith and Leathwick 2009). Further, multi-scale analytical methods that incorporate variables at different scales with respect to the occupancy pattern at each given scale can help to provide a more holistic and scale-free analysis to estimate robust distribution patterns and variable importance estimates (Pearson et al. 2004, Lipsey et al. 2017.

A critique on the randomisation method of variable importance
Model-independent variable importance estimations, as defined by Thuiller et al. (2009), are sensitive to the pseudo replications of data points caused by spatial autocorrelation. Therefore, the estimates are likely to be biased when faced with high values of relative SAC and for truncated response curves. The reason is that the measure is ultimately dependant on the coefficient estimates, the standard error and the significance of the variable, all of which are inflated in the presence of autocorrelation . Other methods, like Monte Carlo approaches (Fortin and Dale 2005), repeat a randomisation (with only spatial autocorrelation structure preserved) multiple times to get a distribution of many possibilities of the importance values. Then simple statistical tests can be used to check if the real computed variable importance is significant or not. Araújo and Guisan (2006) further discussed the inability of regression models to identify individual contributions of predictors in an absolute sense (compared to another set of predictors). This argument holds for model-independent variable importance measures as well. Therefore, some researchers favour methods of hierarchical partitioning and variance partitioning that can provide more robust measures for computing the unbiased contribution of each variable (spatial counterpart and otherwise), though this method is not as useful for prediction (Heikkinen et al. 2005, Murray andConner 2009). The method we used is the most convenient way to estimate variable importance alongside with making predictions. Also this method is commonly applied in SDM studies. Hence we believe it is the best method for this study.

Conclusion
Many studies found inconclusive results regarding the effect of spatial autocorrelation on variable importance, though the problem of inflation of significances is a widely accepted issue. This study clarifies the basic functionalities of species distribution models when faced with autocorrelation in the environmental conditions, different types of species response curve geometry (due to scale mismatches), and the overlap between the niche and environmental conditions. We showed that the shape of the response curve and the overlap between species niche and environmental conditions modify the effects of spatial autocorrelation in environmental conditions on variable importance estimations. The inflating effect of spatial autocorrelation is mainly an issue when the optimal environmental conditions for a species overlap with the dominant environmental conditions of a species. This effect is exacerbated when the response to these conditions is linear, rather than unimodal. This study concludes that the red-shift, coined by Lennon (2000), is an evident flaw when a modelindependent variable importance estimation is employed in the presence of spatial autocorrelation, but only for specific species response characteristics. Also, relative differences in the levels of spatial autocorrelation between the environmental conditions are giving an indication to what extent inflation as a result of the SAC can be expected. Consequences of such red-shifted variable importance, though robust in a single spatiotemporal frame, can affect the predictive accuracy of the models when transferring the model to different regions or different time periods when the spatial correlation structure of the environmental variables vary (Duque-Lazo et al. 2016). It is misguided to consider the inflated importance as an actual ecological mechanism, and thus proper methods to account for the spatial structure are needed or a multi-scale analysis must be incorporated when computing model-independent variable importance estimates from species distribution models.