Variation in niche and distribution model performance: The need for a priori assessment of key causal factors
Highlights
► Ecological niche models are now very popular in the ecological literature.
► Many studies using these approaches lack a proper conceptual framework.
► We analyze effects of abiotic niche and accessibility on model performance.
► Understanding the abiotic-dispersal configuration is crucial.
Introduction
In recent years, increasing effort has been invested in estimating ecological requirements of species and using those estimates to identify distributional areas. These methods are known as species distribution modeling (SDM) when emphasis is on estimating distributions of species, or ecological niche modeling (ENM) when emphasis is on niche requirements of species (Guisan and Thuiller, 2005, Marti et al., 2005, Peterson, 2006). Since 1990, growth in numbers of papers published in these fields has been almost exponential (Lobo et al., 2010). The following is a summary of relevant conceptual points, which are developed in greater detail in Peterson et al. (2011).
The typical correlational methods for SDM and ENM are based on finding regions in the space of environmental variables that are, in some mathematical sense, similar to conditions at sites where the species has been observed. Not surprisingly, many methods are capable of performing this task, and numerous studies have attempted to compare their performances (Guisan and Zimmermann, 2000, Segurado and Araújo, 2004, Elith et al., 2006, Pearson et al., 2006), some even trying to establish why different methods produce different answers (Elith and Graham, 2009). These questions are very relevant; to advance the discussion, however, we suggest that one must be explicit about what exactly is being modeled, with hypotheses regarding key factors affecting the modeled object. Several kinds of distributional areas exist, each with different properties (Soberón and Peterson, 2005, Jiménez-Valverde et al., 2008, Peterson et al., 2011), so delineating the aim of modeling efforts is crucial. Known presences in relation to probable absences define the occupied area of a species, making crucial an explicit understanding of the type of absence data that are available (Lobo et al., 2010). Another often-ignored point is the extent of the region from which background data are sampled, if the algorithm being used requires such information (Hirzel and Le Lay, 2008, VanDerWal et al., 2009, Barve et al., 2011, Elith et al., 2011). Finally, one needs to understand the actual mathematical operations that different algorithms perform on the data to arrive at estimates of the object of interest (Guisan and Zimmermann, 2000, Franklin, 2009).
In this study, we compare five ENM/SDM algorithms in a novel challenge, in which we specify unequivocally: (i) the type of distributional area being modeled, (ii) the configuration of factors causing the distributions, and (iii) the way in which algorithms use “background data.” Previous authors have concluded that the field of ENM/SDM is still immature, and clear guidance for selecting relevant methods cannot yet be provided (Elith and Graham, 2009). We agree with these authors that much work remains before we will understand fully the complexities of this field. However, substantial clarity can be obtained by assessing factors systematically, using virtual species as test beds. The basis of our analysis is a simple heuristic scheme, the BAM diagram (Soberón and Peterson, 2005, Soberón, 2007), which summarizes joint effects of biotic, abiotic, and dispersal characteristics of species; we apply this framework to virtual species for which the truth is known about what factors determine each distributional area.
Fig. 1 is a simplified representation of the BAM diagram, in which only two sets of factors affect distributions of species (Grinnell, 1924, Good, 1931, Udvardy, 1969, Brown et al., 1996, Gaston, 2003): the right combination of environmental conditions (the A circle), and the region of geography that has been accessible to the species over a given period of time (the M circle). We ignore biotic interactions (B) for reasons discussed below. From Fig. 1, we see that three regions exist that can reasonably be regarded as the object of a modeling exercise. First, the “occupied area” (Gaston, 2003), denoted by GO, is an area that presents the abiotic conditions that a species requires to survive and reproduce and that has been accessible to the species; by definition, GO = A ∩ M. The second area is that which can potentially be invaded if the restrictions of M are relaxed (Svenning and Skov, 2004), defined as GI = A ∩ MC, with the C denoting “complement.” GI is thus the set of areas that have the right environmental conditions but are currently inaccessible to the species; this area is the focus of most modeling exercises dealing with invasive species. The third area, the union of GO and GI, is equivalent to A in the BAM diagram. The question of whether one is attempting to model GO, A, or GI is key. The relative sizes and positions of B, A, and M should be explicit at the outset, because, as we will see below, different configurations of the BAM diagram lead to radically different capacities for algorithms to estimate the areas of interest.
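The set relations above can be made concrete with a small sketch. The grid cells and the membership of A and M below are invented purely for illustration; only the relations GO = A ∩ M and GI = A ∩ MC are taken from the text.

```python
# Hypothetical study region: grid cells labeled 0..9 (illustrative only).
G = set(range(10))

# Assumed toy sets: A = abiotically suitable cells, M = accessible cells.
A = {2, 3, 4, 5, 6}
M = {0, 1, 2, 3, 4}

M_complement = G - M      # MC: cells that have not been accessible
G_O = A & M               # occupied area, GO = A ∩ M
G_I = A & M_complement    # invadable area, GI = A ∩ MC

# GO and GI are disjoint, and their union recovers A.
assert G_O.isdisjoint(G_I)
assert G_O | G_I == A

print(sorted(G_O))  # [2, 3, 4]
print(sorted(G_I))  # [5, 6]
```

Changing the overlap between the toy sets A and M is a quick way to see how different BAM configurations alter the sizes of GO and GI.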
As a side note, we can use the BAM framework to distinguish between two conceptual frameworks in this emerging field. When the focus is on estimating the occupied area GO, the study falls into the realm of SDM. Modeling GO requires information not only about favorable conditions for the species (i.e., its fundamental ecological niche), but also about factors that restrict its spread (biotic and geographic factors constraining dispersal); otherwise, overpredictions will result (Peterson et al., 1999). When the focus is on estimating A or GI, only the favorable conditions (and biotic circumstances for the case of GI) need to be estimated, which can be projected in geographic space; these potential distributional areas are the subject of ENM.
Consider now the different types of absence data that the above schemes may require (Peterson, 2006, Lobo et al., 2010). In Fig. 1, several types of “absence” data are shown. The circles are absences owing either to lack of suitable environmental conditions (black circles) within space that has been accessible to the species (Barve et al., 2011) or to occupancy dynamics (white circles) within completely suitable and accessible areas (Hortal et al., 2010). The squares, however, are absences under suitable conditions, but at sites where the species is not present owing to inaccessibility. Black triangles are absences resulting from both conditions acting simultaneously. It should be obvious that the biological meaning of these three classes of absences is very different, particularly if one is attempting to model GI or A. Therefore, as has been emphasized previously (Barve et al., 2011), clear a priori hypotheses about M should be an integral part of the modeling exercise, despite the fact that no widely used SDM/ENM algorithm requests information explicitly about this region.
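As a rough illustration of how these absence types follow from the A and M configuration, the sketch below labels an absence site from two boolean flags. The function name and the label strings are hypothetical; the mapping to the symbols of Fig. 1 follows the description above.

```python
def classify_absence(suitable: bool, accessible: bool) -> str:
    """Label an absence site by the cause implied by its position in A and M."""
    if suitable and accessible:
        return "occupancy dynamics"          # white circles
    if accessible:
        return "unsuitable conditions"       # black circles
    if suitable:
        return "inaccessible"                # squares
    return "unsuitable and inaccessible"     # black triangles


# Example: a site with suitable conditions that the species never reached.
print(classify_absence(suitable=True, accessible=False))  # inaccessible
```

In practice neither flag is observed directly, which is why explicit hypotheses about M are needed before absences can be interpreted.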
It is also important to keep in mind the implications of how different modeling algorithms operate on the data. First, one must distinguish between “background data,” used to characterize the overall landscape; pseudoabsence data, which are artificial absences created for algorithms that fit functions to binary data; and true absences, which are based on reliable field evidence of non-occurrence (although one still needs to ponder the different types of absences listed above). Among commonly used algorithms in ENM/SDM, for instance, Maxent uses background data to create a null model for a probability density (Elith et al., 2011); Desktop GARP uses pseudoabsences to fit some of its component methods (Stockwell and Peters, 1999); and GAM and multivariate regression methods use either pseudoabsence or true absence data, but the interpretations of models based on one or the other are not the same (Pearce and Boyce, 2006, Ward et al., 2009). BIOCLIM, DOMAIN, and other “envelope methods” operate with only presence data (Franklin, 2009). We use the term “non-presence data” to refer to any of the above characterizations of absence; obviously, choice of a reference area implies a choice of non-presence data (Barve et al., 2011). Hence, selecting reference regions carefully and with good biogeographic considerations is a crucial point in the modeling exercise (VanDerWal et al., 2009, Godsoe, 2010, Barve et al., 2011, Elith et al., 2011). Strict presence-only methods like envelope or distance techniques (Busby et al., 1991, Hirzel et al., 2002, Farber and Kadmon, 2003) may be less affected by choice of reference region.
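The dependence of non-presence data on the reference region can be sketched as follows. This is a minimal illustration under stated assumptions: `sample_non_presence` is a hypothetical helper, the cell identifiers and regions are invented, and real algorithms differ in whether presence cells are excluded and in how points are drawn.

```python
import random


def sample_non_presence(reference_cells, presence_cells, n, seed=0):
    """Draw up to n non-presence cells at random from a reference region.

    Hypothetical helper: presence cells are excluded here, which is an
    assumption and not a description of any particular algorithm.
    """
    candidates = sorted(set(reference_cells) - set(presence_cells))
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(candidates, min(n, len(candidates)))


presences = {12, 13, 14}
narrow_region = set(range(10, 20))   # e.g., a small accessible area
broad_region = set(range(0, 100))    # e.g., a much larger extent

np_narrow = sample_non_presence(narrow_region, presences, n=5)
np_broad = sample_non_presence(broad_region, presences, n=5)
```

The two draws come from different pools of cells, so a model trained against one set is contrasted with different environments than a model trained against the other.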
Jiménez-Valverde et al. (2008) argued that ENM algorithms produce outputs that fall somewhere between the occupied area GO and the potential area A. In this paper, we explore this insight in greater detail, characterizing model results along this spectrum, rather than simply seeking a ‘best’ approach. We investigate implications of different hypotheses regarding the relative size and position of accessible and suitable areas and the size of the reference region for the ability of different algorithms to estimate GO and A. The result is a picture of the strengths and limitations of a variety of ENM/SDM algorithms under certain sets of biological and biogeographic circumstances.
Virtual niches and virtual species
We created three virtual fundamental niches for the purpose of exploring the scenarios and ideas described above: the effects of BAM scenario, training region, and modeling method on the efficacy of models purporting to estimate ecological niches and distributional areas. Simple fundamental niches were postulated by selecting rectilinear, non-interacting sets of conditions in a two-dimensional environmental space (annual mean temperature and annual precipitation), and identifying the geographic
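A rectilinear, non-interacting niche in two environmental dimensions can be sketched as a pair of independent range checks. The limits and the toy grid below are assumptions chosen for illustration and are not the values used in the study.

```python
# Assumed limits, for illustration only (not the values used in the study).
TEMP_RANGE = (15.0, 25.0)        # annual mean temperature, degrees C
PRECIP_RANGE = (500.0, 1500.0)   # annual precipitation, mm


def in_niche(temp, precip, temp_range=TEMP_RANGE, precip_range=PRECIP_RANGE):
    """Return True if conditions fall inside the rectangular niche.

    Each variable is tested independently, so there is no interaction term.
    """
    return (temp_range[0] <= temp <= temp_range[1]
            and precip_range[0] <= precip <= precip_range[1])


# Toy grid: cell id -> (temperature, precipitation).
cells = {
    "a": (20.0, 1000.0),   # inside both ranges
    "b": (10.0, 1000.0),   # too cold
    "c": (20.0, 2000.0),   # too wet
    "d": (15.0, 500.0),    # on the boundary, counted as inside
}

# Geographic projection of the niche: the suitable area A for this toy grid.
suitable = {cell for cell, (t, p) in cells.items() if in_niche(t, p)}
print(sorted(suitable))  # ['a', 'd']
```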
Results
For each combination of algorithm, species, and BAM scenario, we evaluated the closeness of algorithm output to the corresponding actual (GO) and potential (A) distributional areas (Fig. 4). We noted consistent differences among BAM scenarios with respect to these measures. In the CB and HD scenarios, algorithms were capable of yielding relatively good predictions of both A and GO (recalling that in HD A = GO). The other two scenarios posed challenges that the algorithms we tested consistently
Discussion
This study departs from previous comparative analyses of ENM or SDM (e.g., Brotons et al., 2004, Segurado and Araújo, 2004, Elith et al., 2006, Elith and Graham, 2009) in that: (i) the relative importance of environmental suitability vs. dispersal (i.e., configurations of the BAM diagram) is considered, (ii) the measure of performance of the models is their correspondence to two known distributional areas (the occupied and potential distributions), and (iii) we distinguish between different
Acknowledgments
We thank our colleagues for their help in thinking through these ideas, including in particular Fangliang He, Monica Papeş, Yoshinori Nakazawa, Sean Maher, Fabricio Villalobos, and Alberto Jiménez-Valverde. This research was supported in part by a grant to JS and ATP by Microsoft Research. AL-N received support from the Consejo Nacional de Ciencia y Tecnología, México (189216).
References (58)
- et al. (2003). Evaluating predictive models of species’ distributions: criteria for selecting optimal models. Ecological Modelling.
- et al. (2011). The crucial role of the accessible area in ecological niche modeling and species distribution modeling. Ecological Modelling.
- et al. (2003). Assessment of alternative approaches for bioclimatic modeling with special emphasis on the Mahalanobis distance. Ecological Modelling.
- et al. (2000). Predictive habitat distribution models in ecology. Ecological Modelling.
- et al. (2004). Model selection in ecology and evolution. Trends in Ecology and Evolution.
- et al. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling.
- et al. (2009). Selecting pseudo-absence data for presence-only distribution modeling: how far should you stray from what you know? Ecological Modelling.
- et al. (2010). The effect of the extent of the study region on GIS models of species geographic distributions and estimates of niche evolution: preliminary tests with montane rodents (genus Nephelomys) in Venezuela. Journal of Biogeography.
- et al. (2006). Five (or so) challenges for species distribution modelling. Journal of Biogeography.
- et al. (2008). Opening the climate envelope reveals no macroscale associations with climate in European birds. Proceedings of the National Academy of Sciences of the United States of America.
- Presence–absence versus presence-only modelling methods for predicting bird habitat suitability. Ecography.
- The geographic range: size, shape, boundaries, and internal structure. Annual Review of Ecology and Systematics.
- Geographical separation of two Ulex species at three spatial scales: does competition limit species’ ranges? Ecography.
- BIOCLIM – a bioclimate analysis and prediction system.
- Estimating demographic models for the range dynamics of plant species. Global Ecology and Biogeography.
- Do they? How do they? WHY do they differ? On finding reasons for differing performances of species distribution models. Ecography.
- Novel methods improve prediction of species’ distributions from occurrence data. Ecography.
- The art of modelling range-shifting species. Methods in Ecology and Evolution.
- A statistical explanation of MaxEnt for ecologists. Biodiversity Research.
- A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation.
- Mapping Species Distributions: Spatial Inference and Prediction.
- The Structure and Dynamics of Geographic Ranges.
- I can’t define the niche but I know it when I see it: a formal link between statistical theory and the ecological niche. Oikos.
- A theory of plant geography. New Phytologist.
- Geography and evolution. Ecology.
- Predicting species distribution: offering more than simple habitat models. Ecology Letters.
- Computer tools for spatial analysis of plant genetic resources data: DIVA-GIS. Plant Genetic Resources Newsletter.
- Ecological-niche factor analysis: how to compute habitat-suitability maps without absence data? Ecology.
- Habitat suitability modelling and niche theory. Journal of Applied Ecology.