Elsevier

Ecological Modelling

Volumes 237–238, 10 July 2012, Pages 11-22

Variation in niche and distribution model performance: The need for a priori assessment of key causal factors

https://doi.org/10.1016/j.ecolmodel.2012.04.001

Abstract

Ecological niche models and species distribution models are becoming important elements in the toolkit of biogeographers and ecologists. Although burgeoning in use, much variation exists in implementation of these techniques, leading to considerable diversity of methodology and discussion of what is the ‘best’ approach. In this analysis, we explore implications of different configurations of major factors that constrain species’ distributions—abiotic factors and dispersal limitation—for the success or failure of these models. We analyze variation in performance among modeling approaches as a function of the relative configuration of these two factors and the spatial extent of training region, with the result that a clear understanding of the abiotic-dispersal configuration is a prerequisite to effective model implementations; the effects of spatial extent of the training region are less consistent and clear. Model development will be powerful only when set in an appropriate and explicit biogeographic and population ecological context.

Highlights

► Ecological niche models are now very popular in the ecological literature.
► Many studies using these approaches lack a proper conceptual framework.
► We analyze effects of abiotic niche and accessibility on model performance.
► Understanding the abiotic-dispersal configuration is crucial.

Introduction

In recent years, increasing effort has been invested in estimating ecological requirements of species and using those estimates to identify distributional areas. These methods are known as species distribution modeling (SDM) when emphasis is on estimating distributions of species, or ecological niche modeling (ENM) when emphasis is on niche requirements of species (Guisan and Thuiller, 2005, Marti et al., 2005, Peterson, 2006). Since 1990, growth in numbers of papers published in these fields has been almost exponential (Lobo et al., 2010). The following is a summary of relevant conceptual points, which are developed in greater detail in Peterson et al. (2011).

The typical correlational methods for SDM and ENM are based on finding regions in the space of environmental variables that are, in some mathematical sense, similar to conditions at sites where the species has been observed. Not surprisingly, many methods are capable of performing this task, and numerous studies have attempted to compare their performances (Guisan and Zimmermann, 2000, Segurado and Araújo, 2004, Elith et al., 2006, Pearson et al., 2006), some even trying to establish why different methods produce different answers (Elith and Graham, 2009). These questions are very relevant; to advance the discussion, however, we suggest that one must be explicit about what exactly is being modeled, with hypotheses regarding key factors affecting the modeled object. Several kinds of distributional areas exist, each with different properties (Soberón and Peterson, 2005, Jiménez-Valverde et al., 2008, Peterson et al., 2011), so delineating the aim of modeling efforts is crucial. Known presences in relation to probable absences define the occupied area of a species, making crucial an explicit understanding of the type of absence data that are available (Lobo et al., 2010). Another often-ignored point is the extent of the region from which background data are sampled, if the algorithm being used requires such information (Hirzel and Le Lay, 2008, VanDerWal et al., 2009, Barve et al., 2011, Elith et al., 2011). Finally, one needs to understand the actual mathematical operations that different algorithms perform on the data to arrive at estimates of the object of interest (Guisan and Zimmermann, 2000, Franklin, 2009).

In this study, we compare five ENM/SDM algorithms in a novel challenge, in which we specify unequivocally: (i) the type of distributional area being modeled, (ii) the configuration of factors causing the distributions, and (iii) the way in which algorithms use “background data.” Previous authors have concluded that the field of ENM/SDM is still immature, and clear guidance for selecting relevant methods cannot yet be provided (Elith and Graham, 2009). We agree with these authors that much work remains before we will understand fully the complexities of this field. However, substantial clarity can be obtained by assessing factors systematically, using virtual species as test beds. The basis of our analysis is a simple heuristic scheme, the BAM diagram (Soberón and Peterson, 2005, Soberón, 2007), which summarizes joint effects of biotic, abiotic, and dispersal characteristics of species; we apply this framework to virtual species for which the truth is known about what factors determine each distributional area.

Fig. 1 is a simplified representation of the BAM diagram, in which only two sets of factors affect distributions of species (Grinnell, 1924, Good, 1931, Udvardy, 1969, Brown et al., 1996, Gaston, 2003): the right combination of environmental conditions (the A circle), and the region of geography that has been accessible to the species over a given period of time (the M circle). We ignore biotic interactions (B) for reasons discussed below. From Fig. 1, we see that three regions exist that can reasonably be regarded as the object of a modeling exercise. First, the “occupied area” (Gaston, 2003), denoted by GO, is an area that presents the abiotic conditions that a species requires to survive and reproduce and that has been accessible to the species; by definition, GO = A ∩ M. The second area is that which can potentially be invaded if the restrictions of M are relaxed (Svenning and Skov, 2004), defined as GI = A ∩ M^C, with the superscript C denoting “complement.” GI is thus the set of areas with the right environmental conditions but that is currently inaccessible to the species; this area is the focus of most modeling exercises dealing with invasive species. The third area, the union of GO and GI, is equivalent to A in the BAM diagram. The question of whether one is attempting to model GO, A, or GI is key. The relative sizes and positions of B, A, and M should be explicit at the outset, because, as we will see below, different configurations of the BAM diagram lead to radically different capacities for algorithms to estimate the areas of interest.
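The three areas can be illustrated with plain set operations: GO is the intersection of A and M, GI is the intersection of A with the complement of M, and their union recovers A. The following is a minimal sketch under stated assumptions, not the authors' implementation; the eight-cell landscape and the membership of `A` and `M` are hypothetical values chosen only for illustration.

```python
# Hypothetical landscape of 8 grid cells, identified by index (illustrative only).
cells = set(range(8))

A = {0, 1, 2, 3, 6, 7}   # cells with suitable abiotic conditions (assumed)
M = {0, 1, 4, 5}         # cells accessible to the species (assumed)

G_O = A & M              # occupied area: suitable and accessible
G_I = A & (cells - M)    # invadable area: suitable but outside M (A ∩ M^C)

# The union of the occupied and invadable areas is A, and the two are disjoint.
assert G_O | G_I == A
assert G_O.isdisjoint(G_I)

print(sorted(G_O), sorted(G_I))
```

Changing the overlap between `A` and `M` in this sketch is one simple way to see how different BAM configurations alter the relative sizes of GO and GI.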

As a side note, we can use the BAM framework to distinguish between two conceptual frameworks in this emerging field. When the focus is on estimating the occupied area GO, the study falls into the realm of SDM. Modeling GO requires information not only about favorable conditions for the species (i.e., its fundamental ecological niche), but also about factors that restrict its spread (biotic and geographic factors constraining dispersal) or overpredictions will result (Peterson et al., 1999). When the focus is on estimating A or GI, only the favorable conditions (and biotic circumstances for the case of GI) need to be estimated, which can be projected in geographic space; these potential distributional areas are the subject of ENM.

Consider now the different types of absence data that the above schemes may require (Peterson, 2006, Lobo et al., 2010). In Fig. 1, several types of “absence” data are shown. The circles are absences owing either to lack of suitable environmental conditions (black circles) within space that has been accessible to the species (Barve et al., 2011) or to occupancy dynamics (white circles) within completely suitable and accessible areas (Hortal et al., 2010). The squares, by contrast, are absences under suitable conditions at sites where the species is not present owing to inaccessibility. Black triangles are absences resulting from both conditions (unsuitability and inaccessibility) acting simultaneously. It should be obvious that the biological meaning of these three classes of absences is very different, particularly if one is attempting to model GI or A. Therefore, as has been emphasized previously (Barve et al., 2011), clear a priori hypotheses about M should be an integral part of the modeling exercise, despite the fact that no widely used SDM/ENM algorithm requests information explicitly about this region.
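The absence types described for Fig. 1 follow from two yes/no questions about a site: is it abiotically suitable, and has it been accessible? The sketch below encodes that reading as a small lookup. The function name and the returned labels are hypothetical conveniences and are not terminology from the paper.

```python
def classify_absence(suitable: bool, accessible: bool) -> str:
    """Label an absence record by the factor(s) that could explain it.

    The labels are illustrative and mirror the symbols described for Fig. 1.
    """
    if suitable and accessible:
        # White circles: the site is in the occupied area but currently empty.
        return "occupancy dynamics"
    if suitable:
        # Squares: conditions are right, but the species could not reach the site.
        return "inaccessible"
    if accessible:
        # Black circles: the site is reachable but abiotically unsuitable.
        return "unsuitable"
    # Black triangles: both limiting factors act at once.
    return "unsuitable and inaccessible"


for flags in [(True, True), (True, False), (False, True), (False, False)]:
    print(flags, "->", classify_absence(*flags))
```

Only the "inaccessible" absences fall inside A, which is why treating them like the other classes would be misleading when the target of modeling is GI or A.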

It is also important to keep in mind the implications of how different modeling algorithms operate on the data. First, one must distinguish between “background data,” used to characterize the overall landscape; pseudoabsence data, which are artificial absences created for algorithms that fit functions to binary data; and true absences, which are based on reliable field evidence of non-occurrence (although one still needs to ponder the different types of absences listed above). Among commonly used algorithms in ENM/SDM, for instance, Maxent uses background data to create a null model for a probability density (Elith et al., 2011); Desktop GARP uses pseudoabsences to fit some of its component methods (Stockwell and Peters, 1999); and GAM and multivariate regression methods use either pseudoabsence or true absence data, but the interpretations of models based on one or the other are not the same (Pearce and Boyce, 2006, Ward et al., 2009). BIOCLIM, DOMAIN, and other “envelope methods” operate with only presence data (Franklin, 2009). We use the term “non-presence data” to refer to any of the above characterizations of absence; obviously, choice of a reference area implies a choice of non-presence data (Barve et al., 2011). Hence, selecting reference regions carefully and with good biogeographic considerations is a crucial point in the modeling exercise (VanDerWal et al., 2009, Godsoe, 2010, Barve et al., 2011, Elith et al., 2011). Strict presence-only methods like envelope or distance techniques (Busby et al., 1991, Hirzel et al., 2002, Farber and Kadmon, 2003) may be less affected by choice of reference region.

Jiménez-Valverde et al. (2008) argued that ENM algorithms produce outputs that fall somewhere between the occupied area GO and the potential area A. In this paper, we explore this insight in greater detail, characterizing model results along this spectrum, rather than simply seeking a ‘best’ approach. We investigate implications of different hypotheses regarding the relative size and position of accessible and suitable areas and the size of the reference region for the ability of different algorithms to estimate GO and A. The result is a picture of the strengths and limitations of a variety of ENM/SDM algorithms under certain sets of biological and biogeographic circumstances.

Section snippets

Virtual niches and virtual species

We created three virtual fundamental niches for the purpose of exploring the scenarios and ideas described above: the effects of BAM scenario, training region, and modeling method on the efficacy of models purporting to estimate ecological niches and distributional areas. Simple fundamental niches were postulated by selecting rectilinear, non-interacting sets of conditions in a two-dimensional environmental space (annual mean temperature and annual precipitation), and identifying the geographic
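A rectilinear, non-interacting niche in a two-dimensional environmental space can be pictured as a box: a site is suitable when each variable independently falls within its own range. The sketch below is only an illustration of that idea. The class name, the numeric ranges, and the three-cell grid are hypothetical and are not the values used in the study.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class RectilinearNiche:
    """Box-shaped niche: independent limits on each environmental variable."""
    temp_range: Tuple[float, float]    # annual mean temperature limits (assumed, deg C)
    precip_range: Tuple[float, float]  # annual precipitation limits (assumed, mm)

    def is_suitable(self, temp: float, precip: float) -> bool:
        t_lo, t_hi = self.temp_range
        p_lo, p_hi = self.precip_range
        # Non-interacting: each variable is checked on its own.
        return t_lo <= temp <= t_hi and p_lo <= precip <= p_hi


# Hypothetical niche and grid cells given as (temperature, precipitation).
niche = RectilinearNiche(temp_range=(15.0, 25.0), precip_range=(500.0, 1500.0))
grid: Dict[str, Tuple[float, float]] = {
    "c1": (20.0, 1000.0),  # inside both ranges
    "c2": (10.0, 1000.0),  # too cold
    "c3": (20.0, 2000.0),  # too wet
}

# Projecting the niche onto geography gives the suitable area A.
A: Set[str] = {cell for cell, (t, p) in grid.items() if niche.is_suitable(t, p)}
print(sorted(A))
```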

Results

For each combination of algorithm, species, and BAM scenario, we evaluated the closeness of algorithm output to the corresponding actual (GO) and potential (A) distributional areas (Fig. 4). We noted consistent differences among BAM scenarios with respect to these measures. In the CB and HD scenarios, algorithms were capable of yielding relatively good predictions of both A and GO (recalling that in HD A = GO). The other two scenarios posed challenges that the algorithms we tested consistently

Discussion

This study departs from previous comparative analyses of ENM or SDM (e.g., Brotons et al., 2004, Segurado and Araújo, 2004, Elith et al., 2006, Elith and Graham, 2009) in that: (i) the relative importance of environmental suitability vs. dispersal (i.e., configurations of the BAM diagram) is considered, (ii) the measure of performance of the models is their correspondence to two known distributional areas (the occupied and potential distributions), and (iii) we distinguish between different

Acknowledgments

We thank our colleagues in thinking about these ideas, including in particular Fangliang He, Monica Papeş, Yoshinori Nakazawa, Sean Maher, Fabricio Villalobos, and Alberto Jiménez-Valverde. This research was supported in part by a grant to JS and ATP by Microsoft Research. AL-N received support from the Consejo Nacional de Ciencia y Tecnología, México (189216).

References (58)

  • L. Brotons et al. Presence–absence versus presence-only modelling methods for predicting bird habitat suitability. Ecography (2004)
  • J.H. Brown et al. The geographic range: size, shape, boundaries, and internal structure. Annual Review of Ecology and Systematics (1996)
  • J.M. Bullock et al. Geographical separation of two Ulex species at three spatial scales: does competition limit species’ ranges? Ecography (2000)
  • J.R. Busby et al. BIOCLIM – a bioclimate analysis and prediction system
  • J.S. Cabral et al. Estimating demographic models for the range dynamics of plant species. Global Ecology and Biogeography (2010)
  • J. Elith et al. Do they? How do they? WHY do they differ? On finding reasons for differing performances of species distribution models. Ecography (2009)
  • J. Elith et al. Novel methods improve prediction of species’ distributions from occurrence data. Ecography (2006)
  • J. Elith et al. The art of modelling range-shifting species. Methods in Ecology and Evolution (2010)
  • J. Elith et al. A statistical explanation of MaxEnt for ecologists. Biodiversity Research (2011)
  • A.H. Fielding et al. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation (1997)
  • J. Franklin. Mapping Species Distributions: Spatial Inference and Prediction (2009)
  • K. Gaston. The Structure and Dynamics of Geographic Ranges (2003)
  • W. Godsoe. I can’t define the niche but I know it when I see it: a formal link between statistical theory and the ecological niche. Oikos (2010)
  • R.D. Good. A theory of plant geography. New Phytologist (1931)
  • J. Grinnell. Geography and evolution. Ecology (1924)
  • A. Guisan et al. Predicting species distribution: offering more than simple habitat models. Ecology Letters (2005)
  • R.J. Hijmans et al. Computer tools for spatial analysis of plant genetic resources data: DIVA-GIS. Plant Genetic Resources Newsletter (2001)
  • A.H. Hirzel et al. Ecological-niche factor analysis: how to compute habitat-suitability maps without absence data? Ecology (2002)
  • A.H. Hirzel et al. Habitat suitability modelling and niche theory. Journal of Applied Ecology (2008)