Introduction

Mapping the world’s biodiversity is a challenging responsibility. It is even more challenging to reliably predict how species’ distributions will change due to threats such as climate change and habitat destruction [25, 26, 93]. To accomplish these important tasks, biogeographers, ecologists, statisticians, and the machine learning community have worked to create models that predict species’ distributions and facilitate inference such as the habitat characteristics that individuals of a species prefer. The goal of this review is to present a unified view of many commonly used species distribution models. To accomplish this goal, we review the common types of data used to model species’ distributions. We then discuss a unified spatio-temporal point process model for count, presence-absence, and presence-only data that has been developed within a hierarchical framework in several studies [11, 23, 31, 62]. The hierarchical modeling framework provides a way to incorporate important components of the data collection process (e.g., detection) and ecological processes (e.g., spread of an invasive species). By adding these components, the hierarchical framework allows for unification of many existing models and numerous extensions, including a wide variety of regression and theory-based models as well as explicit models of the data collection process.

Data Types

There are three types of data commonly used for species distribution modeling: count, presence-absence, and presence-only. Although each type of data may seem distinct, we show in the next section that many types of data can be thought of as arising from a spatio-temporal point process (Fig. 1).

Fig. 1
figure 1

Discrete time inhomogeneous Poisson point pattern with an intensity ( λ(s,t)) that increases at each time step (panel a; t=1,2,3). Presence-absence data generated by aggregating the points into grid cells (panel b; u=0 are white cells and u=1 are green cells). Count data generated by aggregating points into grid cells and counting the total number of points in each cell (panel c)

Count Data

Count data are generally collected using systematic surveys. Two systematic survey types that are used to generate count data are point counts and quadrat counts. Point counts involve at least one observer visiting a preselected point and recording the number of individuals they see during a prescribed time interval (e.g., Breeding Bird Survey data). Quadrat counts involve searching an area of known dimensions and recording the number of individuals found during a fixed period of time.

Presence-Absence Data

Presence-absence data can be count data that have been collapsed to binary responses where a one indicates that at least one individual was present and a zero indicates absence or non-detection. Presence-absence data can also be collected using the same protocol as point counts and quadrat counts, but the survey is terminated when at least one individual has been observed.

Presence-Only Data

Unlike count and presence-absence data, which are usually collected at locations and time periods selected using a sampling design, presence-only data are typically collected opportunistically. Data sources and collation protocols can vary widely, but common sources include museum specimens and citizen contributed sightings of a species [7, 27, 46]. Typically presence-only data are reported as the location and time that an individual of a species was observed or collected. Another data collection process that gives rise to presence-only data is recording multiple locations of the same individual (e.g., telemetry data; [60]). For example, telemetry data is usually recorded as a time series of presence-only locations of an individual. The analysis of relocation data is often referred to as resource selection, but important connections between the statistical methods used for resource selection modeling and species distribution modeling exist [69, 73, 74].

Distribution Theory

An individual of a species exists as a point in space at any given time (Fig. 1a). A statistical model describing the pattern created by points in space and time is the space-time Poisson point process distribution [16, 20]. The log-likelihood for a space-time Poisson point process is

$$ l(\lambda;\mathbf{U})={\sum}_{i=1}^{n}\log\lambda(\mathbf{s}_{i},t_{i})-\intop_{\mathcal{A}}{\intop_{0}^{T}}\lambda(\mathbf{s},t)dtd\mathbf{s}-\log(n!)\:, $$
(1)

where the matrix U is a n×3 matrix with rows that contain the location s i in two-dimensional space and time t i of the i th point ( i=1,...,n), \(\mathcal {A}\) is the two-dimensional study area, and the time period of observation is [0,T]. The intensity function λ(s,t) will depend on parameters and describes how the expected abundance (for an infinitely small region) changes within the study area and time period of observation. In words, Eq. 1 describes points that occur over a study area (\(\mathcal {A}\)) and time interval [0,T] (Fig. 1a). The rate at which these points occur at any given location s and point in time t is governed by the intensity function λ(s,t).

An important quantity derived from the point process model is the integrated intensity function

$$ \bar{\lambda}=\intop_{\mathcal{B}}\intop_{t_{1}}^{t_{2}}\lambda(\mathbf{s},t)dtd\mathbf{s}\:, $$
(2)

where \(\mathcal {B}\) is an area of interest within the study area \(\mathcal {A}\) (i.e., \(\mathcal {B}\subseteq \mathcal {A}\) ) over the time interval t 1 to t 2 ( 0≤t 1<t 2T; hereafter \(\mathcal {B}\), t 1, and t 2 are termed the space-time volume of interest). An important result is that the number of points (u) within the space-time volume of interest follows a Poisson distribution

$$ u\sim\text{Poisson}\left( \bar{\lambda}\right), $$
(3)

that depends on the integrated intensity function. Another important derived quantity is the probability that the space-time volume of interest has at least one point in it. This probability can be calculated as \(\text {P}(u>0)=1-e^{-\bar {\lambda }}\). It follows that the presence or absence of a species within the space-time volume of interest is distributed as

$$ I(u>0)\sim\text{Bernoulli}(1-e^{-\bar{\lambda}})\:, $$
(4)

where the indicator function I(u>0) takes on a value of one if the volume has one or more points in it and zero otherwise.

The space-time Poisson point process distribution naturally describes presence-only data when recorded as the location and time an individual was observed or collected. As a result, variants of the Poisson point process model are popular for modeling species’ distributions using presence-only data [83, 99]. By specifying an area (\(\mathcal {B}\)) and time interval [t 1,t 2] of interest, the space-time Poisson point process distribution can be linked to count and presence-absence data, which can be interpreted as points aggregated over a spatial and temporal extent ([1]; Fig. 1b & c). For example, \(\mathcal {B}\), t 1, and t 2 could represent a circle of a 100 m radius with an observer at the center counting the number of Bobwhite Quail (Colinus virginianus) seen within a 10 minute time interval. In this example, aggregation would occur when the observer records the count instead of the exact location and time that each Bobwhite was present. As another example, \(\mathcal {B}\), t 1, and t 2 could represent a 1 m2 plot where the presence or absence of cheat grass (Bromus tectorum) is observed within a year. Regardless of whether the recorded data are counts, presence-absence, or presence-only, conceptualizing the data as a space-time Poisson point process is advantageous for the following reasons: 1) understanding the effects of spatial or temporal data aggregation (Fig. 1; 2) understanding and modeling the data collection process; and 3) preserving the ability to model continuous spatio-temporal dynamics.

Hierarchical Modeling Framework

Regardless of the type of data, in most studies, the quantity of interest is derived from the space-time intensity function λ(s,t). As we mentioned, λ(s,t) depends on parameters. For example, the intensity function is often formulated such that

$$ \log(\lambda(\mathbf{s},t))=\boldsymbol{\mathbf{x}}(\mathbf{s},t)^{\prime}\boldsymbol{\beta}\:, $$
(5)

where x(s,t) is a p×1 vector that contains covariates at any given location and time within the study area and β is a p×1 vector of regression coefficients. Readers familiar with generalized linear models may recognize Eq. 5 as a linear predictor with a log link function (similar to that used with Poisson regression for count data). In fact, there are many important connections between certain generalized linear models (e.g., Poisson and logistic regression) and the Poisson point process model [1, 3, 32, 99]. Similar to the generalized linear modeling framework for count and binary data, Eqs. 1 and 5 specify a flexible model that can be used to describe how the response variable (e.g., a point pattern) changes over space and time due to covariates. The important connection is that count and binary data are aggregated point patterns that can be modeled using logistic or Poisson regression (Fig. 1), but interpreted as a discrete approximation to the space-time Poisson point process [1].

By placing the space-time Poisson point process model within a hierarchical modeling framework, the model can be made even more flexible. For example, a hierarchical modeling framework provides a way to incorporate important components of both data collection and ecological processes. The framework we present originates from [8], but introductions can be found in [15, 16], and [51]. Following [35] , we use the bracket notation to represent conditional probability density functions and write the hierarchical species distribution model as

$$ g(\mathbf{Y)}\sim[g(\mathbf{Y})|\mathbf{U},\mathbf{\boldsymbol{\theta}}]\:, $$
(6)
$$ \mathbf{U}\sim[\mathbf{U}|\lambda(\mathbf{s},t)]\:, $$
(7)
$$ \lambda(\mathbf{s},t)\sim[\lambda(\mathbf{s},t)|\boldsymbol{\beta}]\:, $$
(8)

where Eq. 6 is called the data model and is a probability density function (or some transformation g(⋅) thereof) that describes the observed point level data Y given the underlying spatio-temporal point pattern U parameters 𝜃. The observed point level data Y may be aggregated to create binary or count data by transforming Y with a deterministic function g(⋅). Equation 7 is called the sampling model, which is the space-time Poisson point process distribution from Eq. 1 that is conditional on the intensity function λ(s,t). Finally, the distribution of λ(s,t) is called the process model and depends on the parameter vector β (Eq. 8). In Fig. 2, we show visual examples of Eqs. 68 with a directed acyclic graph (also called a Bayesian network) that can be used to understand and represent the hierarchical species distribution model [51]. In the following sections, we focus on each level of the hierarchical model and conclude with comments on implementation.

Fig. 2
figure 2

Example aggregated data, observed point data, true point data, and processes (left figures) that can be used with the directed acyclic graph (right; arrow diagram) to visualize and represent a hierarchical species distribution model. Point pattern data Y at three time steps that are observed with error when compared to the underlying true point pattern U. The point pattern data Y might be aggregated by a transformation g(⋅) during the data collection process (e.g., points represented as count or presence-absence in grid cells). The true point pattern ( U) depends on the spatio-temporal intensity function λ(s,t) which was generated using a spatio-temporal diffusion process. The data Y t were observed with location error where the gray points are the true locations and black + signs are the recorded locations. The data Y t+1 were observed from biased sampling efforts that resulted in the blacked out areas not being sampled. The data Y t+2 resulted from non-detection of U t+2 which is represented by the difficulty of observing points in the shaded region. Note, in the directed acyclic graph, the solid lines show stochastic relationships and the dashed lines show deterministic relationships

Process Model

The process model (Eq. 8) can take a wide range of forms depending on the goals of the study. For example, the process model may have a relatively simple form such as a log-linear model that includes covariates x(s,t) (e.g., Eq. 5) or it may be a dynamic model such as a partial differential equation. There are three common classes of process models used to determine the distribution of a species: parametric regression, semiparametric regression, and dynamic spatio-temporal models.

Parametric Regression

The goal of parametric regression is to formulate a link between abundance of a species and covariates. This is typically accomplished using a model of the form \(\log (\lambda (\mathbf {s},t))=\boldsymbol {\mathbf {x}}(\mathbf {s},t)^{\prime }\boldsymbol {\beta }\), where the log function links the known spatial covariates and unknown regression parameters β to the underlying intensity λ(s,t). Similar to generalized linear models, the strength of the parametric regression model is that it is simple to implement and easy to interpret.

Semiparametric Regression

As a generalization of parametric regression, semiparametric regression also links the underlying intensity λ(s,t) to some number of covariates x(s,t). The difference is that semiparametric regression involves functions of the covariates that allow the model to capture nonlinear effects (e.g., polynomial regression). In most situations, semiparametric regression is implemented using basis functions [47]. Many of the statistical and machine learning methods used for species distribution modeling rely on basis functions. For example, generalized additive models are used to model species’ distributions and employ various basis functions to model the smooth effect of a covariate [39, 44]. Other commonly used methods that rely on basis functions include regression trees [29, 44], some maximum entropy methods (e.g., MAXENT; [30]), and latent Gaussian process models [12, 21, 37, 53, 59, 67]. One advantage of semiparametric regression is that smooth effects of the spatial location or time period can be used to account for spatial or temporal autocorrelation [47].

Dynamic Process Models

Many ecological theories have generated hypotheses about why the abundance of a species varies across space and through time [52]. However, the linkages between species distribution modeling approaches based on regression or semiparametric regression models and ecological theory are weak [24, 28, 40]. Dynamic spatio-temporal models can be used within the hierarchical framework to link theoretical models to data when modeling the distribution of a species. For example, species distribution models are commonly used to understand and predict the spread of an invasive species. A simple partial differential equation that describes the spread (ecological diffusion) and logistic growth of a population is

$$ \frac{\partial\lambda(\mathbf{s},t)}{\partial t}=\left( \frac{\partial^{2}}{\partial {s_{1}^{2}}}+\frac{\partial^{2}}{\partial {s_{2}^{2}}}\right)\delta(\mathbf{s},t)+r\left( 1-\frac{\lambda(\mathbf{s},t)}{k}\right)\:, $$
(9)

where s 1 and s 2 are elements of the spatial coordinate s and δ(s,t) is the diffusion coefficient (or motility coefficient) that controls the rate of spread and is inversely related to residence time [34]. The parameters r and k are analogous to the growth rate and equilibrium population size in the traditional logistic growth model when expressed as a ordinary differential equation. Various implementations of this approach have been used to model the invasion and spread of Eurasian Collared-Doves (Streptopelia decaocto), House Finches (Carpodacus mexicanus), Common Mynas (Acridotheres tristis), and mountain pine beetles (Dendroctonus ponderosae) [9, 55, 82, 101, 104]. Partial differential equation models are well developed for many ecological processes and are easy to modify to match the goals of a study [52]; any of the parameters in Eq. 9 could depend on covariates x(s,t) (e.g., \(\delta (\mathbf {s},t)=\boldsymbol {\mathbf {x}}(\mathbf {s},t)^{\prime }\boldsymbol {\beta }\)). For example, [54] allowed the diffusion coefficient δ(s,t) to depend on human population density and showed that diffusion of Eurasian Collared-Doves might be negatively associated with human population density.

Dynamic statistical models enable scientific inference that can not be obtained when using parametric or semiparametric regression [102]. The dynamical approach to statistical modeling has been present in the ecological literature for quite some time [101] and has been used in many areas of environmental research [16], but has not been widely used for species distribution modeling.

Sampling Models

The sampling model (Eq. 7) is a probability density function that describes the data that would be observed if the data could be recorded perfectly. For the space-time Poisson point process model, the matrix U (where each row is the exact location and time that an individual was present) contains the data that would have been recorded if there were no errors in observation or aggregation of the data (Fig. 2). For most contemporary analyses of presence-only data, the Poisson point process is the sampling model that has been used and advocated as the preferred method by many authors [1, 43, 83, 84, 98, 99]. Typically, count data and presence-absence data are treated differently due to the aggregation that occurs during the data collection process (Fig. 1; [1]). Given the space-time Poisson point process, the aggregate (over space and time) of U is u, a m×1 vector that contains the true count or presence-absence for m discrete sample units during some time interval (Fig. 1b & c). Associated with each element of u is the location of the discrete sampling unit and the time period it was sampled. As shown in the distribution theory section, aggregation of a spatio-temporal Poisson point process results in Poisson or Bernoulli distribution for u. The interpretation when using Poisson or Bernoulli sampling models can be linked to the Poisson point process via the integrated intensity function (Eq. 2).

Data Models

In almost all studies, the underlying spatio-temporal point pattern U (or some aggregate u) is not observed perfectly (e.g., [5, 61, 65, 66, 77]; Fig. 2). The hierarchical modeling framework provides a natural way to incorporate uncertainty in observations. This is accomplished via the data model (Eq. 6) which is a conditional distribution that describes the observed data Y (or some aggregate g(Y)≡y) given the true underlying point pattern U (Fig. 2). For most species distribution modeling efforts, determining which data model is appropriate can be the greatest challenge to obtaining reliable predictions and inference. Next, we highlight common data models for count, presence-absence, and presence-only data.

Data Models for Counts

Count data generally suffer from two types of observer error. Within an area of interest \(\mathcal {B}\), over the time period t 1 to t 2, the true number of individuals present is u. The observed count y can contain individuals that were counted multiple times, not detected, or both. The most common data model assumes that the only observation error is non-detection. With a Poisson sampling model, this approach is called the N-mixture model [85]

$$ y|u,\!p\sim\left\{\begin{array}{ll} \text{Binomial}(u,p) & ,u>0\\ 0 & ,u=0 \end{array}\right., $$
(10)

where y is the observed count, u is the unknown true number of individuals, and p is the probability of detection. There are many variations of the N-mixture model, but most implementations require repeated sampling over multiple time periods during which the true number of individuals present (u) within the area (\(\mathcal {B}\)) and time period ([ t 1,t 2]) of interest can be assumed to be constant. The implication of this assumption is that the underlying point pattern must be static over the multiple sampling intervals to ensure that u is constant. There are alternative models that require only a single sample [17, 50, 90], but such models require stronger assumptions or structure in the process model and are a current topic of debate [63, 64, 91].

Data Models for Presence-Absence

Presence-absence data generally suffer from similar observation errors as count data, but these errors are termed false-positive and false-negatives (non-detection). Data models for false negatives are the most widely used (e.g., [71, 95]). A commonly used form of the data model for false-negatives is

$$ y|u,\!p\sim\left\{\begin{array}{ll} \text{Bernoulli}(p) & ,u>0\\ 0 & ,u=0 \end{array}\right. $$
(11)

where y is the observed presence or absence, u is the unknown true number of individuals, and p is the probability that at least one individual is detected. As with the N-mixture model, Eq. 11 requires repeated sampling over multiple time periods during which the occupancy is constant (i.e., u>0 or u=0 must remain constant). Numerous variations of Eq. 11 have been used to account for false-negatives that make less restrictive assumptions (e.g., data from single-site visits [18, 70]). Recently there have been controlled experiments to assess the performance of these data models when basic assumptions are not met (e.g., the assumption that no false positives occur [75]). Also, data models have been recently developed to deal with false-positives [2, 13, 87, 88] and species misidentification [76]. Although Eq. 11 and variants have been widely used as data models, there is debate about whether the model assumptions are met in practice [38, 100].

Data Models for Presence-Only

Unlike count and presence-absence data, presence-only data are almost always collected opportunistically. Determining what types of observational errors might have occurred with presence-only data and how these errors can be modeled is an emerging area of research. Three common types observational errors that we discuss are: sampling bias, non-detection, and location error (Fig. 2).

As before, let \(\mathcal {A}\) be the two-dimensional “study area” and \(\mathcal {B}\) be an area that was “sampled.” With count and presence-absence data, we know \(\mathcal {B}\) as a result of the survey design. For most designed surveys, there will be multiple areas that are sampled so that we have \(\mathcal {B}_{1},...,\mathcal {B}_{m}\) non-overlapping sites. As a result, the entire study area (\(\mathcal {A}\)) is the union of the areas that were sampled (\(\mathcal {A}=\mathcal {B}_{1}\cup \cdots \cup \mathcal {B}_{m}\); note that a similar argument applies to the time interval [0,T]). For designed surveys, \(\mathcal {A}\) is a known spatial domain, but for presence-only data there is rarely information to determine \(\mathcal {A}\) exactly. The space-time Poisson point process model requires that \(\mathcal {A}\) is known due to the limits of integration in Eq. 1. Various ad hoc methods have been used in an attempt to remedy the fundamental lack of information that would otherwise have been obtained from the survey design. For example, a common technique is to sample a large number (q) of randomly (or uniformly) located points where it is assumed that the species was not present (often called pseudo-absences). The area where the pseudo-absences are sampled from (\(\mathcal {P}\)) generally encompasses the locations of the observed data. The presence-only data and pseudo-absences are then treated as presence-absence data. Alternatively, Warton and Shepherd [99] showed that the pseudo-absences are a numerical technique for approximating the integral in Eq. 1, if one assumes that \(\mathcal {A}=\mathcal {P}\). Numerous studies have explored the bias from the so-called pseudo-absence approach for binary data. Given the connection between binary data and the Poisson point process models (Eq. 4), the results from studies on the pseudo-absence approach for binary data should also apply to the Poisson point process model (e.g., [4, 80, 103] and many others). How the study area \(\mathcal {A}\) is defined has the potential to influence both prediction and inference [6, 96]. However, for most situations the assumption that \(\mathcal {A}=\mathcal {P}\) is difficult to assess. We are not aware of a data model that can remedy the problems associated with this form of sampling bias.

Similar to count and presence-absence data, some individuals go undetected. As before, let \(\mathcal {A}\) be the two-dimensional study area. Now assume that everywhere within \(\mathcal {A}\) there is a non-zero probability that a point could be detected ( p(s,t)). In effect, we are assuming that the entire study area was “sampled,” and that all individuals could have been detected. This assumption differs from that in the previous example in that now we are assuming that \(\mathcal {A}\) is known, but that not all points within \(\mathcal {A}\) are detected. For example, [46] presented data that included citizen contributed sightings of the endangered Whooping Crane (Grus americana) within the state of Nebraska, USA. For the study presented in [46], it might be reasonable to assume that \(\mathcal {A}\) is the state of Nebraska and that detection was not perfect. Assume, for a moment, that p(s,t)=1, ∀ s,t. As a result, detection of points would be perfect and the observed data would be U (the n×3 matrix with rows that contains the location and time of each sighting; Fig. 2). If detection is not perfect, then the observed data are Y (the m×3 matrix with rows that contains the location and time of each observed individual where mn). Non-detection results in a thinned spatio-temporal Poisson point process model, which has the following likelihood

$$\begin{array}{@{}rcl@{}} l(\lambda,p;\mathbf{Y})\!\!&=&\!\!{\sum}_{i=1}^{m}\log\lambda(\mathbf{s}_{i},t_{i})p(\mathbf{s},t)\,-\,\intop_{\mathcal{A}}{\intop_{0}^{T}}\lambda(\mathbf{s},t)p(\mathbf{s},t)dtd\mathbf{s}\\ &&-\log(m!)\:. \end{array} $$
(12)

The thinned Poisson point process (Eq. 12) is the same as Eq. 1, except that the intensity function λ(s,t) has been multiplied by (or is a convolution with) the probability of detection p(s,t). To understand the implications of thinning a point process, let \(\mathcal {B}\) be an area of interest within the study area \(\mathcal {A}\) over the time interval [t 1,t 2] and define the integrated probability of detection as

$$ \bar{p}=\frac{1}{|\mathcal{B}||t_{2}-t_{1}|}\intop_{\mathcal{B}}\intop_{t_{1}}^{t_{2}}p(\mathbf{s},t)dtd\mathbf{s}\:, $$
(13)

where \(|\mathcal {B}|\) is the area of \(\mathcal {B}\) and |t 2t 1| is length of the time interval of interest. As a result, the true number of individuals within \(\mathcal {B}\) over the time interval [t 1,t 2] is u as in Eq. 3, but the observed number of individuals is distributed

$$ y\sim\text{Poisson}(\bar{p}\bar{\lambda})\:, $$
(14)

where \(\bar {\lambda }\) is the integrated intensity from Eq. 2. When only Y (or y) is observed, and without additional information about λ(s,t) or p(s,t) (or any derived quantities \(\bar {\lambda }\) and \(\bar {p}\)), it is impossible to simultaneously estimate both quantities. Without a correction for non-detection, parameters estimated from the process model for λ(s,t) will be biased. There are special cases when using parametric regression process models where the bias might not be an issue [22, 49, 98], but it is unlikely that the required assumptions will be met in practice and are challenging to verify in most cases [49]. To correct for bias, recent efforts have paired presence-only data with additional data sources such as expert knowledge [46] or a smaller number of perfectly detected presence-absence or count data [23, 31, 33, 36, 42]. Determining what sources of data can be used to correct for the bias and how to incorporate auxiliary sources of data is an ongoing area of research.

Another common source of observational error occurs when the individual is present at a location that is different than the recorded location (Fig. 2). For some historical records, the recorded location may be the nearest road, town, or other feature on the landscape. For many applications, this presents a problem because λ(s,t) varies spatially. As a result, the intensity at the recorded locations will be different than at the true locations. For example, location error is a problem when λ(s,t) depends on spatial covariates (e.g., Eq. 5) and the covariates at the recorded location are different than the covariates at the true location [45]. With respect to count and presence-absence data arising from a space-time Poisson point process, an implicit assumption is that the covariates can be represented by a single point within the discrete sampling units. Aggregation introduces location error, and it is important to check whether this error has a noticeable effect on inference [31]. Development of data models that account for location error within the hierarchical modeling framework is needed. Data models developed for accommodating covariate measurement error may provide useful extensions to correct for location error [45, 92].

Implementation

Hierarchical models can be implemented under a Bayesian or likelihood paradigm (e.g., [51, 68, 81, 86]). The complexity of the model usually dictates which paradigm is used. For example, it is common to implement hierarchical models that use partial differential equation (e.g., Eq. 9) using Bayesian algorithms such as Markov chain Monte Carlo (MCMC; [102]) or integrated nested Laplace approximation [57, 58], whereas, simpler models that use basis functions might use penalized maximum likelihood [47].

Data models are rarely used in the analysis of presence-only data (but see [23, 31, 45]). If sampling error is ignored, the Poisson point process model is relatively straightforward to implement using maximum likelihood, penalized maximum likelihood, or Bayesian estimation (via MCMC or integrated nested Laplace approximation [57, 58]). Recently there has been a focus on understanding how generalized linear models with Poisson or Bernoulli response distributions can be used to approximate the Poisson point process model [1, 3, 32, 99]. Most software used to implement semiparametric regression was developed as an extension of the generalized linear modeling framework (e.g., generalized additive models). A major benefit of the recent work on approximating the Poisson point process model is that widely available software can be used when the data are assumed to be observed without error [32].

Regardless of the method used, an important task when implementing the point process model is to approximate the integral in Eq. 1 [99], which requires spatial covariate data at all locations within the study area. The quality of this approximation can be assessed by viewing plots of the estimated likelihood or parameters against the number of integration points (e.g., see Fig. 1 in [78], Fig. 2 in [83], or Appendix S2 in [48]). Also important to implementation are various computational problems, such as infinite and boundary estimates, that can also arise for particular species distribution models and data sets [19, 41, 48, 100]. At present, we are unaware of a comprehensive review of the various estimation methods for implementing hierarchical species distribution models. Thus, an awareness of the variety of options available to implement hierarchical models is helpful so that the chosen method will best match the goals of the study.

Conclusion

Hierarchical models are widely used in ecology [14, 15]. In practice, the hierarchical modeling framework has been particularly useful when implementing process models that incorporate ecological theory or when modeling the data collection process (e.g., [54, 55, 101]). The hierarchical modeling framework has also been useful and widely implemented for count and presence-absence data [72, 86], but has not been readily used for presence-only data (but see [23, 31]). Although this review is limited to hierarchical models for a single species, the hierarchical species distribution modeling approach can be readily extended to include multiple, and possibly interacting, species [56, 79, 97]. Given the success for analyzing other types of data, we believe the hierarchical modeling framework has great promise for presence-only data.

Although many data models exist for count and presence-absence data, there are relatively few explicit data models that have been developed for presence-only data. In many cases, the data collection process is ignored, which might be due to the complex and often unknown observation process associated with opportunistic presence-only data. Opportunistic data collation may result in very complicated errors that are difficult to model without auxiliary data or strong assumptions [22, 23, 31, 45, 46, 49, 94, 98]. The key to obtaining reliable inference from opportunistic presence-only data is determining which additional data sources are needed and available to responsibly model the data collection process. Recently, the spatio-temporal Poisson point process model has been used for the analysis of animal movement data collected using telemetry devices [10, 60, 89]. We anticipate that data models developed for telemetry data will have an analogous use for species distribution models (e.g., [10]). We also anticipate that the methodology developed to account for repeated measurements (i.e., locations) of the same individual(s) developed for telemetry data, will have analogous use for species distribution models that are use to model count, presence-absence, and presence-only data that includes multiple observations of the same individual(s).