Estimating uncertainty in density surface models

Providing uncertainty estimates for predictions derived from species distribution models is essential for management but there is little guidance on potential sources of uncertainty in predictions and how best to combine these. Here we show where uncertainty can arise in density surface models (a multi-stage spatial modelling approach for distance sampling data), focussing on cetacean density modelling. We propose an extensible, modular, hybrid analytical-simulation approach to encapsulate these sources. We provide example analyses of fin whales Balaenoptera physalus in the California Current Ecosystem.


Introduction
When making predictions of animal abundance from density surface models, we are interested in two situations: summaries for each prediction cell to create maps of uncertainty and summaries of total abundance (sums of the cells) over subsets of the prediction grid to obtain time series of abundance. This note investigates what different averages mean for predictions from spatial models based on generalized additive models (such as density surface models) when predictions are made at multiple time points and summaries of those predictions are required. Both simulation and analytical approaches are covered.

Setup
We begin by thinking about a spatial model based on a generalized additive model with an offset based on effort and detectability, as in equation (1) in the main paper. In this case we have a model of the form:

$$\mathbb{E}(n_j) = A_j \hat{p}_j \exp\left[\beta_0 + \sum_k f_k(x_{jk})\right], \qquad (1)$$

where $n_j$ is the number of individuals (or groups) observed in segment $j$ (of which there are $J$), which has some environmental covariates $x_{jk}$ which are modelled by smooth functions $f_k$ and some intercept $\beta_0$. Each segment is of size $A_j$ and has associated detection probability $\hat{p}_j$ (as in the main article, we can extend this to include other components but stick with just detection probability for simplicity of presentation here). We make predictions at each grid cell by plugging-in values of the covariates to the following equation:

$$\hat{n}_r = A_r \exp\left[\hat{\beta}_0 + \sum_k \hat{f}_k(x_{kr})\right], \qquad (2)$$

where the predictions $\hat{n}_r$ are indexed by $r$, of size $A_r$ (there are $R$ cells in total). Hats indicate estimated quantities. Noting that we represent the $f_k$ as basis expansions, we can re-write the expression inside the brackets as a matrix-vector product: $\hat{n}_r = A_r \exp(X_r \hat{\beta})$, where $X_r$ is a single row (corresponding to prediction cell $r$) of some prediction matrix $X$ (Wood, 2017, Section 6.10). As well as being interested in $\hat{n}_r$, we are also interested in aggregates like $\hat{N} = \sum_{r=1}^{R} \hat{n}_r$, i.e., the abundance in a given area. Here we are interested in how to calculate uncertainties for $\hat{n}_r$ and $\hat{N}$ when we may have multiple estimates because of temporal variation in the $x_{kr}$.
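To make the prediction step concrete, equation (2) is just the exponential of a matrix-vector product, scaled by cell area. A minimal numerical sketch (entirely hypothetical values for the prediction matrix, coefficients, and areas; in practice these come from the fitted model):

```python
import numpy as np

rng = np.random.default_rng(1)

R, P = 5, 3                              # prediction cells, model coefficients
Xp = rng.normal(size=(R, P))             # prediction matrix X (one row X_r per cell)
beta_hat = np.array([0.1, -0.2, 0.05])   # fitted coefficients (stand-ins)
A = np.full(R, 2.0)                      # cell areas A_r

# equation (2): per-cell predicted abundance n_r = A_r * exp(X_r @ beta_hat)
n_r = A * np.exp(Xp @ beta_hat)

# aggregate: total abundance N = sum over cells
N_hat = n_r.sum()
```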

Prediction targets
When all covariates in our model are static in time (e.g., bathymetry, altitude, etc.) things are straightforward, but it is often the case that environmental covariates are dynamic and therefore implicitly temporal (e.g., sea surface temperature, measures of prey distribution, etc.). In that case, we make predictions for a given time slice, e.g., "Friday 7 May 2021", "week 16 of 2010", or "an average day in July over the last 10 years" (the latter being an example of a "climatology"). We can make separate predictions for each of these time slices but we may also want to average them (for a given period), e.g., "average day in June in 2014 from predictions of each day in June in 2014". Note that below we generally assume that the slices are representative of the periods that contain them (that they are not just extrema).
Next we enumerate the options that we are interested in. Each is informally titled, followed by a mathematical definition of an appropriate abundance estimator and a description of the corresponding variance. We describe how to calculate these in the next section. Note that in all of these cases we are making assumptions about the "representativeness" of the covariates we use to characterize the conditions in our prediction area during the time of interest.
1. Maps of abundance at a given slice. Abundance for each prediction cell in a given time slice: $\{\hat{n}_r;\ r = 1, \dots, R\}$. $\text{Var}(\hat{n}_r)$ gives uncertainty from model parameters for cell $r$ given the environmental covariates. This is the "simple" case where we only have one prediction grid at a single slice and want to know about the resulting map and its uncertainty.
2. Total abundance at a given slice. Sum of 1., over all cells in this time slice: $\hat{N} = \sum_r \hat{n}_r$. $\text{Var}(\hat{N})$ gives uncertainty from the model parameters given covariates, but including covariances between the cells. This summary of 1. is another simple case where we only have one time slice.
3. Overall average map of abundance in a given period. Average cell values in this period, made up of slices $t = 1, \dots, T$: $\left\{\bar{n}_r = \frac{1}{T}\sum_t \hat{n}_{rt};\ r = 1, \dots, R\right\}$. $\text{Var}(\bar{n}_r)$ takes into account the variance both within and between the slices. Here we want to summarize over multiple environmental conditions (covariate grids) to get an idea of an average abundance map over some time period (e.g., "any day in 2014" or "any day in June in the last 10 years").
4. Overall average total abundance in a given period. Average of $T$ estimates of 2., or the sum of $\hat{n}_{rt}$ over $r$, then averaged over $T$: $\bar{N} = \frac{1}{T}\sum_t \sum_r \hat{n}_{rt}$. This summary of 3. gives us a measure of the abundance on a given random slice in a given period.

5. Average map of abundance in a given period from summaries. Summaries are required at some grouping of $t$ (say, $G$ groups $\{\tau_g;\ g = 1, \dots, G\}$, where $\tau_g$ is the set of time indices for group $g$ and $\sum_g |\tau_g| = T$ if we denote the number of elements in group $g$ as $|\tau_g|$), before being averaged at the higher level: $\left\{\bar{n}^G_r = \frac{1}{G}\sum_g \frac{1}{|\tau_g|}\sum_{t\in\tau_g} \hat{n}_{rt};\ r = 1, \dots, R\right\}$. This allows us to make predictions at the daily level, summarize them at the monthly level, then calculate what an average month looks like over multiple years (rather than 3., which gives us a random day in that month). $\text{Var}(\bar{n}^G_r)$ also takes into account the variance both within and between periods, rather than within and between slices.
6. Average total abundance in a given period from summaries. Summaries are required before calculating the total abundance. Again grouping at some lower level before averaging at the higher level, but in this case for total abundance: $\bar{N}^G = \frac{1}{G}\sum_g \frac{1}{|\tau_g|}\sum_{t\in\tau_g} \sum_r \hat{n}_{rt}$. $\text{Var}(\bar{N}^G)$ gives our uncertainty in a total abundance estimate, taking into account variance within and between the total abundances per time period rather than per slice.
Obtaining point estimates of predictions for cases 1-6 is "easy", but what the variance estimates should be is less clear. We next look into general tactics for estimating variance in these cases, then come back to 1-6 and address each in detail.
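The bookkeeping behind cases 3-6 can be sketched with arrays: given per-cell, per-slice estimates, cases 3-4 average over all slices directly, while cases 5-6 average within groups first. A toy sketch with hypothetical values (the gamma draws stand in for model predictions):

```python
import numpy as np

rng = np.random.default_rng(2)

R, T = 4, 6                                  # prediction cells, time slices
n_rt = rng.gamma(2.0, 1.0, size=(R, T))      # stand-in per-cell, per-slice estimates

# cases 3 & 4: average over all slices
n_bar_r = n_rt.mean(axis=1)                  # average map (case 3)
N_bar = n_rt.sum(axis=0).mean()              # average total abundance (case 4)

# cases 5 & 6: group the slices (here G = 2 groups tau_g of 3 slices each),
# average within each group, then across groups
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
group_maps = np.stack([n_rt[:, tau].mean(axis=1) for tau in groups], axis=1)
n_G_r = group_maps.mean(axis=1)              # average-of-group-averages map (case 5)
N_G = np.mean([n_rt[:, tau].sum(axis=0).mean() for tau in groups])  # case 6
```

With equal-sized groups the point estimates coincide (the grouped averages equal the plain averages); the distinction between the cases lies in which variance is reported.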

Tools for uncertainty estimation
We have two approaches to deal with uncertainty estimation from these kinds of models: simulation and analytic. Here we give a brief overview of these two methods in general before applying them to the above situations in the next section. Note that we assume here that uncertainty in $\hat{p}_j$ in (1) has been propagated via, e.g., Bravington et al. (2021).

Simulation
The simulation approach is based on the idea that the $\hat{\beta}$ estimated in our model has a (posterior) distribution that we can sample from. If we think of models fitted by REML as empirical Bayes estimates then we might call this posterior sampling; otherwise we can call it a parametric bootstrap. Either way, we sample from the distribution of $\hat{\beta}$, plug those samples into (2) and obtain multiple realisations of our predictions, based on the uncertainty of our model.
The posterior for $\beta$ given smoothing parameters $\lambda$ is approximately distributed as $\beta|\lambda \sim N(\hat{\beta}, V_{\hat{\beta}})$, where as in the main article smoothing parameter uncertainty can also be included (but again we keep things simple here for presentational purposes). The following algorithm can then be used:
1. For each posterior sample $b = 1, \dots, B$:
(a) Simulate model coefficients from the fitted model to obtain $\hat{\beta}_b$ (this can be done directly from $N(\hat{\beta}, V_{\hat{\beta}})$, using importance sampling, or Metropolis-Hastings sampling).
(b) Calculate per-cell predictions for this realisation by plugging $\hat{\beta}_b$ into (2).
(c) Calculate an appropriate summary statistic of the per-cell predictions (e.g., sum for total abundance).
2. Calculate the empirical variance or percentile intervals of the results from 1(c) to obtain the requisite quantities.
The overarching idea here is that we are able to create "possible" predictions conditional on the model being correct.
Our concern below will be how to calculate appropriate summaries at 1(c) and their variances at 2.
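The algorithm above can be sketched directly, sampling from the multivariate normal. All fitted-model quantities here (coefficients, their covariance, the prediction matrix, the areas) are stand-ins; in practice they come from the fitted GAM:

```python
import numpy as np

rng = np.random.default_rng(3)

# stand-ins for fitted-model quantities
P, R, B = 3, 5, 2000
beta_hat = np.array([0.2, -0.1, 0.3])    # estimated coefficients
V_beta = 0.01 * np.eye(P)                # their covariance matrix
Xp = rng.normal(size=(R, P))             # prediction matrix
A = np.ones(R)                           # cell areas

# step 1(a): simulate coefficients beta_b ~ N(beta_hat, V_beta)
beta_b = rng.multivariate_normal(beta_hat, V_beta, size=B)   # (B, P)

# step 1(b): per-cell predictions for each realisation via equation (2)
n_rb = A * np.exp(beta_b @ Xp.T)         # (B, R)

# step 1(c): summary statistic, e.g. total abundance per realisation
N_b = n_rb.sum(axis=1)

# step 2: empirical variance and percentile interval
var_N = N_b.var(ddof=1)
ci = np.percentile(N_b, [2.5, 97.5])
```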

Analytic
The analytic approach is based on standard linear modelling theory. We can obtain $V_{\hat{\beta}}$, the covariance matrix for the model parameters (Wood, 2017, Sections 5.8 & 6.9.3). We then use the delta method to move from uncertainty about $\beta$ to uncertainty about $\hat{n}_r$ or $\hat{N}$. For example, if we wanted to calculate the variance of a total abundance estimate $\hat{N}$, we use:

$$\text{Var}(\hat{N}) = \left[\frac{\partial \hat{N}}{\partial \beta}\right]^\top V_{\hat{\beta}} \left[\frac{\partial \hat{N}}{\partial \beta}\right], \quad \text{where} \quad \hat{N} = A\exp(X\hat{\beta}), \qquad (3)$$

and $A$ is a row vector of the $A_r$s. Derivatives are evaluated at the estimated values of the model parameters.
The terms in brackets can be thought of as re-projecting the uncertainty from the $\hat{\beta}$s into the space of the predicted abundance estimate.

Our secondary tool for the analytic case is an identity which we can use to find total variance given we have summaries. First let us imagine the case in which we have a random variable $X$ that we measure at $t = 1, \dots, T$ and denote $x_t$ (standing-in for a single cell or a total abundance estimate). If we have a measurement at each $t$ then we can simply compute the empirical variance of the $x_t$s to get an estimate of the variance of $X$. However, the more complicated case is where we partition the $x_t$s into $G$ groups (for example, if $t$ are days but we only have monthly averages, $G = 12$). In that case, we can no longer access the individual $x_t$, just $\bar{x}_g$, which are averages of the $K$ (say) $x_t$s in partition $g$ (assuming each group is of the same size, $K$, we have that $T = KG$). We also have $\text{Var}_g(x)$, the variance of the $x_t$s in partition $g$. We can then take the regular estimator of variance,

$$\widehat{\text{Var}}(x) = \frac{1}{T-1}\sum_{t=1}^{T}(x_t - \bar{x})^2,$$

and re-arrange it to get an expression in terms of $\text{Var}_g(x)$ and $\widehat{\text{Var}}(\bar{x}_g)$ (the empirical variance of the $\bar{x}_g$s). Writing $x_{gk}$ for the $k$th element of group $g$, we start out with:

$$(T-1)\widehat{\text{Var}}(x) = \sum_{g=1}^{G}\sum_{k=1}^{K}(x_{gk} - \bar{x})^2 = \sum_{g=1}^{G}\sum_{k=1}^{K}(x_{gk} - \bar{x}_g)^2 + K\sum_{g=1}^{G}(\bar{x}_g - \bar{x})^2.$$

The summation in the right-hand term above is $G-1$ times the variance of the $\bar{x}_g$s, $\widehat{\text{Var}}(\bar{x}_g)$ (from the definition of the variance). We can apply the same logic to the left side too, and say that the inner summation is $K-1$ times the per-group variance of the $x_{gk}$s, $\text{Var}_g(x)$. So, we can write:

$$\widehat{\text{Var}}(x) = \frac{(K-1)\sum_{g=1}^{G}\text{Var}_g(x) + K(G-1)\widehat{\text{Var}}(\bar{x}_g)}{T-1}. \qquad (4)$$

Translating this to the kind of temporal summaries we are interested in, $\text{Var}_g(x)$ represents uncertainty from the model (from the $\hat{\beta}$s), and $\widehat{\text{Var}}(\bar{x}_g)$ is a measure of the between-time-slice uncertainty from the estimates. In the simple case of a random variable above, we know what $K$ should be. This is less clear when we use (3) to obtain $\text{Var}_g(x)$. By analogy to the simple case, $K$ is the "number of samples used to calculate each $\text{Var}_g(x)$", that is, the number of samples used to fit the model from which we derive our predictions. So $K = J$, where $J$ is the number of segments used to fit the model (see above). Since increasing $K$ improves our estimate of variance in the above, and increasing the number of segments will generally improve our estimate of variance, the analogue holds.
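The within/between identity can be checked numerically; the sketch below partitions $T = KG$ draws into $G$ equal groups and verifies that the recombined variance matches the ordinary estimator over all $T$ values:

```python
import numpy as np

rng = np.random.default_rng(4)

G, K = 4, 5
T = G * K
x = rng.normal(size=(G, K))          # the x_t, partitioned into G groups of K

xbar_g = x.mean(axis=1)              # group means
var_g = x.var(axis=1, ddof=1)        # within-group variances Var_g(x)
var_between = xbar_g.var(ddof=1)     # between-group variance Var(xbar_g)

# recombine: ((K-1) * sum of within-group variances
#             + K * (G-1) * between-group variance) / (T-1)
var_total = ((K - 1) * var_g.sum() + K * (G - 1) * var_between) / (T - 1)

# matches the ordinary variance estimator over all T values
assert np.isclose(var_total, x.ravel().var(ddof=1))
```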

Estimating uncertainty in useful cases
Going back to our list above, how can we use the above tools to obtain useful results? Conceptually, the simulation approach can be easier to think about, as we are able to consider posterior samples as "possible universes" in which particular resulting populations occur. We can also think about how/whether we want to make sampling exchangeable with time slices of environmental covariates. Analytic results are more involved, as we need to work out the form of matrix operations, rather than the averaging required for the simulation approach. Below we look at the simulation-based approach first in each case and then use that to aid the construction of the analytic estimator.

Maps of abundance at a given slice
This is the simplest case, as described in Miller et al. (2013). We have one prediction grid and want to calculate the resulting map and its uncertainty. We use (2) to get $\hat{n}_r$ and calculate $\text{Var}(\hat{n}_r)$ in one of the following two ways:
1. Simulation: Store the $\hat{n}_r$ for each posterior sample $b = 1, \dots, B$, then calculate the variance per cell over the $B$ samples.
2. Analytic: Apply the delta method as in (3) on a per-cell basis, using the single row $X_r$ of the prediction matrix:

$$\text{Var}(\hat{n}_r) = \left[\frac{\partial \hat{n}_r}{\partial \beta}\right]^\top V_{\hat{\beta}} \left[\frac{\partial \hat{n}_r}{\partial \beta}\right], \quad \text{where} \quad \hat{n}_r = A_r\exp(X_r\hat{\beta}). \qquad (5)$$

Total abundance at a given slice
Now for a single time slice, sum cells in the map to obtainN = rn r . Again, this is a case covered in Miller et al. (2013).
1. Simulation: At step 1(c), we store $\hat{N}$ for each posterior sample $b = 1, \dots, B$, then calculate the variance over the $B$ estimates of $\hat{N}$.
2. Analytic: Use the delta method expression in (3) directly.
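Since $\hat{N} = \sum_r A_r \exp(X_r\beta)$, the gradient in the delta-method expression has a closed form, $\partial\hat{N}/\partial\beta = \sum_r A_r \exp(X_r\hat{\beta})X_r$. A sketch with stand-in values, checking the gradient against finite differences:

```python
import numpy as np

rng = np.random.default_rng(5)

P, R = 3, 6
beta_hat = np.array([0.1, 0.2, -0.1])    # stand-in coefficients
V_beta = 0.005 * np.eye(P)               # stand-in coefficient covariance
Xp = rng.normal(size=(R, P))             # prediction matrix
A = np.ones(R)                           # cell areas

def total_abundance(beta):
    """N(beta) = sum_r A_r exp(X_r beta)."""
    return float(A @ np.exp(Xp @ beta))

# gradient of N w.r.t. beta, evaluated at beta_hat
n_r = A * np.exp(Xp @ beta_hat)
dN_dbeta = n_r @ Xp                      # (P,)

# delta-method variance: gradient^T V_beta gradient
var_N = float(dN_dbeta @ V_beta @ dN_dbeta)
```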

Overall average map of abundance in a given period
Now we have the case where we obtain multiple estimates per prediction cell, based on different covariate values (slices), and want to get an average in a period (a collection of $T$ slices indexed by $t$). As noted above, we need to make assumptions both about how representative each slice is for its time and about how representative of the whole period the collection of slices is. For prediction cell $r$, $\bar{n}_r = \frac{1}{T}\sum_t \hat{n}_{rt}$.
1. Simulation: The simulation lets us think about this clearly: we want to allow for exchangeability between time and samples. The full set of $TB$ possible values for a given cell (based on the values in the $TB$ generated surfaces) can be considered different possible values taking into account both environmental variability and model uncertainty. So, we store $\hat{n}_{rtb}$ (the estimate for cell $r$ at time $t$ from posterior sample $b$) at step 1(c). We can then use the empirical variance to obtain:

$$\widehat{\text{Var}}(\bar{n}_r) = \frac{1}{TB-1}\sum_{t,b}\left(\hat{n}_{rtb} - \bar{n}_r\right)^2. \qquad (6)$$

2. Analytic: We want to find an equivalent analytic expression to (6): that is, the variance around the average $\bar{n}_r$. We can obtain the variance per cell, per slice, $\text{Var}_t(\hat{n}_{rt})$, using (5) (which only includes model uncertainty from $\hat{\beta}$). The empirical variance of per-slice estimates is given by $\widehat{\text{Var}}(\hat{n}_{rt}) = \frac{1}{T-1}\sum_t (\hat{n}_{rt} - \bar{n}_r)^2$ (which does not include any model uncertainty from $\hat{\beta}$). Ideally we would like the $T$ estimates to be for each slice in the period we are interested in, at a representative temporal resolution (e.g., for an average over a month, we might ideally like to have daily slices but in practice only have 8-day averages). We can use (4) to write:

$$\widehat{\text{Var}}(\bar{n}_r) = \frac{(K-1)\sum_{t=1}^{T}\text{Var}_t(\hat{n}_{rt}) + K(T-1)\widehat{\text{Var}}(\hat{n}_{rt})}{KT-1}. \qquad (7)$$

Overall average total abundance in a given period

We can take the $T$ maps implied by the predictions in the previous subsection and sum each to obtain estimates of abundance per slice, then average these for an overall average.
The point estimate of abundance can therefore be calculated in terms of the estimates of abundance per time period ($\hat{N}_t$) or from the perspective of the cells: $\bar{N} = \frac{1}{T}\sum_t \hat{N}_t = \frac{1}{T}\sum_t\sum_r \hat{n}_{rt}$.
1. Simulation: Via the same logic as in the previous situation, we now store $\hat{N}_{tb}$ at step 1(c), and summarize using

$$\widehat{\text{Var}}(\bar{N}) = \frac{1}{TB-1}\sum_{t,b}\left(\hat{N}_{tb} - \bar{N}\right)^2.$$

2. Analytic: We can use the same approach as in (7), but this time we make summaries first, summing the per-cell observations (applying (3) to each of the $T$ time slices to get $\text{Var}(\hat{N}_t)$ and calculating the empirical variance over the estimates $\hat{N}_t$ to obtain $\widehat{\text{Var}}_t(\hat{N}_t)$). We then get the following formula:

$$\widehat{\text{Var}}(\bar{N}) = \frac{(K-1)\sum_{t=1}^{T}\text{Var}(\hat{N}_t) + K(T-1)\widehat{\text{Var}}_t(\hat{N}_t)}{KT-1}. \qquad (8)$$

Here $K$ has the same meaning as in (7).
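In the simulation the bookkeeping for this case simply gains a time index: one prediction matrix per slice, giving a $T \times B$ array of totals whose empirical variance mixes model and between-slice uncertainty. A sketch (all fitted-model quantities are stand-ins, as before):

```python
import numpy as np

rng = np.random.default_rng(6)

P, R, T, B = 3, 5, 4, 1000
beta_hat = np.zeros(P)                   # stand-in coefficients
V_beta = 0.01 * np.eye(P)                # stand-in coefficient covariance
Xp_t = rng.normal(size=(T, R, P))        # one prediction matrix per time slice
A = np.ones(R)                           # cell areas

# simulate coefficients once, reuse across slices
beta_b = rng.multivariate_normal(beta_hat, V_beta, size=B)   # (B, P)

# n_rtb: per-cell prediction at slice t under posterior sample b
lin = np.einsum('trp,bp->trb', Xp_t, beta_b)                 # (T, R, B)
n_rtb = A[None, :, None] * np.exp(lin)

# N_tb: total abundance per slice, per sample; the empirical variance over
# all T*B values gives the uncertainty around the average total
N_tb = n_rtb.sum(axis=1)                                     # (T, B)
N_bar = N_tb.mean()
var_N_bar = N_tb.var(ddof=1)
```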
Average map of abundance in a given period from summaries

Our final two prediction targets require summarization before making a final estimate. We take our per-cell, per-time-slice estimates, $\hat{n}_{rt}$, average them within some grouping first, and then take a final mean over the $G$ groups we created: $\bar{n}^G_r = \frac{1}{G}\sum_g \frac{1}{|\tau_g|}\sum_{t\in\tau_g}\hat{n}_{rt}$. As a concrete example, if we have daily estimates and want a seasonal summary map assuming an average day in each month, we would have $t$ indexing the $T$ days and $G$ months indexed by $g$, each of length $|\tau_g|$ days. So $\text{Var}(\bar{n}^G_r)$ gives the variance of an average day in a month (for cell $r$).
1. Simulation: At step 1(c), we calculate within each grouping for each posterior sample per cell, so we calculate $\bar{n}^g_{rb} = \frac{1}{|\tau_g|}\sum_{t\in\tau_g}\hat{n}_{rtb}$ (note the $b$ subscript remains, so model uncertainty is retained). Then when we come to summarize, we calculate the empirical variance over the groups and posterior samples $\bar{n}^g_{rb}$: $\widehat{\text{Var}}(\bar{n}^G_r) = \frac{1}{GB-1}\sum_{g,b}\left(\bar{n}^g_{rb} - \bar{n}^G_r\right)^2$.
2. Analytic: As above, we have the variance per cell, per slice, $\text{Var}_t(\hat{n}_{rt})$. We can obtain our estimator by using (4) twice: first at the "top level" to obtain an estimate over the $G$ groups, then within each of the $G$ groups.
Starting with the former (assuming for simplicity that the groups are of equal size, $|\tau_g| = T/G$), we write:

$$\widehat{\text{Var}}(\bar{n}^G_r) = \frac{(|\tau_g|-1)\sum_{g=1}^{G}\text{Var}_g(\bar{n}^g_r) + |\tau_g|(G-1)\widehat{\text{Var}}(\bar{n}^g_r)}{T-1}, \qquad (9)$$

where $\widehat{\text{Var}}(\bar{n}^g_r)$ gives the empirical between-group variance ($\widehat{\text{Var}}(\bar{n}^g_r) = \frac{1}{G-1}\sum_g (\bar{n}^g_r - \bar{n}^G_r)^2$). Applying (4) again to obtain a usable expression for $\text{Var}_g(\bar{n}^g_r)$, we can then write:

$$\text{Var}_g(\bar{n}^g_r) = \frac{(K-1)\sum_{t\in\tau_g}\text{Var}_t(\hat{n}_{rt}) + K(|\tau_g|-1)\widehat{\text{Var}}(\hat{n}_t)_{t\in\tau_g}}{K|\tau_g|-1}, \qquad (10)$$

where $\widehat{\text{Var}}(\hat{n}_t)_{t\in\tau_g}$ is the empirical variance of those estimates in group $g$ and $\text{Var}_t(\hat{n}_{rt})$ is calculated from the delta method (3) as above. So, substituting (10) into (9), we obtain:

$$\widehat{\text{Var}}(\bar{n}^G_r) = \frac{1}{T-1}\left[(|\tau_g|-1)\sum_{g=1}^{G}\frac{(K-1)\sum_{t\in\tau_g}\text{Var}_t(\hat{n}_{rt}) + K(|\tau_g|-1)\widehat{\text{Var}}(\hat{n}_t)_{t\in\tau_g}}{K|\tau_g|-1} + |\tau_g|(G-1)\widehat{\text{Var}}(\bar{n}^g_r)\right]. \qquad (11)$$

Average total abundance in a given period from summaries

Finally, we have the situation where we have the same grouping setup as the last situation, but now we want total abundance estimates rather than maps. Our point estimate is now $\bar{N}^G = \frac{1}{G}\sum_g \frac{1}{|\tau_g|}\sum_{t\in\tau_g}\sum_r \hat{n}_{rt}$. $\text{Var}(\bar{N}^G)$ gives our uncertainty in a total abundance estimate, taking into account variance within and between the total abundances per time period rather than per slice.
1. Simulation: At step 1(c), we calculate within each grouping for each posterior sample, so we calculate $\bar{N}^g_b = \frac{1}{|\tau_g|}\sum_{t\in\tau_g}\hat{N}_{tb}$ (note the $b$ subscript remains, so model uncertainty is retained). Then when we come to summarize, we calculate the empirical variance over the groups and posterior samples $\bar{N}^g_b$: $\widehat{\text{Var}}(\bar{N}^G) = \frac{1}{GB-1}\sum_{g,b}\left(\bar{N}^g_b - \bar{N}^G\right)^2$.
2. Analytic: We can adapt (11) to the case of $\bar{N}^G$:

$$\widehat{\text{Var}}(\bar{N}^G) = \frac{1}{T-1}\left[(|\tau_g|-1)\sum_{g=1}^{G}\frac{(K-1)\sum_{t\in\tau_g}\text{Var}(\hat{N}_t) + K(|\tau_g|-1)\widehat{\text{Var}}(\hat{N}_t)_{t\in\tau_g}}{K|\tau_g|-1} + |\tau_g|(G-1)\widehat{\text{Var}}(\bar{N}^g)\right], \qquad (12)$$

where the quantities are analogous to those above.
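The grouped simulation summary keeps the posterior-sample index while averaging within each group. A sketch with stand-in per-slice totals (gamma draws here purely for illustration; in practice these would come from posterior simulation of the fitted model):

```python
import numpy as np

rng = np.random.default_rng(7)

T, B = 6, 500
N_tb = rng.gamma(5.0, 1.0, size=(T, B))    # stand-in per-slice totals per sample

# G = 2 groups of slices (tau_g), e.g. months made up of days
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]

# group-averaged totals per posterior sample (b index retained)
N_gb = np.stack([N_tb[tau].mean(axis=0) for tau in groups])  # (G, B)

# point estimate and empirical variance over groups and samples
N_G = N_gb.mean()
var_N_G = N_gb.var(ddof=1)
```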

Discussion
Here we have developed estimators for some common situations that arise when calculating abundance estimates for populations when models include temporally-varying covariates. We tend to prefer the simulation-based approaches on a conceptual level, and we also note that in our experience these estimators are often easier to understand for a less mathematically-inclined audience. This approach is not without its problems, though: ensuring that sampling of $\hat{\beta}$ is correct can be tricky, and we have found that the multivariate normal assumption does not hold in many situations, prompting the use of the more expensive Metropolis-Hastings algorithm. This problem of poor sampling is often revealed by the presence of extreme abundance predictions in the set of simulations, which are more likely to occur when fitted smooth terms, as plotted on the log scale, exhibit strongly widening intervals above zero, as for mixed layer depth in Figure S3, especially when predictions are made beyond the sampled ranges of such covariates. Another reason to prefer the simulation-based approach is that percentile intervals can be calculated easily; this is trickier in the analytic case, as the usual assumption of abundance being log-normally distributed does not necessarily hold.