Accounting for imperfect detection of groups and individuals when estimating abundance

Abstract
If animals are independently detected during surveys, many methods exist for estimating animal abundance despite detection probabilities <1. Common estimators include double-observer models, distance sampling models and combined double-observer and distance sampling models (known as mark-recapture-distance-sampling models; MRDS). When animals reside in groups, however, the assumption of independent detection is violated. In this case, the standard approach is to account for imperfect detection of groups, while assuming that individuals within groups are detected perfectly. However, this assumption is often unsupported. We introduce an abundance estimator for grouped animals when detection of groups is imperfect and group size may be under-counted, but not over-counted. The estimator combines an MRDS model with an N-mixture model to account for imperfect detection of individuals. The new MRDS-Nmix model requires the same data as an MRDS model (independent detection histories, an estimate of distance to transect, and an estimate of group size), plus a second estimate of group size provided by the second observer. We extend the model to situations in which detection of individuals within groups declines with distance. We simulated data under 12 scenarios and used Bayesian methods to compare the performance of the new MRDS-Nmix model to an MRDS model. Abundance estimates generated by the MRDS-Nmix model exhibited minimal bias and nominal coverage levels. In contrast, MRDS abundance estimates were biased low and exhibited poor coverage. Many species of conservation interest reside in groups and could benefit from an estimator that better accounts for imperfect detection. Furthermore, the ability to relax the assumption of perfect detection of individuals within detected groups may allow surveyors to re-allocate resources toward detection of new groups instead of extensive surveys of known groups.
We believe the proposed estimator is feasible because the only additional field data required are a second estimate of group size.

tion difficulties might bias abundance estimates, but there were no estimators available to address this concern. Therefore, we resolved to try to develop an abundance estimator that could account for imperfect detection of both groups and individuals.
Here, we propose a method to account for imperfect detection of individuals within groups by incorporating N-mixture models (Royle, 2004) into MRDS models. N-mixture models are hierarchical models that rely on repeated counts of individuals to estimate detection probability and abundance. Under our approach, observers make independent counts of each observed group, in addition to the detection history and distance-to-transect data collected for an MRDS model.
By combining the three distinct sampling ideas into a unified hierarchical model, we can obtain unbiased abundance estimates even when group size is measured with error. We expect this data collection to be achievable because the only difference from current MRDS sampling design is that each observer should independently count and record group size, instead of conferring and recording a single group-size count (e.g., Laake et al., 2008). We posit that such data are routinely collected but often summarized after data collection to discard auxiliary information on group size (Conroy et al., 2008). However, because the N-mixture component introduces an assumption that individuals are not double-counted, survey protocols may need to be refined to meet this assumption.
We present a new estimator for abundance of grouped animals when detection of groups is imperfect and group size may be undercounted. We introduce the sampling situation and the formal model, and then demonstrate its application to simulated data. We also demonstrate the bias associated with a reduced estimator that assumes group size is recorded without error.

SAMPLING DESIGN AND MODEL DESCRIPTION
Our goal is to estimate animal abundance when groups of animals are imperfectly detected and when group size may be under-counted.
FIGURE 1 A view of elk (Cervus elaphus) during an aerial survey. Three elk are readily visible, while one blends into the background and one is partially obscured by a tree branch.
Our approach is an extension of available MRDS models. Accordingly, the proposed sampling design is a slight modification of the MRDS sampling design (Burt, Borchers, Jenkins, & Marques, 2014). As with MRDS sampling, we propose that multiple surveyors conduct distance sampling, either traversing a line transect or conducting point counts.
The surveyors should observe groups of animals and record which groups were seen by which surveyors, and the perpendicular distance from the groups to the transect line. The only survey modification that we propose is that each observer should record an independent count of group size, rather than a unified count for all observers.
The observed counts of animals are likely to underestimate true abundance because some groups may not be observed and/or some individuals within groups may not be observed (Graham & Bell, 1989).
Our task is to account for this imperfect observation of animals. We chose to address this problem with Bayesian methods both for convenience and to facilitate hierarchical modeling that can account for spatial and temporal variation in abundance (Chelgren, Adams, Bailey, & Bury, 2011; Moore & Barlow, 2011).
Given certain assumptions, we can use distance data to account for imperfect detection of groups (Buckland et al., 2001; Burnham, Anderson, & Laake, 1980). Under distance sampling, we expect a negative relationship between distance from the transect and detection probability of groups. If we specify a detection function, $g(d; \theta)$, we can estimate parameter values, $\theta$, from distance data for $n$ groups, $d_1, d_2, \ldots, d_n$. Typically, estimating detection parameters relies on the assumption that $g(0) = 1$ (although double-observer data allow us to relax this assumption, see below). We developed the model likelihood using a data augmentation approach (Royle, Dorazio, & Link, 2007; see Kéry & Royle, 2016, ch. 8).
Under this approach, we augment the observed data set with a large number of unobserved groups that are missing distance and group-size data. We then estimate a data augmentation parameter, $\Omega$, which is the probability that the unobserved groups belong to the sampled population. By fixing the size of the data set, data augmentation simplifies Bayesian analysis of the model by Markov chain Monte Carlo (MCMC), especially when individual or group-level covariates, such as distance and group size, are used.
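The augmentation step can be sketched as follows. This is a minimal illustration, assuming observed groups are stored as parallel lists; the function name `augment` and the use of `None` for missing covariates are hypothetical choices, not the authors' code. In JAGS, the missing values would be supplied as NA so that they are treated as latent quantities.

```python
from typing import List, Optional, Tuple


def augment(
    y: List[int], d: List[float], n_aug: int
) -> Tuple[List[int], List[Optional[float]]]:
    """Pad observed groups with n_aug never-detected pseudo-groups.

    y: detection indicators for observed groups (all 1 by construction).
    d: perpendicular distances for observed groups.
    Augmented rows get y = 0 and a missing distance (None).
    """
    y_aug = list(y) + [0] * n_aug
    d_aug: List[Optional[float]] = list(d) + [None] * n_aug
    return y_aug, d_aug


# Three observed groups padded with five unobserved candidates.
y_aug, d_aug = augment([1, 1, 1], [12.0, 40.5, 77.0], n_aug=5)
```

The padded data set has a fixed length, so the unknown number of groups is expressed through the membership indicators rather than through a changing dimension.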
The process model in distance sampling rests on the assumption that groups are uniformly distributed in space (or appropriately modeled with covariates). Thus, the process model can be stated by specifying that the distance between groups and the transect, $d_i$, is uniformly distributed out to a distance $w$ and that membership in the surveyed population of groups in the augmented data set, $z_i$, is Bernoulli distributed with probability $\Omega$, where $z_i = 1$ if the group belongs to the population and 0 otherwise.
If we adopt a half-normal model for the detection function (uniform, hazard, or other functions are also possible), then the observation model can be specified by stating that the detection of groups is Bernoulli distributed, with probability $p_i$ given by the half-normal function of distance, where $\sigma$ is the shape parameter for the half-normal distribution and $y_i = 1$ if the group was detected and 0 otherwise. Under this model, $\sum z_i$ gives the number of groups in the surveyed area.
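The process and observation models described in words above can be written compactly. The block below is a reconstruction from the verbal definitions in the text. The $2\sigma^2$ scaling of the half-normal function and the product $z_i p_i$ in the Bernoulli probability are assumptions that follow common data-augmentation formulations of distance sampling.

```latex
\begin{aligned}
d_i &\sim \mathrm{Uniform}(0, w) \\
z_i &\sim \mathrm{Bernoulli}(\Omega) \\
p_i &= \exp\!\left(-\frac{d_i^2}{2\sigma^2}\right) \\
y_i &\sim \mathrm{Bernoulli}(z_i \, p_i)
\end{aligned}
```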
In a typical application, total abundance would be estimated by multiplying the estimated number of groups by the average observed group size (Buckland et al., 2001; Forsyth & Hickling, 1997).
Estimation of both group detection and group size can be refined by including group size as a covariate on detection (Alpizar-Jara & Pollock, 1996). The process model would then include a distribution for group sizes, with the Poisson distribution being a natural choice (Kéry & Royle, 2016), where $n_i$ are the observed group sizes and $\lambda$ is a parameter indicating mean group size. The observation model would then add a group-size covariate to the shape parameter $\sigma$. If observed group size equals true group size, then $\sum z_i n_i$ is an estimate of total abundance.
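A small sketch of how a group-size covariate might enter the shape parameter. It assumes the log-linear form $\log \sigma_i = \beta_0 + \beta_1 \log n_i$, which is the expression that appears alongside the Table 1 parameter definitions in this document, together with a half-normal detection function scaled by $2\sigma^2$ (an assumed parameterization). The function name and the numeric values are hypothetical.

```python
import math


def group_detection(d: float, n: int, beta0: float, beta1: float) -> float:
    """Half-normal detection probability for a group of size n at distance d.

    Assumes log(sigma) = beta0 + beta1 * log(n), so a group of size 1
    has sigma = exp(beta0). Requires n >= 1.
    """
    sigma = math.exp(beta0 + beta1 * math.log(n))
    return math.exp(-(d ** 2) / (2.0 * sigma ** 2))


# With beta1 > 0, larger groups remain detectable farther from the transect.
p_small = group_detection(d=50.0, n=1, beta0=4.0, beta1=0.3)
p_large = group_detection(d=50.0, n=8, beta0=4.0, beta1=0.3)
```

Under this form, $\beta_0$ alone determines detection for a group of size 1, which matches the way the parameter is described in the simulation study.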
We can relax the assumption that detection on the transect is perfect, $g(0) = 1$, by incorporating double-observer data into an MRDS model (Borchers et al., 1998; Conn, Laake, & Johnson, 2012; Laake & Borchers, 2004). This extension requires two surveyors to record which groups were observed by each surveyor, generating detection histories for each observed group. In this case, we revise the probability of detection to include the term $p_0$, indicating the probability of detection at $d = 0$. With a single observer, $p_0$ and $\Omega$ are confounded, but with observations $y_{i,j}$ repeated across $i$ groups and $j$ observers, both parameters can be estimated. While double-observer data allow more flexibility in the group observation model, abundance estimates still assume that group size is accurately recorded.
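One way to write the revised group detection model is shown below. It is a reconstruction from the verbal description, assuming that $p_0$ scales a half-normal function with the $2\sigma_i^2$ parameterization and that the two observers share the same detection probability.

```latex
\begin{aligned}
p_i &= p_0 \exp\!\left(-\frac{d_i^2}{2\sigma_i^2}\right) \\
y_{i,j} &\sim \mathrm{Bernoulli}(z_i \, p_i), \qquad j = 1, 2
\end{aligned}
```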
To account for uncertainty in group size, we integrate an N-mixture model with the MRDS setup described above. The N-mixture model assumes each individual within a group is detected independently (conditional on detection of the group), with probability ≤1. This assumption may be violated if surveyors use mental addition, extrapolation, or "guess-timation" to count large groups. Using the N-mixture model, group sizes, $n_i$, become latent variables, instead of observed data. The observed counts from each observer, $c_{i,j}$, are then part of the observation process, with counts the result of a binomial process with order $n_i$ and probability $r$. Furthermore, because counts are only recorded for detected groups, both the counts and the estimated group sizes must be ≥1. Accordingly, we model both the abundance process and the observation process with zero-truncated distributions, which we abbreviate ZT. For this model, the abundance process model comprises three distributions, one each for $d_i$, $z_i$, and $n_i$, where $d_i$ are distance observations, and the data augmentation variable, $z_i$, and group size, $n_i$, are latent variables.
If detection probability of individuals within groups (though not of groups) is constant across distances, the observation process includes two distributions describing the detection of groups and individuals, where $y_{i,j}$ is 1 if observer $j$ detects group $i$, and 0 otherwise, and $c_{i,j}$ gives the observed count if group $i$ is detected by observer $j$ and is unobserved otherwise. Alternatively, if we expect detection of individuals within a group to decline with distance, we could revise the count model so that individual detection follows a half-normal function of distance, where $r_0$ is the probability of detecting an individual on the transect and $\tau$ is the shape parameter for the half-normal detection model.
Thus the model may accommodate distance-based detection in both the observation of the group and of individuals within the group.
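Collecting the pieces, the full MRDS-Nmix model with distance-dependent individual detection might be summarized as below. This is a reconstruction under stated assumptions: the half-normal functions use a $2\sigma^2$ scaling, the group-size covariate enters as $\log \sigma_i = \beta_0 + \beta_1 \log n_i$, and counts are modeled only for observers who detected the group. With constant individual detection, $r_i$ reduces to a single parameter $r$.

```latex
\begin{aligned}
d_i &\sim \mathrm{Uniform}(0, w) \\
z_i &\sim \mathrm{Bernoulli}(\Omega) \\
n_i &\sim \mathrm{ZT\,Poisson}(\lambda) \\
\log \sigma_i &= \beta_0 + \beta_1 \log n_i \\
p_i &= p_0 \exp\!\left(-\frac{d_i^2}{2\sigma_i^2}\right) \\
y_{i,j} &\sim \mathrm{Bernoulli}(z_i \, p_i) \\
r_i &= r_0 \exp\!\left(-\frac{d_i^2}{2\tau^2}\right) \\
c_{i,j} \mid y_{i,j} = 1 &\sim \mathrm{ZT\,Binomial}(n_i, r_i) \\
N &= \sum_i z_i \, n_i
\end{aligned}
```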
To ensure that the above model is identifiable and to compare performance to an MRDS model that assumes group sizes are correctly recorded, we conducted a simulation study. We simulated data under the proposed MRDS-Nmix model using a variety of parameter settings. In each simulation, there were a total of 200 groups available for detection and survey strip-width was 100 m. We varied the group size parameter ($\lambda$), the effect of distance on detecting a group of size 1 ($\beta_0$), the effect of group size on detection ($\beta_1$), the effect of distance on detecting individuals ($\tau$), the probability of detecting a group at distance 0 ($p_0$), and the probability of detecting an individual at distance 0 ($r_0$), for a total of 12 simulation scenarios (Table 1). The particular scenarios considered were intended to yield moderate mean detection probabilities (Table 1) because high detection probabilities generate little bias while low detection probabilities require extensive surveys. Furthermore, parameter values were selected so that detection probabilities were high (>0.65) for groups of four animals and individuals within detected groups at 10 m, and low (<0.65) for groups of four animals and individuals within detected groups at 100 m so that the survey strip was not too limited or expansive. Note that our simulations did not include additional unmodeled detection heterogeneity, although this can be an issue in field surveys (Laake et al., 2008). As such, the simulated data exhibit "full independence" (Laake & Borchers, 2004).
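The data-generating process for one simulated survey could look roughly like the sketch below. The authors simulated in R; this Python version is only an illustration. The default parameter values are hypothetical (the Table 1 values are not reproduced here), and the half-normal scaling and the $\log n_i$ covariate form are assumptions. Only the 200 groups and the 100 m strip width come from the text.

```python
import math
import random


def rpois(lam, rng):
    """Knuth's Poisson sampler; adequate for small means."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def rztpois(lam, rng):
    """Zero-truncated Poisson draw by rejecting zeros."""
    while True:
        k = rpois(lam, rng)
        if k > 0:
            return k


def rztbinom(n, prob, rng):
    """Zero-truncated binomial draw by rejecting zeros (requires prob > 0)."""
    while True:
        k = sum(rng.random() < prob for _ in range(n))
        if k > 0:
            return k


def simulate(n_groups=200, w=100.0, lam=4.0, beta0=4.0, beta1=0.3,
             tau=120.0, p0=0.9, r0=0.9, seed=1):
    """Simulate two-observer detections and counts for every group."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        d = rng.uniform(0.0, w)                       # distance to transect
        n = rztpois(lam, rng)                         # true group size (>= 1)
        sigma = math.exp(beta0 + beta1 * math.log(n))
        p = p0 * math.exp(-d ** 2 / (2 * sigma ** 2))  # group detection
        r = r0 * math.exp(-d ** 2 / (2 * tau ** 2))    # individual detection
        y = [int(rng.random() < p) for _ in range(2)]  # one entry per observer
        c = [rztbinom(n, r, rng) if seen else None for seen in y]
        groups.append({"d": d, "n": n, "y": y, "c": c})
    return groups


groups = simulate()
observed = [g for g in groups if max(g["y"]) == 1]  # groups seen by anyone
```

Only the records in `observed` would be available to an analyst; true sizes `n` and undetected groups are retained here so that estimates can be compared with the truth.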
For each parameter set, and for each simulated data set, we estimated abundance using two models: one that included an observation model for individual detection and one that did not. We then repeated the simulation and estimation process 200 times for each scenario. For our proposed model and the reduced model, we used the simulations to calculate bias, coverage, and root mean square error for the estimates of total abundance. We simulated data in Program R (v 3.1.1, R Core Team, 2016). We evaluated model likelihoods in JAGS (v 4.2, Plummer, 2003; note that v 4.2 or later is required for truncation) using the jagsUI (v 1.3.7, Kellner, 2016) interface in Program R.
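The three performance measures can be computed from the replicate estimates as sketched below. This assumes bias is the mean error on the abundance scale and coverage is the share of 95% credible intervals that contain the true abundance; whether the authors report absolute or relative bias is not stated here, and the function name `summarize` is hypothetical.

```python
import math


def summarize(estimates, intervals, truth):
    """Return (bias, RMSE, coverage) across simulation replicates.

    estimates: point estimates of total abundance, one per replicate.
    intervals: (lower, upper) interval bounds, one pair per replicate.
    truth: the true total abundance used to generate the data.
    """
    errors = [est - truth for est in estimates]
    bias = sum(errors) / len(errors)
    rmse = math.sqrt(sum(err ** 2 for err in errors) / len(errors))
    covered = sum(lo <= truth <= hi for lo, hi in intervals)
    coverage = covered / len(intervals)
    return bias, rmse, coverage


# Two toy replicates with a true abundance of 100.
bias, rmse, coverage = summarize([90, 110], [(80, 95), (95, 120)], truth=100)
```

In the toy call, the errors of -10 and +10 cancel, so bias is 0 while RMSE is 10, and one of the two intervals contains the truth.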

RESULTS
In the 12 scenarios considered, abundance estimates under the MRDS-Nmix model had lower bias and root mean square error and coverage closer to the nominal rate of 95% than estimates under the standard MRDS model (Table 2). Unsurprisingly, the differences between the models were larger for lower individual detection probabilities. Because individual detection must be high for small groups (due to the zero-truncation), bias was low for both models for small groups.
When group detection probabilities were low, estimates of individual detection, and therefore abundance, were less precise under the MRDS-Nmix model because there were fewer groups with multiple counts. Overall, parameters were identifiable and bias was reduced by accounting for imperfect detection of individuals.

DISCUSSION
Many species of conservation interest reside in groups, including ungulates, cetaceans, galliformes, primates, and others. Surveys for these species often attempt to account for imperfect detection of groups using distance-sampling or double-observer methods, but they almost universally assume that detection of individuals within groups is perfect (Conroy et al., 2008; Forsyth & Hickling, 1997; Griffin et al., 2013; Laake et al., 2008). This assumption is primarily due to a lack of suitable models that can account for this second level of imperfect detection, rather than an expectation that it is correct (Conroy et al., 2008; Laake et al., 2008). The available evidence suggests that detection of individuals within known groups is commonly <1. For example, several studies have compared counts of individuals within groups during surveys against more labor-intensive counts, which are presumed to be more accurate, to estimate detection rates for individuals. In Great Blue Heron (Ardea herodias) colonies that averaged 24 nests/colony, aerial surveys detected 72% of nests detected during ground surveys (Dodd & Murphy, 1995). Similarly, a review of aerial photographs of pastured domestic cattle (Bos taurus) and horses (Equus caballus) in herds of <50 animals detected 83% of the animals detected during ground surveys (Terletzky & Ramsey, 2016). Under more experimental conditions, the proportion of duck decoys counted during aerial surveys ranged from 10% to 80% depending on the habitat type and density of decoys (Smith, Reinecke, Conroy, Brown, & Nassar, 1995). In another experiment, observers detected an average of 71% of White Ibis (Eudocimus albus) models in a simulated environment (Frederick, Hylton, Heath, & Ruane, 2003). Furthermore, some studies that relied on the assumption of perfect detection within groups acknowledged some measurement error occurred (Conroy et al., 2008; Walter & Hone, 2003). While detection of individuals may be higher under more favorable conditions, there is clearly a potential for bias in abundance estimates that rely on an assumption of accurate counts within detected groups.
TABLE 1 (note) $\log \sigma_i = \beta_0 + \beta_1 \log n_i$. Parameters include $\lambda$: mean size of groups, $\beta_0$: the effect of distance on detecting a group of size 1, $\beta_1$: the effect of group size on detection, $\tau$: the effect of distance on detecting individuals, $p_0$: the probability of detecting a group at distance 0, and $r_0$: the probability of detecting an individual at distance 0. Resulting mean probability of group detection ($p$) and mean probability of individual detection, given group detection ($r$) are also presented.
In our simulations, detection of individuals in detected groups depended on the distance to the transect, but averaged between 75% and 90% in the 12 scenarios ( We note that our unbiased estimates were facilitated using "clean," simulated data. In practice, field data may violate model assumptions. In particular, group sizes may be over-dispersed instead of Poisson-distributed, as assumed by a standard N-mixture model (Royle, 2004).
Previously, various strategies have been developed for N-mixture modeling of over-dispersed data, including adopting negative-binomial, beta-binomial, or log-normal distributions (Joseph, Elkin, Martin, & Possingham, 2009; Martin et al., 2011; Royle, 2004). We anticipate that these modeling approaches could also be adapted to the MRDS-Nmix model to handle over-dispersed group sizes. We also note that N-mixture models assume that group-size counts are independently obtained (Royle, 2004
TABLE 2 Bias, coverage, and root mean square error (RMSE) for total abundance estimates under a mark-recapture-distance sampling (MRDS) model and an MRDS-Nmix model. See Table 1 for description of scenarios.