Zero‐inflated count distributions for capture–mark–reencounter data

Abstract The estimation of demographic parameters is a key component of evolutionary demography and conservation biology. Capture–mark–recapture methods have served as a fundamental tool for estimating demographic parameters. The accurate estimation of demographic parameters in capture–mark–recapture studies depends on accurate modeling of the observation process. Classic capture–mark–recapture models typically model the observation process as a Bernoulli or categorical trial with detection probability conditional on a marked individual's availability for detection (e.g., alive, or alive and present in a study area). Alternatives to this approach are underused, but may have great utility in capture–recapture studies. In this paper, we explore a simple concept: in the same way that counts contain more information about abundance than simple detection/non‐detection data, the number of encounters of individuals during observation occasions contains more information about the observation process than detection/non‐detection data for individuals during the same occasion. Rather than using Bernoulli or categorical distributions to estimate detection probability, we demonstrate the application of zero‐inflated Poisson and gamma‐Poisson distributions. The use of count distributions allows for inference on availability for encounter, as well as a wide variety of parameterizations for heterogeneity in the observation process. We demonstrate that this approach can accurately recover demographic and observation parameters in the presence of individual heterogeneity in detection probability and discuss some potential future extensions of this method.



KEYWORDS
Bayesian, capture-mark-recapture, gamma-Poisson, individual heterogeneity, mark-resight, robust design, temporary emigration, zero-inflation

TAXONOMY CLASSIFICATION
Applied ecology; Demography; Life history ecology; Population ecology

Parameter estimates from capture-mark-recapture (CMR) models are often used as vital components of population models (Caswell, 2000; Schaub & Kéry, 2021) and to develop a more complete understanding of individual fitness (Cam et al., 2002; Stearns, 1992). CMR models typically consist of two primary components: (1) a model of latent biological processes (i.e., survival, movement among populations, emigration, disease dynamics), and (2) a model of the observation of uniquely identifiable individuals. Models of both latent biological and observation processes typically take the form of categorical or Bernoulli distributions, and individuals are grouped into discrete groups or states (e.g., alive or dead, observed or not observed).
Heterogeneity among "uniquely identifiable" (hereafter, marked) organisms in both biological processes (e.g., Cam et al., 2002; Pledger & Schwarz, 2002) and observation probability (e.g., Pledger, 2005; Pollock, 1982) has long been recognized as a central challenge in CMR modeling (Otis et al., 1978). In a seminal paper, Pollock (1982) proposed that heterogeneity in detection might be accounted for by subdividing primary occasions into multiple secondary occasions.
Similarly, Fletcher (1994) developed a method for modeling the probability of encounter of individuals as a function of the number of unique resights of that individual during the previous occasion.
Shortly thereafter, Kendall and others (Kendall et al., 1995, 1997) expanded the method developed by Pollock (1982) to estimate availability for encounter (i.e., zero-inflation) by partitioning primary occasions into shorter secondary occasions, assuming closure among secondary occasions within a primary occasion, and estimating probabilities of temporary emigration from the study area. Since that time, methods have been developed to estimate individual detection probabilities using random effects (Clark et al., 2005; Royle & Dorazio, 2008) or mixtures (Pledger, 2000; Pledger et al., 2003).
More recent efforts have simultaneously used information about marked organism location and the locations of sampling efforts to model spatial variation in reencounter probability (Royle et al., 2013; Royle & Young, 2008). However, the estimation of heterogeneity in the observation process remains a key challenge in CMR studies, and the continued development of alternative approaches is critical for improved parameter estimation.
Heterogeneity in the detection of marked organisms is often driven by two primary processes. The first is whether or not an individual is even present within the bounds of the study area (i.e., temporary emigration as a source of zero-inflation; Kendall et al., 1995; Schaub et al., 2004). The second is variation among the latent encounter probabilities of individuals that are present. This latent heterogeneity can be affected by factors such as variation in individual behavior, life stage, and location relative to sampling effort (Royle & Young, 2008). When primary occasions extend over multiple days, weeks, or months, this can lead to some individuals being encountered many times while others are rarely, if ever, detected.
The key concept in this paper is that, in the same way that counts contain more information about the abundance of a population than simple detection/non-detection data, the number of encounters of marked individuals may contain more information about the observation process than detection/non-detection data (e.g., McClintock et al., 2019). Thus, rather than summarizing capture-reencounter data using ones (encountered) and zeroes (not encountered) [...] (1) demonstrate the use of this approach with simulated data, (2) describe potential benefits relative to more traditional approaches, (3) demonstrate several approaches for modeling individual heterogeneity in encounter probability, and (4) discuss possible future extensions and uses of this parameterization.

METHODS
We simulated 250 CMR datasets, each with 10 primary occasions ($T = 10$). For each simulation, we released 25 marked individuals in each of the first through ninth primary occasions, for a total of 225 released individuals ($I = 225$). We simulated the latent state of each individual ($z_{i,t}$; 1: alive, 0: dead) from occasion to occasion as $z_{i,t} \sim \text{Bernoulli}(z_{i,t-1} \times \phi)$, given a survival probability generated from a beta distribution, $\phi \sim \text{beta}(40, 10)$. If an individual was alive in occasion $t$, we simulated its availability for encounter ($a_{i,t}$; 1: available, 0: unavailable) given simulated Markovian (Kendall et al., 1997) probabilities of availability for encounter ($\gamma$),

$$a_{i,t} \sim \text{Bernoulli}\left(z_{i,t} \times \gamma_{a_{i,t-1}+1}\right). \quad (1)$$

These probabilities are directly analogous to parameters described by Kendall et al. (2013), such that $\gamma_2$ in this study is equal to the probability of availability given availability in $t - 1$, or $a''$ as defined by Kendall et al. (2013), and $\gamma_1$ in this study is equal to the probability of availability given absence in $t - 1$, or $a'$ as defined by Kendall et al. (2013).

During each primary occasion, we sampled individuals that were available for detection for 21 consecutive days ($J = 21$, that is, 3 weeks) given simulated individual random variation in daily detection probability ($\delta_i$; Dorazio et al., 2013; Gomez et al., 2018). Thus, the simulated capture-recapture data form a three-dimensional array of daily encounters, $y_{i,t,j}$ (individual $i$, primary occasion $t$, day $j$), generated from the simulated mean daily detection probability of an average individual ($\mu_\delta$) and the amount of among-individual heterogeneity in detection.

We then summarized the daily CMR data for analysis with four different model types: (1) a Cormack-Jolly-Seber model where the secondary captures are ignored (CJS; Cormack, 1964; Jolly, 1965; Seber, 1965), (2) a robust design model (RD; Kendall et al., 1995, 1997), (3) a zero-inflated Poisson model (ZIP), and (4) a zero-inflated gamma-Poisson model (ZIGP). To summarize the robust design encounter data (R) for the robust design capture-reencounter model, we subdivided each 21-day primary occasion into three one-week secondary occasions ($K = 3$). If an individual was observed on any day of the week making up a secondary occasion, then that secondary occasion ($r_{i,t,k}$) equaled one. If an individual was not observed on any day during a specific secondary occasion, then $r_{i,t,k} = 0$. Finally, we summarized the counts of reencounters by individual and primary occasion by summing the total number of encounters of each individual during each primary occasion, $c_{i,t} = \sum_{j=1}^{21} y_{i,t,j}$. In the same way that the data were generated, all four capture-recapture models share a common likelihood for the survival process.
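The simulation and data summaries described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' R script (Appendix S1): the function names, the values of `gamma`, the logit-normal form of individual heterogeneity, the name `sigma_delta`, and the shape/rate reading of the gamma distribution are all hypothetical choices made here, and the CJS summary `m` assumes that an individual counts as detected if it was encountered on at least one day of the primary occasion.

```python
import math
import random


def simulate_individual(rng, release, phi, gamma, mu_delta, sigma_delta,
                        T=10, J=21):
    """Simulate latent states and daily encounters for one marked individual."""
    # Individual daily detection probability; the logit-normal form is an
    # assumption of this sketch, not taken from the source.
    logit_mu = math.log(mu_delta / (1.0 - mu_delta))
    delta_i = 1.0 / (1.0 + math.exp(-rng.gauss(logit_mu, sigma_delta)))

    z = [0] * T                       # 1: alive, 0: dead
    a = [0] * T                       # 1: available, 0: unavailable
    y = [[0] * J for _ in range(T)]   # daily encounters
    z[release] = a[release] = 1       # alive and available at release (assumed)

    for t in range(release + 1, T):
        z[t] = int(rng.random() < z[t - 1] * phi)
        # gamma[0]: available given unavailable at t-1; gamma[1]: given available.
        a[t] = int(rng.random() < z[t] * gamma[a[t - 1]])
        y[t] = [int(rng.random() < a[t] * delta_i) for _ in range(J)]
    return z, a, y


def summarize(y, K=3):
    """Collapse daily encounters into CJS (m), robust design (r), and count (c) data."""
    m, r, c = [], [], []
    for days in y:
        width = len(days) // K        # days per secondary occasion
        r.append([int(any(days[k * width:(k + 1) * width])) for k in range(K)])
        c.append(sum(days))
        m.append(int(c[-1] > 0))      # detected at least once (assumed CJS summary)
    return m, r, c


rng = random.Random(2022)
phi = rng.betavariate(40, 10)            # survival, as in the text
mu_delta = rng.betavariate(10, 90)       # mean daily detection probability
sigma_delta = rng.gammavariate(5, 1 / 50)  # shape 5, rate 50 (assumed parameterization)
gamma = (0.3, 0.8)                       # hypothetical availability probabilities

# 25 releases in each of occasions 1-9 (indices 0-8): 225 individuals.
population = [simulate_individual(rng, release, phi, gamma, mu_delta, sigma_delta)
              for release in range(9) for _ in range(25)]
summaries = [summarize(y) for _, _, y in population]
```

Each element of `summaries` holds the three views of the same daily data: `m` for the CJS model, `r` for the robust design model, and `c` for the count-based models. The release occasion itself carries no encounters in this sketch because the models condition on first release.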
The latent state of each individual during each occasion ($z_{i,t}$) was modeled as a function of the individual's latent state in the previous occasion ($z_{i,t-1}$) and a survival probability ($\phi$), $z_{i,t} \sim \text{Bernoulli}(z_{i,t-1} \times \phi)$. A vague prior was used for survival, $\phi \sim \text{beta}(1, 1)$. For the CJS model, we then simply modeled the primary occasion encounter data (M) as a function of the individual's latent state and a detection probability ($p$), $m_{i,t} \sim \text{Bernoulli}(z_{i,t} \times p)$. We specified a vague prior for detection probability, $p \sim \text{beta}(1, 1)$. For the remaining three models, we also estimated whether an individual was available for detection ($a_{i,t}$) given its previous state ($a_{i,t-1}$) and vague priors for Markovian probabilities of availability for encounter ($\gamma$; Kendall et al., 1997).
For the robust design model, we modeled whether or not each individual was detected during each secondary occasion as a function of its latent availability for detection during the primary occasion ($a_{i,t}$) and a secondary occasion detection probability ($p$). We then derived primary occasion detection probability ($p^*$) from the secondary occasion detection probability. [...] (Greene, 2008); however, here we assume heterogeneity among individuals, not observations.

Fitting these models in a Bayesian framework allows users to easily customize previously described count distributions for use in these model types. We called JAGS (Plummer, 2003) from R (R Core Team, 2018) using the jagsUI package (Kellner, 2016). For each simulated dataset, we sampled three MCMC chains of 50,000 iterations with an adaptive phase of 1000 iterations. We discarded the first 10,000 iterations and retained every tenth iteration thereafter. We assessed convergence visually, and chains converged acceptably.
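To make the observation models concrete, the following Python sketch writes out the probability of observing $c$ encounters of a living individual under zero-inflated Poisson and zero-inflated gamma-Poisson distributions, along with the usual robust design relation between secondary and primary occasion detection, $p^* = 1 - (1 - p)^K$. It is a sketch under assumptions rather than the JAGS code used in the study: the names `lam` (expected number of encounters) and `r` (gamma shape) are hypothetical, availability is marginalized rather than sampled as a latent state, and the gamma mixing is applied to each count, whereas the text assumes heterogeneity among individuals.

```python
import math


def primary_detection(p, K=3):
    """Probability of at least one detection across K secondary occasions."""
    return 1.0 - (1.0 - p) ** K


def poisson_pmf(c, lam):
    """Poisson probability of c encounters given expected count lam (> 0)."""
    return math.exp(c * math.log(lam) - lam - math.lgamma(c + 1))


def gamma_poisson_pmf(c, lam, r):
    """Gamma-Poisson (negative binomial) probability with mean lam.

    r is a hypothetical shape parameter: variance = lam + lam**2 / r,
    so smaller r means more overdispersion.
    """
    log_p = (math.lgamma(c + r) - math.lgamma(r) - math.lgamma(c + 1)
             + r * math.log(r / (r + lam)) + c * math.log(lam / (r + lam)))
    return math.exp(log_p)


def zero_inflated(pmf, c, availability, *params):
    """Marginal probability of c encounters for a living individual.

    A zero can arise because the individual was unavailable (structural
    zero) or because it was available but never encountered.
    """
    structural_zero = (1.0 - availability) if c == 0 else 0.0
    return structural_zero + availability * pmf(c, *params)


# Illustrative values only: 80% availability, 2.1 expected encounters.
p_zero_zip = zero_inflated(poisson_pmf, 0, 0.8, 2.1)
p_zero_zigp = zero_inflated(gamma_poisson_pmf, 0, 0.8, 2.1, 1.5)
```

The excess of zero counts beyond what the count distribution alone predicts is what lets availability be separated from the encounter rate, which is the role that closed secondary occasions play in the robust design.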
In the simulations, the mean daily detection probability was drawn as $\mu_\delta \sim \text{beta}(10, 90)$, and the among-individual heterogeneity in detection was drawn from a gamma(5, 50) distribution.

We calculated mean signed difference (MSD) as the mean of the differences between the median of the posterior distribution and the true parameter value used to simulate the data, and we calculated coverage as the proportion of simulations in which the 95% symmetric credible intervals included the true parameter value used to simulate the data.
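The two performance measures defined above reduce to short computations over the simulations. The Python sketch below shows one way to compute them; the function names and the example numbers are made up for illustration and are not results from the study.

```python
from statistics import mean


def mean_signed_difference(posterior_medians, truths):
    """Mean of (posterior median - true value) across simulations."""
    return mean(est - truth for est, truth in zip(posterior_medians, truths))


def coverage(lower, upper, truths):
    """Proportion of simulations whose 95% interval contains the truth."""
    inside = [lo <= truth <= hi for lo, hi, truth in zip(lower, upper, truths)]
    return sum(inside) / len(inside)


# Made-up values for three simulations (not results from the study).
truths = [0.80, 0.80, 0.80]
medians = [0.78, 0.82, 0.75]
lower = [0.70, 0.81, 0.70]   # 2.5% quantiles of the posteriors
upper = [0.85, 0.90, 0.79]   # 97.5% quantiles of the posteriors

msd = mean_signed_difference(medians, truths)
cov = coverage(lower, upper, truths)
```

A negative MSD indicates estimates that are low relative to truth on average, and coverage near 0.95 indicates well-calibrated 95% credible intervals.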

RESULTS
Estimates of survival ($\phi$) were low relative to truth for CJS models. [...] (Figure 4). The overdispersion parameter in the ZIGP model accounted for some of this overdispersion (Figure 4), improving coverage and constancy for ZIGP models relative to other model types. ZIP and ZIGP models were computationally less expensive than RD models when sampling the same number of iterations (Figure 4).

DISCUSSION
We demonstrate that CMR models parameterized with zero-inflated count distributions can function much like robust design CMR models. Estimates of survival probability from RD, ZIP, and ZIGP models were centered around truth, while estimates of survival from the CJS model were consistently low relative to truth. Further, the use of these model types may allow for improved estimation of heterogeneity in encounter probability among individuals and improve computational efficiency (Figure 4). We see substantial utility for these parameterizations in a variety of scenarios. For instance, non-breeding resights of individuals at wintering or stopover sites may provide an excellent system to model the total number of encounters rather than simple detection/non-detection data. Further, existing and emerging data types such as camera traps, PIT tags, and automated telemetry may provide large numbers of detections in discrete time blocks, providing excellent data for the models we describe in this paper.
As we demonstrate, this approach may be particularly useful when unobservable states exist, as counts of reencounters allow for the estimation of a zero-inflation parameter (i.e., availability for detection), which may be biologically analogous to breeding probability or presence at a stopover or wintering site. Count parameterizations might also be used to model secondary occasions within a robust design model; one or more secondary occasions may be estimated from some count distribution and others from a more typical Bernoulli distribution. The inherent flexibility of programs such as JAGS (Plummer, 2003), NIMBLE (de Valpine et al., 2017), and Stan (Carpenter et al., 2017), and ample literature on capture-reencounter parameterizations should lead to a wide array of extensions of these model types, and their incorporation into joint likelihood models, such as integrated population models (Schaub & Kéry, 2021).
Critically, the use of these model types also has advantages for estimating heterogeneity in detection probability among individuals that are observable, as some individuals may be seen more often than others. Estimating heterogeneity in probabilities from a small number of Bernoulli trials can be challenging (Fay et al., 2022). Summarizing mark-reencounter data as counts of encounters may provide additional information for estimating latent heterogeneity among individuals or estimating mixtures (e.g., Pledger et al., 2003). For example, rather than the heterogeneity parameterization explored in [...] (Table 1) and that simulation work may reveal more effective parameterizations than those described herein.
For instance, recent research has demonstrated that a count-based observation likelihood can be useful for helping to address "false positives" in reencounter data (Rakhimberdiev et al., 2022). Thus, we suggest that continued extension of these methods may have broad utility moving forward for capture-reencounter modeling.
As with the use of any model, violations of model assumptions will lead to inaccurate parameter estimates. We caution against the use of these models when encounters are conditional on previous encounters within a season (i.e., trap happiness). As a particularly problematic example, if the nest of a marked animal is discovered and the animal is then observed repeatedly while visiting the nest, this would serve as an additional type of zero-inflation (i.e., nesting in the study area is a Bernoulli trial, the discovery of the nest is a Bernoulli trial, and the subsequent visits are a product of study design and nest monitoring protocols, not a random encounter process). We expect that other types of heterogeneity are common in CMR data. For example, the number of encounters might be right truncated if observers cease recording reencounters of individuals that have already been encountered multiple times. Thus, we strongly encourage careful thought about how previous monitoring protocols might affect the distribution of encounters of each individual when applying these models to data and discourage using this approach without explicit information about monitoring protocols. [...] (Table 1), underdispersion requires the use of more complex distributions such as the Conway-Maxwell-Poisson (Conway & Maxwell, 1962; Lynch et al., 2014).
We suggest that additional simulation work is required to fully understand the benefits and costs associated with using alternative distributions. Notably, while the authors have not yet developed goodness-of-fit tests for these model types, the use of these parameterizations might simplify goodness-of-fit testing for capture-reencounter models due to the use of counts rather than Bernoulli trials.

FIGURE 2 Scatter and density plots of the medians of posterior distributions for availability for encounter relative to truth ($\gamma$) from robust design (RD; left), zero-inflated Poisson (ZIP; center), and zero-inflated gamma-Poisson with individual heterogeneity (ZIGP; right) capture-mark-reencounter models used to analyze 250 simulated capture-mark-reencounter datasets. Simulated mean daily detection probability ($\mu_\delta$) [...]

FIGURE 4 Violin plots of model run times across 250 simulations for Cormack-Jolly-Seber (CJS; Cormack, 1964; Jolly, 1965; Seber, 1965), robust design (RD; Kendall et al., 1995, 1997), zero-inflated Poisson (ZIP; this study), and zero-inflated gamma-Poisson (ZIGP; this study) capture-mark-recapture models (left), scatter plots of the index of dispersion (D; Var(C)/Mean(C)) for the capture-mark-reencounter count data relative to the simulated heterogeneity in detection probability among individuals, and scatter plots of the mean of posterior distributions of the overdispersion parameter regressed against the index of dispersion for each capture-mark-recapture dataset.

TABLE 2 Mean difference between the medians of the posterior distributions and truth and parameter coverage (in parentheses) for estimates of apparent survival ($\phi$), availability for encounter given $a_{i,t-1} = 0$ ($\gamma_1$), availability for encounter given $a_{i,t-1} = 1$ ($\gamma_2$), primary occasion detection probability ($p$ [CJS] or $p^*$ [RD]), and the expected number of encounters per individual from 250 simulated capture-mark-recapture datasets analyzed using Cormack-Jolly-Seber (CJS; Cormack, 1964; Jolly, 1965; Seber, 1965), robust design (RD; Kendall et al., 1997), zero-inflated Poisson (ZIP; this study), and zero-inflated gamma-Poisson (ZIGP; this study) capture-recapture models.
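The index of dispersion reported in Figure 4, D = Var(C)/Mean(C), is a quick check of whether encounter counts are more variable than a Poisson distribution would imply. A minimal Python sketch follows; the function name is hypothetical, and the use of the sample variance and the choice of which counts to include (for example, only occasions when individuals were available) are assumptions, since the caption does not specify them.

```python
from statistics import mean, variance


def index_of_dispersion(counts):
    """Var(C) / Mean(C) using the sample variance.

    Values near 1 are consistent with a Poisson distribution; values
    above 1 indicate overdispersion and values below 1 underdispersion.
    """
    return variance(counts) / mean(counts)


# Made-up encounter counts for five individual-occasions.
example = index_of_dispersion([0, 0, 1, 2, 7])
```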

While we have demonstrated in this paper that count-based observation parameterizations can be useful for capture-mark-reencounter studies, much remains to be learned. For example, careful thought will be required for developing appropriate priors (e.g., Northrup & Gerber, 2018), and empirical research may reveal unforeseen problems. Future simulation work might assess the impacts of priors on inference, further examine the impacts of over- and under-dispersion, and explore various other capture-recapture parameterizations and count distributions.

ACKNOWLEDGMENTS
We thank David N. Koons and Madeleine G. Lohman for helpful discussion of model parameterizations.

CONFLICT OF INTEREST
The authors have no conflict of interest to declare.

DATA AVAILABILITY STATEMENT
All of the data used in this manuscript were simulated. The R script for simulating these data is attached as Appendix S1.