Age-dependent capture-recapture models and unequal time intervals

Age–dependent capture–recapture models and unequal time intervals. Estimates of survival probabilities in natural populations can be obtained through capture–mark–recapture (CMR) models. However, when capture sessions are unevenly spaced, age–dependent models can lead to erroneous estimates of survival when individuals change age class during the time interval between two capture occasions. We propose a solution to correct for the mismatch between time intervals and age class duration in two age class models. The solution can be implemented in different ways. The first consists of adding dummy occasions to the encounter histories and fixing the corresponding recapture probabilities at zero. The second makes use of the log–link function available in some CMR software (e.g. program MARK). We used simulated and real data to show that the proposed solution delivers unbiased estimates of age–dependent survival probabilities.


Introduction
Capture-mark-recapture (CMR) methods are widely used for the diagnosis of natural populations because they provide robust estimates of demographic parameters while accounting for imperfect detection of individuals (Lebreton et al., 1992; Williams et al., 2002; Sanz-Aguilar et al., 2016). Cormack-Jolly-Seber models for estimating survival probability in natural populations rest on the important assumption that animals share the same parameters regardless of their past or present history (Pradel et al., 2005). When animals are marked as young this assumption does not hold because newly marked individuals typically have a lower survival probability than already marked individuals (adults). This difference can be accommodated by including age-dependent parameters in the CMR model (Pollock, 1981; Lebreton et al., 1992). In a simple two-age-class model, one parameter, noted ϕ', would apply to the survival probability of young individuals and a second, noted ϕ, would apply to the survival of adults (see examples in Hiraldo et al., 1996; Prugnolle et al., 2003). Age-dependent parameterizations have also been considered when only adults are marked to correct for an excess of animals seen only at marking, i.e. transients (Pradel et al., 1997), for example, when tags are potentially harmful (Saraux et al., 2011) or to model an effect of breeding experience (Sanz-Aguilar et al., 2008). Age-dependent survival probabilities are parameters of interest in many ecological studies (e.g. Clobert et al., 1988; Loison et al., 1999; Tavecchia et al., 2001; Bonenfant et al., 2002; Perret et al., 2003; Catchpole et al., 2004; Sanz-Aguilar et al., 2015). However, while age classes are equally spaced, intervals between capture-recapture occasions may not be equally spaced on the same scale, leading to erroneous estimates (see the problem in, for example, Covas et al., 2002; Zabala et al., 2011; Zuberogoitia et al., 2016).
This is because individuals would change their age class within the interval between two sampling occasions rather than at the end as assumed by CMR models. We briefly introduce the problem and illustrate how it can be solved by taking advantage of the flexibility of CMR models.

The problem
Logistic, financial or weather-dependent constraints can interrupt monitoring or modify the temporal frequency of sampling occasions, leading to intervals of different length between capture-mark-recapture sessions. Unequal time intervals alone do not present a major problem in CMR models (Bears et al., 2009; Cooch, 2009; Schmidt et al., 2007). Consider a study with k sampling occasions and intervals between the occasions indexed j = 1, 2, …, k-1. The length of interval j, between occasions j and j + 1, is l_j. This length is taken as the exponent of the survival parameter expressed in some common time unit, so that survival over interval j is ϕ^{l_j}. For example, the survival parameter over a unit interval (l = 1) would be ϕ^1, for a two-unit interval (l = 2) it would be ϕ^2, and so on. Values in the vector l are commonly integers, e.g. years or months, but can also be decimal numbers; for example, the survival parameter over an eighteen-month interval can be written in terms of yearly survival as ϕ^{1.5} (l = 1.5). The freely available software for CMR analyses, such as MARK (White and Burnham, 1999), RMark (Laake, 2013) or E-SURGE (Choquet et al., 2009), allows users to specify the vector of l_j values. However, unequal time intervals pose a problem in age-dependent models because, unlike the intervals between occasions, the age classes always retain the same length. As a consequence, an individual may 'move' through age classes during an interval of length l_j and the survival parameter can no longer be written as ϕ^{l_j}, because the instantaneous survival probability changes with the age classes spanned by the interval of length l_j. The mismatch between the duration of an age class and the time interval would, for instance, lead to an overestimation of the first-year survival probability if the sampling interval were greater than one year. The fundamental problem is that a given age-dependent parameter applies to only a part of the time interval.
This can be solved by specifying the length of the interval for each parameter considered. We outline the solution and provide a step-by-step illustration of how it can be implemented in freely available software for CMR analysis (e.g. MARK, RMark or E-SURGE; see details in supplementary material S1, S2 and S3). Note that the two implementations below are simply two practical approaches to solving the problem.
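As a worked illustration of the mismatch, the minimal Python sketch below compares the survival of a young animal over a three-year interval with the yearly value that would be implied, for that interval alone, if the whole interval were written as ϕ' raised to its length. The values ϕ' = 0.4 and ϕ = 0.8 are borrowed from the simulated scenario of this note; the function names and the one-year duration of the young age class are illustrative assumptions.

```python
def interval_survival(phi_young, phi_adult, length, young_duration=1.0):
    """Survival over an interval for an animal released as young.

    The animal is young for at most `young_duration` time units and
    is treated as an adult for the rest of the interval.
    """
    r = min(length, young_duration)  # time spent in the young age class
    s = length - r                   # remaining time spent as an adult
    return phi_young ** r * phi_adult ** s


def naive_yearly_young_survival(phi_young, phi_adult, length):
    """Yearly value implied when the interval is written as phi'^length."""
    return interval_survival(phi_young, phi_adult, length) ** (1.0 / length)


true_interval = interval_survival(0.4, 0.8, 3)              # 0.4 * 0.8**2
naive_phi_young = naive_yearly_young_survival(0.4, 0.8, 3)  # cube root of the above
print(round(true_interval, 3), round(naive_phi_young, 3))
```

Under these assumptions the implied yearly value is well above the first-year survival of 0.4, which is the direction of bias described above. In a real analysis the estimate would also depend on the other cohorts sharing the same parameter, so the sketch only indicates the sign of the bias.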

Methods
Implementation 1: adding dummy encounter occasions
When intervals are of unequal length, a possible solution is to add dummy encounter occasions to the encounter histories to 'fill' the temporal gaps between occasions, which means, in practice, adding columns of 0s (e.g. Grosbois and Tavecchia, 2003; Sanz-Aguilar et al., 2010); the recapture probabilities corresponding to these dummy occasions should be fixed at 0. For example, let us consider a 7-year study with five capture-mark-recapture occasions in years 1, 2, 5, 6 and 7. The interval between the second and third occasions lasts three years instead of one. The l-vector of interval lengths would be 1, 3, 1, 1. The encounter history of animals released at the beginning of the study and always seen would be '11111'. When columns of '0' are added to fill the temporal gaps, this encounter history becomes '1100111' and all six elements of the l-vector are now equal to 1. The encounter probabilities at the dummy occasions (3 and 4) should be fixed at 0 (figs. S2.1 and S2.3, supplementary material S2). The survival parameter of the first age class, ϕ', now always refers to an initial one-year interval. This approach can be implemented in programs MARK and E-SURGE. Adding dummy occasions makes it possible to work with the correct survival parameters, but the dummy occasions bring no additional information and cause identifiability issues in fully time-dependent models (tables 1, 2). If the parameters that appear in the gap are unrelated to known parameters from other intervals they will not be separately identifiable; e.g. if all survival parameters are time- and age-dependent, the first age-class survival probabilities ϕ' at the beginning of a gap and the second age-class survival probabilities ϕ that follow them are not separately identifiable (tables 1, 2).
However, the approach works well as long as one of the two types of survival parameters is kept constant or modelled as a function of environmental covariates (see simulated data results below, tables 1, 2).

Table 1. Parameters that are not separately identifiable in the presence of missing occasions: ϕ', juvenile survival; ϕ, adult survival; p, recapture probability; 't', time effect; 'cov', covariate effect; '.', constant parameter; Np, number of separately identifiable parameters. Note that no individuals were marked during missing occasions and consequently juvenile survival parameters do not exist in the model for cohorts without released juveniles (i.e. ϕ'_3, ϕ'_4 for dataset 1 and ϕ'_3 for dataset 2). Similarly, adults were not marked on the first occasion and, consequently, ϕ_1 does not exist in the model.

Implementation 2: using a log-link function
An alternative to the above implementation, especially useful when the l_j are not commensurate (or when too many dummy occasions would be required), relies on a logarithmic transformation (this implementation is not available in program E-SURGE). The survival probability of a young individual over its initial interval can be decomposed into survival as a young for a duration r, with a survival probability per time unit of ϕ', followed by survival as an adult for a duration s, with a survival probability per time unit of ϕ. The survival probability over the whole interval of length r + s is then ϕ'^r ϕ^s (for year-based sampling, r = 1). Applying a log-link function, the product ϕ'^r ϕ^s is replaced with a linear combination of survival-related quantities to be estimated, log(ϕ'^r ϕ^s) = r log(ϕ') + s log(ϕ). The known quantities, r and s, can be used as covariates of the survival probability pertaining to the interval (fig. S2.2, supplementary material S2). Unlike implementation 1, this approach does not require changing the encounter histories. A similar solution was used by Tavecchia et al. (2001, 2002) to estimate monthly survival of game species when marking occurred at different moments during the hunting season.
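The log-link decomposition can be checked numerically with a minimal Python sketch. It is not the MARK design matrix itself: the function name and the coefficient names `beta_young` and `beta_adult` are hypothetical, and the survival values 0.4 and 0.8 are those of the simulated scenario of this note.

```python
import math


def log_link_survival(r, s, beta_young, beta_adult):
    """Interval survival under a log link.

    r and s are the known durations spent as young and as adult; they act
    as covariates. beta_young and beta_adult stand for log(phi') and log(phi).
    """
    return math.exp(r * beta_young + s * beta_adult)


phi_young, phi_adult = 0.4, 0.8
beta_young, beta_adult = math.log(phi_young), math.log(phi_adult)

# Three-year interval after release as young: r = 1 year young, s = 2 years adult.
via_log_link = log_link_survival(1, 2, beta_young, beta_adult)
direct = phi_young ** 1 * phi_adult ** 2
print(via_log_link, direct)
```

Both expressions give the same interval survival up to floating-point error, which is why r and s can be supplied as known covariates while the two log-scale coefficients are estimated.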

Simulated cases
To demonstrate the problem generated by unequal time intervals in combination with age-dependent models, we considered a simple scenario with a model assuming two age classes and constant survival and recapture parameters. Note that this simple scenario is for illustrative purposes only. We simulated 100 datasets with five sampling occasions during a 7-year period (k = 7). A hundred new juvenile animals were released at each occasion. The time elapsed between occasions was l = 1, 3, 1, 1. We assumed a constant yearly survival of newly marked juvenile individuals (ϕ' = 0.4) and a constant yearly survival of adult individuals (ϕ = 0.8). We first analysed these datasets using unequal time intervals (the incorrect approach) to illustrate the biases. We subsequently analysed them by adding dummy columns to 'fill' the years without monitoring (implementation 1) and by using the log-link implementation with r (= 1) and s (= 2) values for newly marked individuals in the second cohort to constrain the corresponding survival parameters appropriately (implementation 2). For each analysis, maximum likelihood estimates of ϕ' and ϕ were obtained using RMark (Laake, 2013). The code to simulate the data and run the analysis is provided in supplementary material S1. Supplementary materials S2 and S3 illustrate how to implement the solution in MARK (White and Burnham, 1999) and E-SURGE (Choquet et al., 2009), respectively.
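The dummy-column step of implementation 1 for this design (sessions in years 1, 2, 5, 6 and 7) can be sketched in Python as follows. This is only an illustration and not the RMark code of supplementary material S1; the helper names `pad_history` and `dummy_occasions` are hypothetical.

```python
def pad_history(history, sampled_years, first_year, last_year):
    """Insert a '0' for every year without a capture session."""
    if len(history) != len(sampled_years):
        raise ValueError("history needs one character per sampled year")
    seen = dict(zip(sampled_years, history))
    return "".join(seen.get(year, "0")
                   for year in range(first_year, last_year + 1))


def dummy_occasions(sampled_years, first_year, last_year):
    """1-based occasions whose recapture probability must be fixed at 0."""
    sampled = set(sampled_years)
    return [year - first_year + 1
            for year in range(first_year, last_year + 1)
            if year not in sampled]


years = [1, 2, 5, 6, 7]
padded = pad_history("11111", years, 1, 7)
print(padded, dummy_occasions(years, 1, 7))  # 1100111 [3, 4]
```

After padding, every interval has length 1 and the returned list identifies the occasions for which the recapture probability has to be fixed at zero in the CMR software.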

Parameter identifiability in time-dependent models
In the example above we assumed constant survival and recapture probabilities to illustrate the problem and its solution. However, in many cases, parameters are time dependent. To show the applicability of the solution to more complex parameter structures and to study parameter identifiability, we simulated two different datasets with seven occasions (k = 7, supplementary material S4). Both datasets considered temporal variation in juvenile and adult survival parameters as a function of a hypothetical temporal covariate D (adult survival was modelled as 1/(1 + exp(-(1.386 + 0.55 * D))) and juvenile survival as one half of adult survival at each occasion) and a constant recapture probability of 0.7. The datasets differed in the length of the period with no CMR information: dataset 1 considered a gap of two years (data from sessions 3 and 4 are missing), while dataset 2 considered a gap of one year only (data from session 3 are missing), so that l = 1, 3, 1, 1 and l = 1, 2, 1, 1, 1, respectively. To avoid identifiability problems associated with sample size (which are not within the scope of this note), a thousand new juvenile animals were released on each occasion. To explore parameter identifiability, we fitted 18 models to each simulated dataset, considering different combinations of temporal, covariate and constant effects on juvenile and adult survival probabilities, and temporal vs. constant effects on recapture probability. Datasets were analysed using program E-SURGE (Choquet et al., 2009), which provides detailed results on parameter identifiability using the explicit method proposed by Catchpole and Morgan (1997) to detect parameter redundancy (supplementary material S4).

Table fragment (model; Np; parameters not separately identifiable):
ϕ'_t ϕ_t p_t, dataset 1: Np = 10; (ϕ'_2 * ϕ_3 * ϕ_4); (ϕ_2 * ϕ_3 * ϕ_4); (ϕ'_6 * p_6); (ϕ_6 * p_6)
ϕ'_t ϕ_t p_t, dataset 2: Np = 13; (ϕ'_2 * ϕ_3); (ϕ_2 * ϕ_3); (ϕ'_6 * p_6); (ϕ_6 * p_6)
ϕ'_t ϕ_t p.
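The covariate-driven survival used to generate these datasets can be evaluated with the short Python sketch below. The logistic formula and the one-half ratio are taken from the text; the covariate values passed to the functions are hypothetical, since the actual values of D are not given here.

```python
import math


def adult_survival(d):
    """Adult survival as a logistic function of the covariate D."""
    return 1.0 / (1.0 + math.exp(-(1.386 + 0.55 * d)))


def juvenile_survival(d):
    """Juvenile survival, set to one half of adult survival."""
    return 0.5 * adult_survival(d)


# Hypothetical covariate values, for illustration only.
for d in (-1.0, 0.0, 1.0):
    print(d, round(adult_survival(d), 3), round(juvenile_survival(d), 3))
```

At D = 0 the intercept of 1.386 corresponds to an adult survival of about 0.8 and a juvenile survival of about 0.4, the values used in the constant-parameter scenario.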
Application to real case
We considered a dataset of capture-mark-recapture data of adult Mediterranean storm petrels (Hydrobates pelagicus melitensis) from Palomas Island (Eastern Spain). Birds were captured using mist-nets from 1996- Here we report results obtained by using the unequal time interval option and the proposed solution for comparative purposes. We only present implementation 1 as the results of both implementations are equivalent (see results). The goodness-of-fit test of a model assuming all parameters to be time dependent (Pradel et al., 2005) indicated a surplus of animals seen only at marking, i.e. transients (χ^2_6 = 29.96, p < 0.05). As a consequence, survival during the first year after marking, ϕ', was considered separately from the subsequent survival, noted ϕ, in two-age-class models (Sanz-Aguilar et al., 2010).

Survival estimates in simple two-age classes constant models
Models in which we specified the unequal time interval in the l-vector delivered average estimates of first-year survival probability larger than the true value (supplementary material S1).

Parameter identifiability in time-dependent models
Our results indicate that models with two consecutive missing occasions present more parameter identifiability problems than models with a single missing occasion (table 1). When juvenile and adult survival parameters are fully time dependent, survival parameters during the occasions without monitoring are not separately identifiable (tables 1, 2). Moreover, when the recapture probability is also time dependent, the last survival and recapture probabilities are not separately identifiable (tables 1, 2). All parameters become identifiable when juvenile and/or adult survival is constant or modelled as a function of temporal covariates, with the exception of models in which there is more than one consecutive occasion without monitoring and adult survival is time dependent (tables 1, 2). In this case, only the adult survival parameter corresponding to the year in which the gap begins is separately identifiable (tables 1, 2).
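Why only products of survival parameters are identifiable across a gap can be seen from a single term of the likelihood. The Python sketch below uses hypothetical parameter values and a hypothetical function name; it computes the probability that an animal released as young at occasion 2 is next seen at occasion 5 when occasions 3 and 4 are dummy occasions with recapture probability fixed at 0.

```python
import math


def prob_next_seen_after_gap(phi_young_2, phi_3, phi_4, p_5):
    """Pr(released as young at occasion 2, next seen at occasion 5).

    Occasions 3 and 4 are dummy occasions, so the animal cannot be
    detected there: p_3 = p_4 = 0.
    """
    p_3 = p_4 = 0.0
    return phi_young_2 * (1 - p_3) * phi_3 * (1 - p_4) * phi_4 * p_5


# Two different (hypothetical) parameter sets with the same product 0.256.
set_a = prob_next_seen_after_gap(0.40, 0.80, 0.80, 0.7)
set_b = prob_next_seen_after_gap(0.32, 1.00, 0.80, 0.7)
print(set_a, set_b, math.isclose(set_a, set_b))
```

Both parameter sets give the same probability because only the product of the three survival terms enters the expression. This is a sketch of one likelihood term rather than of the full model, but it illustrates why fully time-dependent survival parameters within a gap cannot be separated unless they are constrained.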

Application to real case
As in the simulated example, when using vector l as the exponent of survival parameters, we obtained higher estimates for first-year survival probabilities. When gap years were not properly considered, models with the unequal time interval option delivered survival estimates of ϕ' = 0.73 and ϕ = 0.80 (transient pro- [...] were not properly considered (table 3).

Discussion
Obtaining unbiased estimates of demographic parameters (such as age-dependent survival) is essential for population diagnosis (Williams et al., 2002; Sanz-Aguilar et al., 2016). Informed management decisions concerning biodiversity conservation therefore require demographic parameters to be accurately estimated. Here we demonstrate that, when not handled properly, the combination of unequal time intervals and age dependence in capture-recapture models can lead to erroneous estimates of survival and to erroneous model selection (i.e. biological inference). To avoid the problem of unequal time intervals, sampling protocols should be properly designed. Here we show how to partially overcome the problem of uneven intervals in two-age-class capture-recapture models. Adding dummy columns to the encounter history can generally be used when interval lengths are commensurable with a common unit of time during which no change of age class occurs (one month, for instance). However, there might be practical limitations because in some cases this solution would require adding a large number of dummy occasions and would 'push' estimates of survival probability close to the upper boundary value of 1. In these cases, a log-link function can be especially useful to accommodate heterogeneity in the duration of encounter occasions when uneven periods are short and too many dummy occasions would otherwise have to be incorporated to account for the unequal time periods. However, not all problems can be solved using the approaches suggested. For example, the log-link function works well in rich datasets but might cause numerical problems when data are sparse (Tavecchia et al., 2001). Moreover, by using the log-link function, the effect of temporal covariates cannot be modelled. Also, when survival parameters are fully time dependent there are still some parameters that cannot be estimated separately, with only their products being identifiable (tables 1, 2). The longer the period with missing information, the higher the number of redundant parameters (tables 1, 2).
However, by constraining survival using external covariates, most parameters become identifiable. Here we focused on two-age-class, single-state models, but more complex models, such as multistate models or models including multiple age classes, will present additional parameter identifiability problems. Despite these limitations, the solution presented here performed well in relatively simple situations and we recommend its use when age-dependent parameters are incorporated in models with uneven intervals between sampling occasions. Finally, the presence of transient animals can be accommodated in CMR models by using age-dependent models (see the real case; Pradel et al., 1997; Sanz-Aguilar et al., 2010). However, recently developed multi-event models (Pradel, 2005) allow transients to be modelled as a specific uncertain category of individuals with known parameter values (survival probability = 0). In this case, age-dependent models are no longer necessary and the problem does not apply (e.g. Genovart et al., 2012; Santidrián Tomillo et al., 2017).

Acknowledgements
Gonzalo G. Barberá and Gustavo Ballesteros kindly allowed the re-analysis of Palomas data collected by the Asociación de Naturalistas del Sureste (ANSE) and by volunteers supported by the Dirección General de Medio Natural of the Regional Government

Figs. S2.1 and S2.2. The design matrices to implement the analyses in program MARK using implementation 1 (fig. S2.1) and implementation 2 (fig. S2.2), respectively. Note that encounter histories must be changed before implementing solution 2 to include the dummy occasion to fill in years without monitoring. The files for the full analyses with *.inp, *.dbf and *.fpt can be found in supplementary material S3.