The need for long-term population monitoring of the world’s largest fish

: Many large marine species are vulnerable to anthropogenic pressures, and substantial declines have been documented across a range of taxa. Many of these species are also long-lived, have low individual resighting rates and high levels of individual heterogeneity in capture probability, which complicates assessments of their conservation status with capture−mark− recapture (CMR) models. Few studies have been able to apply CMR models to whale sharks Rhincodon typus , the world’s largest fish. One of their aggregation sites off Mafia Island in Tanzania is characterised by unusually high residency of this Endangered species, making it an ideal target for CMR methods. Three different CMR models were fitted to an 8 yr photo-identification data set to estimate abundance, population trend and demographic parameters. As anticipated, resighting rates were unusually high compared to other aggregations. Different CMR models produced broadly similar parameter estimates, showing a stable population trend with high survivorship and limited recruitment. Tagging and biopsy sampling for concurrent research did not negatively affect those sharks’ apparent survival or capture probabilities. Scenario-based power analyses showed that only pronounced abundance trends (±30%) would be detectable over our study period, at a 90% level of probability, even with the relatively high precision in yearly abundance estimates achieved here. Other, more transient whale shark aggregations, with reduced precision in abundance estimates, may only be able to confidently detect a similar trend with CMR models after 15−20 yr of observations. Precautionary management and long-term monitoring will be required to assist and document the recovery of this iconic species.


INTRODUCTION
Many marine megafauna species are vulnerable to anthropogenic threats such as target fisheries, bycatch and habitat degradation (Lewison et al. 2014, Mc-Cauley et al. 2015, Queiroz et al. 2019. Low intrinsic population sizes and life-history parameters such as large body size and late maturity exacerbate the ex-tinction risk of these species (Reynolds et al. 2005), while movement across different exclusive economic zones and through international waters hampers effective management (Hays et al. 2016, Sequeira et al. 2019. While documented recoveries in several threatened marine mammal and sea turtle species have shown that these challenges are not insurmountable (Magera et al. 2013, Valdivia et al. 2019, these efforts are aided by accurate long-term population trend data that can be integrated into an adaptive conservation strategy (Lindenmayer et al. 2012).
Monitoring the abundance of air-breathing marine animals is challenging enough. Sharks and rays, by contrast, do not need to surface, and most species spend a significant portion of their lives beyond the narrow depth range in which conventional diving equipment can be used for direct observations. Unsurprisingly, data on elasmobranch abundances and population trends are scarce, with almost half of the species listed as Data Deficient on the IUCN Red List (Dulvy et al. 2017). Even for better-studied species, population trends are typically derived from catch per unit effort indices (Sherley et al. 2019). The whale shark Rhincodon typus is one of these comparatively well-studied animals. Whale sharks predictably aggregate in surface waters, often close to the coast, in over 20 locations around the world (Norman et al. 2017). This behaviour makes them accessible to researchers, at least for short periods. They also have a unique colour pattern that makes them amenable to individual photo-identification (photo-ID; Pierce et al. 2018). Globally, ~13 200 individual whale sharks have been identified and logged on the collaborative database www.sharkbook.ai as of September 2021.
Whale sharks were uplisted to Endangered status on the IUCN Red List in 2016, with their global population estimated to have been halved over the last 3 generations (Pierce & Norman 2016). Whale shark abundance has been estimated from photo-ID data with capture−mark−recapture (CMR) models at several aggregation sites, although short study durations precluded the estimation of population trends in the Maldives, the Seychelles and Mexico (Rowat et al. 2009, Riley et al. 2010, Ramírez-Macías et al. 2012. In Madagascar, a stable population was reported across 4 yr, albeit with a caution that longer-term data are needed to account for natural fluctuations and changes in survey effort (Diamant et al. 2021). In Western Australia, a small increase in annual abundance from 2004−2007 was estimated with an open Robust Design model (Holmberg et al. 2009). Another study in the same region estimated their total abundance at 320−440 sharks (Meekan et al. 2006), although only a small subset of the individuals identified at this site were included in both studies.
CMR models use sightings histories of individuals to derive parameter estimates. Recapture rates, the probability of observing an individual in subsequent events, are a key factor. Low recapture rates can lead to non-estimable parameters or imprecise parameter estimates, precluding the accurate determination of trends (Pine et al. 2003, Dudgeon et al. 2015. High transience is characteristic of whale sharks at most aggregation sites, resulting in infrequent encounters and low recapture rates, making it difficult to estimate whale shark abundance and population trends using CMR models (Holmberg et al. 2009). For example, ~65% of whale sharks in Western Australia and 85% of individuals in Mexico were only seen in one year. Consequently, CMR models were then applied to only a small subset of individuals that were encountered in multiple years (Holmberg et al. 2009, Ramírez-Macías et al. 2012). More recent studies (Fox et al. 2013, Cochran et al. 2016, Araujo et al. 2017, 2019, McKinney et al. 2017, Robinson et al. 2017, Prebble et al. 2018 abandoned CMR methods in favour of models based on the lagged identification rate (LIR), i.e. the probability of resighting a particular shark at any time after it was first identified (Whitehead 2009). LIR models have fewer assumptions and do not require the same high recapture rates but estimate a much narrower range of parameters, i.e. daily abundance, mortality rate and residence times inside and outside the study area. In short, LIR methods can be applied at most aggregations, but CMR estimates are likely to be more informative at sites where sufficient data allow for their use.
Whale sharks occur year-round at Mafia Island in Tanzania (Cagua et al. 2015) and seasonally aggregate in a relatively shallow (< 30 m) bay to feed on sergestid shrimps between October and February (Rohner et al. 2015a). Residency rates within and across years are high (combined maximum residency index: 0.39; Rohner et al. 2020), and high seasonal research effort means that individual whale sharks, if present, are rarely missed by observers. Within a season spanning 2−3 mo, photo-ID only missed ~5% of sharks that were recorded in that time with concurrent passive acoustic telemetry (Rohner et al. 2020). Otherwise, the aggregation is similar to many other whale shark hotspots (Norman et al. 2017) in that it is food-driven (Rohner et al. 2015a) and comprises mostly juvenile males (Rohner et al. 2015b). High residency and sightability of whale sharks at Mafia Island combined with a structured study design is a best-case scenario to examine population parameters with CMR models.
Here, we assessed the demographic structure of whale sharks at Mafia Island in Tanzania and estimated a suite of parameters with CMR models. The main aims were to (1) estimate population parameters, such as capture probability, apparent survival and the probability of entry into the population; (2) examine whether natural characteristics (such as sex and size) or research activities (such as tagging or taking a biopsy sample) had a detectable influence on resighting and survival; (3) estimate the superpopulation size and annual population sizes and determine the population trend (increasing/decreasing); and (4) assess whether the survey design and duration allowed for robust trend estimates.

Whale shark surveys
From 2012−2019, a total of 358 boat-based visual surveys were conducted at the whale shark aggregation site in Kilindoni Bay on the western side of Mafia Island in Tanzania (7.92°S, 39.65°E; see Fig. 1). Surveys were limited to the October−February period, which coincides with inshore feeding aggregations of sharks (Rohner et al. 2020). Survey effort was high in November (n = 149) and December (n = 110) of each year and this was consistent throughout the study. Survey paths varied daily and were chosen to maximise whale shark sightings on a given survey, based on sightings in previous days, information from fishers and local knowledge of the captain and researchers. The broader ~100 km 2 survey area remained constant over time (see Fig. 1).
Once sighted, whale sharks were approached by snorkelers who collected identification photos. The total length (TL) of each shark was visually estimated to the nearest 0.5 m, and sex was determined by looking for the presence (males) or absence (females) of claspers between the pelvic fins. Photo-ID was used to identify individual whale sharks based on their natural markings (Arzoumanian et al. 2005). Their spot patterns remain the same over at least ~20 yr (Norman & Morgan 2016) and are the equivalent of 'marking' an animal in a CMR framework. All encounters were uploaded to the Wildbook for Whale Sharks global database (www.sharkbook.ai) to re-identify an existing animal or add a new shark to the database. Sightings without an associated identification photo or with photos of insufficient quality (i.e. not taken at a right angle, taken from above or below the ID area or where the spot pattern was not clearly visible) and duplicate encounters of an individual on the same day were removed so that each encounter represented a single shark seen on a given day. These encounters were the 'captures' (the first time an individual was photographed) and 'recaptures' (subsequent sightings of a previously identified individual) in a CMR framework. The GPS location of most (73%) encounters was also recorded to examine the spatial distribution of sightings in Kilindoni Bay.
Our survey data were supplemented with citizen science encounters uploaded to the global database. Approximately 95% of whale shark encounters recorded from Mafia Island were from dedicated surveys, ~2% were from before the study began and ~3% were from citizen scientists, collected largely outside of the dedicated survey period. We had 3 sets of data for this study: (1) all whale shark encounters, which were used to plot a discovery curve and assess resighting rates; (2) all whale shark encounters within the CMR sampling period (see Section 2.2), which were used to construct basic population models, i.e. models without covariates; these data sets included citizen science data; and (3) only researcher survey data, which were used in group models based on covariates such as sex or size, as the identification of these characteristics by citizen scientists can be unreliable.

Population modelling
We used a CMR modelling approach to assess a suite of population parameters. The sampling period for all CMR models was defined as November and December 2012−2019, because these were the only months surveyed in all study years, had high and consistent survey effort, and 94.9% of individuals were seen in these months. Three separate CMR model parameterisations were then fit to this filtered data set. First, we used a POPAN parameterisation of a Jolly-Seber (JS) model to examine the superpopulation size (N ) which estimates the overall number of whale sharks available to be sampled over the whole study period; yearly abundance (Ny); apparent survival (ϕ) which encompasses both survival and not emigrating permanently; the probability of entry ( pent); and the capture probability (p). We also calculated pent(0), or the proportion of individuals from the superpopulation present at the start of our study, as 1 − sum(pent[1]:pent [7]). Second, we used a Pradel parameterisation of a JS model to estimate the rate of population change (λ) (λ < 1 indicates decreasing population; λ > 1 indicates increasing population) as well as p and ϕ. Both JS formulations are open models, meaning that individuals can enter (through birth or immigration) and exit (through death or emigration) the study population. These models assume that individuals that have not been identified have the same p and ϕ as previously identified whale sharks, the study area remains the same over the study period, individual whale sharks do not lose their 'tags' (i.e. natural markings) and are correctly identified. Heterogeneity of capture and survival probabilities were addressed through goodness-of-fit testing (see Section 2.4), while the latter model assumptions were fulfilled by our survey design and known temporal stability of natural markings .
Finally, we used Pollock's Robust Design (RD; Pollock et al. 1990) to estimate Ny, ϕ and p, all of which were then compared with the corresponding outputs from the JS models. RD models are a mix of open and closed models. They consist of a primary capture session, here defined as November− December each year, and secondary capture sessions within each primary session, here the separate months of November and December. By incorporating sampling across varying temporal scales, these models allow for the population to be 'open' in the long interval between primary sessions, but the population is assumed to be closed between each of the shorter secondary sessions within the primary sessions. The combination of closed and open methods for parameter estimation minimises the bias that typically results from heterogeneity in capture probability (Kendall & Pollock 1992). Using data from more sampling occasions than JS models can also increase precision in abundance and survival estimates (Pollock et al. 1990).

Influence of research techniques, sex and size
Whale sharks segregate by sex and size, and most feeding aggregations are male-biased, consisting largely of juvenile individuals (Norman et al. 2017). We were interested in how these individual characteristics influence parameter estimates, particularly p and ϕ. The mode (most frequent estimate) of TL for each individual shark was used for analysis, considering that visual estimates varied among encounters but shark growth was likely minimal over the study period (Rohner et al. 2015b). For population models, sharks were binned by TL into small (≤ 5 m; n = 74), medium (5−7 m; n = 65), and large (≥ 7 m; n = 40) size classes. These size classes do not represent different life stages, because almost all individuals here were large juveniles, rather than us having access to the full size spectrum of the species (0.6-20 m TL; Joung et al. 1996, Chen et al. 1997. Research activities are sometimes perceived to negatively affect the probability of resighting or even the survival of whale sharks (Harvey-Carroll et al. 2021). To assess any such influence, we collected photo-ID data on all individuals that we tagged or biopsied for other aspects of the research program at Mafia Island. Over the course of this CMR study, we tagged 56 whale sharks. Most tagged sharks (n = 51) had an external acoustic transmitter; these were deployed largely in October−December 2012 and in December 2014 using a Hawaiian sling to embed an anchor in the sharks' skin (Rohner et al. 2020). Fifteen of these sharks were re-tagged over the course of the study, and mean tag retention was 307 d. Using the same technique, we also deployed 3 satellite tags in December 2015, which stayed attached for <1 mo. Five accelerometer tags were deployed in November 2015 using a temporary fin clamp over a 24 h period. Three sharks had multiple tag types over the course of the study, but we used a binary factor as either tagged or not tagged as the group effect. Small skin biopsies (~1 g) were taken from 136 whale sharks (72.7%) throughout the study period using a Hawaiian sling with a modified tip (Prebble et al. 2018). Most of these sharks (n = 90) were sampled multiple times across years. We tabulated the numbers of sharks in each group (see Table 1).

Goodness-of-fit
We implemented goodness-of-fit tests using the 'R2ucare' package (Gimenez et al. 2018) in R v.3.6.1 (R Development Core Team 2019) to examine the heterogeneity of capture and survival probabilities and test for overdispersion in our data, as indicated by the variance inflation factor ĉ. We also examined the output from each of the 4 tests separately and followed the decision-making tree in Gimenez et al. (2018) to adjust our JS models. The basic data set (no covariates) had a ĉ of 2.49, with only the transience test '3sr' significant (p < 0.01), indicating that the assumption of equal survival of marked and unmarked individuals was violated (Gimenez et al. 2018). To account for this transience in our JS models, we fixed ϕ as dependent on the time since marking (tsm; i.e. first sighting): ϕ ~ tsm, which allowed estimates of survival to differ from the first and subsequent sampling occasions (Pradel et al. 1997), resulting in an updated ĉ of 1.35. The RD models had the advantage of not requiring this adjustment because they account for temporary emigration. We also did not use this adjustment with JS models with group ef-fects, as these data fitted well across all groups (Table S1 in the Supplement at www.int-res.com/ articles/suppl/n047p231_supp.xlsx). Of all subgroups, only one test was significant for males (test 3sr, p = 0.005). To examine the influence of study duration on model performance, we also ran goodness-of-fit tests on the basic model for progressively fewer survey years. The full 8 yr data set had the best fit (ĉ = 2.49) which reduced progressively for 7, 6 and 5 yr study durations to 2.9, 3.6 and 4.5, respectively.

Model selection
We constructed all models using the package 'RMark' (Laake 2013). For all JS models, parameters were estimated for all whale sharks combined (basic model) and also for each group (group model with covariates, e.g. sex, size class, tagged, biopsied) by including each factor as a group effect. For JS parameterisations, we modelled ϕ as dependent on tsm in the basic model, and as constant (.), variable over time (t), or variable with group (group and group + t) in the group models. We modelled p and pent as constant, variable over time, or variable with group. For the Pradel parameterisations, we set λ as time-variable (t and group + t) to examine inter-annual variation, and also as constant (. and group) to examine the overall trend. In JS models that have p and pent or p and λ both as time-variable, initial estimates are confounded, which we accounted for by setting p(2012) = p(2013), which was indicated with (T) in the model formulation. Similarly, when p and ϕ are both time-variable, their final estimates are confounded, which we accounted for by setting p(2018) = p(2019). After compiling candidate model sets, we used Akaike's information criterion (AICc) to rank and assess model support for each of the models, where lower AICc values indicated greater model support (Burnham & Anderson 2004). Model rankings (Quasi Akaike's information criterion, QAICc) were adjusted by ĉ in the basic JS models. Parameters were then estimated by model averaging based on model weight. Because model averaging led to time-variable parameter estimates, in some cases the single, best-supported model was used to estimate parameters of interest without time variability (i.e. constant or with a group effect). Best-supported models were within ΔAICc < 2. We only report results for groups if there was a significant group effect, i.e. if group was included for a parameter in the best-supported models within ΔAICc < 2. Parameter estimates were compared within groups using the upper and lower control limits as 95% CI; SE is also reported.
For RD models, we first determined the temporary emigration scenario that best fit our data, using basic models with constant or time-variable (by secondary session) ϕ and constant or time-variable (by year, by secondary session and by year and secondary session) probability of first capture and the same probability for first capture and recaptures (i.e. p = c). No temporary emigration had constant temporary emigration rates γ' and γ'' = 0, random temporary emigration had time-variable γ' = γ'' and Markovian temporary emigration had the 2 variables dependent on each other (γ', γ''). The temporary emigration rates γ' and γ'' were then set accordingly. For the basic model, we estimated Ny and set ϕ as constant (.) or variable between primary periods (t), and p as constant, variable between primary periods (session), between secondary periods (t) and between both primary and secondary periods (session: t). The recapture probability c was kept equal to p, as we did not expect a behavioural effect of capture (i.e. taking a photo-ID). For group effect models, interaction terms (t + group, or t × group) were added for ϕ and p.

Simulations for trend detection
We examined the population trend with linear regressions fitted to model estimates of λ and the rate of change based on estimates of Ny, derived from the basic POPAN and RD models. We then conducted a power analysis on the trend derived from estimates of Ny in the RD models using the 'fishmethods' package (Nelson 2019) to determine the probability of detecting a trend; that is, a slope different from zero (Gerrodette 1987). Finding a trend when in fact there is none is termed a Type 1 error, with its probability α. Finding no trend when in fact there is one is a Type 2 error, with its probability β. Here, both errors were set at the same level of 0.1. Power, then, is the probability of detecting a trend when this trend occurs and is defined as 1 − β, or a power of 0.9 in our case. Power is influenced by the study duration, the magnitude of the trend, the precision of abundance estimates and the significance required (Gerrodette 1987). We used empirical estimates of precision as input by calculating the mean of the proportional standard error (PSE) of all years. First, we used the empirical study duration and trend data to assess significance (i.e. power = 0.9). Second, we set the population trend at a range of values between −30% (sharp decline) and + 30% (population recovery) and used the empirical PSE and starting value of Ny to calculate the power of detecting such a trend over study periods ranging from 3−30 yr. Last, we assessed the importance of precision of estimates of Ny on trend analysis by simulating different PSEs in a power analysis with the empirical Ny and a study duration ranging 3−30 yr. We used our empirical PSE and the PSE from a CMR whale shark study in Nosy Be, Madagascar (Diamant et al. 2021), along with hypothetical PSE values.

Whale shark sightings and population structure
We recorded 1858 whale shark encounters on 358 surveys between 2012 and 2019, resulting in a mean (± SD) of 5.2 ± 5.78 and a median of 4 individuals sighted per trip. Ten or more sharks were seen on 67 trips (18.7%); the highest number of unique whale sharks identified in a trip was 49. No sharks were encountered on 73 trips (20.4%). These trips were mostly at the end of our survey season, with comparatively less effort in February (60% of trips in February, n = 35) and January (25%, n = 44). November (5.3 ± 6.50), and December (6.6 ± 5.82) were the months with the highest numbers of sharks seen per trip. October also had high sightings (6.1 ± 4.67 sharks trip −1 ) but low effort (n = 20). Encounters were recorded throughout Kilindoni Bay, with markedly fewer sightings recorded from the southwest of the Bay (Fig. 1).
We identified 187 unique whale sharks in our surveys: 158 males (86.3%), 25 females (13.7%) and 4 individuals for which we could not determine the sex (Table 1). The sharks ranged in length from 2.5−9 m TL, with a mean of 5.76 ± 1.17 m and a median of 6 m. Males and females were similar in length (Wilcoxon rank sum test, W = 1806.5, p = 0.48) and had similar numbers of resightings (W = 1803.5, p = 0.49; Table 1). Tagged sharks were larger (W = 2423, p < 0.001) and were resighted more often (W = 743, p < 0.001) than sharks that were never tagged (Table 1). Biopsied sharks were also larger (W = 2505, p = 0.0045) and resighted more often (W = 824.5, p < 0.001) than sharks that were never biopsied (Table 1).
A total of 14 additional whale sharks were reported from encounters outside of our surveys, largely before our project began in 2012. Overall, including encounters from citizen scientists and from before Of those 201 whale sharks, most (81.8%) were encountered multiple times during the study. Individual sharks had up to 57 encounters, with a mean of 10.4 ± 12.0 and a median of 5 encounters shark −1 . More than half of all sharks (59.8%) were seen in multiple seasons. Of the 185 sharks included in the CMR study, 36.2% were seen in only one season, with more females (48%) compared to males (31.4%) seen in only one season. The discovery curve showed an initial steep rise in individuals before our study began and in our first season in 2012−2013 (Fig. 2). After this initial period, the highest influx of new sharks was in 2016−2017. Compared with regional aggregations at Praia do Tofo, Mozambique, and Nosy Be, Madagascar, the discovery curve at Mafia Island was flatter (Fig. 2).

Model selection
We recorded 1634 encounters of 185 different whale sharks during the CMR sampling period of November and December 2012−2019 and used these data for 'basic' models -i.e. those without group effects. As covariate information was missing for some sharks, sample sizes (total individuals) varied across different groups, i.e. sex (n = 176), size class (n = 179), tagged (n = 180) and biopsied (n = 180).
The best-supported basic POPAN model had a constant p and a time-variable pent with a model weight of 0.49 (Table S2). The group effect 'sex' was significant for ϕ(sex), p(sex) and pent(sex + t) in the 2 best-supported models. ϕ and p were not time-variable in the best-supported models, and we thus calculated the model-averaged results for all 3 parameters and the results from the best-supported model for ϕ and p without the time factor. These 2 models, within ΔAICc < 2, had a combined weight of 0.57. The group effect 'tag' was significant for ϕ(tag), p(tag + t) and pent(tag) in the best-supported model with a weight of 0.93. The group effect 'biopsy' was significant for ϕ(biopsy, biopsy + t), p(biopsy, biopsy + T) and pent(biopsy, biopsy + t) in the 6 best-supported models. These models, within ΔAICc < 2, had a combined weight of 0.78. The group effect 'size' was significant for ϕ(size), p(size + t) and pent(size) in the best-supported model with a weight of 0.82 (Table S2).
The 2 best-supported basic Pradel models had a constant or time-variable p and λ with a combined model weight of 0.73 (Table S3). The model with a constant λ also had good support (ΔQAICc = 0.62), and we thus calculated the overall abundance trend.
The group effect 'sex' was significant for all 3 parameters and the best-supported models had λ(sex + t), ϕ(sex), and p(sex, sex + T) with a combined model weight of 0.46. The group effect 'tag' was significant for all 3 parameters and the best-supported model had λ(tag + t), ϕ(tag) and p(tag + T) with a model weight of 0.7. The Pradel models with a group effect for 'biopsy' had 2 best-supported models within ΔAICc < 2, and the group effect was significant for ϕ(biopsy, biopsy + t) and p(biopsy, biopsy + T), with a combined model weight of 0.53. The group effect 'size' was significant for λ(size), ϕ(size) and p(size + t) in the 2 best-supported models with a combined weight of 0.86 (Table S3). The basic RD model with no temporary emigration (γ' = γ'' = 0) had the best support and a model weight of 0.68 (Table S4), and we selected no temporary emigration for the RD models for the basic models and the models with a group effect. The best-supported basic RD model had a constant ϕ and p varied by secondary periods, with a model weight of 0.99 (Table S4). Some of the 40 candidate models for each group had inestimable results and were removed during model selection. The best-supported model with a group effect for 'sex' had ϕ(sex) and p(session) with a model weight of 0.82. The best-supported models with a group effect for 'tag' had ϕ(tag, or t × tag) and p(session × tag) with a combined model weight of 0.99. The 2 bestsupported models with a group effect for 'biopsy' had ϕ(biopsy, or t + biopsy) and p(session × biopsy or session:t) with a combined model weight of 0.92. The best-supported model for group 'size' had ϕ(size) and p(session) with a model weight of 0.99 (Table S4).

Capture probability
The overall value of p was 0.8 ± 0.03 in the basic POPAN models (Table 2) and in the basic Pradel models (Table 3). The RD models estimated p varying by secondary periods, and these monthly estimates were lower (range = 0.4−0.75) than the overall estimate from the JS models (Table 4). Values of p in the RD models were lowest in 2013 and 2018.
In both JS parameterisations, p was significant without a time factor for the groups 'sex' and 'biopsy' (Tables 2 & 3); p was also significant for the groups 'biopsy' and 'tag' in the RD models but incorporated time-variability. The RD models estimated a higher mean p for biopsied whale sharks than those that were not biopsied (W = 208, p = 0.002), similar to the overall JS results (  Table 2. Parameter estimates (± SE) of capture probability (p), apparent survival (ϕ) and probability of entry (pent) from POPAN Jolly-Seber models overall (constant parameter) and for each year (time-varying parameter). Missing fields mean that estimates of that parameter were not supported in the model selection process. pent(0) is given in the 2012 column. Significant differences between subgroups are in bold between the means of p for tagged and untagged whale sharks (t = −1.74, df = 30, p = 0.093). Both JS parameterisations had p values that were significant in models with tag and with size as a group effect, with an added time factor. However, estimates of p for tagged and untagged animals, and between the 3 size classes, were not different in year-to-year comparisons (Tables 2 & 3). Sex and size were not significant for p in RD models.

Apparent survival
Apparent survival (ϕ) was time-variable in the best-supported basic POPAN and Pradel models. It ranged from 0.71 ± 0.07 in 2012−2013 to 0.98 ± 0.07 in 2018−2019 in the POPAN models, but none of the year-to-year comparisons were significant ( Table 2). The Pra del models had similar estimates of ϕ (Table 3). The RD models estimated an overall ϕ of 0.8 ± 0.03 (Table 4).
In the best-supported models for all 4 groups in the POPAN and Pradel models, ϕ varied by group. In RD models, ϕ was also significant for all 4 groups. Male whale sharks had higher ϕ than females, tagged sharks had higher ϕ than untagged individuals, biopsied sharks had higher ϕ than unbiopsied individuals and small sharks had a lower ϕ than both medium and big individuals in the POPAN models ( Table 2). Estimates of ϕ in the Pradel models were similar to the POPAN estimates, with all groups except sex having the same significant differences (Table 3).    Table 3. Parameter estimates (± SE) of the rate of population change (λ), capture probability (p) and apparent survival (ϕ) from Pradel Jolly-Seber models overall (constant parameter) and for each year (time-varying parameter). Missing fields mean that estimates of that parameter were not supported in the model selection process. Significant differences between subgroups are in bold Estimates of ϕ in the RD models were also similar to the POPAN estimates, with all 4 groups having the same differences (Table 4).

Probability of entry
The probability of entry (pent) was only estimated in the POPAN models. It was low overall (0.09 ± 0.01) in the basic model. Estimates of pent varied with time, with a low of 0.03 ± 0.04 in 2013 and a high of 0.12 ± 0.04 in 2016 (Table 2). In the models with a group effect, pent was higher for untagged (0.11 ± 0.01) than for tagged (0.03 ± 0.01) whale sharks, respectively (Table 2).
Within-year differences were mostly not significant in all 4 group effect model sets, with the exception of a higher pent for un tagged than for tagged whale sharks in 2013−2018 ( Table 2). The proportion of the superpopulation present at the beginning of the study (pent(0)) was 0.44 ± 0.01 (Table 2).

Population size and trend
In the basic POPAN model, N was estimated at 208 whale sharks or 12.4% more than the 185 identified individuals used in the CMR models. Estimates of Ny for each season varied from 66.6−95.7 sharks with a mean of 81.3 ± 9.8 (Table 5). Estimated abundance in the basic POPAN models was 18.3− 47.9% higher than the number of photo-identified whale sharks in the corresponding year. Estimates of Ny in the basic RD model with no temporary emigration were similar to those from the POPAN output (two-sample Kolmogorov-Smirnov test, D = 0.25, p = 0.98) and varied from 61.4−97.6 with a mean of 79.0 ± 12.4 sharks. Estimated abundance in the basic RD models was 5.9−55.1% higher than the number of photo-identified whale sharks in the corresponding year.
The model with a constant λ parameter for the rate of change in the popula-  Table 4. Parameter estimates (± SE) for apparent survival (ϕ) and capture probability (p) from Robust Design models overall (constant parameter), for each year (primary session-varying parameter) and the minimum, maximum, mean and SD for parameters varying with secondary session. Missing fields mean that estimates of that parameter were not supported in the model selection process. Significant differences between subgroups are in bold tion size had good support in the basic Pradel model (ΔQAICc = 0.62) and estimated the overall population trend at 1.04 ± 0.03. λ varied with time in the other best-supported basic Pradel model. A decreasing population was estimated for 2013 (between 2012 and 2013; λ = 0.84 ± 0.18) and for 2018 (0.98 ± 0.09), and an increase was estimated for the other 5 yr. Only one group effect had a significant difference in estimates of λ, with untagged sharks having a higher λ than tagged sharks in 2013−2016 (Table 3). The rate of change calculated from estimates of Ny in the POPAN models was similar to estimates of λ in the Pradel models, with a negligible overall in crease of 2% (Fig. 3). The rate of change based on abundance estimates of Ny in the RD models was negative in 2013 (0.87), similar to the JS model estimates, and it was also negative in 2014 and 2019 (Table 5). Overall, 4 years had a declining abundance trend in at least one model, and 3 years (2015−2017) showed increasing abundance in all models. The overall increase based on Ny in the RD models was the same as the estimate from POPAN's Ny at 2%. Linear regressions for all 3 measures of change were positive, but not significant, demonstrating a constant trend (Fig. 3).

Simulations for trend detection
The power analysis on Ny estimates in the RD models using empirical inputs from 8 yr of data, our starting Ny of 80.0 and a PSE of 0.073 corroborated the constant trend seen in the linear regressions. With a power of 0.9 (i.e. the probability of detecting the trend), we would be able to detect a population decline of 25% or an increase of 28%. This is much  Table 5. Yearly abundance counts (photo-ID) and capture−mark−recapture (CMR) model estimates of yearly abundance (Ny) and the total population size (N ) of whale sharks at Mafia Island, Tanzania. The 95% confidence intervals (CI) are lower and upper control limits. Photo-ID has 2 columns: all encounters and only those from the November−December study period used in the CMR models. The rate of change (λ) from the Pradel models is compared to manually calculated population changes from estimates of Ny in the POPAN and Robust Design models. Change is calculated as the proportion of Ny in a season from the Ny in the previous season Fig. 3. Change in the abundance of whale sharks at Mafia Island, Tanzania, calculated from yearly abundance estimates (Ny) in POPAN and Robust Design models and the directly calculated rate of change (λ) from Pradel models. Linear regressions were not significant in all 3 panels higher than the 2−3% increase estimated in the CMR models, which had a power of 0.08−0.10. The power analysis simulations using our empirical PSE and starting Ny showed that large trends can be confidently detected in relatively short periods (Fig. 4A). For example, a 30% decrease in abundance estimates could be detected after 6 yr of observation. Smaller changes in abundance would take much longer to confidently detect; i.e. changes of 5−10% would not even be detected in 30 yr (Fig. 4A). The precision in Ny estimates influenced the study duration required to detect a significant trend (Fig. 4B).
Smaller PSE values resulted in detecting a large population trend of −30% faster than with larger errors. For example, the 6 yr of data collection it would take us to detect a 30% decline at Mafia Island is much shorter than the 16 yr it would take with a PSE derived from a whale shark CMR study in Nosy Be, Madagascar (Fig. 4B). Our empirical PSE derived from Ny estimates in RD models used throughout was lower than the comparative PSE derived from Ny estimates in POPAN models (PSE = 0.12) and from λ in Pradel models (PSE = 0.11).

Population structure
The population structure of whale sharks at Mafia Island in Tanzania was similar to most other aggregation sites globally, consisting of large juveniles with a strong male bias (Norman et al. 2017). In our study, TL ranged from 2.5−9 m, spanning only a part of the size range for the species (0.6− 20 m TL; Joung et al. 1996, Chen et al. 1997, and 86% of identified sharks were males. Globally, both small juvenile and large mature whale sharks are rarely encountered at coastal aggregation sites and are thought to mostly live in offshore areas (Hearn et al. 2016, Ramírez-Macías et al. 2017, indicating strong ontogenetic habitat segregation in this species. Furthermore, juvenile females are seen less than males at almost all aggregation sites (Norman et al. 2017), also demonstrating sexual segregation. These 2 types of segregation mean that population parameters derived from most aggregation sites, including those from the present study, only apply to the segment of a population found at that particular study site rather than re presenting the global-or basinscale functional population sensu stricto. Deriving overall population parameters for whale sharks will depend on resolving their entire life cycle and understanding their movements between habitats that vary with ontogeny and sex. Population parameters derived from a subset of a functional population are still useful, however. They allow characterisation of recruitment, survival, abundance and trends of the part of the species' life cycle that is accessible to observers. Trends in abundance are especially needed to assess the species' recovery from severe population decline (Pierce & Norman 2016), even if this index is most relevant to males. 242 Fig. 4. Power analysis simulations of (A) different trends in abundance using our empirical starting Ny and proportional standard errors (PSE), with the power of detecting a trend calculated for the hypothetical study duration, and (B) different PSEs reflecting different precisions in estimates of Ny. We used a 30% decreasing trend and calculated the power of detecting the trend for the hypothetical study duration. Horizontal line: significant power of 0.9

CMR study design
CMR studies require a constant study area, and parameter accuracy is improved with consistent data collection and high recapture rates (Pine et al. 2003). This is often difficult to achieve when visually monitoring free-ranging marine megafauna that are highly mobile, rare and infrequently seen, particularly for species that do not surface to breathe. Aside from the challenge of a segregated population, which is ubiquitous in whale shark studies, our study was appropriately designed for CMR modelling. Our survey area was small and consistent over time, with sharks seen in the same areas of Kilindoni Bay within the home range that was previously estimated using passive acoustic telemetry (Rohner et al. 2020). This concurrent passive acoustic study on a subset of the study population, comprising 51 tagged individuals, showed that few (5%) whale sharks present in Kilindoni Bay were missed by our photo-ID survey on a seasonal scale (Rohner et al. 2020). November and December were reliable months in which to find whale sharks, with respective means of 5.3 and 6.6 individuals seen per trip. We saw up to 49 different whale sharks in a day, demonstrating that large aggregations do occur, but these were rare. Fewer individuals in an aggregation make it easier to identify all individuals present, which is less the case where large aggregations of >100 individuals occur such as off Isla Mujeres in Mexico, and off Qatar (de la Parra Venegas et al. 2011, Robinson et al. 2013. It is an advantage for CMR studies when fewer sharks are missed. Also considering their high residency, and the relatively long 8 yr study period, this may be the best-case scenario for fitting CMR models to whale shark photo-ID data to date. The progressively worse goodness-of-fit test results for increasingly shorter subsets of our data further underline the importance of longer-term population monitoring.

High residency
Most whale sharks at Mafia Island were resighted in multiple seasons, demonstrating high levels of site fidelity. In addition, LIR modelling of the photo-ID data set (Prebble et al. 2018) and passive acoustic telemetry (Rohner et al. 2020) have both shown high residency of whale sharks in this area. These characteristics translated into high encounter rates, with an average of >10 encounters per identified shark. For comparison, encounter rates at other regional aggregation sites such as Nosy Be in Madagascar (3.4 ± 5.7;Diamant et al. 2021) and Praia do Tofo in Mozambique (2.7 ± 3.1; C. Prebble unpubl. data) are much lower. High residency and resighting rates mostly influenced capture probability in the CMR models, which was high overall at 0.8 in both JS parameterisations. In contrast, capture probability was much lower for whale sharks with lower residency at Ningaloo Reef (< 0.25; Meekan et al. 2006) and in Madagascar (range = 0.48−0.65; Diamant et al. 2021). At a finer temporal scale, the RD models had capture probability varying with month ranging from 0.4−0.75, again higher than monthly estimates from elsewhere (Maldives: 0.26−0.65; Davies et al. 2012). This monthly variation indicates that there may be some grouping among the whale shark population at Mafia Island, with different cohorts visiting our survey area at different times. Further research into social grouping at this site would be needed to test this hypothesis.
High residency also meant low turnover in the whale shark population at Mafia Island. The comparatively flat photo-ID discovery curve indicates that few new whale sharks enter the population over time. The probability of entry from the POPAN models was also low overall at 0.09 ± 0.01, in part because we accounted for transience in the basic JS models with ϕ ~ tsm. CMR models at other sites with higher transience would likely inflate the estimate of probability of entry. In the absence of a high proportion of transient whale sharks here, the low turnover underlines the slow growth and long generation time of whale sharks (Ong et al. 2020).

Influence of research techniques
Tagging or taking small biopsy samples did not negatively influence whale shark sightings over time. Biopsied whale sharks did not have a lower capture probablity or apparent survival than unbiopsied individuals compared across the 3 model types (POPAN, Pradel, RD). Similarly, tagged whale sharks did not have a lower capture probability or apparent survival than untagged individuals in the JS and RD models. We had a large sample size, with 30% of individuals tagged and 73% biopsied over the duration of this CMR study, and can therefore confidently state that these research techniques do not decrease whale shark survival or sightings at this study site.
There are several reasons why tagging and biopsy procedures would have no discernible effect on whale shark behaviour. First, tagging and taking a biopsy sample of whale sharks is swift, taking only a few seconds. Unlike many other shark species (Hammerschlag et al. 2011), whale sharks are not caught or restrained for tagging or sampling, meaning that there is minimal stress associated with the procedure. While there is no lasting effect of taking a biopsy sample, an external tag may stay attached to the shark for extended periods, up to 4.5 yr at this study site (Rohner et al. 2020). Longer-term effects of tagging do occur in other shark species (Manire & Gruber 1991). Biofouling can increase drag (Dicken et al. 2006) or even cause injury when tags are attached to potentially fragile structures such as the dorsal fin (Jewell et al. 2011). These factors likely play less of a role for whale sharks because they are physically large and the tag's dart is inserted into the exceptionally thick skin ) rather than the fin or the muscle. Second, wounds heal quickly in whale sharks (Womersley et al. 2021); small skin punctures from biopsy collection heal completely within 1 mo (C. Rohner pers. obs.). Third, whale sharks rarely react to biopsy removal or tagging, and if they do, they resume normal behaviour quickly. Similar results have been reported for whales that are also not restrained for tagging (Reisinger et al. 2014, Isojunno & Miller 2015. Our field observations suggest that feeding whale sharks either continue or quickly resume feeding after a disturbance. It is possible that disturbances may influence resighting rates more in other areas where whale sharks are not feeding. However, based on the current data, a long-term effect of research on apparent survival or capture probability seems unlikely for the species in typical contexts.
While it is clear that the research techniques did not have a negative influence on capture probability or apparent survival of whale sharks in our study, there are no ecological reasons why tagged and biopsied whale sharks should have a higher capture probability or apparent survival than untagged and unbiopsied individuals. This difference was significant for capture probability in the biopsy group in both JS models, and for apparent survival in the biopsy and tag groups in all 3 model types. This result is likely to be an artefact of our study design. Biopsied whale sharks had a mean of 14.2 encounters shark −1 , compared to just 2.7 encounters per unbiopsied individual, with a similar trend for tagged and untagged individuals. Sharks that are often resighted, and thus have a high capture probability and apparent survival, also have a higher likelihood of being included in the tagging and biopsy studies. This is particularly true for biopsy sampling, which occurred throughout the CMR study period.

Influence of sex and size
The natural covariates of sex and size had little influence on parameter estimates overall. Capture probability did not differ between the groups for either covariate in all 3 model types, which shows that there was no heterogeneity in capture probability between females and males or between the 3 size classes. The probability of entry was also not different for either covariate in the POPAN model. We might have expected smaller sharks to have a higher probability of entry than medium and large sharks, indicating that newly recruited individuals are smaller on average, but that was not observed. Similarly, at Ningaloo Reef, size was not a supported covariate for probability of entry in open RD models, although a recruitment parameter in Link-Barker models ap plied to a small subset of sharks seen in multiple years did vary with size (Holmberg et al. 2009). It is possible that a correlation between shark size and probability of entry was masked because our study duration was short compared to the estimated ~25 yr generation time of whale sharks (Pierce & Norman 2016). Alternatively, recruitment at aggregation sites may in clude large juveniles of any size, as long as they find the small ephemeral feeding areas (Rohner et al. 2013a) within their ocean-wide habitat.
Apparent survival varied with sex and size in all 3 model types, except sex in the Pradel model, with female and small sharks having a lower apparent survival than male, medium and large sharks, respectively. The main caveat with interpreting apparent survival is that it is confounded with permanent emigration. It is thus unresolved whether females have a higher mortality rate than males or whether they rather are more transient and permanently leave the aggregation. The higher proportion of individuals seen in only one season for females (48%) than males (31%) suggests that different levels of transience may play a larger role in explaining the variation in apparent survival than differences in mortality. What is clear, however, is that males remain in the aggregation at a higher rate than females. This may contribute to the observed sexual segregation at most whale shark aggregation sites (Norman et al. 2017). We might have expected larger sharks to have a lower apparent survival due to them leaving the aggregation when reaching maturity (Norman et al. 2017, but this was not the case, likely because most individuals were large juveniles. Small juveniles and neonates likely have a lower apparent survival due to higher mortal-ity, but this size class was also not available in our study. Our estimated values of apparent survival (~0.8) were similar to those at other aggregations also frequented by mostly large juvenile whale sharks (Holmberg et al. 2009). Resolving ontogenetic variation in apparent survival and mortality will depend on estimates from these so far rarely seen size classes.

Population size and trend
The whale shark population at Mafia Island was small, with 201 unique individuals identified to date. Many other aggregation sites for the species have larger identified populations, including those nearby in Mozambique and Madagascar, although only ~13 000 individual whale sharks have been identified globally (www.sharkbook.ai). The estimated superpopulation size was 208 individuals in the POPAN models, only slightly larger than the 201 individuals identified overall at Mafia Island. Even when considering that the CMR input data set was smaller at 185 whale sharks, due to a restricted study period, the total population size is small and was estimated to be only ~12% larger than the number of identified whale sharks. This is in stark contrast to other whale shark aggregation sites. CMR model estimates of the whale shark superpopulation size were 75−270% larger than the number of identified individuals at Ningaloo Reef (n = 159; Meekan et al. 2006) and 69% larger in Madagascar (n = 405; Diamant et al. 2021). Additionally, abundance estimates at Mafia Island had high precision, benefiting from low levels of transience and high capture probabilities. The relatively small difference between identified and modelled numbers, the flat discovery curve, the high estimate of capture probability and the low estimate of probability of entry all show that many whale sharks at Mafia Island stay in this aggregation for long periods -up to 8 yr at least -and few new individuals join the aggregation. This again underlines the importance of local management in aggregation sites where whale sharks spend a disproportionately long time compared to their wide overall distribution.
The constant population trend estimated in the Pradel models and the RD and POPAN estimates of yearly abundance demonstrate that this whale shark population is neither increasing nor decreasing, at least over the 8 yr study period. The power analysis corroborated this result. Bycatch mortalities from net fisheries are occasionally reported in Tanzania, and scars indicate that boats hit the sharks (Rohner et al. 2020). With the low entry to the study population observed, even a comparatively low level of threatwith single digits of whale shark mortalities in fisheries reported per year, generally through social media posts -may be enough to keep this population from increasing or to greatly slow recovery. The dense food available for whale sharks at this site (Rohner et al. 2015a) suggests that environmental carrying capacity is unlikely to be a restriction. However, it is important to consider the small spatial scale of our study compared to the large genetic source population from which new recruits may come. Over an evolutionary time scale, genetic studies indicate whale sharks comprise 2 major subpopulations: the Atlantic and Indo-Pacific (Vignaud et al. 2014). Recent work indicates that whale sharks at Mafia Island are genetically indistinguishable from those in the Red Sea (Hardenstine et al. 2022). Further examination of regional population trends, particularly if a more representative sample of female sharks can be obtained, will he lp clarify whether recruitment at aggregation sites varies, or if mortality rates are still too high for the species to recover.

The need for long-term population monitoring
Power analysis simulations showed that the magnitude of the population trend, the study duration and the precision of the yearly abundance estimates in CMR models all influence the ability to detect a trend. Our empirical data would have been able to detect a large population trend of ± 30% within our study period, if it existed. However, to detect a smaller decrease in abundance of 20% we would require a 13 yr time series, while detection of <10% change in abundance would necessitate a study length exceeding 30 yr. This suggests that a precautionary approach to whale shark management is necessary, as declines in small populations are hard to detect and take a long time to demonstrably reverse even with substantial monitoring and conservation effort (Taylor & Gerrodette 1993). This is despite the low mean of the PSE (0.07) and the high precision of abundance in our RD estimates. Other, more transient whale shark aggregations will have lower recapture rates and thus a lower precision when estimating abundance, and likely have an even higher (minimum) detectable rate of change (Gerrodette 1987). One way to overcome the low power at a single aggregation site is to replicate this CMR approach at other aggregation sites and increase power with a meta-analysis of multiple studies. Such a collaborative global effort is certainly possible for whale sharks (Norman et al. 2017).
We also suggest a combination of approaches to guide management and achieve population recovery for the species. Sightings trends that take environmental factors into consideration can be effective (Rohner et al. 2013b) and may be an alternative in large aggregations where abundance is estimated rather than each individual identified. Demographic models can estimate the maximum intrinsic population growth rate, and recent advances in our understanding of their reproduction ) and growth (Ong et al. 2020) are likely to reduce the large uncertainty in some of the input parameters (Bradshaw et al. 2007). Extensions of CMR models should be explored for use in whale shark studies, as they have improved model fit and accuracy of parameter estimations in some cetacean studies. In particular, the more transient aggregations with lower resighting rates may benefit from these improved models. Multistate CMR models, in which individuals can occupy either observable or unobservable states and transition between these states over time, allow for heterogeneity in capture probability resulting from temporary emigration to be incorporated in the models (e.g. Chabanne et al. 2017, Boys et al. 2019. The application of Bayesian versions of CMR models (Rankin et al. 2016) may also prove useful for addressing temporary migration and individual heterogeneity in recapture probabilities, improving model fit and the reliability of parameter estimates. The results from our CMR models show that even low levels of threats and mortality can prevent whale shark populations from recovering. Precautionary management and elimination of anthropogenic mortality is thus recommended. By combining CMR models with some of these other methods and applying them to more aggregation sites globally, we will be able to better track the trends in abundance of the world's largest fish.