Quantifying annual spatial consistency in chick-rearing seabirds to inform important site identification

Animal tracking has afforded insights into patterns of space use in numerous species and thereby informed area-based conservation planning. A crucial consideration when estimating spatial distributions from tracking data is whether the sample of tracked animals is representative of the wider population. However, it may also be important to track animals in multiple years to capture changes in distribution in response to varying environmental conditions. Using GPS-tracking data from 23 seabird species, we assessed the importance of multi-year sampling for identifying important sites for conservation during the chick-rearing period, when seabirds are most spatially constrained. We found a high degree of spatial overlap among distributions from different years in most species. Multi-year sampling often captured a significantly higher portion of reference distributions (based on all data for a population) than sampling in a single year. However, we estimated that data from a single year would on average capture only 5 % less of the full distribution of a population compared to equal-sized samples collected across three years (min: −0.3 %, max: 17.7 %, n = 23). Our results suggest a key consideration for identifying important sites from tracking data is whether enough individuals were tracked to provide a representative estimate of the population distribution during the sampling period, rather than whether tracking took place in multiple years. By providing an unprecedented multi-species perspective on annual spatial consistency, this work has relevance for the application of tracking data to informing the conservation of seabirds.


Introduction
The accurate estimation of the spatial distributions of animal populations is important for understanding patterns of resource use and demographic change, as well as for informing biodiversity conservation and management (Hays et al., 2019). The at-sea distributions of many species of marine megafauna have in recent decades been revealed using data from animal-borne tracking devices (Bernard et al., 2021; Hussey et al., 2015). To ensure population-level inferences are robust and spatial management is properly targeted, it is vital to consider how representative a tracking dataset is of the movements of the entire population across both space and time (Shimada et al., 2020).
In the marine realm, tracking data have been used widely to inform conservation and management (Hays et al., 2019; Hindell et al., 2020). Population-level spatial distributions derived from tracking data have contributed to assessments of the impacts of threats at sea, such as incidental mortality (bycatch) in fisheries, overfishing, and resource extraction (Clay et al., 2019; Garthe et al., 2017; Grémillet et al., 2016; Queiroz et al., 2019). Sites contributing to the global persistence of species, such as Key Biodiversity Areas (KBAs), can now be identified for a wide diversity of marine taxa using tracking data and novel analytical tools (Beal et al., 2021). It is important to consider, however, that if key sites are identified using tracking samples that do not fully encompass natural variability in space use, the resulting borders may not adequately represent the areas on which a population depends, potentially increasing exposure to risks elsewhere (Lovvorn et al., 2014).
The importance of tracking sufficient individuals to capture a stable picture of population-level space use has received considerable attention (Gutowsky et al., 2015; Hindell et al., 2003; Shimada et al., 2020; Soanes et al., 2013). However, another aspect pertinent to assessing the representativeness of a tracking sample is the potential for a population to change distribution from year to year. A number of studies have investigated the importance of annual (among-year) variability in seabird foraging areas, reporting both minimal and substantial shifts (Bogdanova et al., 2014; Cerveira et al., 2020; Fromant et al., 2021; Meier et al., 2015; Osborne et al., 2020). Global standards for identifying important sites for biodiversity, such as the KBA Standard, account for annual variation by setting minimum thresholds for the number of years of distribution data that are required from the population to delineate important sites (e.g., three years for KBAs; KBA Standards and Appeals Committee, 2020). However, whether such universal thresholds are appropriate for different species and environmental contexts is uncertain. Given the substantial financial and labor costs involved in tracking animals in remote locations, understanding whether multi-year sampling is necessary to identify stable sites of importance can help ensure that the potential of available tracking data for informing conservation is realized (Canessa et al., 2015; Williams et al., 2020). To date, few studies have compared annual consistency in population-level space use between seabird species, limiting our understanding of the relative importance of multi-year sampling for informing conservation planning (but see Arcos et al., 2012; Carpenter-Kling et al., 2020; Evans et al., 2021).
Here, we analyzed movement tracks from 23 seabird species to investigate the importance of annual variability in space use for area-based conservation. We analyzed the distributions of each species during chick-rearing, as this is the time of year when seabirds are most spatially restricted and therefore when area-based management measures can be particularly effective (Oppel et al., 2018). Indeed, tracking data collected during chick-rearing are often used to identify priority areas for conservation (Handley et al., 2021; Heerah et al., 2019). To estimate the average degree of consistency across years for each species, we quantified the spatial similarity between distributions from different years, and explored whether taxonomy, type of foraging habitat, and latitude could explain inter-specific variation therein. We investigated the influence of sampling regimes on important site identification by performing two resampling procedures. First, we held the number of tracked individuals constant and quantified the degree to which tracking across multiple years can provide a fuller representation of the population distribution compared to a sample from a single year. Second, for each species, we varied both the number of tracks and the number of years to understand the contribution of each sampling level to the population distribution. By assessing the relevance of spatial consistency across years for identifying important sites for seabirds differing widely in morphology and lifestyle, our results inform the design of future tracking studies and support ongoing efforts to improve area-based conservation planning at sea.

Study species and data assembly
We compiled Global Positioning System (GPS) tracking data using several selection criteria to ensure comparability across species and sufficient sample sizes to allow for rigorous testing of sampling effects (Table 1). We solicited datasets meeting the following criteria: at least four years of GPS data with a minimum of 10 birds tracked in each year, where all birds were tracked from the same breeding colony and during the chick-rearing stage (alias 'chick-provisioning'). In total, we collated tracking data for 23 species, representing seven of 14 families, and four of seven orders of seabirds. Collated tracking data came from 3 to 6 different breeding years for each colony, and sampled years were 1-17 years apart (median difference 3 years; Table 1). For two species, Australasian gannet (Morus serrator) and little penguin (Eudyptula minor), we had sufficient data from two colonies; for all other species, we analyzed data from a single colony (Table 1). When producing population distributions (see Population distribution estimation section), we treated each species-colony dataset separately, and then took the mean across colonies when comparing species-level metrics (e.g., annual overlap). Although all datasets were from breeding adults during chick-rearing, the relative timing in terms of the age of the brood (i.e., whether adults were brooding chicks or not) differed in some cases between species (see Table S2 for breeding-stage coverage). Nevertheless, for 104 out of 106 annual datasets (98 %), tracking was initiated during the first half of the chick-rearing period.

Tracking data standardization
We cleaned and filtered tracking datasets for each species and study colony to improve comparability. First, we applied speed filters, with thresholds set to include only biologically realistic travel speeds (Table S1; Adams and Flora, 2010; Baylis et al., 2019). Then, using the R package track2KBA (Beal et al., 2021), we split tracks from individual birds into discrete foraging trips, which we defined as periods of a minimum duration spent outside a spatial buffer around the breeding colony. Tracking data for each population were inspected to set an appropriate buffer radius that would exclude GPS-locations in the vicinity of the colony, likely representing periods spent at the nest, resting on land, or rafting nearby (Fig. S3). To further standardize comparisons and reduce the effect of extended tracking of some species and not others, we only analyzed data from trips initiated within the first two weeks of device deployment for each individual bird. We used a threshold of two weeks to reduce any effects of advancing season on space use and to minimize the likelihood of including data for the same individual when its breeding stage or status had changed (e.g., as it transitioned from early to late chick-rearing).

Table 1. Summary of GPS-tracking data used in this study to analyze annual consistency in the space use of seabirds. 'n years' indicates the number of years in which tracking data were collected, over a range of study years (i.e., 'Year range'). 'n birds' refers to the median number of individuals tracked per year and 'n trips' to the median number of foraging trips recorded per year, with the ranges shown in parentheses. Species are arranged alphabetically by taxonomic order (Charadriiformes to Suliformes) and family.
Tracking data were originally collected at varying sampling intervals (Fig. S1), so to improve comparability, we regularized data to a 10 min interval via linear interpolation using the R package adehabitatLT (Calenge, 2006). Although many year-datasets were of a higher temporal resolution, we used a standard 10 min interval to ensure that most datasets (102 of 105 [98 %]) had a ratio of interpolated to raw points below 2:1. European shag (Gulosus aristotelis) and pelagic cormorant (Urile pelagicus) perform short foraging trips and therefore interpolation to 10 min would result in too few location estimates (<10) to run kernel density estimation for >5 % of all foraging trips; to avoid losing these shorter-duration trips, we instead interpolated data for these species to 5 min intervals.
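The regularization step can be illustrated with a minimal Python sketch of linear interpolation onto a fixed time grid. This is not the adehabitatLT routine used in the study; the function name `regularize`, the tuple layout, and the assumption of planar (projected) coordinates are hypothetical choices made for illustration.

```python
def regularize(track, step_s=600):
    """Linearly interpolate a track onto a regular time grid.

    track: list of (time_s, x, y) fixes sorted by strictly increasing time,
           with at least two fixes; x and y are assumed planar coordinates.
    step_s: target interval in seconds (600 s = 10 min).
    """
    times = [fix[0] for fix in track]
    out = []
    j = 0
    t = times[0]
    while t <= times[-1]:
        # Advance to the raw segment [times[j], times[j + 1]] that contains t.
        while j < len(track) - 2 and times[j + 1] < t:
            j += 1
        t0, x0, y0 = track[j]
        t1, x1, y1 = track[j + 1]
        w = (t - t0) / (t1 - t0)  # fractional position within the segment
        out.append((t, x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
        t += step_s
    return out


# Three raw fixes 15 min apart, regularized to 10 min.
raw = [(0, 0.0, 0.0), (900, 9.0, 0.0), (1800, 9.0, 9.0)]
regular = regularize(raw)
ratio = len(regular) / len(raw)  # interpolated-to-raw ratio for this track
```

In this toy track the ratio of interpolated to raw points is 4:3, which would fall under the 2:1 ceiling described above.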

Population distribution estimation
To estimate space use, we used kernel density estimation (KDE) to derive utilization distributions (UD) for each foraging trip (hereafter referred to as 'trip UDs'), which is a method often used when analyzing tracking data to inform conservation (Beal et al., 2021; Lascelles et al., 2016; Soanes et al., 2016). Determining an appropriate smoothing parameter, or bandwidth, is an important step in KDE, as it determines the scale at which the data points are smoothed. When comparing species that move at similar scales, authors often recommend using the same parameter value, to avoid introducing differences as an artifact of processing (Carneiro et al., 2020). However, the movement scales of the species in this study ranged from <5 km to >700 km in terms of maximum range from the colony. Therefore, applying a standard smoothing value across species would either over-smooth data at a scale larger than the maximum range, or under-smooth the data and result in no overlapping use-areas. To achieve a similar degree of smoothing for each species, and thereby make overlap estimates comparable, we calculated the reference smoothing parameter (href), which reflects the number of positions and their spatial variance in the X and Y directions (i.e., longitude and latitude) and is a typical smoother used for identifying important sites for biodiversity (Beal et al., 2021). To check for smoothing values that were outliers (and would thereby result in inflated or underestimated distributions), we fitted a second-order polynomial function to smoothing value vs. species rank, ordered by foraging range (calculated as the median of the maximum distance from the colony for each trip). Next, for species with an href value that deviated >5 km from the value predicted by the model, the prediction was used instead of the actual value to set the smoothing parameter systematically, i.e., relative to the ranked scale of movement of each species (Fig. S2).
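The bandwidth procedure can be sketched in Python as follows. The `href` function uses one common definition of the reference bandwidth, h = σ n^(−1/6) with σ² the mean of the variances in X and Y; that this exact form matches the authors' implementation is an assumption. The helper names (`fit_quadratic`, `regularize_bandwidths`) and the toy values are hypothetical.

```python
import statistics


def href(xs, ys):
    """Reference bandwidth, assuming h = sigma * n**(-1/6) with
    sigma**2 = 0.5 * (var(x) + var(y))."""
    n = len(xs)
    sigma = (0.5 * (statistics.variance(xs) + statistics.variance(ys))) ** 0.5
    return sigma * n ** (-1 / 6)


def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(x, y):
    """Least-squares coefficients (c0, c1, c2) of y = c0 + c1*x + c2*x**2,
    obtained by solving the normal equations with Cramer's rule."""
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    a = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    d = det3(a)
    coefs = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = t[r]
        coefs.append(det3(m) / d)
    return coefs


def regularize_bandwidths(h_by_rank, tol_km=5.0):
    """Replace any bandwidth deviating more than tol_km from a second-order
    polynomial fitted to bandwidth vs. species rank (1 = shortest range)."""
    ranks = list(range(1, len(h_by_rank) + 1))
    c0, c1, c2 = fit_quadratic(ranks, h_by_rank)
    out = []
    for r, h in zip(ranks, h_by_rank):
        pred = c0 + c1 * r + c2 * r * r
        out.append(pred if abs(h - pred) > tol_km else h)
    return out


# Toy bandwidths (km) for nine species ranked by foraging range; rank 5 is inflated.
h_values = [float(r * r) for r in range(1, 10)]
h_values[4] += 20.0
adjusted = regularize_bandwidths(h_values)
```

Because the outlier itself contributes to the fit, a very large deviation can pull the curve far enough that neighboring species also cross the 5 km tolerance. In this toy example only the rank-5 value is replaced.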
We assessed whether the parameter selection process imposed a pattern on the results by including the smoothing parameter in a linear mixed-model framework along with the various factors that might explain annual consistency as predictors (see Section 2.4.1, Fig. S7).
For each species and colony, we produced single-year population distributions by averaging together, with equal weighting, trip UDs from all birds tracked in each year (Fig. 1, Fig. S3). If multiple trips were available for each individual, we randomly selected a single trip UD per individual and year. This process of selecting a trip per individual was repeated in each analysis to maximize usage of available information while accounting for potential pseudoreplication (see Overlap analyses section for details; Lascelles et al., 2016). We then generated multi-year reference distributions (hereafter the 'reference distribution') for each species by averaging together single-year distributions across years (range 3 to 6 single years). We averaged single-year distributions (i.e., took the mean of grid cell probability densities), rather than combine all trip UDs from across the multi-year dataset, to avoid years with higher sample sizes contributing disproportionately to the shape of the reference distribution (Fig. 1).
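The two-level averaging can be sketched in Python with UDs represented as flat lists of grid-cell probability densities. The function names and the nested-dictionary layout (year, then bird, then list of trip UDs) are hypothetical, and the toy grid has only two cells.

```python
import random


def mean_ud(uds):
    """Cell-wise mean of equally weighted UDs defined on the same grid."""
    n = len(uds)
    return [sum(cells) / n for cells in zip(*uds)]


def single_year_ud(trips_by_bird, rng):
    """Pick one trip UD at random per bird, then average across birds."""
    chosen = [rng.choice(trips) for trips in trips_by_bird.values()]
    return mean_ud(chosen)


def reference_ud(trips_by_year, rng):
    """Average the single-year UDs so each year carries equal weight."""
    return mean_ud([single_year_ud(birds, rng) for birds in trips_by_year.values()])


rng = random.Random(1)
trips_by_year = {
    2015: {"a": [[1.0, 0.0]], "b": [[1.0, 0.0]], "c": [[1.0, 0.0]]},  # 3 birds
    2016: {"d": [[0.0, 1.0]]},                                       # 1 bird
}
reference = reference_ud(trips_by_year, rng)          # years weighted equally
pooled = mean_ud([[1.0, 0.0]] * 3 + [[0.0, 1.0]])     # trips weighted equally
```

Here `reference` is `[0.5, 0.5]` whereas pooling all trips gives `[0.75, 0.25]`, which illustrates how averaging by year keeps the better-sampled year from dominating.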

Overlap analyses

2.4.1. Annual consistency
To estimate the degree of annual consistency in space use for each population, we calculated the overlap between each pair of single-year distributions (Table 2). Overlap was calculated using Bhattacharyya's Affinity (BA) and Volume of Intersection (VI) indices, which both provide probabilistic measures of UD similarity, ranging from 0 (no overlap) to 1 (identical UDs) (Fieberg and Kochanny, 2005). We report the mean BA index values for the main analysis, as this is the recommended index for comparing UD similarity (Fieberg and Kochanny, 2005). As VI integrates over the minimum probability density of each cell between the two UDs being compared, overlap values are generally lower than BA (Kochanny et al., 2009). We used the mean VI overlap values to validate whether the relative differences between species were sensitive to the chosen index. The trip UDs contributing to each single-year distribution were randomly re-sampled across 100 iterations (i.e., one trip selected per individual per iteration), to ensure that different trips from a given individual would be included, and pairwise annual overlap calculated. For datasets with fewer than 50 unique combinations of trip UDs, unique sets of UDs were determined for each iteration (e.g., with only 1 trip per bird, only 1 iteration was run). We then calculated the mean overlap for each pairwise comparison across iterations (Fig. S4), and plotted mean annual overlap by species, family and foraging habitat to illustrate intra- and inter-specific variation in consistency. We classified species as foraging in predominantly 'shelf', 'oceanic' or 'mixed' habitat by inspecting the distribution of foraging trips of each population overlaid on bathymetry (see Supplement for details and Table S1 for classification).
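The two indices can be written out in a short Python sketch for UDs discretized on a common grid, where each UD is a list of cell probabilities summing to 1. In this discrete form BA is the sum over cells of sqrt(p·q) and VI is the sum of min(p, q). The function names and toy values are hypothetical.

```python
import math
from itertools import combinations


def bhattacharyya_affinity(p, q):
    """BA index for two discretized UDs whose cell probabilities each sum to 1."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def volume_of_intersection(p, q):
    """VI index: shared probability volume, summing the cell-wise minimum."""
    return sum(min(a, b) for a, b in zip(p, q))


def mean_pairwise_overlap(uds_by_year, index):
    """Mean of an overlap index over all pairs of single-year UDs."""
    pairs = list(combinations(sorted(uds_by_year), 2))
    return sum(index(uds_by_year[a], uds_by_year[b]) for a, b in pairs) / len(pairs)


# Toy two-cell UDs for three years.
yearly = {2014: [0.8, 0.2], 2015: [0.2, 0.8], 2016: [0.8, 0.2]}
mean_ba = mean_pairwise_overlap(yearly, bhattacharyya_affinity)
mean_vi = mean_pairwise_overlap(yearly, volume_of_intersection)
```

For the 2014 and 2015 pair, BA is 0.8 and VI is 0.4, consistent with VI generally returning lower values than BA for the same pair of UDs.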
We fitted a linear mixed-effects model using the R package lme4 (Bates et al., 2015) to evaluate which predictor variables explained significant amounts of variation in mean overlap among years, with variables either representing (1) experimental design choices (i.e., smoothing parameter definition, time lags), or (2) biological factors (habitat type, latitude, taxonomy). As we lacked sufficient data to properly control for phylogeny in this analysis, the purpose was not to predict distributional consistency across seabirds in general, but to explain the variation in our dataset. We specified random intercepts for each species and site to account for foraging patterns specific to species or locations, and evaluated the following fixed factors: smoothing parameter, lag (time difference between single-year distributions), taxonomic family, foraging habitat (oceanic, shelf, mixed), and foraging latitude (centroid of at-sea locations). The full model specification and results of this analysis are located in the Supplementary Materials.

Single-year vs. multi-year sampling
To test whether sampling across years provides more robust estimates of population distributions than sampling in a single year, we compared the spatial coverage of reference distributions (i.e. distributions of each population estimated using data from all individuals and years) with samples of equal size (i.e., the same number of individuals) drawn from varying numbers of years (Table 2). For this analysis, we limited comparisons to a maximum of three years to ensure comparability across species. We iteratively re-sampled n trip UDs from one, two, or three years, and averaged them together to form a sample distribution. The number n was determined by the third-highest number of individuals tracked in any given year (m) for each species, and was set at n = m − 2 to ensure that many more than m combinations of trip UDs were possible. For example, if a species had four years of data, with yearly sample sizes of 10, 12, 14, and 16 birds, then the third highest year-sample would be m = 12 individuals, and n = 10 trip UDs would be drawn per iteration, allowing for a total of 66 unique combinations of trip UDs to be drawn. The third-highest sample size was used (as opposed to the first- or second-highest) to ensure that at least three different years could contribute data when drawing data from a single year. In each iteration, we calculated the percentage of the 50 % and 95 % UD areas of the reference distribution covered by sample UDs of the same level (i.e., 50 % and 95 % sample UDs).
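The worked example above can be reproduced with a few lines of Python. The function name `draw_size` and its parameters are hypothetical; the rule itself (third-highest yearly sample size minus two) is the one described in the text.

```python
from math import comb


def draw_size(yearly_counts, rank=3, margin=2):
    """Return (m, n): m is the rank-th highest yearly sample size and
    n = m - margin is the number of trip UDs drawn per iteration."""
    m = sorted(yearly_counts, reverse=True)[rank - 1]
    return m, m - margin


m, n = draw_size([10, 12, 14, 16])
combos = comb(m, n)  # unique sets of n trip UDs available in the year with m birds
```

With yearly sample sizes of 10, 12, 14, and 16 this gives m = 12, n = 10, and 66 unique combinations, matching the figures in the text.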
For this analysis, we calculated simple spatial overlap (i.e., directional measure of the percent spatial coverage of the reference distribution by a sample distribution) to ensure the effect sizes were interpretable in terms of area, which is more useful for the practicalities of management than probability values (e.g., BA or VI indices). Sample distributions were generated and overlap calculated over 100 iterations, wherein trip UDs were randomly re-sampled in each iteration. We used Tukey HSD post-hoc tests (of ANOVAs) to test whether sample distributions based on data drawn from a single year covered a different amount of the reference distribution compared with distributions based on samples from two or three years. We report the mean model effect size (i.e., average difference in the percentage of the reference distribution covered by two and three years of data) for species-years, which represents the predicted spatial information gained by sampling a population in more than one year.

Table 2. Analysis workflow table illustrating the three analyses of spatial overlap conducted in this study. Each column corresponds to an analysis based on the overlap of spatial distributions of 23 species of GPS-tracked seabirds during the chick-rearing period. In each analysis, pairwise spatial overlap was calculated between distributions using either a probabilistic (Bhattacharyya's Affinity and Volume of Intersection) or Euclidean (% coverage) metric of overlap. In the Annual consistency analysis, single-year distributions of each population were compared, each of which included foraging trips from all birds tracked in each year. In the Single-year vs. multi-year sampling analysis, sample distributions were created by drawing varying numbers of trips from the full tracking sample and then overlaid on the multi-year reference distribution of each population to assess the degree of coverage. In N tracks vs. N years, both the number of trips and number of years were varied when forming sample distributions, which were again compared to the reference distribution (based on data from all birds and years). In all three analyses, spatial distributions (single-year, sample, and reference) were estimated by randomly selecting a single foraging trip from each individual bird, running kernel density estimation with the selected trips, and repeating this process over 100 iterations.

Number of tracks vs. number of years
We varied both the sample size (i.e., number of birds tracked) and the number of years from which the sample was drawn to illustrate the relative importance of each sampling level for capturing the multi-year reference distribution (Table 2). We used the same re-sampling procedure as in Single-year vs. multi-year sampling to calculate the percentage of overlap between sample and reference distributions, and iterated the process 100 times at each sample size to include different trip UD combinations. The maximum number of birds drawn for a species was again capped at n = m − 2, but in this case m was set at the smallest sample size available among year-samples to be able to visualize the relationship up to the maximum number of years available. For example, for a species with four years of data and yearly sample sizes of 10, 12, 14, and 16 birds, sub-samples up to 8 birds would be iteratively selected, averaged together, and overlapped with the reference distribution. Both here and in Single-year vs. multi-year sampling the proximity to 100 % coverage of the reference distribution is a measure of sample representativeness. However, the focus of these analyses is rather on the difference in coverage of the full distribution provided by samples of varying sizes (i.e., 1 to n birds) and from differing numbers of years (i.e., 1 to N years), as this indicates whether annual variability influences the estimation of population distributions.
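The percent-coverage metric used in both resampling analyses can be sketched in Python. Here a UD is a flat list of non-negative cell densities on a common grid, and the isopleth area at a given level is taken as the smallest set of highest-density cells whose cumulative density reaches that share of the total. The function names, the inclusion of the cell that crosses the threshold, and the toy values are assumptions made for illustration.

```python
def isopleth_cells(ud, level):
    """Indices of the highest-density cells that together hold `level`
    (e.g., 0.5 or 0.95) of the total density of the UD."""
    target = level * sum(ud)
    order = sorted(range(len(ud)), key=lambda i: ud[i], reverse=True)
    cells, cum = set(), 0.0
    for i in order:
        if cum >= target:
            break
        cells.add(i)
        cum += ud[i]
    return cells


def percent_coverage(sample_ud, reference_ud, level):
    """Percentage of the reference isopleth area covered by the sample
    isopleth area at the same level (a directional measure)."""
    ref = isopleth_cells(reference_ud, level)
    sam = isopleth_cells(sample_ud, level)
    return 100.0 * len(ref & sam) / len(ref)


# Toy four-cell grids (unnormalized densities).
reference = [40, 30, 20, 10]
sample = [60, 5, 30, 5]
cover_50 = percent_coverage(sample, reference, 0.5)

# Cap on birds drawn in this analysis: smallest yearly sample size minus two.
max_birds = min([10, 12, 14, 16]) - 2
```

In this toy case the 50 % reference area spans two cells and the sample's 50 % area covers one of them, giving 50 % coverage. The measure is directional: sample cells lying outside the reference area do not count against it.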
We visualized the percent coverage relationship for each species and site to provide guidance on the information gained by tracking populations across a varying number of years. We present three examples from species which showed contrasting degrees of consistency and provide the results for the remaining species in the Supplement.

Annual consistency
Estimates of annual consistency measured by BA overlap among single-year distributions were generally high (Fig. 2A), with 22 of 23 species falling between 0.69 (mean, SD 0.1) for pelagic cormorant (Urile pelagicus), and 0.93 (mean, SD 0.02) for chinstrap penguin (Pygoscelis antarcticus); common diving petrel (Pelecanoides urinatrix) was an outlier, with a mean consistency estimate of 0.56 (SD 0.27) (Fig. 2A). As expected, overlap values calculated using the VI index were lower, with species-level mean consistency estimates ranging from 0.35 (SD 0.22) to 0.74 (SD 0.06) (Fig. S5). The species ranks were very similar between the BA and VI indices, with a significant positive correlation in species ranks ordered by mean overlap (Spearman rank correlation: S = 60, rho = 0.97, p < 0.001, Fig. S6).
Our mixed model exploring which factors explained variation in annual consistency showed that the smoothing parameter did not affect the population-level estimate of consistency (Fig. S7, Table S3). Time lag between years was negatively related to consistency; however, the effect size was marginal relative to within- and among-species variation at the time scales (1-4 years) analyzed for most species (Fig. S8). Despite differences between families in terms of consistency (Phalacrocoracidae significantly lower than Spheniscidae, Table S3), within-family variation (i.e., combination of inter- and intra-specific components) was greater than between-family variation (Fig. 2B). There was no effect of the predominant foraging habitat type on annual consistency (Fig. 2B, Table S3). Foraging latitude was positively related to consistency; however, the effect size was also marginal and the latitudinal range covered within most families was limited (Supplementary Methods, Fig. S9, Table S3).
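As a quick arithmetic check, the reported rho can be recovered from the test statistic, assuming S is the sum of squared rank differences between the two indices and that the 23 species ranks contain no ties (both are assumptions about how the statistic was computed). The function name is hypothetical.

```python
def spearman_rho_from_s(s, n):
    """Spearman's rho from S = sum of squared rank differences, assuming no ties."""
    return 1 - 6 * s / (n * (n ** 2 - 1))


rho = spearman_rho_from_s(60, 23)  # S = 60 across n = 23 species
rho_rounded = round(rho, 2)
```

Under these assumptions rho rounds to 0.97, in agreement with the value reported above.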

Single-year vs. multi-year sampling
Using ANOVA post-hoc tests, we assessed whether distributions derived from samples of the same number of individuals drawn from a single year or multiple years differed in the percent coverage of the multi-year reference distribution. We found that multi-year sample distributions covered a significantly higher percentage of reference distributions in 17 and 16 species (n = 23 species) for 95 % and 50 % UD areas, respectively (Fig. 3, Table S4). For 95 % sample UDs, the mean difference in coverage across species was 5.0 % and ranged from a species mean of −0.3 % in masked boobies (Sula dactylatra) to 17.7 % in common diving petrels. For 50 % sample UDs, the mean difference in coverage was 6.1 %, and ranged from −1.0 % in common murre (Uria aalge) to 21.7 % in common diving petrels (Table S4). Three species, wandering albatross (Diomedea exulans), common murre and streaked shearwater (Calonectris leucomelas), showed no significant difference in coverage between single-year and multi-year samples for either 95 % or 50 % sample UDs. In six other species, Laysan albatross (Phoebastria immutabilis), Buller's albatross (Thalassarche bulleri), chinstrap penguin (Pygoscelis antarcticus), Westland petrel (Procellaria westlandica), masked booby (S. dactylatra) and red-footed booby (S. sula), there were differences for either the 95 % or the 50 % sample UDs, but not both (Fig. 3, Tables S4-S5).

Number of tracks vs. number of years
For all 23 species, increasing the number of birds tracked increased the coverage of the multi-year reference distribution (Fig. 4, see supplementary Figs. S10-S31 for all species). Sample distributions based on the smallest sample size available among years for each species covered a mean of 62.8 % (min = 45.9 %, max = 79.6 %, n = 23) and 66.3 % (min = 40.3 %, max = 86.6 %, n = 23) of reference distributions based on all tracks for a species, for 95 % and 50 % UDs respectively. Additionally, for many species, the relationship between sample size and coverage appeared to begin levelling off, indicating that the sample sizes in our datasets captured a large portion of the population-level spatial distribution (Figs. S10-S31).

Fig. 3. Comparison of the degree to which tracking samples from differing numbers of years capture full multi-year population distributions for 23 seabird species. For each species, a certain number of foraging trips, indicated by the number above the x-axis, were re-sampled 100 times. In each run, a sample distribution was derived, by averaging together the selected trip utilization distributions, and overlapped with a reference distribution made from all tracked birds and years of tracking data. Tukey HSD post-hoc tests of ANOVAs were used to test whether sample distributions from a single year (N year = 1) differed in their proportional cover of the reference distribution compared to samples drawn from multiple years (N year = 2 or 3); bold text at the top of each panel indicates where at least one test comparison was significant (1-2 y or 1-3 y), with the value indicating the mean effect size between comparisons. Species were ranked according to the mean effect size for the 50 % utilization distribution. Panels correspond to the coverage for 95 % (upper) and 50 % (lower) utilization distributions.

Increasing the number of years in which the population was tracked added spatial information for most species (in addition to that gained by tracking more birds), but lacked a marked gain in some (Figs. S10-S31). As indicated by the estimates of average spatial overlap between years (Fig. 2A), the relative importance of sampling across years differed among species, with the benefit of multi-year samples most apparent for common diving petrels, and negligible for Northern gannets (M. bassanus), highlighted here to show the variation among species (Fig. 4).

Effects of study duration
Using GPS-tracking data, we estimated the spatial consistency of 23 seabird species during the chick-rearing period to assess the relevance of sampling across years for identifying important sites for conservation at the population level. Most species showed a similarly high degree of average consistency in their distributions between years. We also found that the number of individuals tracked had a large effect on the estimation of population distributions, supporting previous findings in seabirds (Gutowsky et al., 2015; Soanes et al., 2013; Thaxter et al., 2017). By contrast, the time lag between years, taxonomic family, and latitude explained only small amounts of variation in annual consistency among the populations studied here. We found that tracking seabirds in multiple years generally improved estimates of population-level distributions, although in most cases there was only a marginal loss of information about space use by the population if tracking data were only available for one year. These results indicate that tracking samples deemed representative of the wider population during the sampling period can be useful for informing the area-based conservation of seabirds, even if only collected in a single year.
When using tracking data to identify important sites for seabirds, our results suggest that a key consideration is whether the sample of tracked birds is representative of the population distribution during the sampling period. If the tracking data available for a population are limited to one or two years, we further recommend using independent information to assess whether conditions were typical of the region and season. Important sites identified from data collected during periods of abnormal conditions have relevance for conservation, as they may represent places used when regular feeding areas are unprofitable (Bogdanova et al., 2014). However, in such cases, tracking data from additional years will likely be needed to also identify areas used under typical environmental conditions.

Temporal and spatial scale
The datasets we analyzed were collected in 3 to 6 different years, sometimes in sequential years and in other cases collected >10 years apart (Table 1, Fig. S4). Climatic cycles, such as the El Niño-Southern Oscillation, often operate at a decadal scale and are known to affect the spatial distributions of marine predators and their prey (Ballance et al., 2006; Philander, 1983). Therefore, it is possible that the sample of years for which we had tracking data for our study populations was insufficient to fully capture distributional changes in response to environmental cycles. To identify links between at-sea distribution, environmental conditions, and demographic responses, it is necessary to monitor populations for the length of a generation, at minimum (Ventura et al., 2021). Indeed, as tracking studies begin to extend over decadal scales, they are revealing responses of populations to shifts in climatic conditions (Bogdanova et al., 2014; Clark et al., 2021; Weimerskirch et al., 2014).

Fig. 4. Sampling effects on estimation of population-level distributions for three species of seabirds (left to right: common diving-petrel, grey-headed albatross, Northern gannet). By re-sampling the number of tracks (i.e., one per bird) and the number of years from which tracks were selected, the functional relationship between sample size and annual sampling was estimated for each species. The process of re-sampling tracks was iterated 100 times, and in each iteration the percentage to which the 95 % (top row) and 50 % (bottom row) probability areas of the resulting sample distribution covered the same quantile areas of a multi-year reference distribution was calculated. Reference distributions were the average of all single-year distributions derived using the full samples for each species. Points signify the mean spatial coverage across iterations, and bars denote the standard deviation.
Although such long-term monitoring is clearly useful and desirable, conservation is often constrained financially, by the duration of opportunities to implement effective management, and by the imminent nature of threats (Bolam et al., 2019). As such, pragmatic approaches are often necessary to help decide how much tracking data are sufficient to inform decision-making (see Section 4.3).
A number of studies conducted at timescales comparable to those explored here identified shifts in space use at the population level in response to oceanographic variability (Bogdanova et al., 2014;Evans et al., 2021;Osborne et al., 2020). Indeed, the dataset we analyzed from common diving-petrels was used to illustrate the effects of a marine heatwave on breeding success and at-sea distributions, which explains the low consistency and large effect of adding years of data that we report for this species (Fromant et al., 2021). This specific case illustrates how tracking data collected over just a few years can provide important information about the areas used during stressful climatic events, when effective protection may be particularly important for populations (Bogdanova et al., 2014). In contrast, for black-legged kittiwakes we found only small changes in space use across four years for a population in England, during the same period in which a population in the Gulf of Alaska shifted its distribution to largely new areas in response to a local heatwave (Osborne et al., 2020). These contrasting patterns within the same species suggest that the spatial consistency of seabird populations can depend more on local environmental conditions than on taxonomy or general foraging behavior.
Spatial scale is a key parameter to consider in analyses of overlap (Winner et al., 2018). Here, we determined appropriate scales of analysis using a standard approach recommended for identifying important sites for biodiversity (i.e., the 'href' method; Beal et al., 2021), adapted to ensure comparability among species. Nevertheless, it is important to recognize that our results correspond to species-specific scales, and are thereby not universal reflections of the spatial consistency of seabirds in general. For analyses aimed at identifying important sites for a single population, other factors should also be considered, such as the size of local management units, potentially requiring the use of different smoothing parameter values or grid cell sizes than those we employed (Soanes et al., 2015). Further, for investigations focused on ecological questions, analyses at finer scales may uncover subtle shifts in space use at the population level (Warwick-Evans et al., 2016).

Three-year rule for KBA designation
As area-based management tools are being increasingly integrated into global marine conservation policy (De Santo, 2018), the criteria used to designate sites as important for biodiversity are becoming standardized across countries and taxa (e.g., the KBA program; IUCN, 2016). Our results for chick-rearing seabirds suggest that setting standard thresholds of the number of years of distribution data needed to identify sites offers little advantage in many cases, particularly where a representative sample of individuals has been tracked in typical conditions. Given the high expense involved in tracking marine animals in remote locations, the requirement to track a population in three years may be impractical, and ultimately to the detriment of conservation if important sites might otherwise have been identified from data from one or two years, and protective measures put in place sooner.
Sites identified for their importance to biodiversity are meant to be re-assessed over time (e.g., every 8-12 years for KBAs). This adds to the costs involved in identifying important sites by requiring that three years of data be collected over a relatively short period and the process repeated roughly every decade. Although standard thresholds, such as the KBA 'three-year rule', are designed to ensure temporal robustness of sites, more flexible solutions might better enable sites to be identified where they are needed, not just in countries where there is funding for long-term studies. For example, instead of requiring that a certain number of years of tracking data be used, site identification guidelines could incorporate estimates of the representativeness of tracking samples at the population level, or the vulnerability of a population to extreme weather events such as heatwaves. Existing tools, such as the R packages track2KBA and SDLfilter, can be used to assess the representativeness of a tracking sample for the period in which it was collected (Beal et al., 2021;Shimada et al., 2020). Further, when the available tracking data are limited to one or two years, data on breeding success, or climatological and oceanographic information could be used to evaluate whether conditions were anomalous. In practice, this could range from basing assessments on expert judgement to combining various data sources in an integrated analytical framework (Ventura et al., 2021).

Recommendations for future studies
We analyzed tracking data from seabirds collected during the chick-rearing phase; therefore, a useful extension of this work would be to analyze distributions during other phases, including the pre-laying, incubation and non-breeding periods, when the movements of many species are less constrained (Phillips et al., 2017). Lower movement constraints outside chick-rearing mean birds can travel further in search of food, potentially reducing spatial consistency between years. If spatial consistency varies across the annual cycle, then differing sample sizes of tracked birds would be needed to achieve representativeness, and ultimately to identify important sites (Carneiro et al., 2020).
Globally, tracking effort in seabirds is biased toward large-bodied and high-latitude species (Bernard et al., 2021;Mott and Clarke, 2018). Therefore, it is also important to include under-studied taxa (e.g., storm-petrels and terns) and regions (e.g., the tropics) in future analyses of spatial consistency in seabirds. The combination of datasets we analyzed contained insufficient species contrasts within and across taxonomic families to properly account for phylogenetic effects, limiting our ability to robustly identify general predictors of annual consistency in seabirds. As multi-year tracking data become available for more species, life-history classes and stages in the annual cycle, it will become possible to examine the effects of ecological drivers and seasonality in movement patterns in conjunction with intrinsic factors such as sex, age, status and breeding population (Carneiro et al., 2020;Jovani et al., 2016;Phillips et al., 2017). Finally, as tracking studies begin capturing decadal scales of movements, it will be important to identify the degree to which important at-sea sites remain relevant in the face of long-term climatic shifts (Weimerskirch et al., 2014).

Conclusion
We found high average levels of annual spatial consistency across a broad variety of seabirds during the chick-rearing period. In addition, we show that a vital consideration when applying tracking data to questions of area-based conservation is whether the sample of tracked individuals is representative at the population level. Our findings indicate that tracking chick-rearing seabirds across years improves the estimation of at-sea spatial distributions. However, in most cases the information gain we found was marginal, suggesting that representative samples collected in one or two years are also useful for the identification of important sites. When only one or two years of tracking data are available for important site identification, we further recommend using independent evidence to assess whether conditions were typical of the region and time of year. This work has relevance for the use of tracking data to inform marine area-based conservation, as the identification of important sites for seabirds is an integral step in the process of assessing threats at sea and designing an effective network of marine protected areas.

Declaration of competing interest
The authors report no competing interests.