Slicing: A sustainable approach to structuring samples for analysis in long‐term studies

The longitudinal study of populations is a core tool for understanding ecological and evolutionary processes. Long-term studies typically collect samples repeatedly over individual lifetimes and across generations. These samples are then analysed in batches (e.g. qPCR plates) and clusters (i.e. groups of batches) over time in the laboratory. However, these analyses are constrained by cross-classified data structures introduced biologically or through experimental design. The separation of biological variation from the confounding among-batch and among-cluster variation is crucial, yet often ignored. The commonly used approaches to structuring samples for analysis, sequential analysis and randomization, generate bias due to the non-independence between time of collection and the batch and cluster in which samples are analysed. We propose a new sample structuring strategy, called slicing, designed to separate confounding among-batch and among-cluster variation from biological variation. Through simulations, we tested the statistical power and precision of this novel approach to detect within-individual, between-individual, year and cohort effects. Our slicing approach, whereby recently and previously collected samples are sequentially analysed in clusters together, enables the statistical separation of collection time and cluster effects by bridging clusters together, for which we provide a case study. Our simulations show, with reasonable slicing width and angle, similar precision and similar or greater statistical power to detect year, cohort, within- and between-individual effects when samples are sliced across batches, compared with strategies that aggregate longitudinal samples or use randomized allocation. While the best approach to analysing long-term datasets depends on the structure of the data and the questions of interest, it is vital to account for confounding among-cluster and among-batch variation. Our slicing approach is simple to apply and creates the necessary statistical independence of batch and cluster from environmental or biological variables of interest. Crucially, it allows sequential analysis of samples and flexible inclusion of current data in later analyses without completely confounding the analysis. Our approach maximizes the scientific value of every sample, as each will optimally contribute to unbiased statistical inference from the data. Slicing thereby maximizes the power of growing biobanks to address important ecological, epidemiological and evolutionary questions.


| INTRODUCTION
Individuals and populations are shaped by ecological and evolutionary processes which generally occur over many years or decades (Clutton-Brock & Sheldon, 2010). Consequently, long-term studies are key in determining the proximate and ultimate causes of biological processes. Sampling a population repeatedly over individual lifetimes and across multiple generations allows quantification and separation of genetic variation from environmental variation and estimation of such effects with appropriate precision and statistical power (Martin, Nussey, Wilson, & Reale, 2011; van de Pol, 2012).
However, statistical analyses of such comprehensive biological datasets are often complex due to hierarchically structured data and the difficulty of separating variation due to sources of interest from confounding variables.
Due to the hierarchical nature of biology, for example, phenotypic traits nested within individuals, individuals nested within social groups and social groups nested within populations (Figure 1a), appropriate statistical methods are required that model the hierarchical structure of biological datasets. While nested designs, whether natural or created through experimental design (Figure 1a), can be analysed in linear models, this inflates the degrees of freedom and thus reduces statistical power (Gelman, 2005; Quinn & Keough, 2002; Underwood, 1997). A better approach is the mixed model framework, which estimates fixed effects while flexibly accounting for the variance explained by random effects, incorporating multilevel hierarchies in data (Bolker et al., 2009; Gelman & Hill, 2006; Snijders & Bosker, 2011; Zuur, Ieno, & Elphick, 2010). However, in cross-classified designs (Table 1), where one individual is associated with more than one batch (Figure 1b) or even more than one cluster (Figure 1c), more advanced statistical methods are required to estimate fixed effects and variance components than for nested designs (Schielzeth & Nakagawa, 2013). While cross-classified data structures in short-term studies are often the result of the experimental design (e.g. cross-fostering), in long-term studies the timing of the analyses of data often naturally leads to cross-classification of data (Figure 1b,c).
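To make this distinction concrete, the sketch below shows how nested and cross-classified structures are specified in the mixed model framework, using R and lme4; the data frame and all variable names are illustrative, not from any particular study.

```r
library(lme4)

# Hypothetical long-format data (all names illustrative): one row per
# laboratory measurement of a trait on a repeatedly sampled individual
set.seed(42)
d <- data.frame(individual = factor(rep(1:60, each = 4)),  # 4 samples each
                age        = rep(1:4, times = 60))
d$batch   <- factor(sample(1:12, nrow(d), replace = TRUE)) # e.g. qPCR plates
d$cluster <- factor((as.integer(d$batch) - 1) %/% 4 + 1)   # 4 plates/cluster
d$trait   <- rnorm(nrow(d)) - 0.06 * d$age                 # toy response

# Batches are nested within clusters, so they can use lme4's nesting
# shorthand; individuals are crossed with both, so they get their own term
m_nested <- lmer(trait ~ age + (1 | individual) + (1 | cluster/batch),
                 data = d)

# Fully crossed specification: equivalent here because batch labels are
# unique across clusters, and the form required when they are not
m_crossed <- lmer(trait ~ age + (1 | individual) + (1 | batch) + (1 | cluster),
                  data = d)
```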
In long-term studies, the individual-based collection of longitudinal data and biological samples from natural or laboratory populations produces large, continuously growing biobanks (Clutton-Brock & Sheldon, 2010). Through laboratory analyses, these biobanks provide information on, for example, individual telomere length (Boonekamp, Mulder, Salomons, Dijkstra, & Verhulst, 2014; Fairlie et al., 2016), serological values (Andraud, Casas, Pavio, & Rose, 2014; Telfer et al., 2008) and genetic variation (Berry, England, Marriott, Burridge, & Newman, 2012; Tollenaere et al., 2012). However, the laboratory analysis of samples from growing biobanks is often conducted on separate groups of samples over time (e.g. after each fieldwork season, each year or coinciding with grant cycles). Such a group of samples, a cluster, will be collectively analysed under similar conditions, but these conditions might differ between clusters (e.g. different analyst, machine or month). Samples within a cluster are often further subdivided into batches (e.g. qPCR plates) where, again, samples are analysed under similar conditions, but conditions may vary between batches (e.g. different reagents or day). While batches are nested within clusters, the continuous collection of samples in the field and intervals between laboratory analyses result in longitudinal samples from a single individual that may not be nested within batches or even clusters, causing cross-classified data structures in long-term studies (Gelman & Hill, 2006; Figure 1b,c).
Cross-classification of data induces variation that can be confounded with the independent variables of interest, which can reduce the ability to compare results across samples and draw reliable conclusions (Greenland, Robins, & Pearl, 1999; Schielzeth & Nakagawa, 2013). This is problematic if cross-classification is not explicitly accounted for, or there is not sufficient cross-classification to disentangle these sources of variation with high statistical power.
For example, temporal variation or, where multiple populations are studied, spatial differences in resource availability can be confounded with laboratory analysis when samples are analysed after each period of collection, resulting in a failure to separate the effects of resource availability and laboratory analysis on a response variable. The experimental design, and therefore the method by which samples are structured for laboratory analysis, determines whether such confounds can later be disentangled statistically.

KEYWORDS
ageing, biobank, cross-classified, long-term studies, mixed models, nested, slicing, telomeres

FIGURE 1 Schematic of nested and cross-classified data structures: (a) a nested design applied to laboratory analyses (left) and populations (right); (b) cross-classification of data among batches that is confounded by time of analysis; and (c) cross-classification common in longitudinal data in laboratory analyses across clusters. Black dashed delineation indicates nested structures, whereas red dashed delineation indicates cross-classified structures.

TABLE 1 Definitions of key terms

FIGURE 2 Schematic of three strategies to structure samples from the biobank. The sequential analysis strategy (a) can confound cluster and year, while randomization of multiple years within a cluster (b) prevents this confound but generates uncontrollable variation between clusters. The slicing approach (c) combines the advantages of these approaches and can be used to sequentially analyse growing biobanks while maintaining independence between cluster and associated variables. The biobank is sliced (e.g. by year), so that a set of continuously collected samples is analysed sequentially in each subsequent cluster. Each sample needs to be analysed only once, and different samples from the same slice are analysed across batches and clusters (e.g. years 4 and 5), which enables batch and cluster effects to be controlled for. Slicing width (frequency of new samples collected) and angle (degree of independence between slices) determine the level of statistical independence between clusters.

While relatively few studies report the approach used to structure samples into clusters, two main approaches are currently in use, and both are prone to confounding effects and cross-classified data structures. The first is sequential structuring of samples into clusters: analysing samples in clusters in the same order in which they were collected (e.g. by year). This approach may be used, for example, in physiological studies (e.g. Takizawa et al., 2004) and has the advantage that samples can be analysed immediately, without any issues in placing or labelling samples. However, sequential structuring confounds cluster effects with those of the organizing variable (e.g. year; Figure 2a). The second approach, randomization of samples from multiple years within a cluster, ensures that samples are sufficiently mixed to avoid confounds, and should already be standard practice (Figure 2b). Randomization is widespread in, for example, telomere length (e.g. Spurgin et al., 2017), disease (e.g. Swanson et al., 2015) and hormone analyses (e.g. Dantzer et al., 2013). However, this approach requires a delay before analyses can be completed, so that samples collected at different time points can be analysed together and organizing-variable and cluster effects can be separated. Furthermore, the randomization of large numbers of samples is time-consuming, and the detailed reordering of samples from the biobank is prone to labelling and placing errors. Most importantly, after applying this randomization approach once in a long-term study, any subsequently collected samples cannot be directly compared to the previously randomized samples, as they will be subject to statistically inseparable variation due to the clustering of the samples already analysed. For example, randomizing two periods of 4 years of sampling separately into two clusters generates uncontrollable variation between these clusters and confounds the first 4 years in cluster one with the subsequent years in cluster two (Figure 2b), leading to cross-classified data structures (Figure 1c). Analysing the same samples, often referred to as 'golden' or 'reference' samples, multiple times in subsequent clusters can avoid this issue. However, the additional costs or potential depletion of the 'golden' sample can make this approach difficult. More importantly, it is unclear how effectively one golden sample can control for among-batch and among-cluster variation. For example, the 'golden' sample might not be representative of all samples, and the sample can degrade over time, thus not returning the same value in different analyses. In short, these two popular approaches to structuring cross-classified samples do not fully account for among-cluster and among-batch variation, leaving an unknown amount of variance unquantified and thus compromising conclusions drawn from such studies.
The analysis of longitudinal data can be turned into a nested design when samples from a single individual are aggregated within a batch and cluster (Figure 1a). This is thought to increase the statistical power to detect within-individual effects: longitudinal samples are then exposed to the same technical noise, which should allow the biology to be dissected from batch effects with greater statistical power.

Here, we present an approach to the analysis of samples from growing biobanks that, while maintaining statistical independence, accounts for among-cluster variation and controls for other potentially confounding effects (Figure 2c). Additionally, we provide a case study of this novel approach and subsequently test the assumption that aggregating longitudinal samples within batches results in greater statistical power to detect within-individual effects. We then discuss the analysis of long-term data and highlight the importance of statistical mixed models. While we mainly consider the field of evolutionary biology, using telomere dynamics as an illustrative example, these considerations and techniques can be applied to a range of fields, including epidemiology, ecology and laboratory-based science.

| Slicing approach
We have developed a slicing approach to structure samples from growing biobanks, such that recently collected samples are analysed in clusters together with previously obtained samples, ensuring statistical independence of collection time and cluster. By bridging batches and clusters together, this approach can overcome the experimental design and statistical issues with cross-classification and confounding variables in long-term studies (Gelman & Hill, 2006; Greenland et al., 1999; Schielzeth & Nakagawa, 2013). When potentially confounding effects are present, more slices per batch (i.e. a lower slicing angle and a smaller width) are required to be able to partition these confounding effects. Smaller slices lead to greater statistical power to separate potentially confounding effects within and between batches (as there are more slices within a batch and each slice occurs in more batches). Setting the slicing angle and width is a trade-off between statistical independence (and hence statistical power in the presence of confounding effects) and the number of samples that remain unanalysed until the addition of newly collected samples. This latter point is a constraint, as the number of samples that can be analysed simultaneously will be reduced, if only slightly, by this approach. We argue that the creation of statistical independence and the accounting for among-cluster variation are merits that outweigh this limitation.

| A case study: structuring samples for telomere length analysis in wild house sparrows
We provide a case study of how slicing can be applied to structure samples for analysis in a long-term (>20 years) study of a natural population of house sparrows (Passer domesticus) on Lundy Island, UK. Immigration to and emigration from the island are low (0.5% of recruits; Schroeder et al., 2015), with an annual resighting probability of 0.91-0.96 (Simons et al., 2015). This closed island population thus provides precise ages and life-history data for all individuals.
We use a subset of the Lundy dataset containing 12 years of data (2000-2011; Table S1), during which the population consisted of ca. 130 individuals that were blood sampled on average twice a year.
The total biobank we selected for this case study contains 2,733 samples from 515 individuals. The hypothesis to be tested is that telomere length and age are negatively associated within individuals. We therefore analyse all samples collected in each 6-year period (i.e. 12/6 = 2 clusters), with 12 qPCR plates (i.e. batches) in each cluster (Figure S1). First, we determine the slicing width, slicing the biobank by sampling year (Figure S1). Second, we determine the slicing angle. Since population density varied strongly between years, the slicing angle should be low (Figure S1): this way, a single year crosses more batches, which allows confounding effects (i.e. population density and year) to be separated from variation in sources of interest. Third, since the number of samples exceeds the preferred slicing width and angle, multiple batches with the same layout will be used (Figure S1). These slicing parameters result in at least three slices within a batch, enabling the separation of confounding environmental effects (e.g. population density, sampling year) from laboratory effects (e.g. batch) when using mixed models (Gelman & Hill, 2006).
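For orientation, the batch-level arithmetic implied by these design choices can be sketched as follows, assuming for illustration that samples are spread evenly over batches:

```r
n_samples  <- 2733
n_years    <- 12
n_clusters <- n_years / 6     # samples analysed every 6 years -> 2 clusters
n_batches  <- n_clusters * 12 # 12 qPCR plates per cluster -> 24 batches
n_samples / n_batches         # ~114 samples per batch on average
```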
The slicing approach allows an accurate estimation of the relationship between telomere length and age. Since the Lundy sparrow study is ongoing, the slicing approach can be continued into new clusters as further samples are collected. We next assess the statistical power and precision of this approach to detect within-individual, between-individual, year and cohort effects through simulations.

| Simulations
We used simulations run in R 3.3.1 (R Development Core Team, 2019) to determine the statistical power (i.e. the ability to reject the null hypothesis when it is false) and precision (i.e. the width of the distribution of parameter estimates) of different sample allocation strategies (i.e. longitudinal samples aggregated in a single batch, randomly allocated to batches, or 'sliced' across batches; see Data S1) to detect individual, year and cohort effects.
We simulated a population of 200 individuals in 10 cohorts that were sampled once a year for a maximum of 5 years, providing an equal sample size in all simulations. 'Telomere length' was used as an example response variable; however, the approach applies to any longitudinally measured continuous variable. Starting telomere length was drawn from a Gaussian distribution with a fixed between-individual standard deviation (SD = 1.00), and all individuals shared the same within-individual telomere shortening rate (0.06 per year, on the scale of the between-individual SD).
Year effects were simulated by taking 0.7 multiplied by a value drawn from a uniform distribution (between 0 and 1) for each year and adding this to the response variable. In separate simulations, we replaced year effects with cohort effects (20 individuals per cohort) by taking 0.9 multiplied by a value drawn from a uniform distribution (between 0 and 1) for each cohort. We simulated the relationship between telomere length and age (in years) both within and between individuals. Between-individual effects were modelled using the mean age at which the individual's trait was measured, and within-individual effects as the age at which an individual's trait was measured minus the mean measurement age for that individual (van de Pol & Wright, 2009).
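The sketch below illustrates this data-generating process in R for the year-effect simulations; the intercept mean, the residual noise and the mapping of cohorts to calendar years are our illustrative assumptions, and the authors' full simulation script is provided in Data S1.

```r
set.seed(1)
n_ind <- 200; n_cohorts <- 10; max_samples <- 5

# One row per individual per annual sample (for simplicity, every individual
# here is sampled the full 5 times)
d <- expand.grid(individual = 1:n_ind, age = 1:max_samples)
d$cohort <- (d$individual - 1) %% n_cohorts + 1   # 20 individuals per cohort
d$year   <- d$cohort + d$age - 1                  # assumed cohort-year mapping

intercept <- rnorm(n_ind, mean = 10, sd = 1)      # between-individual SD = 1
year_eff  <- 0.7 * runif(max(d$year))             # year effects: 0.7 * U(0, 1)

d$tl <- intercept[d$individual] -                 # starting telomere length
  0.06 * d$age +                                  # within-individual shortening
  year_eff[d$year] +                              # shared year effect
  rnorm(nrow(d), sd = 0.2)                        # residual noise (illustrative)

# Within- vs between-individual age effects (van de Pol & Wright, 2009)
d$age_between <- ave(d$age, d$individual)         # individual's mean age
d$age_within  <- d$age - d$age_between            # deviation from own mean
```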
Simulations were run 5,000 times, for a varying number of samples (12, 24, 36, 48) per batch and simulated differences between batch means (batch attributable error, SD: 1, 2.5, 5, 10, 20, 40). This error is relatively high to ensure that we control for potential effects of batch attributable error when determining the variation in statistical power among sample allocation strategies. Simulations were repeated three times to obtain three separate results per sample allocation strategy.
The slicing strategy was simulated at an angle that resulted in at least three slices per batch. To start the sample allocation, the first batch was filled 3/4 with the first slice and 1/4 with the second slice; subsequent batches were filled 1/4, 1/2 and 1/4 with consecutive slices (Figure 2c). Additional simulations were run with the slicing angle halved, the slicing width halved, and a doubled sample size (n = 400).
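A minimal sketch of this filling pattern is shown below; the function name and the assumption of equal, four-divisible slice sizes are ours, and the authors' actual allocation code is in the script provided (Data S1).

```r
# Allocate samples to batches so that each batch (after the first) contains
# 1/4 of the previous slice, 1/2 of the current slice and 1/4 of the next
# slice; the first batch holds 3/4 of slice 1 and 1/4 of slice 2 (Figure 2c)
allocate_slices <- function(n_slices, slice_size) {
  stopifnot(n_slices >= 3, slice_size %% 4 == 0)
  q <- slice_size / 4
  # sample indices grouped by slice (a slice = one collection period)
  slices <- split(seq_len(n_slices * slice_size),
                  rep(seq_len(n_slices), each = slice_size))
  batches <- vector("list", n_slices - 1)
  batches[[1]] <- c(slices[[1]][1:(3 * q)], slices[[2]][1:q])
  for (b in 2:(n_slices - 1)) {
    batches[[b]] <- c(tail(slices[[b - 1]], q),     # last 1/4 of previous slice
                      slices[[b]][(q + 1):(3 * q)], # middle 1/2 of current slice
                      slices[[b + 1]][1:q])         # first 1/4 of next slice
  }
  # Later portions of the newest slices stay unanalysed until further
  # samples are collected and the next batch can be formed
  batches
}

batches <- allocate_slices(n_slices = 5, slice_size = 48)
lengths(batches)  # every batch holds 48 samples, drawn from 2-3 slices
```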
The simulated data were analysed using linear mixed models in lme4 1.1-14 (Bates, Mächler, Bolker, & Walker, 2015). Each model included random intercepts for individual (to control for repeated measurements on the same individual) and batch, and year or cohort was fitted as a fixed factor. Statistical power was determined as the number of significant values (p < .05) for each variable out of the total number of simulations (n = 5,000).
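Reusing the simulated data frame `d` from the sketch above, with a hypothetical random batch allocation and batch-attributable error added, a single analysis run might look like this:

```r
library(lme4)

n_batch <- 25                                      # illustrative batch count
d$batch  <- sample(rep(1:n_batch, length.out = nrow(d)))
d$tl_obs <- d$tl + rnorm(n_batch, sd = 1)[d$batch] # batch-attributable error

m <- lmer(tl_obs ~ age_within + age_between + factor(year) +
            (1 | individual) + (1 | batch), data = d)

# Significance of the individual-level effects via the |t| > 2 criterion
# (alpha ~ 0.05) on which the power calculations below are based
tvals <- summary(m)$coefficients[, "t value"]
tvals[c("age_within", "age_between")]
```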
It is important to understand the effect of sample allocation strategy on precision, as well as statistical power. We therefore quantified precision as the width of the distribution of parameter estimates from the models run on the repeatedly simulated datasets: the absolute difference between the 75th and 25th percentiles divided by the median (note that a precision value closer to zero means higher precision).
Parameters of the simulations were manually optimized so that a statistical power of approximately 0.5 was achieved to detect between-individual effects for the random allocation strategy, determined by a t-value of less than −2 (α ≈ 0.05). This intermediate level of statistical power avoids thresholding effects at either end of the power spectrum (0 or 1). Such a simulation strategy maximizes the sensitivity in detecting any modulation in relative statistical power among sample allocation strategies, which is our focus rather than achieving a certain absolute statistical power.
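Across repeated runs, power and precision can then be summarized roughly as follows, with `simulate_once()` standing in as a placeholder for the data generation and model fit sketched above:

```r
# Placeholder for one full simulation run (generate data, fit model, return
# the estimate and t-value of a focal term such as the between-individual
# age effect); see the sketches above for what it would contain
simulate_once <- function() {
  est <- rnorm(1, mean = -0.06, sd = 0.03)  # stand-in output, not a real fit
  c(estimate = est, t = est / 0.03)
}

runs <- replicate(5000, simulate_once())

# Statistical power: proportion of runs significant at t < -2 (alpha ~ 0.05)
power <- mean(runs["t", ] < -2)

# Precision: interquartile width of estimates relative to their median
# (values closer to zero indicate higher precision)
est <- runs["estimate", ]
precision <- abs((quantile(est, 0.75) - quantile(est, 0.25)) / median(est))
```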

| RESULTS
Our simulations tested the widely held assumption that aggregating longitudinal samples of the same individual in a single batch increases statistical power to detect within-individual effects (e.g. Herborn et al., 2014; Nettle et al., 2015). In simulations with year effects, the statistical power to detect within-individual effects was much lower when longitudinal samples were aggregated (mean statistical power ± SD across sample sizes and three runs per simulation = 0.059 ± 0.030) than when samples were sliced across batches (0.269 ± 0.008) or randomly allocated to batches (0.267 ± 0.007; Figure 3). For between-individual effects, again, the statistical power was much lower when longitudinal samples were aggregated in a single batch (0.138 ± 0.077) than when samples were sliced across batches (0.443 ± 0.007) or randomly allocated to batches (0.441 ± 0.007; Figure 3). The statistical power to detect year effects was higher when longitudinal samples were aggregated in a single batch (0.776 ± 0.008) or randomly allocated to batches (0.782 ± 0.014) than when sliced across batches (0.622 ± 0.012; Figure 3). However, a lower slicing angle (crossing four batches; 0.741 ± 0.009) or a smaller slicing width (half a batch; 0.751 ± 0.007) resulted in statistical power to detect year effects similar to aggregation of longitudinal samples and random allocation, while maintaining the statistical power to detect within- and between-individual effects (Figure 4).
In simulations with cohort effects, the statistical power to detect within- and between-individual effects was lower when slicing across batches (0.159 ± 0.032; 0.324 ± 0.020) compared to aggregation (0.557 ± 0.009; 0.390 ± 0.020) and randomization (0.542 ± 0.014).

In simulations with either year or cohort effects, the precision to estimate within- and between-individual effects followed the inverse of statistical power, where approaches with lower statistical power showed greater precision to detect such effects (Figures S9 and S10). A doubled sample size (n = 400) increased precision but did not alter the variation in precision among sample allocation strategies (Figures S11 and S12). Additionally, varying the strengths of year and cohort effects changed the precision, but not the variation among sample allocation strategies (Figures S13-S16).

FIGURE 3 Statistical power analyses of simulated data for individual and year effects among four batch sizes (n = 12-48) using three sample allocation strategies: (1) aggregating samples per individual in the same batch (solid, red), (2) assigning samples randomly to batches (dashed, blue) or (3) slicing samples across batches with an angle that crosses two batches and a slicing width of a single batch (dotted, yellow). Raw data points from three separate simulations with mean statistical power per sample size are shown against among-batch variation, with 95% confidence intervals as shaded areas. Scales differ between year, within- and between-individual effects.

FIGURE 4 Statistical power analyses of simulated data for individual and year effects among four batch sizes (n = 12-48) using three different slicing parameters: (1) a slicing angle that crosses two batches with a slicing width of a single batch (solid, red), (2) a halved slicing angle that crosses four batches (dashed, blue) or (3) a halved slicing width of half a batch (dotted, yellow). Raw data points from three separate simulations with mean statistical power per sample size are shown against among-batch variation, with 95% confidence intervals as shaded areas. Scales differ between year, within- and between-individual effects.
Our slicing method performs similarly to randomization of samples and outperforms aggregation of longitudinal samples in disentangling within- and between-individual effects when year effects apply, an objective shared by many longitudinal studies (Nussey, Froy, Lemaitre, Gaillard, & Austad, 2013; van de Pol & Wright, 2009). Parameter sets specific to current or future datasets can be included in the script provided (Data S1).

| DISCUSSION
The analysis of comprehensive long-term datasets is often complicated by cross-classified data structures and confounded among-batch and among-cluster variation. A common response has been to aggregate longitudinal samples of the same individual in a single batch (e.g. Herborn et al., 2014; Nettle et al., 2015). Such efforts will, however, reduce the statistical power to detect within- and between-individual effects, as our simulations show.

| Integral approach to growing biobank analysis
The optimal sample structuring strategy for analysing long-term datasets depends on the structure of the data and the questions of interest. However, for the majority of long-term datasets, slicing has benefits over other structuring strategies because it overcomes the problems with confounding variables and cross-classified data structures that commonly arise in the analysis of long-term studies.
The assumption that longitudinal samples should be aggregated in a single batch could hinder uptake of the slicing approach, but our simulations have disproven this assumption. With appropriate slicing parameters (i.e. a small width and low angle), slicing performs as well as randomization in terms of statistical power and precision. Slicing across batches and clusters, and thereby bridging them together, gives the approach the statistical power to disentangle confounding effects.
The key benefit of slicing over randomization is that slicing allows separate analysis of current data and flexible inclusion of these data into future analyses without completely confounding the analysis.
Furthermore, slicing allows sequential analysis of samples, which need to be analysed only once. This prevents complicated sample labelling and placing among clusters, reduces the sample volume required and avoids repeated defrosting, thereby reducing the potential for human error.
Slicing has some potential limitations. For example, substantial differences among years in the number of samples collected could limit the ease with which the slicing approach is applied. Additionally, a failed analysis (e.g. plate failure leading to sample loss) under slicing results in missing data within a certain time window, whereas under randomization such losses are scattered across the dataset. While slicing performs similarly to randomization in terms of statistical power and precision, we think that slicing is more practical, with merits (i.e. sequential analysis, statistical independence) that outweigh these limitations. We stipulate that, because of the sequential analysis in our slicing approach, hypotheses need to be pre-defined and power analyses conducted before experimental and statistical analysis (Fraser, Parker, Nakagawa, Barnett, & Fidler, 2018, and references therein).
The use of mixed models is common in the analysis of longitudinal datasets, especially in ecology (Bolker et al., 2009; Gelman & Hill, 2006). We highlight mixed models because they are necessary, when using the slicing approach, to account adequately for experimental and environmental variation. The combination of slicing and mixed models in long-term studies allows analysis of the cross-classified data structures that commonly arise when hierarchical biology is combined with cross-classified data collection and analysis. The interpretation of the variance components in these models depends on whether the design is crossed or nested (Schielzeth & Nakagawa, 2013), and the random effect structure can be used to account for experimental and environmental variables that are potentially confounded with cluster effects (e.g. storage duration, batch).
The failure to include these effects can inflate type I and type II errors when there is a temporal, spatial or other spurious correlation with any independent variable.

| CONCLUSIONS
A major current challenge in long-term studies is analysing data as they are collected while also including them in future analyses without creating uncontrollable variation, allowing comparison of results over multiple years or even decades. This requires the ability to compare differentially timed analyses that are potentially biased by confounding cluster effects. Our study shows the importance of considering the structure of samples among clusters and batches in long-term studies. Our slicing approach retains statistical independence and accounts for among-cluster variation in the sequential analysis of growing biobanks. Slicing also provides statistical power and precision similar to randomization for detecting cohort, year, within- and between-individual effects, provided the data are analysed using appropriate statistical mixed models and consistent methodology to control for confounding effects. A single sample's scientific value increases through this approach, as it can be used separately in current studies but can also be included in subsequent studies, providing sustainable (re-)use of collected data. The approach we propose here (slicing combined with mixed models) is easy to apply and improves the potential for growing biobanks to address important ecological and evolutionary questions.

ACKNOWLEDGEMENTS
The authors gratefully acknowledge feedback on an earlier version of the manuscript from Dan Nussey. We also thank two anonymous reviewers for comments which greatly improved the manuscript. This work was supported by a Leeds

DATA AVAILABILITY STATEMENT
Supporting Table S1 and Figures S1-S16 are provided in the supporting information. Simulation R scripts (Data S1) are provided in the supporting information, archived on GitHub (https://github.com/DugdaleResearchGroup/Slicing) and available at Zenodo