A Recipe for Accurate Estimation of Lifespan Brain Trajectories, Distinguishing Longitudinal and Cohort Effects

We address the problem of estimating how different parts of the brain develop and change throughout the lifespan, and how these trajectories are affected by genetic and environmental factors. Estimation of these lifespan trajectories is statistically challenging, since their shapes are typically highly nonlinear, and although true change can only be quantified by longitudinal examinations, as follow-up intervals in neuroimaging studies typically cover less than 10 % of the lifespan, use of cross-sectional information is necessary. Linear mixed models (LMMs) and structural equation models (SEMs) commonly used in longitudinal analysis rely on assumptions which are typically not met with lifespan data, in particular when the data consist of observations combined from multiple studies. Generalized additive mixed models (GAMMs) offer an attractive alternative to LMMs and SEMs. In this paper, we propose various ways of formulating GAMMs for accurate estimation of lifespan trajectories of 12 brain regions, using a large longitudinal dataset and realistic simulation experiments. We show that GAMMs are able to accurately fit lifespan trajectories, distinguish longitudinal and cross-sectional effects, and estimate effects of genetic and environmental exposures. Finally, we discuss and contrast questions related to lifespan research which strictly require longitudinal data and questions which can be answered with purely cross-sectional data, and in the latter case, which simplifying assumptions that need to be made. The examples are accompanied with R code, providing a tutorial for researchers interested in using GAMMs.


Introduction
Cohort effect: The effect of birth year (cohort) on the relationship between a set of explanatory variables and an outcome of interest. Cross-sectional effect: The effect of age on an outcome of interest at a particular point in time, across participants with different birth dates. Longitudinal effect: The effect of increasing age for participants belonging to a given birth cohort. Linear mixed models (LMMs): Linear regression models used for data with hierarchical structure, e.g. longitudinal data with multiple measurements per individual. Fixed effects: Regression parameters in mixed models which are common for all participants. Random effects: Regression parameters in mixed models which are unique to each participant, used to model correlation between repeated measurements. Polynomial model: Linear regression model which includes the first n powers of an explanatory variable x as distinct variables. Quadratic model: A polynomial model containing the first two powers of an explanatory variable x, on the form β 0 + β 1 x + β 2 x 2 . Cubic model: A polynomial model containing the first three powers of an explanatory variable x, on the form β 0 + β 1 x + β 2 x 2 + β 3 x 3 . Generalized additive model (GAM): A linear regression model in which the outcome is modeled as an unknown smooth function of the explanatory variables. Generalized additive mixed model (GAMM): An extension of GAMs to data with hierarchical structure, containing random effects.
Smoothing parameter: Parameter controlling the degree of nonlinearity (wiggliness) of a function estimated by a GAM/GAMM. Cubic regression splines: A set of cubic polynomials, each of which is defined over a small part of the x-axis and is zero elsewhere. Thin-plate regression splines: A set of functions, each of which represents a given nonlinear shape over the full x-axis.
Smooth function: A nonlinear function estimated by GAMs/GAMMs represented as a weighted sum of (e.g., cubic or thin-plate) regression splines.
Box 1: Key terms used in this paper, defined in the context of longitudinal data analysis.
Large datasets with cross-sectional and longitudinal structural magnetic resonance images (MRIs) of participants whose ages span from early childhood to late adulthood provide ample opportunities to study lifespan brain trajectories. Important questions such data can contribute to answer include how brain structure is related to aging, how the aging effect is modified by genetics and environmental exposures, and at which age critical events like maximum volume or maximum rate of change occur. Lifespan brain trajectories are nonlinear and differ between regions, as illustrated in Figure 1 for volumes of cerebral white matter, cortex, and hippocampus for 4,352 observations of 2,017 healthy participants from the Center for Lifespan Changes in Brain and Cognition longitudinal studies (Fjell et al., 2017;Walhovd et al., 2016), henceforth referred to as the LCBC data. Modeling the type of nonlinear effects shown in Figure 1 using structural equation models (SEMs) (McArdle and Hamagami, 2001;Meredith and Tisak, 1990) or linear mixed models (LMMs) (Laird and Ware, 1982) with polynomials typically leads to poor fits at least over parts of the lifespan, and is highly dependent on manual selection of terms (Fjell et al., 2010). Generalized additive mixed models (GAMMs) 1 (Lin and Zhang, 1999) offer an attractive alternative, typically yielding good fit over the full lifespan in an automated and data-driven manner. This is illustrated in Figure 2, comparing a GAMM to LMMs with quadratic and cubic polynomials for the effect of age on cerebellum cortex volume. See Box 1 for a definition of these and other key terms used in this paper. Similar to often researched structures like hippocampus and cerebral white matter, cerebellum cortex is characterized by a nonlinear age trajectory. In contrast to the GAMM, neither of the LMMs capture the steep increase seen in early childhood, and the cubic LMM predicts an increase in cerebellum cortex volume in old age, whereas the GAMM adequately captures the decline seen in the data. In addition, both the quadratic and cubic model estimate cerebellum cortex volume to reach its maximum at the age of around 25, while the GAMM instead estimates the maximum to occur around 14 years of age, and the latter seems to be in better agreement with the data. Figures 1 and 2 and all subsequent plots were created in R (R Core Team, 2019) with ggplot2 (Wickham, 2016). The goal of this paper is to provide clear recommendations for optimal estimation of lifespan trajectories of brain development and aging. To this end, several aspects need consideration. First, as has been emphasized by a large number of authors, when analyzing data with repeated observations over time, care must be taken to distinguish within-individual and between-individual effects, which for the purpose of this paper are longitudinal and cross-sectional effects (Curran and Bauer, 2011;Hoffman, 2007;Hoffman and Stawski, 2009;Morrell et al., 2009;Sliwinski et al., 2010). True change can only be measured by use of longitudinal data, but how important are longitudinal data when the task is to estimate trajectories spanning many times the maximum follow-up interval realistically attainable in a neuroimaging study? Large datasets combined from different studies, either conducted by the same group as for the LCBC data or by multiple groups participating in a data-sharing consortium like Lifebrain (Walhovd et al., 2018), present further challenges for longitudinal modeling as the number of measurements per participant and the time intervals between measurements are typically highly varying. All of these issues are illustrated for the LCBC data in Figure 1920 1930 1940 1950 1960 1970 1980 Cohort Figure 4: Cohort effects. Illustration of the impact of cohort effects in a hypothetical dataset. Dashed lines show the crosssectional age effect in 1990, colored dots show four cohorts of participants whose age in 1990 was 10, 30, 50, and 70 years, respectively, and the blue lines show longitudinal age effects for each cohort. In the left plot, there are no cohort effects, and hence longitudinal and cross-sectional effects coincide. In the center plot, the cohort effects are independent of age, and the longitudinal effects differ by an offset but the effect of aging is identical across cohorts, as seen by the parallel blue lines. In the right plot, the cohort effects depend on age, and in this case also the slope of the longitudinal effect varies between cohorts.
3. While GAMMs flexibly handle data with varying follow-up intervals, the statistical literature on use of GAMMs for longitudinal analysis has almost exclusively focused on cases in which each participant has been followed over the full time range under consideration, from a common baseline (Brumback and Rice, 1998;Durbán et al., 2005;Edwards et al., 2005;Gu and Ma, 2005;Ke and Wang, 2001;Lambert et al., 2001;Sullivan et al., 2015). There is hence a need for an understanding of how GAMMs should be optimally used in lifespan brain research, with short follow-up intervals and varying dates of initial measurement as shown in Figure 3.
The outline of this paper is as follows. In Section 2 we introduce GAMMs formally and define three different candidate models for estimating lifespan brain trajectories. We also describe simulation experiments conducted in order to compare these GAMMs in a realistic setting for estimating 12 different brain regions. In Section 3 the simulation results are presented, and next we show two example applications demonstrating how GAMMs can be used for estimating lifespan brain trajectories and the effect of genetic variations on the trajectories. Accompanying R code provides a tutorial for researchers interested in using GAMMs. In Section 4 we discuss the results taking into regard currently available longitudinal studies. We contrast questions that strictly require longitudinal data to questions that under some simplifying assumptions may be answered with cross-sectional data. Finally, in Section 5 we conclude by presenting recommendations for how to use GAMMs in lifespan brain research.

Longitudinal and Cross-Sectional Effects
The effect of age on an outcome in a population can be completely explained by longitudinal and cohort effects, with the former representing the effect of aging for participants in a given birth cohort and the latter determining how the longitudinal effects differ between participants belonging to different birth cohorts (Diggle et al., 2002, Ch. 1.1). Cross-sectional effects are the effects of age across cohorts when considered at a particular point in time, as illustrated in Figure 4. In the absence of cohort effects the cross-sectional and longitudinal effects are identical. Age-independent cohort effects result in different slopes for the longitudinal and cross-sectional effects, while age-cohort interactions lead to longitudinal effects whose slopes depend on age. Selective survival, by which life expectancy is correlated with the dependent variable, leads to population changes over time and hence are part of the longitudinal effects (Baltes, 1968). In contrast, with sampling bias, by which the probability of recruitment or the probability of dropout before the end of the study depends on the outcome variable, the sample is not representative of the population under study and biased estimates may result (Molenberghs and Fitzmaurice, 2009).

Generalized Additive Models
Generalized additive models (GAMs) (Hastie and Tibshirani, 1986) model the effect of a variable x on an outcome y with smooth functions f (x), constructed as weighted sums of K basis functions b 1 (x), b 2 (x), . . . , b K (x) with weights β 1 , β 2 , . . . , β K , i.e., f (x) = K k=1 β k b k (x). Commonly used basis functions are cubic regression splines and thin-plate regression splines (Wood, 2003), and the number of basis functions is typically chosen large enough to allow a wide range of nonlinear patterns to be estimated, while small enough to allow computational efficiency. For a GAM with a single smooth term, y = f (x) + , the estimate given n observations is computed by finding the values of β 1 , . . . , β K minimizing the criterion The first term is the least squares criterion using the basis functions as explanatory variables, and the second term represents the wiggliness of f (x) as measured by its squared second derivative over some range [a, b], typically the minimum and maximum values of x in the sample. The smoothing parameter λ controls the extent to which wiggliness is penalized, striking a balance between overfitting (too low λ, too wiggly f (x)) and underfitting (too high λ, too smooth f (x)). For data with repeated measurements, GAMs can be extended to GAMMs by the inclusion of random effects. A key insight allowing use of LMM software for efficient fitting of GAMMs is that the penalized smooth terms may be converted to random effects with variance proportional to 1/λ (Lin and Zhang, 1999;Wood, 2004Wood, , 2010.

Generalized Additive Mixed Models for Longitudinal Data
Now consider a dataset of n participants indexed i = 1, . . . , n, assume an outcome y ij has been measured m i times in participant i, with timepoints indexed by j = 1, . . . , m i , and let a ij denote the age of participant i at her/his jth timepoint. The question of interest is how the outcome varies as a function of age, and this can be modeled with the GAMM where f (a ij ) is the effect of age, β 0 is the intercept, b 0i is the random intercept for participant i, and ij is a random noise term. Both b 0i and ij are assumed to be normally distributed, b 0i ∼ N (0, σ 2 b ) and ij ∼ N (0, σ 2 ), with σ b representing the between-participant standard deviation and σ the withinparticipant residual standard deviation. We do not consider random slopes, due to the low number of repeated measurements in the typical applications considered in this paper, although this could be included with an additional term b 1i a ij in (1). With sufficient data, use of random slopes is recommended, as it relaxes the assumptions on the covariance structure of repeated measurements (Fitzmaurice et al., 2011, Ch. 19).
In the presence of cohort effects, the term f (a ij ) represents some weighted combination of cross-sectional and longitudinal effects, and hence cannot be interpreted as either. The typical method of correcting for this in LMMs is by splitting the age term into a i1 representing age at first measurement, and t ij representing time since baseline (Fitzmaurice et al., 2011;Zeger and Liang, 1992). Extending this to a GAMM yields where f (a i1 , t ij ) is a smooth two-dimensional surface representing the nonlinear interaction of baseline age and time, constructed by multiplying basis functions for a i1 and t ij (Wood, 2006), thus containing the main effect of both terms and their interaction. Considering the plots in Figure 1, including an interaction seems necessary for estimating lifespan trajectories, as the direction of change clearly depend on baseline age. In model (2) the longitudinal effect of aging t from a baseline a i1 is given by f (a i1 , t) keeping a i1 constant, while the cross-sectional effect of varying baseline age a is given by f (a, 0). Model (2) has some important limitations, however. First, an assumption in the LMMs motivating its definition is that all participants have identical baseline dates. Second, when participants are followed over a short period compared to the full lifespan, the values of t ij vary between zero and some maximum which is much lower than the maximum age, which might make estimation of nonlinear longitudinal effects challenging. We hence introduce an alternative GAMM modeling cohort effects by including birth date c i , in which f (a ij ) is defined as for (1), while β(a ij )c i is a varying-coefficient term (Hastie and Tibshirani, 1993) representing the main effect of cohort (birth date) as a function of age. The longitudinal effect of aging a for a participant belonging to cohort c i is given by f (a) + β(a)c i keeping c i constant. The cross-sectional effect of age a at date d is given by f (a) + β(a)c, with c = d − a representing the birth date of participants of age a at date d, and hence both a and c are varying in this case. Model (3) is not identified if all participants have identical measurement dates, since then birth date is a linear function of age, i.e., c i = d j − a ij where d j is the common date of the jth timepoint. However, as illustrated in Figure 3 (right), both the dates of initial measurements and the times between measurements may be highly varying in lifespan data, and this variability helps identifying the estimates of model (3). The effect of a time-invariant variable x i on the age trajectory can be estimated by adding an interaction term f i (a ij )x i in models (1) and (3), or an interaction term f 2i (t ij )x i in model (2). If x i is a categorical variable, this amounts to estimating how the trajectories associated with a given category differ with respect to a reference category, while for continuous x i the interaction becomes a varying-coefficient term as was introduced in model (3).

Simulation Experiments
In order to compare the GAMMs (1)-(3), characteristic lifespan curves were estimated for 12 brain regions with the LCBC data, using GAMMs on the form (1), with additional covariates sex, scanner, and total intracranial volume (ICV). Volumes were estimated with FreeSurfer 6.0 (Dale et al., 1999;Fischl et al., 2002;Reuter et al., 2012), and detailed sample characteristics are presented in the Supplementary Material. The curves, shown in Figure 5, were used as ground truths from which measurements were sampled. For each region, three cases were considered: no cohort effects, age-independent cohort effects, and age-cohort interactions. In the latter two cases, cohort effects were added to the characteristic curves as illustrated in Figure 6 for cerebral white matter, cortex, and hippocampus, and in the Supplementary Material for the remaining regions. Data were generated with n = 1, 000 participants, and the number of timepoints m i for each was uniformly distributed in {1, 2, 3}. The time between two measurements of a given participant was uniformly distributed between 1 and 8 years, which combined with the maximum number of 3 timepoints set the maximum possible follow-up interval to (3 − 1) × 8 = 16 years. Baseline age was uniformly distributed between 4 and 90 years, and the date of initial measurement was uniformly distributed over 10 years, from 1st January 2000 to 1st January 2010. The simulations were repeated with identical dates of initial measurement, with results shown in the Supplementary Material. Random intercepts b 0i and residuals ij were sampled from normal distributions with mean zero and standard deviations equal to 50 % and 20 % of the sample standard deviation of the region's volume, respectively, similar to what was observed in the LCBC data. Datasets for each of the 12 × 3 = 36 combinations of regions and cohort effects were randomly sampled 1,000 times, and the six models presented in Table 1 were fitted to each dataset.
Identifier Description (1a) Model (1) without random effects, using only the first timepoint. (1b) Model (1) fitted to the complete data. (2a) Model (2) with a varying-coefficient term for the interaction between baseline age and time. (2b) Model (2) with a two-dimensional smooth surface for jointly modeling the effect of baseline age and time.
Model (3) with linear age-independent cohort effects. (3b) Model (3) with a varying-coefficient term allowing cohort-age interactions.  Simulation results in the case of no cohort effects, showing the RMSE of the predicted value after baseline ages 10, 35, and 60 years. For any given time t along the x-axis, the curves represent the RMSE of the predicted longitudinal effect of t years of increased age since baseline. Column headers specify the model fitted to the data, as defined in Table 1.  (2), on the other hand, had much poorer fits than the other models, even for times very close after baseline, for which the data contained a large number of observations. Figure 8 shows the results in the presence of age-cohort interactions. Now model (1) with or without longitudinal data had almost as high RMSE as model (2). Model (3), on the other hand, was able to accurately estimate the longitudinal effects. Model (3b), which allows cohort-age interactions, had slightly better overall performance than model (3a), which only contains the cohort effect as a single offset term. Results for age-independent cohort effects and for other regions are shown in the Supplementary Material. Figure 9 shows the RMSE of the longitudinal estimates 25 years ahead averaged over each year and over baseline ages of 10, 35, and 60 years, for 6 regions chosen by alphabetic ordering. In the absence of cohort effects (left column), model (1) was most accurate, with variant (1b) utilizing longitudinal data performing slightly better than (1a) which only uses cross-sectional data. The two formulations of model (3) performed well also in the absence of cohort effect, although with higher variation than model (1), likely due to the additional noise contributed by the cohort term which in this case had zero actual effect. As expected, with age-independent cohort effects, model (3a) which includes the cohort effect as a single offset term performed better than model (3b) which also estimates cohort-age interactions, while the opposite was true with cohort-age interactions. The R package gghalves (Tiedemann, 2020) was used for creating Figure  9. Figure 10 shows estimated longitudinal effects of age on hippocampal volume from a baseline age 10 years from 100 model fits randomly selected among the 1,000 simulated samples. While the estimates of model (3) follow the true effect over the full 25-year period, the estimates of model (2) start diverging before 10 years after baseline. This happens because most participants were not followed for more than about 10 years, and as a consequence of the model formulation of (2), there is not enough data to estimate the effect of time beyond 10 years. In the absence of data smooth estimates tend toward straight lines, due to the second

Simulation Experiments
Hippocampus 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15 20 25 0 5 10 15  showing the RMSE of the predicted value after baseline ages 10, 35, and 60 years. For any given time t along the x-axis, the curves represent the RMSE of the predicted longitudinal effect of t years of increased age since baseline. Column headers specify the model fitted to the data, as defined in Table 1. derivative penalty, as Figure 10 shows. The same effect can be seen by considering the right column of Figure  9, with cohort-age interactions. In this case model (2) had accuracy close to model (3) for accumbens area and caudate, while model (3) was far more accurate for the remaining regions. From Figure 5 it is clear that accumbens area and caudate are characterized by close to linear trajectories, while the four other regions have highly nonlinear trajectories, supporting the hypothesis that model (2) is well suited for estimating linear longitudinal effects, but not nonlinear effects.
Simulation results with identical baseline dates shown in the Supplementary Material are practically identical to those described in this section, suggesting that the issue of varying baseline dates is not critical for GAMMs. Instead, as Figure 10 shows, the main challenge with estimating longitudinal effects using the GAMM (2) is caused by the fact that time t ij spans a short period compared to the full lifespan, making estimation of nonlinear effects beyond the maximum follow-up time impossible. For estimation of crosssectional effects, the differences between models were much smaller. However, the two versions of model (2) still showed the poorest performance, while the two versions of model (3) showed the best performance. Detailed results for estimation of cross-sectional effects are shown in the Supplementary Material.

Example Applications
In this section we show how GAMMs can be applied to the study of lifespan brain development, with example R code using the packages mgcv (Wood, 2017) and gamm4 (Wood and Scheipl, 2017). Some parts of the code are omitted for ease of presentation, and can be found in the Supplementary Material.

Modeling Lifespan Volume Trajectories
We first consider the hippocampal volumes shown in Figure 1 (right). The data are organized in long format in the dataframe dat, with each row representing one timepoint of a single participant, containing the following variables: • ID: Unique participant ID.
• Age: Age in years at MRI session.
• Hippocampus_z: Estimated hippocampal volume, standardized to have zero mean and unit variance.  • ICV: Estimated total intracranial volume in mm 3 .
• ICV_z: Estimated total intracranial volume, standardized to have zero mean and unit variance.
• Scanner: Factor variable indicating which scanner was used for MRI.
• Age_bl: Age in years at initial MRI session.
• Time: Time in years since initial MRI session.
• Birth_Date_z: Decimal number of years between birth date and 1st January 1970.
The transformed variables with suffix _z were created because the algorithms used in the models to be fitted are most stable when the numbers are of similar magnitude.
Models not Separating Longitudinal and Cross-Sectional Effects. We start by fitting a GAM using only the first timepoint of each participant, using the gam() function from mgcv. By default, gam() uses generalized cross-validation (Golub et al., 1979) for smoothing, but for comparison with mixed models we specify that restricted maximum likelihood (REML) should be used, with the argument method = "REML". The smooth function corresponding to the term f (a ij ) in model (1) is specified with s(Age, k = 20, bs = "cr"), where we use k = 20 cubic regression (bs = "cr") splines. The default is thin-plate splines (bs = "tp"), but in our experience cubic regression splines typically require half the computing time, without yielding poorer fit. ICV, sex, and scanner are used as additional covariates. In addition to mgcv, we load the dplyr package (Wickham et al., 2020) for data manipulation. Throughout this section, the names of the fitted models correspond to the model identifiers in Table 1. library(mgcv); library(dplyr) # Keep only first timepoint (Time = 0) using dplyr's filter function cross_sectional_data <-filter(dat, Time == 0) # Fit GAM to cross-sectional data mod1a <-gam(Hippocampus_z~s(Age, k = 20, bs = "cr") + ICV_z + Sex + Scanner, data = cross_sectional_data, method = "REML") The estimated smooth function can be immediately visualized with the plot() function. The output is not shown, but see the curve labeled mod1a in Figure 13 (left).

plot(mod1a)
We can check that the number of splines is sufficiently high with k.check(), which implements a permutation algorithm described in Wood (2017, Ch. 5.9). A significant p-value and estimated degrees of freedom (edf) close to the maximum degrees of freedom k', indicates that more splines are required. As shown in the output below, k seems sufficiently high. The maximum number of degrees freedom k' = 19 is one less than the number of cubic regression splines, because one degree of freedom is used to center the smooth function such that it has zero mean over the range of age values in the data.
k.check(mod1a) ## k' edf k-index p-value ## s(Age) 19 7.862725 0.9946078 0.3775 Next, we fit model (1) using the complete data. This can be achieved both with the gam() and gamm() functions from mgcv, and the gamm4() function from gamm4. For the type of longitudinal data considered here, with a low number of repeated measurements of a large number of participants, gam() is very slow compared to gamm() and gamm4(). We opt for the latter, as it is typically faster and more numerically stable. The main difference from the model with only cross-sectional data, is that we now specify a random intercept with the argument random =~(1|ID). library(gamm4) mod1b <-gamm4(Hippocampus_z~s(Age, k = 20, bs = "cr") + ICV_z + Sex + Scanner, data = dat, random =~(1|ID)) On a MacBook Pro, fitting this GAMM took 1.7 seconds, while fitting the GAM above took less than 0.08 seconds. The gamm4() function returns a list with two elements, named mer and gam. The element mer contains information from the lmer() function in the lme4 package (Bates et al., 2015) used in the numerical computations, and is useful for studying the random effect distributions. The element gam contains information about the smooth functions, and is useful for studying smooth terms and parametric fixed effects.
# Print information about model object: str(mod1b, max.level = 1) ## List of 2 ## $ mer:Formal class 'lmerMod' [package "lme4"] with 13 slots ## $ gam:List of 32 ## ..-attr(*, "class")= chr "gam" The lme4 package is automatically loaded when gamm4 is loaded, and its accessor functions can be used to study the random effect distributions. The summary below shows that the between-participant variation, σ b = 0.658, is much larger than the within-participant variation,σ = 0.146, where the numbers are in sample standard deviations of hippocampal volume. Note that for the cross-sectional mod1a this information is not available. The second line in the output (σ λ = 0.0235) is related to the formulation of smooth functions as random effects, and the estimated smoothing parameter is given by,λ =σ 2 /σ 2 λ = 38.7.
VarCorr ( Modeling Cohort Effects. Next, we take cohort effects into account by fitting two versions of model (3), mod3a which contains a linear cohort effect term, and mod3b which contains a varying-coefficient term β(a ij ) consisting of five cubic regression splines, allowing the cohort effect to depend on age.
mod3a <-gamm4(Hippocampus_z~s(Age, k = 20, bs = "cr") + Birth_Date_z + ICV_z + Sex + Scanner, data = dat, random =~(1|ID)) mod3b <-gamm4(Hippocampus_z~s(Age, k = 20, bs = "cr") + s(Age, by = Birth_Date_z, bs = "cr", k = 5) + ICV_z + Sex + Scanner, data = dat, random =~(1|ID)) We use the tidy() function from the broom package (Robinson and Hayes, 2020) to study the cohort effect estimated by mod3a. The argument parametric = TRUE is required to extract the parametric effects. Before printing, dplyr's mutate() function is used to convert the numbers to mm 3 by multiplying with the sample standard deviation, and 95 % confidence intervals (CIs) are computed by adding the standard error of the estimate multiplied by the 2.5 % and 97.5 % quantiles of the standard normal distribution. As the output shows, the estimated cohort effect from this model is a Next, the varying-coefficient term in mod3b is extracted as shown below. Its estimated degrees of freedom is 2, implying that the cohort effect is estimated as a straight line defined by an intercept and a slope. Its p-value of 0.0506 also suggests that there is some evidence of an age-dependent cohort effect. The estimated cohort effect can be plotted with the code shown below, using the plot() function for gam objects. The argument scale = 0 ensures that the y-axis limits are adjusted to the term to be plotted, rather than also covering the full range of the term representing the main effect of age. The function supplied to the trans argument is used to convert the estimates back to mm 3 . plot(mod3b$gam, select = "s(Age):Birth_Date_z", scale = 0, trans = function(x) x * sd(dat$Hippocampus)) A slightly modified version of the resulting plot is shown by the solid and dashed lines in Figure 11. The fact that the estimated cohort effect averaged over all ages is slightly negative is in agreement with mod3a, which estimated a negative but non-significant cohort effect. The CIs shown by the plot() function have the property that under repeated sampling from the population, the true function will on average be confined within the upper and lower limits over 95 % of the x-axis (Marra and Wood, 2012;Nychka, 1988). These across-the-function CIs will contain the true function less than 95 % of the time under repeated sampling from the population, which explains why the upper limit in Figure 11 is well below zero despite the p-value being larger than 0.05. Simultaneous CIs, on the other hand, would fully contain the complete function 95 % of the time under repeated sampling, and can be constructed using a simulation-based approach (Ruppert et al., 2003;Simpson, 2016) shown in the Supplementary Material. These simultaneous CIs are shown as the dotted lines in Figure 11, and are wider than the across-the-function CIs. The fact that its upper limit is very close to zero for high ages and its lower limit never is above zero, is in agreement with the p-value being approximately 0.05.
The age-dependent cohort effects estimated by mod3b imply that a participant born in 1970 is expected to have a 131 mm 3 (CI: [−407, 145] mm 3 ) higher hippocampal volume at age 20 than a participant born in 1920 had at age 20. Conversely, a participant born in 1970 is expected to have a 340 mm 3 (CI: [−5.7, 685] mm 3 ) lower hippocampal volume than a participant born in 1920, at age 70. As for mod3a, these results suggest that a cohort effect cannot be ruled out, despite the term not being significant, since cohort effects of relatively large magnitude are contained within the 95 % CIs.
Modeling Baseline Age and Time Since Baseline. Finally, we fit model (2), using the t2() function to create a two-dimensional smooth term . The argument k = c(20, 5) specifies that 20 cubic regression splines are used for the effect of baseline age, while only 5 are used for the effect of time, as this term does not span more than 11 years. The construction of the two-dimensional function involves forming products of all combinations of splines for baseline age and time, implying that the total number of degrees of freedom used by the term equals 20 × 5 − 1 = 99, where the one degree of freedom subtracted has been used for imposing a sum-to-zero constraint. Fitting mod2b below took ≈ 90 seconds on a MacBook Pro. mod2b <-gamm4(Hippocampus_z~t2(Age_bl, Time, k = c(20, 5), bs = "cr") + ICV_z + Sex + Scanner, data = dat, random =~(1|ID)) Model (2) in Section 2.3 could alternatively be formulated with three smooth terms: the main effects of age and time, and their interaction. This allows significance testing of each term separately, e.g. to investigate the extent of age-cohort interactions. Functionality for fitting such a model is provided by mgcv's ti() function, representing a two-dimensional tensor interaction term in which the main effects have been removed (Wood, 2006). As it is not available in gamm4, the gamm() function needs to be used. The syntax is very similar, except that the random intercept is specified with random = list(ID =~1) and the use of REML rather than the default marginal maximum likelihood is specified with method = "REML" for comparability with the models fitted with gamm4().
# Alternative formulation with tensor interaction terms mod2b_ti <-gamm(Hippocampus_z~s(Age_bl, k = 20, bs = "cr") + s(Time, k = 5, bs = "cr") + ti(Age_bl, Time, k = c(20, 5), bs = "cr") + ICV_z + Sex + Scanner, data = dat, random = list(ID =~1), method = "REML") Information about the model components can again be obtained with tidy(), and shows that all terms are significant. Interestingly, the main effect of time is estimated to be linear, as can be seen by its single degree of freedom, while the two-dimensional interaction term is highly nonlinear. Two-dimensional smooth terms can also be visualized with the plot() function, for which a perspective plot is produces by setting scheme = 1. plot(mod2b$gam, scheme = 1) # Plot full tensor product plot(mod2b_ti$gam, select = 3, scheme = 1) # Plot tensor interaction, term #3 The resulting plots are shown in Figure 12. Considering the left part of the plot, the cross-sectional effect is visualized along the baseline age axis, with a trajectory similar to the lifespan hippocampal volume shown in Figure 5. The longitudinal effect, plotted along the time axis, is positive for low baseline ages and negative for higher baseline ages. The tensor interaction term plotted in the right part of Figure 12 shows that the effect of time is more positive in the youngest participants than in adults, while the effect of time is more negative in the oldest participants compared to adults. Since this is an interaction term, the direction of the estimated total longitudinal effect cannot be evaluated based on Figure 12 (right) alone, but also needs to take the main effect into account.
Comparison of Model Fits. Figure 13 shows estimated cross-sectional and longitudinal effects from the five models estimated in this section. The cross-sectional effects are estimated for 1st January 2010, and are all quite similar. Model 1a, estimated with only cross-sectional data, indicates a less steep growth during childhood, and also exhibits some wiggliness between the age of 20 and the age of 50. The longitudinal effects are estimated 15 years ahead from baseline ages of 10, 30, 50, and 70 years. Models 1a and 1b do not distinguish longitudinal and cross-sectional effects, and hence have identical estimates in both plots. The estimates from model 2b are quite different from those of the other models, except for a baseline age of 70. Given the simulation results of Section 3.1, we suspect that the estimates from 2b are not accurate. The longitudinal estimates of models 3a and 3b are close to the estimates of model 1b, again suggesting that the cohort effects in these data are moderate.
Estimating Age at Maximum Volume. A question of interest when estimating lifespan curves, is the age at which critical points occur, e.g. the age of maximum volume, maximum growth, or maximum decline. Point estimates of such critical ages can be read directly from the fits, if necessary after computing derivatives, but an assessment of their statistical uncertainty is not directly available. A Bayesian view of the smoothing introduced by Kimeldorf and Wahba (1970) lets us achieve this. Lettingβ denote the estimated regression parameters, including spline weights, andΣ β their covariance matrix, the posterior distribution of the true coefficients β is now a normal distribution with meanβ and covarianceΣ β , β|y ∼ N (β,Σ β ) (Wood, 2017, Ch. 6.10). By sampling from this posterior distribution we can make confidence statements about any quantity derived from the smooth functions. As an example, Figure 14 shows 50 samples from the posterior distribution of volume curves for cerebellum white matter and hippocampus, with the maximum of each marked with a red dot. Even from these small samples it is evident that there is high uncertainty about the age at which cerebellum white matter volume is maximal, while there is less uncertainty about hippocampal volume. By sampling 20,000 curves and locating the age at maximum for each, we obtained posterior distributions of the age at maximum for each region. Figure 15 shows 95 % highest posterior density intervals computed from the posterior distributions for all 12 regions, using the HDInterval package (Meredith and Kruschke, 2019). The plot shows that the uncertainty about the location of the maximum is highly variable between regions, with very narrow intervals for, e.g. caudate, cerebellum cortex, and thalamus proper, and wide intervals and high uncertainty, e.g. for the brain stem and cerebellum white matter.

Interaction Effects on Lifespan Volume Trajectories
We now demonstrate how factor-smooth interactions can be used to study how lifespan brain volumes are affected by a categorical variable. As an example application we consider the apolipoprotein E (APOE) 4 allele, which is a known risk factor for Alzheimer's disease (Corder et al., 1993;Genin et al., 2011), and study how lifespan trajectories of cerebellum cortex volume differ between carriers of zero, one, or two APOE 4 alleles. Similar models were used by Walhovd et al. (2019), who studied the impact of the APOE 4 allele on lifespan hippocampal volume.
The data are still contained in a dataframe named dat, with identical structure to the data used in Section 3.2.1, except that variables representing hippocampal volume now are replaced by variables Cerebellum and Cerebellum_z, representing cerebellum cortex volume in mm 3 and in sample standard deviations, respectively. In addition, a new variable Gene_APOEnE4 represents the total number of APOE 4 alleles. After excluding participants without information about APOE status, dat contained 2,707 observations of 1,139 participants. Of these, 764 (1,838 observations) had zero alleles, 341 (789 observations) had 1 allele, and 34 (80 observations) had two alleles.
Factor Smooths. In order to estimate the interaction effects, the variable Gene_APOEnE4 needs to be coded as an ordered factor. This is done with the following code. Age at maximum volume (years) Figure 15: Age at maximum volume. 95 % highest posterior density intervals for the age at maximum volume of 12 brain regions. Red dots show posterior means.
dat <-dat %>% mutate(Gene_APOEnE4 = ordered(Gene_APOEnE4)) levels(dat$Gene_APOEnE4) # Print the levels of the ordered factor ## [1] "0" "1" "2" A factor-smooth interaction is defined by s(Age, by = Gene_APOEnE4, k = 10, bs = "cr"). For an ordered factor variable with L levels, this term creates L−1 smooth functions, each representing the difference between the trajectory associated with the lth level and the trajectory associated with the baseline level. The difference between trajectories does not include pure offset effects, and hence the main effect of the ordered factor must be added. In this case, two smooth factor interaction terms are created, associated with 1 or 2 APOE 4 alleles. In contrast, if Gene_APOEnE4 was a numeric variable, gamm4() would estimate a varying-coefficient term as used in Section 3.2.1 treating the number of alleles as a continuous variable, and if Gene_APOEnE4 was a factor variable a single smooth term would be independently estimated for each factor level.
mod <-gamm4(Cerebellum_z~s(Age, k = 10, bs = "cr") + s(Age, by = Gene_APOEnE4, k = 10, bs = "cr") + Gene_APOEnE4 + ICV_z + Sex + Scanner, data = dat, random =~(1|ID)) Again, we use broom's tidy() function to study the estimated smooth terms. The str_detect() function from stringr (Wickham, 2019) is used to detect terms containing the pattern "Gene_APOE". The terms starting with s(Age):Gene_APOEnE4 represent the difference between trajectories of carriers of one or two alleles to carriers of zero alleles, respectively. From the p-values, it is clear that there is no evidence that the shape of the lifespan cerebellum white matter volume depends on APOE 4 status.
library(stringr) tidy(mod$gam) %>% filter(str_detect(term, "Gene_APOE")) # Keep only terms containing pattern "Gene_APOE" The main effects of APOE 4 status can be extracted by supplying the argument parametric = TRUE to tidy(). The estimates, which are in units of sample standard deviations, are close to zero and not significant, indicating that there is no evidence for an offset effect of APOE 4 status on cerebellum cortex volume. The suffixes .L ('linear') and .Q ('quadratic') are a consequence of how R treats ordered factors, and represent the offset effect of having one or two alleles, respectively, relative to having zero alleles.
tidy(mod$gam, parametric = TRUE) %>% filter(str_detect(term, "Gene_APOE")) # Keep only terms containing pattern "Gene_APOE" Prediction from GAMMs. Creating predictions from GAMMs aids interpretation of the estimated effects, and we illustrate it here by comparing the estimated lifespan cerebellum cortex volumes for participants with zero, one, or two APOE 4 alleles. First, a grid over which to compute the predictions is created. Using tidyr's crossing() function (Wickham and Henry, 2020), all combinations of ages between 4 and 94 years with a spacing of 0.1 years, number of APOE 4 alleles, and sexes are generated. The predict() function requires all variables in the model to be defined, and we hence set ICV_z equal to the sample mean and Scanner arbitrarily to "ousAvanto", which is one of the scanners used in the LCBC data. Other values of ICV_z and Scanner would shift the resulting curves vertically, but the interpretation would not change.
pred <-predict(mod$gam, newdata = grid) # Compute fit on grid plot_df <-grid %>% # Add fit to grid and convert to mm^3 mutate(fit = pred * sd(dat$Cerebellum) + mean(dat$Cerebellum)) library(ggplot2) # Plot with grouping by sex and number of alleles ggplot(plot_df, aes(x = Age, y = fit, group = interaction(Gene_APOEnE4, Sex), color = Gene_APOEnE4, linetype = Sex)) + geom_line() A slightly modified version of the resulting plot is shown in Figure 16. Note that since Sex is a parametric term, it merely shifts the curves vertically, without changing their shapes. In this example, none of the interaction effects were significant, but the plots still show how smooth interaction terms create different functional shapes depending on the number of APOE 4 alleles.

Discussion
This paper has highlighted that GAMMs are well-suited for estimating lifespan brain trajectories. However, the issue of potential cohort effects requires careful consideration, and direct translation of LMM formulations used for separating longitudinal and cross-sectional effects has potential pitfalls. In Section 2.3 we defined model (1) which ignores cohort effects, model (2) which is a direct extension of LMMs commonly used to separate longitudinal and cross-sectional effects, and the alternative model (3) which includes participant birth date as a model term. These models' abilities to accurately estimate longitudinal and crosssectional effects were compared in realistic simulation experiments reported in Section 3.1. Not surprisingly, in the absence of cohort effects model (1) was most accurate, with the version of (1) using longitudinal data performing slightly better than the equivalent model with only cross-sectional data. With cohort effects, on the other hand, model (3) was most accurate. More importantly, model (2) -which may be seen as a "classic" model used to separate longitudinal and cohort effects -consistently performed worse than (3), and as shown in Figure 9 it had considerably higher variation across the simulated samples. Figure 9 also shows that model (2) was closest to model (3) for accumbens area and caudate volumes, whose lifespan trajectories are close to linear, while model (3) had the clearest advantage for regions with highly nonlinear trajectories, e.g. amygdala and cerebellum cortex. A natural interpretation of this, supported by Figures 7 and 8, is that model (2) is not able to estimate longitudinal effects beyond the typical follow-up interval. In the extreme case of linear longitudinal effects, on the other hand, longitudinal effects for any time after baseline can in principle be accurately estimated with follow-up intervals of arbitrary length. Interpretation of the terms in model (2) as longitudinal and cross-sectional effects also requires that all participants have equal dates of initial measurement. However, simulations reported in the Supplementary Material suggest that varying baseline dates have a very small effect on the accuracy of model (2) in the settings considered here.
While limitations will vary with regard to study specific characteristics, we find it important to emphasize, in light of the present findings, that "classic" models will never be able to make accurate estimates of lifespan trajectories, or even trajectories for any substantial part of the lifespan. For technical reasons, if no other, available human cohorts with longitudinal imaging data do not span the desired intervals. Furthermore, reaching acceptable power is impossible if dates of initial measurement are to be contained within a small fraction of time. A look at some of the most powerful and impressive combined cross-sectional and longitudinal studies of brain changes with age, suggests that fractional follow-up interval and range of variation in initial measurement dates realistically need to be accommodated in all statistical models. For instance, even the ABCD study (Casey et al., 2018) which utilizes numerous scan sites to track development in thousands of children at very similar age, need to allow for some variation in initial date of measurement, and follow-up intervals are so far limited to a couple of years. While there luckily are major and most impressive studies that contain information on participant samples for many decades, such as the Whitehall study (Filippini et al., 2014), the Baltimore Longitudinal Study of Aging (Tian et al., 2015), the Betula (Gorbach et al., 2017), or the Lothian Birth Cohort study (Cox et al., 2018) these still await longitudinal imaging data (Filippini et al., 2014), or typically have MRI data only for a small fraction of the time, less than a decade (Cox et al., 2018;Gorbach et al., 2017;Tian et al., 2015), sometimes with scan waves being completed across several years (Gorbach et al., 2017). We note this not as a critique of any study, but as a reminder that statistical models at the very least need to accomodate the realistic situation for the best possible data.

When is Cross-Sectional Data Sufficient?
The dangers of using cross-sectional data have repeatedly been pointed out in the quantitative psychology literature. For example, mediation analysis using purely cross-sectional data is likely to lead to biased and misleading estimates under realistic conditions (Cole and Maxwell, 2003;Lindenberger et al., 2011;Lindenberger and Pötter, 1998). In mediation analysis, the goal is to understand the causal paths through which one or more variables x influence an outcome y, directly or through one or more mediating variables m. Since a cause precedes its effects, carefully designed longitudinal data collection as well as models capable of utilizing this information are then necessary (Collins et al., 1998). Longitudinal data is also required to understand how within-individual change differs from between-individual change and how within-individual change is correlated across multiple processes (Lindenberger et al., 2011;Molenaar, 2004). Traditionally, such studies, for which SEMs are ideal analysis tools, have been conducted by following a group of participants of similar age over a number of waves, e.g. Cox et al. (2020); Raz et al. (2005Raz et al. ( , 2010. While the above mentioned cautions about use of cross-sectional data are completely justified, they do not necessarily extrapolate to estimation of lifespan trajectories. If the goal is to estimate the population effect of aging on the volume of one or more brain regions, potentially including interaction effects of static trait variables like genetic variations or education level (after completed education), a single measurement per participant may be sufficient. One example is when the strong assumption of no cohort effects is made. If it holds, cross-sectional and longitudinal effects are equal, and both can be accurately estimated by model (1) using purely cross-sectional data. However, the assumption of no cohort effects is not always necessary; with sufficient variation in baseline dates, model (3) is in principle able to estimate longitudinal and crosssectional effects using a single measurement per participant. In practice, however, we have experienced that models of the form (3) become more stable and accurate with longitudinal data. In particular, both the additional variation provided by repeated measurements with heterogeneous follow-up intervals and the correlation between repeated measurements of the same participant likely contribute to better separation of age effects and cohort effects. Furthermore, as shown in Section 3.2.1, a GAMM using longitudinal data also estimates the between-individual variation and the within-individual variation, quantifying the extent to which differences between participants are due to systematic variation and noise, respectively.

Limitations and Future Directions
The GAMMs studied in this paper also have some limitations. Model (3) is not identified if all participants have been measured at the exact same dates, since the cohort term c i then is a deterministic function of age. This is also true with longitudinal data, and emphasizes the fact that these models are developed for heterogeneous data, typically combined from multiple studies.
There is also a need for methodological development related to estimation of correlated change between regions. In principle, this could be done by fitting GAMMs separately for each region, and using the correlation of random slopes across regions as an estimate of correlated change. A more principled approach is offered by joint modeling frameworks for LMMs Verbeke, 2004, 2006), which in this case would amount to fitting a single hierarchical GAMM (Pedersen et al., 2019) for the lifespan trajectories of multiple regions, with interaction terms distinguish trajectories for each region and random effect structures modeling the within-individual level and change correlation between region trajectories. However, the fact that the extent of correlated change between any pair of brain regions is likely to vary across the lifespan would also need to be taken into account, e.g. by modeling correlations as functions of age. Combined with the need for three or more timepoints to accurately estimate random slopes, it may currently be challenging to obtain a sufficient amount of longitudinal data for fitting such models.
Furthermore, while we have considered GAMMs for estimating how time-invariant variables interact with lifespan trajectories in Section 3.2.2, interaction with time-dependent variables may also be of interest. Although time-dependent interaction variables can be used within the framework considered here, the interpretation of the estimated effects becomes more challenging. If only the value of the time-dependent variable at the given timepoint affects the outcome, the effect can be interpreted in exactly the same way as for a time-invariant variable (Fitzmaurice et al., 2011, Ch. 13.5). In many applications, however, it may be more plausible to assume that also the variable's change since the previous timepoint contains relevant information, and in this case the models used in Section 3.2.2 will have biased estimates. Continuous-time SEMs (Driver and Voelkle, 2018;Oud and Jansen, 2000) may be useful for this purpose, as they allow regressing a time-dependent process (the outcome of interest) on the value of another time-dependent process (the interaction variable) at earlier times, without the restrictive assumption of equally spaced time intervals imposed by ordinary SEMs. However, estimation of nonlinear smooth functions within continuous-time SEMs has not been reported in the literature, and is likely to be both computationally challenging and require large longitudinal datasets.

Conclusion
GAMMs are attractive tools for estimating lifespan brain trajectories, which flexibly handle the nonlinear effects and variable follow-up intervals and measurement dates characteristic of lifespan data. If cohort effects are negligible, GAMMs on the form (1) which directly model the effect of age yield the most accurate estimates, and in this case purely cross-sectional data may even be sufficient. More realistically, cohort effects are likely to be present, and in this case the GAMM (3) which directly models the effect of birth cohort is able to accurately estimate longitudinal and cross-sectional effects. On the other hand, the GAMM (2) which separates the effect of age into a baseline term and a time term as is common with LMMs, yields poor estimates of longitudinal effects. With sufficient variation of measurement dates and follow-up intervals, we thus recommend model (3) for estimating lifespan brain trajectories. On the other hand, for time-structured data containing little variation in measurement dates, model (3) is not identified and model (2) seems to be the best option, with the caveat that estimated longitudinal effects will not be reliable for times larger than the average follow-up interval, as will also be apparent from the confidence intervals.
The R packages mgcv and gamm4 provide efficient software for fitting GAMMs, and are complemented by additional packages enabling easy visualization and interpretation of summary statistics.

Declaration of Competing Interest
The authors report no competing interests.

Data and Code Availability Statement
Code for running simulation experiments is available in the Supplementary Material. The LCBC data are available from the corresponding author on reasonable request, given appropriate ethical and data protection approvals.

Ethics Statement
Collection of the LCBC was approved by a Norwegian Regional Committee for Medical and Health Research Ethics South. Written informed consent was obtained from all participants of age at majority, and the parents or guardian for children below this age.