“Brain age” relates to early life factors but not to accelerated brain aging

Vidal-Piñeiro, D. PhD*, Wang, Y. PhD, Krogsrud, SK. PhD, Amlien, IK. PhD, Baaré, WFC. PhD, Bartrés-Faz, D. PhD, Bertram, L. MD, Brandmaier, A.M. Dr, Drevon CA. MD, PhD, Düzel, S. PhD, Ebmeier KP., MD, Henson RN PhD, Junque, C. PhD, Kievit RA, Kühn, S. PhD, Leonardsen, E. MsC, Lindenberger, U. PhD, Madsen, KS. PhD, Magnussen, F. MsC, Mowinckel, AM. PhD, Nyberg, L. PhD, Roe, JM. PhD, Segura B. PhD, Sørensen, Ø. PhD, Suri S. DPhil, Zsoldos E. DPhil, the Australian Imaging Biomarkers and Lifestyle flagship study of ageing**, Walhovd, KB. PhD, and Fjell, AM. PhD


Introduction
The concept of brain age is increasingly used to capture inter-individual differences in the integrity of the aging brain 1 . The biological age of the brain is estimated typically by applying machine learning to magnetic resonance imaging (MRI) data to predict chronological age. The difference between brain age and chronological age (brain age delta) reflects the deviation from the expected norm and is often used to index brain health. Brain age delta has been related to brain, mental, and cognitive health and proved valuable in predicting outcomes such as mortality [1][2][3] . To different degrees, it is assumed that brain age delta reflects past and ongoing neurobiological aging processes 1,3-6 . Hence, it is common to interpret positive brain age deltas as reflecting accelerated aging 1,4,6 .
The assumption that brain age delta reflects an ongoing process of neurobiological aging implies that there should be a relationship between cross-sectional and longitudinal estimates of brain age.
Alternatively, deviation from the expected brain age could show lifelong stability and capture early genetic and environmental influences 3,7,8 . These perspectives offer fundamentally divergent interpretations of results showing higher brain age (delta) in groups experiencing specific life events, brain disorders, and other medical problems. Here we tested whether brain age is related to accelerated brain aging, early-life factors, or a combination of both (Fig. 1a). If brain age reflects accelerated brain aging, cross-sectional brain age delta (indexed by the centercept) should be positively associated with yearly increases of brain age delta over time (brain age delta_long). If the early-life account plays a substantial role, one should observe a relationship between brain age and early-life factors, indexed here by birth weight and polygenic scores for brain age (PGS-BA), given evidence of lifelong effects of genetic risk on age-related phenotypes 9,10 (Fig. 1b).
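The contrast between the two accounts can be illustrated with a toy simulation. The sketch below is not the authors' analysis: every parameter value (age range, spread of aging rates, spread of early-life offsets, measurement noise, and the assumption that excess aging starts at age 20) is made up for illustration. It only shows that the accelerated-aging account predicts a positive correlation between mean brain age delta and its yearly change, whereas the early-life account predicts none.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(account, n=2000, interval=2.0, noise_sd=0.1, seed=0):
    """Correlation between mean delta and yearly delta change under one account.

    All parameter values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    means, slopes = [], []
    for _ in range(n):
        age = rng.uniform(50, 80)
        if account == "aging":
            # Person-specific excess aging rate (years per year) since age 20.
            rate = rng.gauss(0, 0.1)
            true1 = rate * (age - 20)
            true2 = rate * (age + interval - 20)
        else:
            # Early-life account: a stable offset that never changes.
            offset = rng.gauss(0, 4)
            true1 = true2 = offset
        d1 = true1 + rng.gauss(0, noise_sd)  # observed delta, scan 1
        d2 = true2 + rng.gauss(0, noise_sd)  # observed delta, scan 2
        means.append((d1 + d2) / 2)
        slopes.append((d2 - d1) / interval)
    return pearson(means, slopes)


r_aging = simulate("aging")
r_early = simulate("early_life")
print(f"accelerated aging: r = {r_aging:.2f}; early life: r = {r_early:.2f}")
```

With larger measurement noise the positive correlation under the aging account shrinks, which is why noise needs to be considered when interpreting a null result.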
bioRxiv preprint, this version posted February 8, 2021. doi: https://doi.org/10.1101/2021.02.08.428915. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.

Results
Chronological age (Fig. 1c) was predicted based on multimodal regional and global features from structural T1-weighted (T1w) MRI, including cortical thickness, area, volume, and gray-white matter contrast, as well as volume and intensity of subcortical structures (|N| = 365). See the list in Supplementary Tables 1 and 2, and Fig. 1d for pairwise correlations with age. The model was trained on 38682 participants with a single MRI from the UK Biobank 11 dataset using gradient boosting as implemented in XGBoost (https://xgboost.readthedocs.io) and optimized using 10-fold cross-validation and a randomized hyper-parameter search. The trained model (Fig. 1e) was then used to predict brain age for an independent test dataset of 1372 participants with 2 MRIs each (age range = 47.2-80.6 years, mean [SD] follow-up = 2.3 [0.1] years). The predictions revealed a high correlation between chronological and brain age (r = 0.82) with mean absolute error (MAE) = 3.31 years and root mean squared error (RMSE) = 4.14 years (Fig. 1f), comparable to other brain age models using UK Biobank MRI data 12 . Brain age delta was calculated as the difference between brain and chronological age. We used generalized additive models (GAM) to correct for the brain-age bias, i.e., the underestimation of brain age in older individuals and vice versa 6 . Brain age delta at baseline and follow-up were strongly correlated (r = 0.81). To corroborate generalizability, we replicated our results using a different machine learning algorithm (a LASSO-based approach 12 ) and an independent longitudinal sample from the Lifebrain consortium 13 with up to 11.2 years of follow-up (3292 unique participants, age range = 18.0-94.4 years). See Supplementary Fig. 1 and Supplementary Table 3 for additional information. All the code used to generate the results will be available at https://github.com/LCBC-UiO/VidalPineiro_BrainAge.
Fig. 1. Theoretical expectations and study characteristics. a) Three hypothetical trajectories leading to higher brain age delta. Higher brain age delta can be explained by a steeper rate of neurobiological aging (green), distinct events that led to the accumulation of brain damage in the past (yellow), or early-life genetic and developmental factors (purple). The black arrow represents normative values of brain age through the lifespan. b) Brain aging (green) vs. early-life (blue-purple) accounts of brain age in older age. For the brain aging notion, cross-sectional brain age (points) relates to the slope of brain age as assessed by two or more observations across time (continuous line), reflecting ongoing differences in the rate of aging (dashed line, green scale). For the early-life notion, cross-sectional brain age (points) relates to early environmental, genetic, and/or developmental differences such as birth weight (blue-purple scale). c) Relative age distribution for the UK Biobank test and training datasets. d) Age variance explained (r²) for each MRI feature in the training dataset. Features are grouped by modality and ordered by the variance explained. e) Brain age model as estimated on the training (n = 38682), and f) test datasets (participants = 1372; two observations each).
In e) and f), lines represent the identity (grey), the linear (green), and the GAM (orange) fits of brain age by chronological age. Confidence intervals represent standard errors (SE). In d) gwc = gray-white matter contrast, (c) = cortical, and (s) = subcortical.
Next, we tested if birth weight was associated with brain age delta or change in brain age delta. Linear mixed models were used to fit time (from baseline; years), birth weight, and their interaction on brain age delta, using age at baseline, sex, site, and eICV as covariates. Birth weight was significantly related to brain age delta (β = -0.70 [± .30] year/kg, t (p) = -2.3 (.02), r² = .009, Fig. 3a) but not to delta change. The results were replicated using the LASSO approach (β = -0.79 [± .29] year/kg, t (p) = -2.8 (0.006), r² = .009, Fig. 3b).
Finally, we tested whether polygenic scores for brain age delta (PGS-BA) related to brain age delta and change in brain age delta (n = 1339). PGS-BA was computed using a mixture-normal model based on a genome-wide association study (GWAS) of the brain age delta phenotype in the UK Biobank training dataset. To test the association, linear mixed models were used as above with 10 additional covariates accounting for population structure. See Supplementary Fig. 5 for GWAS results. PGS-BA was positively associated with brain age delta (β = 0.54 [± 0.09] year, t (p) = 9.4 (< .001), r² = .02, Fig. 3c) and negatively associated with brain age delta change (β = -0.06 [± .03] year, t (p) = -2.4 (0.02)) in the independent test dataset. Likewise, PGS-BA was associated with brain age delta derived from the LASSO algorithm (β = 0.53 [± 0.09] year, t (p) = 10.4 (< 0.001), r² = .02) but not with brain age delta change (β = -0.001 [± .02] year, t (p) = 0.0 (1.0)).

Fig. 3. Relationship between cross-sectional brain age delta and birth weight. a) Main analysis using the UK Biobank dataset and gradient boosting (n = 770). b) Replication analysis using a different training algorithm (LASSO) (n = 770). c) Relationship between polygenic scores for brain age delta and brain age delta (n = 1339). XGB = gradient boosting as implemented in XGBoost. Confidence intervals represent SE.

Discussion
Altogether, these findings do not support the claim that cross-sectional brain age is related to ongoing brain aging. Rather, brain age seems to reflect early-life influences, and only to a negligible degree actual brain change in middle and old adulthood. A lack of relationship between brain age and rate of brain aging can potentially be explained by the effect of circumscribed events, such as isolated insults or detrimental lifestyles, that occurred in the past and resulted in higher, but not accelerating, brain age.
Yet, variations in brain age can equally reflect developmental and early-life differences and show lifelong stability. Brain-age paradigms are generally ill-suited for disentangling these sources of variation but are often interpreted in line with the former. This assumes that variation in brain age largely results from the accumulation of damage and insults during the lifespan, with similar starting points for everyone. An exception is Elliott and colleagues 3 , who found that middle-aged individuals with higher brain age already exhibited poorer cognitive function and brain health at age three years.
This fits a robust corpus of literature showing effects of lifelong, stable influences as indexed by childhood IQ 14 , genetics 10 , and neonatal characteristics 8 on brain and cognitive variation in old age.
Strictly speaking, brain age delta is a prediction error from a model that maximizes the prediction of age in cross-sectional data. Prediction errors also reflect noise, attenuating any relation between cross-sectional and longitudinal brain age. The Lifebrain replication sample with more observations and longer follow-up reduces the likelihood of noise as the main factor behind the lack of relationship.
Furthermore, previous studies have found that changes in brain age are partly heritable 15 , suggesting that it captures biologically relevant signal, although with substantially different origins from cross-sectional brain age. Without longitudinal imaging, one should thus not interpret brain age as accelerated aging. This aligns with theoretical claims and empirical observations that covariance structures capturing differences between individuals do not necessarily generalize to covariance structures within individuals 16,17 . Neither does indirect evidence, via associations with other bodily markers of aging or with cognitive decline, yield decisive support for cross-sectional brain age as a marker of individual differences in brain aging 2,3,18 . Relationships between cross-sectional and longitudinal brain age may thus be restricted to specific disease groups such as Alzheimer's disease patients 18 where interindividual brain variation is dominated by the prevailing loss of brain structural integrity.
The results further showed that birth weight, which reflects differences in genetic propensities and to a large degree prenatal environment 19 , explained a modest portion of the variance in brain age. Subtle variations in birth weight are associated with brain structure early in life and present throughout the lifespan 8 . This association should be considered as proof-of-concept that the metric of brain age reflects the distant past more than presently ongoing events in the morphological structure of the brain. This was confirmed by the consistent association between PGS-BA and brain age delta but not with brain age delta change. Since PGS-BA was computed based on cross-sectional brain age delta, this relationship may not be surprising, but it still suggests a different genetic foundation for longitudinal brain age. These findings link with evidence that brain development is strongly influenced by genetic architecture that, in interaction with environmental factors, leads to substantial, long-lasting effects on brain structure. By contrast, aging mechanisms seem to be more related to limitations of maintenance and repair functions and have a more stochastic nature 20 .
As distance from birth increases, the value of chronological age as a marker of individual development is reduced.
The results call for caution in interpreting brain-derived indices of aging based on cross-sectional MRI data and underscore the need to rely on longitudinal data whenever the goal is to understand the trajectories of brain and cognition in aging.

Participants and Samples
The main sample was drawn from the UK Biobank neuroimaging branch (https://www.ukbiobank.ac.uk/) 11 [...] Table 4) have ethical approval from the respective regional ethics committees. All participants provided informed consent.

MRI acquisition and preprocessing
See https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/brain_mri.pdf for details on the UK Biobank T1-weighted (T1w) MRI acquisition. UK Biobank and Lifebrain MRI data were acquired with 3 and 10 [...] We used summary regional and global metrics derived from T1w data. For UK Biobank we used the imaging-derived phenotypes developed centrally by UK Biobank researchers 11 [...] [29][30][31] and used similar atlases for structural segmentation and feature extraction.

Birth weight
We used birth weight (kg) from the UK Biobank (field #20022). Participants were asked to enter their birth weight at the initial assessment visit, the first repeat assessment visit, or the first imaging visit.
In the case of multiple birth weight instances, we used the latest available input. A total of 894 participants from the test dataset had available data on birth weight. The main analysis was constrained to normal variations in birth weight between 2.5 and 4.5 kg (n = 770) 32 due to the lower reliability of extreme scores and to tentatively remove participants with severe medical complications associated with prematurity.
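The selection rule described above can be sketched as follows. This is a minimal illustration with hypothetical participant records; the record layout, the function names, and the choice to treat the 2.5 and 4.5 kg limits as inclusive are assumptions, not details taken from the study code.

```python
def latest_birth_weight(instances):
    """Return the most recent non-missing value; instances are in visit order."""
    for value in reversed(instances):
        if value is not None:
            return value
    return None


def in_normal_range(weight_kg, low=2.5, high=4.5):
    """True if a birth weight is available and within [low, high] kg (assumed inclusive)."""
    return weight_kg is not None and low <= weight_kg <= high


# Hypothetical records: [initial visit, first repeat visit, first imaging visit]
records = {
    "p1": [3.2, None, 3.4],     # two entries: the latest (3.4) is used
    "p2": [None, None, None],   # no birth weight available
    "p3": [1.9, None, None],    # outside the normal range
    "p4": [4.5, 4.4, None],     # latest available entry is 4.4
}

selected = {pid: latest_birth_weight(values) for pid, values in records.items()}
main_sample = sorted(pid for pid, w in selected.items() if in_normal_range(w))
print(main_sample)
```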

Genetic preprocessing
Detailed information on genotyping, imputation, and quality control was published by Bycroft and colleagues 33 . For genetic analyses, we only included participants with both genotypes and MRI scans.
Following the recommendations from the UK Biobank website, we excluded individuals who had failed genotyping, who had abnormal heterozygosity status, or who withdrew their consent. We also removed participants who were genetically related (up to the third degree) to at least one other participant, as estimated by kinship coefficients implemented in PLINK 34 .

Polygenic scores (PGS)
The GWAS results for the training dataset were used to compute PGS (PGS-BA) in the independent test dataset (n = 1339 participants). We used the recently developed method PRS-CS 36 to estimate the posterior effect sizes of SNPs that were shown to have high quality in the HapMap data 37 . Rather than estimating the polygenicity of brain age delta from our data, we assumed a highly polygenic architecture for brain age delta by setting the parameter --phi=0.01 38 . The remaining parameters of PRS-CS were set to the default values. PGS was based on 654725 SNPs and was computed on the independent test data using the --score function from PLINK. We also computed the population structure PCs in the test dataset using the same procedure as in the training dataset.
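Conceptually, a polygenic score is a weighted sum of effect-allele dosages, with the per-SNP posterior effect sizes as weights. The sketch below illustrates that arithmetic only; it does not call PRS-CS or PLINK, and the SNP identifiers, weights, and genotypes are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies of the effect allele).

    SNPs without a weight are skipped.
    """
    return sum(weights[snp] * dose for snp, dose in dosages.items() if snp in weights)


# Hypothetical posterior effect sizes (years of brain age delta per effect allele)
weights = {"rs_a": 0.02, "rs_b": -0.01, "rs_c": 0.005}

# Hypothetical genotypes for three participants of the test dataset
genotypes = {
    "p1": {"rs_a": 2, "rs_b": 0, "rs_c": 1},
    "p2": {"rs_a": 0, "rs_b": 2, "rs_c": 1},
    "p3": {"rs_a": 1, "rs_b": 1, "rs_c": 0},
}

scores = {pid: polygenic_score(g, weights) for pid, g in genotypes.items()}
for pid, score in scores.items():
    print(pid, round(score, 3))
```

In practice the weights would come from the PRS-CS output and the scoring would run over all retained SNPs.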

Statistical analyses
The code used in this manuscript is available at https://github.com/LCBC-UiO/VidalPineiro_BrainAge. All statistical analyses were run with R version 3.6.3 (https://www.r-project.org/). We used the UK Biobank as the main sample and the Lifebrain cohort for independent replication. The main description refers to the UK Biobank pipeline, though the Lifebrain replication followed identical steps unless otherwise stated. For replication across machine learning pipelines, we used a LASSO regression approach for age prediction, adapted from https://james-cole.github.io/UKBiobank-Brain-Age/. See more details in Cole, 2020 12 . The correlation between LASSO-based and gradient boosting-based brain age deltas was .80.

Brain age prediction
We used machine learning to estimate each individual's brain age based on a set of regional and global features extracted from T1w sequences. We estimated brain age using gradient tree boosting (https://xgboost.readthedocs.io). We used participants with only one MRI scan for the training dataset (n = 38682) and participants with longitudinal data as the test dataset (n = 1372). All variables were scaled prior to any analyses using the training dataset metrics as reference.
Next, we recomputed the machine learning model using the entire training dataset and the optimal hyper-parameters and used it to predict brain age for the test dataset. The predictions revealed a high correlation between chronological and brain-predicted age (r = 0.82) with MAE = 3.31 years and RMSE = 4.14 years (Fig. 1f). These metrics are similar to or better than those of other brain age models using UK Biobank MRI data 12,39 , and than the cross-validation diagnostics. Brain age delta was estimated as the difference between brain age and chronological age. We used GAM to correct for the brain-age bias, whereby brain age is underestimated at older ages 6 (bias r = -0.54 for the test dataset). Note that we used GAM fittings as estimated in the training dataset, so delta values in the test dataset are not centered at 0. The correlation between brain age delta corrected based on the training vs. the test fit was r > 0.99. Also, GAM-based bias correction led to similar brain age delta estimations to linear and quadratic-based corrections (r > 0.99).
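The delta computation and bias correction can be sketched in a few lines. Note the substitution: the sketch uses a straight-line fit of delta on age instead of a GAM, which the text reports to give nearly identical deltas. All ages and predictions are hypothetical, and the function names are illustrative. As in the text, the bias is estimated on the training data and then applied to the test data.

```python
def linear_fit(x, y):
    """Ordinary least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def error_metrics(age, brain_age):
    """Mean absolute error and root mean squared error of the age prediction."""
    errors = [b - a for a, b in zip(age, brain_age)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
    return mae, rmse


def corrected_delta(age, brain_age, intercept, slope):
    """Raw delta minus the age-dependent bias estimated on the training data."""
    return [(b - a) - (intercept + slope * a) for a, b in zip(age, brain_age)]


# Hypothetical training predictions showing the typical bias:
# overestimation in younger and underestimation in older participants.
train_age = [50, 60, 70, 80]
train_pred = [54, 61, 68, 75]
train_delta = [p - a for a, p in zip(train_age, train_pred)]
intercept, slope = linear_fit(train_age, train_delta)

# Hypothetical test predictions, corrected with the training-based fit.
test_age = [55, 75]
test_pred = [56.5, 70.5]
mae, rmse = error_metrics(test_age, test_pred)
test_delta = corrected_delta(test_age, test_pred, intercept, slope)
print(mae, rmse, test_delta)
```

Because the correction comes from the training fit, the corrected test deltas need not average to zero.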

Higher-level analysis
Relationship between cross-sectional and longitudinal brain age. For each participant, we computed the mean brain age delta across the two MRI time points and the yearly rate of change (brain age delta_long). We selected mean, instead of baseline, brain age delta to avoid statistical dependency between both indices 40,41 . Brain age delta_long was fitted by mean brain age delta using a linear regression model, which accounted for age, sex, site, and estimated intracranial volume (eICV). We used mean eICV across both time points.
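The reason for preferring the mean over the baseline value can be shown with a small simulation. In the sketch below, every participant has a perfectly stable true delta and only measurement noise differs between scans. All values (sample size, variances, a 2-year interval) are illustrative assumptions. Baseline delta still correlates negatively with the change score because both share the same noise term, while mean delta does not.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
interval = 2.0  # years between the two scans (illustrative)
baseline, mean_delta, delta_long = [], [], []
for _ in range(2000):
    true_delta = rng.gauss(0, 1)        # stable over time: no real change
    d1 = true_delta + rng.gauss(0, 1)   # noisy observation at scan 1
    d2 = true_delta + rng.gauss(0, 1)   # noisy observation at scan 2
    baseline.append(d1)
    mean_delta.append((d1 + d2) / 2)
    delta_long.append((d2 - d1) / interval)

r_baseline = pearson(baseline, delta_long)  # spurious negative dependency
r_mean = pearson(mean_delta, delta_long)    # close to zero
print(f"baseline vs change: {r_baseline:.2f}; mean vs change: {r_mean:.2f}")
```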
Relationship between brain age PGS and cross-sectional and longitudinal brain age. This association was tested using linear mixed models with time from baseline (years), PGS-BA, and their interaction on brain age delta. Age at baseline, sex, site, eICV, and the first 10 principal components for population structure were used as covariates. The principal components of population structure were added to minimize false positives associated with any form of relatedness within the sample.
Effects of birth weight on brain age. Linear mixed models were used to fit time, birth weight, and their interaction on brain age delta, using age at baseline, sex, site, and eICV as covariates. We explored the consistency of the results by modifying the birth weight limits in a grid-like fashion [0.5, 2.7, 0.025] and [4.2, 6.5, 0.025] for minimum and maximum birth weight (Supplementary Fig. 4). Self-reported birth weight is a reliable estimate of actual birth weight. However, extreme values are either misestimated or reflect profound gestational abnormalities 42,43 . Assumptions were checked for the main statistical tests using plot diagnostics. Variance explained for single terms refers to unique variance explained (UVE), which is defined as the difference in explained variance between the full model and the model without the term of interest. For linear mixed models, UVE was estimated as implemented in the MuMIn R package.
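The grid over birth-weight limits can be sketched as below. The sketch assumes that the bracket notation means [start, stop, step] in kg with inclusive end points, which the text does not state explicitly. The birth weights are hypothetical. In the actual analysis the mixed model would be refit for each pair of limits; here we only count how many participants each pair retains.

```python
def grid(start, stop, step):
    """Inclusive grid built from integer steps to avoid floating-point drift."""
    n_steps = round((stop - start) / step)
    return [round(start + i * step, 3) for i in range(n_steps + 1)]


min_limits = grid(0.5, 2.7, 0.025)  # candidate lower limits (kg)
max_limits = grid(4.2, 6.5, 0.025)  # candidate upper limits (kg)

# Hypothetical birth weights (kg)
birth_weights = [1.8, 2.4, 2.9, 3.3, 3.6, 4.1, 4.6, 5.2]

# Number of participants retained for every (lower, upper) pair of limits.
retained = {
    (lo, hi): sum(lo <= w <= hi for w in birth_weights)
    for lo in min_limits
    for hi in max_limits
}
print(len(min_limits), len(max_limits), len(retained))
```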
Equivalence tests. Post-hoc equivalence tests were carried out to test for the absence of a relationship between cross-sectional brain age delta and brain age delta_long 44 . Specifically, we used inferiority tests to assess whether a null hypothesis of an effect at least as large as Δ (in years/delta) could be rejected. We reran the three main models assessing a relationship between cross-sectional and longitudinal brain age delta (UK Biobank trained with gradient boosting, UK Biobank trained with LASSO, and Lifebrain trained with gradient boosting), varying the right-hand-side bound (Δ) over [-0.02, 0.05, 0.001] (p < 0.05, one-tailed) (Supplementary Fig. 2).
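The logic of the inferiority test can be sketched as a one-tailed test of the estimated slope against each candidate bound Δ. The sketch makes several assumptions: it uses a normal approximation instead of the t distribution of the fitted model, it reads the bracket notation as [start, stop, step], and the input estimate and standard error are illustrative values rather than the main-model results.

```python
from statistics import NormalDist


def inferiority_p(beta, se, delta):
    """One-tailed p-value for H0: true effect >= delta (normal approximation)."""
    return NormalDist().cdf((beta - delta) / se)


def smallest_rejected_bound(beta, se, start=-0.02, stop=0.05, step=0.001, alpha=0.05):
    """Smallest bound on the grid for which H0 (effect >= bound) is rejected."""
    n_steps = round((stop - start) / step)
    for i in range(n_steps + 1):
        delta = round(start + i * step, 3)
        if inferiority_p(beta, se, delta) < alpha:
            return delta
    return None  # no bound on the grid could be rejected


# Illustrative inputs: slope estimate and standard error in years/delta.
bound = smallest_rejected_bound(beta=-0.008, se=0.01)
print(bound)
```

The returned bound is the smallest effect on the grid that the data allow us to rule out at the chosen alpha; smaller values indicate stronger evidence against a positive relationship.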
Quality control. Prior to any analysis, we tentatively removed observations for which >5% of the features fell above or below 5 SD from the sample mean. The application of this arbitrarily high threshold led to the removal of 10 observations. We considered these MRI data to be extreme outliers and likely to be artifactual and/or contaminated by important sources of noise. Also, before brain age prediction, we tentatively removed variance associated with the different scanners using generalized additive mixed models (GAMM), controlling for age as a smooth factor and a subject identifier as random intercept. This correction was performed due to differences in age distribution by scanner and the lack of across-scanner calibration.
Hyperparameter search and model diagnostics. The optimal parameters for the Lifebrain replication sample were: number of estimators = 600, learning rate = 0.05, maximum depth = 4, gamma = 1.5, and min child weight = 1. Using cross-validation, the model explained r² = 0.92 of the age variance with MAE = 4.75 and RMSE = 6.31. Brain age was underestimated in older age (bias r = -0.33).
Model prediction. The age-variance explained by brain age was r = 0.90 with MAE = 4.68 and RMSE = 6.06. Brain age was underestimated in older age (bias r = -0.25) (Supplementary Fig. 3).
Higher-level analysis. For each individual, mean brain age delta was considered as the grand-mean brain age delta across the different MRI time points. To compute brain age delta_long, we fitted for each participant a linear regression model, with observations equal to the number of time points, that predicted brain age delta from time since the initial visit. The slope indexed change in brain age delta/year. The relationship between mean brain age delta and brain age delta_long was tested using linear mixed models controlling for age, sex, and eICV as fixed effects, and using a site identifier as a random intercept. Note that eICV was identical across time points as a result of being estimated through the longitudinal FreeSurfer pipeline.
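The per-participant summary for a sample with a variable number of scans can be sketched as follows. This is a minimal illustration with hypothetical visits; the function names are not taken from the study code, and covariates and the mixed-model step are omitted.

```python
def ols_slope(times, values):
    """Least-squares slope of values on times (requires at least two distinct times)."""
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    sxx = sum((t - mt) ** 2 for t in times)
    return sum((t - mt) * (v - mv) for t, v in zip(times, values)) / sxx


def summarize(visits):
    """Return (grand-mean delta, yearly change) for one participant.

    visits: list of (years_since_first_visit, brain_age_delta) tuples.
    """
    times = [t for t, _ in visits]
    deltas = [d for _, d in visits]
    return sum(deltas) / len(deltas), ols_slope(times, deltas)


# Hypothetical participant with three scans over four years
mean_delta, delta_long = summarize([(0.0, 1.0), (2.0, 1.4), (4.0, 1.8)])
print(mean_delta, delta_long)
```

With exactly two time points the slope reduces to the difference between the two deltas divided by the follow-up interval.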
We performed control analyses to account for possible effects of varying follow-up intervals and to consider the presence of young adults in the Lifebrain sample. We repeated the analyses including follow-up interval as an additional covariate and restricting the analysis to individuals with a follow-up of >4 years (n = 424). The relationship between cross-sectional brain age delta and brain age delta_long was not significant in either case (β = -0.008 [± 0.01] year/delta, t (p) = -0.7 (.45); β = -0.008 [± 0.007] year/delta, t (p) = -1.1 (.26)). We could not obtain the required information on genetics and birth weight to replicate the analyses supporting the early-life account.

Data availability
The raw data were gathered from the UK Biobank, the Lifebrain cohort, and the AIBL. Raw data requests are specific to each cohort. UK Biobank and AIBL data are available upon application to UK Biobank and at https://aibl.csiro.au upon corresponding approvals. For the Lifebrain cohorts, requests for raw MRI data should be submitted to the corresponding principal investigator. See contact details in Supplementary Table 5. Note that MRI data availability for some individuals may be restricted as participants did not consent to share their data publicly. Different restrictions and sample agreements might be required.

Code availability
The code for the statistical analyses in this manuscript will be available at https://github.com/LCBC-UiO/VidalPineiro_BrainAge. All analyses were performed in R 3.6.3. The scripts were run on the Colossus processing cluster, University of Oslo. UK Biobank's data acquisition, MRI preprocessing, and feature generation pipelines are freely available (https://www.fmrib.ox.ac.uk/ukbiobank). For the Lifebrain cohorts, the image acquisition details are summarized in Supplementary Table 4. MRI preprocessing and feature generation were performed with the freely available FreeSurfer software (https://surfer.nmr.mgh.harvard.edu/). For bash-sourcing scripts, please contact the corresponding author.