Cross-sectional brain age assessments are limited in predicting future brain change

The concept of brain age (BA) describes an integrative imaging marker of brain health, often suggested to reflect ageing processes. However, the degree to which cross-sectional MRI features, including BA, reflect past, ongoing and future brain changes across different tissue types from macro- to microstructure remains controversial. Multimodal imaging data of 39,325 UK Biobank participants, aged 44−82 years at baseline, and 2,520 follow-ups within 1.12−6.90 years, provide insufficient evidence that BA reflects the rate of brain ageing.

Biomarkers which successfully characterise ageing still need to be established. An emerging candidate for such a marker is the concept of biological brain age (BA). Algorithms that predict BA provide insight into the differences between imaging metrics of healthy populations and independent target populations, for example, those presenting a certain pathology. BA can be predicted from different types of imaging data, such as different modalities or brain regions 1. The difference between BA and chronological age, called the brain age gap (BAG), has been used as a proxy for brain health. Previous studies identified the largest group-level differences in BAG between healthy controls and individuals with neurodegenerative disorders 2,3, which makes BAG particularly interesting in the context of ageing, both healthy and pathological.
To increase the clinical utility of BAG metrics, it is necessary to understand the degree to which cross-sectional BAG can predict brain ageing later in life 4,5.
Here, we capitalised on the largest accessible multimodal magnetic resonance imaging (MRI) dataset featuring T1-weighted and diffusion MRI from the UK Biobank, including thousands of healthily ageing participants. These two modalities have previously been shown to be accurate BA predictors 1,2,6. After exclusions based on poor MRI data, participant withdrawal from the study, and presence of a psychiatric or neurological disorder based on the ICD-10 (see Online Methods), we retained baseline brain scans of a total of N_TP1 = 39,325 individuals. BA prediction models were trained on data from participants without available follow-up scans (N = 36,805, aged 64.63 ± 7.70 years, range: 44.57−82.75 years).
For model training, different machine learning algorithms were implemented, using k-fold cross-validation with 5 outer and 10 inner folds, including hyperparameter tuning, to determine the best performing model (see Online Methods for details on algorithms probed). Among the various approaches, the best performing algorithm was linear regression, which was ultimately used for testing: it predicted individual BA from T1-weighted and diffusion MRI-extracted brain features individually, as well as from their combination (multimodal MRI), at two data points in a subset of participants with follow-up scans (N = 2,520), aged 62.22 ± 7.23 years (range: 46.63−80.30 years) at baseline. The follow-up scan was obtained within 2.45 ± 0.75 years from the baseline scan (range: 1.12−6.90 years). To account for age bias introduced by the training sample's age distribution, we used a linear age correction (Online Methods). The brain features used in the three models were region-averaged cortical surface area, volume, and thickness measures extracted from the FreeSurfer 7 recon-all pipeline (208 total features), and various diffusion measures from conventional and advanced diffusion approaches (1,794 total features, see Online Methods).
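The nested cross-validation scheme described above (5 outer folds for performance estimation, 10 inner folds for hyperparameter tuning) can be sketched as follows. This is a minimal illustration on synthetic data, assuming scikit-learn is available; the ridge regressor, its `alpha` grid, and the simulated features are placeholders chosen for the example and are not the algorithms or settings probed in the study.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic stand-in for brain features (X) and chronological age (y).
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Inner loop: 10 folds, used only to tune hyperparameters.
inner_cv = KFold(n_splits=10, shuffle=True, random_state=0)
# Outer loop: 5 folds, used only to estimate out-of-sample performance.
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Hypothetical hyperparameter grid for the placeholder model.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},
    cv=inner_cv,
    scoring="neg_mean_absolute_error",
)

# Each outer training fold runs its own inner grid search before scoring.
outer_scores = cross_val_score(
    search, X, y, cv=outer_cv, scoring="neg_mean_absolute_error"
)
outer_mae = -outer_scores  # flip sign: scikit-learn maximises scores
print("MAE per outer fold:", outer_mae)
```

Keeping tuning inside the inner loop means the outer-fold error is not biased by hyperparameter selection.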
Despite the short inter-scan interval (ISI), we could observe tissue maturation indicated by significant time-point differences in MRI-derived regional brain features (cortical thickness, surface area, cortical volume and diffusion metrics across brain regions, Fig. 1d). More than 90% of the T1-weighted, and more than 78% of the diffusion-derived features changed significantly (|d_T1w| = 0.26, |d_dMRI| = 0.15; Fig. 1d, Suppl. Data 1), with a larger magnitude of these changes observed for T1-weighted (|d| = 0.20−0.30; Suppl. Fig. 5) compared to diffusion metrics (|d| = 0.12−0.16; see Suppl. Fig. 6 for metric-level changes). The features that showed significant change (p < 0.05, N_T1w = 202, N_dMRI = 1,618) between baseline and follow-up were then used to compute principal components of the centercepts/averages (PC) and the annual rate of change of these features (∆PC; Suppl. Figs. 1, 3). Fig. 1 demonstrates that BA trained on cross-sectional data can be applied to longitudinal data. Training and test sample characteristics were similar (Fig. 1a).
The stronger association between BAG from T1-weighted features and both the respective principal component of change and ∆BAG was not necessarily reflected when predicting the regional ageing patterns across metrics within modalities, with significant associations for only 9% of the changes in specific brain regions. While cross-sectional T1-weighted BAG was limited in predicting future change in T1-weighted features, longitudinal T1-weighted BAG significantly correlated with 75% of these changes (Fig. 2c). DMRI-based cross-sectional BAG reflected a higher proportion (38%) of regional brain change, also indicated by larger significant effect sizes (|β| = 0.140), compared to T1-weighted (|β| = 0.121), but not multimodal BAG (|β| = 0.190). The strongest regional dMRI-based BAG associations were found for BRIA's microscopic fractional anisotropy in the anterior corona radiata (β = −0.263) and superior longitudinal fasciculus (β = −0.262), and for the intra-axonal water fraction in the corpus callosum (β = −0.262) and superior longitudinal fasciculus (β = −0.238). However, the overall strongest associations with BAG were found for skeleton averages of microscopic fractional anisotropy (β = −0.280) and the intra-axonal water fraction (β = −0.277).
As BAG can be expected to be higher at higher ages, and the rate of change in BAG may also accelerate, we show that our analyses are independent of both by correcting for the age bias (see Online Methods) and by showing that baseline BAG can predict future changes in BAG independent of the ISI between baseline and follow-up. This was indicated by the non-significant effect of the interaction between the ISI and BAG on ∆BAG (p > 0.05; Suppl. Table 5, Suppl. Fig. 2) when using either a linear or cubic interaction term. Hence, the observed associations between cross-sectional BAG and ∆BAG were independent of the ISI in the current study, and not just an artefact of study design, age or ageing.
BAG was limited in reflecting sub-clinical health characteristics. Our sample was selected to not contain neurological or psychiatric disorders and showed relatively stable health based on various health indicators. Yet, these health indicators were limited in reflecting BAG. Health characteristics were evaluated by examining different risk factors for age-related diseases and mortality, including cardiometabolics, depression, neuroticism, and polygenic risk scores (PGRS) of different disorders.
Small associations were found between both BAGs and ∆BAG and different cross-sectional health indicators (Suppl. Fig. The identified increase in BAG over time indicates an acceleration of brain ageing during ageing without neurological or neuropsychiatric diagnosis, which has also previously been highlighted in white matter microstructure 9. In contrast, BAG during adulthood without pathology can be expected to be stable, since tissue changes remain small 5. However, here, we show that during pathology-free ageing, which is accompanied by regional brain changes, the BAG also changes over time. Hence, BAG provides an indicator of the brain's morphometric state. Such a state has been shown to be influential for the future development of disorders, as indicated by disability accumulation in multiple sclerosis 10 or changes in dementia ratings 11. However, the biological underpinnings of these associations remain unclear.
We observed considerable modality-dependent differences in brain ages. BAG_dMRI was most predictive of brain change. Modality-dependent differences might originate from the attempt to reduce a more complex feature space into single scores, such as brain age or principal components. Future modelling might focus on different spatial scales, such as voxel-level analysis, and simultaneously on different biophysical modelling approaches to extract meaningful brain metrics.
In conclusion, we find that cross-sectional BAG estimates are limited in reflecting future brain changes. This limits the potential of BAG for longitudinal inference and for establishing BAG as a biomarker. During generally pathology-free ageing, BAG is not stable but increases, together with morphometric changes, potentially due to accelerated ageing. Yet, only dMRI-based BAG also reflected regional morphometric changes. These findings provide new and more pronounced insights into the mechanism of BAG. For example, a higher BAG does not automatically indicate the presence of a disorder, which would, however, be crucial for diagnostics. Instead, the observed modality dependencies suggest that dMRI and multimodal BAGs reflect the morphometric state, which is influenced by early life factors 4. DMRI BAG might reflect regional brain changes better than the other approaches. This more nuanced understanding of

Sample characteristics
We obtained UKB data 12 containing dMRI data of N = 46,637 cross-sectional datasets, of which N = 4,871 entailed data available at two time points, and N = 48,044 T1-weighted MRI datasets, of which N = 4,960 were followed up. Participant data were excluded when consent had been withdrawn, or when data quality was deemed insufficient based on the YTTRIUM method 13 applied to dMRI data, and for T1-weighted data based on Euler numbers 14, leading to exclusions when three standard deviations from the mean were exceeded. Additionally, we excluded participants who were diagnosed with any mental and behavioural disorder (ICD-10 category F), disease of the nervous system (ICD-10 category G), or disease of the circulatory system (ICD-10 category I). The remaining datasets, after the exclusions were Reading (5.60%).
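The three-standard-deviation exclusion rule mentioned for the Euler numbers can be illustrated with a short sketch. The values below are invented for the example, and the use of the sample standard deviation and a two-sided criterion are assumptions, since the text does not specify them.

```python
from statistics import mean, stdev


def flag_outliers(values, n_sd=3.0):
    """Return indices of values more than n_sd sample SDs from the mean."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (assumption)
    return [i for i, v in enumerate(values) if abs(v - mu) > n_sd * sd]


# Hypothetical Euler numbers: 20 typical scans and one poor-quality scan.
euler_numbers = [-40 + i for i in range(20)] + [-400]
flagged = flag_outliers(euler_numbers)
print("Indices flagged for exclusion:", flagged)
```

Note that with very small samples no value can exceed three sample standard deviations, so a rule like this only becomes meaningful at larger sample sizes.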

MRI acquisition and post-processing
UKB MRI data acquisition procedures and protocols are described elsewhere 12,15,16.
T1-weighted images were processed using the FreeSurfer (version 5.3) 7 automatic recon-all pipeline for cortical reconstruction and subcortical segmentation of the T1-weighted images (http://surfer.nmr.mgh.harvard.edu/fswiki) 28. Notably, the influence of the FreeSurfer version on the brain age predictions was estimated as well and assumed to be small in this case 29.
In total, we obtained 26 WM metrics from six diffusion approaches (DTI, DKI, WMTI, SMT, mcSMT, BRIA; see Suppl. Table 8 for an overview). In order to normalise all metrics, we used Tract-based Spatial Statistics (TBSS) 30, as part of FSL 21,22. In brief, all brain-extracted 31 fractional anisotropy (FA) images were initially aligned to MNI space using non-linear transformation (FNIRT) 22. Subsequently, the mean FA image and related mean FA skeleton were derived. Each diffusion scalar map was projected onto the mean FA skeleton using TBSS. To provide a quantitative description of diffusion metrics at a region level, we used the Johns Hopkins University (JHU) atlas 32, and obtained 48 white matter regions of interest (ROIs) and 20 tract averages based on a probabilistic white matter atlas (JHU) 33.

Cardiometabolic risk factors
We used a selection of cardiometabolic risk factors which have been associated with BAG and are relevant to brain ageing 1. Smoking, hypertension, and diabetes were binary variables, and the waist-hip ratio (WHR) was a scalar value.

Depression and neuroticism scores
Depression scores were computed using the Recent Depressive Symptoms (RDS-4) score (fields 2050, 2060, 2070, 2080), which was suggested in a previous investigation using UKB imaging data 35. Neuroticism scores (UKB data-field 20127) were derived as a summary score from the Eysenck Neuroticism (N-12) inventory, which includes items describing neuroticism traits.

Polygenic risk scores (PGRS)
We estimated PGRS for each participant with available genomic data, using PRSice2 36 with default settings. As input for the PGRS, we used summary statistics from recent genome-wide association studies of Autism Spectrum Disorder (ASD) 37, Major Depressive Disorder (MDD) 38, Schizophrenia (SCZ) 39, Attention Deficit Hyperactivity Disorder (ADHD) 40, Bipolar Disorder (BIP) 41, Obsessive Compulsive Disorder (OCD) 42, Anxiety Disorder (ANX) 43, and Alzheimer's Disease (AD) 44. We used a minor allele frequency of 0.05, the threshold most commonly used in PGRS studies of psychiatric disorders.
predictions and chronological age and commonly used error metrics, including Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE), on the training sample and were, therefore, used to predict BA (Suppl. Table 7). The superiority of linear models was also underscored by a lower parameter shrinkage when predicting in the test data. These predictions are presented in the main text, whereas the results from the other algorithms are presented in the Supplement. Altogether, 2,002 features (i.e., brain regional metrics based on T1-weighted or dMRI measures) were used per individual. After the training procedure was completed on the participants for which only a baseline scan was available, we predicted BA in the remaining participants (N = 2,678) with two available data points for each of these two study time points (baseline and follow-up).
We calculated corrected BA estimates by first calculating the intercept (α) and slope (β) of the linear associations between predicted BA (γ_train) and chronological age (Ω_train) in the training (baseline) sample (Eq. 1). The calculated intercept (α) and slope (β) from the training sample were then used to estimate a corrected BAG (BAGc), as previously suggested 53, from the predicted age (γ_test) and chronological age (Ω_test), separately at each data point of the testing (longitudinal) sample. We present the results for both corrected and uncorrected BAG.
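The age-bias correction described above can be sketched in a few lines. The formulas themselves are not reproduced in this text, so the sketch assumes one commonly used formulation: a line γ_train = α + β × Ω_train is fitted in the training sample, and the corrected gap is taken as BAGc = γ_test − (α + β × Ω_test). Function names and the toy numbers are illustrative only.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares intercept (alpha) and slope (beta) for y = alpha + beta * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta


def corrected_bag(pred_test, age_test, alpha, beta):
    """Assumed form: BAGc = predicted age - (alpha + beta * chronological age)."""
    return [p - (alpha + beta * a) for p, a in zip(pred_test, age_test)]


# Toy training sample: predictions regress towards the mean (slope < 1).
age_train = [50.0, 60.0, 70.0, 80.0]
pred_train = [55.0, 60.0, 65.0, 70.0]
alpha, beta = fit_line(age_train, pred_train)

# Apply the training-sample coefficients to each test time point separately.
bagc = corrected_bag([58.0, 66.0], [55.0, 75.0], alpha, beta)
print(alpha, beta, bagc)
```

Estimating α and β only in the training sample, and then applying them unchanged to both test time points, keeps the correction from absorbing longitudinal change in the test data.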
As a control, we randomly split the longitudinal data into equal parts, trained models and predicted in N_TP1−2 = 1,339 at each time point within the same individuals (due to the high dimensionality of the dMRI and multimodal data in contrast to the degrees of freedom, only T1-weighted data were considered; Suppl. Tables 9−11). These predictions were used to repeat the analyses presented in the main text (Suppl. Note 1).

Rate of change and centercepts
In order to investigate how single time point BA predictions relate to longitudinal changes in BA and features, we estimated the annual rate of change and centercepts in both features and BAs. Centercepts were used to establish cross-sectional proxies (of the BAG, PCs, and brain features) which are statistically independent from the annual rate of change. Centercepts are the average of the two measures, without considering the inter-scan interval (ISI). The annual rate of change, on the other hand, has the ISI as denominator.
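The two summary measures can be written out as a minimal sketch. The centercept follows the definition in the text (the average of the two measurements); for the annual rate of change, the numerator is assumed here to be the follow-up value minus the baseline value, since the text only states that the ISI is the denominator.

```python
def centercept(tp1, tp2):
    """Average of the two time points; the inter-scan interval is ignored."""
    return (tp1 + tp2) / 2.0


def annual_rate_of_change(tp1, tp2, isi_years):
    """Assumed form: (follow-up - baseline) / inter-scan interval in years."""
    if isi_years <= 0:
        raise ValueError("inter-scan interval must be positive")
    return (tp2 - tp1) / isi_years


# Hypothetical BAG of 2.0 years at baseline and 4.0 years at follow-up,
# with scans taken 2.5 years apart.
print(centercept(2.0, 4.0))
print(annual_rate_of_change(2.0, 4.0, 2.5))
```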

Exploratory analyses
Time point correlations between brain ages at each time point were assessed using uncorrected Pearson's correlations. To assess time point (TP) differences in the a) corrected and b) uncorrected brain age gaps (BAG), we used mixed linear models (MLMs) with ID, Site, Age, Sex, and the Age * Sex interaction as fixed effects, the subject/ID as random effect (u), and the subject residuals (e).
We used paired samples t-tests to assess feature changes over time. Changing features were included in the principal components analyses.
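A paired comparison of a single feature across the two time points can be sketched as follows. The effect size shown is the standardised mean difference of the paired differences (often written d_z); which Cohen's d variant was used in the study is not stated here, so this choice is an assumption, and the data are invented.

```python
from math import sqrt
from statistics import mean, stdev


def paired_change(baseline, follow_up):
    """Return the paired-samples t statistic and d_z for follow-up vs. baseline."""
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    n = len(diffs)
    d_z = mean(diffs) / stdev(diffs)  # mean difference in SD units of the differences
    t = d_z * sqrt(n)                 # equals mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, d_z


# Hypothetical cortical thickness (mm) of one region in four participants.
baseline = [2.0, 2.5, 3.0, 3.5]
follow_up = [1.9, 2.3, 2.9, 3.2]
t_stat, d_z = paired_change(baseline, follow_up)
print(t_stat, d_z)
```

The p-value would follow from a t distribution with n − 1 degrees of freedom; features passing the chosen threshold would then enter the principal components analyses.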
We evaluated the association of BA centercept or the annual rate of change in BA

The associations between PGRS and rate of change of PC and BAG had the lowest statistical power due to the limited availability of participants' genetic data (N = 2,160). We conducted a power analysis to ensure we would be able to detect meaningful effect sizes. We aimed for a power of 80% and an α-level of 0.05 in simple linear regression models, as described above, indicating that effect sizes as small as Cohen's f² = 0.006 can be detected, corresponding to a Pearson's correlation coefficient of r = 0.006 or Cohen's d = 0.012.
First, to test whether there were relationships between cross-sectional and longitudinal measures of the obtained BAGs and PCs, we associated their centercepts 54 and annual rates of change. We first ran MLMs predicting BAG changes (∆BAG) from cross-sectional BAG (BAG). We also predicted the longitudinal principal component of brain feature changes (∆PC) from the cross-sectional principal component (PC), controlling for the inter-scan interval (ISI), age, sex, and the age-sex interaction as fixed effects, and scanning site as random effect (Eq. 7, 8).

Fig. 2 illustrates that baseline (cross-sectional) BAG, represented by the centercept/average of the BAGs (BAG), based on T1-weighted features is limited in predicting longitudinal brain changes. Within modalities, only the T1-weighted-based BAG was significantly and positively associated with the annual rate of BAG change (∆BAG; β_std = 0.028 ± 0.148; Fig. 2a) and the principal component of longitudinal feature changes ∆PC (β_std = 0.054 ± 0.015; Fig. 2b, Suppl. Table 6). Moreover, BAG from

Fig. 1 Training and test sample had similar characteristics and brain age was predicted with high accuracy in training and each test data time point individually. Moreover, we denote brain changes indicated by regional
a) Sample age distribution at each visit, separating the cross-sectional training data from the longitudinal test data.
b) Model performance for the training set, and the two test points for each MRI modality. Uncorrected estimates are presented, which were overlaid with a cubic spline with k = 4 knots.
c) Time point differences for age and both crude and age-bias corrected BAs for each MRI modality indicated by Cohen's d.
d) Distribution of effect sizes indicating the change in anatomical features of diffusion MRI (dMRI) and T1-weighted MRI (T1w).
BAG underscores the need for closer examinations of the biological underpinnings of BAG to aid the general interpretation of the marker and to increase clarity around BAG's clinical utility.

[Figure panel titles: "Associations between Longitudinal and Cross-Sectional Brain Age Gap Measures"; "Brain Age Gaps and Principal Components".]

Fig. 2 BAG is overall limited in reflecting brain change, yet T1-weighted brain age reflects the strongest regional brain changes.
a) Associations between uncorrected BAG and ∆BAG in the top row, and corrected associations in the bottom row. Associations were obtained specific to each modality: T1-weighted (T1w), diffusion (dMRI), and multimodal MRI. The displayed line fits were cubic splines with k = 4 knots.
b) Associations between the centercepts (proxy for cross-sectional BA measures) of each modality-specific BAG and PCs of both the centercepts and the annual rate of change in brain features. The left two columns present associations of uncorrected BAG estimates, and the right two columns of training-sample age-corrected BAG estimates, respectively. The displayed line fits were cubic splines with k = 4 knots.
c) Top row: Distribution of associations between corrected BAG and brain features and annual change of brain features (including associations with p_Bonferroni < .05). Bottom left: Absolute mean and standard deviation of the associations between centercept and rate of change in corrected BAG and annual change of brain features. Bottom right: Percentage of significant associations between centercept and rate of change in corrected BAG and the annual change of brain features after Bonferroni correction.

with a) the first cross-sectional principal component (of the centercepts of features) and b) the principal component of features' annual rate of change (PC), correcting for the ISI.

\[ \mathrm{BAG} = \beta_0 + \beta_1 \times \mathrm{PC} + \beta_2 \times \mathrm{ISI} + \beta_3 \times \mathrm{Age} + \beta_4 \times \mathrm{Sex} + \beta_5 \times \mathrm{Age} * \mathrm{Sex} + u_{\mathrm{Site}} + e \tag{4} \]

When assessing how a) the centercept of the BAG and b) the annual rate of change in BAG (BAG) reflect brain features and change in brain features (F), we used simple linear models to ensure model convergence.

\[ F = \beta_0 + \beta_1 \times \mathrm{BAG} + \beta_2 \times \mathrm{ISI} + \beta_3 \times \mathrm{Age} + \beta_4 \times \mathrm{Sex} + \beta_5 \times \mathrm{Age} * \mathrm{Sex} + \beta_6 \times \mathrm{Site} \tag{5} \]

Finally, we explored the associations between PC, BAG, and their annual rates of change (BAG, ∆BAG, PC, ∆PC; four outcome variables), and time-point specific principal components PC_TP1 and PC_TP2 and BAGs BAG_TP1 and BAG_TP2 (another four outcome variables; all summarised in the formula as PC/BAG) with pheno- and genotypes (P/G), including PGRS of psychiatric disorders and Alzheimer's, depression and neuroticism scores, and cardiometabolic risk factors.

\[ \mathrm{PC}/\mathrm{BAG} = \beta_0 + \beta_1 \times \mathrm{P/G} + \beta_2 \times \mathrm{Age} + \beta_3 \times \mathrm{Sex} + \beta_4 \times \mathrm{Age} * \mathrm{Sex} + \beta_5 \times \mathrm{Site} \tag{6} \]
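The simple linear model for regional features (Eq. 5) can be sketched with ordinary least squares. This is an illustration on simulated data, assuming NumPy is available; coding Site as dummy variables with the first site as reference is an assumption (Eq. 5 writes Site as a single term), and all variable names and simulated effect sizes are hypothetical.

```python
import numpy as np


def fit_feature_model(feature, bag, isi, age, sex, site):
    """OLS fit of F ~ BAG + ISI + Age + Sex + Age*Sex + Site (dummy-coded)."""
    bag, isi, age, sex = (np.asarray(v, dtype=float) for v in (bag, isi, age, sex))
    site = np.asarray(site)
    levels = np.unique(site)
    # One dummy column per site, leaving out the first site as the reference.
    dummies = [(site == lv).astype(float) for lv in levels[1:]]
    X = np.column_stack([np.ones(len(bag)), bag, isi, age, sex, age * sex] + dummies)
    coefs, *_ = np.linalg.lstsq(X, np.asarray(feature, dtype=float), rcond=None)
    return coefs  # order: intercept, BAG, ISI, Age, Sex, Age*Sex, site dummies


# Simulated sample in which the feature decreases with higher BAG.
rng = np.random.default_rng(0)
n = 500
bag = rng.normal(0.0, 5.0, n)
isi = rng.uniform(1.12, 6.90, n)
age = rng.uniform(46.0, 80.0, n)
sex = rng.integers(0, 2, n).astype(float)
site = rng.integers(0, 3, n)
feature = 1.0 - 0.25 * bag + 0.1 * isi + 0.02 * age + rng.normal(0.0, 0.1, n)

coefs = fit_feature_model(feature, bag, isi, age, sex, site)
print("Estimated BAG coefficient:", coefs[1])
```

In a region-wise analysis, a model of this form would be fitted once per feature, with the resulting p-values corrected for multiple comparisons.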

[Figure panel title: Sample Age Distributions by Sex]
4), including PGRS of common psychiatric disorders and Alzheimer's disease (|β_std| < 0.06, p_Bonferroni > 0.05), clinically relevant state (depression rating) and trait (neuroticism) assessment scores (β_std < 0.08, p_Bonferroni > 0.05), with larger group-level differences for the cardiometabolic factors hypertension and diabetes (β_std < 0.35, p_Bonferroni < 0.047). Among the longitudinally available phenotypes, only waist-to-hip ratio (WHR), previously shown to be related to BAG 8, changed significantly between time points at the group level (t = 10.36, d = 0.15, p < 2.2 × 10^−16, p_Bonferroni < 2.2 × 10^−16). However, while WHR showed a small, significant association with BAG_T1w at baseline (β_std < 0.08), WHR changes were not predicted by BAGs or PCs (p > .05; Suppl. Fig. 4). Neuroticism (t = 2.83, d = 0.04, p = 0.005, p_Bonferroni = 0.030) and depression (t = 2.13, d = 0.04, p = 0.033, p_Bonferroni = 0.198) scores decreased over time; however, changes in these scores were not found to be predicted by BAGs or PCs of brain feature change or centercepts (p > .05).

Taken together, our findings indicate that BAG is limited in reflecting longitudinal brain changes. Overall, a) cross-sectional BAGs presented small associations with longitudinal brain ages across modalities, b) only BAGs from T1-weighted MRI features showed a significant but small positive association with the respective longitudinal principal components, and c) BAGs explained less than 1% of the variance of the mentioned principal components and BAG change. Yet, dMRI-based cross-sectional BAG correlated significantly with the annual change in around 38% of the region-level features (at a relatively small average effect of |β| = 0.140). Assessing the rate of BAG change, T1-weighted BAG correlated significantly with the largest portion of regional brain change (75%). Hence, despite BAG correlating weakly with future change in BAG and future change principal components, a single-time-point dMRI BAG might capture a portion of future changes in brain morphometry on the region level, whereas T1-weighted BAG is most reflective of the brain state. Future investigations might focus on constructing explainable brain age models which leverage region-level data, and further investigate the potential of brain age in datasets with multiple follow-ups. Alternatively, other markers which reflect a person's deviation from a norm defined by the characteristics of a training dataset might be of interest.

Brain age allows large amounts of information to be reduced into a single personalised health score. Although brain age can be vulnerable to individual differences 5, such a single number is intuitive when set in contrast to a person's chronological age, and does not require expert knowledge to be interpreted: a very high brain age in contrast to the chronological age might be alarming. Hence, brain age predictions hold the promise to provide additional information on routine clinical scans and, for example, support incidental findings. Closer examinations of how BAG reflects developmental trajectories under different conditions and on different samples offer room for future research.