Brain Age Prediction Reveals Aberrant Brain White Matter in Schizophrenia and Bipolar Disorder: A Multisample Diffusion Tensor Imaging Study

Siren Tønnesen, Tobias Kaufmann, Ann-Marie G. de Lange, Geneviève Richard, Nhat Trung Doan, Dag Alnæs, Dennis van der Meer, Jaroslav Rokicki, Torgeir Moberget, Ivan I. Maximov, Ingrid Agartz, Sofie R. Aminoff, Dani Beck, Deanna M. Barch, Justyna Beresniewicz, Simon Cervenka, Helena Fatouros-Bergman, Alexander R. Craven, Lena Flyckt, Tiril P. Gurholt, Unn K. Haukvik, Kenneth Hugdahl, Erik Johnsen, Erik G. Jönsson, Karolinska Schizophrenia Project, Knut K. Kolskår, Rune Andreas Kroken, Trine V. Lagerberg, Else-Marie Løberg, Jan Egil Nordvik, Anne-Marthe Sanders, Kristine Ulrichsen, Ole A. Andreassen, and Lars T. Westlye

Supporting a neurodevelopmental origin, it has been demonstrated that patients with adolescent-onset SZ show WM aberrations (17) and that their developmental trajectory is altered and delayed (18) compared with age-matched, normally developing peers. Further, children and adolescents with increased symptom burden, albeit presumably at subclinical levels, were found to exhibit altered diffusion-based WM properties compared with peers with low or no symptoms of mental distress (19), highlighting a critical role of WM development in mental health in youths. To which degree group differences observed between adult patients and HC subjects accelerate during the course of the adult lifespan is unclear. The neurodegenerative account of SZ and severe mental illness is debated (20) and lacks unequivocal support from imaging studies (16,21), but some studies have suggested stronger age-related deterioration of the brain in patients compared with HC subjects (22,23).
Despite converging evidence of case-control differences both preceding and following disease onset, recent brain imaging studies have documented substantial heterogeneity within patient groups (24,25). In contrast to conventional group-level analyses, brain age prediction using machine learning on imaging features allows for brain-based phenotyping at the individual level and enables an efficient dimensionality reduction of the neuroimaging data into one or more biologically informative summary measures (26,27). The discrepancy between an individual's chronological age and predicted brain age, referred to as the brain age gap (BAG), has been found to be higher in patients with SZ (5,28,29) and in several other brain disorders (29). However, these previous studies have exclusively used brain gray matter features for brain age prediction. Thus, given the well-documented role of WM aberrations in patients with mental illness (15,30–32), brain age prediction based on diffusion imaging is clearly warranted.
In order to fill this current gap in the literature, here we compared individual BAGs between patients diagnosed with SZ or BD and HC subjects using 4 conventional metrics (fractional anisotropy [FA], mean diffusivity [MD], radial diffusivity [RD], and axial diffusivity [AD]) obtained from diffusion tensor imaging (DTI). We used an independent training set comprising 927 HC subjects 18 to 94 years of age and applied the resulting model to our test sample including patients with SZ (n = 648), patients with BD (n = 185), and HC subjects (n = 990) from 10 independent cohorts (see Methods and Materials for details). In order to specifically assess the robustness and quantify the heterogeneity of effects across cohorts, we adopted a meta-analytic statistical framework in addition to a mega-analysis across cohorts.
We trained 6 different models based on various combinations of the DTI metrics, which allowed us to compare prediction accuracy and subsequent group differences for each model. Based on converging evidence of widespread WM aberrations in patients with severe mental disorders (15), we hypothesized higher BAG in patients with SZ and patients with BD compared with HC subjects, with stronger effects in SZ compared with BD. To test the relevance of the varying spatial resolution of the feature sets, which is important to inform the discussion regarding the anatomical specificity of brain WM aberrations, we compared models including various atlas-based tracts of interest with models including only global features. Based on previous studies comparing the age prediction accuracy and clinical sensitivity between metrics (16,27,33), we hypothesized high age prediction accuracy and sensitivity to group differences but remained agnostic concerning the relative ranking of the various features.

METHODS AND MATERIALS
We combined diffusion magnetic resonance imaging (MRI) data from 2750 individuals from 11 sites/studies across 10 different scanners. Figure 1A, Figures S1 and S2, and Tables S1 and S2 summarize key demographics for each cohort. Table S3 summarizes the MRI systems and diffusion acquisition protocols.
The dataset was split into a training set and a test set. Figure S2 shows the age distribution within each of the 2 cohorts in the training set.
Fitting of the diffusion tensor was done using dtifit in FSL, yielding conventional DTI metrics, including FA, MD, RD, and AD. FA, MD, RD, and AD maps were further processed using tract-based spatial statistics (40). FA volumes were skull-stripped and aligned to the FMRIB58_FA template supplied by FSL using nonlinear registration (FNIRT) (41). Next, mean FA was derived and thinned to create a mean FA skeleton, representing the center of all tracts common across subjects. We thresholded and binarized the mean FA skeleton at FA > 0.2. The procedure was repeated for MD, AD, and RD. For each individual, we calculated the mean skeleton value for each metric, as well as the mean values within 23 tracts of interest (Table S4) based on 2 probabilistic WM atlases [ICBM-DTI-81 WM labels atlas and the Johns Hopkins University WM tractography atlas (42–44)]. In total, we derived 96 DTI features per individual including the mean skeleton values (4 metrics, each with 23 tract values plus 1 mean skeleton value).

Quality Assessment
Subjects with poor image quality due to subject motion or other visible image artifacts (e.g., due to metal) were removed (n = 160; 59 HC subjects, 39 patients with SZ, 28 patients with BD, and 34 individuals with missing information). Demographics of the excluded participants are presented in Table S5. Additionally, we employed a multistep quality assessment (QA) procedure (16) that included maximum voxel intensity outlier count (MAXVOX) and temporal signal-to-noise ratio (45) prior to statistical analyses. Briefly, we ran the QA iteratively, excluding participants with a QA score of 2.5 SD below the mean. In order to compute the QA score, we inverted the MAXVOX score, z-normalized both scores independently (MAXVOX and temporal signal-to-noise ratio), and computed a summary score combining the two scores. In short, manual inspection of the flagged datasets after QA suggested adequate quality. Thus, we present results on the full dataset with supplemental results from a stringent QA [see (16) for additional information].
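The QA scoring described above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes that "inverting" MAXVOX means negating it, that the summary score is the sum of the two z-scores, and that the 2.5 SD cutoff is applied to the summary score within the currently retained sample. All function names are hypothetical.

```python
from statistics import mean, stdev

def zscores(values):
    """z-normalize a list; returns zeros if there is no variance."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]

def qa_scores(maxvox, tsnr):
    """Summary QA score: higher means better quality.

    Assumption: MAXVOX is inverted by negation and the two
    z-scores are combined by summation.
    """
    inverted = [-v for v in maxvox]
    return [a + b for a, b in zip(zscores(inverted), zscores(tsnr))]

def iterative_exclusion(ids, maxvox, tsnr, cutoff_sd=2.5):
    """Repeatedly drop subjects scoring more than cutoff_sd SD below the mean."""
    keep = list(range(len(ids)))
    while len(keep) > 2:
        scores = qa_scores([maxvox[i] for i in keep],
                           [tsnr[i] for i in keep])
        z = zscores(scores)
        flagged = {keep[j] for j, v in enumerate(z) if v < -cutoff_sd}
        if not flagged:
            break  # no more outliers: the procedure has converged
        keep = [i for i in keep if i not in flagged]
    return [ids[i] for i in keep]

# Toy data: 19 unremarkable subjects and one with many intensity
# outliers and a very low temporal signal-to-noise ratio.
ids = [f"s{i}" for i in range(20)]
maxvox = [10 + 2 * (i % 2) for i in range(19)] + [100]
tsnr = [52 - 2 * (i % 2) for i in range(19)] + [5]
kept = iterative_exclusion(ids, maxvox, tsnr)
```

Re-computing the z-scores after each exclusion round is what makes the procedure iterative: once an extreme subject is removed, the mean and SD of the remaining sample change.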

Brain Age Prediction
We trained 6 age prediction models. Our main model included all 96 features across all DTI metrics. To assess sensitivity for each metric separately, we trained 4 additional models based on all tracts of interest for each metric (FA, RD, MD, or AD). To test the value of including regionally specific information, we trained an additional model with only the global mean skeleton feature from all 4 metrics included.
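As a rough illustration of how the 6 feature sets relate to each other, the sketch below enumerates them by name. The tract labels are placeholders rather than the names in Table S4, and the sketch assumes that each single-metric model also includes that metric's mean skeleton value, which the text does not state.

```python
METRICS = ["FA", "MD", "RD", "AD"]
# 23 placeholder tract labels (hypothetical; see Table S4 for the real ones)
TRACTS = [f"tract_{i:02d}" for i in range(1, 24)]
SKELETON = "mean_skeleton"

def feature_sets():
    """Return a dict mapping model name to its list of feature names."""
    regions = TRACTS + [SKELETON]
    sets = {
        # Main model: every tract and the skeleton mean for all 4 metrics
        "all_features": [f"{m}_{r}" for m in METRICS for r in regions],
        # Global model: only the mean skeleton value of each metric
        "global_skeleton": [f"{m}_{SKELETON}" for m in METRICS],
    }
    # One model per metric (assumption: skeleton mean included)
    for m in METRICS:
        sets[f"{m}_only"] = [f"{m}_{r}" for r in regions]
    return sets

models = feature_sets()
sizes = {name: len(features) for name, features in models.items()}
```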
The following pipeline for brain age prediction was identical for all 6 models: we used the xgboost framework in R version 3.3.3 (2017-03-06) (R Foundation for Statistical Computing, Vienna, Austria) (46) to build the prediction model. The number of rounds (nround), maximum depth (max_depth), and subsample were tuned and optimized using a fivefold cross-validation of the training data, with early stopping if the prediction errors did not improve for 20 rounds. Based on previous experience, the learning rate (η) was set to η = 0.01 in order to increase transparency in the parameter selection stage. Apart from the default settings, the following parameters were used in the model: nround = 1400, max_depth = 14.
Prior to implementing the model, we regressed out the main effect of scanner from the DTI features in the entire dataset while accounting for age, age², and sex using linear models in R. To estimate the reliability of our age prediction model, we used a 10-fold cross-validation procedure within the training sample and repeated the cross-validation step 100 times to provide a robust estimate of model prediction. Within the same procedure, we tested the performance of our trained model by predicting age in unseen subjects in the test sample. By applying the model to the test sample 100 times, we obtain both a mean estimate and an estimate of uncertainty. For each iteration, we calculated the BAG, defined as the difference between predicted and chronological age (predicted minus chronological, so that positive values indicate an older-appearing brain). For each individual in the test set we computed the average BAG across the 100 iterations and corrected these values for main effects of scanner and a well-documented age-related bias using linear models, per previous recommendations (47). Next, based on the age- and scanner-corrected BAG, we computed a corrected brain age for each individual, and then computed the mean absolute error (MAE), root mean square error (RMSE), and correlation between corrected predicted age and chronological age as measures of model performance.
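The BAG correction and the performance measures can be sketched as follows. This is a simplified illustration and not the study's code: it assumes BAG is predicted minus chronological age, corrects only for a linear age effect (the scanner term is omitted), and takes corrected brain age to be chronological age plus corrected BAG. Function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def pearson_r(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def corrected_bag(age, predicted):
    """Residualize BAG on chronological age (scanner term omitted here)."""
    bag = [p - a for a, p in zip(age, predicted)]
    slope, intercept = fit_line(age, bag)
    return [b - (slope * a + intercept) for a, b in zip(age, bag)]

def performance(age, predicted):
    """MAE, RMSE, and r between corrected predicted age and chronological age."""
    cbag = corrected_bag(age, predicted)
    corrected_age = [a + b for a, b in zip(age, cbag)]
    mae = mean([abs(b) for b in cbag])
    rmse = sqrt(mean([b * b for b in cbag]))
    return mae, rmse, pearson_r(age, corrected_age)

# Toy example showing the typical bias: young subjects are
# overestimated and old subjects are underestimated.
age = [20, 30, 40, 50, 60, 70, 80]
predicted = [30, 35, 42, 50, 57, 64, 72]
cbag = corrected_bag(age, predicted)
mae, rmse, r = performance(age, predicted)
```

After residualization the corrected BAG is, by construction, uncorrelated with chronological age in the sample used to fit the correction, which is why group comparisons on corrected BAG are not driven by age differences between groups.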

Statistical Analyses
Statistical analyses were performed using R. We tested for main effects of diagnosis using linear models with corrected BAG as dependent variable and group, sex, age, and scanner site as independent variables, and performed pairwise group comparisons as appropriate. Using the metafor package (48) in R, we adopted a meta-analytic framework in order to assess the heterogeneity and generalizability of the results. A random-effects model was used to weight the primary studies prior to aggregating the effect size. Effect sizes were aggregated using the estimated marginal means of the BAG from each group contrast (HC/SZ, HC/BD, and BD/SZ), accounting for age, age², and sex. For effect size estimates, we used Hedges' g. Cochran's heterogeneity statistic (Q) was used to test the homogeneity of effect sizes. A χ² test with k−1 degrees of freedom was used to examine the significance of Cochran's Q. The heterogeneity was quantified using the I² statistic, which is sensitive to the degree of inconsistency in results between cohorts.
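The quantities named above (Hedges' g, Cochran's Q, I², and a random-effects pooled estimate) can be sketched in a few lines. This is an illustration only: the study used the metafor package in R, whereas the sketch below uses the DerSimonian-Laird estimator of the between-cohort variance as a simple stand-in, works from raw group means rather than estimated marginal means, and does not compute the χ² p value for Q. All names and inputs are hypothetical.

```python
from math import sqrt

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected standardized mean difference and its variance."""
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                     / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d

def random_effects(effects, variances):
    """Pool per-cohort effects; returns pooled estimate, SE, Q, I2, tau2."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    df = k - 1
    # I2: percentage of total variability attributable to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # DerSimonian-Laird between-cohort variance (assumption, see lead-in)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    return {"pooled": pooled, "se": sqrt(1 / sum(w_star)),
            "Q": q, "df": df, "I2": i2, "tau2": tau2}

# Two hypothetical cohorts with equal precision but different effects
result = random_effects([0.1, 0.5], [0.01, 0.01])
g, var_g = hedges_g(1.0, 1.0, 50, 0.0, 1.0, 50)
```

In the random-effects model each cohort's weight is the inverse of its sampling variance plus the between-cohort variance, so large heterogeneity pulls the weights toward equality.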

Brain Age Predictions
Age prediction in the training set using 10-fold cross-validation revealed high correlations between chronological and predicted age for the main model including all features (r = .924; 95% confidence interval, .912-.935; MAE = 6.49; RMSE = 8.08) (Figure S3). Figure 1B shows predicted age plotted as a function of chronological age for the unseen test set when using the full feature set, and Table 1 summarizes the prediction accuracy for all 6 models. The age prediction models generalized to HC subjects (r = .806; MAE = 6.92; RMSE = 8.46), patients with BD (r = .808; MAE = 6.85; RMSE = 8.50), and patients with SZ (r = .798; MAE = 7.11; RMSE = 8.79). While all models performed relatively well, prediction accuracy was highest for the full model, and the global mean skeleton model outperformed the region of interest-based single-metric models. Figure S4 shows the correlation matrix between all models, indicating a strong correlation between all models, with the exception of FA and AD. Table 1 and Figure S5 summarize the results from the group comparisons from the 6 models, and Figure 1C shows the distributions of fitted corrected BAG within each group for the all-features model. Briefly, all models revealed significant main effects of group, with higher corrected BAG in patients with SZ and BD compared with HC subjects, with effect sizes (Cohen's d) ranging between −0.10 and 0.33. The model based on FA yielded the strongest effect size for the main group effect, although the models including MD and RD in addition to FA revealed similar patterns. The model based on AD revealed less consistent results and was the only model not showing significant group differences between patients with SZ and HC subjects. Figure 2 shows a forest plot summarizing the results from the meta-analysis for BAG computed using the full feature model. Figures S6 to S10 show the results from the other models.
In short, in line with the mega-analysis, the results revealed significantly higher BAG in patients with SZ and BD compared with HC subjects, with moderate effect sizes. The analysis did not support a group difference in BAG between BD and SZ. Whereas the effect sizes varied slightly between cohorts for the full model, the Q and I² statistics indicated low and nonsignificant heterogeneity. Figure S11 shows each cohort's contribution to the heterogeneity and influence on the result from the meta-analysis. Figure S12 summarizes the results from multistep QA. Briefly, higher corrected BAG was observed in patients with SZ and BD compared with HC subjects across all levels of QA, with highly similar effect sizes.

DISCUSSION
The etiology of severe mental disorders has a substantial neurodevelopmental component, which is among other characteristics reflected in altered brain maturational trajectories during the formative years of childhood and adolescence, and as group-level differences in adult patient populations. Along with evidence of genetic and clinical overlap with several aging-related conditions, including cardiovascular risk factors and increased mortality, the neurodevelopmental account supports the need for a dynamic lifespan perspective in the search for disease mechanisms. Here, in 10 different cohorts comprising HC subjects and patients with SZ and BD, we used machine learning to estimate brain age using DTI indices of WM structure and organization. This novel approach yielded 5 main results. First, in a large independent training set, we found high accuracy of brain age prediction across the adult lifespan using DTI features, which largely generalized to the independent test set, supporting the feasibility and sensitivity of the approach. Second, applying the model to an independent test set revealed significantly higher BAG in patients with SZ and BD compared with HC subjects. Third, follow-up meta-analysis and tests of heterogeneity suggested high consistency across independent cohorts and scanners. Fourth, brain age models based on FA showed higher sensitivity than models based on the other metrics, both alone and combined. Fifth, the reduced set of global mean skeleton features compared with a number of regional atlas-based features revealed highly converging results. We next discuss the implications of these findings in more detail.
Brain age prediction provides an informative summary measure that may serve as a proxy for brain integrity and health across normative and clinical populations. Neuroimaging-derived WM and gray matter phenotypes carry distinct biological information of brain integrity, and tissue-specific brain age models may provide higher sensitivity and specificity to relevant biological processes compared with conventional models based on gray matter features alone (27). DTI has been broadly applied in clinical neuroscience owing to its proposed sensitivity to microstructural properties of brain tissue. However, whereas previous studies have documented higher brain age in patients with severe mental disorders, these were based on gray matter models only (5,28,29). In order to test if previous findings suggesting clinical deviations from normative gray matter trajectories generalize to WM, we performed brain age prediction using different combinations of DTI metrics. In line with previous brain age prediction studies using diffusion MRI (27,49) we obtained high age prediction accuracy across most models. In accordance with previous evidence suggesting that regional DTI-based indices of brain aging reflect relatively low-dimensional and global processes (12,50), we found similar prediction accuracy for the reduced models comprising global mean skeleton values and the models including extended sets of regional features. Although brain WM aging shows some regional heterogeneity (12), these findings demonstrate that the most relevant information required for brain age prediction is captured at a global level. This conjecture is also supported by a recent twin study demonstrating that a large proportion of the estimated heritability of specific tracts is accounted for by a general factor (51).
Likewise, we found that the sensitivity to group differences was not strongly dependent on the inclusion of the full feature set. Indeed, the effect size obtained when comparing patients with SZ and HC subjects was slightly higher for the global mean skeleton model compared with the full model. These findings are in line with recent evidence of anatomically widely distributed group differences between healthy control subjects and patients with SZ (15). Interestingly, the largest effect when comparing patients with SZ and HC subjects was obtained for the FA-only model, supporting the sensitivity of FA to clinical differences in WM properties (15,16). Higher predicted brain age gap in the patient groups compared with HC subjects may indicate altered rate of brain maturation or accelerated brain aging in patients with severe mental disorders. However, our cross-sectional design does not permit us to make inference about brain development or aging per se, and previous reports of relatively age-invariant group differences in brain volumetry (21) and DTI indices (16) suggest that the reported group differences in brain age may reflect differences accumulating early in life. Unfortunately, owing to the current study design with adults only, we cannot address the maturational trajectories in the formative years. Further, the data in the training and test sets were collected using different scanners, and the absolute brain age estimates and corresponding deviance from the chronological age should be interpreted with caution and not without reference to an appropriate comparison sample. Although the application of diffusion MRI as the basis for age prediction is novel, higher gray matter brain age has been shown in several brain and mental disorders (29,52). 
We expand these previous findings by documenting higher DTI-based WM brain age in both SZ and BD, and although with moderate effect sizes, we show that the effects generalize relatively well across cohorts and scanners, with only minor heterogeneity in effect sizes between cohorts.
We found no significant difference in DTI-based brain age between BD and SZ, supporting previous evidence of partly overlapping clinical and biological characteristics between these 2 diagnostic categories (16,53,54). While the current results support the existence of a common set of mechanisms across disorders, future studies utilizing a broader range of imaging modalities in combination with specific genetic, clinical, cognitive, sociodemographic, and biological phenotypes may allow for the identification of specific diagnostic signatures and subgroups. However, inherent limitations associated with the classical case-control design in mental health research have recently been emphasized using neuroimaging data (24,25). In particular, the current lack of biologically informed diagnostic criteria should motivate future studies to consider alternative approaches to promote a novel clinical nosology based on both symptomatology and data-driven clustering (55), as well as brain-based and biological phenotypes cutting across diagnostic boundaries.
Our results document robust group-level deviances in WM structure manifesting as older-appearing brains in patients with severe mental disorders compared with their healthy peers. Whereas DTI-based markers are sensitive to different biological and anatomical characteristics, the current specificity does not allow for inference on the distinct neurobiological mechanisms involved. Myelin integrity and myelin packing density are among the proposed candidate mechanisms for observed changes in DTI metrics (56–58), but the specificity is low, and the current results probably reflect a combination of neurobiological processes and macroanatomical differences. Previous evidence implicated myelin-related abnormalities and neuroinflammation both in the pathophysiology of severe mental disorders and in brain aging (59–62). Future studies may benefit from the inclusion of advanced multishell diffusion MRI, allowing for stronger inference on the microstructural milieu of the brain tissue, including microstructural indices based on different diffusion scalar metrics [e.g., neurite orientation dispersion and density imaging (63,64), diffusion kurtosis imaging (65), WM tract integrity (66), and restriction spectrum imaging (67)].
In line with previous findings of widely distributed effects in well-powered studies of brain aging (12) and SZ (15), we found similar age prediction accuracy and subsequent group differences in brain age for the model including only global mean skeleton values and the model including a range of regionally informative values extracted from various atlas-based tracts and regions of interest. Although specific symptoms and clinical traits may map preferentially onto specific neuroanatomical subsystems [see, e.g., (19)], these novel results suggest that a large proportion of the variance associated with age and corresponding deviations in the patient groups is captured by primarily global brain processes, with relevance for our understanding of the anatomical heterogeneity and dimensionality of brain aging and severe mental illness.
In addition to the anatomical distribution of effects, the spatiotemporal dynamics of brain development and aging and their deviations in patients with mental disorders remain unclear.

The individual-level onset and rate of the group-level deviations from the normative WM trajectory are unknown and can only be inferred using longitudinal designs covering sensitive periods of neurodevelopment. Previous studies have shown both delayed neurodevelopment during adolescence (18) and accelerated aging in adulthood (5) in patients with severe mental disorders. Whereas these observations are not mutually exclusive, future studies should aim at disentangling the lifespan dynamics, e.g., by including individuals with a wider age range, and pursuing longitudinal designs including individuals across a wide range of functional levels and risk. The latter may be particularly pertinent to disentangle primary disease-related mechanisms and secondary factors related to the disease, including medication and lifestyle factors such as nutrition, physical activity, education, and a range of sociodemographic variables, all of which interact with key neurodevelopmental processes (68). Unfortunately, although possible effects of psychotropic drugs on the brain are a topic of great interest and importance (69–71), in common with other studies employing a cross-sectional and nonrandomized design, the current design does not allow us to make inference about the effects of medication and other clinical and lifestyle factors on brain age, which should be investigated by future and properly designed studies. Meanwhile, previous studies reporting associations with medication status in smaller samples need to be interpreted in light of the recent lack of significant associations in the largest DTI study to date (15). We did not exclude WM hyperintensities in the training or test sets, and future studies including a wider range of MRI modalities are necessary to determine the possible confounding effects of WM hyperintensities on the age prediction models and subsequent group comparisons.
Whereas our procedure for correcting for scanner effects using linear models effectively removes simple main effects of scanner, even subtle differences in clinical recruitment and other participant characteristics between sites might induce interactions with site- or scanner-related variance that are very difficult to account for statistically. Additionally, future studies using different samples and approaches for brain age prediction are needed to validate and test the generalizability of the model.
In conclusion, in this multisample study including patients from 10 different cohorts, we report higher brain age in patients with SZ and BD compared with HC subjects using various DTI-based indices of WM structure and organization. In contrast to most previous studies comparing diffusion MRI metrics directly between groups, we used a multisample approach, which allowed us to specifically assess generalizability across 9 or 10 different cohorts, sites, and scanners. These results represent a highly relevant contribution to the field and an important supplement to previous reports, which have largely ignored between-sample heterogeneity and generalizability. Although the effect sizes were modest, our unique design allowed us to specifically quantify the heterogeneity and robustness of effects across cohorts and scanners, supporting that brain age prediction using diffusion MRI is a sensitive marker in the clinical neurosciences.

This article was published as a preprint on bioRxiv: doi: https://www.biorxiv.org/content/10.1101/607754v1. KH and ARC own shares in NordicNeuroLab, Inc., which produced add-on hardware for acquisition of data at the Bergen site. All other authors report no biomedical financial interests or potential conflicts of interest.