Prediction of East Asian Brain Age using Machine Learning Algorithms Trained With Community-based Healthy Brain MRI

Simfukwe, Chanda; Youn, Young Chul

doi:10.12779/dnd.2022.21.4.138

Dement Neurocogn Disord. 2022 Oct;21(4):138-146. English.
Published online Oct 31, 2022.
https://doi.org/10.12779/dnd.2022.21.4.138

Original Article

Chanda Simfukwe and Young Chul Youn

- Department of Neurology, College of Medicine, Chung-Ang University, Seoul, Korea.
Correspondence to Young Chul Youn. Department of Neurology, Chung-Ang University Hospital Seoul, 102 Heukseok-ro, Dongjak-gu, Seoul 06973, Korea. Email: neudoc@gmail.com

Received August 31, 2022; Revised October 20, 2022; Accepted October 31, 2022.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background and Purpose

Magnetic resonance imaging (MRI) helps with brain development analysis and disease diagnosis. Brain volumes measured at different ages using MRI provide useful information for clinical evaluation and research. Therefore, we trained machine learning models that predict the brain age gap of healthy subjects in the East Asian population using T1 brain MRI volume images.

Methods

In total, 154 T1-weighted MRIs of healthy subjects (55–83 years of age) were collected from an East Asian community. Information on age, gender, and education level was collected for each participant. The MRIs of the participants were preprocessed using FreeSurfer (https://surfer.nmr.mgh.harvard.edu/) to collect the brain volume data. We trained the models using different supervised machine learning regression algorithms from the scikit-learn (https://scikit-learn.org/) library.

Results

The trained models comprised 19 features that had been reduced from 55 brain volume labels. Compared with the other regression methods, the BayesianRidge (BR) algorithm achieved a mean absolute error (MAE) of 3 years and an R-squared (R²) of 0.3 in predicting the age of new subjects. The feature importance analysis showed that the right pallidum, white matter hypointensities on T1-MRI scans, and left hippocampus were among the essential features for predicting brain age.

Conclusions

The MAE and R² of the BR model in predicting the brain age gap in the East Asian population suggest that the model could reduce the dimensionality of neuroimaging data to provide a meaningful biomarker for individual brain aging.

Keywords

Machine Learning; Regression Analysis; Magnetic Resonance Imaging; Asians

INTRODUCTION

Aging in the East Asian population is associated with health conditions that significantly challenge individuals and communities.1 On an individual level, aging is related to a progressive decline in physical health and increasing risk of neurodegenerative diseases; on a community level, the aging population is connected to more significant socioeconomic costs.2, 3 Therefore, advancing our understanding of brain aging is vital to help improve the detection of early-stage neurodegeneration and predict age-related cognitive decline. There are currently many efforts aiming to achieve the early detection of age-related diseases with the goal of preventing or delaying their progression. Brain magnetic resonance imaging (MRI) is a reliable method with which to evaluate brain development due to its high soft-tissue resolution, multi-parameter imaging advantages, and non-invasive nature.4

There has recently been significant research interest in measuring brain change through brain age prediction using machine learning methods, mostly based on structural T1-weighted MRI images.5, 6, 7 Most machine learning models learn patterns from data, then use these patterns to make predictions about new data. A vital advantage of these methods over traditional statistics is that they make it possible to capture differences at an individual level rather than solely at a group level, thereby increasing the potential for clinical translation.8 Brain age prediction studies commonly build regression machine learning models using structural MRI data from healthy controls.6 To minimize the prediction error of the model and improve its accuracy, different supervised machine learning regression algorithms such as linear regression (LR), support vector regression (SVR), gradient boosting regressor (GBR), and BayesianRidge (BR) have been utilized.

These algorithms are the conventional approaches used to develop a predictive model because of their simple development process, high interpretability, and improved stability.9 LR, SVR, GBR, and BR are widely used in biological research since they can handle unclean and noisy data well, support different loss functions, and have strong predictive ability for nonlinear data.10, 11, 12, 13

The primary outcome measure in brain age prediction is the difference between an individual’s predicted brain age and chronological age, which is called the “brain-age gap.” If a person’s predicted brain age is lower than their chronological age, that person might be considered to have a “younger” brain than anticipated, be more pathology-resistant, or have accumulated less pathology. Conversely, if the predicted brain age is significantly higher than the individual’s chronological age, that person may have an “older” brain than is typical and may be genetically predisposed to certain pathologies.14
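As a minimal illustration of the definition above, the brain-age gap can be written as predicted age minus chronological age. The function names, the tolerance parameter, and the example ages below are hypothetical and are not taken from the study.

```python
def brain_age_gap(predicted_age: float, chronological_age: float) -> float:
    """Return the brain-age gap in years (predicted minus chronological)."""
    return predicted_age - chronological_age


def describe_gap(gap: float, tolerance: float = 0.0) -> str:
    """Label a gap as 'younger', 'older', or 'as expected'.

    Hypothetical helper: the tolerance is an illustrative parameter,
    not a clinical threshold.
    """
    if gap < -tolerance:
        return "younger"
    if gap > tolerance:
        return "older"
    return "as expected"


# Illustrative values only, not data from the study.
gap = brain_age_gap(predicted_age=61.5, chronological_age=66.0)
label = describe_gap(gap)
print(gap, label)
```

A negative gap corresponds to a brain that appears "younger" than the chronological age, and a positive gap to one that appears "older".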

Previous studies have reported high correlations between brain age predictions and chronological age in healthy subjects. Wang et al.15 used the T1-MRI dataset of 3,688 dementia-free participants with a mean age of 66 years to train a convolutional neural network (CNN) deep learning algorithm to predict brain age. Logistic regressions were conducted to assess the association of the age gap with incident dementia, and a mean absolute error (MAE) of 4.45 years was obtained. Paul et al.16 designed a model with an MAE of 3.60 years that was trained with 1,597 T1-MRI scans using a CNN algorithm. Aycheh et al.17 collected MR images of 2,911 cognitively normal subjects (ages 45–91) and then used a non-linear Gaussian regression algorithm to fit the final age prediction model. The experimental results indicated an MAE of 4.05 years. Other studies have also shown a relationship between predicted brain age and chronological age based on raw MRI data using machine learning.6, 7

In this paper, we build machine learning models to predict chronological age in the East Asian population using brain volume data (subcortical segmentation). Unlike other studies, we used four different machine learning algorithms (LR, SVR, GBR, and BR) to identify the most suitable algorithm for our dataset, and we evaluated the feature importance of the brain volume data during model training (Table 1).18

Table 1
The 20 features from 55 labels (Supplementary Fig. 1) included in the MRI image’s brain volume data


METHODS

Participants

In total, 154 community-based healthy participants were enrolled in this study; the participants were elderly volunteers who were assessed yearly, with a mean age of 64.78±6.38 years (mean±standard deviation). The participants underwent a systematic clinical assessment, including medical history, neurological and neuropsychological examinations, and brain MRI. All subjects were free from a diagnosis of mild cognitive impairment or other forms of dementia.

The inclusion criteria were as follows: 1) the absence of memory complaints; 2) normal general cognition within 1 standard deviation (SD) of the age- and education-adjusted norms of the Korean version of the Mini-Mental State Examination and a score >26; 3) intact activities of daily living; 4) a Korean Dementia Screening Questionnaire score <7; and 5) the absence of depression (short-form Geriatric Depression Scale score <8). This study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board (IRB) of Chung-Ang University Hospital with IRB numbers 2202-015-497 and C2013142(1102). The study was conducted with 66 males and 88 females (Table 2). We trained the machine-learning models with four different algorithms (LR, SVR, GBR, and BR), with the brain volume features (Table 3) used as a training dataset to predict the age (target) of the participants.

Table 2
Demographics of participants included in the study


Table 3
Comparison of BayesianRidge with the other three methods


MRI processing and feature selection

The brain MRI scans of all participants were preprocessed with the FreeSurfer toolbox (https://surfer.nmr.mgh.harvard.edu/), using the recon-all cortical reconstruction pipeline as the initial step of the computational subcortical segmentation procedure. FreeSurfer is a tool that can be used to preprocess structural MRI, functional MRI, diffusion MRI, and positron emission tomography data.19 This study focused on structural MRI, particularly subcortical segmentation (brain volume). FreeSurfer was run on the Ubuntu Linux operating system, version 22.04 LTS (https://ubuntu.com/). From the brain volume data, 19 subcortical segmentation features (Fig. 1) were selected with the RandomForestClassifier (RF), a supervised machine learning algorithm from the scikit-learn library (https://scikit-learn.org/stable/index.html), as the essential features for the regression models.

Fig. 1
List of 19 features from subcortical segmentation applied in machine learning. Data are shown as mean (blue) ± standard deviation (orange). The figure shows a comparison of the subcortical segmentation (brain volume) data of 19 features in healthy participants extracted from T1-MRI scans for a regression model.
RP: right pallidum, LH: left hippocampus, CCMP: corpus callosum mid posterior, WMH: white matter hypointensities on T1-MRI scans, LCP: left choroid plexus, OC: optic chiasm, RLV: right lateral ventricle, 3rdV: 3rd ventricle, RV: right vessel, CCMA: corpus callosum mid anterior, RH: right hippocampus, LC: left caudate, RILV: right inferior lateral ventricle, RVDC: right ventral diencephalon, LTP: left thalamus proper, LAA: left accumbens area, CSF: cerebrospinal fluid, RCP: right choroid plexus, RC: right caudate, MRI: magnetic resonance imaging.

Model training and statistical analyses

The first step of model training involved normalizing and organizing the 55 brain subcortical segmentation measures obtained from the preprocessing step using the pandas library in Python, run in JupyterLab (https://www.anaconda.com/).
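The text does not state which normalization was applied, so the following is only a sketch that assumes column-wise z-scoring with pandas. The column names and volume values are illustrative and are not data from the study.

```python
import pandas as pd

# Hypothetical subcortical volumes in mm^3 for four subjects.
# Column names are illustrative and are not the study's actual labels.
volumes = pd.DataFrame(
    {
        "Left-Hippocampus": [3900.0, 4100.0, 3700.0, 4300.0],
        "Right-Pallidum": [1800.0, 1750.0, 1900.0, 1850.0],
    }
)


def zscore(df: pd.DataFrame) -> pd.DataFrame:
    """Scale each column to zero mean and unit (population) variance."""
    return (df - df.mean()) / df.std(ddof=0)


normalized = zscore(volumes)
print(normalized.round(3))
```

In a train/test setting, the column means and standard deviations would usually be estimated on the training set only and then reused for the test set, so that no information from the test subjects enters the scaling.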

In the second step, we applied the RF classifier, a machine learning algorithm from the scikit-learn library, to select the essential features for training the regression models.
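The exact selection procedure is not given in the text. One plausible reading, sketched below on synthetic data, is to fit scikit-learn's RandomForestClassifier with integer ages as labels and keep the 19 features with the largest impurity-based importances. The random data, the number of trees, and the ranking rule are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_subjects, n_features, n_keep = 154, 55, 19

# Synthetic stand-ins for the 55 brain volume measures and for age.
X = rng.normal(size=(n_subjects, n_features))
age = rng.integers(55, 84, size=n_subjects)  # integer ages 55..83

# Fit the forest; each distinct integer age is treated as a class label.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, age)

# Rank features by impurity-based importance and keep the top n_keep.
ranking = np.argsort(forest.feature_importances_)[::-1]
selected = np.sort(ranking[:n_keep])
X_selected = X[:, selected]
print(selected)
```

Because age is a continuous target, ranking features with RandomForestRegressor would be a common alternative; the classifier is used here only because it is the estimator named in the text.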

The third step involved splitting the dataset and training the machine learning models using the LR, SVR, GBR, and BR algorithms in scikit-learn. The data were first imported from a “.csv” file with 19 features (Fig. 1) and age as the target variable, and the train_test_split function from the scikit-learn library was used to randomly split the data into training and test datasets. The train size was 0.70, meaning that 70% of the data were allocated to the training set, and the test dataset comprised the remaining 30%. Then, the LR, SVR, GBR, and BR algorithms from the scikit-learn library were applied to train the models to predict brain age. As an example, the code used to predict brain age from the brain volume data is available (https://drive.google.com/drive/folders/1ih0nDnanxgilsBFbOVaZDDZWYfzGa04I?usp=sharing).
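The split and training step described above might look roughly like the following sketch. It uses synthetic data in place of the study's .csv file, default hyperparameters, and a fixed random seed, all of which are assumptions; it is not the authors' shared code.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for 154 subjects with 19 selected features.
X = rng.normal(size=(154, 19))
age = 65 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=3, size=154)

# 70% of the subjects go to training, the remainder to testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, age, train_size=0.70, random_state=0
)

# The four regression algorithms named in the text, with default settings.
models = {
    "LR": LinearRegression(),
    "SVR": SVR(),
    "GBR": GradientBoostingRegressor(random_state=0),
    "BR": BayesianRidge(),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)

print(len(X_train), len(X_test))
```

With 154 subjects and train_size=0.70, train_test_split yields 107 training and 47 test subjects.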

The last step involved calculating the models’ performance using the MAE and the R-squared (R²) value of age prediction. The metrics were calculated using the scikit-learn metrics functions (https://scikit-learn.org/stable/). Further, regression graphs of the test dataset were visualized using the scatter plot function from the matplotlib library (https://matplotlib.org/) to show the correlation between chronological and predicted age.
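For reference, the evaluation metrics can be computed with scikit-learn as in the sketch below. The four ages are made-up values chosen so that the arithmetic is easy to check by hand; they are not results from the study.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up chronological and predicted ages (years) for four test subjects.
y_true = np.array([60.0, 65.0, 70.0, 75.0])
y_pred = np.array([62.0, 64.0, 73.0, 71.0])

mae = mean_absolute_error(y_true, y_pred)  # mean of |error|
mse = mean_squared_error(y_true, y_pred)   # mean of error squared
rmse = float(np.sqrt(mse))                 # same unit as age (years)
r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot, unitless

print(mae, rmse, r2)
```

Here the absolute errors are 2, 1, 3, and 4 years, so the MAE is 2.5 years. The residual sum of squares is 30 and the total sum of squares is 125, which gives R² = 1 − 30/125 = 0.76.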

RESULTS

In total, 154 participants were allocated to the training and test datasets. The test dataset had 47 participants, while the training dataset had 107 participants. The results showed that the prediction accuracy of the model trained with the BR algorithm was better than those of the other regression models; it had an MAE of 3.59 years and a root mean squared error (RMSE) of 4.28 years (Fig. 2D). The respective MAE and RMSE values were 3.51 and 4.56 years for the GBR (Fig. 2C), 3.59 and 4.58 years for SVR (Fig. 2B), and 3.59 and 4.62 years for the LR (Fig. 2A). Regression graphs were plotted for the models based on the testing dataset to show the correlation between the real age of the participants and the predicted age (Fig. 2). The detailed results of the models are presented in Table 3.

Fig. 2
Comparison between the real and predicted ages according to four models based on the test dataset of the participants.
MAE: mean absolute error, MSE: mean squared error, RMSE: root mean squared error, R²: R-squared.

DISCUSSION

This study aimed to predict the brain age of healthy adults in the East Asian population using machine learning algorithms based on brain volume data from T1-weighted MRI scans. Machine learning studies have established a number of LR models to study the relationship between the predicted age and real age using brain imaging features.20

Nevertheless, the characteristics of the brain features used to train the models can lead to different estimates of MAE and R² in age prediction, with other studies having a wide range of ages. Lancaster et al.6 used a dataset of 2,003 healthy individuals (MRI brain volume data) aged 16–90 with an SVR algorithm in a study aiming to predict chronological age. They achieved an MAE of 5.08 years in predicting chronological age. Hwang et al.7 investigated the feasibility and clinical relevance of brain age prediction using axial T2-weighted images of healthy subjects with a deep CNN algorithm. The CNN model was trained with 1,530 scans, and its performance was evaluated using the MAE between the predicted age and the chronological age on an internal and an external test dataset. The model showed MAEs of 4.22 years in the internal test set and 9.96 years in the external test set.

In our study, we showed that the selection of features is generally more important than the choice of model. The models exhibited excellent performance with only 19 features (Fig. 1) selected from the 55 brain volume measures (Supplementary Fig. 1). The RF classifier algorithm from the scikit-learn library (https://docs.google.com/document/d/1uzvkHBgNQ1waeb1Ex94dLLzz5Ds_LVM4/edit?usp=sharing&ouid=113534775365702156330&rtpof=true&sd=true) selected the essential features. The algorithm-selected features correlated well with our prediction target (age).

This study evaluated four regression models (LR, SVR, GBR, and BR) for age prediction (55–83 years). Regression is an approach to modeling the relationship between dependent and independent variables. This relationship is modeled using a linear predictor function whose unknown model parameters are estimated from the data.10 All the algorithms were trained using the same brain volume data. The results showed that the correlation between the predicted and chronological ages in the BR algorithm was strong (R² = 0.3). In addition, the MAE was 3.31 years. The MAE value was lower than those of the other three models, indicating that the BR was a more suitable model for age prediction. Most neuroimaging-based age predictions using machine-learning regression algorithms are proposed as biomarkers of brain aging that are related to cognitive performance, health outcomes, and progression of neurodegenerative disease.21, 22

A limitation of our study was that our models showed poor performance for other ages because the dataset did not include sufficient numbers of participants. This can be explained by the regression-toward-the-mean phenomenon. It is desirable to match the age distributions of the training and testing data during model training. Had more participants been included in the training dataset, the performance for other ages might have improved.
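Regression toward the mean can be illustrated with a small simulation: when the features track age only loosely, a least-squares model pulls its predictions toward the average training age, so younger subjects tend to be overestimated and older subjects underestimated. The sketch below uses entirely synthetic data, and the noise level and age cut-offs are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated cohort spanning the study's age range; values are synthetic.
age = rng.uniform(55, 83, size=500)
# A single noisy "brain feature" that tracks age only loosely (assumption).
feature = age + rng.normal(scale=10.0, size=500)

# Ordinary least-squares fit of age on the feature.
slope, intercept = np.polyfit(feature, age, deg=1)
predicted = slope * feature + intercept
gap = predicted - age

# Average gap in the youngest and oldest parts of the cohort.
mean_gap_youngest = gap[age < 60].mean()
mean_gap_oldest = gap[age > 78].mean()
print(slope, mean_gap_youngest, mean_gap_oldest)
```

The fitted slope falls below 1, the mean gap is positive for the youngest subjects, and it is negative for the oldest, which is the pattern expected when predictions shrink toward the cohort mean.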

Unlike other studies, the present study aimed to evaluate different regression algorithms and predict brain age in the East Asian population based on subcortical brain segments extracted from T1-weighted MRI scans of healthy participants. Our results showed that the BR algorithm achieved better accuracy than other methods (Table 3). We also showed that 19 features (Fig. 1) played a vital role in predicting brain age during model training with machine-learning regression algorithms.

In this study, we trained and tested the machine learning regression algorithms LR, SVR, GBR, and BR with brain volume data from MRI scans of community-based healthy subjects. First, we found that the BR algorithm obtained better results than LR, SVR, and GBR. In particular, BR achieved an MAE of 3 years and an R² of 0.3. Second, our results showed that simple volumetric features, such as the right pallidum, white matter hypointensities, and left hippocampus, were important in predicting chronological age when compared with the left cerebral white matter and right white matter hypointensities on T1-MRI scans. The brain age gap between predicted and chronological ages can reduce the dimensionality of neuroimaging data to provide meaningful biomarkers for individual brain aging.

SUPPLEMENTARY MATERIAL

Supplementary Fig. 1

All 55 brain volume data features. The random forest classifier algorithm from the scikit-learn library selected the essential features.

Notes

Funding: The National Research Foundation of Korea (NRF-2017S1A6A3A01078538).

Conflicts of Interest: The authors have no financial conflicts of interest.

Author Contributions:

Conceptualization: Simfukwe C.
Funding acquisition: Youn YC.
Methodology: Simfukwe C.
Project administration: Youn YC.
Supervision: Youn YC.
Validation: Simfukwe C.
Visualization: Simfukwe C.
Writing - original draft: Simfukwe C.
Writing - review & editing: Youn YC.

ACKNOWLEDGEMENTS

We would like to thank the Department of Neurology at Chung-Ang University Hospital for providing the tools to make this research successful.

References

1. Yeung WJ, Lee Y. Aging in East Asia: new findings on retirement, health, and well-being. J Gerontol B Psychol Sci Soc Sci 2022;77:589–591.
2. Hou Y, Dan X, Babbar M, Wei Y, Hasselbalch SG, Croteau DL, et al. Ageing as a risk factor for neurodegenerative disease. Nat Rev Neurol 2019;15:565–581.
3. Cristea M, Noja GG, Stefea P, Sala AL. The impact of population aging and public health support on EU labor markets. Int J Environ Res Public Health 2020;17:1439.
4. Tocchio S, Kline-Fath B, Kanal E, Schmithorst VJ, Panigrahy A. MRI evaluation and safety in the developing brain. Semin Perinatol 2015;39:73–104.
5. Anatürk M, Kaufmann T, Cole JH, Suri S, Griffanti L, Zsoldos E, et al. Prediction of brain age and cognitive age: quantifying brain and cognitive maintenance in aging. Hum Brain Mapp 2021;42:1626–1640.
6. Lancaster J, Lorenz R, Leech R, Cole JH. Bayesian optimization for neuroimaging pre-processing in brain age classification and prediction. Front Aging Neurosci 2018;10:28.
7. Hwang I, Yeon EK, Lee JY, Yoo RE, Kang KM, Yun TJ, et al. Prediction of brain age from routine T2-weighted spin-echo brain magnetic resonance images with a deep convolutional neural network. Neurobiol Aging 2021;105:78–85.
8. Sarker IH. Machine learning: algorithms, real-world applications and research directions. SN Comput Sci 2021;2:160.
9. Beheshti I, Ganaie MA, Paliwal V, Rastogi A, Razzak I, Tanveer M. Predicting brain age using machine learning algorithms: a comprehensive evaluation. IEEE J Biomed Health Inform 2022;26:1432–1440.
10. Pandis N. Linear regression. Am J Orthod Dentofacial Orthop 2016;149:431–434.
11. Martín-Guerrero JD, Camps-Valls G, Soria-Olivas E, Serrano-López AJ, Pérez-Ruixo JJ, Jiménez-Torres NV. Dosage individualization of erythropoietin using a profile-dependent support vector regression. IEEE Trans Biomed Eng 2003;50:1136–1142.
12. Li X, Li W, Xu Y. Human age prediction based on DNA methylation using a gradient boosting regressor. Genes (Basel) 2018;9:424.
13. da Silva FA, Viana AP, Correa CC, Santos EA, de Oliveira JA, Andrade JD, et al. Bayesian ridge regression shows the best fit for SSR markers in Psidium guajava among Bayesian models. Sci Rep 2021;11:13639.
14. Ly M, Yu GZ, Karim HT, Muppidi NR, Mizuno A, Klunk WE, et al. Improving brain age prediction models: incorporation of amyloid status in Alzheimer’s disease. Neurobiol Aging 2020;87:44–48.
15. Wang J, Knol MJ, Tiulpin A, Dubost F, de Bruijne M, Vernooij MW, et al. Gray matter age prediction as a biomarker for risk of dementia. Proc Natl Acad Sci U S A 2019;116:21213–21218.
16. Paul H, Simon J, Gilles W, Thomas C. Brain age prediction of healthy subjects on anatomic MRI with deep learning: going beyond with an “explainable AI” mindset. bioRxiv. 2018
17. Aycheh HM, Seong JK, Shin JH, Na DL, Kang B, Seo SW, et al. Biological brain age prediction using cortical thickness data: a large scale cohort study. Front Aging Neurosci 2018;10:252.
18. Lidauer K, Pulli EP, Copeland A, Silver E, Kumpulainen V, Hashempour N, et al. Subcortical and hippocampal brain segmentation in 5-year-old children: validation of FSL-FIRST and FreeSurfer against manual segmentation. Eur J Neurosci 2022;56:4619–4641.
19. Fujihara K, Takei Y. FreeSurfer as a platform for associating brain structure with function. Brain Nerve 2018;70:841–848.
20. Gómez-Ramírez J, Fernández-Blázquez MA, González-Rosa JJ. Prediction of chronological age in healthy elderly subjects with machine learning from MRI brain segmentation and cortical parcellation. Brain Sci 2022;12:579.
21. Hong J, Feng Z, Wang SH, Peet A, Zhang YD, Sun Y, et al. Brain age prediction of children using routine brain MR images via deep learning. Front Neurol 2020;11:584682.
22. Cole JH, Franke K. Predicting age using neuroimaging: innovative brain ageing biomarkers. Trends Neurosci 2017;40:681–690.