Gradient boosting decision-tree-based algorithm with neuroimaging for personalized treatment in depression

Introduction: Pretreatment positron emission tomography (PET) with 2-deoxy-2-[18F]fluoro-D-glucose (FDG) and magnetic resonance spectroscopy (MRS) may identify biomarkers for predicting remission (absence of depression). Yet no such image-based biomarkers have achieved clinical validity. The purpose of this study was to identify biomarkers of remission using machine learning (ML) with pretreatment FDG-PET/MRS neuroimaging, to reduce patient suffering and the economic burden of ineffective treatment trials. Methods: This study used simultaneous PET/MRS neuroimaging acquired before treatment initiation from 60 participants with major depressive disorder (MDD) in a double-blind, placebo-controlled, randomized antidepressant trial. After eight weeks of treatment, those scoring ≤ 7 on the 17-item Hamilton Depression Rating Scale were designated a priori as remitters (free of depression, 37%). Metabolic rates of glucose uptake (metabolism) in 22 brain regions were acquired from PET. Concentrations (mM) of glutamine and glutamate, and of gamma-aminobutyric acid (GABA), in the anterior cingulate cortex were quantified from MRS. The data were randomly split into a training and cross-validation set (67%, n = 40) and a test set (33%, n = 20). The imaging features, along with age, sex, handedness, and treatment assignment (selective serotonin reuptake inhibitor, or SSRI, vs. placebo), were entered into the eXtreme Gradient Boosting (XGBoost) classifier for training. Results: In the test data, the model showed 62% sensitivity, 92% specificity, and 77% weighted accuracy. Pretreatment metabolism of the left hippocampus from PET was the most predictive of remission. Conclusions: The pretreatment neuroimaging takes around 60 minutes but has the potential to prevent weeks of failed treatment trials. This study addresses common issues in neuroimaging analysis, such as small sample size, high dimensionality, and class imbalance.


Introduction
There is an urgent need for reliable prediction of potential antidepressant failure in the treatment of major depressive disorder (MDD). Pretreatment electroencephalography (EEG) has predicted antidepressant efficacy with performance that may be sufficient for clinical utility [1]. However, EEG electrodes are placed on the scalp to obtain functional measures, which gives lower spatial resolution than molecular neuroimaging, so the source of the signal may not be apparent [2]. Molecular imaging modalities, such as positron emission tomography (PET) with 2-deoxy-2-[18F]fluoro-D-glucose (FDG) and proton magnetic resonance spectroscopy (1H-MRS or MRS), have been useful for early assessment and risk stratification in patients with neurological, oncological, and cardiovascular disorders [3]. Yet, in MDD, they have not proved clinically useful due to lack of accuracy [4].
FDG-PET is a sensitive indicator of cerebral function, leading to its extensive use for assessing response to the most widely used first-line MDD treatment, selective serotonin reuptake inhibitors (SSRIs) [5]. However, these prior studies do not agree on the predictive region or measure, and their predictions could not be replicated using conventional statistics [6].
Glutamine (Gln) is the precursor of the excitatory neurotransmitter glutamate (Glu) and the inhibitory neurotransmitter γ-aminobutyric acid (GABA), and all three amino acids are essential to brain metabolism [7]. Prior MRS studies showed that higher pretreatment Glu in the anterior cingulate cortex (ACC) can predict better response [8], but other MRS studies found no predictive potential for antidepressant response using Glu [9,10], GABA [9][10][11], or Glx (Glu + Gln) [12], adding to the lack of consensus on biomarkers of MDD treatment. These inconsistent findings, obtained with single imaging modalities, fewer functional features, and conventional statistical models, have motivated the use of machine learning (ML) with multimodal neuroimaging (both PET and MRS measures) for better prediction accuracy [13].
The purpose of this study was to identify biomarkers that predict remission (absence of depression) after eight weeks of treatment from pretreatment neuroimaging measures obtained with (i) PET and (ii) MRS. The classifier was eXtreme Gradient Boosting (XGBoost), a supervised gradient tree boosting ensemble algorithm that is widely popular for its strong performance and is a fast, scalable, and explainable artificial intelligence (AI) classifier with strong regularization [14]. XGBoost allows multiple hyperparameters to be adjusted to avoid overfitting and automatically ranks the most predictive features, which can be used as biomarkers for remission. This article makes the following contributions regarding predictive biomarkers for antidepressant treatment.

• This study provides a novel instance of the development of a comprehensive machine learning model by integrating pretreatment brain functional measures from simultaneous PET/MRS in one framework.
• The pretreatment simultaneous PET/MRS used in the study takes around 60 minutes and has the potential to prevent weeks of failed treatment trials.
• Applying predictive measures from neuroimaging can reduce delay to effective treatment, patient suffering and economic burden, and enhance long-term functional outcomes.
• The findings may assist clinicians with treatment selection and shed light on the neurobiology of remission.

Methods

Study cohort
This study analyzed simultaneously acquired PET/MRS neuroimaging data from a randomized, placebo-controlled, double-blind, single-site SSRI (escitalopram) trial on 60 participants with MDD before initiating treatment. The study design and rationale for the data set have been previously described [6,10]. Forty-three regional measures of the metabolic rate of glucose (MRGlu, mg/(min*100 mL)) from FDG-PET across multiple brain regions [6], and concentrations (mM) of Glx (Glu + Gln, a composite peak formed by Glu and Gln) and GABA, together with the ratio of Glx to GABA (Glx/GABA), in the ACC from MRS [10], were quantified as described in prior publications. After eight weeks of treatment, those scoring ≤ 7 on the 17-item Hamilton Depression Rating Scale were designated a priori as remitters (free of depression).

Machine learning analysis
The ML model development with hyperparameters is illustrated in Fig. 1. The data were randomly split into a training and cross-validation set (67%, n = 40) and a test set (33%, n = 20), stratified by outcome (remitters vs. non-remitters) and treatment assignment (SSRI vs. placebo) to ensure a comparable distribution in each set. There is no established power calculation for XGBoost, so the conventional ML practice of evaluating the fitted model on held-out data was used.
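The stratified partition described above can be sketched in plain Python. This is a minimal illustration, not the study's code: the helper `stratified_split` and its largest-remainder rounding are assumptions, and the stratum sizes are obtained by adding the training and test counts reported in the Results. In practice, scikit-learn's `train_test_split` with its `stratify` argument is the usual tool.

```python
import random
from collections import Counter, defaultdict

# Stratum sizes implied by the reported train and test counts:
# (outcome, treatment) -> number of participants (60 in total).
STRATA = {
    ("remitter", "SSRI"): 8,
    ("remitter", "placebo"): 14,
    ("non-remitter", "SSRI"): 22,
    ("non-remitter", "placebo"): 16,
}


def stratified_split(labels, n_test, seed=0):
    """Split indices into train/test with proportional counts per stratum.

    Hypothetical helper: test quotas use largest-remainder rounding so that
    they always sum to n_test.
    """
    by_stratum = defaultdict(list)
    for idx, label in enumerate(labels):
        by_stratum[label].append(idx)

    total = len(labels)
    # Integer part and remainder of size * n_test / total for each stratum.
    quota = {s: len(ix) * n_test // total for s, ix in by_stratum.items()}
    remainder = {s: len(ix) * n_test % total for s, ix in by_stratum.items()}
    leftover = n_test - sum(quota.values())
    for s in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        quota[s] += 1

    rng = random.Random(seed)
    train, test = [], []
    for s, ix in by_stratum.items():
        shuffled = ix[:]
        rng.shuffle(shuffled)
        test.extend(shuffled[:quota[s]])
        train.extend(shuffled[quota[s]:])
    return sorted(train), sorted(test)


labels = [s for s, n in STRATA.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels, n_test=20)
test_counts = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), dict(test_counts))
```

Under these assumptions, the proportional quotas for the test set come out as 3, 5, 7, and 5 participants per stratum, which agrees with the test-set composition reported in the Results.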
The hyperparameters for subsampling, number of trees, and tree depth were optimized using GridSearchCV with 3 repetitions of stratified 10-fold cross-validation (preferred for depression research [16,17]). The 50 input features, comprising the 46 imaging measures (43 PET and 3 MRS) and age, sex, handedness, and treatment assignment, were entered into XGBoost for training with the optimized hyperparameters to predict remitters vs. non-remitters. The model performance was evaluated on the test data using a confusion matrix (Fig. 2). Statistical and machine learning analyses were performed using STATA/SE 13.0 (StataCorp LLC, College Station, TX) and Python 3.9.0 (Python Software Foundation, Beaverton, OR).

Results

Study cohort
The study included 60 participants aged 18 to 64 years (mean ± standard deviation: 30 ± 14 years). Thirty-seven (62%) were female, 30 (50%) were assigned to SSRI, and 51 (85%) were right-handed. After eight weeks of treatment, 22 participants remitted (37% remitters) and 38 participants did not (63% non-remitters). There was no significant difference between the non-remitter and remitter groups in age, sex, handedness, or treatment assignment.

Machine learning analysis
In the training set (n = 40), there were 14 remitters (35%), with 5 on SSRI and 9 on placebo, and 26 non-remitters (65%), with 15 on SSRI and 11 on placebo (scale_pos_weight = 1.86). The most predictive neuroimaging features based on "gain" (their contribution to the fitted model) are shown in Fig. 2. The cross-validated training and testing areas under the receiver operating characteristic (ROC) curve are shown in Fig. 3.
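The reported scale_pos_weight is consistent with the usual XGBoost convention of dividing the number of negative (non-remitter) cases by the number of positive (remitter) cases in the training set. A short check, using only the counts given above:

```python
# Training-set counts reported in the text.
remitters = {"SSRI": 5, "placebo": 9}
non_remitters = {"SSRI": 15, "placebo": 11}

n_pos = sum(remitters.values())      # 14 remitters (positive class)
n_neg = sum(non_remitters.values())  # 26 non-remitters (negative class)

# Ratio of negative to positive cases, used to up-weight the minority class.
scale_pos_weight = n_neg / n_pos
print(n_pos, n_neg, round(scale_pos_weight, 2))  # 14 26 1.86
```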
Supplemental Table 1 and Table 1 below show the performance of the fitted model on the unseen test data (n = 20) consisting of 8 remitters (40%) with 3 on SSRI and 5 on placebo, and 12 (60%) non-remitters with 7 on SSRI and 5 on placebo.
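As a worked check of the reported test metrics, the sketch below derives sensitivity, specificity, and weighted accuracy from a confusion matrix. The four cell counts are an assumption: they are inferred from the test-set sizes (8 remitters, 12 non-remitters) and the reported 62% sensitivity and 92% specificity, not taken from Table 1. Weighted accuracy is assumed here to mean the average of sensitivity and specificity (balanced accuracy).

```python
# Inferred confusion-matrix counts (an assumption, not values from Table 1).
tp, fn = 5, 3    # remitters predicted as remitters / as non-remitters
tn, fp = 11, 1   # non-remitters predicted as non-remitters / as remitters

sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
false_negative_rate = 1 - sensitivity
false_positive_rate = 1 - specificity

# Assumed definition of "weighted accuracy": mean of sensitivity and specificity.
weighted_accuracy = (sensitivity + specificity) / 2

print(f"sensitivity={sensitivity:.3f}")              # sensitivity=0.625
print(f"specificity={specificity:.3f}")              # specificity=0.917
print(f"weighted_accuracy={weighted_accuracy:.3f}")  # weighted_accuracy=0.771
```

Under these assumed counts, the values agree with the reported 62%, 92%, and 77% to within rounding.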

Discussion

Novelty
This study was the first effort to build a comprehensive predictive model of remission using simultaneous PET/MRS data from a randomized clinical trial. It was also the first time a gradient boosting decision-tree-based algorithm was used for this purpose. In this approach, the XGBoost hyperparameters were applied following oversampling and outlier removal, and the resulting weighted accuracy (77%) is comparable to that of previous studies predicting antidepressant treatment outcome with much larger sample sizes [17][18][19][20].

Generalizability
The 67/33 split in the current study provided an adequate test set (as opposed to the alternative practice of 80/20 or 90/10 train/test splits) to protect against performance misestimation in MDD research [21]. The model's generalizability is further supported by stratified cross-validation, which is particularly useful for small data sets with unbalanced classes, such as the remitter and non-remitter classes in our data set. This technique keeps the proportion of each class similar in every fold, so that all strata of the data are well represented.
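The effect of stratification on fold composition can be illustrated with a small, self-contained sketch using the training-set class counts (14 remitters, 26 non-remitters). The helper `stratified_folds` is hypothetical and uses a simple sort-then-deal scheme; library implementations such as scikit-learn's `StratifiedKFold` follow the same idea with more care.

```python
import random


def stratified_folds(y, n_splits, seed=0):
    """Assign sample indices to folds so class proportions are similar per fold."""
    rng = random.Random(seed)
    # Group indices by class, shuffling within each class.
    order = []
    for cls in sorted(set(y)):
        members = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(members)
        order.extend(members)
    # Deal the class-sorted indices round-robin across the folds.
    folds = [[] for _ in range(n_splits)]
    for position, idx in enumerate(order):
        folds[position % n_splits].append(idx)
    return folds


y = [1] * 14 + [0] * 26  # 1 = remitter, 0 = non-remitter
folds = stratified_folds(y, n_splits=10)
for k, fold in enumerate(folds):
    print(f"fold {k}: size={len(fold)}, remitters={sum(y[i] for i in fold)}")
```

Each of the 10 folds holds 4 participants, of whom 1 or 2 are remitters, so no fold is left without the minority class.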

Regularization
Instead of feature reduction, the current model relied on regularization hyperparameters, which can give higher accuracy and better uncertainty assessment [22]. These XGBoost regularization hyperparameters, the penalty terms alpha (L1, as in LASSO regression) and lambda (L2, as in ridge regression), shrink the leaf weights of the trees toward 0, which limits the influence of less relevant features [23]. With this approach, the model achieved higher accuracy than a previous model (59% accuracy) that used feature reduction to select the 25 most predictive of 164 patient-reportable variables for remission after 12 weeks of SSRI treatment [18]. Including more features also protects predictive performance from being affected by influential data points [19].
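For reference, the role of the two penalty terms can be written out using the regularized objective given in the XGBoost documentation. The notation below is an assumption introduced for illustration: $l$ is the training loss, $f_k$ is the $k$-th tree, $T$ is its number of leaves, $w_j$ are its leaf weights, and $\gamma$ is the per-leaf complexity penalty.

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^{2} + \alpha \sum_{j=1}^{T} \lvert w_j \rvert
```

In this form, lambda and alpha penalize the magnitude of the leaf weights rather than per-feature coefficients, which is how they damp the contribution of splits on weakly informative features.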

Performance
In the held-out test data, the true negative rate (specificity, 92%) was high and the true positive rate (sensitivity, 62%) was moderate, corresponding to false positive and false negative rates of 8% and 38%, respectively, which supports the acceptable performance of the model. At least 100 trees were used, following convention, when searching for the optimal number of trees; however, the area under the receiver operating characteristic curve indicates that a lower number of trees (~ 70) might have given slightly better performance.

Limitations
To address limitations related to small sample size in neuroimaging research, two of the most common data augmentation techniques for image classification, generative adversarial networks and the Unity game engine, can be explored in the future. However, this will require finding the optimal data augmentation strategy and developing evaluation systems to ensure the quality of augmented data sets, while accounting for the computational cost of slower convergence.

Potential biomarkers
Although predictive analytics using biomarkers with machine learning are scarce in neuropsychiatry, insights from successful machine learning applications in neurological disorders such as stroke may be useful, considering the bidirectional relationship between stroke and depression [24]. The current model is compared with predictive analytics using biomarkers from other modalities in Table 2. One potential biomarker is electromyography (EMG), since motor activity can be lower in depression [25][26][27][28][29] and can increase as depression improves [25,30,31]. A tree-based algorithm, random forest, applied to real-time signals from the thighs and calves of 287 participants has shown predictive accuracy > 90% for stroke [32]. Furthermore, electroencephalography (EEG) has been used to predict stroke with Classification and Regression Trees (C&RT) algorithms with 89% accuracy [33], whereas with other tree-based models such as C5.0 and random forest, EEG data have shown only around 70% accuracy [34]. Pretreatment EEG measures have been useful for predicting antidepressant efficacy with > 87% accuracy using a mixture of factor analysis (MFA) classifier [1]. However, the biosignals from EEG electrodes placed on the scalp locate the source of the signal less precisely than the PET/MRS used in this study [2]. Nonetheless, adding these cost-effective modalities to molecular neuroimaging may help develop a more comprehensive predictive model with improved sensitivity for predicting antidepressant response.

Conclusions
To our knowledge, this was the first effort to develop a gradient tree boosting classifier by integrating pretreatment multimodal molecular neuroimaging with easily interpretable brain functional measures in one framework, with accuracy comparable to previous predictive models. This study provides information on effectively addressing common issues related to neuroimaging analysis, such as small sample size, high dimensionality, and class imbalance. More importantly, the pretreatment neuroimaging takes around 60 minutes and has the potential to prevent weeks of failed treatment trials.

Fig. 3 XGBoost receiver operating characteristic area under the curve.

Table 1 The performance metrics of the fitted model on test data.

Table 2 (fragment) [34]: Wireless EEG devices used for data collection. Expertise will be required for EEG data interpretation, and tree-based algorithms showed 70% accuracy.