Machine Learning Predicts Treatment Response in Bipolar & Major Depressive Disorders

Diagnosis of bipolar disorder (BD) in patients with complex symptoms presents a challenge to clinicians, in part because patients tend to spend more time in a depressive state than in a manic state. In such complex cases, the current Diagnostic and Statistical Manual (DSM), which is not based on pathophysiology, can lead to misdiagnosis as major depressive disorder (MDD) and to an imperfect or even harmful medication response. A biologically based classification algorithm is needed to improve the accuracy of diagnosis. Osuch et al. (2018) presented a kernel support vector machine (SVM) algorithm to predict the medication-class of response from new patient samples whose diagnoses were unclear. Here we also use the kernel SVM algorithm, but with a few novel contributions. We applied the robust, fully automated neuromark independent component analysis (ICA) framework to extract comparable features in a multi-dataset setting and learned a kernel function for the SVM on multiple feature subspaces. The neuromark framework successfully replicates the prior result with 95.45% accuracy (sensitivity 90.24%, specificity 92.3%). To further evaluate the generalizability of our approach, we incorporated two additional datasets comprising BD and MDD patients. We validated the trained algorithm on these datasets, resulting in a testing accuracy of up to 89% (sensitivity 88%, specificity 89%) without using site or scanner harmonization techniques. We also translated the model to predict improvement scores of MDD with up to 70% accuracy. This approach reveals some salient biological markers of medication-class of response within mood disorders.

Highlights
We demonstrate a DSM-free approach for predicting treatment response from resting-state functional magnetic resonance imaging (fMRI) data.
We identify several replicable biomarkers using the approach.
Our work has potential for clinical application by replacing trial-and-error in treating complex psychiatric disorders.


Patients without any evident symptoms of mania pose a difficult challenge in differentiating bipolar disorder (BD) from major depressive disorder (MDD) using the current "gold standard," the DSM (American Psychiatric Association, 2013). BD can, in some cases, be misdiagnosed as MDD because patients with BD generally spend more time in depressive than in manic states (Judd et al., 2002). When it comes to the treatment response of patients to the different medication classes, mood stabilizers (MSs) often do not effectively treat MDD. At the same time, antidepressants (ADs) may worsen BD type I. Fundamental differences between MDD and BD have been widely reported (Osuch et al., 2018; de Almeida and Phillips, 2013; Bowden, 2005; Perlis et al., 2006). It is necessary to obtain the correct mood diagnosis and medication class to best support the patient's recovery.

Group independent component analysis (ICA) is a widely used data-driven algorithm for conducting multi-subject fMRI studies. The most widely used spatial group ICA approach identifies maximally spatially independent spatial maps (SMs) using the temporal concatenation of multi-subject data. It subsequently decomposes each subject's data into unique time courses (TCs) and considerably variable SMs using a back-reconstruction technique (Erhardt et al., 2011). This entirely data-driven approach can be challenging to implement on asynchronous multi-dataset analyses due to the need to analyze all the data together and the complexities of selecting and labeling components. As an alternative, the subject-specific features can be computed using spatially constrained ICA (scICA), which automatically and adaptively estimates individual-level independent components (ICs) using a priori network templates as guidance. Several ICA algorithms available in the Group ICA of fMRI Toolbox (GIFT) (https://trendscenter.org/software/gift/) can be used for scICA (Lin et al., 2010; Du and Fan, 2013). This work combines scICA with component templates derived and replicated from multiple large-N (N > 800) data sets and takes 74 individual subject's fMRI data as input.

Support vector machines (SVMs) are a set of supervised binary classification algorithms which can also be extended for regression and multiclass classification (Vapnik, 1998, 1999). The SVM algorithm incorporates a sample selection mechanism, i.e., only the support vectors affect the decision function. It constructs a maximal margin linear classifier in a high-dimensional feature space by mapping the original features via a kernel function. We can define a unique kernel function for applying the SVM algorithm to classify fMRI features such as SMs and TCs. We utilize the similarity measures of subspaces in the commonly used kernel functions (Chang and Lin, 2001).
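
As a minimal sketch of how a custom kernel can drive an SVM, the example below passes a precomputed Gram matrix to scikit-learn's `SVC`. The toy data, the helper `rbf_gram`, and the values of `gamma` and `C` are illustrative assumptions, not the features or settings used in this work.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy stand-in for subject features: two well-separated classes (assumption).
X_train = np.vstack([rng.normal(-2.0, 0.5, size=(20, 5)),
                     rng.normal(2.0, 0.5, size=(20, 5))])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.vstack([rng.normal(-2.0, 0.5, size=(5, 5)),
                    rng.normal(2.0, 0.5, size=(5, 5))])
y_test = np.array([0] * 5 + [1] * 5)


def rbf_gram(A, B, gamma):
    """Gram matrix with K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)


gamma = 0.1  # hypothetical kernel parameter
K_train = rbf_gram(X_train, X_train, gamma)  # shape (n_train, n_train)
K_test = rbf_gram(X_test, X_train, gamma)    # shape (n_test, n_train)

# kernel="precomputed" tells SVC that it receives Gram matrices, not features.
clf = SVC(kernel="precomputed", C=1.0)
clf.fit(K_train, y_train)
pred = clf.predict(K_test)

# Only the support vectors enter the decision function.
n_support = clf.support_.size
print("test accuracy:", (pred == y_test).mean())
print("support vectors:", n_support, "of", len(y_train))
```

Any symmetric positive semidefinite similarity between subjects can take the place of `rbf_gram` in this pattern, which is what makes a subspace-based kernel usable with a standard SVM solver.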

Here we extended the classification algorithm proposed in prior work to multi-feature (i.e., using SMs and TCs) and multi-dataset cases (Osuch et al., 2018; Fan et al., 2011). [...] (Osuch et al., 2018; Poldrack et al., 2016; Trivedi et al., 2016). We also used feature selection on the model trained using known DSM-based MDD vs. BD type I patients to reveal neurophysiological differences in these populations. The longer-term hope is that the algorithm will be helpful to predict AD vs. MS response in complex patients whose DSM diagnoses are unclear. The focus on medication class response provides a 'DSM-free' and potentially clinically useful approach to identifying biological markers of medication-class of response within mood disorders.

This work extends our prior work in several ways (Salman et al., 2021).

Previously, we reported results using thresholded SMs as features. Here we use the unthresholded SMs, which lowers the number of false positives and false negatives and results in better sensitivity/specificity. We also use the unthresholded SMs in conjunction with functional network connectivity (FNC) to perform multi-feature prediction in the Western data and extend the framework to the LA5C and EMBARC data. We also include the prediction of MDD patient treatment response improvement scores in the EMBARC dataset using the same algorithm.

[...] acquisition can be found in prior work (Osuch et al., 2018).

MRI data were acquired using a 3.0T Siemens Verio MRI scanner at the Lawson Health Research Institute using a 32-channel phased-array head coil. [...]

For this analysis, we used the controls and BD subjects. Additional data descriptions can be found in prior work (Poldrack et al., 2016).

[...] about the data acquisition can be found in prior work (Trivedi et al., 2016).

The patients were divided into four groups, numbered 1 to 4, based on the treatment response. Response to treatment was defined by being at least [...]

We preprocessed all subjects' data from the three different datasets using the Statistical Parametric Mapping (SPM) software (Friston, 2007). We performed rigid body motion correction to correct for subject head motion, followed by slice-timing correction to account for the timing difference in slice acquisition. We subsequently warped the fMRI data into the standard Montreal Neurological Institute (MNI) space using an EPI template and resampled the data to 3 × 3 × 3 mm³ isotropic voxels. We further smoothed the resampled fMRI images using a Gaussian kernel with a full width at half maximum (FWHM) of 6 mm.
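
For readers who want to relate the smoothing setting to a Gaussian standard deviation, the short sketch below converts the 6 mm FWHM to sigma on the 3 mm voxel grid. It only illustrates the standard FWHM-to-sigma relation; the function name `fwhm_to_sigma_voxels` is hypothetical and is not part of SPM.

```python
import math


def fwhm_to_sigma_voxels(fwhm_mm, voxel_size_mm):
    """Convert a Gaussian FWHM in mm to a standard deviation in voxel units.

    Uses FWHM = 2 * sqrt(2 * ln 2) * sigma.
    """
    sigma_mm = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return sigma_mm / voxel_size_mm


# Values taken from the text: 6 mm FWHM on 3 mm isotropic voxels.
sigma_vox = fwhm_to_sigma_voxels(6.0, 3.0)
print(f"sigma = {sigma_vox:.3f} voxels")  # about 0.849 voxels (2.548 mm)
```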

We took the following quality control measures after preprocessing. We retained a subject for analysis if they had: (1) data with head motion amounting to less than 3° rotation and 3 mm translation along the whole scanning period; (2) data with more than 120 time points in the fMRI acquisition; (3) data providing a successful normalization to the whole brain as assessed by visual inspection and spatial correlation with the template (Fu et al., 2021b).

[...] (Björck and Golub, 1973; Fan et al., 2011).
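
The three retention criteria can be expressed as a simple filter. The sketch below is illustrative only: the `ScanQC` record, its field names, and the example subjects are hypothetical, and the normalization check is reduced to a boolean because the text does not give a numeric correlation threshold.

```python
from dataclasses import dataclass


@dataclass
class ScanQC:
    subject_id: str
    max_rotation_deg: float    # largest head rotation over the scan
    max_translation_mm: float  # largest head translation over the scan
    n_timepoints: int          # number of fMRI volumes
    normalization_ok: bool     # outcome of visual check and template correlation


def passes_qc(scan, max_rot=3.0, max_trans=3.0, min_timepoints=120):
    """Return True if the scan meets all three retention criteria."""
    return (
        scan.max_rotation_deg < max_rot
        and scan.max_translation_mm < max_trans
        and scan.n_timepoints > min_timepoints
        and scan.normalization_ok
    )


# Made-up subjects, one passing and three failing a different criterion each.
scans = [
    ScanQC("sub-01", 0.8, 1.2, 150, True),
    ScanQC("sub-02", 3.5, 0.9, 150, True),   # too much rotation
    ScanQC("sub-03", 0.5, 0.7, 110, True),   # too few time points
    ScanQC("sub-04", 1.1, 2.9, 200, False),  # failed normalization
]
retained = [s.subject_id for s in scans if passes_qc(s)]
print(retained)
```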

Then subspace similarity between different subjects is defined correspondingly as [...] Here p is short for projection, and k is the number of subspace dimensions.
where S(A, B) is the similarity in Eq. 1, and γ is the kernel parameter.
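
To make the idea of a subspace similarity concrete, the sketch below computes the cosines of the principal angles between two feature subspaces from the singular values of the product of their orthonormal bases, the standard numerical approach described by Björck and Golub (1973). The exact similarity and kernel used in this work are not reproduced here, so two forms are assumed purely for illustration: a projection-style similarity equal to the mean squared cosine, and a kernel `exp(-gamma * (1 - S))`. All function names are hypothetical.

```python
import numpy as np


def orthonormal_basis(X):
    """Orthonormal basis (n_features x k) for the column space of X via QR."""
    Q, _ = np.linalg.qr(X)
    return Q


def principal_cosines(A, B):
    """Cosines of the principal angles between span(A) and span(B)."""
    Qa, Qb = orthonormal_basis(A), orthonormal_basis(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against round-off slightly above 1


def subspace_similarity(A, B):
    """Assumed similarity: mean squared cosine, in [0, 1]."""
    c = principal_cosines(A, B)
    return float(np.mean(c ** 2))


def subspace_kernel(A, B, gamma=1.0):
    """Assumed kernel form: exp(-gamma * (1 - S(A, B)))."""
    return float(np.exp(-gamma * (1.0 - subspace_similarity(A, B))))


rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))                      # k = 3 dimensional subspace
M = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])                   # invertible change of basis
E = np.eye(6)

print(subspace_similarity(A, A @ M))              # same subspace, close to 1
print(subspace_similarity(E[:, :3], E[:, 3:]))    # orthogonal subspaces, 0
```

Because the measure depends only on the spanned subspaces, re-mixing the columns of a subject's feature matrix leaves the similarity unchanged.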

With this kernel function, we can build an SVM classifier. We used the following formulation to estimate subspace similarities as [...]

We performed each of the following experiments using a 10-fold cross-validation (CV) around the kernel SVM framework described above. We repeated each ex- [...] SMs. We noted these salient brain activity patterns for discussion.

[...] sensitivity and specificity were quite high, as reported in the table, and generally, sensitivity was slightly higher than specificity.
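
A 10-fold CV around a precomputed-kernel SVM, with sensitivity and specificity computed from the pooled out-of-fold predictions, might look like the sketch below. It is not the authors' pipeline: the toy data, the linear Gram matrix, the choice of label 1 as the positive class, and the helper names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC


def cross_validate_precomputed(K, y, n_splits=10, C=1.0, seed=0):
    """Return out-of-fold predictions for an SVM on a precomputed Gram matrix."""
    y = np.asarray(y)
    y_pred = np.empty_like(y)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train, test in cv.split(np.zeros(len(y)), y):
        clf = SVC(kernel="precomputed", C=C)
        # Train on the train-by-train block, predict with the test-by-train block.
        clf.fit(K[np.ix_(train, train)], y[train])
        y_pred[test] = clf.predict(K[np.ix_(test, train)])
    return y_pred


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos = y_true == positive
    tp = np.sum(pos & (y_pred == positive))
    tn = np.sum(~pos & (y_pred != positive))
    return tp / pos.sum(), tn / (~pos).sum()


# Toy data: 20 subjects per class with well-separated features (assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(20, 5)),
               rng.normal(2.0, 0.5, size=(20, 5))])
y = np.array([0] * 20 + [1] * 20)
K = X @ X.T  # linear Gram matrix standing in for a subspace kernel

y_pred = cross_validate_precomputed(K, y)
sens, spec = sensitivity_specificity(y, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Slicing the Gram matrix per fold keeps test subjects out of training while avoiding recomputation of the kernel for every split.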

We also report FNC-based classification results in each alternate row of [...] further improve results (Johnson et al., 2007; Fortin et al., 2017, 2018).

We also separately ran classification experiments on the independent datasets using 10-fold CV and SM and FNC features. In addition, we used the medication-class of response as the target label in the Western data but the diagnosis label in the case of the LA5C and EMBARC data. In doing so, we can demonstrate that the model trained on treatment response data can also predict the DSM diagnosis, although perhaps slightly less accurately.

We used Eq. 3 to generate a kernel function consisting of multiple modalities. This process can be improved by using a weighted approach or a multiple kernel learning approach (Tanabe et al., 2008; Gönen and Alpaydın, 2011).
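
One simple version of the weighted approach mentioned above is a convex combination of per-modality Gram matrices, which remains a valid kernel because a non-negative weighted sum of positive semidefinite matrices is positive semidefinite. The sketch below is a hypothetical illustration, not Eq. 3: the weights, the toy feature matrices, and the function name `combine_kernels` are assumptions.

```python
import numpy as np


def combine_kernels(kernels, weights):
    """Convex combination of Gram matrices; weights are normalized to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))


rng = np.random.default_rng(0)
F_sm = rng.normal(size=(6, 10))   # toy SM-derived features for 6 subjects
F_fnc = rng.normal(size=(6, 4))   # toy FNC-derived features for 6 subjects
K_sm = F_sm @ F_sm.T              # linear Gram matrices used as stand-ins
K_fnc = F_fnc @ F_fnc.T

# Hypothetical weights; in practice they could be tuned inside the CV loop.
K_multi = combine_kernels([K_sm, K_fnc], weights=[0.7, 0.3])
print(K_multi.shape)
```

Multiple kernel learning goes one step further by learning such weights jointly with the classifier rather than fixing them in advance.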

The Riemannian distance measure is most useful when orthonormal basis