Verification of the brain network marker of major depressive disorder: Test-retest reliability and anterograde generalization performance for newly acquired data

Background: Recently, we developed a generalizable brain network marker for the diagnosis of major depressive disorder (MDD) across multiple imaging sites using resting-state functional magnetic resonance imaging. Here, we applied this brain network marker to newly acquired data to verify its test-retest reliability and anterograde generalization performance for new patients. Methods: We tested the sensitivity and specificity of our brain network marker of MDD using data acquired from 43 new patients with MDD as well as new data from 33 healthy controls (HCs) who participated in our previous study. To examine the test-retest reliability of our brain network marker, we evaluated the intraclass correlation coefficients (ICCs) between the brain network marker-based classifier's output (probability of MDD) in two sets of HC data obtained at an interval of approximately 1 year. Results: Test-retest correlation between the two sets of the classifier's output (probability of MDD) from HCs exhibited moderate reliability with an ICC of 0.45 (95 % confidence interval, 0.13–0.68). The classifier distinguished patients with MDD and HCs with an accuracy of 69.7 % (sensitivity, 72.1 %; specificity, 66.7 %). Limitations: The data of patients with MDD in this study were cross-sectional, and the clinical significance of the marker, such as whether it is a state or trait marker of MDD and its association with treatment responsiveness, remains unclear. Conclusions: The results of this study reaffirmed the test-retest reliability and generalization performance of our brain network marker for the diagnosis of MDD.


Introduction
Psychiatric disorders including major depressive disorder (MDD) are thought to be the result of brain circuit dysfunction (Insel and Cuthbert, 2015). However, in current medical practice, MDD is diagnosed by evaluating various symptoms, such as depressed mood and loss of interest, by detailed interviews, and there is no objective biological diagnostic method.
In recent years, resting-state functional magnetic resonance imaging (rs-fMRI) and machine learning techniques have been applied in an effort to create a classifier that can be used for an objective auxiliary diagnosis of MDD that reflects brain function. A meta-analysis of studies attempting to classify MDD and healthy individuals using such techniques was also performed, and it was reported that the results of studies using rs-fMRI had an average sensitivity of 85 % and specificity of 83 % (Kambeitz et al., 2017). However, few studies have verified the practical generalization of classifiers to data from other facilities. A recent study using multimodal MRI data from two centers reported being able to discriminate patients from healthy controls (HCs) with an area under the curve of 0.916 and accuracy of 84.8 % in a validation cohort (Sun et al., 2022), indicating the importance of multicenter joint research projects.
In the Strategic Research Program for Brain Sciences, which was conducted from November 2013 to March 2018, we were engaged in multi-center data analysis and the standardization of imaging protocols to verify the reproducibility of results as well as technological developments to overcome the differences between MRI scanners (Yamashita et al., 2019). By utilizing this technology, we developed a diagnostic brain network marker that can distinguish patients with MDD and HCs based on resting-state functional connectivity (FC) patterns with a probability of 66 %, even with a completely independent large dataset (Yamashita et al., 2020). However, all previous studies on the generalization of MRI biomarkers for psychiatric disorders were retrograde. That is, independent validation cohorts were acquired before or in parallel to the acquisition of the discovery cohorts and construction of the MRI biomarkers. A more stringent generalization test should utilize independent validation cohorts that are acquired after the biomarker is constructed, as in prospective clinical trials. This was one of the two main objectives of the current research.
In rs-fMRI, in addition to inter-facility differences, variability between measurements in the same person (Noble et al., 2019) is a barrier to the development of reliable biomarkers. Although we reported that our brain network marker achieved high accuracy in approximately 50 scans of nine healthy participants (mean accuracy of all participants = 84.5, 1SD = 12.8, across participants) (Yamashita et al., 2020), this result was for data that were measured at a short interval from subjects who were particularly accustomed to scanning. In order to verify that the reproducibility of our marker is sufficient for clinical use, it is necessary to check its reproducibility at longer intervals with a sufficient number of participants.
In this study, we applied this brain network marker for MDD diagnosis (Yamashita et al., 2020) to data collected after the end of the Strategic Research Program for Brain Sciences project. We examined whether similar results are obtained when the same person is measured twice at an interval of approximately 1 year, and whether sufficient sensitivity can be reproduced using data from new patients.

Participants
A total of 47 patients with MDD (22 males and 25 females; average age, 44.62 ± 11.78 years; 43 right-handed and 4 left-handed) were recruited from Hiroshima University Hospital and local clinics between April 2018 and December 2018. The patients had been diagnosed by an expert clinician using the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition; the Japanese version of the Mini-International Neuropsychiatric Interview (M.I.N.I.) (Otsubo et al., 2005; Sheehan et al., 1998) was administered at the time of participation to confirm the diagnosis. Exclusion criteria for the patient group included current or past manic episodes, psychotic episodes, alcohol dependence and/or abuse, substance dependence and/or abuse, and antisocial personality disorder based on the M.I.N.I.
Thirty-nine HCs (21 males and 18 females; average age, 60.26 ± 12.83 years; 36 right-handed and 3 left-handed) were recruited between April 2018 and December 2018 from those who participated in our previous study (Yamashita et al., 2020) in 2017. In the present study, we examined test-retest reliability between these two measurements, which were taken at an interval of approximately 1 year. HCs were also screened with the M.I.N.I. to confirm they had no psychiatric disorder. The depressive symptoms of each participant on the day of the MRI scan were assessed using the Japanese version of the Beck Depression Inventory-II (BDI-II) (Beck et al., 1996; Kojima et al., 2002).
The study was conducted in compliance with the relevant guidelines and regulations and the latest version of the World Medical Association's Declaration of Helsinki. The current study protocol was approved by the Ethics Committee of Hiroshima University. Before the administration of any experimental procedure, written informed consent was obtained from all participants.

Preprocessing and calculation of the resting-state FC matrix
The preprocessing and calculation of the resting-state FC matrix have been described in detail elsewhere (Yamashita et al., 2020). We preprocessed the rs-fMRI data using FMRIPREP version 1.3.2 (Esteban et al., 2019). The first 10 s of the data were discarded to allow for T1 equilibration. Preprocessing steps included slice-timing correction, realignment, coregistration, distortion correction using a field map, segmentation of T1-weighted structural images, normalization to the Montreal Neurological Institute space, and spatial smoothing with an isotropic Gaussian kernel of 6 mm full width at half maximum. To analyze the data, we used ciftify toolbox version 2.0.X (Dickie et al., 2019). We used Glasser's 379 surface-based parcellations as regions of interest (ROIs) (Glasser et al., 2016); the blood oxygen level-dependent (BOLD) signal time courses were extracted from these 379 ROIs. A temporal bandpass filter was applied to the time series using a first-order Butterworth filter with a pass band between 0.01 and 0.08 Hz. Framewise displacement, which represents head motion between two consecutive volumes as a scalar quantity, i.e., the summation of absolute displacements in translation and rotation, was calculated for each functional session, and we removed volumes with a framewise displacement > 0.5 mm (Power et al., 2014). If the ratio of the excluded volumes after scrubbing exceeded 47 %, the participants were excluded from the analysis (Yamashita et al., 2020). As a result, the rs-fMRI data from four patients with MDD and six HCs were removed. Thus, we included 43 patients with MDD (20 males and 23 females; average age, 45.33 ± 11.78 years; 39 right-handed and 4 left-handed) and 33 HCs (17 males and 16 females; average age, 59.39 ± 12.38 years; 30 right-handed and 3 left-handed) for further analysis. FC was calculated as the temporal correlation of rs-fMRI BOLD signals across 379 ROIs for each participant.
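The scrubbing rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, rotations are assumed to be given in radians, and the 50 mm head radius used to convert rotations to millimetres is an assumed convention that is not stated in the text. Only the 0.5 mm threshold and the 47 % exclusion ratio come from the description above.

```python
FD_THRESHOLD_MM = 0.5       # volumes with FD above this are removed (from the text)
MAX_EXCLUDED_RATIO = 0.47   # participant is excluded above this ratio (from the text)
HEAD_RADIUS_MM = 50.0       # assumption: converts rotations (radians) to mm


def framewise_displacement(motion):
    """Return one FD value per volume.

    motion: list of (tx, ty, tz, rx, ry, rz) tuples, translations in mm and
    rotations in radians (assumed units). The first volume has FD = 0.
    """
    fd = [0.0]
    for prev, curr in zip(motion, motion[1:]):
        translation = sum(abs(c - p) for p, c in zip(prev[:3], curr[:3]))
        rotation = sum(abs(c - p) * HEAD_RADIUS_MM
                       for p, c in zip(prev[3:], curr[3:]))
        fd.append(translation + rotation)
    return fd


def scrub(fd):
    """Return (indices of kept volumes, excluded ratio, exclude participant?)."""
    kept = [i for i, value in enumerate(fd) if value <= FD_THRESHOLD_MM]
    excluded_ratio = 1.0 - len(kept) / len(fd)
    return kept, excluded_ratio, excluded_ratio > MAX_EXCLUDED_RATIO


# Toy motion parameters for four volumes (illustrative values only).
toy_motion = [
    (0.0, 0.0, 0.0, 0.00, 0.0, 0.0),
    (0.1, 0.0, 0.0, 0.00, 0.0, 0.0),
    (0.1, 0.6, 0.0, 0.00, 0.0, 0.0),
    (0.1, 0.6, 0.0, 0.02, 0.0, 0.0),
]
toy_fd = framewise_displacement(toy_motion)
toy_kept, toy_ratio, toy_excluded = scrub(toy_fd)
```

In this toy case two of the four volumes exceed the threshold, so the excluded ratio is above 47 % and the participant would be dropped.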
Fisher's z-transformed Pearson's correlation coefficients were calculated between the preprocessed BOLD signal time courses of each possible pair of ROIs and used to construct 379 × 379 symmetrical connectivity matrices. We used the 71,631 FC values ([379 × 378] / 2) of the lower triangular part of the connectivity matrix for further analysis. In our previous study (Yamashita et al., 2020), in order to construct the classifiers, logistic regression analyses using the least absolute shrinkage and selection operator method were used to select the optimal subset of FCs. In this study, the performance of the classifiers created in our previous study (Yamashita et al., 2020) was evaluated with newly acquired data. As 100 classifiers of MDD (10-fold cross-validation × 10 subsamples) were created in the original study (Yamashita et al., 2020), we applied all of these classifiers to the new data set. Next, we averaged the 100 outputs (diagnostic probability) for each participant. Importantly, data newly acquired at Hiroshima University were sent to Advanced Telecommunications Research Institute International without a diagnostic label, and the diagnostic probability calculated by Advanced Telecommunications Research Institute International was compared with the actual diagnostic label stored at Hiroshima University. The persons responsible for the diagnostic probability computation were therefore completely blinded to the diagnostic label. As mentioned earlier, the recruited HCs also participated in our previous study, allowing for longitudinal comparisons. We examined the test-retest reliability of the brain network marker by calculating intraclass correlation coefficients (ICCs) from the classifier's outputs in two sets of HC data from the same individuals acquired in 2017 and 2018.
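The two computations in this paragraph, building the FC vector and averaging the outputs of the classifier ensemble, can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation: the function names are hypothetical, and the representation of each sparse logistic classifier as a dictionary of edge weights plus a bias term is assumed for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length time courses."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def n_edges(n_rois):
    """Number of unique ROI pairs in the lower triangle."""
    return n_rois * (n_rois - 1) // 2


def fc_vector(timecourses):
    """Fisher z-transformed correlations for the lower triangle (i > j)."""
    n = len(timecourses)
    return [math.atanh(pearson(timecourses[i], timecourses[j]))
            for i in range(1, n) for j in range(i)]


def logistic(value):
    return 1.0 / (1.0 + math.exp(-value))


def ensemble_probability(fc, classifiers):
    """Average probability of MDD over all classifiers.

    classifiers: list of (weights, bias), where weights maps an edge index
    to its coefficient (hypothetical format; only selected edges appear).
    """
    probabilities = [
        logistic(bias + sum(w * fc[edge] for edge, w in weights.items()))
        for weights, bias in classifiers
    ]
    return sum(probabilities) / len(probabilities)


# Toy example with three ROIs and two classifiers (illustrative values only).
toy_timecourses = [[1, 2, 3, 4], [1, 3, 2, 4], [4, 1, 3, 2]]
toy_fc = fc_vector(toy_timecourses)
toy_classifiers = [({0: 1.0}, -1.0), ({}, 0.0)]
toy_probability = ensemble_probability(toy_fc, toy_classifiers)
```

With 379 ROIs, `n_edges(379)` gives the 71,631 values mentioned above. Averaging probabilities rather than thresholded votes keeps a continuous output per participant, which is what the reliability and correlation analyses operate on.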
We then assessed the correlation between the averaged diagnostic probability value (probability of MDD) and the BDI-II score, and tested the sensitivity and specificity of the network marker by using the data newly acquired in 2018 from HCs and new patients with MDD. We considered a participant to be a patient with MDD if the probability of MDD was >50 %.
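The decision rule and the resulting performance measures can be written out as a short Python sketch. The function name and the toy values are hypothetical; only the 50 % cut-off is taken from the text.

```python
def diagnostic_metrics(probabilities, is_patient, threshold=0.5):
    """Sensitivity, specificity, and accuracy for a probability cut-off.

    A participant is classified as MDD when the averaged probability is
    strictly greater than the threshold (50 % in the text).
    """
    tp = fn = tn = fp = 0
    for probability, truth in zip(probabilities, is_patient):
        predicted_mdd = probability > threshold
        if truth and predicted_mdd:
            tp += 1
        elif truth:
            fn += 1
        elif predicted_mdd:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Toy data: three patients followed by three controls (illustrative only).
toy_probabilities = [0.9, 0.6, 0.4, 0.2, 0.7, 0.5]
toy_labels = [True, True, True, False, False, False]
toy_metrics = diagnostic_metrics(toy_probabilities, toy_labels)
```

Note that a probability of exactly 0.5 is treated as a negative result under the strict inequality.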

Test-retest reliability at a 1-year interval
The test-retest correlation between the two sets of classifier outputs for HCs is shown in Fig. 1. The classifier's output exhibited moderate reliability with an ICC of 0.45 (95 % confidence interval = 0.13–0.68; P = 0.004).
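For readers unfamiliar with the ICC, the following Python sketch shows one common way to compute it for two sessions per participant. The text does not state which ICC form was used; the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), is assumed here purely for illustration, and the function name and toy values are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) from a list of per-participant rows, one value per session.

    Assumed form: two-way random effects, absolute agreement, single
    measurement. The study may have used a different ICC definition.
    """
    n = len(scores)          # participants
    k = len(scores[0])       # sessions (2 in a test-retest design)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy classifier outputs at two time points (illustrative values only).
identical_sessions = [[0.2, 0.2], [0.5, 0.5], [0.8, 0.8]]
shifted_sessions = [[0.2, 0.3], [0.5, 0.6], [0.8, 0.9]]
icc_identical = icc_2_1(identical_sessions)
icc_shifted = icc_2_1(shifted_sessions)
```

Under this form, a constant shift between sessions lowers the ICC even when the rank order of participants is preserved, because absolute agreement is required.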

Generalization performance for new MDD patients
We found a significant correlation between the classifier's output (probability of MDD) and depressive symptoms (BDI-II score) (r = 0.26, P = 0.024), and the classifier distinguished patients with MDD and HCs with an accuracy of 69.7 % (sensitivity, 72.1 %; specificity, 66.7 %) (Fig. 2). As this study included people aged over 60 years who may already have functional and structural changes, we examined the effect of age on the classifier's output. We found no significant relationship between the classifier's output and age (r = 0.093, P = 0.423) (Fig. S2), and the correlation between the classifier's output and BDI-II score was stronger (r = 0.31, P = 0.006) in a partial correlation analysis with age as a covariate.
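The partial correlation with a single covariate can be expressed through the three pairwise Pearson correlations. The Python sketch below shows this standard first-order formula; the function names and toy values are hypothetical, and the sketch does not reproduce the significance test reported above.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_from_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_correlation(x, y, z):
    """Partial correlation of x and y with z as the covariate.

    In the setting above, x would be the classifier output, y the BDI-II
    score, and z the participant's age.
    """
    return partial_from_r(pearson(x, y), pearson(x, z), pearson(y, z))


# Toy values (illustrative only, not study data).
toy_output = [1, 2, 3, 4]
toy_bdi = [1, 3, 2, 4]
toy_age = [4, 1, 3, 2]
toy_partial = partial_correlation(toy_output, toy_bdi, toy_age)
```

When the covariate is uncorrelated with both variables, the partial correlation equals the ordinary correlation; it departs from it only to the extent that the covariate is related to one or both variables.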

Discussion
In this study, we confirmed that similar results can be obtained by measuring the same person twice with the brain network marker for MDD developed in our previous study (Yamashita et al., 2020), and that sufficient sensitivity can be reproduced for new patients with MDD.
Although we have reported the generalizability of the network marker to completely independent data obtained at different facilities in our previous study (Yamashita et al., 2020), repeated confirmation of the reproducibility of network markers is important when considering their clinical significance. An important advance in the present study was that whereas our previous study divided the existing data into discovery and validation cohorts to evaluate the performance of the network marker, we evaluated its performance on data acquired after it was developed. In addition, regarding the reliability of the network marker, we had verified its reliability using only a small number of healthy subjects at a short interval, but in the present study, we increased the number of subjects and verified its reliability at a relatively long interval. By utilizing a longitudinal rs-fMRI dataset of HCs obtained at an interval of approximately 1 year, we determined that the classifier's output (probability of MDD) was acceptably stable. Compared to a recent review of the test-retest reliability of FC indicating that individual edges show a weak ICC of 0.29 on average (Noble et al., 2019), our brain network marker exhibited moderate reliability with an ICC of 0.45. We assessed the magnitude of the variation of individual FCs in repeated measurements in our rs-fMRI by using the data from our previous study (Yamashita et al., 2019). As a result, we found that the variation between measurements was approximately 2.5 times the subject effect (slightly reduced by harmonization), and even in analysis limited to the same machine used in this study, it was slightly less than twice the subject effect (Fig. S1). 
Since the variability of individual FCs was very large in our own data, the result for test-retest reliability suggests that our brain network marker consists of a small number of connections that are robust against not only facility-to-facility differences but also inter-measurement variations.
By applying our brain network marker for MDD diagnosis to newly acquired data, the classifier distinguished patients with MDD and HCs with an accuracy of 69.7 % (sensitivity, 72.1 %; specificity, 66.7 %). This result was comparable to or even better than the 66 % accuracy for the validation dataset and 66 % accuracy for the discovery dataset in our previous study (Yamashita et al., 2020). Generalization performance was confirmed again with the same or even better accuracy, indicating that our brain network marker for MDD diagnosis is highly stable. In the future, we will clarify the clinical significance of the brain network marker by conducting prospective studies and examining the relationship between longitudinal changes and treatment responsiveness.

Limitations
The data of patients with MDD in this study were cross-sectional, and the clinical significance of the marker, such as whether it is a state or trait marker of MDD and its association with treatment responsiveness, remains unclear. Repeated measurements of rs-fMRI were performed only on HCs in this study. Given that damage to brain structure or function may be more rapid and severe in patients with MDD compared to HCs, the test-retest reliability of the marker in patients with MDD needs further investigation.

Conclusions
The results of this study reaffirmed the test-retest reliability in HCs and generalization performance of our brain network marker for the diagnosis of MDD (Yamashita et al., 2020) using newly acquired and longitudinal data.

CRediT authorship contribution statement
GO, MK, and YO designed the study. GO, EI, SY, TK, YM, MT, AY, AM, OY, NY, TT, HJ, and YO collected the data. GO, TY, and AY performed analysis under the supervision of MK and YO. GO drafted the manuscript. GO, MK, and YO discussed the results and conclusions for editing the manuscript. All authors contributed to the article and approved the submitted version.

Conflict of interest
MK is an inventor of patents owned by the Advanced Telecommunications Research Institute International related to the present work (PCT/JP2014/061544 [WO2014178323] and JP2015-228970/6195329). AY and MK are inventors of a patent application submitted by the Advanced Telecommunications Research Institute International related to the present work (JP2018-192842).