Neurobiologically Based Stratification of Recent-Onset Depression and Psychosis: Identification of Two Distinct Transdiagnostic Phenotypes

BACKGROUND: Identifying neurobiologically based transdiagnostic categories of depression and psychosis may elucidate heterogeneity and provide better candidates for predictive modeling. We aimed to identify clusters across patients with recent-onset depression (ROD) and recent-onset psychosis (ROP) based on structural neuroimaging data. We hypothesized that these transdiagnostic clusters would identify patients with poor outcome and allow more accurate prediction of symptomatic remission than traditional diagnostic structures.
METHODS: HYDRA (Heterogeneity through Discriminant Analysis) was trained on whole-brain volumetric measures from 577 participants from the discovery sample of the multisite PRONIA study to identify neurobiologically driven clusters, which were then externally validated in the PRONIA replication sample (n = 404) and three datasets of chronic samples (Centre for Biomedical Research Excellence, n = 146; Mind Clinical Imaging Consortium, n = 202; Munich, n = 470).
RESULTS: The optimal clustering solution was two transdiagnostic clusters (cluster 1: n = 153, 67 ROP, 86 ROD; cluster 2: n = 149, 88 ROP, 61 ROD; adjusted Rand index = 0.618). The two clusters contained both patients with ROP and patients with ROD. One cluster had widespread gray matter volume deficits and more positive, negative, and functional deficits (impaired cluster), and one cluster revealed a more preserved neuroanatomical signature and more core depressive symptomatology (preserved cluster). The clustering solution was internally and externally validated and assessed for clinical utility in predicting 9-month symptomatic remission, outperforming traditional diagnostic structures.
CONCLUSIONS: We identified two transdiagnostic neuroanatomically informed clusters that are clinically and biologically distinct, challenging current diagnostic boundaries in recent-onset mental health disorders. These results may aid understanding of the etiology of poor outcome patients transdiagnostically and improve development of stratified treatments.

same biological processes may be provided with a different diagnosis, a practice that may have detrimental effects on outcome prediction development (22)(23)(24). Recent research has highlighted this mismatch between diagnostic labels and the clinical and neuroanatomical picture in depression and psychosis (25), and heterogeneity may be particularly pronounced in early stages of developing mental health disorders (26)(27)(28)(29)(30). The lack of biological validity of diagnostic groups is thought to be one of the major reasons for poor biomedical translation in psychiatry (31)(32)(33).
Only 20% of people with psychosis and 25% of people with depression achieve full remission and response to pharmacological treatment, with the remainder achieving partial response or response without remission (34)(35)(36)(37). Biologically driven illness models, able to relate to those at highest risk of poor outcome and chronicity, may allow new and targeted treatments to be delivered early (22). However, recognizing patients on a path to chronic disability, at an early stage, is still difficult in both psychosis and depression (38,39). Previous transdiagnostic research has stressed the need for the use of machine learning (40) and has identified specific patterns of neurocircuit disruption across major psychiatric disorders in emotional reactivity and regulation (41). Reininghaus et al., building on previous calls for a dimensional approach to psychosis (42), have shown the use of multidimensional item response modeling to predict psychosis biotypes transcending traditional diagnostic boundaries, with suggestion of an underlying transdiagnostic dimension across psychotic diagnoses (43)(44)(45). Recent semi-supervised machine learning studies using neuroanatomical data have identified the presence of an impaired neuroanatomical cluster that is characterized by overall poorer outcomes and functioning in schizophrenia (46) and in youth with internalizing symptoms (47). However, there has not yet been a transdiagnostic investigation of neuroanatomy specifically in depression and psychosis.
Herein, we aimed to identify replicable neuroanatomical clusters across patients with recent-onset depression (ROD) and recent-onset psychosis (ROP). We hypothesized that neuroanatomically derived clusters would be transdiagnostic and related to distinct phenotypes drawn from symptom, neurocognitive, and inflammatory data across both disorders. We further aimed to explore the predictive validity of the neuroanatomically identified clusters and to externally validate them in chronic depression and chronic schizophrenia in an accelerated longitudinal design. We also developed supervised machine learning models to predict symptom remission within the ROP and ROD groups and within our neuroanatomically based transdiagnostic clusters. We hypothesized that models developed in neuroanatomically based transdiagnostic clusters would show greater predictive accuracy than those developed in traditional diagnostic groups.

Study Design
This study uses data from the PRONIA study, an EU-FP7-funded seven-center study, and three external validation datasets. Details of the PRONIA study sites, recruitment protocol, and quality control procedures can be found in Supplement sections 1.1, 1.2 and 1.3 (Tables S1-S3) and a prior publication (48). Data used in this analysis included structural magnetic resonance imaging (MRI), demographic, clinical, neurocognitive, and blood-based biomarker measures. See the Supplement for full details.

Inclusion and Exclusion Criteria
In brief, participants with ROP had to meet the following criteria: 1) DSM-IV-TR (49) affective or nonaffective psychotic episode (lifetime), 2) criteria for DSM-IV-TR affective or nonaffective psychotic episode fulfilled within the past 3 months, and 3) onset of psychosis within the past 24 months. Patients with ROD had to meet the following criteria: 1) DSM-IV-TR major depressive episode (lifetime), 2) major depressive disorder criteria fulfilled within the past 3 months, and 3) duration of first depressive episode no longer than 24 months. General inclusion criteria can be found in Supplement section 1.5.

MRI Data Acquisition, Quality Control, and Preprocessing
Participants underwent a multimodal MRI protocol. The minimal harmonization protocol, with which the MR sequences across the different scanners had to comply, and the imaging preprocessing are described in Supplement sections 1.3 and 1.4.

Semi-supervised Machine Learning Analysis
HYDRA (Heterogeneity through Discriminant Analysis) (50) is a semi-supervised machine learning clustering algorithm that dissects disease heterogeneity by partitioning patients based on patterns or transformations between the subpopulations (i.e., clusters) of the patient group and the reference group (i.e., healthy control [HC] subjects). It does so through a convex polytope formed by combining multiple linear max-margin classifiers (i.e., support-vector machines [SVMs]), and it can regress out nuisance covariates such as age and sex. We used the Python version of HYDRA (50) to simultaneously separate patients (ROP + ROD) from HC subjects and partition patients into clusters based on disease-related heterogeneity using structural MRI.
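The polytope idea can be illustrated with a minimal sketch. It is not the HYDRA implementation: it assumes the hyperplanes have already been fitted, and it assumes a sign convention (not stated in the text) in which reference subjects score at or below zero on every face and a patient is assigned to the face with the largest positive score. The names `face_scores`, `assign`, `WEIGHTS`, and `BIASES` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def face_scores(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> List[float]:
    """Signed score w_k . x + b_k for each face (hyperplane) of the polytope."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]


def assign(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> Tuple[str, Optional[int]]:
    """Return ("reference", None) if x lies inside the polytope,
    otherwise ("patient", k) where k is the face with the largest score."""
    scores = face_scores(x, weights, biases)
    best = max(range(len(scores)), key=lambda k: scores[k])
    if scores[best] <= 0.0:
        # Inside every half-space: indistinguishable from the reference group.
        return ("reference", None)
    return ("patient", best)


# Hypothetical two-face polytope over two toy features (assumed values).
WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]
BIASES = [-1.0, -1.0]
```

With these toy values, a point at the origin falls inside the polytope, while points that cross the first or second face are assigned to cluster index 0 or 1, respectively.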

ComBat Harmonization
To mitigate site effects, prior to applying HYDRA, the R version of the ComBat harmonization technique was used (https://github.com/Jfortin1/ComBatHarmonization). ComBat uses an empirical Bayesian framework that removes variance attributed to scanner differences while retaining disease effects. To further ensure that disease variance would be retained distinct from scanner variance, ComBat was trained on HC subjects and then derived estimates were applied to the patients.
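The "estimate on HC subjects, apply to patients" step can be sketched as follows. This is a deliberately simplified per-site location and scale adjustment for a single feature, not ComBat itself: it omits the empirical Bayes shrinkage and the covariate modeling that ComBat performs. The function names and the toy site labels are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def fit_site_params(
    hc_values_by_site: Dict[str, List[float]],
) -> Tuple[float, float, Dict[str, Tuple[float, float]]]:
    """Estimate pooled and per-site mean/SD of one feature from HC subjects only."""
    pooled = [v for values in hc_values_by_site.values() for v in values]
    grand_mean, grand_sd = mean(pooled), pstdev(pooled)
    site_params = {}
    for site, values in hc_values_by_site.items():
        site_sd = pstdev(values)
        if site_sd == 0.0:
            raise ValueError(f"zero variance in HC subjects at site {site!r}")
        site_params[site] = (mean(values), site_sd)
    return grand_mean, grand_sd, site_params


def harmonize(
    value: float,
    site: str,
    grand_mean: float,
    grand_sd: float,
    site_params: Dict[str, Tuple[float, float]],
) -> float:
    """Apply the HC-derived site correction to a (patient) measurement."""
    site_mean, site_sd = site_params[site]
    # Standardize against the site's HC distribution, then map to the pooled scale.
    return (value - site_mean) / site_sd * grand_sd + grand_mean


# Toy HC volumes from two hypothetical scanners with an additive offset.
HC = {"site_A": [10.0, 12.0, 14.0], "site_B": [20.0, 22.0, 24.0]}
GRAND_MEAN, GRAND_SD, SITE_PARAMS = fit_site_params(HC)
```

Because the site parameters come from HC subjects alone, a patient's deviation from the local HC distribution is preserved while the scanner offset is removed.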

Model Training
We used whole-volume (gray matter volume [GMV] and cerebrospinal fluid) brain measures derived from 280 regions of the neuromorphometrics atlas parcellation (CAT12) (four regions excluded due to zero variance) from 577 participants with ROP and ROD and HC subjects (discovery sample of the PRONIA study). Patients with ROP and patients with ROD were grouped together into one patient group. HYDRA was trained using a repeated hold-out cross-validation strategy (i.e., 1000 repetitions with 80% of the data for training in each repetition). Age, sex, and total intracranial volume were controlled as covariates. HYDRA was run for 2 to 8 clustering solutions, and the adjusted Rand index was used to measure cluster stability. The most stable cluster solution was selected for further analysis. The statistical significance of clusters was assessed in three ways, including testing our clustering solution against a Gaussian distribution, which assumes a dimensional severity explanation of our data. Details can be found in Supplement section 1.11.
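As an illustration of the stability measure, the sketch below computes the adjusted Rand index between two labelings and averages it over all pairs of repetitions. It assumes, for simplicity, that each repetition's labels have already been aligned to the same set of participants; how the original pipeline handled the 80% subsamples is not described here. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb
from typing import Hashable, List, Sequence


def adjusted_rand_index(
    labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]
) -> float:
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same participants")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two participants")
    # Pair counts within contingency cells, rows, and columns.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g., both partitions put everyone in one cluster).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def mean_pairwise_ari(runs: List[Sequence[Hashable]]) -> float:
    """Average ARI over all pairs of repetitions; higher means more stable."""
    pairs = list(combinations(runs, 2))
    return sum(adjusted_rand_index(a, b) for a, b in pairs) / len(pairs)
```

The index is invariant to how cluster labels are named, so a repetition that swaps "cluster 1" and "cluster 2" still counts as perfect agreement.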

Phenotype Characterization
Identified clusters were compared with each other and with HC subjects in terms of neurocognitive performance, blood-based biomarkers (IL-1 receptor antagonist, S100B, IL-6, tumor necrosis factor α, CRP, transforming growth factor β, and BDNF [brain-derived neurotrophic factor]; Supplement section 1.6), and symptoms (Positive and Negative Syndrome Scale, Beck Depression Inventory, Scale for the Assessment of Negative Symptoms) using univariate statistics corrected for multiple comparisons with the false discovery rate. Neuroanatomical differences were examined using voxel-based morphometry (two-sample t test, SPM12) to identify the brain regions in which the neuroanatomically derived clusters differed. See Supplement section 1.14 for further granular investigation of clinical and inflammatory marker differences between clusters.
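The false discovery rate correction can be sketched as below. The text does not name the specific procedure, so this example assumes the Benjamini-Hochberg step-up method; the function name and the default `alpha` of 0.05 are likewise assumptions.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each test, whether it is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the tests sorted from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    # Every test ranked at or below the cutoff is rejected.
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected
```

Note that the step-up rule can reject a test whose own p value exceeds its rank threshold, provided a larger-ranked test meets its threshold.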

Independent and External Validation
To examine the generalizability of neuroanatomically based clusters, we developed an SVM model using the 280 features on which our HYDRA model was trained (46) to classify patients from the discovery sample into the identified clusters. This SVM was applied to the PRONIA independent replication sample of patients with ROP and ROD (n = 404), collected over a different time period from the discovery sample (May 2016 to February 2019). ComBat was trained on the replication HC group and applied to the replication transdiagnostic patient group to mitigate site effects in the replication dataset. The SVM validation model that was trained on the discovery data was then applied to the replication data.

Predictive Utility
We trained SVM models using symptom and blood-based biomarker data to predict symptom recovery (as defined by a Global Assessment of Functioning-Symptom [GAF-S] score of ≥61) (51) at 9 months. To assess the predictive utility within the neuroanatomically based clusters and within ROP and ROD groups, we trained four different SVM models (one for each different diagnosis of ROP, ROD, cluster 1, and cluster 2) and compared their predictive accuracy in terms of area under the receiver operating characteristic curve, balanced accuracy (BAC), sensitivity, and specificity. Details can be found in Supplement section 1.8. A detailed depiction of the analysis pipeline can be seen in Figure 1.
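The outcome definition and the reported performance measures can be made concrete with a short sketch. The GAF-S threshold of 61 comes from the text; the function names and the coding of remission as 1 and non-remission as 0 are assumptions for illustration.

```python
from typing import Dict, Sequence

GAF_S_REMISSION_THRESHOLD = 61  # recovery defined as GAF-S >= 61 in the text


def is_remitted(gaf_s_score: float) -> int:
    """Binary 9-month outcome label: 1 = remitted, 0 = not remitted."""
    return int(gaf_s_score >= GAF_S_REMISSION_THRESHOLD)


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Sensitivity, specificity, and balanced accuracy for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # BAC averages the two rates, so it is robust to class imbalance.
        "bac": (sensitivity + specificity) / 2,
    }
```

Balanced accuracy is useful here because remitted and non-remitted patients are unlikely to be equally frequent, and raw accuracy would reward a model that always predicts the majority class.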

Demographic Information
A total of 155 participants with ROP, 147 patients with ROD, and 275 HC subjects from the discovery sample were included in the HYDRA semi-supervised machine learning analysis. The mean age of the ROP group was 25.3 (SD 5.5) years, the mean age of the ROD group was 25.9 (SD 6.2) years, and the mean age of the HC group was 25.5 (SD 6.4) years. The ROP group consisted of 96 male and 59 female patients, the ROD group had 66 male and 81 female patients, and the HC group had 107 male and 168 female participants. A summary of sociodemographic and clinical information is provided in Table 1. Sociodemographic and clinical information for the PRONIA replication and external validation samples (COBRE, MCIC, and MUC) is provided in Supplement section 1.9.

HYDRA Semi-supervised Machine Learning Analysis
The optimal clustering solution was two transdiagnostic clusters (cluster 1: n = 153, 67 ROP, 86 ROD; cluster 2: n = 149, 88 ROP, 61 ROD; adjusted Rand index = 0.618). Patients in cluster 1 had a mean age of 26.2 (6.2) years, and those in cluster 2 had a mean age of 24.9 (5.4) years. There were 78 male and 75 female patients in cluster 1 and 84 male and 65 female patients in cluster 2. The two clusters did not differ in terms of age (p = .071), sex distribution (p = .358), total intracranial volume (p = .144), or medication exposure, but they did differ in terms of original diagnosis distribution (p = .008). A sociodemographic and clinical description of the two clusters can be found in Table 1.

Cluster Statistical Significance
The clusters were statistically significant 1) in terms of whether they would be different than if there was no disease-related variability present (p = .010), 2) in terms of whether the disease structures were different (p < .001), and 3) in terms of whether the data could be better explained by a single Gaussian distribution (p < .001), suggesting that our data could not be explained in terms of a single Gaussian (continuous) distribution assuming a dimensional severity model. Details of the statistical significance tests can be found in Supplement section 1.11.

Voxel-Based Morphometry Analysis of Neuroanatomically Based Clusters
We conducted a voxel-based morphometry analysis for the purpose of demonstrating the brain regions in which the two clusters differed. Here, cluster 2 exhibited widespread GMV loss compared with cluster 1 and HC subjects in areas including the superior temporal gyrus, cingulate gyrus, and thalamus, among others. Cluster 1 revealed increased GMV compared with HC subjects in cerebellar areas. These results can be seen in Figure 2 and in the Supplement (Tables S7 and S8, Figure S2).

Independent and External Validation
In independent validation, the two-cluster model showed generalizability in the PRONIA replication sample, with patients classified into the two clusters in the replication sample showing similar clinical and neuroanatomical patterns to the ones from the discovery sample (Supplement section 1.18). When externally applied to the MCIC and COBRE (chronic schizophrenia) and MUC (chronic depression) datasets, patients from datasets with a higher mean age and/or longer duration of illness were more often placed in cluster 2, as indicated by negative decision scores. The effects of duration of illness and age were statistically significant (F(2, 278) = 27.88, p < .001). Post hoc analyses using the Tukey honestly significant difference criterion indicated that the mean decision score was significantly lower in the MUC group than in the MCIC group (p < .001). The difference in mean decision score between the MCIC and COBRE groups showed a trend toward statistical significance (p = .078). The results can be seen in Table 2.

Prognostic Validation
Within the neuroanatomically based clusters, stacking a blood-based biomarker SVM model (IL-1 receptor antagonist, CRP, tumor necrosis factor α, BDNF, and transforming growth factor β) onto a symptom data SVM model (baseline Positive and Negative Syndrome Scale, Beck Depression Inventory, and GAF-S individual item scores), i.e., a combined model, increased accuracy for predicting symptomatic recovery at 9 months (GAF-S), with BAC of 71.2% for cluster 1 and 57.0% for cluster 2. This outperformed a similar stacked blood-based biomarker and symptom data SVM model predicting GAF-S in ROP and ROD groups (

DISCUSSION
In this study, we identified two transdiagnostic clusters across psychosis and depression, using semi-supervised machine learning and neuroanatomical data in a large sample of patients with ROD and ROP. Both clusters contained similar numbers of patients with depression and psychosis; however, they were clinically distinct, with one cluster being characterized by more general and negative symptom loading, functional impairment, and widespread GMV loss (hereafter called the impaired cluster), and one cluster characterized by fewer symptoms, less GMV loss, and less functional impairment but more core depressive symptomatology (hereafter called the preserved cluster). The neuroanatomically based clusters were generalizable to a replication sample and further externally validated in three datasets of patients with chronic illness. Patients with chronic illness, with a higher duration of illness and mean age, were more likely to be classified into the impaired cluster. We were further able to demonstrate that SVM learning models using clinical and blood-based biomarker data to predict symptom remission at 9 months showed a higher accuracy in the neuroanatomically derived clusters compared with traditional diagnostic categories.
The precise etiology of mental illnesses, including psychosis and depression, remains elusive despite decades of research, with a stagnation in the advancement of new pharmacological and psychotherapeutic treatments (52)(53)(54). Our results suggest that current diagnostic categories, particularly in early stages of illness, may mask transdiagnostic phenotypes that include an identifiable group with greater impairment and poorer chance of remission across disorders. In our impaired cluster, patients had reduced GMV in areas that have been identified as central to the disease processes of both schizophrenia and depression, such as the superior temporal gyrus, anterior cingulate, insula, and thalamus (55)(56)(57)(58). In our analysis, a significant number of patients with depression, who may be perceived as having a less severe illness and better prognostic outlook than patients with psychosis, were ascribed to the impaired phenotype, suggesting that they are on a path toward poor outcome. Conversely, a significant number of patients with psychosis were not assigned to the impaired group and therefore potentially have an identifiable early signature of good prognosis, which was further indicated by the fact that predicting 9-month symptomatic outcomes in that group was more accurate than in traditional diagnostic groupings.
Categorical diagnoses have survived because some individuals (specifically those with chronic established illness) do indeed fit within these nosological entities, and more valid solutions remain elusive to date (59). However, within the scope of affective and nonaffective major psychiatric diseases, the Kraepelinian dichotomy of dementia praecox and manic-depressive psychosis has long been challenged. Studies have shown that our understanding of the clinical and neurobiological distinction between disorders may be particularly challenging during early phases of illness (5,25,60,61). The concept of affective disorders as a differential diagnosis for psychosis, particularly in the early years of illness, is waning, with recent research suggesting a central and causal role for depression in the pathogenesis of psychosis and mutual biological underpinnings. This further challenges the distinction between affective and nonaffective pathways to psychosis (25)(61)(62)(63). Fischer and Carpenter (64) suggest that reducing heterogeneity in syndromes is essential to decisively address the Kraepelinian dichotomy. Despite the fact that dementia praecox does not directly map to nonaffective psychosis, the Verrücktheit (chronic nonaffective psychoses) made distinct in Kraepelin's first edition (1883) led to the (mis)understanding that schizophrenia was nonaffective (65). The impaired cluster, which contains both patients with schizophrenia and depression, has more cognitive symptoms and a brain signature that is identified in our chronic replication sample. Deficit schizophrenia is a concept introduced over 30 years ago to reduce clinical heterogeneity and suggests the existence of a homogeneous schizophrenia subtype with persistent trait negative symptoms (66).
The impaired cluster we identified could be characterized as a transdiagnostic deficit cluster across depression and psychosis due to its higher load of negative symptoms, a previously proposed marker of the deficit syndrome across diagnoses (67). Furthermore, our findings of greater GMV reduction in the impaired cluster corroborate previous research that identified temporal GMV reduction as a marker of very poor outcome (68). Our neuroanatomically derived clusters contained both patients with depression and psychosis in recent onset, replicated in our independent PRONIA sample. This suggests lack of diagnostic hierarchy across depression and psychosis, and that some syndromes may hold equal weight in association with poor outcome regardless of relationship to diagnosis. These results add to the challenge of the separation between affective and nonaffective psychoses, with affective and psychotic diagnostic groups featuring in both clusters, corroborating previous studies that found that high affective symptom scores were equally common in patients with affective and nonaffective psychosis and question the clinical validity of such a distinction (69).
Our results support the common biological susceptibility model of psychiatric disorders and suggest that the biological underpinnings of disease course, at least in depression and psychosis, may be related to transdiagnostic mechanisms, which are potentially hidden by current nosological systems. A similar transdiagnostic model has previously been reported in genomic research, which has shown a certain degree of overlap in biological susceptibility to mental illness across mood and psychotic disorders; evidence of a transdiagnostic biological cause of major psychiatric disorders is evident with the identification of genetic variants that confer a transdiagnostic risk for bipolar disorder, major depressive disorder, and schizophrenia related to the major histocompatibility complex featuring in both schizophrenia and depression genome-wide association studies (70,71). Our finding that elevated proinflammatory cytokines add to predictive accuracy of poor outcome in an impaired phenotype suggests that this genomic immune influence may be ongoing in those on a path to poor outcomes. Schizophrenia GMV deficits in the hippocampus, temporal gyrus, and cerebellum are associated with genetic factors such as SATB2, GABBR2, and CACNA1C (72). A common genetic basis between risk for altered brain structure and neuropsychiatric disorders has been conferred by findings of risk variant enrichment associations with brain structural phenotypes across diagnoses (73). Our results suggest a transdiagnostic cluster of GMV impairment, suggestive of common biological underpinnings for poor outcome across depression and psychosis, with potentially more valid structures than traditional diagnostic categories for use in predicting symptomatic remission.
Heterogeneity and comorbidity may be especially pronounced in the early stages of these disorders; this creates diagnostic uncertainty and difficulties in predicting disease and treatment course (26)(27)(28)(29)(30). Our results suggest that a bottom-up approach based on neurobiological data may be more reliable in the elucidation of patients with potential for greater impairment and offer a potential future solution for the diagnostic challenges of mental illness. Our external validation findings show that the impaired cluster potentially identifies patients who are on a path to chronic illness from early stages of illness, given that the majority of patients in the external validation sample with chronic illness fell into the same cluster as our impaired group. This has potentially significant clinical implications in terms of personalized treatment and focused recovery interventions. The fact that patients from chronic samples with a higher mean age and illness duration were more likely to be assigned to the impaired cluster could be an indication that our neuroanatomically based clusters identify an accelerated transdiagnostic brain aging effect in recent-onset samples, corroborating previous brain age studies (74,75).

Strengths and Limitations
This analysis exhibits several strengths including a large dataset with rich clinical, neurocognitive, biomarker, and imaging data from both ROP and ROD groups, independent and external validation, and significance testing of our clustering solutions (e.g., by testing whether the data could be better explained by a Gaussian distribution, which assumes a dimensional severity explanation of the data). Furthermore, the technique we used for the identification of subgroups (HYDRA) offers a solution to issues that are usually associated with clustering based on unsupervised machine learning models that are built on biological data such as the detection of groups that may reflect underlying nuisance variance such as age, sex, body type, and common ancestry (genetics) (76). Nevertheless, our results should be interpreted with caution because there are certain limitations. Due to the nature of our recent-onset sample and using an HC sample as a reference group in the semi-supervised model, there is a risk that the differences between the groups are not as marked as would be seen in more chronic cases. We addressed that limitation by performing permutation tests to robustly assess the significance of the identified clusters. Furthermore, our models were developed in recent-onset patients with a significantly lower mean age than that of our external validation samples. We addressed that limitation by following a robust pipeline that removed age and site effects while retaining disease variance in the data. Although we developed an accelerated longitudinal design with the use of recent-onset and chronic samples and had a 9-month follow-up for prediction of symptom remission, definitive findings would need large longitudinal datasets with repeated measures, such as functional outcome, over many years. Finally, we used only neuroanatomical features to parse neurobiological variance among complex clinical presentations. 
Psychiatric illness is not a single-variable problem, and we have addressed that by examining whether the brain-based clustering solution is reflected at the phenotypic, cognitive, and inflammatory levels. Future studies should consider using multiple biological measures and larger population-level data to encompass the pleiomorphic nature of clinical entities such as depression and psychosis.


Conclusions
Using semi-supervised machine learning, we were able to identify two neuroanatomically based transdiagnostic clusters. One cluster was characterized by an impaired functional and neurocognitive profile and greater symptomatic loading and GMV loss, while the other cluster was characterized by a more preserved neuroanatomical and reduced symptom signature. Our distinct impaired cluster included patients with depression and psychosis and may provide insight into transdiagnostic etiopathogenetic pathways of chronicity and poor outcome. The identified clusters have been derived in recent-onset samples using structural MRI and could eventually lead to the development of MRI-based prediction and decision-making tools. In external validation, older patients with longer duration of schizophrenia and depression were assigned to the impaired cluster, suggesting a potential identifiable transdiagnostic signature of chronicity and path to poor outcome at the early disease stages. Using clinical and blood-based biomarker data, we were able to predict symptomatic and functional remission more accurately in the derived clusters compared with traditional diagnostic groups. While such challenge to current diagnostic structures will need significant further replication and longer follow-up, identifying a transdiagnostic signature of poor prognosis has the potential to aid new and targeted treatment strategies across early stages of mental disorder.

Figure 1. Analysis pipeline overview. This figure provides an overview of the analysis pipeline undertaken in this study. Patients with recent-onset psychosis (ROP) and recent-onset depression (ROD) were combined into one transdiagnostic group. ComBat was trained on healthy control (HC) subjects and applied to the patients to remove site-related variance from the data. HC and patient data were then entered into the HYDRA algorithm with age, sex, and total intracranial volume (TIV) added as covariates. HYDRA was trained using a repeated hold out cross-validation (CV) strategy (i.e., 1000 repetitions with 80% of the data for training in each repetition). The clusters were validated in the PRONIA replication sample and the three external datasets. Identified clusters were assessed for statistical significance and were then analyzed for clinical and voxel-based morphometry (VBM) differences. Furthermore, the predictive utility of the clusters was assessed. BAC, balanced accuracy; BDI, Beck Depression Inventory; BMI, body mass index; BDNF, brain-derived neurotrophic factor; COBRE, Centre for Biomedical Research Excellence; CRP, C-reactive protein; GAF-S, Global Assessment of Functioning-Symptom; IL-1ra, interleukin 1 receptor antagonist; MCIC, Mind Clinical Imaging Consortium; MUC, Munich; PANSS, Positive and Negative Syndrome Scale; PAT, patients; SVM, support-vector machine; TGFβ, transforming growth factor β; TNFα, tumor necrosis factor α; TR, transdiagnostic.

Figure 2. Impaired cluster (cluster 2) gray matter volume reductions compared with the preserved cluster (cluster 1). Gray