Predicting outcome with Intranasal Esketamine treatment: A machine-learning, three-month study in Treatment-Resistant Depression (ESK-LEARNING)

predictors of response are still lacking. Thus, a tool that can predict the individual patient's probability of response to ESK-NS is needed. This study investigates sociodemographic and clinical features predicting responses to ESK-NS in TRD patients using machine learning techniques. In a retrospective, multicentric, real-world study involving 149 TRD subjects, psychometric data (Montgomery-Asberg-Depression-Rating-Scale/MADRS, Brief-Psychiatric-Rating-Scale/BPRS, Hamilton-Anxiety-Rating-Scale/HAM-A, Hamilton-Depression-Rating-Scale/HAMD-17) were collected at baseline and at one month/T1 and three months/T2 post-treatment initiation. We trained three different random forest classifiers, able to predict responses to ESK-NS with accuracies of 68.53% at T1 and 66.26% at T2 and remission at T2 with an accuracy of 68.60%. Features like severe anhedonia, anxious distress, mixed symptoms as well as bipolarity were found to positively predict response and remission. At the same time, benzodiazepine usage and depression severity were linked to delayed responses. Despite some limitations (i.e., retrospective study, lack of biomarkers, lack of a formal inter-rater reliability assessment across the different centers), these findings suggest the potential of machine learning in personalized intervention for TRD.


Background
Predicting existing treatments' effectiveness for individual patients through precision medicine could personalize care (i.e., connecting the 'right patient' with the 'right treatment'), optimizing health system resources.
Machine-learning methods process large amounts of data to stratify patients based on specific features (i.e., clinical phenotyping) for tailored treatments. To date, accuracy values greater than 50 percent are considered practically acceptable (Dadi et al., 2021). Emerging evidence highlights the potential benefits of machine learning approaches in the context of depressive disorders, with applications in diagnosis and personalized treatment strategies (Aleem et al., 2022). These methodologies, by providing an unbiased approach to assess heterogeneous data, hold promise for addressing the notable variation observed in treatment outcomes in this field.
Patients with major depressive episodes who do not respond to two antidepressant treatments adequate in dose and duration are defined as 'treatment-resistant' (TRD) (Sforzini et al., 2022). TRD exerts a substantial health burden along with significant social and economic repercussions (Perrone et al., 2021; Zhdanava et al., 2021). Thus, identifying clinical indicators for predicting treatment response is of paramount importance (Perrone et al., 2021; Shah et al., 2021). Also, as TRD diagnosis occurs after two failed treatments, patients face significant disease duration before accessing second-level therapies. A European study revealed only 19.2% of TRD patients achieved remission after 12 months, with 69.2% unresponsive and 60% unchanged in treatment. Although several studies explore neuroimaging measures predictive of antidepressant response (Rajpurkar et al., 2020; Zhdanov et al., 2020), identifying clinical measures differentiating responders remains challenging.
Intranasal esketamine (ESK-NS) offers new hope for TRD with high response rates in RCTs (Daly et al., 2018; Popova et al., 2019) and naturalistic settings (Martinotti et al., 2022) (70% and 64%, respectively), and nowadays represents an evidence-based approach for TRD management. Naturalistic data also indicate the effectiveness of ESK-NS in several clinical presentations of TRD, such as in elderly subjects (d'Andrea et al., 2023) and in those with comorbid substance use disorder or bipolar disorder. However, determining predictive clinical features is still unresolved since ESK-NS efficacy may vary among TRD patients, with some benefiting more than others.
ESK-NS interacts with the glutamatergic system, antagonizing NMDA receptors, resulting in varying efficacy among TRD patients (Zanos et al., 2018). Literature suggests those with high anxiety symptoms benefit more, while high disease severity predicts early negative response to intravenous esketamine (Jesus-Nunes et al., 2022; Lucchese et al., 2021). Unlike other treatments, childhood sexual abuse hasn't been found to predict response negatively (Lipsitz et al., 2021). Studies indicate intravenous ketamine's effectiveness on specific symptoms (e.g., cognition, anhedonia, suicidality, and psychosocial function), supporting the identification of clinical phenotypes for responders to glutamatergic agents, even though preliminary research on clinical moderators of response was inconsistent (Jawad et al., 2023; Price et al., 2022; Rong et al., 2018). These preliminary data highlight the need for more precise ESK-NS applications in TRD. Identification of response predictors may inform patient profiling, clinical decisions, and policies. This study investigates baseline clinical factors predicting ESK-NS response using machine-learning methods and evaluates which TRD patients are likelier to achieve clinical remission after three months of ESK-NS treatment.

Participants and ESK-NS treatment administration
The REAL-ESK study, an observational, retrospective, and multicentric investigation, analyzed ESK-NS use in TRD as part of an early access program (Martinotti et al., 2022). In compliance with guidelines from the Agenzia Italiana del Farmaco (AIFA, i.e., the Italian regulatory drug agency), subjects included were diagnosed with a Major Depressive Episode (MDE) as part of a major depressive disorder (MDD) or bipolar disorder (BD) and met the following criteria: a) in the context of MDD, failed response to at least two prior antidepressant treatments at adequate doses, duration, and adherence, following TRD consensus criteria (Sforzini et al., 2022), while in the context of BD, failure of at least two adequate trials (in dosage achieved and duration) from two classes of antidepressants and two classes of mood stabilizers, following the operational definition of Murphy and colleagues (Murphy et al., 2014); b) current treatment with at least one SSRI or SNRI; and c) age ≥ 18 years (EMA 2019). Exclusion criteria included comorbid medical diseases, such as untreated hypertension or prior cerebrovascular disorders, which contraindicate ESK-NS administration (EMA 2019). One hundred forty-nine patients with TRD (55% female, 45% male; mean age = 52.31 ± 12.28 years) were recruited across various Italian mental health facilities, as detailed in prior publications from the REAL-ESK study group.

Baseline predictors and outcome measures
Anamnestic data and psychometric assessments were collected from patients' records at baseline (T0), one month (T1), and three months (T2) after treatment initiation. Data collected included sociodemographic factors, previous depressive episodes, all antidepressant trials (including TRD augmentation strategies), and psychiatric comorbidities (See Supplementary Materials).
Treatment outcomes included the treatment response and clinical remission from depression. A reduction of 50% or more in the MADRS total score was set as the threshold for the treatment response (Fedgchin et al., 2019). A reduction of symptoms below the MADRS total score of 10 defined the limit for the clinical remission (Frank et al., 1991).
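The two outcome definitions above reduce to simple threshold rules on MADRS totals. A minimal sketch follows; the function name and the example scores are hypothetical and do not come from the study data.

```python
def classify_outcome(madrs_baseline, madrs_followup):
    """Return (responder, remitter) flags for one patient.

    responder: reduction of 50% or more in MADRS total score from baseline.
    remitter:  follow-up MADRS total score below 10.
    """
    if madrs_baseline <= 0:
        raise ValueError("baseline MADRS total must be positive")
    reduction = (madrs_baseline - madrs_followup) / madrs_baseline
    responder = reduction >= 0.50
    remitter = madrs_followup < 10
    return responder, remitter


# Hypothetical (baseline, follow-up) MADRS totals, for illustration only.
examples = [(34, 16), (30, 15), (28, 9), (40, 25)]
labels = [classify_outcome(b, f) for b, f in examples]
```

Under these rules a patient can be a responder without being a remitter (for example, 34 to 16), which is why response and remission are modeled as separate classification targets.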

Data preprocessing
Missing values are common in clinical and psychological data, potentially affecting the dataset size. Among various imputation techniques (Josse et al., 2019;Little, Roderick 2019), we chose to fill missing values with the average and added a feature indicating which values were imputed (Perez-Lebel et al., 2022).
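Mean imputation with a missingness indicator can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name and the example column are assumptions, and whether the mean was computed on the full dataset or only on training subjects is not stated in the text.

```python
import math


def impute_mean_with_indicator(column):
    """Fill missing entries (None or NaN) with the mean of the observed values.

    Returns the filled column and a 0/1 indicator marking imputed entries,
    so that a downstream model can still see which values were missing.
    """
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    observed = [v for v in column if not is_missing(v)]
    if not observed:
        raise ValueError("column has no observed values")
    mean = sum(observed) / len(observed)
    filled = [mean if is_missing(v) else v for v in column]
    indicator = [1 if is_missing(v) else 0 for v in column]
    return filled, indicator


# Hypothetical item scores with two missing entries.
item_scores = [2, None, 4, 3, float("nan"), 1]
filled, was_imputed = impute_mean_with_indicator(item_scores)
```

Keeping the indicator as an extra feature lets a tree-based model split on missingness itself, which may matter when values are not missing at random.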
As our class definition relies on MADRS scores at different timepoints, missing values limit subject inclusion in analyses. We imputed MADRS scores using a linear regression model to prevent a significant cohort reduction. A separate dataset of 26 TRD subjects (see Supplementary Materials) informed the model, which predicted MADRS scores based on HAM-D scores. This model was applied to predict missing MADRS values in our study cohort.
We imputed values for ten subjects at T1 and nine subjects at T2, adding an extra variable to indicate MADRS score prediction. Ultimately, the dataset included 146 subjects at T1 and 115 at T2.
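The regression-based MADRS imputation can be illustrated with an ordinary least-squares line fitted on a reference sample and applied to subjects with a missing MADRS total. The sketch below is an assumption-laden illustration: the reference scores are invented (they are not the 26-subject dataset), the field names are hypothetical, and clipping predictions to the 0-60 MADRS range is an added safeguard not described in the text.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented reference sample with both scales recorded (illustration only).
ref_hamd = [10, 14, 18, 22, 26]
ref_madrs = [14, 19, 26, 30, 36]
slope, intercept = fit_line(ref_hamd, ref_madrs)

# Hypothetical study records; None marks a missing MADRS total.
cohort = [
    {"hamd": 20, "madrs": None},
    {"hamd": 12, "madrs": 17},
]
for record in cohort:
    record["madrs_predicted"] = int(record["madrs"] is None)  # indicator variable
    if record["madrs"] is None:
        estimate = slope * record["hamd"] + intercept
        record["madrs"] = min(60.0, max(0.0, estimate))  # keep within MADRS range
```

The indicator variable mirrors the extra variable mentioned above, so the classifier can distinguish measured from predicted MADRS totals.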

Models predicting treatment outcomes
We created machine-learning models with baseline clinical data to predict treatment outcomes (response and remission) and identify features driving the classifiers, i.e., those most informative for outcome prediction. Considering the data's heterogeneous nature, we chose ensemble methods, specifically random forest techniques, suitable for these tasks and relevant for post hoc analysis of the results (Breiman 2001). We conducted three analyses. The first two classified subjects as responders or non-responders based on a 50% or greater reduction in MADRS score between T1 and T0 and T2 and T0, respectively (Fedgchin et al., 2019). The third analysis categorized subjects as remitters or non-remitters at T2, with remission defined as a MADRS score below 10 (Frank et al., 1991). All three classifiers included T0 clinical variables, encompassing anamnestic and psychometric features (all variables are detailed in supplementary materials).
We trained a Random Forest (number of trees=100) using 75% of subjects (N = 105 at T1; N = 86 at T2) and tested the remaining 25% (N = 41 at T1; N = 29 at T2). We repeated the procedure by shuffling subjects in the training and testing set 150 times and averaging accuracies in each cross-validation split (Varoquaux 2018).
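The repeated random train/test procedure can be sketched with scikit-learn as below. This is not the authors' pipeline: the data are synthetic, the helper name is hypothetical, and `ShuffleSplit` with `test_size=0.25` is assumed as a stand-in for the shuffling scheme, so the exact split sizes differ slightly from those reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ShuffleSplit


def repeated_holdout_accuracy(X, y, n_splits=150, test_size=0.25, seed=0):
    """Fit a 100-tree forest on each random split and collect test accuracies."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed)
    scores = []
    for train_idx, test_idx in splitter.split(X):
        clf = RandomForestClassifier(n_estimators=100, random_state=seed)
        clf.fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))
    return np.array(scores)


# Synthetic stand-in for the baseline feature matrix (146 subjects).
X, y = make_classification(n_samples=146, n_features=20, n_informative=5, random_state=0)

# Only 10 splits here to keep the example fast; the text describes 150.
scores = repeated_holdout_accuracy(X, y, n_splits=10)
mean_accuracy = scores.mean()
```

Averaging over many random splits reduces the dependence of the reported accuracy on any single partition, which matters in samples of this size.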
Due to differing class sizes, we used Random Upsampling and balanced accuracy to prevent bias (He and Garcia 2009). We repeated upsampling 100 times to assess the robustness and reported the standard deviation.
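Random upsampling and balanced accuracy can both be written in a few lines. The sketch below uses only the standard library; the function names and toy labels are hypothetical, and applying the upsampling to the training portion only is an assumption, since the text does not specify where in the pipeline it was applied.

```python
import random


def random_upsample(rows, labels, seed=0):
    """Resample smaller classes with replacement up to the size of the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def balanced_accuracy(y_true, y_pred):
    """Average of per-class recall, insensitive to class imbalance."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(1 for i in idx if y_pred[i] == cls) / len(idx))
    return sum(recalls) / len(recalls)


# Toy imbalanced labels: 8 responders (1) and 2 non-responders (0).
y_true = [1] * 8 + [0] * 2
y_majority = [1] * 10  # a classifier that always predicts the majority class
plain_accuracy = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)
bal_accuracy = balanced_accuracy(y_true, y_majority)

train_rows, train_labels = random_upsample(list(range(10)), y_true, seed=1)
```

In the toy case a majority-class predictor reaches 80% plain accuracy but only 50% balanced accuracy, which illustrates why the balanced metric is the fairer summary when classes differ in size.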
We extracted factor importance from the classifier, utilizing the normalized average Gini importance coefficient (Franklin 2005), which is based on how often the feature is used to build the trees and the hierarchy of the feature in the trees. We selected features with statistically significant values (p<0.05) based on permutation tests on the Gini coefficient. Finally, we analyzed the dependence of feature value on the predicted outcomes using partial dependence analysis, which can be interpreted as a probability marginalization over the feature values (Franklin 2005). Permutation tests (n = 200) were used to assess the significance of the results and feature importance as the gold standard for evaluating machine-learning algorithms (Combrisson and Jerbi 2015;Varoquaux 2018). Class labels are shuffled several times (n = 200), and classification is performed without a relationship between labels and features; this allows us to build a null-distribution for accuracy and feature importance and then compare non-shuffled results with the obtained distribution.
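The label-permutation procedure and a manual partial dependence curve can be sketched as follows. This is an illustrative approximation rather than the study's code: the data are synthetic, the helper names are hypothetical, a single fixed train/test split is assumed for brevity, and the p-value uses the common (1 + count) / (1 + n) estimator, which the text does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def fit_forest(X, y, train_idx, test_idx, seed=0):
    """Return the fitted forest, its test accuracy, and normalized Gini importances."""
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X[train_idx], y[train_idx])
    return clf, clf.score(X[test_idx], y[test_idx]), clf.feature_importances_


def permutation_p_values(X, y, n_permutations=200, seed=0):
    """Compare observed accuracy and importances with label-shuffled nulls."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = train_test_split(
        np.arange(len(y)), test_size=0.25, random_state=seed
    )
    clf, obs_acc, obs_imp = fit_forest(X, y, train_idx, test_idx, seed)
    null_acc = np.empty(n_permutations)
    null_imp = np.empty((n_permutations, X.shape[1]))
    for i in range(n_permutations):
        y_shuffled = rng.permutation(y)  # breaks any label-feature relationship
        _, null_acc[i], null_imp[i] = fit_forest(X, y_shuffled, train_idx, test_idx, seed)
    p_acc = (1 + np.sum(null_acc >= obs_acc)) / (1 + n_permutations)
    p_imp = (1 + np.sum(null_imp >= obs_imp, axis=0)) / (1 + n_permutations)
    return clf, obs_acc, p_acc, p_imp


def partial_dependence_curve(clf, X, feature, grid):
    """Average predicted probability of class 1 with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value
        curve.append(clf.predict_proba(X_mod)[:, 1].mean())
    return np.array(curve)


# Synthetic stand-in data; 30 permutations keep the example fast (the text uses 200).
X, y = make_classification(n_samples=146, n_features=10, n_informative=3, random_state=0)
clf, obs_acc, p_acc, p_imp = permutation_p_values(X, y, n_permutations=30)
significant_features = np.flatnonzero(p_imp < 0.05)
grid = np.quantile(X[:, 0], [0.1, 0.5, 0.9])
curve = partial_dependence_curve(clf, X, feature=0, grid=grid)
```

With n permutations the smallest attainable p-value under this estimator is 1 / (n + 1), so n = 200 resolves p-values down to roughly 0.005.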

Ethics
The study was conducted following the Helsinki Declaration (WMA, 2013), ensuring the confidentiality and anonymity of patient data. The University of Brescia's ethics committee approved the study (Protocol Number: NP5331). The study protocol was published in the open-access journal of the Italian Society of Psychiatry (D' .

Sociodemographic and clinical characteristics of the sample
The analysis included 149 TRD individuals, with extensive sociodemographic and clinical characteristics detailed in Table 1 (see also Supplementary Material).

Machine learning evaluation
We developed three distinct random forest classifiers to explore: a) the most predictive baseline variables for early response or non-response (one month); b) those for response or non-response at three months; and c) those for remission or non-remission from the MDE at three months.

Benzodiazepine use and depression severity reduce early response at one month, while inner tension predicts rapid response
We trained a random forest to predict treatment responsiveness at T1, achieving an overall accuracy of 68.53% (SD 0.96%; p<0.005, permutation-test, n = 200). The most predictive variables are shown in Fig. 1A, while partial dependence plots for each variable are presented in Fig. 1B.

Anhedonic and mixed features predict response at three months
A second random forest classifier was trained to predict responsiveness three months after ESK-NS treatment initiation. The model achieved an average accuracy of 66.26% (SD 1.18%; p<0.005, permutation test, n = 200). Fig. 2A displays the most informative variables, while Fig. 2B shows partial dependence plots for each variable and their relationship to response/unresponsiveness.
Most predictive variables included inner tension (MADRS item-3), pessimistic thoughts (MADRS item-9), anhedonia (MADRS item-8), reported sadness (MADRS item-2), concentration difficulties (HAM-A item 5), somatic anxiety (HAM-D item 11), hyperthymic temperament, feelings of guilt (HAM-D item 2), restlessness (HAM-A item 14), fears (HAM-A item 3), cardiovascular anxiety symptoms (HAM-A item 9), and failure of previous rTMS treatment. Partial dependence plots revealed mixed influences, with some variables being positive predictors (inner tension, anhedonia, reported sadness, concentration difficulties, hyperthymic temperament, pessimistic thoughts, and feelings of guilt), others negative (previous rTMS treatment), and some with mixed patterns (somatic anxiety) (see Fig. 2B). These findings suggest that various variables can influence remission response depending on symptom intensity patterns (Fig. 2B).
Fig. 1. Most predictive variables for T1-response prediction. In panel A, the figure highlights the statistically significant features of the random forest classifier. The features are shown in ascending order from most to least informative, measured using the normalized Gini importance index. The variables with a statistically significant importance index, evaluated using permutation tests, are plotted. In panel B, the plot shows the partial dependence of each important variable on the responsiveness outcome. The x-axis indicates the values the variables can assume, while the y-axis specifies the probability of being a responder (y = 1) or non-responder (y = 0). The solid line represents the average partial dependence of the random forest.

Recurrence, suicidality, and obsessive thoughts may impede remission at three months, while anhedonia and bipolarity could predict successful remission
Finally, we trained a third random forest classifier to evaluate predictors of remission after three months of ESK-NS treatment. The model achieved a classification accuracy of 68.60% (SD 1.10%; p<0.005, permutation tests, n = 200).
Partial dependence plots revealed some features as positive predictors of remission (anhedonia, anxiety, comorbidity with bipolar disorder, motor tension, emotional blunting), others as negative predictors (number of previous MDEs, previous rTMS treatment, obsessive-compulsive symptoms, suicidality), and some with mixed directions (depressed mood, pessimistic thoughts, feelings of guilt, psychic and somatic anxiety) (Fig. 3B).

Discussion
This is the first machine learning study examining factors predicting response and remission in patients with TRD treated with ESK-NS. This statistical approach offers valuable insights into identifying phenotypes responsive to ESK-NS, informing treatment selection. Our prediction model accurately estimates outcomes: one-month response, 68.53%; three-month response, 66.26%; three-month remission, 68.60%.

Responder profiles: the role of anhedonia, anxiety, and bipolar features
Our study highlights the importance of anhedonia and hopelessness/pessimism as predictors of positive outcomes after three months of ESK-NS treatment, influencing both response and remission. This aligns with
Fig. 2. Most predictive variables for T2-response prediction. In panel A, the figure highlights the statistically significant features of the random forest classifier. The features are shown in ascending order from most to least informative, measured using the normalized Gini importance index. The variables with a statistically significant importance index, evaluated using permutation tests, are plotted. In panel B, the plot shows the partial dependence of each important variable on the responsiveness outcome. The x-axis indicates the values the variables can assume, while the y-axis specifies the probability of being a responder (y = 1) or non-responder (y = 0). The solid line represents the average partial dependence of the random forest.
Baseline measures of psychic activation and comorbidity with anxiety disorders were positive predictors of ESK-NS treatment response. These dimensions encompass inner tension, restlessness, motor tension, psychic anxiety, and general anxiety symptoms. Prior research indicates ketamine and esketamine's anxiolytic effects in treatment-resistant unipolar and bipolar depression (McIntyre et al., 2020, 2021). Comorbidity with anxiety symptoms also positively impacted esketamine treatment response. As ketamine derivatives were initially anesthetic agents, their anxiolytic effect is consistent. Our findings suggest a new perspective for TRD treatment algorithms, highlighting anxiety's key role. Anxiety is often considered a predisposing factor for TRD due to poor response rates to conventional antidepressants and frequent co-occurrence during MDEs (Cepeda et al., 2018; Maj et al., 2020).
Our analysis emphasizes the potential predictive role of psychic activation and mixed-related symptoms (restlessness, inner tension, motor tension) for response/remission to ESK-NS treatment. Interestingly, inner tension emerged as a strong predictor of positive outcomes at T1 and T2. Often linked to mixed features, these symptoms affect around 30% of TRD subjects during an MDE (Suppes and Ostacher 2017). Previous studies (McIntyre et al., 2020) suggest glutamatergic agents' potential in treating mixed symptoms in TRD due to their pharmacodynamic action, such as modulating neuronal excitability (d'Andrea et al., 2023). Hyperthymic temperament and comorbidity with bipolar disorder also indicate a good likelihood of treatment response. This holds considerable implications, particularly about the involvement of coexisting features in MDEs within the framework of TRD and TRD-BD. Prior empirical data suggests an increased propensity for non-responsiveness among patients exhibiting mixed features concurrent with an MDE (Fornaro et al., 2020).
Fig. 3. Most predictive variables for T2-remission prediction. In panel A, the figure highlights the statistically significant features of the random forest classifier. The features are shown in ascending order from most to least informative, measured using the normalized Gini importance index. The variables with a statistically significant Gini index, evaluated using permutation tests, are plotted. In panel B, the plot shows the partial dependence of each important variable on the responsiveness outcome. The x-axis indicates the values the variables can assume, while the y-axis specifies the probability of being a responder (y = 1) or non-responder (y = 0). The solid line represents the average partial dependence of the random forest.
Growing evidence supports ketamine's effectiveness and safety in bipolar depression (Bahji et al., 2021; Wilkowska et al., 2021), while evidence for esketamine is preliminary (Martinotti et al., 2023) (e.g., no RCTs for ESK-NS). About 60% of initially diagnosed unipolar MDEs are reclassified as bipolar depression (McIntyre and Calabrese 2019), with conversion rates increasing over time (Kessing and Andersen 2017). Shifting to a bipolar spectrum perspective, "soft" or "attenuated" features like hyperthymic temperament, mixed features, family history of BD, and (hypo)manic switches with antidepressants (Akiskal 2003) are linked to recurrences and treatment resistance in MDEs (Mazzarini et al., 2018). Our results suggest considering bipolar spectrum-related features in TRD treatment algorithms as predictors of ESK-NS response. Studying depression sub-phenotypes is crucial, as some forms unresponsive to conventional treatments may respond to novel glutamatergic therapies and belong to the bipolar spectrum.

Non-responder profile: prior unsuccessful rTMS, obsessive thoughts, recurrences, and suicidality as negative predictors
Our model identifies significant negative predictors of response, offering valuable insights for clinicians. Notably, prior inefficacy of rTMS treatment is a negative predictor. This contrasts with earlier studies on ketamine, wherein its efficacy was similar for individuals with and without a history of neurostimulation treatments (Rodrigues et al., 2022).
However, although ESK-NS and rTMS are distinct antidepressant therapies, they may share underlying mechanisms of action. Both target prefrontal areas to restore proper connectivity between prefrontal regions and the cingulate cortex (Arnsten et al., 2023). While rTMS primarily targets left DLPFC hypoactivation, ESK-NS directly impacts the cingulate cortex, reducing hyperactivation and indirectly restoring top-down control (Arnsten et al., 2023). We could speculate that individuals not responding to rTMS and ESK-NS may have depression without prominent DLPFC-cingulate cortex imbalance. In this perspective, a recent head-to-head preliminary study shows similar response and remission rates in TRD patients treated with rTMS and ESK-NS, with the former showing a more rapid action. However, our findings and previous data do not preclude exploring the potential synergistic effects of combining these approaches.
The presence of obsessive symptoms was linked to negative responses. Obsessive thoughts often manifest as ruminative thoughts in depressive episodes (Ehring 2021). Maladaptive traits like anankastia may contribute to depression recurrence and pharmacological resistance, highlighting the need for combined approaches. Aberrant activity in the Default-Mode Network (DMN) has been linked to ruminative and obsessive thoughts (Koch et al., 2018; Tozzi et al., 2021). Glutamatergic agents act on prefrontal components of the executive control network (Arnsten et al., 2023), but their impact on DMN activity is less understood (Hamilton et al., 2015). We speculate that ruminative/obsessive depression resistant to ESK-NS may respond to therapies rebalancing DMN activity (Carhart-Harris et al., 2012), like psychedelics (Barba et al., 2022).
Intriguingly, higher baseline suicidality predicts poor ESK-NS treatment response at T1 and T2. This, however, does not contest the previously documented anti-suicidal properties of ESK-NS (Mahase 2021), as our study did not investigate the drug's direct impact on suicidal ideation. Our finding underscores the intricate nature of suicidality. From a clinical perspective, suicidality is a remarkably heterogeneous construct, spanning an array of clinical manifestations, from suicidal ideation to instances of deliberate self-inflicted harm. It is commonly associated with the co-occurrence of personality disorders and is typically recognized as an indicator of the most severe form of MDE.
Suicidality also amplifies the likelihood of recurrence, an additional adverse predictor identified in our model. Taken together, our data suggest that ESK-NS exhibits diminished efficacy in treating more complex, chronic forms of depression associated with important negative prognostic factors (i.e., suicidality) and characterized by a decrease in functional ability (i.e., difficult-to-treat depression) (McAllister-Williams 2022).

Late responders: could benzodiazepine use and depression severity delay ESK-NS response?
In our model, several factors interfered with early response to ESK-NS, with concurrent benzodiazepine use and depression severity being the most significant. This contrasts with a previous post-hoc analysis on ESK-NS RCTs that didn't find the use of benzodiazepine to affect ESK-NS action negatively (Diekamp et al., 2021). This difference could be due to the opposing effects of benzodiazepines and ESK-NS on glutamate/GABA balance. Benzodiazepines inhibit glutamate activation by activating GABAa interneuron receptors (Haefely 1984), while ESK-NS increases glutamate activity by antagonizing the NMDA receptor (Zanos et al., 2018). Thus, benzodiazepine use may slow ESK-NS action and reduce its efficacy.
Certain variables indicating high depression severity (psychomotor retardation, emotional withdrawal, self-neglect) predicted early unresponsiveness to ESK-NS. Notably, these features didn't predict T2 response and remission. This discrepancy suggests that high depression severity and benzodiazepine treatment might not impact ESK-NS effectiveness but rather delay response. Reducing benzodiazepines could speed up clinical response, and the most severe patients may need longer ESK-NS treatment for mood improvements.

Study limitations
A key limitation of our study is the heterogeneity of settings and methods due to its naturalistic design. This may have led to underestimating selection bias or inconsistent methods, but it also reflects real-world scenarios, enhancing the applicability of the results. Another limitation is the smaller sample size compared to previous machine-learning studies (Pigoni et al., 2019), which limited our ability to draw significant conclusions. However, the robust statistical analysis and cross-validation schema (Varoquaux 2018) suggest that the findings are not over-inflated. The T2 dropout rate (31/149 subjects) could also affect the study's overall significance. Developing inter-rater reliability among different centers wasn't possible, but evaluations were conducted by well-trained psychiatrists and clinical psychologists, ensuring good reliability and data reproducibility.
Another significant limitation of this study is the lack of biomarkers in our predictive model. Previous research has demonstrated that incorporating biological measures can amplify the overall precision of ML algorithms, additionally providing an objective quantification (Li et al., 2022). Given the inherent nature of this study (multicentric, real-world, and retrospective), the integration of such biological measurements was not feasible.

Conclusion
Our study suggests that machine-learning models can predict ESK-NS treatment outcomes using sociodemographic factors and clinical phenotyping. Despite limitations, these findings may aid clinicians in identifying subjects more likely to respond to ESK-NS. Anhedonic features, comorbid anxiety, and mixed symptoms predict better responses in TRD patients, while chronic and complex depressive disorders are less likely to achieve positive outcomes. The concurrent use of benzodiazepine and high depression severity delay the treatment response. If confirmed in larger samples, these machine learning-based results could help expand personalized medicine approaches in psychiatry.

Authors statement
All persons who meet the authorship criteria are listed as authors, and all authors certify that they have participated sufficiently in the work to take public responsibility for the content, including participation in the concept, design, analysis, writing, or revision of the manuscript. MP, GMar, GDL, RG, GdA, AdA, RSMcI, GMai, LM, and SLS conceptualized the hypothesis and the design of the study. GMar, SB, BDO, RZ, AS, MC, MdN, PDF, MM, RDC, SDF, GN, GRo, DN, RB, VM, AC, IA, AVi and SB were responsible for the patient recruitment and the collection of clinical data. The REAL-ESK Study Group contributed to the collection of clinical data. GdA, RCar, SC, RG, AdA, LDR and MP performed the statistical analysis, carried out data interpretation, and wrote the first draft of the manuscript. GMar, GDL, JR and RSMcI revised the manuscript and provided substantial comments. All authors contributed and approved the final manuscript.

Disclosures
Giovanni Martinotti has been a consultant and/or a speaker and/or has received research grants from Angelini, Doc Generici, Janssen-Cilag, Lundbeck, Otsuka, Pfizer, Servier, and Recordati. Ileana Andriola was a speaker at a Janssen-sponsored conference.
Gianluca Rosso has been a speaker and/or consultant for Angelini, Janssen, Lundbeck, Otsuka, and Viatris.
Raffaella Zanardi has been a consultant/speaker for Baldacci and Italfarmaco.
Dr. Rosenblat is the medical director of the Braxia Scientific Corp, which provides ketamine and esketamine treatment for depression; he has received research grant support from the American Psychiatric Association, the American Society of Psychopharmacology, the Canadian Cancer Society, the Canadian Psychiatric Association, the Joseph M. West Family Memorial Fund, the Timeposters Fellowship, the University Health Network Centre for Mental Health, and the University of Toronto and speaking, consultation, or research fees from Allergan, COMPASS, Janssen, Lundbeck, and Sunovion.

Declaration of Competing Interest
The remaining authors declare that the research was conducted without any commercial or financial relationship that could be construed as a potential conflict of interest.

Funding/Support
This work was supported by the "Departments of Excellence 2018-2022" initiative of the Italian Ministry of Education, University and Research for the Department of Neuroscience, Imaging and Clinical Sciences (DNISC) of the University of Chieti-Pescara.

Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.psychres.2023.115378.