Stratified psychiatry: Tomorrow’s precision psychiatry?

Here we review the paradigm change from one-size-fits-all psychiatry to more personalized psychiatry, distinguishing between 'precision psychiatry' and 'stratified psychiatry'. Using examples in depression and ADHD, we argue that stratified psychiatry, which uses biomarkers to guide patients to the best 'on-label' treatments, is a more realistic future for implementing biomarkers in clinical practice.

Fig. 1. An infographic contrasting the 'diagnosis-based, one-size-fits-all psychiatry' that is currently in use (left) with more 'prognostic' models such as Stratified Psychiatry (right-top) and Precision Psychiatry (right-bottom). In an 'ideal operationalization', Precision Psychiatry can be conceived as a way of treating mental disorders based on replicable and objective markers (ranging from imaging to genetics and transcriptomics) that form an individual profile for every patient. The treatment interventions are tailored to the profile of the individual patient, thus addressing unique properties of individual patients and maximizing clinical response. Alternatively, Stratified Psychiatry is a way of subgrouping patients with similar biomarker profiles to enhance the probability of clinical response to known and established treatments within a given disorder. The bar graph shows group-level response and remission rates derived from the largest non-pharma-sponsored effectiveness trials or meta-analyses for various antidepressant treatments, demonstrating indistinguishable response and remission rates between pharmacological and non-pharmacological treatments (Cuijpers et al., 2021), as well as a lack of significant differences within modalities (n.s.; non-significant), as demonstrated by Cuijpers et al. (2014) for various psychotherapies, by iSPOT-D (Saveanu et al., 2015) for three different and widely prescribed antidepressants, and by Blumberger et al. (2018) for two different forms of rTMS (CBT = Cognitive Behavior Therapy; IPT = Interpersonal Therapy; rTMS = repetitive Transcranial Magnetic Stimulation; iTBS = intermittent Theta Burst Stimulation; SSRI = Selective Serotonin Reuptake Inhibitor; SNRI = Selective Serotonin and Noradrenalin Reuptake Inhibitor).
We argue that a 'stratified psychiatry' approach, which uses biomarkers to stratify patients to existing, on-label, established treatments, is a realistic conceptualization of precision psychiatry. Here, biomarkers are employed to identify specific inter-individual properties that make an individual patient more likely to respond to one treatment than to another. We propose that stratified psychiatry is a pivotal, indispensable step towards precision psychiatry; see also Fig. 1 for more details on the distinction between current one-size-fits-all psychiatry, stratified psychiatry and precision psychiatry, as well as supportive data.
In current clinical practice we often rely on a so-called 'stepped care' model, in which the selection of treatments is mostly informed by initial efficacy and side effects. In depression, psychotherapy, SSRIs and SNRIs are considered first-choice treatments, whereas ECT and DBS are at the other end of the spectrum, mainly because of their side-effect profiles and invasive nature. However, as visualized in Fig. 1, various large studies and meta-analyses have demonstrated, at the group level, a lack of superiority within treatment classes (e.g. different types of psychotherapy, different rTMS protocols and different drug classes of antidepressants), as well as a lack of superiority across treatment classes (e.g. psychotherapy vs. antidepressants (Cuijpers et al., 2021)). Therefore, if there are no differences between these treatments at the group level, so that assigning patients to one of them by flipping a coin makes no difference to the likelihood of remission, the question arises: why don't we use biomarkers to 'stratify' patients to one of these treatments, with the aim of enhancing treatment response within subgroups?
Several years ago, biomarkers that predicted response tended to be generic predictors of non-response, i.e. the more abnormal the biomarker, the worse someone's response to treatment. Recently, however, data from sufficiently powered large-scale studies such as the International Study to Predict Optimized Treatment in Depression (iSPOT-D) have painted a different picture. iSPOT-D recruited 1008 MDD patients who were randomized to escitalopram, sertraline or venlafaxine; among other biomarkers, electrophysiological features (EEG and ERPs) were assessed before and after 8 weeks of treatment. Several studies from this trial have demonstrated that biomarkers can be 1) sex-specific, with effects often cancelling out at the combined group level when males and females are not analyzed separately (Arns et al., 2018; Dinteren et al., 2015), 2) drug-class specific (e.g. SSRI vs. SNRI) and 3) drug-specific (Arns et al., 2016a), where clinical response can even be differentially predicted for two drugs from the same drug class. This new reality has further opened up possibilities for stratified psychiatry as an interim step towards a future of precision psychiatry. In the following we illustrate the concept of stratified psychiatry further, using recent examples of electroencephalography (EEG) biomarkers in the treatment of depression (MDD) and ADHD. Note, however, that other biomarkers such as pharmacogenetics (for review see: Schaik et al., 2020) and MRI (Cohen et al., 2021), and especially the integration of biomarkers across domains and treatments, are exciting avenues to further advance the notion of stratified psychiatry.

MDD
In MDD, various EEG biomarkers have been described (reviewed in more detail elsewhere (Olbrich et al., 2015)), although a recent critical meta-analysis questioned the utility of EEG biomarkers for predicting antidepressant response due to substantial publication bias and a lack of replications and, especially, of out-of-sample validations (Widge et al., 2019). Without providing an exhaustive review, we focus here on one line of research to illustrate the treatment stratification concept, and also address some of the concerns raised by Widge and colleagues.
In 2001, Bruder and colleagues reported right-hemispheric alpha dominance in female responders to an SSRI (Bruder et al., 2001), which was replicated and confirmed in the iSPOT-D study. A right-frontal dominance of alpha (Frontal Alpha Asymmetry, FAA) was associated with remission to the SSRIs escitalopram and sertraline, in female depressed patients only. Interestingly, no differences between remitters and non-remitters were found for males or for venlafaxine (SNRI). A retrospective assignment of female patients with an FAA < 0.0 to venlafaxine and of those with an FAA ≥ 0.0 to an SSRI resulted in remission rates of 53% and 60%, respectively, which can be considered a clinically meaningful improvement relative to the overall remission rate of 46% after randomization. This corresponds to a Number Needed to Treat (NNT) of approximately 7, an often-underrated measure for comparing treatment decisions in psychiatry (Pinson and Gray, 2003), which is close to the approximated NNT of 5 for remission after psychotherapy in MDD (Pinson and Gray, 2003). This means that about as many patients could benefit from an EEG-based stratified treatment decision, compared with unstratified treatment, as would benefit from psychotherapy compared with clinical management only. In addition to this sex- and drug-class-specific EEG biomarker, two further drug-specific EEG biomarkers were reported, namely paroxysmal EEG activity and individual alpha peak frequency (iAPF). The presence of paroxysmal activity (epileptiform discharges) was associated with a 3.2 times lower likelihood of response to escitalopram and venlafaxine, with an opposite, non-significant trend for sertraline (Arns et al., 2016a). In addition, a slow iAPF was found in responders to sertraline, with no differences for escitalopram and venlafaxine (Arns et al., 2016a).
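The NNT arithmetic and the retrospective FAA assignment rule can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the text does not state which contrast yields the NNT of approximately 7. The sketch assumes it is the 60% remission rate in the SSRI-assigned subgroup against the 46% overall rate; the 53% venlafaxine-assigned subgroup is computed alongside for comparison.

```python
def number_needed_to_treat(rate_stratified: float, rate_reference: float) -> float:
    """NNT = 1 / absolute difference in remission rates (rates as fractions)."""
    absolute_gain = rate_stratified - rate_reference
    if absolute_gain <= 0:
        raise ValueError("no absolute benefit, so the NNT is undefined")
    return 1.0 / absolute_gain


def assign_by_faa(faa: float) -> str:
    """Retrospective rule quoted in the text for female patients:
    FAA < 0.0 -> venlafaxine (SNRI), FAA >= 0.0 -> an SSRI."""
    return "venlafaxine (SNRI)" if faa < 0.0 else "SSRI"


# Remission rates quoted in the text.
OVERALL_REMISSION = 0.46      # after randomization
SSRI_SUBGROUP = 0.60          # FAA >= 0.0, assigned to an SSRI
VENLAFAXINE_SUBGROUP = 0.53   # FAA < 0.0, assigned to venlafaxine

# Assumption: NNT of about 7 refers to the SSRI-assigned subgroup.
nnt_ssri = number_needed_to_treat(SSRI_SUBGROUP, OVERALL_REMISSION)
nnt_venlafaxine = number_needed_to_treat(VENLAFAXINE_SUBGROUP, OVERALL_REMISSION)

print(f"NNT, SSRI-assigned subgroup: {nnt_ssri:.1f}")
print(f"NNT, venlafaxine-assigned subgroup: {nnt_venlafaxine:.1f}")
print(assign_by_faa(-0.2), "|", assign_by_faa(0.3))
```

Under this assumption the 14-percentage-point gain gives an NNT close to 7, whereas the 7-percentage-point gain in the venlafaxine-assigned subgroup gives an NNT roughly twice as large.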
As explained above, if prescribing one of the three iSPOT-D antidepressants by flipping a coin indeed does not yield better or worse outcomes, we reasoned that exploiting the observed differential response profiles for these EEG biomarkers could inform the right medication choice. We thus set out to conduct the first prospective trial in which these three biomarkers were used to assign patients to one of three antidepressants. The study was mainly a feasibility study, yet a significantly better clinical outcome was found for the EEG-informed group relative to the treatment-as-usual group (Vinne et al., 2021), thereby also providing the first prospective validation of the use of EEG for treatment stratification. Interestingly, to address the concerns raised by Widge et al. (2019), Ip, Olbrich and colleagues (Ip et al., 2021) recently performed a replication study on a range of EEG biomarkers in which the FAA findings were independently replicated, thereby demonstrating carefully conducted out-of-sample validation in addition to prospective utility.

ADHD
Various EEG biomarkers have been described in the ADHD literature, with the Theta/Beta ratio as the most frequently cited diagnostic biomarker, touted to have received FDA clearance (though see a critical appraisal in Arns et al., 2016b). Nonetheless, this metric has not held up well in meta-analyses (Arns et al., 2013; Saad et al., 2018), which calls its diagnostic use into question. Various studies, however, have investigated the utility of the EEG to predict treatment response. Using a qualitative EEG approach, in 2008 we reported that especially male children and adolescents with ADHD and a slow iAPF responded most poorly to treatment with methylphenidate (MPH) (Arns et al., 2008). Some time later, in the large multicenter iSPOT-A study that enrolled 336 children and adolescents with ADHD who were all treated with MPH, this initial finding was replicated and refined: slow iAPF was most specifically associated with non-response in male adolescents, with large effect sizes (Arns et al., 2018). However, only recently did the potential clinical relevance of this biomarker become evident, when Krepel and colleagues reported that children and adolescents with ADHD and a slow iAPF actually responded significantly better to neurofeedback than those with a faster iAPF (Krepel et al., 2020). This also implicated the iAPF as a first trans-diagnostic EEG biomarker, with clinical implications for both depression and ADHD.
Inspired by these findings, and noting that most studies had actually used different methods to calculate iAPF, we set out to optimize the algorithm behind this metric and validate it against a ground-truth scenario of brain maturation in a sample of 4249 patients. The most biologically plausible permutation was then prospectively validated on MPH (iSPOT-A, N = 336) and neurofeedback (Krepel et al., N = 136) treatments (Voetterl et al., 2021). Results confirmed that a low iAPF was associated with lower remission rates after MPH but higher remission rates after neurofeedback, with stratification simulations demonstrating 20-27% increased remission rates if patients were stratified to treatment based on iAPF alone. Two subsequent blinded out-of-sample validations, in 1) an MPH trial from Loo and colleagues (Loo et al., 2016) and 2) a neurofeedback trial from the ICAN group (Group et al., 2020), confirmed the predictions and demonstrated meaningful gains in remission when treatment was selected using this iAPF-based Brainmarker-1 (Voetterl et al., 2021).

'Let's make a deal', the Monty Hall problem: how partial information can be clinically meaningful
One notion from the above examples that is often hard to comprehend is that the absence or presence of a significant effect for one treatment vs. another can actually be clinically meaningful. For example, a left-dominant FAA in females was significantly associated with a lower likelihood of remission to an SSRI, whereas no effect of FAA was found for the SNRI venlafaxine. Assuming all groups are sufficiently statistically powered, such a lack of a difference is actually a blessing in disguise, clinically speaking: if a patient were to present with right dominant FAA, would you prescribe that patient an SSRI? Or rather consider the SNRI venlafaxine for this patient? Following this logic, the original simulation in MDD as well as the prospective replication yielded a 28-70% relative increase in remission rates (Vinne et al., 2021).
In the case of 'life-and-death' decisions, high sensitivity and specificity are crucial, since the prediction of 'life' should really outweigh the 'risk of death'. In stratified psychiatry, however, the goal is to assign someone to one of many established treatments that, at the group level, have similar response and remission rates. Thus, even when a wrong prediction is made, no harm is done relative to the 'one-size-fits-all' practice. Therefore, knowing with reasonable certainty that someone will not respond to a given treatment increases that person's chances with 'anything' except this treatment. Although this does not change the probability of response to other treatment forms, the possibility of avoiding a non-response increases the overall chance of choosing a sufficient treatment within a finite set of treatment forms. Again, this also highlights the need for markers for different treatments and for the corresponding probabilities of response and remission.
This can be further explained by the Monty Hall problem, derived from the game show 'Let's make a deal'. In the Monty Hall problem, the contestant has to pick one out of three doors, with the initial probability of picking the Ferrari (i.e. achieving remission) being 33% (representing current one-size-fits-all practice, and actually quite close to true remission rates in MDD treatments, as visualized in Fig. 1). The game show host then opens one door, revealing a goat, and asks whether the contestant wishes to change doors. Counterintuitively, the only correct choice is to switch doors, since with the new information provided by the host, the probability of winning increases to 67% after switching (e.g. see Saenen et al., 2015). Similarly, in stratified psychiatry we leverage any available information to increase remission rates. In our case, stratification biomarkers are the 'new information' about one specific treatment brought into the equation, whereby, in the case of three options, the probability of achieving remission increases substantially, to 2/3, relative to the initial 1/3. Although it is not clear whether there actually is an effective treatment for everyone, this story teaches us that it is a good choice to take any available information into account.
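The 1/3 versus 2/3 probabilities can be checked with a short Monte Carlo simulation. The following is a minimal sketch of the classic three-door game only; the function names, trial count and seed are illustrative choices, and the simulation does not model clinical data or an imperfect biomarker.

```python
import random


def play_once(switch: bool, rng: random.Random) -> bool:
    """Play one round of the three-door game; return True if the prize is won."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # With three doors, exactly one unopened alternative remains.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_rate(switch: bool, n_trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the probability of winning for a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play_once(switch, rng) for _ in range(n_trials))
    return wins / n_trials


stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay:   {stay_rate:.3f}  (theory: 1/3)")
print(f"switch: {switch_rate:.3f}  (theory: 2/3)")
```

In the game the host's information is certain, which is why switching reaches exactly 2/3; a biomarker provides probabilistic information, so the clinical gain depends on how well it predicts non-response.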

Conclusions & future directions
Taken together, stratified psychiatry holds the possibility of increasing response and remission rates in psychiatry without the need to develop new treatments, simply by assigning the right, already approved treatment type to a given patient, using a biomarker as an addition to the clinician's toolbox. Taking both the biomarker outcome and other individual variables (e.g. symptom profile or sleep hygiene) into account would allow psychiatry to make even more informed and personalized decisions. Moreover, this would allow psychiatry to overcome the trial-and-error approach of starting treatments, monitoring response and switching after several weeks (i.e. taking the guesswork out of stepped care). However, although these prospects are promising and by no means restricted to electrophysiological research, several key aspects have to be addressed to make the paradigm shift a reality. Above all, a more standardized assessment of psychiatric symptoms and side effects is necessary to compare the predictive power of treatment modalities between studies and across different kinds of biomarkers (electrophysiological, imaging, genetic, etc.). This call for standardization is relevant not only for the psychometric dimensions, but also for the markers themselves and the conditions under which they are assessed. Many biomarkers suffer from a lack of replication, and this can only be tackled when studies rely on standardized operating procedures for the assessment hardware used (e.g. amplifiers), preprocessing steps, algorithms for marker calculation, assessment environment, etc. This also calls for more collaboration and data sharing. One such open-access database, consisting of > 1200 EEGs, clinical descriptors and treatment outcome data (TD-BRAIN, van Dijk under review), is available online at www.brainclinics.com/resources.
Another useful model we would like to propose is 'blinded data sharing', in which researchers who have identified biomarkers prospectively and blindly apply them to biomarker data from other research groups, without knowing the patients' response status. The predictions can then be shared back with the original researchers, who can confirm or reject the predictive power. Such 'blinded data sharing' has the advantage that researchers can still 'control' what is done with their data, while it can also act as a true blinded out-of-sample validation, perhaps constituting the biomarker equivalent of what a double-blind placebo-controlled trial is for the evaluation of clinical treatments.

Disclosures
MA is unpaid chairman of the non-profit Brainclinics Foundation, a minority shareholder in neuroCare Group (Munich, Germany), and a co-inventor on several patents related to EEG, neuromodulation and psychophysiology, but receives no royalties related to these patents; Research Institute Brainclinics received research funding from neuroCare Group (Munich, Germany), Brain Resource (Sydney, Australia), Urgotech (France), Neuroscience Software (US) and equipment support from Deymed, neuroConn, Compumedics and Magventure. GvW received research funding from Philips. The other authors have nothing to declare.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.