Using Electronic Health Records to Facilitate Precision Psychiatry


https://doi.org/10.1016/j.biopsych.2024.02.1006

Precision psychiatry is a data-driven approach that is designed to support the delivery of more personalized mental health care. Clinical prediction models that produce individual-level risk estimates can facilitate this approach (1). They have informed clinical decision making in oncology (2,3), cardiology (4), and primary care (5,6), thereby leading to more effective, efficient care and improved outcomes.
Within the precision psychiatry paradigm, electronic health records (EHRs) are key to advancing clinical prediction models in psychiatric settings because they provide a platform to develop, validate, and implement models using routinely collected, real-world clinical data on a large scale. Using EHR data for clinical prediction models has strengths (real-world data, routinely collected, richly detailed, readily available big data, long-term follow-up) but also has limitations (reflects existing biases; lack of standardization; idiosyncratic, inequitable access; risks of data leaks), which are outlined in more detail in Box 1.
There are several points regarding performance that need to be considered before discussing clinical impact. We need clinical prediction models that are validated not only internally (the model's performance is assessed within the population on which it was developed) but also externally (the model performs well in a setting different from the one in which it was developed). Good external validation performance suggests that a model will perform well on new data from new settings (7-9), but external validation is currently rarely performed (10,11). It is important to evaluate a clinical prediction model to understand its potential impact on real-world care. Discrimination, calibration, and clinical utility are key considerations when evaluating a clinical prediction model (Box 2). Therefore, a good clinical prediction model can distinguish between individuals with (currently or in the future) and without the outcome of interest (discrimination), produce risk estimates that have good agreement with observed risk (calibration), and show superior potential net benefit over gold-standard or other approaches (clinical utility) in external validation.
Implementing clinical prediction models that leverage EHRs represents a pragmatic step forward in using wider sources of data to improve research and health care. This paper outlines 4 varied use cases, which were chosen to best illustrate a range of potential roles for EHR-based clinical prediction models for severe mental disorders at different steps in the care pathway. These use cases have demonstrated good performance in external validation and seem to be suitable for implementation in existing clinical settings. Respectively, they involve clinical prediction models for identifying individuals at risk of psychosis, evaluating hospitalization risk following discharge from early intervention services, assessing cardiometabolic risk following a first episode of psychosis (FEP), and screening for suicide risk in severe mental disorders.

USE CASE 1: IDENTIFYING INDIVIDUALS AT RISK OF PSYCHOSIS

Problem
Identifying people when they are at high risk of developing psychosis provides a unique opportunity for illness prevention (12-15), and over the past 25 years, this approach has been implemented through the assessment of adolescents and young adults at clinical high risk for psychosis (CHR-P) (16). However, these individuals can be difficult to identify and engage. Thus, even when clinical early detection services are well resourced, only a small minority of people who develop an FEP have previously been engaged by these specialist services (17,18). In fact, people with FEP are more likely to have been seen initially by generic secondary mental health teams and emergency departments (18). This presents an ascertainment opportunity because individuals who contact other mental health teams may have EHRs that contain demographic and clinical information. Thus, screening EHRs provides a way to identify people who are at risk of developing

Routinely Collected
No requirement for collecting additional data because it is all collected as part of daily clinical interactions. This reduces labor and economic costs, while avoiding additional burden and potential outcome contamination driven by asking clinicians to record a specific outcome.

Richly Detailed
As well as containing structured data on sociodemographics, diagnoses, medication, and laboratory test results, NLP can be used to extract information (e.g., symptoms, substance use, medications) from unstructured free text (e.g., clinical notes and letters) (34). These data can be further expanded through linkage to other sources (e.g., census data, research data).

Readily Available Big Data
EHRs are already available in many countries [90% coverage in the European Union (94)], although different systems are used in different countries.
There is therefore no need for prospective data collection, which is time-consuming and burdensome for patients and their clinicians. As a result, the datasets available are large, presenting a promising opportunity for development and validation of generalizable clinical prediction models.

Long-Term Follow-Up
EHR data can span years of follow-up instead of the months usually seen in randomized controlled trials. This provides greater certainty in outcomes, particularly when asserting the absence of an outcome.

Reflects Existing Biases
Biases in health care provision, where vulnerable groups receive suboptimal or no treatment, will be reflected in any model developed using EHR data (72,73). For example, a clinical prediction model predicting future insurance costs systematically discriminated against millions of Black patients. At any given score, Black patients were substantially sicker than White patients (90). Care is therefore needed when considering predictors, particularly when they are proxy measurements.

Lack of Standardization
Recording of data is rarely standardized, and some key variables (e.g., diagnoses, ethnicity) are not routinely recorded as structured data in all EHR systems (91). This can lead to differences in how these variables are recorded between sites. Similarly, because data entry is not standardized, there can be substantial data missingness. However, the use of NLP can mitigate this by identifying these data in clinical notes and letters (34).

Idiosyncratic
Biomarker-based clinical prediction models are attractive because they may better reflect the underlying etiopathology of the disorders (72,92). Many have been developed, but it is rare for models solely using biomarker data to be externally validated (10,93). In contrast, EHR data are designed to capture clinically relevant information rather than data related to the mechanisms underlying disorders.
FEP but have presented to mental health services with what seemed to be other problems and may benefit from more specialized preventive care from CHR-P services. (18). This indicated that if a case (an individual who developed psychosis) and a control (an individual who did not develop psychosis) were selected from the population at random, the case would have a 79% chance of having a higher risk score. Miscalibration was minimal, with only slight underestimation of risk, particularly at lower levels of observed risk. The risk calculator showed good clinical utility, with net benefits seen for numbers needed to test between 1 and 99 compared with the defaults of treating all or treating none. This suggests that if missing the outcome of interest is equally or up to 99 times more harmful than an unnecessary intervention, then there is clinical benefit in using the model (22,23). The transdiagnostic risk calculator maintained discrimination performance in further external validations in EHR datasets from other sites in the United Kingdom (C = 0.73-0.79; n = 13,702-33,710; 490-868 events) (24,25) and from the United States (IBM MarketScan Commercial Database; data from multiple, geographically dispersed U.S. states, from individuals covered by employer-sponsored health insurance plans; C = 0.68; n = 2,430,333; 24,941 events) (26). There were no major calibration issues, except in the U.S. external validation (24). This external validation performance indicates that the model is transportable and is likely to perform well in new settings outside of the one it was developed in following recalibration, a crucial consideration in the implementation of a clinical prediction model (27).

Approach
The transdiagnostic risk calculator was the first risk prediction model in psychiatry to test its feasibility for prospective use. An initial in vitro phase was used to navigate barriers to implementation (28), assess the model's acceptability, and integrate the risk calculator into a local EHR system (29). Following this, every individual who received their first nonorganic, nonpsychotic ICD-10 mental disorder diagnosis was screened using the model over the course of 1 year. If an individual was estimated to have a 5% risk of developing psychosis within 2 years, their clinician was contacted, and a CHR-P assessment (30,31) was recommended. Clinician acceptability, measured as the proportion of clinicians who responded to the recommendations of the risk calculator, was high (77%) (32).
The transdiagnostic risk calculator has since been refined, with the addition of 14 symptom and substance use predictors (33), extracted automatically from free-text clinical notes and letters using natural language processing (NLP) algorithms (34). This improved its discrimination to C = 0.85 in external validation (SLaM, n = 63,854, 1662 events). Further work is needed to test the performance of this refined model in other settings. Additional refinements have been made to the model

Typically measured by Harrell's C: the proportion of randomly selected cases who receive a higher risk score than randomly selected noncases.
There is no strict cutoff for discrimination that is good enough because this depends on the context of the clinical use case and on the available alternatives (88).
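
As an illustration of the discrimination measure defined above, the following minimal Python sketch computes the C-statistic for a binary outcome by comparing every case with every noncase. It assumes no censoring, in which setting the C-statistic coincides with the area under the receiver operating characteristic curve. The function name and the example data are hypothetical and are not taken from any of the cited studies.

```python
from itertools import product


def c_statistic(risk_scores, outcomes):
    """Proportion of case/noncase pairs in which the case has the higher score.

    Tied scores count as half a concordant pair. Outcomes are coded
    1 (case) or 0 (noncase). Censoring is not handled in this sketch.
    """
    cases = [s for s, y in zip(risk_scores, outcomes) if y == 1]
    noncases = [s for s, y in zip(risk_scores, outcomes) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one noncase")

    concordant = 0.0
    for case_score, noncase_score in product(cases, noncases):
        if case_score > noncase_score:
            concordant += 1.0
        elif case_score == noncase_score:
            concordant += 0.5
    return concordant / (len(cases) * len(noncases))


# Hypothetical predicted risks and observed outcomes (1 = developed the outcome).
example_scores = [0.02, 0.10, 0.35, 0.05, 0.60, 0.08]
example_outcomes = [0, 0, 1, 0, 1, 1]
example_c = c_statistic(example_scores, example_outcomes)
print(f"C = {example_c:.2f}")
```

In this toy dataset, 8 of the 9 case/noncase pairs are ranked correctly. A value of 0.5 would correspond to ranking no better than chance.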

Calibration
Clinical prediction models should not only discriminate well but also provide accurate risk estimates, which are assessed by calibration. Calibration assesses the relationship between predicted probabilities and observed risk proportions (89). Miscalibrated models result in over- or underestimation of risk, i.e., a model may discriminate well between those with and without the outcome, but probabilistic estimates of absolute risk may be systematically off-target, which is important if this is being communicated to the patient or used for the clinical decision. A miscalibrated model can lead to patients being misinformed about their true risk and may also have treatment implications: a patient may be recommended an unnecessary intervention (overestimation) or not receive care that is needed (underestimation) (89). Therefore, calibration is essential to prevent potential harm caused by the overestimation or underestimation of risk.
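
The comparison of predicted probabilities with observed risk proportions can be sketched in a few lines of Python. The example below is a simplified illustration rather than the method used in any of the cited studies: it groups patients by predicted risk, compares the mean predicted risk with the observed outcome proportion in each group, and reports an overall observed/expected ratio. The function names, the number of groups, and the data are assumptions made for illustration.

```python
def calibration_table(predicted, observed, n_bins=4):
    """Compare mean predicted risk with observed outcome proportion per group.

    Patients are sorted by predicted risk and split into n_bins groups of
    roughly equal size. Returns (mean_predicted, observed_rate, group_size)
    tuples, ordered from lowest to highest predicted risk.
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_predicted, observed_rate, len(chunk)))
    return rows


def observed_expected_ratio(predicted, observed):
    """Ratio above 1 suggests underestimation of risk; below 1, overestimation."""
    return sum(observed) / sum(predicted)


# Hypothetical predicted risks and outcomes (1 = outcome occurred).
example_predicted = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4]
example_observed = [0, 0, 0, 1, 0, 1, 1, 1]

example_rows = calibration_table(example_predicted, example_observed)
example_oe = observed_expected_ratio(example_predicted, example_observed)
for mean_predicted, observed_rate, size in example_rows:
    print(f"predicted {mean_predicted:.2f}  observed {observed_rate:.2f}  n={size}")
print(f"O/E = {example_oe:.2f}")
```

In this toy dataset the observed/expected ratio is about 2, meaning that the hypothetical model underestimates risk overall, which is the situation in which patients may not receive care that is needed.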

Clinical Utility
Measured by net benefit. Allows us to weigh the benefits and costs associated with using the model. Net benefit is compared to reference strategies (e.g., treat all, treat none, or the current gold-standard approach) across different assumptions of the number needed to test (e.g., a number needed to test/treat of 10 equates to an odds of 1:9, indicating that missing the outcome of interest once is 9 times worse than an unnecessary intervention) (22,23). Therefore, it considers a range of preferences for whether you are more worried about missing the outcome or giving an unnecessary intervention to evaluate potential clinical benefit.

The basic version of this model is freely available online at http://psychosis-risk.net, and its use as part of a digital platform is shown here (29).

USE CASE 2: EVALUATING HOSPITALIZATION RISK FOLLOWING DISCHARGE FROM EARLY INTERVENTION IN PSYCHOSIS SERVICES

Problem
Early intervention in psychosis (EIP) services typically provide treatment for 2 to 3 years (37). Clinicians may then decide to discharge individuals to primary care or to a generic mental health team. In the 2 years following discharge, one-third of individuals discharged to primary care are subsequently referred to generic mental health teams, and 12% will be hospitalized (38). During this same period, 35% of those discharged directly to generic mental health teams are hospitalized (38). Both clinicians and people with psychosis have raised concerns about the unpredictability of outcomes after the completion of EIP care (39).

Approach
A clinical prediction model was developed (Oxford Health NHS Foundation Trust; n = 831; 79 events) and externally validated (West London NHS Foundation Trust; n = 1393; 162 events) to predict admission to an inpatient psychiatric unit within 12 months of discharge from EIP services (40). The primary outcome was hospitalization within 12 months of discharge. The model was developed using logistic regression analysis, including 8 literature-based predictors (age at discharge, gender, ethnicity, social deprivation, diagnosis prior to discharge, duration of EIP care, number of previous admissions to a psychiatric hospital at discharge, and having a diagnosis of a substance use disorder).
The model was internally validated through bootstrapping, with a resulting discrimination of C = 0.76. Discrimination declined slightly (C = 0.70) in external validation, and calibration was similar to the development dataset, with underestimation of risk at lower observed risk (5%-10%) and overestimation in the low-to-mid observed risk range (10%-20%). This suggests that the model may be generalizable, at least to other EIP services in England, although it may be slightly overfitted due to the limited number of admission events and small sample size in the derivation dataset. Decision curve analysis demonstrated a net benefit of using the prediction model over treating all, treating none, and clinician discretion for a range of numbers needed to test between 2 and 5.
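
The kind of comparison made in a decision curve analysis can be sketched as follows. This is a minimal illustration that assumes the standard decision curve definition of net benefit (true positives minus false positives weighted by the odds of the threshold probability, both divided by the sample size) and the conversion described in Box 2, in which a number needed to test of 10 corresponds to odds of 1:9. The paper itself does not give the formula, and the function names and data below are hypothetical.

```python
def threshold_from_number_needed(number_needed):
    """Convert a number needed to test/treat into a threshold probability.

    A number needed of 10 corresponds to odds of 1:9, i.e., a threshold of 0.10.
    """
    if number_needed <= 1:
        raise ValueError("number needed must be greater than 1")
    return 1.0 / number_needed


def net_benefit(predicted, observed, threshold):
    """Net benefit of intervening on everyone with predicted risk >= threshold."""
    n = len(observed)
    true_pos = sum(1 for p, y in zip(predicted, observed) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted, observed) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # odds of the threshold probability
    return true_pos / n - weight * false_pos / n


def net_benefit_treat_all(observed, threshold):
    """Net benefit of the 'treat all' reference strategy ('treat none' is 0)."""
    prevalence = sum(observed) / len(observed)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical predicted risks and outcomes (1 = outcome occurred).
example_predicted = [0.02, 0.05, 0.08, 0.12, 0.20, 0.30, 0.40, 0.60, 0.03, 0.15]
example_observed = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]

example_threshold = threshold_from_number_needed(5)  # odds of 1:4
nb_model = net_benefit(example_predicted, example_observed, example_threshold)
nb_all = net_benefit_treat_all(example_observed, example_threshold)
print(f"model: {nb_model:.3f}  treat all: {nb_all:.3f}  treat none: 0.000")
```

Repeating the calculation over a range of numbers needed to test produces a decision curve. In this toy dataset, the hypothetical model has a higher net benefit than either reference strategy at a number needed to test of 5.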

Expected Benefits and Remaining Challenges for Clinical Implementation
Although the feasibility of implementing this model clinically has yet to be assessed, by providing an estimate of the level of risk for relapse, it could be used to inform decisions about whether an individual's subsequent management is likely to require a mental health team as opposed to management in primary care. Those at greatest risk could then be stratified to more intensive follow-up from mental health services, while those at lower risk could be offered monitoring in primary care, thereby better targeting resources. EIP services save £4075 per person through avoiding hospitalization (41), which could potentially be extended through the use of this model. Implementation in new settings could be challenging. As noted above, the sample size and event number are relatively low, meaning that model performance and stability may not be optimal. Collaboration across EIP services may refine the model for future implementation. Moreover, due to differences in service configuration, this model may not generalize to international settings.

USE CASE 3: ASSESSING CARDIOMETABOLIC RISK FOLLOWING A FIRST EPISODE OF PSYCHOSIS

Problem
Cardiometabolic disorders, such as type 2 diabetes and cardiovascular disease, are highly prevalent in people with psychotic disorders (42,43) and contribute to a reduced life expectancy of 10 to 15 years compared with the general population (44). The average age of psychosis onset is 20.5 years (45). By the time people with psychosis are in their 40s, up to 15% will have already died, mostly as a result of comorbid physical illness (46). It is possible to detect liability to cardiometabolic disorders early through metabolic syndrome, a group of traits including altered glucose-insulin homeostasis, adiposity, and hypertension. Metabolic syndrome is similarly highly prevalent in young people with psychosis (47), with traits emerging at least from psychosis onset and exacerbated by antipsychotic medications (48,49). Most of the existing clinical prediction models for predicting poor cardiometabolic outcomes were originally developed for use in the general population, in whom cardiometabolic dysfunction typically emerges in middle to older adulthood (50).

This high discrimination performance has been maintained in subsequent external validations in Switzerland (C = 0.73; n = 558; 103 events) and Spain (C = 0.72; n = 466; 66 events) (52), highlighting the potential for international transportability. There was some evidence of miscalibration with overprediction (Switzerland) and underprediction (Spain) in higher predicted probabilities. A decision curve analysis across all validations indicated that an additional 30% to 46% of metabolic syndrome cases could be detected through the use of PsyMetRiC (51,52).

Expected Benefits and Remaining Challenges for Clinical Implementation
EIP services could use PsyMetRiC to identify individuals who are at particular risk of developing cardiometabolic morbidity as a result of their psychotic disorder and its treatment. Currently, annual costs for physical morbidity in psychotic disorders are around £2413 per person, which represents more than half of the total amount that the NHS spends per person on psychotic disorders (53). Therefore, the use of PsyMetRiC could substantially reduce associated costs for the treatment of psychotic disorders. The use of NLP to capture predictor data, such as smoking status, from clinical notes may help automate screening procedures. Clinical measures designed to minimize physical morbidity, such as interventions that target smoking, alcohol use, diet and exercise, and the selection of medications that are not strongly associated with metabolic side effects, could then be preferentially offered to this subgroup. Larger datasets could improve the performance of PsyMetRiC by enabling additional predictors (e.g., diet and other lifestyle behaviors), refinement of existing predictors (e.g., more granular representation of ethnicity), and/or the development of more sophisticated modeling strategies (e.g., to account for antipsychotic switching early in treatment).

USE CASE 4: SCREENING FOR SUICIDE RISK IN SEVERE MENTAL DISORDERS

Problem
The risk of suicide in severe mental disorders is high, approximately 17 to 20 times higher in people with schizophrenia (54) or bipolar disorder (55) than in the general population. Therefore, an accurate assessment of suicide risk is an important part of routine clinical care (56-58). Such assessments can form a valuable component of initial assessment upon service entry, identifying potentially modifiable factors and providing guidance for more intensive interventions for patients at higher risk (59,60). Despite this, there are no specific clinical prediction models for patients with severe mental disorders (61).

Approach
The Oxford Mental Illness and Suicide model (OxMIS) was developed using linked Swedish registry and EHR data (n = 58,771; 494 events) to estimate the 1-year risk of suicide in individuals with schizophrenia spectrum or bipolar disorder (62). It uses multiple sociodemographic and clinical predictors, including male sex, age, previous violent crime, previous drug use, previous alcohol use, previous self-harm, education level, parental drug or alcohol use, parental suicide, recent antipsychotic treatment, recent antidepressant treatment, current inpatient status, length of first inpatient stay >7 days, number of previous episodes >7, receiving benefits, parental psychiatric hospitalization, and comorbid depression (62). Discrimination performance was demonstrated in 2 external validations in Sweden (C = 0.71, n = 16,387, 139 events) (62) and Finland (C = 0.70, n = 137,112, 1475 events) (63). Calibration was generally adequate (62,63), with some evidence of overestimation of risk with predicted probabilities >5%, which applied to a very small proportion of the sample (1.3%) (63). However, this has been mitigated by setting 5% as the maximum possible risk level communicated with the screening tool (63).
The clinical feasibility of the model was explored in a study involving 38 clinicians in Spain (Barcelona and Sevilla) and China (Changsha). Clinicians stated that the model would be practical as part of a suicide risk assessment or treatment plan in 93% of cases, with 89% of clinicians stating that they would consider using it in the future (64). Half of these clinicians rated OxMIS as providing an accurate representation of suicide risk (64). However, no actual suicide data were recorded in this study, and therefore, it was not possible to compare this estimate with the true incidence, and an optimism bias is to be expected. A recent systematic review reported that unstructured clinical approaches were associated with a sensitivity of 31% for future suicidal acts, meaning that there are a high number of false negatives (individuals considered to not be at risk who later die by suicide) (65). OxMIS has higher sensitivities at 55% and 59% and a 0.5% false negative rate in external validations. Limiting false negatives ensures that all patients receive the relevant care they need.
Using OxMIS has been estimated to result in an overall saving of £250 to £599 per person with severe mental illness screened compared with a clinical assessment alone, with £662 per person saved by specifically excluding false negatives (66). These cost savings may increase further with automated predictor retrieval using NLP, which is feasible and reduces the need for manual entry of predictor data from clinical notes (67).

Expected Benefits and Remaining Challenges for Clinical Implementation
OxMIS can be used as part of a clinical suicide risk assessment. It could be particularly useful with people who have presented with psychosis for the first time because the risk of suicide at this stage is especially high (68). This could facilitate the early recognition of suicide risk, underscoring safety planning and tailoring of clinical management accordingly to minimize risk in vulnerable subgroups and guide resource allocation in services by excluding individuals who are at low risk (60). It can also provide an opportunity to transparently discuss suicide risks with patients and their family and/or caregivers. Developing a practical framework for interpreting OxMIS scores will require clinicians to interpret probability scores, as they do with QRISK and Framingham scores for cardiovascular risk and 5-year survival rates in people with newly diagnosed cancer. It will also require clear linkage to additional preventive measures, which will depend on effectiveness and service capacity. Finally, a checklist approach to risk assessment needs to be avoided, with OxMIS instead being part of a range of measures to augment clinical decision making.
OxMIS is freely available as a web tool at http://oxrisk.com/oxmis.

DISCUSSION
These 4 use cases illustrate how EHR data can facilitate the precision psychiatry approach. Although our examples are varied in the populations of interest and outcomes predicted, they demonstrate good discrimination and calibration performance, as well as evidence of potential transportability through external validation, and have shown potential clinical utility (Table 1).
While a model's predictive performance is an important metric when evaluating its potential utility, a model with suboptimal performance at the individual level may still provide a net benefit on a population level over standard care, depending on the nature of the clinical scenario. For example, an algorithm for selecting patients for clozapine treatment from among those who had not responded to initial antipsychotic treatment with relatively low individual-level performance still resulted in 0.10 more quality-adjusted life-years and saved £7363 per person compared with treatment as usual on a population level (69). Such a model can continue to be refined and improved postimplementation (8), but if it performs too poorly, it can erode clinician confidence and obstruct effective implementation (28). Therefore, evidence of clinical utility is essential when considering a model for clinical use, in addition to discrimination and calibration (70).
The application of prediction models to EHR data may also facilitate recruitment to prospective studies and clinical trials by enabling prescreening for participants at scale and identifying individuals who are likely to receive greater benefit from an intervention being trialed. For example, to evaluate whether an intervention with CHR-P individuals reduces the risk of transition to psychosis, the sample recruited must subsequently yield a subgroup of participants that develops psychosis that is large enough to detect an effect. Using an EHR-based clinical prediction model to identify CHR-P individuals who have a greater risk of transition could allow the enrollment of a sample enriched for psychosis risk, thus reducing the sample size required. EHR data can also be useful in mitigating the effects of participants dropping out of prospective studies or clinical trials. If a participant is no longer available for a follow-up assessment, information about their clinical outcomes may still be accessible from their EHR if they have been in contact with clinical services (71). To facilitate this, pseudoanonymized trial participant IDs would need to be linked with local or national EHR IDs.
Ensuring that the use of clinical prediction models provides vulnerable subpopulations with equitable care is an important consideration, particularly because EHR data reflect the underlying biases inherent in the health care system (72,73). Individuals from vulnerable subpopulations (e.g., ethnic minorities; lesbian, gay, bisexual, and transgender individuals) may be underrepresented in training samples, which means that these clinical prediction models may make less accurate predictions in these groups and potentially entrench existing biases and unfairness in health care (74,75). Testing the performance of clinical prediction models across vulnerable subpopulations should be considered to identify any weaknesses in the model, which should be taken into account when considering any prospective clinical use (76), and studies are aligning with recommendations from the new STANDING Together (https://www.datadiversity.org/recommendations) collaboration for data diversity. Similarly, generalizability of model performance needs to be tested in different settings. External validation studies have rarely been conducted in psychiatry (5% of all developed models), with international external validation studies being even rarer and those performed in the Global South even rarer still (10). None of these models have been externally validated in the Global South, although work is ongoing.
Several barriers to the implementation of precision psychiatry using EHRs remain. First, EHR systems are not interoperable (77,78); data from one clinical service often have a different structure and coding than data from another service, particularly if there are different EHR providers (79). Countries with nationalized health systems have an advantage when considering large-scale implementation of clinical prediction models because the structure and coding of their EHRs are more likely to be similar across sites. For example, it is feasible to attain coverage of 96% of the population with EHR data for research in the United Kingdom (80). Furthermore, research measures, such as symptom severity or standardized outcome measures, are not consistently incorporated in EHRs (81), which limits the data available to use as predictors in models. There are ongoing programs that are seeking to address these issues through co-designed, integrated EHR and clinical decision support systems (82). Second, information governance and cybersecurity regulations are important, but they can be complex, and again, these typically differ between sites. Even across Europe, while the General Data Protection Regulation is commonly used, its interpretation varies across countries (83). It is therefore essential to have local support at sites to champion implementation. Third, while NLP models can improve the performance or feasibility of automated screening, they are not 100% accurate, even with the most advanced models. This can lead to inconsistencies in NLP algorithms for the same concept at different centers, which adds noise to validation studies. Furthermore, there is a language bias because NLP algorithms are language specific and require additional work to allow international validation and implementation. Finally, as modeling methodologies evolve, model performance may increase, but so will complexity. This has implications for the transparency and interpretability of risk estimates (75) as well as required computing power, which could affect implementation.

FUTURE DIRECTIONS
Clinical prediction models may find innovative ways to use EHR data. Combining data from other sources (e.g., meta-analyses) or enhancing EHR data with information from research studies (clinical and biomarker data) are 2 approaches that could enhance our ability to improve the performance of clinical prediction models.
For example, PETRUSHKA (Personalise Antidepressant Treatment for Unipolar Depression Combining Individual Choices, Risks and Big Data) (84) aims to develop a model to predict the efficacy, acceptability, and tolerability of individual antidepressants, combining patients' preferences on side effects with individual-level patient data from both randomized controlled trials from a previously published network meta-analysis (85) and EHRs from U.K. primary care using a meta-learner approach (https://www.psych.ox.ac.uk/research/evidence-based-mental-health/petrushka-trial) (86).
Similarly, the Baseline Biomarker Check study aims to incorporate standardized clinical and cognitive assessments from patients with psychosis, along with imaging and peripheral blood measures, into their EHRs. Complementing existing clinical data with additional measures may extend the use of EHR data to improve prediction of clinical outcomes, such as the use of genetic data in EHR-based prediction models in oncology (87).

CONCLUSIONS
EHRs offer a convenient platform that provides the large-scale data required to develop and validate clinical prediction models, as well as the opportunity to implement them in situ and inform real-world clinical decision making. There are already several clinical prediction models that have shown good performance in this context and are well positioned for implementation and improving mental health care in the immediate future.

Box 1. Strengths and Limitations of Using EHR Data for Clinical Prediction Models

Strengths

Real-World Data
Data are representative of all patients in the local setting and their ongoing care, not just those who are well enough to attend potentially burdensome research assessments.

Box 2. Definitions of Key Considerations for Clinical Prediction Model Performance: Discrimination, Calibration, and Clinical Utility

Discrimination
Ability to separate individuals with and without the outcome.
estimates dynamically in real-time as new clinical information on symptoms and substance use is recorded.