Risk prediction of mortality for patients with heart failure in England: observational study in primary care

Abstract
Aims: Many risk prediction models have been proposed for heart failure (HF), but few studies have used only information available to general practitioners (GPs) in primary care electronic health records (EHRs). We describe the predictors and performance of models built from GP-based EHRs in two cohorts of patients 10 years apart.
Methods and results: Linked primary and secondary care data for incident HF cases in England were extracted from the Clinical Practice Research Datalink for 2001-02 and 2011-12. Time-to-event models for all-cause mortality were developed using a long list of potential baseline predictors. Discrimination and calibration were calculated. A total of 5966 patients in 156 general practices were diagnosed in 2001-02, and 12 827 patients in 331 practices were diagnosed in 2011-12. The 5-year survival rate was 40.0% in 2001-02 and 40.2% in 2011-12, though the latter population were older, frailer, and more comorbid; for 2001-02, the 10-year survival was 20.8% and the 15-year survival 11.1%. Consistent predictors included age, male sex, systolic blood pressure, body mass index, GP domiciliary visits before diagnosis, and some comorbidities. Model performance for both time windows was modest (c = 0.70), but calibration was generally excellent in both time periods.
Conclusions: Information routinely available to UK GPs at the time of diagnosis of HF gives only modest predictive accuracy for all-cause mortality, making it hard to decide on the type, place, and urgency of follow-up. More consistent recording of data relevant to HF (such as echocardiography and natriuretic peptide results) in GP EHRs is needed to support accurate prediction of healthcare needs in individuals with HF.


Introduction
Heart failure (HF) is a complex condition that affects around 900 000 people in the United Kingdom and more than 26 million people worldwide. 1 Its prevalence is growing 2 ; in the United Kingdom, the numbers of incident and prevalent cases are rising, 3 placing increasing pressure on health services. One way to offset the impact of HF on healthcare resource use could be to use a risk prediction model. 4 HF is highly heterogeneous in its presentation and prognosis; most cases in England are now diagnosed via an emergency hospitalization, and a notable proportion of these admissions end in death. 5 A better understanding of the risk of poor outcomes is likely to help identify high-risk patients early, providing the opportunity for more timely and effective management and better targeting of costly treatments. 6-8 This could avoid healthcare costs and improve shared decision-making between patients and clinicians.
Most HF patients in the United Kingdom are managed in primary care by general practitioners (GPs). However, limited access to investigations and specialist assessment, together with overlap with other clinical conditions, means that GPs face diagnostic uncertainty for patients who they suspect have HF, which can make it difficult to choose and initiate treatment. 9-11 Many publications have proposed models for predicting death or other outcomes in different subgroups of HF patients, but substantial variation in statistical methodology and a lack of robust testing for bias mean that no single risk model has been adopted. 7,12 A robust risk model should address both discrimination and calibration; calibration is especially important because it reflects how well a model estimates absolute risk in groups of patients. Risk models are based on population averages and can therefore be less accurate for some patient subgroups or outliers. 13 As a result, it can be challenging for clinicians to decide in which patients to use a risk model, because applying a risk estimate to an individual whose risk the model does not reflect accurately could lead to suboptimal treatment or even death. 13,14 Despite this, few publications on risk models report model calibration or the handling of missing data. 6,7,12 A recent systematic review of HF risk models for outcomes including death in out-of-hospital settings found that most published models had a moderate or high risk of bias, which could lead to misallocation of scarce resources. 12 Di Tanna et al. applied the PROBAST tool to critically appraise each study's participants, predictors, outcome, and analysis. 12,15 Their most common concerns were a risk of overfitting and absent or limited validation. They called for more research into risk estimation in primary care and in high-risk or secondary prevention settings.
Most HF risk models are derived from trial data on hospitalized patients, which may not be representative of the wider HF population, particularly those aged over 75 years. 16,17 Most 'real-world' HF patients are elderly, have multiple comorbidities, and are managed in primary care, 16 so real-world datasets may provide more generalizable alternatives 18 ; primary care datasets have the advantage of representing the large majority of patients managed in this setting. Additionally, younger HF patients could benefit from risk models of longer term survival, beyond the commonly reported 1- or 5-year horizons, to help clinicians and patients make the required lifestyle decisions. 12 With the growing mismatch between demand and supply for diagnostic testing, specialist input, and procedures in many healthcare systems since the pandemic, tools that help healthcare professionals estimate risk could be important. This study aims to estimate all-cause mortality risk for up to 10 years after diagnosis, and to identify its predictors, in two UK community-dwelling cohorts diagnosed 10 years apart, to better understand the extent to which electronic health records (EHRs) can guide GPs and HF patients in shared decision-making and to further the debate about the value (or lack thereof) of such approaches. We note which predictors were consistent across models.

Data
The Clinical Practice Research Datalink (CPRD) is a database of pseudonymized electronic records from about 12% of UK general practices, covering 1987 to the present, and is considered broadly representative of the UK population. 19 For English practices, primary care records are linked nationally to hospital admissions [Hospital Episode Statistics (HES)] and to the death registry (Office for National Statistics).

Cohort definition
A proxy HF diagnosis date was defined as the earliest mention of an HF clinical code (see Supporting Information, Appendix S1) in either primary or secondary care data. Patients were included if this date fell in calendar years 2001-02 ('Cohort 1') or 2011-12 ('Cohort 2'). We defined 'diagnosis setting' according to whether the first HF record was in primary or secondary care. As in our previous work with CPRD, 5 we ran sensitivity analyses using the date of the first GP prescription of a loop diuretic as the HF diagnosis date, recognizing that some GPs treat the symptoms of patients with suspected HF before or while formally investigating for HF.
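The cohort definition above can be sketched as follows. This is a minimal illustration, not the study's Stata code: the `HFRecord` layout and the function names are hypothetical, and so is the tie-breaking rule (when a primary and a secondary care record share the earliest date, the first one encountered is kept), which the text does not specify.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class HFRecord:
    """One HF-coded event for a patient (hypothetical record layout)."""
    patient_id: str
    event_date: date
    source: str  # "primary" (CPRD) or "secondary" (HES)


def index_dates(records: Iterable[HFRecord]) -> dict[str, tuple[date, str]]:
    """Earliest HF record per patient, plus the setting it came from.

    Assumption: on a same-day tie, the record seen first is kept.
    """
    first: dict[str, tuple[date, str]] = {}
    for rec in records:
        current = first.get(rec.patient_id)
        if current is None or rec.event_date < current[0]:
            first[rec.patient_id] = (rec.event_date, rec.source)
    return first


def cohort_for(diagnosis_date: date) -> Optional[str]:
    """Map a proxy diagnosis date to a study cohort, or None if outside both windows."""
    if diagnosis_date.year in (2001, 2002):
        return "Cohort 1"
    if diagnosis_date.year in (2011, 2012):
        return "Cohort 2"
    return None
```

Passing loop diuretic prescription dates instead of HF-coded records gives the index date for the sensitivity analysis. Because the earliest record defines the date, a patient whose first HF record predates 2001 falls outside both cohorts, which keeps the cohorts restricted to incident cases.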

Predictors
A long list of potential predictors was derived from a literature review, taking into account which items were recorded with at least 1% prevalence in CPRD. The predictors included in all models compared in the test set were as follows. Gender and baseline age were modelled as a pair of cubic splines in age, one for each gender, with reference points at ages 50, 60, 70, 80, and 90. The other predictors were body mass index (BMI) (kg/m²), systolic blood pressure (BP) (mmHg), and the electronic frailty index, 20 based on polypharmacy in the previous year and other deficits over the previous 5 years (modelled as cubic splines interacting with age); the 2010 Index of Multiple Deprivation (IMD, an area-level socio-economic status measure) twentile (modelled as an additive cubic spline); cigarette smoking category (lifelong non-smoker, ex-smoker, <10 cigarettes/day, 10-19 cigarettes/day, or 20+ cigarettes/day); diabetes status (none, Type I, or Type II); ethnicity from HES (White, non-White, or unknown); and a list of binary predictors. The binary predictors were defined using CPRD records in the previous 5 years, CPRD records in the previous year, HES records in the previous year, or combined CPRD and HES records in the previous 5 years. The CPRD binary predictors measured over the previous 5 years were as follows: comorbidities (atrial fibrillation, arrhythmia other than atrial fibrillation, hypertension, renal diseases, myocarditis, acute myocardial infarction, congenital heart disease, coronary heart disease, chronic pulmonary disease, stroke, and peripheral vascular disease); widowed or bereaved; and recorded HF symptoms (breathlessness/shortness of breath/shortness of breath on exertion, fatigue, and ankle swelling). The CPRD binary predictors
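Each binary predictor records whether a relevant code appears in a lookback window before the diagnosis date. A minimal sketch, assuming a year is approximated as 365.25 days and that records dated on the diagnosis day itself are excluded (neither detail is stated in the text); the keys in `WINDOWS` are hypothetical labels for the four source and window combinations described above.

```python
from datetime import date, timedelta
from typing import Iterable

# Hypothetical labels: (data sources pooled, lookback length in years).
WINDOWS = {
    "cprd_5y": (("cprd",), 5),
    "cprd_1y": (("cprd",), 1),
    "hes_1y": (("hes",), 1),
    "cprd_hes_5y": (("cprd", "hes"), 5),
}


def had_record_in_window(record_dates: Iterable[date], diagnosis_date: date, years: int) -> bool:
    """True if any record falls in [diagnosis_date - years, diagnosis_date).

    Assumption: a year is approximated as 365.25 days, rounded to whole days.
    """
    start = diagnosis_date - timedelta(days=round(365.25 * years))
    return any(start <= d < diagnosis_date for d in record_dates)


def binary_flag(records_by_source: dict[str, list[date]], diagnosis_date: date, window: str) -> int:
    """0/1 predictor for one condition, pooling the sources named by `window`."""
    sources, years = WINDOWS[window]
    dates = [d for source in sources for d in records_by_source.get(source, [])]
    return int(had_record_in_window(dates, diagnosis_date, years))
```

Keeping the window definitions in one table makes it easy to apply the same condition code list under each of the four lookback rules.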

Statistical analysis
General practices were split randomly and equally into training and test sets, each with 174 practices. We fitted models in the training set and compared their performance (in terms of discrimination and calibration) in the test set. A series of survival regression models were fitted to the training data and used to predict k-year survival probabilities, for k from 1 to 10, from a list of baseline covariates available at the time of diagnosis. Models were Gompertz (with intercept parameters equal to baseline hazard rates, effect parameters equal to hazard ratios, and a per-year exponential hazard rate trend parameter), Weibull (with intercept parameters equal to Year 1 death rates, effect parameters equal to hazard ratios, and a time power hazard rate trend parameter), or Cox (with no intercept and an arbitrary baseline hazard). We used cubic reference splines for continuous covariates, defined using the 'polyspline' Stata add-on package, which implements the methods of Newson, 21 and level indicators for binary and other categorical factors. For continuous variables, missing BMI and systolic BP values were imputed from a cubic spline model in gender and age, and missing IMD twentiles were set to twentile 10. This practice is similar to that used by QRISK 22 and is much less computationally intensive than multiple imputation. For model discrimination, Harrell's c indices of predicted Year 1 survival probability with respect to observed survival were estimated using the 'somersd' add-on package (Newson 23 ), with confidence intervals (CIs) and P values clustered by general practice. Calibration was measured using decile plots of k-year survival probability for k from 1 to 10. For each model, we divided the whole cohort, and the sub-cohorts for the submodels, into deciles of k-year survival probability.
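The k-year survival probabilities implied by the two parametric models can be sketched as below. This assumes the standard proportional-hazards forms, with hazard h(t) = λ·exp(γt)·HR for Gompertz and h(t) = λ·p·t^(p-1)·HR for Weibull, where HR = exp(x'β). Under this Weibull form, λ is the cumulative hazard at the end of Year 1 for a patient at the reference covariate values, which is one reading of "Year 1 death rates". The function and argument names are hypothetical.

```python
import math


def hazard_ratio(coefs: dict[str, float], covariates: dict[str, float]) -> float:
    """HR = exp(x'beta); covariates absent from `covariates` are taken as 0 (reference)."""
    return math.exp(sum(b * covariates.get(name, 0.0) for name, b in coefs.items()))


def gompertz_survival(t: float, lam: float, gamma: float, hr: float = 1.0) -> float:
    """S(t) for h(t) = hr * lam * exp(gamma * t); lam is the baseline hazard at t = 0."""
    if abs(gamma) < 1e-12:
        cum_hazard = lam * t  # gamma -> 0 limit: constant hazard
    else:
        cum_hazard = lam * (math.exp(gamma * t) - 1.0) / gamma
    return math.exp(-hr * cum_hazard)


def weibull_survival(t: float, lam: float, p: float, hr: float = 1.0) -> float:
    """S(t) for h(t) = hr * lam * p * t**(p - 1); lam is the cumulative hazard at t = 1."""
    return math.exp(-hr * lam * t ** p)


def k_year_survival(survival_fn, years=range(1, 11), **params) -> list[float]:
    """Predicted survival at each whole year k (k = 1..10 by default)."""
    return [survival_fn(k, **params) for k in years]
```

In both models S(t | x) = S0(t)^HR, so covariates shift the whole curve while the trend parameter (γ or p) controls how the hazard changes with time since diagnosis.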
For each decile, we defined the predicted k-year survival probability as the mean predicted k-year survival probability from that model within the decile, and estimated the observed k-year survival probability using the Kaplan-Meier curve for that decile, with normal-theory bootstrap CIs, using the conventional normalizing complementary log-log transform and clustered by general practice. The predicted k-year survival probabilities, and the observed k-year survival probabilities with confidence limits, were plotted against risk decile to form a decile plot. Stata Version 16 was used throughout.
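The table behind a decile plot can be sketched with NumPy as below. Patients are ranked by predicted k-year survival and split into ten equal-sized groups, with group 1 holding the highest predicted survival (lowest risk). Each group's mean prediction is then set against its Kaplan-Meier estimate at year k. This is a simplified illustration: the practice-clustered bootstrap CIs on the complementary log-log scale are omitted, and the function names are hypothetical.

```python
import numpy as np


def km_survival_at(time, event, t):
    """Kaplan-Meier estimate of S(t); event = 1 for death, 0 for censored."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    surv = 1.0
    # Multiply the conditional survival at each distinct death time up to t.
    for u in np.unique(time[(event == 1) & (time <= t)]):
        at_risk = np.sum(time >= u)
        deaths = np.sum((time == u) & (event == 1))
        surv *= 1.0 - deaths / at_risk
    return float(surv)


def calibration_table(pred_surv_k, time, event, k, n_groups=10):
    """(group, mean predicted, KM observed) k-year survival for each risk group.

    Group 1 has the highest predicted survival (lowest risk). Groups are
    equal-count splits of the ranked patients, so tied predictions may
    straddle two groups.
    """
    pred = np.asarray(pred_surv_k, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    order = np.argsort(-pred, kind="stable")
    table = []
    for group, idx in enumerate(np.array_split(order, n_groups), start=1):
        table.append((group, float(pred[idx].mean()), km_survival_at(time[idx], event[idx], k)))
    return table
```

Using Kaplan-Meier within each group, rather than the raw proportion alive at year k, keeps patients censored before year k from being counted as survivors.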

Results
Heart failure diagnostic codes were identified for 5966 patients in 156 practices in Cohort 1 and for 12 827 patients in 331 practices in Cohort 2. The first HF record was either in primary care (CPRD; 6964 patients in 342 practices) or in hospital (HES; 11 829 patients in 347 practices).
Tables 1A-1C show that, compared with Cohort 1 patients, Cohort 2 patients included a greater proportion aged 85+, had lower BP and cholesterol and greater comorbidity (especially atrial fibrillation and diabetes; estimated glomerular filtration rate and renal disease were poorly recorded in Cohort 1), and were more likely to live alone. They also had greater use of beta-blockers and RAS inhibitors. Inspection of standard errors and the usual diagnostic tests did not indicate multicollinearity. Model performance in the test set for each cohort was at best fair (c = 0.70 overall, with some variation by cohort and diagnosis setting; Table 2). We also examined whether discrimination differed for shorter follow-up after diagnosis (within 3, 6, and 12 months) compared with 60 months; it did not (differences in c statistics were <0.01; data not shown).
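Harrell's c, and the check at shorter follow-up, can be illustrated with the simple pairwise definition below, which optionally censors all follow-up at a horizon (for example 0.25, 0.5, or 1 year). It is a sketch rather than the 'somersd' calculation used in the study: it omits the practice-clustered CIs, treats pairs with tied follow-up times as non-comparable, and expects a risk score in which higher values mean higher predicted risk (for instance, 1 minus predicted Year 1 survival).

```python
import numpy as np


def harrell_c(time, event, risk_score, horizon=None):
    """Harrell's concordance index for right-censored data (O(n^2) sketch)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    score = np.asarray(risk_score, dtype=float)
    if horizon is not None:
        # Administrative censoring at the horizon: deaths after it become censored.
        event = np.where(time > horizon, 0, event)
        time = np.minimum(time, horizon)
    concordant = 0.0
    comparable = 0
    for i in np.flatnonzero(event == 1):
        # Patient i died; j is comparable if still under follow-up after i's death.
        later = time > time[i]
        comparable += int(later.sum())
        concordant += float(np.sum(score[i] > score[later]))
        concordant += 0.5 * float(np.sum(score[i] == score[later]))  # tied scores
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of deaths by predicted risk; the c = 0.70 reported here sits between these.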
Calibration was good. Fitting four separate sets of hazard ratios, one for each combination of cohort and diagnosis setting, reduced model performance, and in the test set calibration was better for the Cox model than for the Weibull and Gompertz models. Figure 1 shows the overall test set calibration for the final Cox model; Supporting Information, Appendix S1 gives the corresponding plots for the Weibull and Gompertz models. Calibration was very similar for each cohort and diagnosis setting (not shown). The model showed some overestimation in low-risk patients (risk deciles 1-4) in the first year after diagnosis but otherwise fitted well.
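A Cox model with shared hazard ratios but a separate baseline hazard for each cohort and diagnosis setting predicts survival as S(t | x, s) = exp(-H0_s(t)·exp(x'β)). The sketch below assumes the coefficients β have already been estimated and uses the Breslow estimator for each stratum's baseline cumulative hazard H0_s; the study's Stata implementation may differ, and the stratum labels are hypothetical.

```python
import numpy as np


def breslow_cumhaz(time, event, lin_pred, t):
    """Breslow baseline cumulative hazard H0(t) given fitted linear predictors x'beta."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    risk = np.exp(np.asarray(lin_pred, dtype=float))
    h0 = 0.0
    for u in np.unique(time[(event == 1) & (time <= t)]):
        deaths = np.sum((time == u) & (event == 1))
        h0 += deaths / risk[time >= u].sum()  # deaths / summed risk in the risk set
    return float(h0)


def stratified_cox_survival(time, event, lin_pred, strata, new_lin_pred, new_stratum, t):
    """S(t | x, s) = exp(-H0_s(t) * exp(x'beta)), with H0_s estimated within stratum s only.

    strata: one scalar label per patient, e.g. "C2-secondary" (hypothetical labels).
    """
    strata = np.asarray(strata)
    in_stratum = strata == new_stratum
    h0 = breslow_cumhaz(
        np.asarray(time)[in_stratum],
        np.asarray(event)[in_stratum],
        np.asarray(lin_pred)[in_stratum],
        t,
    )
    return float(np.exp(-h0 * np.exp(new_lin_pred)))
```

Stratifying the baseline lets overall mortality differ between cohorts and diagnosis settings without multiplying the number of hazard ratios by four.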
The full sets of hazard ratios for all-cause mortality since HF diagnosis, split by cohort and diagnosis setting, are given in Supporting Information, Appendix S1. Given the size of these four tables, Table 3 summarizes the statistical significance and effect sizes of the predictors: hazard ratios <0.80 or >1.20 with P < 0.05 are marked 'YY', and hazard ratios closer to 1 with P < 0.05 are marked 'Y'.
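The Table 3 marking rule described above amounts to a small classification function. In this sketch the boundaries follow the strict inequalities in the text (<0.80, >1.20); the table footnote uses ≤0.80 and ≥1.20, so the handling of exact boundary values is an assumption, and the function name is hypothetical.

```python
def table3_marker(hr: float, p_value: float, alpha: float = 0.05) -> str:
    """'YY' if P < alpha and (HR < 0.80 or HR > 1.20); 'Y' if P < alpha otherwise; else ''."""
    if p_value >= alpha:
        return ""  # not statistically significant
    if hr < 0.80 or hr > 1.20:
        return "YY"  # significant and a larger effect size
    return "Y"  # significant but HR closer to 1
```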

Summary of main findings
In our analysis of primary care EHRs linked to administrative hospital data, we found that the best fitting model was a Cox model with baseline hazards stratified by cohort and diagnosis setting. Model discrimination was modest, with a c statistic of around 0.70. Despite some overprediction in low-risk patients in the year after diagnosis, calibration was very good. The 5-year survival was 40% in both cohorts overall. Mortality predictors were largely unchanged over the 10 years separating the two cohorts, with factors such as older age, male gender, higher BP, BMI (higher risk for BMI under 25 in older people), home visits before diagnosis, smoking, and some comorbidities (renal disease, COPD, and peripheral vascular disease) consistent across cohorts and diagnosis settings.

Comparison with previous studies
A 2020 systematic review 12 found 58 risk prediction models for HF patients, with various outcomes: 'the discriminatory ability for predicting all-cause mortality, cardiovascular death, and composite endpoints was generally better than for HF hospitalization. 105 distinct predictor variables were identified. Predictors included in >5 publications were: N-terminal prohormone brain-natriuretic peptide, creatinine, blood urea nitrogen, systolic blood pressure, sodium, NYHA class, left ventricular ejection fraction, heart rate, and characteristics including male sex, diabetes, age, and BMI'. Our data included the last four of these, as well as systolic BP, and we also found them to be consistent predictors. In the all-cause mortality models, discrimination ranged from 0.66 to 0.84, but in the five studies of community-dwelling chronic HF patients outside randomized controlled trials, which are the most relevant to our study, the range was 0.68-0.74, comparable to our models.
The inverse association between use of mineralocorticoid receptor antagonists and outcome has been reported on several occasions in other real-world datasets. 24,25 This is presumably because the use of such agents (much less common than the almost universal use of at least some dose of a RAS inhibitor or beta-blocker) identifies a particularly high-risk subgroup of patients with substantial comorbidity.

Strengths and limitations
Clinical Practice Research Datalink is broadly representative of the general UK community-dwelling population and reflects the data GPs have available to them when making decisions. Its linkage to HES and the death registry means national coverage for those outcomes. CPRD data are entered by GPs during routine consultations and not for the purpose of research, and coding quality is recognized to be variable. 26 However, in a validation study with CPRD in 2001, questionnaires concerning 1200 patients flagged as having HF were sent to GPs to confirm the diagnosis. Of the 1146 returned, the GP did not confirm the diagnosis in only 72 patients. A further 136 reported a history of HF that was not recorded in the computer file. 27 Other CPRD limitations include missing values, which we imputed using splines rather than the more computationally intensive multiple imputation. This was a pragmatic choice given the size of the dataset and the number of predictors, and is similar to the approach of the QRISK algorithm. 22 Most patients had no BNP or HF type recorded; only 19/58 models in the 2020 systematic review reported the HF subtype. 12 The ejection fraction and, for Cohort 2, BNP values were recorded in only a small minority of patients. Ejection fractions are determined via echocardiography in hospital and, to be recorded in primary care data, need to be sent to the practice and then entered and coded correctly by practice staff. It is not clear why, even among patients in Cohort 2 who did receive a BNP test, few had the result recorded in the primary care record in the same way as other blood test results. It seems reasonable to conclude that, although these data may be available to GPs within the EHR (for example, in the body of hospital letters attached to the record), they are likely to be less than immediately visible and therefore inaccessible to automated risk prediction tools or decision support systems.
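For comparison with multiple imputation, the single spline-based imputation can be sketched as a least-squares cubic spline in age, fitted separately for each gender, whose predictions fill the missing values. This is an illustration under stated assumptions, not the study's 'polyspline' implementation: it uses a truncated-power basis with interior knots at 60, 70, and 80 years (the inner reference ages quoted for the age splines), and the function names are hypothetical. Missing IMD twentiles are set to 10, as described in the Statistical analysis section.

```python
import numpy as np

KNOTS = (60.0, 70.0, 80.0)  # assumption: interior knots at the inner reference ages


def cubic_spline_basis(age, knots=KNOTS):
    """Truncated-power cubic spline basis in age (decades, centred at 70 for stability)."""
    a = (np.asarray(age, dtype=float) - 70.0) / 10.0
    ks = [(k - 70.0) / 10.0 for k in knots]
    cols = [np.ones_like(a), a, a**2, a**3] + [np.clip(a - k, 0.0, None) ** 3 for k in ks]
    return np.column_stack(cols)


def spline_impute(values, age, sex):
    """Fill NaNs with predictions from a per-sex least-squares cubic spline in age."""
    values = np.asarray(values, dtype=float).copy()
    age = np.asarray(age, dtype=float)
    sex = np.asarray(sex)
    for s in np.unique(sex):
        in_group = sex == s
        observed = in_group & ~np.isnan(values)
        missing = in_group & np.isnan(values)
        if not missing.any():
            continue
        coef, *_ = np.linalg.lstsq(cubic_spline_basis(age[observed]), values[observed], rcond=None)
        values[missing] = cubic_spline_basis(age[missing]) @ coef
    return values


def impute_imd_twentile(twentiles):
    """Set missing IMD twentiles to the middle value, 10."""
    arr = np.asarray(twentiles, dtype=float)
    return np.where(np.isnan(arr), 10.0, arr)
```

Each missing value is replaced by one fitted value, so, unlike multiple imputation, the uncertainty of the imputation is not carried into the model's standard errors; that is the trade-off behind the lower computational cost.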
We used the first recorded mention of HF as the date of diagnosis. For some GPs, HF may be a working diagnosis, which they may or may not record formally while they carry out confirmatory investigations. We therefore undertook a sensitivity analysis using the date of the first loop diuretic prescription as a proxy for the date on which the GP first suspected HF, before it was confirmed. This did not change the findings meaningfully.
Risk assessment tools need to be updated over time to account for therapeutic advances, but studying longer term prognosis requires a substantial time lag, as here. This inevitably means that covariates measured at baseline (when the GP made the initial management decisions) reflect clinical practice at that time, which in our study was 2001-02 and 2011-12. Despite this, it is notable that overall 5-year mortality and the set of significant predictors changed little over that period. The associations between predictors and the outcome found in our analysis are not necessarily causal.
Our aim was to fit models with information available to clinicians at the point of diagnosis, which precluded the use of time-varying covariates to capture changes after diagnosis. For continuous variables with multiple measurements, we took the measurement just before diagnosis, but other options aim to capture the variation or trends in these variables. These include two-stage approaches, in which a linear mixed model is fitted first and its output (for example, the predicted BP trajectory) is then entered into a survival model, and joint longitudinal and survival models. A review identified six approaches but no consensus on which to use 28 ; machine learning was one and could have been applied to our data and modelling framework instead of Cox regression, for instance using random survival forests. There has been much discussion of machine learning for prediction with EHRs and its potential for superior prediction. 29 We have reported mortality, but poor outcomes other than death also affect patients with HF, and, like everyone else, they value the ability to work, to travel, and to socialize. CPRD lacks this information; we report the risk of emergency hospitalization elsewhere. 30

Conclusions
Electronic health records in primary care can accommodate sophisticated risk prediction algorithms, which are likely to be of significant value in supporting clinicians in identification of high-risk individuals among their case load, offering timely and appropriately targeted management, and providing detailed risk information to patients to support evidence-based shared decisions about care. However, although our models for predicting all-cause mortality calibrate well, with some consistent predictors across our two cohorts, they have modest discrimination using only the information available to GPs at the time of HF diagnosis and initial decision-making. This makes it hard to use these models to support decisions on the type, place, and urgency of management and follow-up. This work highlights the need for better recording of key metrics such as ejection fraction and BNP levels in GP EHRs; consistent coding of these data will support the development of effective prediction models likely to be of significant value in identification of at-risk patients and informing conversations with patients and their families around prognosis, care, and treatment options.   BMI, body mass index; BP, blood pressure; CCS, Clinical Classification Software: these variables refer to pre-diagnosis admissions for the specified diagnoses; COPD, chronic obstructive pulmonary disease; eFI, electronic frailty index; GP, general practitioner; HF, heart failure; HR, hazard ratio; IMD, Index of Multiple Deprivation; STD, sexually transmitted disease; TB, tuberculosis. Key: + = P < 0.05 but 0.80 < HR < 1.20; ++ = P < 0.05 and (HR ≤ 0.80 or HR ≥ 1.20); À = P < 0.05 but 0.80 < HR < 1.00; ÀÀ = P < 0.05 and HR ≤ 0.80; else blank.