Predicting mortality after hospitalisation for COPD using electronic health records

Background: Few prognostic models exist for patients hospitalised with chronic obstructive pulmonary disease (COPD); most are based on small cohorts enrolled by specialists in academic centres. Electronic health records (EHRs) provide an opportunity to develop more representative models, although they may not record some variables used in existing models. Materials and methods: for this retrospective cohort study, using EHRs, we identified 17,973 patients with an unplanned hospitalisation for COPD (in any diagnostic position) in the Glasgow area between 2011 and 2017. Patients with known lung cancer were excluded. EHRs were linked to prior admissions, community prescribing and laboratory data. A pragmatic, parsimonious multivariable model was developed to predict 90-day mortality. Results: we identified 12 variables strongly related to prognosis, including age, sex, length of index hospitalisation stay, prior diagnosis of cancer (excluding lung cancer) or dementia, prescription of oxygen or digoxin, neutrophil/lymphocyte ratio and serum chloride, urea, creatinine and albumin. The model achieved excellent calibration with reasonable discrimination (area under the curve: 0.806; 95% CI: 0.792-0.820). A risk-score was developed and an electronic risk-calculator is provided. Conclusions: a small number of variables, including prescriptions and laboratory data obtained from routine EHRs, predict 90-day mortality after a hospitalisation for COPD. The risk-calculator provided might prove useful for service-evaluation and audit, to guide clinical management and to risk-stratify and select patients to be invited to participate in clinical research.


Introduction
Breathlessness is a cardinal symptom of heart and lung disease that leads to substantial disability and, when severe, may require hospital admission. Causes of breathlessness are diverse, including heart failure, asthma, chronic obstructive pulmonary disease (COPD), pneumonia, pneumothorax, interstitial lung disease and severe acidosis. Many patients will suffer from more than one of these conditions.
One of the most common chronic and recurrently severe causes of breathlessness is COPD, which accounts for more than 5% of admissions amongst adults in Scotland [1]. Recurrent admissions are common and subsequent mortality substantial [2][3][4]. There is no cure for COPD, but a range of treatments can improve symptoms, reduce the rate of exacerbations and improve prognosis.
Many prognostic models have been developed for COPD to predict the risk of exacerbations, in-patient mortality and longer-term outcomes [5][6][7]. Most models were based on fewer than 1000 patients with fewer than 200 events (for in-patient cohorts, the median was 303 patients and just 67 events [5]), required pulmonary function test results and patient symptom questionnaires, and were often developed to predict either functional limitations or long-term mortality. Many were considered to have substantial bias [5]. Few substantial studies were designed specifically to assess medium-term prognosis in patients hospitalised with an exacerbation of COPD. Case selection bias was inevitable in these cohorts because only patients who were managed by specialists with an interest in COPD in academic centres were included.
Electronic health records (EHRs) offer an opportunity to include large numbers of patients, managed by a broad range of physicians.
EHRs also have limitations. They rarely include patient questionnaires on smoking habit, the severity of symptoms or quality of life and may not record the results of physical examination (including blood pressure and body mass index), pulmonary function tests, reasons for presentation or a complete list of comorbid disease. On the other hand, symptoms, quality of life and pulmonary function will usually be poor in this setting, rendering such results of lesser importance. Accordingly, we used an EHR, including hospitalisations, prescriptions in primary care and laboratory blood test results, to predict 90-day all-cause mortality after an admission associated with COPD.

Patient population
The project was approved by the SafeHaven Local Privacy Advisory Committee (reference GSH/18/RM/003). All admissions (SMR01 records) to hospitals served by NHS Greater Glasgow & Clyde with an ICD-10 discharge code for COPD (J440, J441, J448, J449) in any diagnostic position, between 01/01/2010 and 31/12/2017, were identified. Only patients with a first event on or after 01/01/2011 were included in this analysis, with the first year of data (2010) being used to provide a prior medical history. Only the first admission with a record of COPD in this period was used. The date of discharge was taken as the index date.
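The selection of index admissions described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' extraction code: the record layout (`patient_id`, `admitted`, `codes`) is hypothetical, the admission date is assumed to define the study window, and patients whose first COPD-coded admission fell in 2010 are assumed to be dropped rather than re-entered at a later admission.

```python
from datetime import date

# ICD-10 discharge codes for COPD listed in the text.
COPD_CODES = {"J440", "J441", "J448", "J449"}
DATA_START = date(2010, 1, 1)     # 2010 supplies prior medical history only
COHORT_START = date(2011, 1, 1)   # first event must fall on or after this date
DATA_END = date(2017, 12, 31)


def has_copd_code(codes):
    """True if a COPD code appears in any diagnostic position."""
    return any(c.replace(".", "").upper() in COPD_CODES for c in codes)


def index_admissions(admissions):
    """Map each patient to their first COPD-coded admission.

    `admissions` is a list of dicts with hypothetical keys
    'patient_id', 'admitted' (date) and 'codes' (list of ICD-10 strings).
    Patients whose first COPD admission predates COHORT_START are
    excluded (an assumed reading of the inclusion rule).
    """
    first = {}
    for adm in admissions:
        if not (DATA_START <= adm["admitted"] <= DATA_END):
            continue
        if not has_copd_code(adm["codes"]):
            continue
        pid = adm["patient_id"]
        if pid not in first or adm["admitted"] < first[pid]["admitted"]:
            first[pid] = adm
    return {pid: adm for pid, adm in first.items()
            if adm["admitted"] >= COHORT_START}
```

Keeping the COPD test separate from the windowing logic makes it straightforward to check the alternative reading, in which a patient first seen in 2010 would enter the cohort at their next admission.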
The following exclusion criteria were applied: age < 18 years or > 100 years (there were few such patients and some recorded ages appeared improbable); deaths during the index admission; admissions without a discharge date; planned admissions for investigations or procedures; patients with lung cancer (because their management and clinical course are different); patients with missing standard haematology and clinical biochemistry variables.
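As an illustration only, the exclusion criteria could be applied in the order listed with a function such as the one below. The field names are hypothetical, and the set of "standard" laboratory variables is assumed to be the list of laboratory measures given later in the Methods; the text does not define it explicitly.

```python
# Assumed set of "standard" laboratory variables (taken from the list of
# laboratory measures in the Methods; the paper does not define the set).
REQUIRED_LABS = (
    "haemoglobin", "neutrophils", "lymphocytes", "sodium", "chloride",
    "potassium", "urea", "creatinine", "albumin", "crp",
)


def exclusion_reason(rec):
    """Return the first exclusion criterion met, or None if eligible.

    `rec` is a dict with hypothetical keys: 'age', 'died_in_hospital',
    'discharge_date', 'planned', 'lung_cancer' and 'labs' (a dict of
    laboratory values, with None for a missing result).
    """
    if rec["age"] < 18 or rec["age"] > 100:
        return "age"
    if rec["died_in_hospital"]:
        return "death during index admission"
    if rec["discharge_date"] is None:
        return "no discharge date"
    if rec["planned"]:
        return "planned admission"
    if rec["lung_cancer"]:
        return "lung cancer"
    if any(rec["labs"].get(name) is None for name in REQUIRED_LABS):
        return "missing laboratory data"
    return None
```

Returning the reason rather than a boolean allows the counts for a flow diagram to be tallied directly from the same pass over the data.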
Unique patient-identifiers were used to link data from various sources.

Choice of variables for prediction models
A basic model, including age, sex, quintile of social deprivation (Scottish Index of Multiple Deprivation; SIMD) and length of hospital stay, was developed as a reference against which to determine the incremental value of collecting more complex medical data. Data on ethnicity and marital status were not available.
Variables were then added to the basic model from three other sources (diagnostic codes, community prescriptions and laboratory variables) to determine whether each of these sources provided additional predictive value. Finally, data from all sources were added to the basic model to identify a parsimonious, pragmatic dataset to produce a model that retained most of the predictive information with good calibration. Recent medical history for each patient was based on emergency department visits and hospitalisations during the 90 days preceding the index admission, using ICD-10 codes to attribute causes.
Current medications were determined from community prescribing records with a dispensing date during the 90 days prior to the index admission (in-hospital prescribing was not available). The following classes of medications were considered: inhaled bronchodilators, inhaled corticosteroids, positive inotropic agents, including digoxin, antiarrhythmic medications, beta blockers, hypertension and heart failure medications, lipid lowering drugs, oxygen, thyroid medications, potassium-sparing diuretics, loop and thiazide diuretics.
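The 90-day look-back for community prescriptions amounts to a date-window filter. The sketch below is illustrative: the dispensing record layout and the class labels are hypothetical, and treating the window as including day 90 and excluding the admission date itself is an assumption, since the text does not state how the boundaries were handled.

```python
from datetime import date, timedelta

LOOKBACK = timedelta(days=90)  # look-back window stated in the text


def medication_flags(dispensings, index_admission_date, classes):
    """Binary flag per medication class dispensed before the index admission.

    `dispensings` is a list of dicts with hypothetical keys 'drug_class'
    and 'dispensed' (date). The window is [admission - 90 days, admission);
    these boundary conventions are assumed, not taken from the paper.
    """
    flags = {c: False for c in classes}
    window_start = index_admission_date - LOOKBACK
    for d in dispensings:
        in_window = window_start <= d["dispensed"] < index_admission_date
        if d["drug_class"] in flags and in_window:
            flags[d["drug_class"]] = True
    return flags
```

The resulting flags correspond to the binary prescription predictors described in the statistical methods.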
For laboratory data, the value closest to discharge was used and included haemoglobin, neutrophil and lymphocyte counts and their ratio, serum sodium, chloride, potassium, urea, creatinine, albumin and C-reactive protein.
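Selecting the laboratory value closest to discharge and deriving the neutrophil/lymphocyte ratio might look like the following. This is a sketch under assumptions: results are represented as hypothetical `(date, value)` pairs, distance is measured in whole days in either direction, and ties are resolved in favour of the first result listed.

```python
from datetime import date


def closest_to_discharge(results, discharge_date):
    """Return the value whose sample date is nearest the discharge date.

    `results` is a list of (sample_date, value) pairs. Returns None when
    no result is available. Ties go to the earliest-listed result.
    """
    if not results:
        return None
    nearest = min(results, key=lambda r: abs((r[0] - discharge_date).days))
    return nearest[1]


def neutrophil_lymphocyte_ratio(neutrophils, lymphocytes):
    """Neutrophil count divided by lymphocyte count (same units).

    Returns None if either count is missing or the lymphocyte count is
    not positive, so that the ratio is never computed by dividing by zero.
    """
    if neutrophils is None or lymphocytes is None or lymphocytes <= 0:
        return None
    return neutrophils / lymphocytes
```

For example, a neutrophil count of 8.0 and a lymphocyte count of 2.0 (both in 10^9/L) give a ratio of 4.0.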

Outcomes
The pre-specified outcome of interest was all-cause mortality within 90 days of discharge.

Statistical methods
Descriptive data are shown as numbers and percentage when categorical and as median with 1st and 3rd quartiles if continuous.
Univariate associations between index hospitalisation variables and the outcome were explored using odds ratios and cubic splines. A base model with four variables (age, sex, SIMD quintiles, and length of stay) was fitted. Medical history (binary predictors), prescription of medications (binary predictors), and laboratory data (continuous predictors, modelled as cubic associations, after appropriate normalising transformations) were each added to the base model to determine if they improved model fit. Within each group of predictors, models were fitted with all predictors, and forward and/or backward selection procedures, based on Akaike's Information Criterion, were used. The best-fitting models from each group of predictors were combined into a single model, and a backward stepwise procedure applied. Finally, the model was manually pruned based on the plausibility of the observed associations, the complexity of the model, and the statistical significance of each term, to generate a final model in which all associations were mathematically simple, clinically plausible, and highly statistically significant. The discrimination of the final model was assessed for overoptimism using a bootstrap cross-validation method.
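To make the assessment of discrimination concrete, the sketch below computes the area under the curve as a concordance probability and applies a bootstrap optimism correction. The text does not specify which bootstrap procedure was used; the Harrell-style correction shown here is an assumption, and the `fit` callable (which would wrap the logistic regression) is a hypothetical placeholder supplied by the caller.

```python
import random


def auc(scores, outcomes):
    """Probability that a patient who died has a higher score than one
    who survived; tied scores count as half. Pairwise O(n^2) version,
    chosen for clarity rather than speed."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def optimism_corrected_auc(rows, outcomes, fit, n_boot=200, seed=0):
    """Apparent AUC minus the mean bootstrap optimism.

    `fit(rows, outcomes)` must return a function mapping one row to a
    risk score. For each resample, the model is refitted and its AUC in
    the resample is compared with its AUC in the original data.
    """
    rng = random.Random(seed)
    n = len(rows)
    model = fit(rows, outcomes)
    apparent = auc([model(r) for r in rows], outcomes)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        b_rows = [rows[i] for i in idx]
        b_out = [outcomes[i] for i in idx]
        if len(set(b_out)) < 2:
            continue  # resample lacks deaths or survivors; skip it
        b_model = fit(b_rows, b_out)
        boot_auc = auc([b_model(r) for r in b_rows], b_out)
        test_auc = auc([b_model(r) for r in rows], outcomes)
        optimism.append(boot_auc - test_auc)
    mean_optimism = sum(optimism) / len(optimism) if optimism else 0.0
    return apparent - mean_optimism
```

A strict application would repeat the variable-selection steps inside `fit` for every resample, since selection is itself a source of optimism.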

Results
NHS GG&C serves approximately 1.1 million people, but 10,000-12,000 died each year during the study period and therefore the cumulative population over this period was about 1.2 million, of whom about 200,000 were aged ≥ 65 years.
Discharge codes for COPD identified 23,442 unique individuals (2% of the whole population but about 10% of older people). Very few patients were excluded because they were younger than 18 years or older than 99 years or because of death during the index admission (Consort Diagram, Fig. 1). However, 3279 patients only had a planned admission and 683 had a history of lung cancer on or before their index admission. One or more standard laboratory variables were missing for a further 1466 patients.
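The flow of patients can be checked with simple arithmetic. The calculation below uses only the counts reported here and the final cohort size given in the abstract, and it assumes that the three reported exclusion groups do not overlap; the remainder is then the number excluded for the other criteria (age, death during the index admission or absence of a discharge date).

```python
identified = 23442      # unique individuals with a COPD discharge code
planned_only = 3279     # only had a planned admission
lung_cancer = 683       # history of lung cancer
missing_labs = 1466     # one or more standard laboratory variables missing
final_cohort = 17973    # cohort size reported in the abstract

# Patients remaining after the three reported exclusions
# (assumes the exclusion groups are mutually exclusive).
remaining = identified - planned_only - lung_cancer - missing_labs

# Implied number excluded for the remaining criteria.
other_exclusions = remaining - final_cohort

print(remaining, other_exclusions)
```

Under that assumption the implied number of other exclusions is small, in keeping with the statement that very few patients were excluded on grounds of age or in-hospital death.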
Bronchodilators had been prescribed at least once in the previous year for 74% of patients, inhaled corticosteroids for 55%, positive inotropic agents for 3% (most commonly digoxin) and loop diuretics for 22%.
The median duration of admission was 5 (IQR 2-8) days, with 16% spending more than 14 days and 7% more than 28 days in hospital.
Within 90 days after discharge, 1003 patients (5.7%) died. Mortality increased linearly with age from < 2% in those aged < 60 years to > 9% in those aged > 80 years and was higher in men (6.7%) than in women (4.8%) (Supplementary material 1). On univariate analysis, mortality was higher in those who were most affluent (7.5%) compared to those with the most severe socio-economic deprivation (5.2%), probably reflecting the older age of more affluent patients (Supplementary material 1). Mortality after discharge increased linearly with length of stay (Supplementary material 1). In the basic multi-variable model, age, sex and length of stay predicted mortality, but not social deprivation even after adjusting for age. The model was well-calibrated with an area under the curve (AUC) of 0.702 (95% CI 0.686-0.718) (Supplementary material 2).
Addition of diagnostic codes for hospital episodes in the previous 90 days to the basic multivariable model showed that a history of cancer (lung cancer excluded) or dementia was strongly associated with a higher mortality at 90 days and hypertension with a lower mortality at 90 days. Model calibration remained good, and the AUC improved to 0.738 (95% CI 0.722-0.753) (Supplementary material 2).
Addition, to the basic multi-variable model, of information on treatment prior to admission found that prescription of oxygen, loop diuretics or positive inotropic agents (i.e. digoxin) was associated with a higher mortality and prescription of statins or antihypertensive agents was associated with a lower mortality. Model calibration remained good, but the AUC (0.717; 95% CI 0.701-0.732) was little better than for the basic model (Supplementary material 2).
Addition, to the basic multi-variable model, of laboratory data (Supplementary material 2) found that haemoglobin, lymphocyte and neutrophil counts, and serum sodium, chloride, urea, creatinine and C-reactive protein were independent predictors. Model calibration was excellent, and the AUC increased to 0.797 (95% CI 0.783-0.811) (Supplementary material 2).
The association between laboratory variables and outcome is shown graphically by cubic splines with 95% confidence intervals (shaded area; Supplementary material 1). Steep slopes with narrow confidence intervals are characteristic of powerful associations. There was a linear increase in mortality as blood haemoglobin concentration dropped below 14 g/dL with little evidence that higher levels posed greater risk (Supplementary material 1). Higher white blood cell and neutrophil counts and lower lymphocyte and eosinophil counts were associated with a worse prognosis (Supplementary material 1).
Mortality increased linearly as serum sodium concentration dropped below 140 mmol/L (Supplementary material 1) or serum chloride dropped below 105 mmol/L (Supplementary material 1) but higher concentrations of either were associated with a worse outcome. The relationship between serum potassium and 90-day mortality was U-shaped with a nadir between 4.0 and 4.5 mmol/L (Supplementary material 1). The relationship between serum creatinine, a marker of renal function and lean body mass, and 90-day mortality was also U-shaped (Supplementary material 1). Mortality increased in a log-linear fashion as urea, another measure of renal function, rose above 12 mmol/L (Supplementary material 1). There was a linear inverse relationship between albumin, a measure of hepatic synthetic function, and prognosis (Supplementary material 1) and a linear direct relationship with CRP, a measure of inflammation (Supplementary material 1).
The full multi-variable model had good calibration and increased AUC to 0.812 (95% CI 0.798-0.826) (Supplementary material 2). Mortality was associated with 18 variables including greater age, male sex, longer length of stay, a history of (non-lung) cancer or dementia, prescription of oxygen, loop diuretics or digoxin, absence of prescriptions for hypertension, lower haemoglobin, lower lymphocyte count, higher neutrophil count and C-reactive protein, lower serum sodium and chloride, higher serum urea and creatinine and lower serum albumin.
Reducing the model to the 12 most important predictors (age, sex, length of index hospitalisation stay, prior diagnosis of cancer or dementia, prescription of oxygen or digoxin, neutrophil/lymphocyte ratio and serum chloride, urea, creatinine and albumin) maintained calibration with only a slight loss of discrimination (AUC 0.806, 95% CI 0.792-0.820; Table 2 and central illustration; the diagonal line on the left panel represents the line of equality, or perfect calibration). These 12 variables have been programmed into an Excel spreadsheet calculator for the prediction of 90-day mortality after an admission associated with COPD.
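The logic of such a calculator, and of a calibration check by decile of predicted risk, can be sketched as follows. This is not the published calculator: the model coefficients are not reported in this text, so they are passed in as arguments, and the predictor values are assumed to be already transformed as the fitted model requires (for example, after normalising transformations and with any polynomial terms supplied as separate entries).

```python
import math


def predicted_risk(intercept, coefficients, values):
    """Convert a logistic-regression linear predictor to a probability.

    `coefficients` and `values` are dicts keyed by (hypothetical) term
    names; every term in `coefficients` must be present in `values`.
    """
    linear_predictor = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def calibration_by_decile(predicted, observed, groups=10):
    """Compare mean predicted risk with observed mortality in risk groups.

    Patients are ranked by predicted risk and split into `groups` bins of
    near-equal size. Returns a list of
    (mean predicted risk, observed event rate, group size) tuples.
    Points lying on the line of equality indicate perfect calibration.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    table = []
    for g in range(groups):
        members = order[g * n // groups:(g + 1) * n // groups]
        if not members:
            continue
        mean_pred = sum(predicted[i] for i in members) / len(members)
        obs_rate = sum(observed[i] for i in members) / len(members)
        table.append((mean_pred, obs_rate, len(members)))
    return table
```

Plotting the observed rate against the mean predicted risk for each group reproduces the kind of calibration plot described above.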

Discussion
This analysis shows that a small number of variables that can easily be gleaned from routine electronic health records predict 90-day mortality for patients with an admission related to COPD. Three variables, age, sex and length of hospital stay, provided the backbone of the predictive model, including good calibration, with a 90-day mortality close to zero for the lowest decile of risk and 14% for the highest decile and fair discrimination (AUC 0.702).
Additional information on diagnosis and prescriptions made only a modest improvement in the model. However, addition of routinely collected laboratory variables improved discrimination substantially and identified 10% of patients who had a > 25% chance of dying in the next 90 days. Addition of information on co-existing cancer or dementia and on prescription of oxygen (only 161 patients) or digoxin (574 patients) made modest further improvements to the model. The final model included age, sex, length of stay and just nine other variables with the lowest and highest decile of risk ranging from zero to 25% with fair discrimination (AUC 0.806) between those who survived or died. This compares favourably with previous models that have found AUC ranging from 0.61 to 0.74 [5].
The characteristics of the population are of interest. There were more women than men, as expected from epidemiological studies, but women had a slightly better prognosis [8,9]. A large proportion of patients admitted with COPD were in the quintile with the greatest socio-economic deprivation, as might also be anticipated from the epidemiology, lifestyle and risk factors for COPD. Surprisingly, deprivation was not associated with mortality [10,11]. More affluent patients were much older but deprivation did not predict outcome when age was included in the multivariable model. This may reflect unmeasured confounders or the inability of our model to adjust adequately for age. A substantial proportion of patients with COPD also had diabetes, cardiovascular disease or cancer and more than half were taking medicines to lower lipids or treat hypertension or heart failure, including beta-blockers. Several observational studies have suggested that beta-blockers might reduce the risk of COPD exacerbation and death in those with moderate or severe COPD [12]. However, a recent randomised trial of metoprolol in patients with COPD was stopped earlier than planned for futility and safety concerns [13]. Nevertheless, heart failure guidelines recommend prescription of beta-blockers for patients with COPD with careful monitoring for evidence of airways obstruction [14]. Prescriptions provide valuable insights into possible diagnoses that may not have been recorded on the discharge record [15,16].
The reasons for the index hospitalisation should be considered carefully. A diagnosis of COPD in any diagnostic position was sufficient for inclusion. Some patients will have been admitted for pneumonia or a pneumothorax and others for reasons not directly related to COPD, such as heart failure, cancer (at sites other than lung) or for general frailty. COPD might not always have been an 'active' problem precipitating admission. Ultimately, an admission due to COPD will encompass a broad range of presentations. Applying a narrower set of inclusion criteria, excluding patients with dementia or other cancers or the small number of patients on digoxin, might have resulted in a similar prognostic model, but with fewer variables.
Many prognostic models for a variety of medical conditions show that laboratory variables are often, along with age, the most powerful predictors of outcome. This might be partly explained by their relationship to disease but it probably also reflects the accuracy of the data. Diagnostic data depend very much on the clinician looking after the patient; many diagnoses will be missed or, at least, not recorded. There will also be diagnostic error. Measurement errors due to human factors, either the patient's or health care professionals', render many tests less reliable. Most emergency admissions will have a standard set of blood tests done, regardless of the diagnosis. The completeness of such data adds to their power.
The haematology profile provides several prognostic markers. Anaemia may indicate iron deficiency and other serious underlying disease [17,18]. Polycythaemia may indicate chronic hypoxia. The neutrophil/lymphocyte ratio may be a powerful marker of the inflammatory response to infection and other exogenous agents that might promote progression of the underlying lung disease [19,20]. A low serum chloride may reflect neuro-endocrine and renal-metabolic stress.
A raised serum urea and creatinine reflect renal dysfunction, with a disproportionate increase in urea being a particularly adverse sign [21,22]. This might be due to loss of lean body mass (sarcopaenia/cachexia) that reduces production of creatinine. Low serum albumin also indicates catabolism and is a strong predictor of adverse outcomes in several diseases [22]. Ultimately, the model we have constructed is likely to be valid for patients with a great variety of chronic medical diseases and hence, there may be little advantage to applying a narrower definition of COPD. This idea and predictive model should be tested not only for COPD but also for patients with heart failure, chronic kidney disease and other long-term medical conditions.
Such models have several uses. From the perspective of service evaluation, the model can be used for audit purposes to determine whether patient outcomes are better or worse than predicted. From a clinical perspective, stratifying risk can inform management, including deciding on the intensity of monitoring and treatment required or the appropriateness of palliative care. From a research perspective, identifying patients with modifiable risk is essential for outcome trials. Patients who have no events may suffer adverse effects from interventions with no prospect of benefit. Patients with very poor outcomes may be beyond the help of the intervention and are destined to die. Risk may be more readily modified when it is intermediate.
The model developed in this analysis is based purely on data routinely captured in Scottish EHRs. This analysis provides proof of concept that these data can be used to derive risk-prediction models, in a single large cohort of patients, using relatively simple statistical methods. Given sufficient data, collected at a national level over several years, it should be possible to develop such tools further, through the use of machine learning techniques, to stratify the risk of adverse events post-discharge with increasing accuracy, across a wider range of clinical conditions. With appropriate strategic support, such tools could be incorporated into EHRs to provide an automated assessment of patient risk, based on "live" data. Assimilating "live" data would allow the model to evolve with changes in the population and clinical practice, refining predictive modelling still further and thereby improving patient management and outcomes. Our "prediction calculator" is an important first step in this endeavour.

Conclusions
A small number of clinical, prescription and laboratory variables obtained from routine EHRs predict 90-day mortality for patients with a hospital admission related to COPD. These variables have been programmed into an Excel spreadsheet calculator that could be used to audit clinical practice, to guide clinical management or for risk stratification.