Delirium risk stratification in consecutive unselected admissions to acute medicine: validation of externally derived risk scores

Background: reliable delirium risk stratification will aid recognition, anticipation and prevention and will facilitate targeting of resources in clinical practice as well as identification of at-risk patients for research. Delirium risk scores have been derived for acute medicine, but none has been prospectively validated in external cohorts. We therefore aimed to determine the reliability of externally derived risk scores in a consecutive cohort of older acute medicine patients. Methods: consecutive patients aged ≥65 over two 8-week periods (2010, 2012) were screened prospectively for delirium using the Confusion Assessment Method (CAM), and delirium was diagnosed using the DSM IV criteria. The reliability of existing delirium risk scores derived in acute medicine cohorts and simplified for use in routine clinical practice (USA, n = 2; Spain, n = 1; Indonesia, n = 1) was determined by the area under the receiver operating characteristic curve (AUC). Delirium was defined as prevalent (on admission), incident (occurring during admission) and any (prevalent + incident) delirium. Results: among 308 consecutive patients aged ≥65 (mean age/SD = 81/8 years, 164 (54%) female), existing delirium risk scores had AUCs for delirium similar to those reported in their original internal validations ranging from 0.69 to 0.76 for any delirium and 0.73 to 0.83 for incident delirium. All scores performed better than chance but no one score was clearly superior. Conclusions: externally derived delirium risk scores performed well in our independent acute medicine population with reliability unaffected by simplification and might therefore facilitate targeting of multicomponent interventions in routine clinical practice.


Introduction
Delirium is an acute and fluctuating confusional state associated with increased care needs and poor outcomes [1]. Reliable delirium risk stratification will aid screening, anticipation and prevention, and enable better targeting of clinical resources [1]. Major risk factors (besides older age) include cognitive impairment, hip fracture and severe illness, but many other factors are implicated making risk prediction in individual patients difficult [1]. Formal risk scores may help but must be simple to use, have clinical credibility and be externally validated on representative cohorts [2][3][4].
Existing delirium risk scores derived in acute medicine generally incorporate measures of impairment (sensory, cognitive and/or functional) and illness severity and/or infection, but there are few validations in external cohorts and some include complex measures, making them difficult to use in practice [5][6][7][8][9][10]. No studies have examined whether scores derived to predict incident delirium (occurring de novo during admission) will also identify any delirium (prevalent and incident delirium) and vice versa, even though such a score would have clinical utility in both screening/recognition and prediction of delirium.
We therefore determined the reliability of existing acute medicine risk scores described in the literature [5][6][7][8][9] for any, incident and prevalent delirium in a consecutive cohort of older acute medicine patients. We also assessed the robustness of existing scores to simplification for use with data acquired by the medical team as part of routine clinical practice.

Patient cohort
The Oxford University Hospitals Trust (OUHT) provides services for all acute medicine patients in a population of 500,000 and runs an unselected medical admissions system, with the majority of patients remaining under the admitting team. In a prospective observational audit, consecutive admissions to a single team over two 8-week periods (September-November 2010 and April-June 2012) were screened for delirium on arrival and daily thereafter by the admitting team until discharge, transfer or death. The audit was undertaken to inform future service development and was approved by the Divisional Management and registered with the OUHT Audit Team. All data were routinely acquired as part of standard patient care. Some data on age-specific delirium rates, associates and outcomes from this cohort have been published previously [11].
All patients were seen within 24 h of admission by an experienced Consultant Physician (dually accredited in acute general (internal) medicine and geriatrics (S.T.P., S.C.S.)) responsible for the patient's care and at least every other day thereafter. All patients aged ≥65 had the Confusion Assessment Method (CAM) [12] and a cognitive test: Cohort 1 (2010) had the mini-mental state examination (MMSE) [13] and Cohort 2 (2012) had the abbreviated mental test score (AMTS) [14]. The cognitive test and CAM formed part of the standard OUHT clerking proforma administered by junior doctors on the S.T.P./S.C.S. admitting team, all of whom were trained in their use as part of standard OUHT practice led by S.T.P. Cognitive impairment was defined as AMTS < 9 or MMSE < 24 according to published cut-offs [15,16] and/or prior diagnosis of dementia. Delirium diagnosis was made according to DSM IV criteria [17] by the responsible physician (S.T.P., S.C.S.) after discussion with the rest of the medical team and was categorised as any delirium (occurring at any point during admission), prevalent delirium (on admission or within the first 48 h) or incident delirium (occurring after the first 48 h). If delirium was present on admission, a 48-h period without evidence of delirium was required before a new episode of delirium occurring during admission could be recorded.
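The definitions in the preceding paragraph can be written down as a minimal sketch. The function and parameter names are hypothetical, and two details are assumptions not stated in the text: a missing test score is treated as "not below cut-off", and an onset at exactly 48 h is counted as prevalent.

```python
from typing import Optional


def cognitively_impaired(amts: Optional[int], mmse: Optional[int],
                         prior_dementia: bool) -> bool:
    """AMTS < 9 or MMSE < 24 and/or a prior diagnosis of dementia."""
    # Assumption: a missing (None) score does not count as below cut-off.
    below_cutoff = (amts is not None and amts < 9) or \
                   (mmse is not None and mmse < 24)
    return below_cutoff or prior_dementia


def delirium_category(onset_hours: Optional[float]) -> str:
    """Classify by hours from admission to first delirium (None = no delirium)."""
    if onset_hours is None:
        return "none"
    # Assumption: onset at exactly 48 h falls within "the first 48 h".
    return "prevalent" if onset_hours <= 48 else "incident"


print(cognitively_impaired(amts=8, mmse=None, prior_dementia=False))
print(delirium_category(72))
```

"Any delirium" corresponds to either the prevalent or the incident category in this sketch.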
Demographic data, presenting complaint and potential risk factors were recorded from the patient, relatives and primary care physician (GP) and medical records including living arrangements (care home versus home with care package versus home without formal care) and clinical and physiological parameters (see below). Prior diagnosis of dementia was recorded if the diagnosis was present in the GP letter, reported by the patient or relative or had been recorded previously in the patient's notes. Vision and hearing impairment was recorded if noted in the medical history or was evident during patient admission or subsequent interview. Admission physiological parameters (pulse, temperature and respiratory rate) were taken from the patient's chart. Systemic inflammatory response syndrome (SIRS) was used as a measure of illness severity since it required only routinely collected clinical data and was classed as positive if two or more of the following were present: heart rate >90 bpm, temperature <36 or >38°C, respiratory rate >20 breaths per minute, white blood cell count <4 × 10^9 or >12 × 10^9 cells per litre [18].
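As an illustration of the SIRS rule described above, the following sketch counts how many of the four criteria are met and flags SIRS as positive at two or more. The function names are hypothetical, and treating a missing measurement as "criterion not met" is an assumption rather than something stated in the text.

```python
from typing import Optional


def sirs_criteria_met(heart_rate: Optional[float],
                      temperature_c: Optional[float],
                      respiratory_rate: Optional[float],
                      wbc_e9_per_l: Optional[float]) -> int:
    """Count the SIRS criteria satisfied by the admission observations."""
    # Assumption: a missing (None) value counts as "criterion not met".
    checks = [
        heart_rate is not None and heart_rate > 90,
        temperature_c is not None and (temperature_c < 36 or temperature_c > 38),
        respiratory_rate is not None and respiratory_rate > 20,
        wbc_e9_per_l is not None and (wbc_e9_per_l < 4 or wbc_e9_per_l > 12),
    ]
    return sum(checks)


def sirs_positive(heart_rate: Optional[float],
                  temperature_c: Optional[float],
                  respiratory_rate: Optional[float],
                  wbc_e9_per_l: Optional[float]) -> bool:
    """SIRS is positive when two or more criteria are present."""
    return sirs_criteria_met(heart_rate, temperature_c,
                             respiratory_rate, wbc_e9_per_l) >= 2


# Hypothetical patient: tachycardic and febrile, other values normal.
print(sirs_positive(95, 38.5, 18, 8.0))
```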

Selection and adaptation of externally derived delirium risk scores
We selected delirium risk prediction scores for testing in our sample if they were derived on acute medicine cohorts [5][6][7][8][9] and did not examine scores derived in other environments including surgical cohorts, intensive care, the emergency department or wards restricted to frail, dependent older patients [19][20][21]. Existing risk scores were adapted and simplified where necessary to allow use with data acquired as part of the medical team's routine clinical assessment (Table 1). We were not able to examine the score developed by Carrasco et al. [10] in an acute medicine cohort since this could not be simplified for use with our dataset owing to the need for a numeric value for the Barthel index. Specifically, for all included scores, severe illness was defined by SIRS ≥ 2. Cognitive impairment was defined as a diagnosis of dementia and/or cognitive score below cut-off (MMSE < 24, AMTS < 9). Similarly, in the AWOL (Age, failure to spell 'World' backward, disOrientation to place and iLlness severity) score [8], spelling WORLD backwards and disorientation (1 point each) were replaced by a diagnosis of dementia and/or cognitive score below cut-off (MMSE < 24, AMTS < 9, 2 points). Functional dependency was defined as residence in a care home or at home with carers. In the Indonesian score [7], 'infection with sepsis' was defined as infection together with SIRS ≥ 2.
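To make the adaptation concrete, a rough sketch of the simplified AWOL score might look as follows. Only the 2 points for cognitive impairment and the SIRS ≥ 2 definition of severe illness come from the text above; the age threshold (passed as the hypothetical parameter `age_threshold`, defaulting to 80 years) and the 1 point each for age and illness severity are assumptions that should be checked against the original AWOL publication [8].

```python
def adapted_awol_score(age: int,
                       cognitive_impairment: bool,
                       sirs_count: int,
                       age_threshold: int = 80) -> int:
    """Simplified AWOL-style score (illustrative sketch only)."""
    score = 0
    # Assumption: 1 point for age at or above the threshold (not stated here).
    if age >= age_threshold:
        score += 1
    # From the text: dementia and/or cognitive score below cut-off replaces
    # the two original cognitive items and carries 2 points.
    if cognitive_impairment:
        score += 2
    # From the text: severe illness is SIRS >= 2; 1 point is an assumption.
    if sirs_count >= 2:
        score += 1
    return score


# Hypothetical patient: 85 years old, cognitively impaired, SIRS count of 2.
print(adapted_awol_score(85, True, 2))
```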

Statistical analyses
We determined whether the existing acute medicine delirium risk scores could reliably identify those patients with delirium in our cohort. All scores were examined for prediction of any, prevalent and incident delirium, even if originally developed specifically to predict incident delirium, using the area under the receiver operating characteristic curve (AUC). To determine the performance of the scores for identifying risk of incident delirium, patients with prevalent delirium were excluded from the analyses. For analyses of prevalent delirium, all patients were included. Missing data were not imputed except for cognitive data, where AUCs were calculated both without and with imputation, with missing scores imputed as normal. Statistical differences between the AUCs obtained for the existing risk scores were tested with pairwise comparisons using the z test. Sensitivity, specificity and positive and negative predictive values were calculated.
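For readers unfamiliar with the AUC, the sketch below shows one common way to compute it: as the proportion of (delirium, no delirium) patient pairs in which the patient with delirium has the higher risk score, with ties counting as half. The standard error uses the Hanley and McNeil approximation, which is an assumption chosen for illustration; the text does not state which method was used in this study. All names and the toy scores are hypothetical.

```python
import math
from typing import Sequence


def auc(scores_delirium: Sequence[float],
        scores_no_delirium: Sequence[float]) -> float:
    """AUC as the probability that a delirium case outscores a non-case."""
    wins = 0.0
    for p in scores_delirium:
        for n in scores_no_delirium:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(scores_delirium) * len(scores_no_delirium))


def hanley_mcneil_se(a: float, n_pos: int, n_neg: int) -> float:
    """Approximate standard error of an AUC (Hanley and McNeil method)."""
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    variance = (a * (1 - a)
                + (n_pos - 1) * (q1 - a * a)
                + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    return math.sqrt(variance)


# Toy risk scores (hypothetical, not study data).
with_delirium = [2, 3, 4, 4, 3]
without_delirium = [0, 1, 2, 1, 3, 0]
a = auc(with_delirium, without_delirium)
se = hanley_mcneil_se(a, len(with_delirium), len(without_delirium))
lower, upper = a - 1.96 * se, a + 1.96 * se
print(f"AUC = {a:.2f}, approximate 95% CI {lower:.2f} to {upper:.2f}")
```

A lower confidence limit above 0.5 indicates performance better than chance under this approximation.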

Sample size calculation
Using an estimate of 33% overall delirium rate in admissions to acute general medicine aged ≥65 from previous pilot work and published estimates [1,22,23,24], we calculated that a sample size of 300 would yield 100 delirium outcomes, giving sufficient power to examine the reliability of the five delirium risk scores all with 3-5 risk factors (given requirement for 20 outcome events per factor examined) [25]. Although this sample size would not give the statistical power to reliably determine small differences between the different risk scores, it would allow us to determine whether individual risk scores perform better than chance (where the lower CI for the AUC is >0.5, the null hypothesis is rejected). The sample size calculation was done on the basis of detection of any delirium. We expected lower rates of incident delirium and thus less power to determine whether scores were reliable specifically for incident delirium.
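The arithmetic behind this calculation can be laid out as a short worked example, taking five as the largest number of risk factors in any score:

```latex
\begin{align*}
\text{expected delirium events} &= 0.33 \times 300 = 99 \approx 100,\\
\text{events required} &= 5 \ \text{factors} \times 20 \ \text{events per factor} = 100.
\end{align*}
```

The expected number of events therefore just meets the requirement for the largest score, and scores with three or four factors need fewer events (60 and 80, respectively).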

Results
Among 308 consecutive patients aged ≥65 (mean/SD age 81/8 years, 164 (54%) female) admitted by our acute medicine team over the 4-month period, any delirium occurred in 95 patients (31%) (67 with prevalent delirium, of whom 17 had recurrent episodes, and 28 with incident delirium). Rates of missing data for parameters required for score completion were generally low (functional dependency, n = 14; SIRS, n = 3; infection, n = 7; age, n = 0; visual impairment, n = 14; dehydration, n = 18) except for cognitive test (n = 79, no reason documented; n = 12, too unwell; n = 3, dysphasic; n = 1, no English). AUCs for the different risk scores for any and incident delirium are shown in Table 2 and Supplementary data, Appendix Figure 1, available in Age and Ageing online. AUCs ranged from 0.69 to 0.76 for any delirium and 0.73 to 0.83 for incident delirium with no major difference after imputation of missing cognitive data (Table 2). No score was clearly superior (Supplementary data, Appendix Table 1, available in Age and Ageing online), and all scores performed better than expected on the basis of chance. Scores predicted any delirium even when originally developed for incident delirium and vice versa. Comparing the original published internal validations of the existing risk scores with the external validations in our cohort (Table 2) with greatest discrepancy being originally derived from retrospective chart reviews and requiring major modification.
Table 1 notes: APACHE II, Acute Physiology and Chronic Health Evaluation II score; BUN, blood urea nitrogen. (a) Severe illness was defined by SIRS ≥ 2 in the current study rather than the APACHE II score since the latter requires arterial blood gas sampling. (b) Assessed by the researchers using performance in six activities of daily living. (c) Replaced by diagnosis of dementia or cognitive score below cut-off in the current study.
Table 3 shows the sensitivities, specificities, positive and negative predictive values for all the risk scores for any and incident delirium. In age-stratified analyses, increasing number of risk factors were associated with increased delirium risk irrespective of age, but older age was associated with both a higher prevalence of multiple factors and greater susceptibility (Supplementary data, Appendix Table 2 and Appendix Figure 2, available in Age and Ageing online).
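For reference, the four measures reported in Table 3 follow from a 2 × 2 table of score result (above or below a chosen cut-off) against delirium status. The sketch below uses hypothetical counts, not study data, and a hypothetical function name.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from 2 x 2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),  # delirium cases flagged high risk
        "specificity": tn / (tn + fp),  # non-cases flagged low risk
        "ppv": tp / (tp + fp),          # high-risk patients who had delirium
        "npv": tn / (tn + fn),          # low-risk patients without delirium
    }


# Hypothetical counts for one score at one cut-off.
metrics = diagnostic_metrics(tp=40, fp=30, fn=10, tn=120)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Moving the cut-off trades sensitivity against specificity, which is why both are usually reported together with the predictive values.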

Discussion
Delirium risk scores, using data acquired by the clinical team in the course of routine assessment and including a short cognitive test, reliably risk stratified patients for both any and incident delirium.
Our inclusive cohort of all patients aged ≥65 had delirium rates consistent with reported prevalences of 18-35% and incidences of 11-14% for acute medicine cohorts of ≥100 subjects [1,22]. Rates are also similar to recent UK studies with different methodologies: 37% in consecutive acute medicine admissions aged ≥75 in Cardiff [23] and 27% in consecutive emergency acute geriatric, medicine and trauma orthopaedic admissions (aged ≥70) in Nottingham [24].
While it is probable that much poor outcome associated with delirium is not preventable, better recognition will facilitate optimal care and targeting of staffing resources to prevent avoidable deterioration, complications and deaths in this vulnerable group [1,26,27,28]. Our data suggest that since all scores identified both any and incident delirium, such 'risk' scores can help in recognition/screening of delirium as well as in prediction of future risk and will help staff to focus on vulnerable groups. This is of particular clinical utility since multicomponent interventions (for example, maintenance of normal sleep-wake cycles, daily mobilisation, and attention to nutrition and hydration) apply to both treatment and prevention of delirium [1].
We found that scores were robust to adaptation for use with data from routine clinical assessment suggesting that, although delirium is multifactorial, most risk is conferred by a few consistent factors. All adapted risk models included score below cut-off on a short cognitive test and it is likely that this carries significant weight through helping recognition/diagnosis of prevalent delirium and identification of cognitive impairment (pre-existing undiagnosed dementia [29] or subsyndromal delirium) in non-delirious patients who are therefore at high risk of incident delirium. However, it should be noted that significant numbers of older patients are untestable at the point of admission to hospital [30], and thus, risk scores incorporating a cognitive test cannot be applied to this group. The higher rates of delirium seen in older patients resulted from greater prevalence of multiple risk factors and also increased susceptibility: for a given number of risk factors, older patients had more delirium. AUCs were around 0.7-0.8 for all scores, probably because of the inclusion of broadly similar risk factors, and all scores performed better than chance for both any and incident delirium. Our findings are in contrast to a study validating risk scores in a post-operative population in which AUCs were lower, varying between 0.50 and 0.66 [19]. However, in this study, the mean age of the patients was relatively young, the incidence of delirium was low and most patients were undergoing elective surgery. For AUCs in the range of 0.7-0.8 as found in our study, high sensitivity comes at the cost of specificity and vice versa, i.e. there will be significant numbers of false positives and negatives and the reliability of the scores is far from perfect. However, in the context of widespread under-recognition of patients at risk of delirium [1], risk scores would appear good enough to facilitate targeting of multicomponent interventions in high-risk groups.
Table 2 notes: (a) AUC obtained after imputation of missing cognitive data, missing data assumed normal. In external validations, n refers to the number in the sample to which the scores were applied.
Strengths of our study include the prospective inclusive cohort design, regular consultant review facilitating delirium diagnosis and pragmatic use of factors available to the medical team as part of routine clinical care. We were thus able to externally validate and compare clinically applicable delirium risk scores on a representative cohort as recommended in the literature [2][3][4]. There are some limitations to our study. First, we did not examine inter-observer reproducibility of delirium diagnosis. However, the diagnosis of delirium was made by experienced physicians/geriatricians. Second, since we performed the study in the course of routine care, diagnosis was not blinded to the patients' clinical characteristics and thus there is the possibility of bias. However, the fact that our observed delirium rate was similar to that reported in other studies suggests that there was no significant over-diagnosis. Third, many patients did not have cognitive testing completed without a documented reason but may have been testable, and this might have impacted on measured AUC values.
In conclusion, our findings have implications for clinical practice. Risk stratification of patients in routine practice can be achieved with simple and feasible delirium risk scores which will facilitate both recognition and prevention of delirium and help target multicomponent intervention. Such risk scores will also enable estimation of delirium rates by case-mix in the general hospital. Finally, our study will aid sample size calculation and selection of high-risk patients for future clinical trials.

Key points
• Delirium risk scores are reliable in identifying high-risk patients in acute medicine.
• Delirium risk scores are robust to simplification.
• Delirium risk scores are feasible for use in routine clinical practice.

Supplementary data
Supplementary data mentioned in the text are available to subscribers in Age and Ageing online.

helped design analyses and advised on risk score application and co-wrote the manuscript.

Conflicts of interest
None declared.

Funding
S.T.P. is supported by the NIHR Oxford Biomedical Research Centre. P.M.R. is an NIHR Senior investigator and a Wellcome Trust Senior Investigator. There was no specific funding for this study, and the sponsors had no role in the study design or analyses.