Development and validation of the ISARIC 4C Deterioration model for adults hospitalised with COVID-19

Summary
Background Prognostic models to predict the risk of clinical deterioration in acute COVID-19 cases are urgently required to inform clinical management decisions.
Methods We developed and validated a multivariable logistic regression model for in-hospital clinical deterioration (defined as any requirement of ventilatory support or critical care, or death) among consecutively hospitalised adults with highly suspected or confirmed COVID-19 who were prospectively recruited to the International Severe Acute Respiratory and Emerging Infections Consortium Coronavirus Clinical Characterisation Consortium (ISARIC4C) study across 260 hospitals in England, Scotland, and Wales. Candidate predictors that were specified a priori were considered for inclusion in the model on the basis of previous prognostic scores and emerging literature describing routinely measured biomarkers associated with COVID-19 prognosis. We used internal-external cross-validation to evaluate discrimination, calibration, and clinical utility across eight National Health Service (NHS) regions in the development cohort. We further validated the final model in held-out data from an additional NHS region (London).
Findings 74 944 participants (recruited between Feb 6 and Aug 26, 2020) were included, of whom 31 924 (43·2%) of 73 948 with available outcomes met the composite clinical deterioration outcome. In internal-external cross-validation in the development cohort of 66 705 participants, the selected model (comprising 11 predictors routinely measured at the point of hospital admission) showed consistent discrimination, calibration, and clinical utility across all eight NHS regions. In held-out data from London (n=8239), the model showed a similarly consistent performance (C-statistic 0·77 [95% CI 0·76 to 0·78]; calibration-in-the-large 0·00 [–0·05 to 0·05]; calibration slope 0·96 [0·91 to 1·01]), and greater net benefit than any other reproducible prognostic model.
Interpretation The 4C Deterioration model has strong potential for clinical utility and generalisability to predict clinical deterioration and inform decision making among adults hospitalised with COVID-19.


Introduction
The COVID-19 pandemic has continued to overwhelm health-care systems worldwide. 1 Effective triage of patients presenting to hospital for risk of progressive deterioration is crucial to inform clinical decision making and facilitate effective resource allocation, including hospital beds, critical care resources, and targeted drug therapies. Moreover, early identification of subgroups at higher risk of death or deterioration requiring ventilatory or critical care support enables targeted recruitment for randomised controlled trials of therapies with equipoise, 2 and more precise delivery of treatments for which effectiveness is known to vary according to disease severity (including corticosteroids and remdesivir). 3-5
Many multivariable clinical prognostic models for patients with COVID-19 have rapidly accrued to predict adverse outcomes of mortality or clinical deterioration. 6 Most have been classified as being at a high risk of bias, and might not be generalisable, often because of inadequate sample sizes, reliance on single-centre data, and non-adherence to best practice methods or reporting standards during model development. 6,7 None of the multivariable prognostic models included in a systematic head-to-head external validation study outperformed univariable predictors, 8 highlighting the need to combine large scale multisite data with rigorous model development methods to improve generalisability.
We previously reported a pragmatic prognostic score for in-hospital mortality from the International Severe Acute Respiratory and Emerging Infections Consortium Coronavirus Clinical Characterisation Consortium (ISARIC4C) study. 9 In this Article, we extend this work through a larger study cohort to develop and validate a prognostic model for in-hospital clinical deterioration (requirement for ventilatory support or critical care, or death). We use the wide geographical coverage of the ISARIC4C study cohort in England, Wales, and Scotland to explore between-region heterogeneity and to comprehensively assess model generalisability with respect to discrimination, calibration, and clinical utility. We have called this the 4C Deterioration model.

Study population and data collection
The International Severe Acute Respiratory and Emerging Infections Consortium (ISARIC)-WHO Clinical Characterisation Protocol UK (CCP-UK) study is being conducted by the ISARIC4C in 260 hospitals across England, Scotland, and Wales (National Institute for Health Research [NIHR] Clinical Research Network Central Portfolio Management System ID 14152). 10 In this analysis, we included consecutive adults (aged ≥18 years) who had highly suspected or PCR-confirmed COVID-19. We included patients with suspected COVID-19 in the analysis because the model is intended for use in participants at the point of initial evaluation for COVID-19, when virological confirmation might not be available. The study is reported in accordance with Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidance. 11 Demographic, clinical, and outcome data were collected through a publicly available standardised case record form, as reported previously. 9,10 Ethical approval was given by the South Central Oxford C research ethics committee in England (reference 13/SC/0149), and by the Scotland A research ethics committee (reference 20/SS/0028). The study is registered with ISRCTN (ISRCTN66726260).

Outcomes
We used a composite primary outcome of in-hospital clinical deterioration, comprising any of the following: initiation of ventilatory support (non-invasive ventilation, invasive mechanical ventilation, or extracorporeal membrane oxygenation); admission to a high-dependency or intensive care unit; or death. This outcome aligns closely with a score of 6 or higher on the WHO Clinical Progression Scale 12 and ensures that the outcome is generalisable between hospitals, since respiratory support practices can vary considerably. We included eligible participants admitted or first assessed for COVID-19 on or before Aug 26, 2020, to allow at least a 4-week interval for registration of outcome events before the final data extraction date (Sept 24, 2020). Participants who had ongoing hospital care at the end of follow-up (the point at which a final outcome was recorded in the case record form) were classified as not meeting the endpoint because the risk of deterioration declines with time since admission. 8

Research in context
Evidence before this study
An existing systematic review evaluated prediction models for COVID-19 indexed in PubMed, Embase, arXiv, medRxiv, and bioRxiv up to May 5, 2020. 145 models were identified, 50 of which were prognostic models seeking to predict clinical outcomes of mortality or clinical deterioration. The proposed models were considered to be poorly reported, at high risk of bias, and their reported performance was thought to be overestimated. A systematic head-to-head external validation study of 22 of these prognostic models found that none had clinical utility over and above simple univariable predictors of age for mortality and oxygen saturation for clinical deterioration. Thus, none of the multivariable models could be recommended for clinical implementation, highlighting a need for higher quality model development methodology using multicentre datasets to maximise generalisability.

Added value of this study
We developed and validated the 4C Deterioration model, including 11 routinely measured demographic, clinical, and laboratory predictors, for prediction of in-hospital clinical deterioration among 74 944 consecutive adults recruited to the ISARIC4C study across 260 hospitals in England, Scotland, and Wales, in accordance with TRIPOD standards. The 4C Deterioration model showed consistent discrimination, calibration, and net benefit across eight National Health Service regions during model development, with similar performance in held-out validation data from London. Importantly, the 4C Deterioration model suggested clinical utility with higher net benefit than other reproducible candidate models in a decision-curve analysis in all regions. In comparison to our recently reported 4C Mortality Score, 4C Deterioration offers significant additional value by identifying people at high risk of deterioration despite a low risk of mortality, with potential to better target interventions for those who need them and are most likely to benefit.

Implications of all the available evidence
Parallel prognostic models are required for the prediction of a composite outcome of clinical deterioration and of mortality alone among hospitalised adults with COVID-19. The 4C Deterioration model shows stronger potential for clinical utility and generalisability than any previous prognostic model for clinical deterioration among adults with COVID-19. The model parameters and risk prediction tool will be made freely available online alongside our previously reported 4C Mortality Score to enable independent external validation and facilitate risk stratification for therapeutic interventions.

Prof Mahdad Noursadeghi‡, Division of Infection and Immunity, University College London, London WC1E 6BT, UK; m.noursadeghi@ucl.ac.uk (†For ISARIC4C study processes; ‡For model derivation)

Candidate predictors
We included candidate predictors considered in our previous development and validation of the 4C Mortality Score 9 that were available in at least 60% of the study population (appendix p 21). These predictors were specified a priori on the basis of previous prognostic scores and emerging literature describing routinely measured biomarkers associated with COVID-19 prognosis. 9 We also included nosocomial COVID-19 acquisition to test the hypothesis that acquisition of infection in hospital might be associated with differential risk. Community-acquired infection was defined as symptom onset or first positive severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) PCR result within 7 days from admission; participants who did not meet these criteria and had either symptom onset or first positive SARS-CoV-2 PCR result more than 7 days from admission were classified as nosocomial cases. 13 Among nosocomial cases, patients who met the deterioration outcome before the onset of COVID-19 were excluded.
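The community-acquired versus nosocomial definition above can be sketched as a small classification rule. This is an illustrative sketch only, not the study's code: the function name, the convention of counting days from admission (day 0), the treatment of exactly 7 days as community-acquired, and the "unknown" label for cases with neither date recorded are all assumptions made for the example.

```python
from typing import Optional

NOSOCOMIAL_THRESHOLD_DAYS = 7  # temporal threshold stated in the text


def classify_acquisition(
    symptom_onset_day: Optional[int],
    first_positive_pcr_day: Optional[int],
    threshold: int = NOSOCOMIAL_THRESHOLD_DAYS,
) -> str:
    """Return 'community', 'nosocomial', or 'unknown' (hypothetical labels).

    Both arguments are days counted from hospital admission (day 0);
    None means the date was not recorded.
    """
    days = [d for d in (symptom_onset_day, first_positive_pcr_day) if d is not None]
    if not days:
        return "unknown"
    # Community-acquired: symptom onset OR first positive PCR within the threshold.
    if any(d <= threshold for d in days):
        return "community"
    # Otherwise every recorded event occurred later than the threshold.
    return "nosocomial"
```

For example, a participant with symptom onset on day 2 and a first positive PCR on day 10 is classified as community-acquired, because one of the two events falls within 7 days of admission.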
Comorbidities were defined according to a modified Charlson comorbidity index, 14 with the addition of clinician-defined obesity. 9 We considered a composite variable representing number of comorbidities for inclusion in the model, which comprised the following comorbidities: chronic cardiac disease, chronic respiratory disease (excluding asthma), chronic renal disease, mild to severe liver disease, dementia, chronic neurological disease, connective tissue disease, diabetes, HIV or AIDS, malignancy, and clinician-defined obesity.
All predictors were taken from the day of hospital admission or the day of first clinical suspicion of COVID-19 for nosocomial cases.

Model development
We hypothesised that heterogeneity among populations and health-care services between geographical regions might contribute to differences in model performance. Therefore, we divided the data into nine National Health Service (NHS) regions 15 linked to contributing hospitals. Eight regions were used in model development and internal-external cross-validation (East of England, the Midlands, North East England and Yorkshire, North West England, Scotland, South East England, South West England, and Wales) as described below. The ninth region (London) was not used in model development but was held out for further validation, independent of the model training cohort.
We used a logistic regression modelling approach in view of the short time horizon for predictions (during hospital admission) and did backward elimination of the a priori candidate variables in the development cohort (appendix p 5). Continuous predictors were modelled with restricted cubic splines using a default of four knots, placed at recommended locations based on percentiles, by generating transformations using the rcs function in the rms package in R. 16,17 Glasgow coma scale scores were categorised as 15 or less than 15 because there were insufficient datapoints below 15 to fit spline functions. We used multiple imputation with chained equations to address missing data; analyses were done in each imputed dataset and pooled using Rubin's rules in the primary analysis (appendix p 5). 18
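As a rough illustration of the pooling step, Rubin's rules combine a parameter estimate and its variance across imputed datasets as follows. The sketch is in Python rather than the R used in the study, the function name is hypothetical, and it pools a single scalar estimate only; it does not reproduce the analysis pipeline described in the appendix.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Pool one estimate across m imputed datasets using Rubin's rules.

    estimates: the point estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from at least 2 imputations")
    pooled = mean(estimates)               # pooled point estimate
    within = mean(variances)               # average within-imputation variance
    between = variance(estimates)          # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The total variance exceeds the average within-imputation variance whenever the estimates differ between imputed datasets, which is how the uncertainty due to missing data is carried into the pooled standard error.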

Model validation
During validation, we assessed model discrimination (how well predictions differentiated participants who experienced the composite outcome from those who did not, quantified as the C-statistic), calibration (agreement between predicted and observed risk, assessed using calibration slopes, calibration-in-the-large, and calibration plots) and clinical utility (quantified as net benefit). 19 An ideal calibration slope is 1, while calibration-in-the-large should be 0 if the number of observed outcome events matches the number predicted.
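Two of these metrics can be computed directly from observed outcomes and predicted risks, as in the minimal sketch below. It is written in Python for illustration (the study used R), the function names are hypothetical, and the calibration-in-the-large routine assumes predicted risks strictly between 0 and 1 and a sample containing both events and non-events.

```python
import math
from typing import Sequence


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Proportion of event/non-event pairs in which the event has the higher
    predicted risk; tied predictions count as half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                concordant += 1.0
            elif a == b:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def calibration_in_the_large(y: Sequence[int], p: Sequence[float], iterations: int = 50) -> float:
    """Intercept a in logit(P(y = 1)) = a + logit(p), with the slope fixed at 1.

    Fitted by Newton-Raphson; a value of 0 means that the observed and
    predicted numbers of events agree.
    """
    offsets = [math.log(pi / (1 - pi)) for pi in p]
    a = 0.0
    for _ in range(iterations):
        fitted = [1 / (1 + math.exp(-(a + o))) for o in offsets]
        gradient = sum(yi - fi for yi, fi in zip(y, fitted))
        hessian = sum(fi * (1 - fi) for fi in fitted)
        step = gradient / hessian
        a += step
        if abs(step) < 1e-10:
            break
    return a
```

A positive calibration-in-the-large indicates that events were more frequent than predicted (risk underestimated on average), and a negative value indicates the opposite.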
The model including the selected variables was first validated in the development dataset using the internal-external cross-validation framework to concurrently examine between-region heterogeneity and assess generalisability. 19,20 In this process, each of the eight contributing NHS regions was iteratively excluded from the development set; the model was then trained using the selected predictors in the remaining regions and validated in the omitted region by quantifying the C-statistic, calibration slope, and calibration-in-the-large, and by visualisation of calibration plots (appendix p 5). 19 We used random-effects meta-analysis to calculate pooled C-statistics, calibration slopes, and calibration-in-the-large statistics across development regions, and forest plots were examined to assess between-region heterogeneity.
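The leave-one-region-out loop at the heart of internal-external cross-validation can be sketched as follows. This is a schematic Python illustration, not the study's R code: the function name, the (region, outcome) row layout, and the toy "model" (which simply predicts the training prevalence) are hypothetical, and the random-effects pooling step is not shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Row = Tuple[str, int]      # hypothetical layout: (region name, binary outcome)
Model = TypeVar("Model")


def internal_external_cv(
    rows: Sequence[Row],
    fit: Callable[[List[Row]], Model],
    evaluate: Callable[[Model, List[Row]], float],
) -> Dict[str, float]:
    """Hold out each region in turn, train on the rest, validate on the held-out region."""
    regions = sorted({region for region, _ in rows})
    results: Dict[str, float] = {}
    for held_out in regions:
        train = [r for r in rows if r[0] != held_out]
        test = [r for r in rows if r[0] == held_out]
        model = fit(train)
        results[held_out] = evaluate(model, test)
    return results


# Toy usage: the "model" is the outcome prevalence in the training regions, and
# the metric is observed minus predicted prevalence in the held-out region.
def fit_prevalence(train: List[Row]) -> float:
    return sum(y for _, y in train) / len(train)


def prevalence_gap(model: float, test: List[Row]) -> float:
    return sum(y for _, y in test) / len(test) - model


toy_rows = [("A", 1), ("A", 0), ("B", 1), ("B", 1), ("C", 0), ("C", 0)]
results = internal_external_cv(toy_rows, fit_prevalence, prevalence_gap)
```

In practice `fit` would train the logistic regression model and `evaluate` would return a validation metric such as the C-statistic or calibration slope, giving one estimate per region that can then be pooled and inspected for heterogeneity.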
The final model was then trained using the full development dataset and further validated in the held-out NHS region (London).
Decision curve analysis allows assessment of clinical utility by quantifying the trade-off between correctly identifying true positives and incorrectly identifying false positives weighted according to the threshold probability. 21 The threshold probability represents the risk cutoff above which any given treatment or intervention might be considered, and reflects the perceived risk:benefit ratio for the intervention. Decision curve analysis was used in internal-external validation and held-out validation to quantify the net benefit of implementing the model in clinical practice, 21 compared with the following: a treat-all approach; a treat-none approach; and using other COVID-19-specific and pre-existing prognostic models identified by recent systematic reviews. 6,8,9 We included all models in which constituent variables were available in more than 60% of the cohort. Candidate models using points scores were calibrated to the validation data during decision-curve analysis, resulting in optimistic estimates of their net benefit. All decision curves were smoothed by locally weighted smoothing (LOESS) from stacked multiply imputed datasets. 19 All analyses were done in R (version 3.6.3; appendix p 6).
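The net benefit at a single threshold probability can be illustrated with the standard decision-curve formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The sketch below is in Python for illustration only; the function names are hypothetical, treating a predicted risk equal to the threshold as "above" it is an assumption, and the LOESS smoothing and multiple-imputation stacking used in the study are not reproduced.

```python
from typing import Sequence


def net_benefit(y: Sequence[int], p: Sequence[float], threshold: float) -> float:
    """Net benefit of intervening when the predicted risk is at or above `threshold`."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y)
    true_pos = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    false_pos = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y: Sequence[int], threshold: float) -> float:
    """Net benefit of intervening in everyone; treat-none is 0 by definition."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

Evaluating these functions over a grid of thresholds and plotting the results gives the decision curve; a model is useful at a given threshold only if its net benefit exceeds both the treat-all value and 0.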

Sensitivity analysis
We assessed validation of the final model using complete case data only in the held-out NHS region. We also recalculated validation metrics when stratifying deterioration events by time to deterioration (on vs after day of admission or first COVID-19 assessment; and 0-3 days vs >3 days after admission or first COVID-19 assessment), to assess whether discrimination varied according to time interval to the outcome; when excluding participants in the validation cohort who had ongoing hospital care at the end of follow-up; when stratifying the validation cohort by community versus nosocomial infection; and when excluding community-acquired cases in which patients developed symptoms in the interval between admission and the temporal threshold for nosocomial infection, to assess any effect of incorrect inclusion of nosocomial infections within the community-acquired cases. We also repeated the analysis with an alternative multiple imputation approach, using the aregImpute function from the rms package in R, 17 and recalculated model parameters using alternative temporal definitions of nosocomial SARS-CoV-2 infection. Finally, we assessed the discrimination of each of the continuous variables included in the final model as single univariable predictors.

Role of the funding source
The funder of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report. All authors had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Results
In backward elimination, 11 predictors were retained in more than five (>50%) of ten multiply imputed datasets in more than four (>50%) of eight development NHS regions, and entered the final model. These predictors were age, sex, nosocomial infection, Glasgow coma scale score, peripheral oxygen saturation (SpO2) at admission, breathing room air or oxygen therapy (contemporaneous with SpO2 measurement), respiratory rate, urea concentration, C-reactive protein concentration, lymphocyte count, and presence of radiographic chest infiltrates. Associations (including non-linearities) between each predictor and the outcome from the final 4C Deterioration model trained on the full development cohort are shown in figure 1, and full model coefficients are presented in the appendix (p 23) to enable independent model reconstruction.
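The retention rule described above (a predictor enters the final model if it is selected in more than half of the imputed datasets in more than half of the development regions) can be sketched as a two-level majority vote. This is an illustrative Python sketch with a hypothetical function name and input layout; how backward elimination selects predictors within each imputed dataset is not modelled here.

```python
from typing import Dict, List, Set


def retained_predictors(selections: Dict[str, List[Set[str]]]) -> Set[str]:
    """Apply the two-level majority rule.

    selections[region] holds one set of selected predictor names per
    imputed dataset for that region.
    """
    n_regions = len(selections)
    region_votes: Dict[str, int] = {}
    for per_imputation in selections.values():
        m = len(per_imputation)
        counts: Dict[str, int] = {}
        for chosen in per_imputation:
            for name in chosen:
                counts[name] = counts.get(name, 0) + 1
        # A region "votes" for a predictor selected in more than half of its imputations.
        for name, count in counts.items():
            if count > m / 2:
                region_votes[name] = region_votes.get(name, 0) + 1
    # Keep predictors that win the vote in more than half of the regions.
    return {name for name, votes in region_votes.items() if votes > n_regions / 2}
```

With ten imputed datasets and eight regions, the thresholds `m / 2` and `n_regions / 2` correspond to the "more than five of ten" and "more than four of eight" criteria in the text.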
Decision-curve analyses in the validation sets from internal-external cross-validation, without recalibration of the new model, are shown in the appendix (p 11), with benchmarking to 11 existing candidate prognostic models for which the constituent variables were available in more than 60% of participants. The 4C Deterioration model had higher net benefit than any of the existing models and the treat-all or treat-none strategies across a broad range of threshold probabilities in all development NHS regions (without local recalibration).
Subsequently, we validated the final prognostic model, trained on the full development cohort, in the held-out NHS region. Discrimination and calibration metrics for the 4C Deterioration model were similar to the estimates from internal-external cross-validation (table 2), with C-statistic 0·77 (0·76 to 0·78), calibration-in-the-large 0·00 (-0·05 to 0·05), and calibration slope 0·96 (0·91 to 1·01; visual calibration curve shown in figure 3A). Discrimination was higher for the 4C Deterioration model than for the other existing candidates. The sensitivity, specificity, positive-predictive value, and negative-predictive value across the full range of probability thresholds from the model are shown in the appendix (p 12).
Decision-curve analysis in the held-out NHS region to further examine clinical utility for the 4C Deterioration model showed higher net benefit than all other candidates and the treat-all and treat-none approaches across a range of threshold probabilities (figure 3B).
We anticipate that clinicians might wish to evaluate risk of deterioration or death separately. Therefore, for illustration, we compared predictions from the 4C Deterioration model to our previously reported 4C Mortality Score 9 in the London validation cohort, stratified by age (figure 4A), sex, and ethnicity (appendix p 13). In addition, ten example participants selected at random from each decile of 4C Deterioration predictions in the London cohort are shown in figure 4B, with their clinical characteristics summarised in figure 4C. Overall, deterioration predictions appeared appropriately higher than those for mortality. Importantly, the covariance between 4C Mortality Score and 4C Deterioration predictions was lower among younger age groups, among whom discrepancies between predictions were therefore greater. There were no differences in covariance by sex or ethnicity after stratification by age.
Validation of the model in complete case data from the held-out London region showed similar results to the primary analyses (appendix p 24). Stratification of outcome events (on vs after day of admission or first COVID-19 assessment; and 0-3 days vs >3 days after admission or first COVID-19 assessment) in the London validation cohort resulted in slightly lower C-statistics with longer time horizons for the 4C Deterioration model and most other models (appendix p 26). However, for some models in which mortality was the original intended outcome (including the 4C Mortality model), discrimination appeared better over the longer time horizons. Validation metrics in the London cohort were similar to those of the primary analysis when excluding participants who had ongoing hospital care at the end of follow-up (appendix p 27), when restricted to community-acquired infections (appendix p 28), and when community-acquired infections with symptom onset after admission were excluded (appendix p 29). Among nosocomial cases, the C-statistic was slightly lower for the 4C Deterioration model than for the primary analysis (0·73 [0·68 to 0·78]), although discrimination remained higher than that of the other candidate models, and calibration-in-the-large was 0·32 (0·12 to 0·53), suggesting elevated baseline risk among participants with nosocomial infection (appendix p 28). Repeating the analyses with use of an alternative multiple-imputation approach and with shorter and longer temporal definitions of nosocomial infection led to similar results to the primary analysis (appendix pp 14-19, 30). Of the continuous variables in the final model, serum C-reactive protein concentration was the strongest univariable predictor for deterioration (C-statistic 0·68 [0·66 to 0·69]) but had lower discrimination than the full multivariable model (appendix p 31).

Discussion
We developed and validated a prognostic model for in-hospital clinical deterioration among 74 944 consecutive adults hospitalised with COVID-19 and recruited to the ISARIC4C study across 260 hospitals in England, Scotland, and Wales. The final model integrates 11 routinely available predictors and is intended for use at the point of admission for community-acquired cases, or first evaluation of suspected nosocomial COVID-19.
Internal-external cross-validation showed consistent discrimination, calibration, and net benefit across NHS regions, which were confirmed in further validation in the held-out London region. The model provides a probability output that indicates the chance of the individual under evaluation having the outcome. These predictions will enable clinicians to objectively assess deterioration risk to inform the need for interventions such as ongoing hospital admission, consideration for critical care, and initiation of therapeutic agents. Importantly, the 4C Deterioration model achieved higher net benefit than other candidate risk-stratification tools across a broad range of risk thresholds in all NHS regions. Thus, the 4C Deterioration model has strong potential for clinical utility and generalisability.
Our 4C Deterioration model can be implemented programmatically alongside our previously reported 4C Mortality Score. 9 Covariance between the 4C Deterioration and 4C Mortality predictions was not systematically different by sex or ethnicity, but was attenuated among younger age groups. Thus, the greatest discordance of risks is evident in younger patients. This finding suggests that younger people who deteriorated were more likely to have escalation of treatment through admission to a high-dependency or intensive care unit or through ventilatory support, whereas older people who deteriorated were more likely to die. These observations might be mediated, in part, by differential treatment escalation decisions associated with age. Moreover, our comparison of the models for ten randomly selected patients across the distribution of outcome risks illustrates examples of cases with relatively low risks of death, but moderate to high risks of deterioration. These discordances underline the need for parallel prognostic models for a composite outcome of clinical deterioration and for mortality alone.
Notably, the discrimination of the 4C Deterioration model declined slightly with increasing time to outcome events, whereas that of the 4C Mortality model improved, probably reflecting the fact that most deterioration events that occurred more than 3 days after admission were deaths, whereas earlier events were more likely to be initiations of ventilatory support or high-dependency or intensive care. Application of the 4C Mortality Score and 4C Deterioration model together therefore provides the optimal approach for clinicians to predict the appropriate outcome as required to inform clinical management decisions.
We overcame the weaknesses of previous COVID-19 predictive models 6,8 by adhering to TRIPOD standards 11 and retaining continuous variables without arbitrary categorisation, while accounting for non-linear associations, to avoid loss of information. 31 Moreover, we used the largest dataset to date, to our knowledge, to develop and validate the 4C Deterioration model, reducing the risk of overfitting due to inadequate sample size. We exploited the wide geographical coverage across nine NHS regions in England, Scotland, and Wales to explore between-region heterogeneity in model performance using internal-external cross-validation. 32 Although discrimination, calibration slopes, and net benefit were largely very consistent, we noted minor variation in calibration-in-the-large, suggesting some variation in baseline risk between regions. Our approach of recalibrating the model intercept to each NHS region showed the potential to address such heterogeneity and could be used to update the model if risk changes temporally (as novel therapies are implemented) and among different populations. Nonetheless, net benefit, which accounts for model discrimination and calibration in quantifying clinical utility, was higher for the 4C Deterioration model than for all other candidates, even without recalibration, across all NHS regions and in the held-out validation dataset. This was the case even when comparing to points-based models, which might have achieved overly optimistic performance in decision-curve analyses because they were recalibrated to the validation datasets. We also used a best-practice approach to missing data with multiple imputation, 33 and obtained consistent results with an alternative imputation approach.
Our 4C Deterioration model was developed and validated in the context of current care; predictions should therefore be interpreted as reflecting both baseline risk and potential mitigation through in-hospital interventions. Ongoing prospective external validation of the 4C Deterioration model will be required to consider the need for temporal recalibration 34 and to evaluate model performance in diverse international settings outside of the ISARIC4C study. Although the model showed consistent performance across England, Wales, and Scotland, validation in other countries should be prioritised to enable its clinical implementation internationally. We have provided the underlying model coefficients to enable this. Another limitation is that we only included predictors that were routinely measured as part of clinical care during the study period, and specified that they had to be available among more than 60% of the population for inclusion in the analysis. Thus, we were unable to assess candidate models that include predictors such as lactate dehydrogenase or D-dimer concentrations, because these variables were only available in a small proportion of participants. Future studies could consider standardised capture of laboratory measurements considered to have prognostic value to enable inclusion of these variables in model development and validation at scale. Moreover, we note that novel molecular biomarkers currently under investigation might also offer prognostic value. 35 Blood transcript, protein, and metabolite measurements will be available from a subset of the ISARIC4C participants and could be integrated into risk-stratification tools in future studies.
In summary, we present a prognostic model for clinical deterioration among hospitalised adults with community-acquired or hospital-acquired COVID-19, validated in nine NHS regions in England, Scotland, and Wales. The model uses readily available clinical predictors to predict the probability of in-hospital deterioration and will be made freely available online alongside our previously reported mortality risk score, 9 to inform clinical decision making and patient stratification for therapeutic interventions.

Declaration of interests
... as President of the British Society for Immunology was unpaid but travel and accommodation at some meetings was provided by the Society. JKB reports grants from the MRC. MGS reports grants from the DHSC NIHR, MRC, and the HPRU in Emerging and Zoonotic Infections, University of Liverpool, during the conduct of the study; and is chair of the scientific advisory board and a minority share holder at Integrum Scientific (Greensboro, NC, USA) outside of the submitted work. LT reports grants from the HPRU in Emerging and Zoonotic Infections, University of Liverpool, during the conduct of the study, and grants from Wellcome Trust outside of the submitted work. All other authors declare no competing interests.

Data sharing
Access to all data and samples collected by ISARIC4C is controlled by an Independent Data and Materials Access Committee composed of representatives of research funders, academia, clinical medicine, public health, and industry. The application process for access to the data is available on the ISARIC4C website.