A Clinical Risk Score to Predict In-hospital Mortality from COVID-19 in South Korea

- ¹Division of Cardiology, Department of Internal Medicine, Kangwon National University School of Medicine, Chuncheon, Korea.
- ²Department of Biomedical Engineering, College of Information-Bio Convergence Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Korea.
- ³Division of Cardiology, Department of Internal Medicine, Ulsan Medical Center, Ulsan, Korea.
- ⁴Department of Cardiology, Dong-A University Hospital, Busan, Korea.
- ⁵East Lancashire Hospitals NHS Trust, Blackburn, Lancashire, UK.
- ⁶Division of Cardiology, Department of Internal Medicine, Ulsan University Hospital, University of Ulsan College of Medicine, Ulsan, Korea.
Address for Correspondence: Eun-Seok Shin, MD, PhD. Division of Cardiology, Department of Internal Medicine, Ulsan University Hospital, University of Ulsan College of Medicine, Ulsan, Republic of Korea. Email: sesim1989@gmail.com

*Ae-Young Her and Youngjune Bhak contributed equally to this work as first authors.

Received December 09, 2020; Accepted April 05, 2021.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Early identification of patients with coronavirus disease 2019 (COVID-19) who are at high risk of mortality is of vital importance for appropriate clinical decision making and delivering optimal treatment. We aimed to develop and validate a clinical risk score for predicting mortality at the time of admission of patients hospitalized with COVID-19.

Methods

Collaborating with the Korea Centers for Disease Control and Prevention (KCDC), we established a prospective consecutive cohort of 5,628 patients with confirmed COVID-19 infection who were admitted to 120 hospitals in Korea between January 20, 2020, and April 30, 2020. The cohort was randomly divided in a 7:3 ratio into a development set (n = 3,940) and a validation set (n = 1,688). Clinical information and complete blood count (CBC) measured at admission were analyzed using Least Absolute Shrinkage and Selection Operator (LASSO) and logistic regression to construct a predictive risk score (COVID-Mortality Score). The discriminative power of the risk model was assessed by calculating the area under the curve (AUC) of the receiver operating characteristic curve.

Results

The incidence of mortality was 4.3% in both the development and validation sets. A COVID-Mortality Score consisting of age, sex, body mass index, combined comorbidity, clinical symptoms, and CBC was developed. AUCs of the scoring system were 0.96 (95% confidence interval [CI], 0.85–0.91) and 0.97 (95% CI, 0.84–0.93) in the development and validation sets, respectively. When the model was optimized for > 90% sensitivity, accuracies were 81.0% and 80.2% with sensitivities of 91.7% and 86.1% in the development and validation sets, respectively. The optimized scoring system has been implemented as a public online risk calculator (https://www.diseaseriskscore.com).

Conclusion

This clinically developed and validated COVID-Mortality Score, using clinical data available at the time of admission, will aid clinicians in predicting in-hospital mortality.

Keywords

COVID-19; In-hospital Mortality; Death; Prediction; Risk Score

INTRODUCTION

Since the outbreak of coronavirus disease 2019 (COVID-19) in Wuhan, China, in December 2019, the disease has spread rapidly worldwide.1, 2, 3 As of March 22, 2021, the World Health Organization (WHO) had reported a total of 123,877,740 COVID-19 cases globally, with an average mortality of 2.2%. In many patients the disease is mild or self-limiting; however, in a considerable proportion it is severe and fatal. Consequently, it is vital to identify in advance those patients who are at greatest risk of mortality, to enable prompt referral to appropriate care settings and to improve outcomes.

Recently, a clinical score to predict critical illness and/or a fatal outcome in patients with COVID-19 was developed in a cohort of 1,590 Chinese patients treated in more than 575 centers throughout China.4 Liang et al.4 identified 10 independent predictive factors (abnormal chest radiograph, age, hemoptysis, dyspnea, unconsciousness, number of comorbidities, cancer history, neutrophil-to-lymphocyte ratio, lactate dehydrogenase and direct bilirubin), which were used to produce a risk score with a mean area under the curve (AUC) of 0.88 in both the development (95% confidence interval [CI], 0.85–0.91) and validation (95% CI, 0.84–0.93) cohorts for estimating the risk that a hospitalized patient with COVID-19 will develop critical illness. Further reports have described methods or new severity scores to assess disease severity and mortality of COVID-19 infection (Table 1).5, 6, 7, 8, 9 However, the mortality of COVID-19 varies according to race and ethnicity, and the accuracy of risk scores is therefore not necessarily transferable between countries.10 The present study thus aimed to develop a novel COVID-19 in-hospital mortality risk score (hereafter referred to as COVID-Mortality Score), based on data rapidly obtainable soon after hospital admission in South Korea.

Table 1
Current overview of COVID-19 severity or mortality prediction models

METHODS

Data sources and processing

We obtained medical records of hospitalized patients with laboratory-confirmed COVID-19 reported to the Korea Centers for Disease Control and Prevention (KCDC) between January 2020 and April 2020. All 120 hospitals in Korea that were assigned to treat COVID-19 patients submitted the clinical data of all their hospitalized cases with laboratory-confirmed COVID-19 infection to the KCDC by April 30, 2020. All patients with COVID-19 were diagnosed and treated according to the guidelines published by the KCDC (http://www.cdc.go.kr). According to the KCDC guidelines, laboratory confirmation of COVID-19 infection was defined as a positive result on real-time reverse-transcription polymerase-chain-reaction assay of nasal and oropharyngeal swabs. All the patients analyzed either died in hospital or were discharged home, although we could not obtain information on whether in-hospital deaths were attributable to causes other than COVID-19 or underlying diseases. We collected data on the clinical status of all COVID-19 patients at hospitalization (clinical symptoms and signs, complete blood count [CBC] findings, disease severity, and discharge status). The patients' data were collected up to death or discharge from the hospital. Ordinal variables were converted into separate dichotomous variables. The cohort was randomly divided into a development set and a validation set in a 7:3 ratio. Imputation was considered for variables with less than 30% missing values, and a predictive random forest algorithm was used for the imputation.
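
The random split and the missing-data rule described above can be sketched as follows. This is an illustrative Python sketch and not the authors' code (the study used R); the function names, the seed, and the use of `None` to mark a missing value are assumptions made for the example, and the random forest imputation itself is not shown.

```python
import random


def split_cohort(patient_ids, dev_fraction=0.7, seed=2020):
    """Randomly split patient IDs into development and validation sets.

    The seed is an arbitrary assumption, used only for reproducibility.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_dev = round(len(ids) * dev_fraction)
    return ids[:n_dev], ids[n_dev:]


def eligible_for_imputation(values, max_missing=0.30):
    """Return True if the fraction of missing (None) values is below the cutoff."""
    missing = sum(v is None for v in values)
    return missing / len(values) < max_missing


# A cohort of 5,628 patients split 7:3 gives sets of 3,940 and 1,688.
dev, val = split_cohort(range(5628))
print(len(dev), len(val))
```

With 5,628 patients, a 7:3 split reproduces the set sizes reported in the text.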

Potential predictive variables

The 35 potential predictive variables included the following patient characteristics at hospital admission: demographic variables and body mass index (BMI), medical history, clinical signs and symptoms, and CBC findings. The demographic variables were age and sex. Medical history included diabetes mellitus, hypertension, heart failure, cardiovascular disease, bronchial asthma, chronic obstructive pulmonary disease, chronic renal disease, malignancy, chronic liver disease, autoimmune disease, and dementia. Clinical signs and symptoms included the following categorical and continuous variables: systolic blood pressure (SBP), diastolic blood pressure (DBP), heart rate, body temperature, fever, cough, sputum, sore throat, rhinorrhea, myalgia, fatigue, dyspnea, headache, unconsciousness, nausea/vomiting, and diarrhea. CBC findings included hemoglobin, hematocrit, lymphocyte, platelet, and white blood cell (WBC).

Outcomes

We adopted in-hospital death as the end point because death is the most serious outcome of COVID-19.

Selection of variables and construction of scoring system

We included all 3,940 patients hospitalized with COVID-19 in the development set for selection of variables and development of the mortality score. As previously stated, 35 variables entered the selection process. Least Absolute Shrinkage and Selection Operator (LASSO) regression was used for the initial variable selection. For datasets with a low events-per-variable ratio, LASSO is more appropriate than stepwise regression analysis, and it is better suited to regression models with high-dimensional predictors. LASSO fits a penalized (here, logistic) regression model and retains the subset of predictors that minimizes the prediction error. In penalized regression, a constant λ must be specified to control the amount of coefficient shrinkage. With larger penalties, the estimates of weaker factors shrink toward zero so that only the strongest predictors remain in the model.1 We used LASSO regression with 10-fold cross-validation for internal validation. The λ value that minimized the cross-validation prediction error was defined as λ min, and the covariates retained at λ min were selected. The LASSO regression was performed with the “glmnet” package in R (R Foundation). The variables identified by LASSO regression analysis that were independently significant in logistic regression analysis were used to generate the risk prediction model (COVID-Mortality Score). The COVID-Mortality Score was generated on the basis of coefficients from the logistic model. We used the following equation to estimate the probability: probability = exp(Σβ × X)/[1 + exp(Σβ × X)], where β denotes the logistic regression coefficients and X the corresponding predictor values.
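
As an illustration of the probability equation above, the sketch below converts odds ratios into coefficients (β = ln OR) and evaluates the logistic function. It is written in Python rather than the R used in the study. The three odds ratios are those reported in the Results; the intercept is hypothetical, because it is not reported in the text, and the sketch assumes that Σβ × X includes an intercept term. The resulting probabilities are therefore for illustration only.

```python
import math

# Odds ratios for three binary predictors, as reported in the Results.
ODDS_RATIOS = {
    "age_80_plus": 21.24,
    "male": 1.67,
    "dyspnea": 4.03,
}

# Hypothetical intercept: the fitted value is not given in the text.
INTERCEPT = -4.0


def predicted_probability(patient, intercept=INTERCEPT, odds_ratios=ODDS_RATIOS):
    """probability = exp(sum(beta * x)) / (1 + exp(sum(beta * x))), beta = ln(OR)."""
    linear = intercept + sum(
        math.log(odds_ratios[name]) * value for name, value in patient.items()
    )
    return math.exp(linear) / (1 + math.exp(linear))


baseline = predicted_probability({})
with_dyspnea = predicted_probability({"dyspnea": 1})
print(baseline, with_dyspnea)
```

Adding a predictor with OR > 1 multiplies the odds, not the probability, by that OR, which is why the score is built from the coefficients and converted to a probability only at the end.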

Assessment of accuracy of prediction model

The accuracy of the COVID-Mortality Score was assessed using the AUC of the receiver operating characteristic curve. For internal validation, we used 200 bootstrap resamples. Statistical analysis was performed with R software (version 4.0.2, R Foundation), and P < 0.05 was considered statistically significant. To assess the generalizability of the COVID-Mortality Score, we used data from the 1,688 patients not included in the development set.
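
The two quantities named above, the AUC and a bootstrap with 200 resamples, can be sketched in a few lines. The Python code below is a minimal illustration under stated assumptions: it computes the AUC as the probability that a patient who died has a higher score than a patient who survived, and it uses a simple percentile bootstrap, which may differ from the procedure the authors ran in R. The function names and the toy data are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as P(score of a death > score of a survivor); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed, simple variant)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both outcomes to define an AUC
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    k = round(n_boot * alpha / 2)
    return stats[k], stats[n_boot - 1 - k]


# Toy data (hypothetical): 1 = died, 0 = survived.
toy_scores = [0.02, 0.10, 0.05, 0.40, 0.90, 0.70, 0.01, 0.30]
toy_labels = [0, 0, 0, 1, 1, 1, 0, 0]
print(auc(toy_scores, toy_labels), bootstrap_auc_ci(toy_scores, toy_labels))
```

The pairwise comparison is quadratic in the sample size, which is acceptable for a sketch; a rank-based formula gives the same value more efficiently on a full cohort.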

Ethics statement

This study was approved by the ethics committee of the Korea Centers for Disease Control and Prevention (KCDC), and the requirement for written informed consent was waived because of the de-identified retrospective nature of the publicly available data.

RESULTS

Characteristics of the development set

In the development set, on hospital admission, 134 of 3,940 patients (3.4%) were admitted directly to the intensive care unit (ICU), with the rest (96.6%) admitted to the general ward. The end point of mortality occurred in 4.3% (n = 169). Patients who died were more likely to be men and were more likely to have a BMI < 18.5, hypertension, an admission to the ICU, and more comorbidities than those who survived (Table 2). Symptoms at admission included fever (23.1%), cough (42.0%), sputum (29.0%), sore throat (15.8%), rhinorrhea (10.9%), myalgia (16.3%), fatigue (4.0%), dyspnea (12.3%), headache (16.8%), unconsciousness (0.6%), nausea/vomiting (4.5%), and diarrhea (9.1%); cough was the most common. CBC findings of the development set are also presented in Table 2. When CBC findings such as WBC, lymphocyte, platelet, and hemoglobin were evaluated as continuous variables, they did not exhibit a significant U-shaped pattern.

Table 2
Clinical characteristics and complete blood cell findings among patients in the development set who did or did not develop mortality

Predictor selection

The 35 variables measured at hospital admission (Table 2; see Methods) were included in the LASSO regression. After LASSO regression selection (Supplementary Fig. 1), 17 variables remained as significant predictors of death: age, sex, BMI, SBP, heart rate, body temperature, symptoms (rhinorrhea, dyspnea, and unconsciousness), comorbidities (diabetes mellitus, hypertension, chronic obstructive pulmonary disease, and malignancy), WBC, lymphocyte, platelet, and hemoglobin. Inclusion of these 17 variables in a logistic regression model resulted in 13 variables that were independently statistically significant predictors of death and were included in the risk score. These variables were age in three groups, 60–69 years (odds ratio [OR], 3.63; 95% CI, 1.64–8.01; P = 0.001), 70–79 years (OR, 6.12; 95% CI, 2.84–13.16; P < 0.001), and ≥ 80 years (OR, 21.24; 95% CI, 9.65–46.74; P < 0.001); men (OR, 1.67; 95% CI, 1.04–2.67; P = 0.034); BMI < 18.5 (OR, 3.38; 95% CI, 1.64–6.95; P < 0.001); diabetes mellitus (OR, 2.10; 95% CI, 1.33–3.31; P = 0.001); malignancy history (OR, 2.78; 95% CI, 1.14–6.79; P = 0.025); dementia (OR, 2.67; 95% CI, 1.49–4.78; P < 0.001); rhinorrhea (OR, 0.27; 95% CI, 0.08–0.91; P = 0.035); dyspnea (OR, 4.03; 95% CI, 2.50–6.48; P < 0.001); unconsciousness (OR, 25.10; 95% CI, 6.55–96.18; P < 0.001); WBC (OR per 10³/μL, 1.10; 95% CI, 1.04–1.17; P < 0.001); lower lymphocyte proportion (OR per %, 0.92; 95% CI, 0.89–0.94; P < 0.001); lower platelet count (OR per 10⁴/μL, 0.90; 95% CI, 0.88–0.93; P < 0.001); and lower hemoglobin level (OR per g/dL, 0.81; 95% CI, 0.72–0.92; P = 0.001) (Table 3).
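
For readers who want to relate the reported odds ratios, confidence intervals, and P values to one another, the following Python sketch recovers an approximate coefficient, standard error, and two-sided P value from an OR and its 95% CI. It assumes Wald-type intervals on the log-odds scale (β ± 1.96 × SE), which the text does not state explicitly, and the function name is hypothetical. The example uses the values reported above for men.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate beta, SE, z and two-sided P from an OR and its 95% CI.

    Assumes a Wald interval on the log scale: ln(OR) +/- z_crit * SE.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return beta, se, z, p


# Men: OR 1.67 (95% CI, 1.04-2.67), reported P = 0.034.
beta, se, z, p = wald_from_ci(1.67, 1.04, 2.67)
print(round(beta, 3), round(se, 3), round(z, 2), round(p, 3))
```

Under this assumption the recovered P value for men is close to the reported 0.034; small differences are expected because the published OR and CI are rounded.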

Table 3
Multivariable logistic regression model for predicting development of death in patients hospitalized with coronavirus disease 2019 in Korea

The performance, validation and optimization of COVID-Mortality score

The clinical characteristics and CBC findings of the validation set are presented in Supplementary Table 1. The AUCs of the model in the development set and validation set were 0.96 (95% CI, 0.85–0.91, Fig. 1) and 0.97 (95% CI, 0.84–0.93, Fig. 2), respectively. When the scoring system was optimized for > 90% sensitivity by stratifying age groups, the accuracy was 81.0% with 91.7% sensitivity and 80.5% specificity in the development set. In the validation set, accuracy was 80.2% with 86.1% sensitivity and 80.0% specificity. The optimized scoring system was used to construct the online risk calculator (https://www.diseaseriskscore.com). The online risk calculator determines whether the patient belongs to the high-risk or low-risk group and presents the hazard ratio and high-rank percentage for mortality. The model-derived score thresholds used for the optimized scoring system were 0.51 for age less than 40, 0.00252 for ages 40–49, 0.00176 for ages 50–59, 0.02302 for ages 60–69, 0.09532 for ages 70–79, and 0.13311 for age 80 or older.
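
The age-stratified classification and the reported accuracy can be illustrated with a short Python sketch. The thresholds are copied from the paragraph above; whether a score exactly equal to a threshold counts as high risk is an assumption, as are the function names. The last function checks that the reported sensitivity and specificity in the development set (169 deaths, 3,771 survivors) are arithmetically consistent with the reported accuracy.

```python
# (upper age bound, score threshold), copied from the text above.
AGE_THRESHOLDS = [
    (40, 0.51),
    (50, 0.00252),
    (60, 0.00176),
    (70, 0.02302),
    (80, 0.09532),
    (float("inf"), 0.13311),
]


def is_high_risk(age, score):
    """Classify as high risk if the score reaches the threshold for the age group.

    Using >= rather than > at the threshold is an assumption.
    """
    for upper_age, threshold in AGE_THRESHOLDS:
        if age < upper_age:
            return score >= threshold


def classification_metrics(predicted, died):
    """Return (sensitivity, specificity, accuracy) from paired booleans."""
    tp = sum(p and d for p, d in zip(predicted, died))
    tn = sum((not p) and (not d) for p, d in zip(predicted, died))
    fp = sum(p and (not d) for p, d in zip(predicted, died))
    fn = sum((not p) and d for p, d in zip(predicted, died))
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(died)


def accuracy_from(sensitivity, specificity, n_died, n_survived):
    """Accuracy implied by sensitivity, specificity and the outcome counts."""
    correct = sensitivity * n_died + specificity * n_survived
    return correct / (n_died + n_survived)


# Development set: 91.7% sensitivity and 80.5% specificity imply about 81.0%.
print(round(accuracy_from(0.917, 0.805, 169, 3771), 3))
```

Because deaths make up only 4.3% of the cohort, overall accuracy is driven almost entirely by specificity, which is why it sits close to 80% in both sets.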

Fig. 1
Receiver operating characteristic curve analysis in development set. Area under the curve of the COVID-19 in-hospital mortality score in the 3,940 patients that constituted the development set.

Fig. 2
Receiver operating characteristic curve analysis in validation set. Area under the curve of the COVID-19 in-hospital mortality score in the 1,688 patients that constituted the validation set.

DISCUSSION

This study developed and validated a clinical risk score (COVID-Mortality Score) to predict mortality among patients hospitalized with COVID-19 infection. Importantly, the 13 variables required for calculating the risk of mortality using this score are all generally readily available on hospital admission. In practice, if the patient's estimated risk of death is low, the clinician may choose careful monitoring, whereas high-risk estimates might require aggressive treatment or immediate ICU care. In this context, we optimized the scoring model to achieve higher sensitivity, to the detriment of accuracy, given the potentially lethal clinical outcome of COVID-19.

Furthermore, this score-based model could assist clinicians when making decisions. For example, clinicians may treat patients with a high-risk score more intensively in an emergency if resources and ICU beds are limited. Older age, sex, BMI, diabetes mellitus, malignancy, dementia, rhinorrhea, dyspnea, unconsciousness, WBC, lymphocyte, platelet, and hemoglobin were all included in the COVID-Mortality Score. Previous studies have found several of these variables to be risk factors for severe illness related to COVID-19 (Table 1). Wu et al.11 found that older age and more comorbidities were associated with a higher risk of developing acute respiratory distress syndrome (ARDS) in patients infected with COVID-19. Recently, Liang et al.4 developed a risk score based on characteristics of COVID-19 patients at the time of admission to the hospital to predict a patient's risk of developing a critical illness. They found that, of 72 potential predictors, 10 variables were independent predictive factors; these were included in a risk score which had a mean AUC of 0.88 in both the development (95% CI, 0.85–0.91) and validation (95% CI, 0.84–0.93) cohorts. Some of the variables in the Chinese model, such as age, dyspnea, unconsciousness, and cancer history, were also included in the COVID-Mortality Score, and despite its development in a Korean population, which could limit its generalizability to other areas of the world, the AUC of the present score was at least comparable, at 0.96 to 0.97. Nevertheless, the COVID-Mortality Score needs to be validated externally in cohorts with heterogeneous baseline characteristics, because it was built only from the predictors available in the current data.

While mortality prediction is neither perfect nor absolute, a simple score that predicts the severity of a patient's illness and hospital course will aid admitting and emergency room physicians in triaging patients and predicting the prognosis of COVID-19 infection, which has a very broad spectrum of severity. The score can also be used to guide recommendations for palliative care consultation early in a patient's hospital course.

Although this study used a large sample for constructing the risk score and a relatively large sample for validation, the data for score development and validation are entirely from Korea and are limited to the specified predictors and to mortality as the outcome, which limits the generalizability of the risk score to other areas of the world. However, despite these differences in race, the risk score remained valid in predicting in-hospital mortality.

In conclusion, we developed a risk score to estimate the risk of in-hospital mortality among patients with COVID-19 based on 13 variables commonly measured on admission to hospital. This score could help identify patients in need of more supportive treatment or assist with optimizing the use of medical resources.

SUPPLEMENTARY MATERIALS

Supplementary Table 1

Clinical characteristics and complete blood cell findings among patients in the validation set who did or did not develop mortality

Supplementary Fig. 1

Feature selection using the least absolute shrinkage and selection operator (LASSO) binary logistic regression model. (A) LASSO coefficient profiles of the 35 baseline features. (B) Tuning parameter (λ) selection in the LASSO model used 10-fold cross-validation via minimum criteria.

Notes

Funding: This work was supported by the U-K BRAND Research Fund (1.200108.01) of UNIST (Ulsan National Institute of Science & Technology) and the Research Project Funded by Ulsan City Research Fund (1.200047.01) of UNIST (Ulsan National Institute of Science & Technology).

Disclosure: The authors have no potential conflicts of interest to declare.

Author Contributions:

Conceptualization: Her AY, Bhak Y, Shin ES.
Data curation: Her AY, Bhak Y, Jun EJ, Yuan SL.
Formal analysis: Her AY, Bhak Y, Jun EJ, Yuan SL, Lee S, Bhak J.
Investigation: Her AY, Jun EJ, Shin ES.
Methodology: Her AY, Bhak Y, Lee S, Bhak J, Shin ES.
Project administration: Shin ES.
Software: Her AY, Bhak Y, Yuan SL, Lee S, Bhak J.
Supervision: Shin ES.
Validation: Her AY, Bhak Y, Jun EJ, Yuan SL, Garg S, Shin ES.
Visualization: Her AY, Bhak Y, Jun EJ, Garg S, Shin ES.
Writing - original draft: Her AY.
Writing - review & editing: Her AY, Bhak Y, Garg S.

ACKNOWLEDGMENTS

We acknowledge all the health-care workers involved in the diagnosis and treatment of COVID-19 patients in South Korea. We thank the Korea Centers for Disease Control and Prevention Agency, National Medical Center and the Health Information Manager in hospitals for their effort in collecting the medical records.

References

1. Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, et al. A new coronavirus associated with human respiratory disease in China. Nature 2020;579(7798):265–269.
2. Li Q, Guan X, Wu P, Wang X, Zhou L, Tong Y, et al. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N Engl J Med 2020;382(13):1199–1207.
3. Richardson S, Hirsch JS, Narasimhan M, Crawford JM, McGinn T, Davidson KW, et al. Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area. JAMA 2020;323(20):2052–2059.
4. Liang W, Liang H, Ou L, Chen B, Chen A, Li C, et al. Development and validation of a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19. JAMA Intern Med 2020;180(8):1081–1089.
5. Altschul DJ, Unda SR, Benton J, de la Garza Ramos R, Cezayirli P, Mehler M, et al. A novel severity score to predict inpatient mortality in COVID-19 patients. Sci Rep 2020;10(1):16726.
6. Gao Y, Cai GY, Fang W, Li HY, Wang SY, Chen L, et al. Machine learning based early warning system enables accurate mortality risk prediction for COVID-19. Nat Commun 2020;11(1):5033.
7. Zhou F, Yu T, Du R, Fan G, Liu Y, Liu Z, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet 2020;395(10229):1054–1062.
8. Vincent JL, Moreno R, Takala J, Willatts S, De Mendonça A, Bruining H, et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med 1996;22(7):707–710.
9. Yadaw AS, Li YC, Bose S, Iyengar R, Bunyavanich S, Pandey G. Clinical features of COVID-19 mortality: development and validation of a clinical prediction model. Lancet Digit Health 2020;2(10):e516–e525.
10. Kabarriti R, Brodin NP, Maron MI, Guha C, Kalnicki S, Garg MK, et al. Association of race and ethnicity with comorbidities and survival among patients with COVID-19 at an urban medical center in New York. JAMA Netw Open 2020;3(9):e2019795.
11. Wu C, Chen X, Cai Y, Xia J, Zhou X, Xu S, et al. Risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease 2019 pneumonia in Wuhan, China. JAMA Intern Med 2020;180(7):934–943.