Development and validation of a prognostic nomogram for predicting in-hospital mortality of COVID-19: a multicenter retrospective cohort study of 4086 cases in China

To establish an effective nomogram for predicting the in-hospital mortality of COVID-19, we conducted a retrospective cohort study of 4,086 hospitalized COVID-19 cases in two hospitals in Wuhan, China. All patients had reached a therapeutic endpoint (death or discharge). First, 3,022 COVID-19 cases from Wuhan Huoshenshan Hospital were divided chronologically into two sets: one (1,780 cases, 47 deaths) for nomogram modeling and the other (1,242 cases, 22 deaths) for internal validation. We then enrolled 1,064 COVID-19 cases (29 deaths) from Wuhan Taikang-Tongji Hospital for external validation. The independent factors were age (HR per year increment: 1.05), severity at admission (HR per rank increment: 2.91), dyspnea (HR: 2.18), cardiovascular disease (HR: 3.25), and levels of lactate dehydrogenase (HR: 4.53), total bilirubin (HR: 2.56), blood glucose (HR: 2.56), and urea (HR: 2.14); all eight were included in the final nomogram. The C-indices for internal resampling (0.97, 95% CI: 0.95-0.98), internal validation (0.96, 95% CI: 0.94-0.98), and external validation (0.92, 95% CI: 0.86-0.98) demonstrated good discrimination. The calibration plots showed good agreement between nomogram predictions and actual observations. We established and validated a novel prognostic nomogram that can predict the in-hospital mortality of COVID-19 patients.


INTRODUCTION
Since the World Health Organization characterized it as a pandemic on March 11, 2020, coronavirus disease 2019 (COVID-19) has become an urgent threat to global public health [1]. The COVID-19 outbreak has led to a significant increase in demand for hospital beds and medical equipment, and several countries have faced a critical care crisis [2]. Clinical prediction models for COVID-19 mortality are therefore urgently needed to identify the most vulnerable patients, so that they can receive the best possible care while the burden on the healthcare system is mitigated.
A number of studies have identified risk factors associated with poor outcomes of COVID-19 in univariate and multivariate analyses [3]. For example, older age, comorbidities, a higher sequential organ failure assessment (SOFA) score, a lower lymphocyte count, and increased D-dimer have been associated with an increased risk of death in COVID-19 patients [4][5][6]. In addition, several models have been developed to predict COVID-19 mortality, including a nomogram [7], a decision tree [8], a scoring system [9], online tools [10], and a computed tomography-based scoring rule [11], most of which are still preprints. However, as a recent systematic review pointed out [12], although 23 prognostic models for predicting the mortality risk of COVID-19 patients have been reported, none was recommended for practical use because of several limitations. First, some studies suffered from severe sampling bias caused by excluding participants who had not reached an endpoint (recovery or death). Second, the studies were limited by small sample sizes, varying lengths of follow-up, highly subjective predictors, and a lack of external validation. Third, model calibration was rarely assessed. Fourth, the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guideline was not followed, and assessment with the Prediction model Risk Of Bias ASsessment Tool (PROBAST) showed that these studies were at high risk of bias.

Validation and calibration of the prognostic nomogram
An internal validation of the prognostic nomogram was conducted in 1,242 COVID-19 cases (22 deaths; Figure 4), and an external validation was conducted in 1,064 cases (29 deaths; Figure 5). Calibration plots for the probability of survival at 5, 15, and 30 days after admission also showed good agreement between nomogram predictions and actual observations.
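A calibration check at a fixed horizon (for example 5, 15, or 30 days) can be sketched as follows: patients are grouped by predicted survival, and the mean predicted survival of each group is compared with the observed survival estimated by the Kaplan-Meier method; groups lying near the diagonal indicate good calibration. This is a minimal sketch under assumptions, not the study's code: the equal-sized grouping, treating discharge as censoring, and the function names are illustrative choices, and published calibration plots are usually produced with dedicated statistical software.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int], horizon: float) -> float:
    """Kaplan-Meier estimate of the probability of surviving to `horizon`.

    times:  follow-up time per patient (e.g. days since admission)
    events: 1 = death observed at that time, 0 = censored
            (ASSUMPTION: discharge is treated as censoring)
    """
    surv = 1.0
    # Distinct death times up to the horizon, in increasing order
    death_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    for t in death_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def calibration_points(pred_surv: Sequence[float], times: Sequence[float],
                       events: Sequence[int], horizon: float,
                       n_groups: int = 3) -> List[Tuple[float, float]]:
    """Return (mean predicted survival, Kaplan-Meier observed survival) per group."""
    n = len(pred_surv)
    order = sorted(range(n), key=lambda i: pred_surv[i])  # lowest predicted survival first
    pairs = []
    for k in range(n_groups):
        idx = order[k * n // n_groups:(k + 1) * n // n_groups]
        if not idx:
            continue  # more groups than patients
        mean_pred = sum(pred_surv[i] for i in idx) / len(idx)
        observed = kaplan_meier([times[i] for i in idx], [events[i] for i in idx], horizon)
        pairs.append((mean_pred, observed))
    return pairs
```

Plotting each pair with predicted survival on the x-axis and observed survival on the y-axis gives a simple calibration plot; a perfectly calibrated model places every group on the 45-degree line.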

DISCUSSION
In the current study, we established a prognostic nomogram to predict the in-hospital mortality of COVID-19 based on eight independent factors (age, severity at admission, dyspnea, cardiovascular disease, and levels of lactate dehydrogenase, total bilirubin, blood glucose, and urea) in a large, well-described population of 4,086 patients. The nomogram was validated by 1,000 bootstrap resamples, an internal validation cohort, and an external validation cohort, and it maintained adequate calibration and discrimination. To our knowledge, this is the largest and most comprehensive study to date to establish and validate a prognostic nomogram for predicting the in-hospital mortality of COVID-19 patients.
Vague reporting of study populations, substantial sampling bias, and limited sample sizes are the main obstacles preventing the clinical use of previous prognostic models for COVID-19 [12]. These problems arise mainly from excluding participants who had not reached an endpoint (neither recovered nor died) and from the difficulty of collecting data under epidemic conditions; as a result, the proportion of deaths varied between 1% and 59% among studies that developed prognostic models to predict mortality [12]. Such exclusions inevitably yield highly selected, biased samples and limit the applicability of those models. In the current study, we included all patients treated in two designated hospitals in Wuhan, a relatively large cohort of 4,086 inpatients with 100% ascertainment of endpoints (recovered or died). The clinical characteristics of these patients were well described, and they are a good representation of hospitalized COVID-19 patients in general.

Demographic, clinical, and laboratory parameters have all been included in prognostic models for COVID-19. The current model included eight predictors (age, severity at admission, dyspnea, cardiovascular disease, and levels of lactate dehydrogenase, total bilirubin, blood glucose, and urea). Of these, age and comorbid cardiovascular disease have been reported in previous models as risk factors for either mortality or disease progression [9,13-16]. Severity at admission was mainly determined by SpO2 and CT imaging, as detailed previously [17]. Dyspnea is more prevalent in patients who develop acute respiratory distress and have the poorest clinical outcomes, and it has been suggested as a risk factor for predicting mortality in patients with COVID-19 [18,19].
As for the laboratory markers in the current model, the inclusion of total bilirubin, blood glucose, urea, and lactate dehydrogenase suggests that multi-organ dysfunction is a major predictor of in-hospital mortality in COVID-19 patients. In another recent nomogram, a high direct bilirubin level was an independent predictor of 28-day mortality in adults hospitalized with confirmed COVID-19 [7]. However, another study with a larger sample size found that AST abnormality, rather than bilirubin, was strongly associated with the risk of COVID-19 mortality [20]. A recent meta-analysis concluded that comorbid diabetes was associated with an increased risk of severe disease or death in Chinese COVID-19 patients, although the extent to which diabetes independently contributes to this risk remains unclear [21]. In addition, acute kidney injury is associated with severe infection and death in patients with COVID-19 [22], and the combination of blood urea nitrogen and D-dimer predicted in-hospital mortality in a cohort of 305 COVID-19 patients with 27.9% mortality [23].
The current study provides clinicians with a practical, quantitative prognostic tool (the nomogram). It has several strengths. First, sampling bias was minimized by including all COVID-19 patients treated in the two designated hospitals, giving the largest sample size to date. Second, the model performed well in both internal and external validation. Third, the C-index and calibration plots showed adequate discrimination and calibration. Finally, the study was conducted in strict compliance with the TRIPOD guideline, and PROBAST rated it as at low risk of bias. The study also has several limitations. First, the retrospective design limits the level of evidence, and a prospective study is warranted to confirm the reliability of the findings. Second, some variables had missing data owing to the emergency situation; however, the missing rate was less than 10.0%, and missing values were imputed using the expectation-maximization (EM) method. Third, further validation in other hospitals and countries is warranted.

CONCLUSIONS
In conclusion, a novel prognostic nomogram for COVID-19 based on age, severity at admission, dyspnea, cardiovascular disease, and levels of lactate dehydrogenase, total bilirubin, blood glucose, and urea was established and validated. It may help physicians make optimal treatment decisions, triage patients reasonably, and avoid treatment delays. Further studies are warranted to determine whether use of this nomogram improves clinical care and the outcomes of COVID-19 patients.

Patients and study design
This retrospective cohort study of a prognostic model for COVID-19 was conducted according to the TRIPOD reporting guideline, and the risk of bias was assessed using PROBAST [24][25][26][27].

Data collection and entry
Information on demographic characteristics and coexisting disorders was collected by two trained physicians through telephone interviews using a uniform questionnaire. Clinical symptoms, laboratory characteristics, and outcome information were extracted from the electronic medical records. Data were double entered and validated using EpiData software (version 3.1, EpiData Association, Odense, Denmark), and disputes were arbitrated by an expert committee of specialists in respiratory and critical care medicine and in epidemiology.
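Double data entry catches transcription errors by comparing two independent entries of the same record field by field and sending every disagreement for arbitration. EpiData performs this kind of comparison within the software; the sketch below only illustrates the idea, assuming each entry pass is stored as a dictionary keyed by a hypothetical record ID, with hypothetical field names. It is not the study's procedure or EpiData's implementation.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]  # field name -> entered value


def compare_entries(first: Dict[str, Record],
                    second: Dict[str, Record]) -> List[Tuple[str, str, str, str]]:
    """List (record_id, field, first_value, second_value) for every disagreement.

    Records or fields present in only one pass are reported as "<missing>" on the other
    side; surrounding whitespace is ignored when comparing values.
    """
    discrepancies = []
    for record_id in sorted(set(first) | set(second)):
        a = first.get(record_id, {})
        b = second.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            va = a.get(field, "<missing>")
            vb = b.get(field, "<missing>")
            if va.strip() != vb.strip():
                discrepancies.append((record_id, field, va, vb))
    return discrepancies
```

For example, an age entered as "67" in one pass and "76" in the other is returned as a discrepancy, whereas "54" and "54 " are treated as identical.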

Construction of the prognostic nomogram
All statistical analyses were conducted using SAS (version 9.4; SAS Institute Inc., Cary, NC, USA) and R version 4.0.0 (Institute for Statistics and Mathematics, Vienna, Austria). A P-value < 0.05 was considered statistically significant. Categorical variables were described as frequencies and percentages, and continuous variables as medians and interquartile ranges (IQRs). Missing values of all potential predictors (missing rate less than 10.0%) were imputed using the expectation-maximization (EM) method. Univariate and multivariate Cox regression analyses were used to estimate the hazard ratio (HR) and corresponding confidence interval (CI) of each variable. First, univariate Cox regression was used to screen for potential prognostic factors with a P-value < 0.05. Then, independent prognostic factors were identified by backward stepdown selection in a multivariate Cox regression model. Finally, a prognostic nomogram was constructed from the results of the multivariate analysis using the rms package, according to the Akaike information criterion (AIC) [29].
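A Cox nomogram is a graphical form of the model's linear predictor, lp = sum of beta_j * x_j with beta_j = ln(HR_j): each predictor's contribution is rescaled to points, the points are summed, and the total is mapped to survival through S(t | x) = S0(t) ** exp(lp). The sketch below illustrates that mapping under stated assumptions. The coefficients come from the hazard ratios reported in the abstract, but treating the four laboratory predictors as 0/1 indicators, the axis ranges, the 4-rank severity scale, and any baseline survival value S0(t) are hypothetical placeholders, not the published model. It follows a common convention in which the predictor with the widest range of contribution spans 0-100 points.

```python
import math
from typing import Dict, Tuple

# Hazard ratios from the abstract; a Cox coefficient is beta = ln(HR).
HAZARD_RATIOS: Dict[str, float] = {
    "age": 1.05,                     # per year
    "severity": 2.91,                # per rank
    "dyspnea": 2.18,                 # 0/1
    "cardiovascular_disease": 3.25,  # 0/1
    "ldh": 4.53,                     # ASSUMED 0/1 indicator (coding not given here)
    "total_bilirubin": 2.56,         # ASSUMED 0/1 indicator
    "blood_glucose": 2.56,           # ASSUMED 0/1 indicator
    "urea": 2.14,                    # ASSUMED 0/1 indicator
}
BETAS = {name: math.log(hr) for name, hr in HAZARD_RATIOS.items()}

# HYPOTHETICAL axis ranges (min, max) used to lay out each nomogram scale.
RANGES: Dict[str, Tuple[float, float]] = {name: (0.0, 1.0) for name in BETAS}
RANGES["age"] = (20.0, 100.0)
RANGES["severity"] = (1.0, 4.0)

# Lowest contribution of each predictor and the width of each axis on the lp scale.
MIN_CONTRIB = {n: min(BETAS[n] * lo, BETAS[n] * hi) for n, (lo, hi) in RANGES.items()}
SPANS = {n: abs(BETAS[n]) * (hi - lo) for n, (lo, hi) in RANGES.items()}
MAX_SPAN = max(SPANS.values())  # the widest axis is drawn as 0-100 points
MIN_LP = sum(MIN_CONTRIB.values())


def points(name: str, value: float) -> float:
    """Points for one predictor value (0 at the lowest-risk end of its axis)."""
    return 100.0 * (BETAS[name] * value - MIN_CONTRIB[name]) / MAX_SPAN


def total_points(patient: Dict[str, float]) -> float:
    """Sum of the points over all eight predictors."""
    return sum(points(name, patient[name]) for name in BETAS)


def predicted_survival(total: float, baseline_survival: float) -> float:
    """Convert total points back to lp and apply S(t|x) = S0(t) ** exp(lp).

    baseline_survival is S0(t) for lp = 0 at the chosen horizon t (HYPOTHETICAL input).
    """
    lp = total * MAX_SPAN / 100.0 + MIN_LP
    return baseline_survival ** math.exp(lp)
```

Because the points are a linear rescaling of lp, ranking patients by total points is the same as ranking them by the Cox linear predictor; only the conversion to a survival probability needs the baseline survival.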

Validation and calibration of the prognostic nomogram
The prognostic nomogram was evaluated with 1,000 bootstrap resamples of the primary development cohort, an internal validation cohort, and an external validation cohort. Its performance was measured by Harrell's concordance index (C-index) and calibration plots. The C-index, which assesses the discrimination of the model, ranges from 0.5 to 1.0; a larger C-index indicates a more accurate prognostic model (0.5 indicates random chance, and 1.0 indicates perfect discrimination of the outcome). To validate the nomogram, the total points of each patient in a validation cohort were calculated from the established nomogram, Cox regression was then performed in that cohort with the total points as the covariate, and the C-index and calibration plot were derived from this regression.
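Harrell's C-index is the fraction of usable patient pairs in which the patient with the higher risk score (here, higher total nomogram points) has the shorter observed survival. Because it depends only on how patients are ranked, using the total points directly gives the same value as using a Cox linear predictor fitted on those points, provided the fitted coefficient is positive. The sketch below is a minimal O(n^2) version for right-censored data, assuming a pair is usable only when the shorter follow-up ends in death; ties in the risk score count one half, and pairs with tied follow-up times are skipped for simplicity, whereas statistical packages handle such ties in various ways.

```python
from typing import Sequence


def harrell_c_index(risk: Sequence[float], times: Sequence[float],
                    events: Sequence[int]) -> float:
    """Harrell's concordance index for right-censored survival data.

    risk:   predicted risk score (higher = worse prognosis), e.g. total nomogram points
    times:  observed follow-up time
    events: 1 = death observed, 0 = censored
    """
    concordant = 0.0
    usable = 0
    n = len(risk)
    for i in range(n):
        if events[i] != 1:
            continue  # the earlier time in a usable pair must be an observed death
        for j in range(n):
            if times[j] > times[i]:  # patient j outlived patient i (j may be censored)
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # higher risk died first: concordant
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied risk scores count one half
    if usable == 0:
        raise ValueError("no usable pairs: every death is followed by no longer follow-up")
    return concordant / usable
```

With a perfectly ordered score the function returns 1.0, with a reversed score 0.0, and with identical scores for every patient 0.5.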

AUTHOR CONTRIBUTIONS
MX and CG conceived and designed the study. MX, LL, FX, and CL drafted the paper and did the statistical analysis. LL, CL, FX, WP, LS, YH, ZY, JN, ZT, HC, ZJ, LS, PY, LY, NL, LY, SQ, JH, LM collected the data. All authors approved the final draft of the manuscript for publication.

CONFLICTS OF INTEREST
All authors declare no conflicts of interest.

FUNDING
The present study was funded by the Outstanding Youth Science Foundation of Chongqing (cstc2020jcyj-jqX0014) and the Science Foundation for Outstanding Young People of the Army Medical University (grants to Prof. Xiangyu Ma and Li Li). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.