Prediction of intensive care admission and hospital mortality in COVID-19 patients using demographics and baseline laboratory data

Highlights
• Prediction scores can be used to support clinical decisions and resource allocation.
• The authors used data from 3,022 hospitalized patients with COVID-19, of whom 1,054 died.
• The final scores included age, comorbidities, and baseline laboratory data.
• Accuracy was 75% for ICU admission and 77% for death in the validation sample.
• Our scores were more accurate than the previous NEWS-2 and 4C Mortality Scores.


Introduction
In the three years since the first cases of COVID-19 were identified in China, more than 676 million people have been diagnosed with COVID-19 worldwide, and more than 6.8 million have died from its complications. Notably, some countries held higher burdens of cases and deaths, including the United States, India, and Brazil. 1 Data on the overall life expectancy and years of life lost show that most nations with reliable mortality data witnessed substantial reductions in life expectancy, with more than 28 million excess years of life lost in 2020 in 31 countries. 2 A severe shortage of medical resources was reported during the peak phase of COVID-19 infections and hospitalizations in several regions. The scarcity of crucial resources such as Intensive Care Unit (ICU) beds, mechanical ventilation devices, and protective gear for healthcare workers, as well as limited supplies of sedative medications, frequently resulted in inadequate protection for healthcare staff, reduced patient admissions, and restricted access to medical care. Although these issues challenged even some of the world's most affluent countries, 3-6 the impact of COVID-19 and other highly transmissible diseases is undeniably more dramatic in middle- and lower-income nations, where access to medical resources is limited even in usual conditions. 7,8 In Brazil, there was a significant disparity in healthcare access, resulting in a high mortality rate among ICU patients that ranged from 13% to 57%. 9-11 More critical examples were observed in Colombia, 12 India, 13 and Manaus, the capital of Amazonas state in Brazil, where hospitals ran out of oxygen supplies during a surge of SARS-CoV-2 infections in January 2021. 14 Strategies to identify patients for whom scarce treatment and support interventions should be prioritized are crucial in this context. 15
With the rising rates of COVID-19 vaccination in several countries, 16 it is less likely that the number of severe COVID-19 cases will reach the levels seen in 2020 and 2021. However, the emergence of new virus variants with potentially higher transmissibility and virulence, 17 and the slow pace of vaccination coverage in several places 16 could still result in severe stress for healthcare services.
In this study, the authors used data from a large Brazilian tertiary university hospital to explore predictors of ICU admission and hospital mortality in patients admitted for COVID-19 and to develop and validate prediction models that might be used as clinical decision tools for resource allocation in day-to-day emergency care.

Methods
In this retrospective cohort study, the authors used Electronic Health Records (EHR) from COVID-19-related admissions to the largest referral hospital for the disease in Sao Paulo, Brazil. The authors developed a prediction score for intensive care admission and hospital mortality using demographics and baseline clinical variables.
Hospital das Clinicas, University of Sao Paulo Medical School (HCFMUSP), is a renowned 2,200-bed teaching hospital complex that specializes in providing high-level medical and surgical care. Between March 2020 and September 2020, its 900-bed central building was designated by the Sao Paulo State's Health Department to operate as a special COVID-19 treatment center, receiving SARS-CoV-2-infected patients from 278 secondary hospitals located in 85 cities, mainly in the Sao Paulo metropolitan area. Additionally, its intensive care capacity was increased four-fold with the conversion of regular wards to ICUs, totaling 300 ICU beds. Throughout the pandemic, COVID-19 care followed institutional protocols in our hospital.

Participants and data collection
The authors analyzed data from consecutive patients (> 14 years) diagnosed with COVID-19 who were admitted as inpatients for at least 24 hours between March and August 2020. The presence of SARS-CoV-2 infection was confirmed through either RT-PCR or serology testing. In instances where RT-PCR testing was not conducted within ten days of symptom onset, serology was utilized as a confirmatory test for probable COVID-19 cases. The authors excluded patients with nosocomial COVID-19 infection, defined as patients admitted to the hospital for other causes who were infected with SARS-CoV-2 during their hospitalization. The authors extracted data on the following variables: demographics; comorbidities; COVID-19 symptoms on admission; baseline laboratory tests; ICU admission; need for mechanical ventilation; severity of disease at ICU admission measured with Simplified Acute Physiology Score 3 (SAPS-3); and clinical outcomes, including death, discharge, or referral to another healthcare facility. Data from each participant were collected from EHR and compiled by a trained research team using standardized web-based forms and Research Electronic Data Capture (REDCap) 18 resources.

Data analysis
Numeric variables were reported as means and Standard Deviations (SDs) or medians and Interquartile Ranges (IQR), according to their distribution. Occasionally, variables were also stratified into categories to simplify their clinical interpretation. Categorical variables were reported as counts and proportions. The authors then used demographic, clinical, and laboratory data to develop prediction scoring systems.
The authors randomly split our participants into derivation and validation samples using a 1:1 ratio and selected 25 variables to feed our models based on their clinical relevance and causal relations: (1) Demographics: age, sex, race/ethnicity; (2) Clinical history: hypertension, diabetes mellitus, heart disease, stroke history, chronic obstructive pulmonary disease, rheumatologic disease, cancer; (3) COVID-19 symptoms: fever, muscle pain, dyspnea, cough, dysgeusia or anosmia, headache, diarrhea; (4) Admission laboratory: hemoglobin, neutrophil-to-lymphocyte ratio, creatinine, C-reactive protein. The model including the complete list of independent variables for each outcome was defined as Model 1. As sensitivity analyses, the authors also examined our models excluding the reported COVID-19 symptoms, as these variables were more likely to be affected by information bias, particularly among patients with a more severe clinical presentation on admission. The model excluding COVID-19 symptoms for each outcome was defined as Model 2.
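As an illustration only, a random 1:1 allocation of this kind could be sketched in Python as follows. The function name, the fixed seed, and the use of sequential integers as patient identifiers are assumptions made for the example; the source does not state which software or procedure performed the split. The sketch assigns each participant independently with probability 0.5, which yields samples of similar but not identical size, as in the sample sizes reported in the Results (1,496 and 1,526).

```python
import random

def split_one_to_one(patient_ids, seed=2020):
    """Randomly assign each ID to the derivation or validation sample (about 1:1)."""
    rng = random.Random(seed)  # fixed seed is an assumption, used only for reproducibility
    derivation, validation = [], []
    for pid in patient_ids:
        # Each participant has an independent 50% chance of entering either sample.
        (derivation if rng.random() < 0.5 else validation).append(pid)
    return derivation, validation

# Hypothetical identifiers 0..3021 stand in for the 3,022 participants.
derivation, validation = split_one_to_one(range(3022))
print(len(derivation), len(validation))
```

An exact half-and-half split (shuffling the list and cutting it in the middle) would be an equally reasonable reading of "1:1 ratio"; the independent-assignment variant is shown only because it is the simpler of the two.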
Subsequently, the authors explored the association between each variable of interest and the primary outcomes in univariable logistic regressions and used stepwise logistic regression models to select the final predictors to build our scoring system (variables with p-values < 0.1 were retained). The authors used variance inflation factors to assess for collinearity.
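The collinearity check can be made concrete with a minimal sketch. For predictor j, the variance inflation factor is 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least-squares regression of predictor j on all other predictors. The code below uses only the Python standard library; the helper names and the toy values for age, creatinine, and C-reactive protein are invented for illustration and are not the study data or the authors' code.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(y, columns):
    """R^2 of an ordinary least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def variance_inflation_factors(predictors):
    """VIF_j = 1 / (1 - R_j^2), regressing predictor j on all the others."""
    vifs = {}
    for name, column in predictors.items():
        others = [c for other, c in predictors.items() if other != name]
        vifs[name] = 1.0 / (1.0 - r_squared(column, others))
    return vifs

# Toy values (hypothetical, not study data).
toy = {
    "age": [45, 62, 71, 38, 55, 80, 67, 49],
    "creatinine": [0.9, 1.4, 1.1, 0.8, 2.0, 1.6, 1.0, 1.2],
    "crp": [40, 120, 95, 30, 150, 60, 200, 85],
}
for name, vif in variance_inflation_factors(toy).items():
    print(name, round(vif, 2))
```

A value of 1 means the predictor is uncorrelated with the others; larger values indicate that its coefficient variance is inflated by collinearity.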
In accordance with the resulting models, the authors attributed points to each predictor by dividing its beta coefficient by the lowest beta coefficient in the model and rounding the result to the nearest integer or half-integer (.0 or .5). The authors then used the sum of these points to estimate risk scores for our sample and examine their accuracy to predict hospital death and ICU admission. The authors validated the performances of the risk scoring systems using Receiver Operating Characteristic (ROC) analyses and test characteristics, including the Youden index, sensitivities, specificities, positive predictive values, and negative predictive values. The authors used the Youden index to identify optimal cutoffs for each model according to the outcome of interest.
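The point attribution and the cutoff selection can be illustrated with a short sketch. It assumes that "lowest beta coefficient" means the smallest absolute coefficient and that results are rounded to the nearest 0.5, as the Table 4 note describes; the beta values, scores, and outcomes below are invented for the example and are not taken from Table 4. The Youden index is J = sensitivity + specificity - 1, and the cutoff with the largest J is taken as optimal.

```python
import math

def round_to_half(value):
    """Round to the nearest multiple of 0.5 (ties rounded up)."""
    return math.floor(value * 2 + 0.5) / 2

def points_from_betas(betas):
    """Divide each beta by the smallest absolute beta and round to the nearest 0.5."""
    reference = min(abs(b) for b in betas.values())  # assumption: smallest |beta|
    return {name: round_to_half(b / reference) for name, b in betas.items()}

def youden_optimal_cutoff(scores, outcomes):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    A patient is classified as positive when score >= cutoff.
    """
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical log-odds coefficients (not the published values).
betas = {"age_per_decade": 0.30, "cancer": 0.62, "crp_high": 0.44}
points = points_from_betas(betas)
print(points)

# Hypothetical total scores and outcomes (1 = event, 0 = no event).
scores = [1.0, 2.5, 3.0, 4.5, 5.0, 6.5, 7.0]
outcomes = [0, 0, 1, 0, 1, 1, 1]
print(youden_optimal_cutoff(scores, outcomes))
```

A patient's total score is the sum of the points for the predictors that apply, so the maximum score of a model is the sum of all of its point values.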
The authors also compared the predictive performance of our models with that of the National Early Warning Score-2 (NEWS-2) 19 and the 4C Mortality Score. 20 For this comparison, the authors used reclassification tables and measures of net reclassification improvement (the net percentage of events correctly classified upward) and integrated discrimination improvement (the difference in discrimination slopes between two models).
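For readers unfamiliar with these two measures, the sketch below shows one common way to compute them. It uses the standard categorical definition of net reclassification improvement (net upward movement among events minus net upward movement among non-events) and defines the discrimination slope as the mean predicted risk in events minus the mean predicted risk in non-events. The risk categories, probabilities, and outcomes are invented for the example, and the source does not specify which variant of each measure was applied.

```python
def net_reclassification_improvement(old_category, new_category, outcomes):
    """Categorical NRI: net upward reclassification in events minus that in non-events."""
    def net_up(indices):
        up = sum(1 for i in indices if new_category[i] > old_category[i])
        down = sum(1 for i in indices if new_category[i] < old_category[i])
        return (up - down) / len(indices)

    events = [i for i, y in enumerate(outcomes) if y == 1]
    non_events = [i for i, y in enumerate(outcomes) if y == 0]
    return net_up(events) - net_up(non_events)

def discrimination_slope(probabilities, outcomes):
    """Mean predicted risk among events minus mean predicted risk among non-events."""
    events = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_events = [p for p, y in zip(probabilities, outcomes) if y == 0]
    return sum(events) / len(events) - sum(non_events) / len(non_events)

def integrated_discrimination_improvement(old_prob, new_prob, outcomes):
    """IDI: difference in discrimination slopes between the new and the old model."""
    return discrimination_slope(new_prob, outcomes) - discrimination_slope(old_prob, outcomes)

# Hypothetical data: 1 = event, 0 = no event; categories 0 (low) to 2 (high).
outcomes = [1, 1, 1, 0, 0, 0]
old_category = [0, 1, 1, 1, 0, 1]
new_category = [1, 1, 2, 0, 0, 1]
old_prob = [0.4, 0.5, 0.6, 0.5, 0.3, 0.4]
new_prob = [0.6, 0.5, 0.8, 0.3, 0.3, 0.4]

nri = net_reclassification_improvement(old_category, new_category, outcomes)
idi = integrated_discrimination_improvement(old_prob, new_prob, outcomes)
print(round(nri, 3), round(idi, 3))
```

Positive values of either measure indicate that the new model separates events from non-events better than the comparator.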

Ethical aspects
The institutional ethics committee reviewed and approved our research protocol with an exemption of informed consent. The authors kept all identifiable patient information confidential throughout the study.

Results
During our recruitment period, 3,596 patients (> 14 years) were admitted to HCFMUSP with suspected COVID-19. Of those, 574 candidates were excluded due to a lack of laboratory confirmation of SARS-CoV-2 infection. The final study sample included 3,022 participants.
The overall demographics and clinical characteristics of the study participants are presented in Table 1, according to hospital mortality. Compared with non-survivors, a lower percentage of survivors were male (52% vs. 62%, p < 0.001). Survivors were also younger (p < 0.001) and less likely to have diagnosed comorbidities, except for liver disease, HIV, and hematological cancer. Table 2 presents the baseline reported symptoms of study participants, overall and according to hospital mortality. The percentage of patients reporting good general health conditions was higher among survivors. Interestingly, flu-like symptoms were more frequently reported by survivors. Median SAPS-3 values were higher among non-survivors. Baseline laboratory findings are described in Table 3. Measurements of complete blood count, kidney function, liver enzymes, C-reactive protein, lactic dehydrogenase, creatine kinase, albumin, prothrombin time, and D-dimer were all consistently and significantly abnormal in non-survivors.
From the complete cohort of 3,022 participants, 1,496 were randomly assigned to the derivation sample and 1,526 to the validation sample. In the derivation sample, 1,496 (68%) admissions required intensive care, and 527 (35%) participants died in the hospital. In the validation sample, there were 989 (65%, p = 0.077) ICU admissions and 527 (35%, p = 0.690) deaths.
After multivariable analyses, the following items were selected to predict ICU admission and hospital mortality (Table 4): age; cancer; dementia; diabetes; rheumatic disease; anosmia or ageusia; dyspnea; fever; headache; sore throat; C-reactive protein; creatinine; hemoglobin; neutrophil-to-lymphocyte ratio; platelets. All variables had a variance inflation factor of less than 1.5, indicating a lack of multicollinearity between predictors. Variables retained in the final models and their respective scores are presented in Table 4. The maximum scores, indicating the highest risk of ICU admission, were 48.5 points for Model 1 and 56.5 points for Model 2. The maximum scores, indicating the highest risk of death, were 29.0 points for Model 1 and 30.0 points for Model 2. Fig. 1 presents ROC curves examining the accuracy of Models 1 and 2 in predicting ICU admission (Panels A and C) and hospital death (Panels B and D). The areas under the ROC curves were very similar for the derivation (grey lines) and validation samples (black lines). Both

Discussion
In this study, the authors used a detailed dataset of patients admitted to a large academic COVID-19 treatment center in Brazil to identify factors associated with ICU admission and death. The authors built predictive scores that can be used in hospitals and emergency healthcare units to support clinical decisions and resource allocation. The final scores included age, comorbidities, and baseline laboratory data and were more accurate than the previously published NEWS-2 and 4C Mortality Score. Furthermore, the authors found that including baseline flu-like symptoms in the scores did not add substantial value to their accuracy.
Several studies including both outpatient and hospitalized participants have explored prognostic scores in COVID-19. A recently published systematic review 21 examined articles published up to May 2021 and identified 79 studies investigating prediction models for severe COVID-19. Nevertheless, most had significant methodological caveats and were rated as having a high risk of bias or high concerns for applicability. Out of the nine studies rated with a low risk of bias and low concerns for applicability, one included patients with suspected COVID-19; 22 one addressed respiratory failure as an outcome; 23 one included variables collected one week after hospital admission; 24 and three included COVID-19 patients in outpatient settings. 25-27 The remaining three studies developed risk scores for mortality in hospitalized patients with laboratory-confirmed COVID-19. Some studies used chest roentgenogram and computed tomography findings as predictive variables either alone or with clinical data. In our study, patient radiology findings were unavailable and could not be included in the models. Even so, the authors had a detailed database of more than 3,000 individuals, and we were able to explore prediction models using 25 demographic, clinical, and laboratory variables.
Chen et al. developed the OURMAPCN-score using data from more than 6,000 patients admitted to seven hospitals in Wuhan as the derivation sample, with an external validation sample including more than 9,000 patients from China and Italy. The score included admission before the national maximum number of daily new cases was reached, age, oxygen saturation, blood urea nitrogen, respiratory rate, procalcitonin, C-reactive protein, and absolute neutrophil counts. Of note, this score included procalcitonin, a marker of systemic inflammation that is not readily available in most hospitals. Moreover, it included a calendar reference that is unlikely to apply to other settings. More recently, the same research group developed the PAWNN score, which used only age and complete blood count information (platelet counts; white blood cell counts; neutrophil counts; and neutrophil-to-lymphocyte ratio) as variables for a prediction tool built using a derivation sample of more than 9,000 patients and a validation sample of almost 3,000 patients in China; in this analysis, the model accuracy was 80% in an external validation sample including 227 patients from Italy. 28 Knight et al. used data from the International Severe Acute Respiratory and Emerging Infections Coronavirus Clinical Characterisation Consortium (ISARIC-4C) to build the 4C Mortality Score. The study included more than 35,000 patients in the derivation sample and more than 22,000 patients in the validation sample. The final score included age, sex, number of comorbidities, respiratory rate, peripheral oxygen saturation, level of consciousness, urea, and C-reactive protein levels. In addition to the high discrimination for mortality, the 4C Mortality Score had the advantage of including variables usually available at the initial hospital assessment. 20
In our prognostic score, we used variables readily available in most hospital settings that could be applied in different scenarios. We compared our scores with the 4C Mortality Score 20 and the widely validated NEWS-2 19 and observed that our discriminatory performance was higher. In a previous study, Bradley et al. showed that already-established prognostic scores may underestimate mortality in COVID-19 patients. 29 Another study in our institution found that the original version of NEWS, qSOFA, and SIRS performed poorly in predicting mortality, early bacterial infection, and admission to ICU in COVID-19 patients admitted to the emergency department. 30 Furthermore, our hospital participated in a binational study including 1,361 patients from Brazil and Spain to evaluate the performance of 11 risk stratification scores in predicting hospital mortality and ICU admission. The results of the study indicated that the more recent scores created to predict COVID-19 outcomes had a similar ability to predict mortality compared to the conventional pneumonia scores. However, all the scores demonstrated inadequate performance in predicting ICU admission. 31 Together with our findings, these results highlight the need to recalibrate or develop specific prognostic scores in the context of different diseases and settings.
Despite the initial optimism brought on by the development of several effective vaccines for COVID-19, their generally slow rollout and the emergence of new SARS-CoV-2 variants have contributed to recurring waves of infected patients in several countries. 1 Although less severe than before vaccines became available, the persistent strain on emergency departments and hospitals highlights the ongoing need for efficient resource allocation. Moreover, the epidemiological data underline the importance of regularly reassessing the factors that contribute to adverse outcomes in hospitalized COVID-19 patients. This reassessment should account for continuously evolving variables such as vaccination status, previous exposure to the virus, and therapeutic interventions such as antiviral medications and monoclonal antibodies.
This study had limitations. The authors used data from a relatively small sample (3,022 individuals) admitted to a single tertiary university hospital in a well-resourced city in Brazil. However, our hospital was the primary referral center for severe COVID-19 in Sao Paulo, receiving patients from all regions of the metropolitan area, which is the most populated in Brazil with 23.5 million inhabitants. All participants were enrolled in 2020, before they could have had a previous SARS-CoV-2 infection or vaccination and before the emergence of viral genetic variants, all of which are likely to modify the disease prognosis. As such, it is unlikely that our scores could be directly applied to contemporary cohorts of hospitalized COVID-19 patients. The authors also limited our analyses to patients aged ≥14 years and cannot extrapolate our results to pediatric populations. Nevertheless, our results are valuable in showing that prognostic scores created using readily accessible, locally sourced data and easily managed through electronic health records can be more effective in predicting clinical outcomes and improving resource allocation compared to scores developed externally.

* Excludes one participant with an outlier leucocyte count who probably had hematological cancer.

Table 4
Scores assigned to predictors in the final multivariable models for ICU admission and mortality.
Beta coefficients are expressed in log-odds units. The beta coefficient for each retained variable was divided by the lowest beta coefficient in the model; the results were rounded to the nearest integer (0 or .5) to generate the respective score values in the new scoring systems. *Dashes indicate that the variable was not part of the respective model.

Conclusion
In conclusion, the SARS-CoV-2 pandemic has resulted in a massive public health crisis, putting significant strain on healthcare systems globally. This highlights the critical need to optimize resource utilization, particularly in the face of supply shortages. Prognostic scores, created using locally sourced and easily accessible information and validated on contemporary patient cohorts, are critical tools in supporting clinical decision-making and maximizing the impact of limited healthcare resources.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request. HCFMUSP will participate in the COVID Brazil Data-Sharing repository coordinated by The State of São Paulo Research Foundation (FAPESP), providing open access to hospital data related to COVID-19 hospitalizations (https://repositoriodatasharingfapesp.uspdigital.usp.br/).

Authors' contributions
VAS, TAS, and EGK conceived the study. VAS and TAS performed data analysis and drafted the first version of the manuscript. TAS prepared Figure 1. MJRA and JCF contributed to the manuscript writing. JFM, VCJ, MMF, KRS, JEP, NMS, LA, AJSD, MMM, TEPBF, CC, and HPS contributed to data acquisition and database organization. All authors revised and approved the final version of the manuscript.

Funding
The authors acknowledge the financial contribution to the study setup provided by donations from the general public under the HC-COMVIDA crowdfunding scheme (https://viralcure.org/c/hc) with funds managed by the Fundação Faculdade de Medicina. MJRA was supported by a scholarship from HCFMUSP with funds donated by NUBANK under the #HCCOMVIDA initiative.