Fully independent validation of eleven prognostic scores predicting progression to critically ill condition in hospitalized patients with COVID-19

Introduction COVID-19 remains an important threat to global health and continues to challenge hospital care. To assist decision making in COVID-19 hospital care, many instruments to predict progression to critical condition have been developed and validated. Objective To validate eleven scores predicting COVID-19 progression to critical illness in hospitalized patients in a Brazilian population. Methodology Observational study with retrospective follow-up, including 301 sequentially enrolled adults with confirmed COVID-19. Participants were admitted to non-critical units for treatment of the disease between January and April 2021 and between September 2021 and February 2022. Eleven prognostic scores were applied using demographic, clinical, laboratory and imaging data collected in the first 48 h of hospital admission. The outcomes of interest were those originally defined for each score. The analysis plan was to apply the instruments, estimate the outcome probability by reproducing the original development/validation of each score, and then estimate performance measures (discrimination and calibration) and decision thresholds for risk classification. Results The overall outcome prevalence was 41.8 % among 301 participants. The risk of the outcomes was greater in older and male patients, with a linear trend with increasing number of comorbidities. Most of the patients studied were not immunized against COVID-19. Concomitant bacterial infection and consolidation on imaging increased the risk of the outcomes. The College of London COVID-19 severity score and the 4C Mortality Score were the only scores with reasonable discrimination (ROC AUC 0.647 and 0.798, respectively) and calibration. The risk groups (low, intermediate and high) for the 4C score were updated with the following thresholds: 0.239 and 0.318 (https://pedrobrasil.shinyapps.io/INDWELL/).
Conclusion The 4C score showed the best discrimination and calibration performance among the tested instruments. We suggest different thresholds for the risk groups. Use of the 4C score could improve decision making and early therapeutic management in hospital care.


Introduction
Coronavirus Disease 2019 (COVID-19) is a respiratory infection whose presentation ranges from asymptomatic infection to severe disease and death. 1 From its emergence in December 2019 up to December 2022, COVID-19 accounted for more than 601 million cases and 6.4 million deaths worldwide and overwhelmed health systems around the world. 2,3 Its incidence, occurring in waves, put substantial pressure on health services, especially when crowded hospitals had no room for additional critically ill cases. 4 The rate of COVID-19 progression to critically ill condition was estimated to be 22.9 %. 5 Additionally, the estimated risks of Intensive Care Unit (ICU) admission, mechanical ventilation and overall mortality were 10.96 %, 7.1 % and 5.6 %, respectively. 5 After effective population vaccination, the number of critical care admissions and deaths decreased. 6 Time trends in hospitalizations, critical care admissions and deaths from COVID-19 differed over the course of the pandemic. In adults over 50 years, a lower relative risk of ICU admission (23.3 % and 24.3 %) was observed when comparing the COVID-19 peaks of Omicron vs. Alpha and Omicron vs. Delta, respectively. When comparing the Omicron wave with previous waves, deaths were 4.5 % vs. 21.3 % and ICU admissions were 1 % vs. 4.3 %, respectively, 7 reflecting a shift to a profile of high dissemination with decreasing numbers of hospitalizations and deaths. 6
Patients with moderate or extensive clinical presentation who seek hospital care make up the risk group for critical COVID-19. Early identification of patients at higher risk of disease progression in the emergency room or at hospital admission could aid decision making and improve the use of individual and public health resources. [10][11][12][13][14][15][16] Studies have demonstrated promising applicability of some COVID-19 prognostic scores. 17 A prognostic score for in-hospital death developed with Brazilian participants estimated a mortality of 20.3 % and was later validated in Barcelona, Spain. 18 Nevertheless, validation studies of prognostic scores in the Brazilian population are scarce. The aim of this study was to validate different prognostic instruments to predict COVID-19 progression to a severe condition in a fully independent sample of Brazilian patients.

Source data and settings
This is a retrospective observational follow-up study carried out in Niterói, Rio de Janeiro, Brazil. All patients hospitalized at Hospital Santa Martha from January 1, 2021 to April 30, 2021 and all patients hospitalized at Hospital Niterói D'Or from September 1, 2021 to February 28, 2022 were sequentially included. In the first quarter of 2021, the Alpha strain was the main circulating variant of SARS-CoV-2. Vaccine coverage was low at this time, and the incidence and death rates from COVID-19 were high. 6 From July to October 2021, a wave of new cases caused by the Delta variant was observed. Later, a wave predominantly caused by the Omicron variant was observed despite increased vaccine coverage. Special groups were already receiving booster doses at this time, with a gradual reduction in critical care admission rates and in mortality. 6,7,19

Study participants
The inclusion criteria were: adults (18 years or older); a positive RT-PCR result for SARS-CoV-2 from a respiratory swab or other viable biological material representing active disease, collected between 3 and 10 days after symptom onset, at any time during hospitalization; a history of exposure, clinical findings or radiological images compatible with COVID-19 according to the Ministry of Health criteria at the time; 20 and a completed hospital admission form with allocation to non-critical wards. The exclusion criteria were: absence of clinical evaluation in the first 48 h; discharge or death within the first 24 h of hospitalization; and critical condition at admission or direct admission to intensive care units. Critical conditions were defined as: 1- Glasgow Coma Scale < 8; 2- need for vasoactive amines; 3- need for intubation and mechanical ventilation support; 4- need for acute dialysis therapy.
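The exclusion rules above are mechanical enough to express as a screening function. The following is a minimal hypothetical sketch, not the study's data collection code: the record type `AdmissionRecord` and all of its field names are assumptions, while the cut-offs (clinical evaluation within 48 h, hospitalization of at least 24 h, Glasgow Coma Scale < 8) are the ones stated in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdmissionRecord:
    # Hypothetical fields for illustration; not the study's actual form.
    hours_to_first_clinical_evaluation: Optional[float]  # None if never evaluated
    hours_to_discharge_or_death: Optional[float]  # None if still hospitalized
    glasgow_coma_scale: int
    needs_vasoactive_amines: bool
    needs_mechanical_ventilation: bool
    needs_acute_dialysis: bool
    admitted_directly_to_icu: bool


def is_critical(rec: AdmissionRecord) -> bool:
    """True if any of the four critical conditions listed in the text is present."""
    return (
        rec.glasgow_coma_scale < 8
        or rec.needs_vasoactive_amines
        or rec.needs_mechanical_ventilation
        or rec.needs_acute_dialysis
    )


def is_excluded(rec: AdmissionRecord) -> bool:
    """Apply the three exclusion criteria described in the text."""
    no_early_evaluation = (
        rec.hours_to_first_clinical_evaluation is None
        or rec.hours_to_first_clinical_evaluation > 48
    )
    left_before_24h = (
        rec.hours_to_discharge_or_death is not None
        and rec.hours_to_discharge_or_death < 24
    )
    return (
        no_early_evaluation
        or left_before_24h
        or is_critical(rec)
        or rec.admitted_directly_to_icu
    )


# A ward patient evaluated 6 h after admission, still hospitalized, GCS 15.
print(is_excluded(AdmissionRecord(6.0, None, 15, False, False, False, False)))  # prints False
```

Each criterion is kept as a separately named condition so that the reason for exclusion could easily be logged in a real screening pipeline.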

Criteria and measurement data
Predictors were assessed at hospital admission or, when necessary, up to 48 h after admission. Patients underwent a protocol comprising: 1- clinical evaluation, seeking to identify relevant clinical elements (e.g., fever, headache, coryza, sore throat, myalgia, dry cough, risk of exposure); 2- laboratory tests; and 3- chest computed tomography.
The criteria used to allocate patients to hospital admission sectors at the time followed the parameters defined by the Brazilian Ministry of Health: 1- moderate cases: patients with clinical or radiological evidence of respiratory disease and SatO2 ≥ 94 % on room air; 2- severe cases: patients with respiratory rate > 30 breaths/min, SatO2 < 94 % on room air (or, in patients with chronic hypoxia, a reduction > 3 % from baseline), PaO2/FiO2 ratio < 300 mmHg, or opacities in > 50 % of the lung. 20 Standard treatment was offered according to each hospital's protocol, based on the guidelines at the time.
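The moderate/severe allocation rule can be sketched as a small function. This is a hypothetical illustration under stated assumptions: the function and parameter names are invented; the sketch assumes PaO2/FiO2 < 300 mmHg and lung opacities > 50 % as the severe-case cut-offs, in line with commonly used definitions of severe COVID-19, and these should be checked against the Ministry of Health criteria (reference 20); the "> 3 % reduction from baseline" rule for chronic hypoxia is read as percentage points of saturation.

```python
from typing import Optional


def classify_severity(
    respiratory_rate: float,
    spo2_room_air: float,
    pao2_fio2: Optional[float] = None,
    lung_involvement_pct: Optional[float] = None,
    baseline_spo2_chronic_hypoxia: Optional[float] = None,
) -> str:
    """Return "severe" or "moderate" for a patient with respiratory disease.

    Assumes clinical or radiological evidence of respiratory disease is
    already present. Missing PaO2/FiO2 or imaging values are simply ignored.
    """
    if baseline_spo2_chronic_hypoxia is not None:
        # Chronic hypoxia: compare with the patient's own baseline
        # (assumed to mean a drop of more than 3 percentage points).
        low_saturation = baseline_spo2_chronic_hypoxia - spo2_room_air > 3
    else:
        low_saturation = spo2_room_air < 94

    severe = (
        respiratory_rate > 30
        or low_saturation
        or (pao2_fio2 is not None and pao2_fio2 < 300)  # assumed cut-off
        or (lung_involvement_pct is not None and lung_involvement_pct > 50)  # assumed cut-off
    )
    return "severe" if severe else "moderate"


print(classify_severity(respiratory_rate=20, spo2_room_air=96))  # prints moderate
print(classify_severity(respiratory_rate=20, spo2_room_air=92))  # prints severe
```

Any single severe criterion is sufficient, which is why the conditions are combined with `or`.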
Some predictors were less frequently recorded or less available at hospital admission and were therefore more difficult to apply in our analysis, such as hemoptysis, 14 smoking and bacterial co-infection, 10 ethnicity, 4 and interleukin-6, ferritin and fibrinogen. 9 Some instruments also required other existing scores, such as the Charlson score 15 and the RALE score 4 adapted to COVID-19. Data were collected from medical records and by consulting the attending health professionals.
Data were extracted from medical records into a standardized electronic data collection instrument by one of the authors (VLCM) and an undergraduate trainee supervised by the second author (PEAAB). At the beginning of the study, the extractors were trained in data extraction and the research forms were refined. The extractors were not blinded to the research hypothesis, and inter-rater agreement between extractors was not estimated.

Data analysis
The outcome prevalence estimated from administrative data before the study ranged from 50 % to 72 %, depending on the period and health unit. Therefore, we assumed that 300 subjects would be enough to reach 100 subjects with events and 100 subjects without events. 21 For prediction purposes, missing data were imputed by multiple imputation using classification and regression tree (CART) models with the "mice" R package.
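The "at least 100 events and 100 non-events" target can be checked with simple arithmetic. A minimal sketch, assuming the expected counts are just sample size times prevalence (no allowance for exclusions or sampling variability); the function names are illustrative only.

```python
import math


def expected_counts(n: int, prevalence: float) -> tuple[float, float]:
    """Expected numbers of events and non-events in a sample of size n."""
    events = n * prevalence
    return events, n - events


def min_sample_size(prevalence: float, min_events: int = 100,
                    min_non_events: int = 100) -> int:
    """Smallest n whose expected event and non-event counts both reach the targets."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    n_for_events = math.ceil(min_events / prevalence)
    n_for_non_events = math.ceil(min_non_events / (1.0 - prevalence))
    return max(n_for_events, n_for_non_events)


# Anticipated prevalence range reported in the text (50 % to 72 %).
for p in (0.50, 0.72):
    ev, non_ev = expected_counts(300, p)
    print(f"prevalence {p:.2f}: {ev:.0f} events, {non_ev:.0f} non-events, "
          f"minimum n = {min_sample_size(p)}")
```

With n = 300, the expected number of non-events stays at or above 100 only while the prevalence is at most two thirds; at the observed prevalence of 41.86 % among 301 participants, the counts are about 126 events and 175 non-events.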
Data analysis was conducted in R software in the following steps: description of the candidate predictors; exploration of missing data patterns and of the need for data imputation; verification of the need to recode predictors; and validation (discrimination and calibration) of the different scores. The different prediction instruments were tested on the same data, reproducing the original model or using the original recommended scores in the population of interest. Discrimination was measured with the area under the ROC curve and R squared. Calibration was assessed with the calibration belt, the calibration intercept and slope, and prediction errors (average, maximum and 90th percentile). 22 Additionally, decision thresholds were estimated with the "uncertain interval" method 23 to allow different courses of action, for example: (a) low risk, recommending discharge; (b) moderate risk, recommending monitoring; (c) high risk, recommending early transfer to critical care.
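Two of the performance measures above, the ROC AUC and the calibration intercept and slope, can be illustrated in a few lines. This is a pure-Python sketch, not the authors' code: which R functions they used is not stated (`rms::val.prob` is one common choice that reports these quantities). Here the AUC uses the Mann-Whitney formulation, and calibration uses the logistic recalibration model logit(P(Y = 1)) = a + b·logit(p), fitted jointly, where a = 0 and b = 1 indicate perfect calibration. Whether the paper's "intercept" is this joint intercept or calibration-in-the-large (slope fixed at 1) is an assumption. R squared, the calibration belt and the prediction-error summaries are not reproduced.

```python
import math
from typing import Sequence


def roc_auc(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random event gets a higher prediction than a random non-event.

    Ties count as one half (Mann-Whitney formulation).
    """
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5
    return score / (len(pos) * len(neg))


def calibration_intercept_slope(probs, outcomes, max_iter=100, tol=1e-10):
    """Fit logit(P(Y=1)) = a + b * logit(p) by Newton-Raphson; returns (a, b).

    Predicted probabilities must lie strictly between 0 and 1.
    """
    lp = [math.log(p / (1.0 - p)) for p in probs]  # linear predictor
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(lp, outcomes):
            mu = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = mu * (1.0 - mu)
            g0 += y - mu          # score for the intercept
            g1 += (y - mu) * x    # score for the slope
            h00 += w              # Fisher information entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det  # Newton step: information^-1 @ score
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b


# Toy data in which each predicted risk matches the observed event rate exactly.
probs = [0.25] * 4 + [0.5] * 2 + [0.75] * 4
outcomes = [1, 0, 0, 0] + [1, 0] + [1, 1, 1, 0]
print(roc_auc(probs, outcomes))  # prints 0.74
print(calibration_intercept_slope(probs, outcomes))  # approximately (0.0, 1.0)
```

In the toy data the predicted risks equal the observed proportions, so the fitted recalibration line is the identity (a = 0, b = 1) even though discrimination is only moderate: calibration and discrimination are separate properties.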

Results
About half of the 301 participants included and analyzed came from each health unit, Hospital Santa Martha and Hospital Niterói D'Or. The overall prevalence of the composite outcome was 41.86 %: 56.96 % at Niterói D'Or and 26.67 % at Santa Martha (Table 1). Overall mortality was 16.61 %: 15.23 % at Niterói D'Or and 18.00 % at Santa Martha. Median age was higher, and males were more frequent, in the outcome group. Most participants were not immunized or had no available immunization data. Among those vaccinated, most had been immunized with CoronaVac. Participants with a worse respiratory profile progressed to the outcome more frequently, most evidently those with worse oxygen saturation (Table 1).
The most prevalent comorbidities were systemic arterial hypertension and diabetes mellitus. There was an apparent positive linear relationship between the number of comorbidities and the risk of the outcome, which reached 67 % in patients with 3 or more comorbidities. Likewise, the higher the Charlson score, the greater the risk of the outcome (Table 2).
There was no relevant relationship of C-reactive protein (CRP) or urea with the outcome. On the other hand, patients with elevated D-dimer had a slightly higher risk of the composite outcome than those without (Table 3). Likewise, relevant differences in the risk of the outcome were observed among participants with elevated procalcitonin or concomitant bacterial infection (Table 3).
There was an apparent positive linear relationship between the composite outcome and the radiological RALE score. Participants with pulmonary consolidation also had a higher risk of the outcome (65 %). Tomographic pulmonary involvement below 50 % versus 50 % or more showed a difference in the risk of the composite outcome (Table 4).
Only two instruments showed acceptable calibration (Fig. 1) and discrimination: the College of London score (COLS) and the 4C score. The latter had the best discrimination and calibration performance (Table 5). Additionally, for the 4C score only, the lower and upper decision thresholds of the uncertain interval were estimated as 0.239 and 0.318, respectively. Therefore, predictions below 0.239 should be considered at lower risk of progression to critically ill condition, predictions between 0.239 and 0.318 do not discriminate between those at high and low risk of progression, and predictions above 0.318 should be considered at higher risk of disease progression. Finally, we provide a web tool for the 4C Mortality Score with the updated thresholds and risk classification for the Brazilian population (https://pedrobrasil.shinyapps.io/INDWELL/).
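Applying the updated 4C thresholds (0.239 and 0.318) to an estimated probability of progression is a simple three-way rule. A minimal sketch: the function name is illustrative, treating both boundary values as part of the uncertain (intermediate) interval is an assumption, and the suggested actions in the comments are the example courses of action listed in the Data analysis section.

```python
LOWER_THRESHOLD = 0.239  # lower bound of the uncertain interval (4C score)
UPPER_THRESHOLD = 0.318  # upper bound of the uncertain interval (4C score)


def classify_4c_probability(prob: float, lower: float = LOWER_THRESHOLD,
                            upper: float = UPPER_THRESHOLD) -> str:
    """Map an estimated probability of progression to a three-level risk group.

    Boundary values are assigned to the intermediate group (an assumption).
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be a probability in [0, 1]")
    if prob < lower:
        return "low"           # example action: consider discharge
    if prob > upper:
        return "high"          # example action: early transfer to critical care
    return "intermediate"      # uncertain interval; example action: monitoring


print([classify_4c_probability(p) for p in (0.10, 0.25, 0.40)])
# prints ['low', 'intermediate', 'high']
```

Keeping the thresholds as parameters makes it easy to re-run the classification if the limits are re-estimated in another population.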

Discussion
The main results to be discussed are: (a) some biomarkers usually considered clinically relevant were tested as predictors in very few instruments (e.g., concomitant bacterial infection, D-dimer and procalcitonin); (b) there is a positive relationship between comorbidities, or the number of comorbidities, and the risk of progressing to critical condition; (c) there is a positive relationship between the degree of radiological pulmonary involvement and the risk of the outcome studied; (d) the 4C prognostic score had the highest performance with reasonable applicability, good enough to be recommended for this population; (e) it was possible to estimate decision thresholds for the 4C score to recommend different courses of action in this population.
From the beginning of the pandemic, aspects such as dissemination, lethality and mass prevention measures varied significantly. The advent of vaccines and the emergence of viral variants are examples of these changes. Throughout the pandemic, a worldwide effort was made to develop vaccines against COVID-19. 6,19 After vaccines became available, the frequency of severe disease, ICU admission and mortality decreased, 24 changing the course of the pandemic.
In our study, among vaccinated participants, those who had received the Chinese CoronaVac vaccine were the most frequent. These participants were also older and more frequently had the outcome. In Brazil, this was the first vaccine available to the public. 25 Protection against severe disease caused by viral variants remained substantial, although age and multiple comorbidities contribute to worse outcomes. 26 The considerably long inclusion periods probably led to the inclusion of participants infected with different SARS-CoV-2 variants and from a population with different degrees of vaccination. Vaccination and different strains may change the applicability of prediction instruments only if the relationship between the predictors and mortality or critical illness incidence also changes. 6,7,24 That is, once the patient is admitted to the emergency room and a prediction instrument is to be applied, either the choice of instrument should be informed by this information, or the instrument should adjust its prediction for it. However, none of the instruments accounted for these factors. Additionally, mortality was essentially the same in the vaccinated and non-vaccinated groups, and the composite outcome was twice as frequent in the vaccinated group. When we estimated the performance of the 4C prediction instrument stratified by vaccination status, discrimination was slightly better among the vaccinated, although calibration was slightly worse (data not shown).
Unfortunately, no data on the variants were available to extend this discussion. Therefore, the evidence applies reasonably to a population of hospitalized COVID-19 patients regardless of vaccination status.
We observed a substantial difference in outcome prevalence among patients who had a concomitant bacterial infection with COVID-19. This co-infection would likely increase clinical compromise (systemic and cardiorespiratory) as well as create doubts regarding correct antibiotic use. Only one score included this variable to predict COVID-19 severity progression. 10
D-dimer and other hematological changes are important features and increase significantly with disease severity. 27 In multivariable analyses, death was associated with increased D-dimer. 27 Rising D-dimer over time was observed in non-survivors, compared with more stable levels in survivors. 28 In our study, patients with elevated D-dimer also had a slightly higher risk of the composite outcome than those without.
Multiple comorbidities and underlying conditions have been associated with severe COVID-19, with higher rates of hospitalization, ICU admission, mechanical ventilation or death. 29 We also observed a positive linear relationship between comorbidities and the risk of the outcomes, measured either as the number of comorbidities or as the Charlson score.
There are several imaging patterns of pulmonary involvement. 30 In this study, we observed that radiographic involvement in patients with COVID-19 is a fundamental element in predicting disease severity. However, the outcome prevalence was 48 % in the group with a RALE index of "0". This may be explained by other predictors, unrelated to respiratory findings, that add risk to the individual. We also observed that the presence of pulmonary consolidation alone was associated with a 65 % risk of the outcome, and that pulmonary involvement of 50 % or more was associated with twice the risk of the outcome. ARDS was estimated to occur in 20 % of COVID-19 patients, and mechanical ventilation was implemented in 12.3 % of them. 28 In the United States, 12 % to 24 % of hospitalized patients with respiratory symptoms progressed to mechanical ventilation. 29
In the 4C Mortality Score, 13 four risk groups were defined with corresponding mortality rates: low risk (score 0-3, mortality 1.2 %), intermediate risk (score 4-8, mortality 9.9 %), high risk (score 9-14, mortality 31.4 %) and very high risk (score ≥ 15, mortality 61.5 %). However, these groups appear to have been defined arbitrarily. The originally recommended courses of action for each risk group are: patients in the low-risk group could be suitable for management in the community; the intermediate-risk group could be suitable for ward-level monitoring; and patients in the high or very high risk groups could start treatment promptly, with early escalation to critical care if appropriate. 13 However, in our population the risk distribution seems to differ from that of the original 4C population. Therefore, it is reasonable to adapt the risk classification to this population. As only three courses of action are recommended, we divided the estimated risk of the outcome (not the intermediate score) into three categories (low, intermediate and high risk), allowing a similar interpretation.
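For contrast with the probability-based classification, the original 4C risk groups operate directly on the integer point score. A minimal sketch of that lookup, using the score bands and mortality rates quoted in the text; the function name is illustrative, and the upper limit of 21 points is taken from the original 4C Mortality Score publication.

```python
# Original 4C Mortality Score risk groups, as quoted in the text:
# (lowest score, highest score, group, reported mortality %)
FOUR_C_GROUPS = [
    (0, 3, "low", 1.2),
    (4, 8, "intermediate", 9.9),
    (9, 14, "high", 31.4),
    (15, 21, "very high", 61.5),
]


def four_c_original_group(score: int) -> tuple[str, float]:
    """Return the original risk group and its reported mortality for a 4C point score."""
    for low, high, group, mortality in FOUR_C_GROUPS:
        if low <= score <= high:
            return group, mortality
    raise ValueError("4C Mortality Score points range from 0 to 21")


print(four_c_original_group(6))  # prints ('intermediate', 9.9)
```

The original bands map points to groups, whereas the adapted classification in this study maps the estimated risk (a probability) to three groups, so the two are not interchangeable.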
This study has some limitations, such as the large amount of missing data for some predictors, mainly laboratory ones. Although the performance of the instruments can be estimated with imputed data, the lack of availability of certain predictors also raises questions about applicability and inference. The different protocols for the management of COVID-19 cases in the two hospitals may have influenced the outcome incidence in different directions, as they had different clinical, structural or administrative criteria for directing patients to critical sectors. For example, patients with an indication for admission to the critical unit may have remained in non-critical beds because of the limited number of ICU beds. Finally, rapid changes in the epidemic, with the emergence of new strains, the advent of vaccines and other preventive and treatment measures, may have changed the population characteristics, that is, the relationship between the clinical findings and the incidence of critical illness, in such a way that score performance may require frequent updating. Even if the performance verified here is an "average" of the observed scenarios, attention must be paid to whether future scenarios continue to change in ways that call into question the model's future performance.

Conclusions
The prognostic models validated here had very heterogeneous performance in predicting critical illness and death in patients already admitted to the emergency room or to non-critical units. The College of London COVID-19 severity score and the 4C Mortality Score showed the best discrimination and calibration performance. These findings are in accordance with validations in other populations, and we suggest different thresholds for the risk groups.

Table 1 - Clinical and demographic characteristics, signs and symptoms by composite outcome.

Table 3 - Laboratory results by composite outcome.

Table 4 - Imaging findings by composite outcome.