Supervised Machine Learning Approach to Identify Early Predictors of Poor Outcome in Patients with COVID-19 Presenting to a Large Quaternary Care Hospital in New York City

Background: The progression of clinical manifestations in patients with coronavirus disease 2019 (COVID-19) highlights the need to account for symptom duration at the time of hospital presentation in decision-making algorithms. Methods: We performed a nested case–control analysis of 4103 adult patients with COVID-19 and at least 28 days of follow-up who presented to a New York City medical center. Multivariable logistic regression and classification and regression tree (CART) analysis were used to identify predictors of poor outcome. Results: Patients presenting to the hospital earlier in their disease course were older, had more comorbidities, and decompensated in greater proportion (≤4 days, 41%; >4–8 days, 31%; >8 days, 26%). In CART analysis, the first recorded oxygen delivery method was the most important overall predictor of decompensation. The predictors independently associated with poor outcome were, in patients with symptoms for ≤4 days, requiring at least a non-rebreather, age ≥ 63 years, and neutrophil/lymphocyte ratio ≥ 5.1; for >4–8 days, requiring at least a non-rebreather, IL-6 ≥ 24.7 pg/mL, and D-dimer ≥ 2.4 µg/mL; and for >8 days, IL-6 ≥ 64.3 pg/mL, requiring a non-rebreather, and CRP ≥ 152.5 mg/L. Conclusion: Symptom duration in tandem with initial clinical and laboratory markers can be used to identify patients with COVID-19 at increased risk for poor outcomes.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused a global pandemic with over 30 million cases and 500,000 deaths in the United States alone [1]. In the spring of 2020, New York City was the first epicenter of the national outbreak as hospitals became overrun and hundreds of patients died from coronavirus disease 2019 (COVID-19) each day [2]. Since then, improved treatments have led to better outcomes in patients with COVID-19 [3]. However, while most cases are mild, mortality rates among hospitalized patients remain elevated [4,5].
Despite the rapid deployment of effective vaccines, rates of new cases of COVID-19, fueled by more transmissible variants, have plateaued [1,6,7]. Risk factors for severe disease include age, race/ethnicity, and underlying comorbidities, in addition to the degree of hypoxemia on presentation and a range of abnormal laboratory parameters [8][9][10]. For patients who develop severe or critical disease, complications include acute respiratory distress syndrome (ARDS), cytokine storm syndrome, and multiorgan dysfunction [11,12]. The pace of disease progression is variable, but previous studies indicate that clinical deterioration typically occurs approximately five to seven days after symptom onset [12]. Predicting exactly which patients are most likely to require mechanical ventilation or die has been an area of active research [8][9][10][11][12][13]. To this end, multiple predictive scoring systems have been developed to assist clinicians in identifying such patients [14][15][16][17][18]. While most studies have relied on clinical and laboratory data upon hospital admission, to our knowledge none have analyzed these characteristics based on the duration of patients' COVID-19 symptoms.
Given the delay between patients' first symptoms and subsequent clinical decompensation, we hypothesized that the predictive value of clinical and laboratory parameters varies according to the duration of illness at the time of hospital presentation. We utilized a supervised machine learning approach to identify clinical comorbidities, initial vital signs, and laboratory markers associated with poor outcome in patients with COVID-19. Taking into account the duration of symptoms, we aimed to provide clinicians with practical algorithms for using clinical and laboratory markers to identify patients who may benefit from close monitoring and anticipatory management.

Materials and Methods
We included all adult patients (≥18 years old) testing positive for SARS-CoV-2 who presented to a quaternary care medical center in Northern Manhattan between 1 March 2020 and 15 April 2020 and were evaluated in the emergency department (ED) (n = 4103) (Figure 1). Infection was confirmed by detection of SARS-CoV-2 by real-time reverse transcriptase polymerase chain reaction (RT-PCR) testing of nasopharyngeal and/or oropharyngeal swab specimens [19]. All patients had at least 28 days of follow-up after testing SARS-CoV-2-positive to allow adequate time to observe disease outcome. The institutional review board at Columbia University Irving Medical Center approved this study under an expedited review (protocol number AAAS9622).
Data were extracted electronically from the electronic medical record (EMR) and were augmented with manually abstracted data. Electronically extracted data included demographics, admission and discharge dates, and diagnosis codes used to identify patients with pre-existing medical conditions. We obtained initial vital signs, first recorded oxygen delivery method defined as 0 = none; 1 = nasal cannula; 2 = non-rebreather, allowing for higher oxygen concentrations; and 3 = non-invasive ventilation, including high-flow nasal cannula, continuous, or bilevel positive airway pressure (CPAP and BIPAP, respectively). Patients who had mechanical ventilation as an initial recorded oxygen intervention were excluded, since mechanical ventilation was included as part of the outcome (N = 57).
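The ordinal oxygen coding above can be expressed as a small mapping. The sketch below is illustrative Python (the actual extraction was performed from the EMR, and the function and category names here are assumptions, not the study's code):

```python
# Ordinal "oxygen severity rank" as defined in the text:
# 0 = none, 1 = nasal cannula, 2 = non-rebreather,
# 3 = non-invasive ventilation (high-flow nasal cannula, CPAP, or BiPAP).
OXYGEN_SEVERITY_RANK = {
    "none": 0,
    "nasal cannula": 1,
    "non-rebreather": 2,
    "high-flow nasal cannula": 3,
    "CPAP": 3,
    "BiPAP": 3,
}

def oxygen_severity_rank(first_recorded_method: str):
    """Return the ordinal rank for a first recorded oxygen delivery
    method, or None for patients excluded from the analysis (those
    whose first recorded intervention was mechanical ventilation)."""
    if first_recorded_method == "mechanical ventilation":
        return None  # excluded: ventilation is part of the composite outcome
    return OXYGEN_SEVERITY_RANK[first_recorded_method]
```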
Basic laboratory results (complete blood count, basic metabolic panel, hepatic panel, prothrombin time, partial thromboplastin time, international normalized ratio), inflammatory markers, and other laboratory parameters were included in our institutional guidance for management of COVID-19 (erythrocyte sedimentation rate [ESR], C-reactive protein [CRP], lactate dehydrogenase [LDH], ferritin, d-dimer, procalcitonin, high-sensitivity troponin, and interleukin-6 [IL-6]). A subset of consecutive charts was manually reviewed starting with the first patient admitted with SARS-CoV-2 to our institution. Manually abstracted data, including the date of symptom onset and presenting symptoms, were entered into a REDCap database [20]. All data were merged using RStudio [21].
We conducted a nested case–control analysis to evaluate the association between initial vital signs and laboratory values and the primary outcome. Cases were defined as patients who met the composite primary outcome of mechanical ventilation, death, or discharge to hospice with at least 28 days of follow-up. The remaining patients, who survived to discharge or remained hospitalized but did not require intubation, comprised the controls. Predictors included demographic, clinical, and laboratory data collected at the time of initial ED evaluation, typically obtained within 24 h of presentation. Investigational antivirals and immunomodulators targeting COVID-19 were used inconsistently during this time period and/or were not subsequently shown to improve outcomes in clinical studies, and thus were not included as predictors in this analysis. Patients who underwent additional manual chart review and had a recorded date of symptom onset (n = 1873) were divided into tertiles based on the number of days from symptom onset to hospital presentation (tertile 1, ≤4 days; tertile 2, >4–8 days; tertile 3, >8 days) for further analysis based on duration of symptoms (Figure 1).
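The tertile assignment described above amounts to a simple binning of the number of days from symptom onset to presentation. A minimal Python sketch (the function name is illustrative; the study's own data handling was done in RStudio and SAS):

```python
def symptom_duration_tertile(days_from_onset: int) -> int:
    """Assign the symptom-duration tertile used in the analysis:
    tertile 1: <=4 days; tertile 2: >4-8 days; tertile 3: >8 days."""
    if days_from_onset <= 4:
        return 1
    elif days_from_onset <= 8:
        return 2
    return 3
```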
All potential clinical and laboratory markers were described for the whole sample and for the stratified tertiles described above. All continuous variables were non-normally distributed and tended to fall outside the normal range, and were described using medians and interquartile ranges (IQRs). Given the data distribution and clinical context, we used classification and regression tree (CART) analysis, a non-parametric supervised machine learning approach, to determine the relative importance of clinical and laboratory predictors of poor outcome and to estimate clinically predictive threshold levels for continuous laboratory values [22]. For the full cohort, we used a training subset of 75% of patients to derive the predictive tree and the remaining 25% partition to validate the prediction. Sample sizes in the analysis of symptom duration tertiles were too small to partition the data into training and validation subsets. Continuous and categorical variables were included. The full tree was grown using information gain measured by entropy. Regression tree analyses were conducted with the complete data set and with missing values assigned using the "popular node" option to split nodes. In most instances, the "popular node" approach reproduced the same tree as the complete data analysis, and thus was used. Pruning was determined as a function of cost-complexity and the estimated average misclassification rate in the leaves. In instances where the number of leaves was specified, these were revised to reduce overfitting (defined as having too few records in a leaf and only a single outcome). The confusion matrix and the area under the curve (AUC) were computed for all models. Additionally, unadjusted and multivariable logistic regression analyses were conducted for variables remaining in the pruned tree, using the CART-informed threshold cut-off values to estimate the odds of poor COVID-19 outcome.
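The core of the CART step, choosing the threshold for a continuous laboratory value that maximizes information gain (entropy reduction), can be sketched in a few lines. The actual analyses were run with SAS PROC HPSPLIT; this pure-Python fragment only illustrates the splitting criterion, and the function names are our own:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of binary outcome labels (0/1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_split(values, labels):
    """Find the threshold on a continuous predictor that maximizes
    information gain for a binary outcome, as CART does at each node.
    Candidate thresholds are midpoints between sorted distinct values."""
    n = len(values)
    parent = entropy(labels)
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best_gain, best_thr = 0.0, None
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left = [lab for v, lab in pairs if v < thr]
        right = [lab for v, lab in pairs if v >= thr]
        gain = parent - (len(left) / n) * entropy(left) \
                      - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_thr, best_gain
```

A perfectly separating marker yields a gain of 1 bit; in practice, the chosen threshold is the one with the largest (usually partial) gain, which is how the clinically meaningful laboratory cutoffs reported below were derived.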
Categorized clinical and laboratory markers that were significant in unadjusted logistic regressions were analyzed together in a multivariable logistic model. All statistical analyses and data visualization were performed in SAS® software, version 9.4 (Cary, NC, USA), using the HPSPLIT procedure for the classification trees and the LOGISTIC procedure for the regression models.
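For a single dichotomized predictor, the unadjusted odds ratio reported by a logistic regression reduces to the cross-product ratio of the 2×2 table. A minimal sketch, for illustration only (the study's models were multivariable and fitted with SAS PROC LOGISTIC; the function name is ours):

```python
import math

def odds_ratio(exposed_cases, exposed_controls,
               unexposed_cases, unexposed_controls):
    """Unadjusted odds ratio and Woolf 95% CI for a dichotomized
    predictor (e.g., a laboratory value above vs. below a
    CART-derived cutoff) against a binary outcome."""
    or_ = (exposed_cases * unexposed_controls) / \
          (exposed_controls * unexposed_cases)
    # Woolf (log) standard error from the four cell counts
    se = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                   + 1 / unexposed_cases + 1 / unexposed_controls)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, (lo, hi)
```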

Results

Tertile Analysis Based on Duration of Symptoms from Illness Onset to Hospital Presentation
There were 1873 (46%) patients for whom additional manual chart review was performed (Figure 1); compared to the overall cohort, a greater proportion were Hispanic and had underlying comorbidities, but presenting characteristics were otherwise similar. These patients were divided into tertiles defined by the reported duration of symptoms at the time of presentation: ≤4 days (n = 599; tertile 1), >4–8 days (n = 685; tertile 2), or >8 days (n = 589; tertile 3).
Patients presenting earlier in their disease course (tertile 1) were older and had more comorbidities, including hypertension, diabetes, and kidney disease, than patients in tertile 2 or tertile 3 (Table 1). Conversely, patients presenting later reported more symptoms, including fever, cough, dyspnea, fatigue, myalgias, and diarrhea. The first reported method of oxygen delivery was similar across all three tertiles. Decompensations, however, were more common among patients in tertile 1: 244 (41%) decompensated, compared to 219 (31%) and 152 (26%) of patients in tertiles 2 and 3, respectively (Figure 1).

Classification and Regression Tree Analysis
We used CART analysis both to rank predictors for the primary outcome and to define clinically meaningful thresholds for determining risk (Figure 1). In the full prediction model including all patients regardless of symptom duration, the first recorded method of oxygen delivery was the most important predictor of poor outcome; 78% of patients requiring at least a non-rebreather met the primary outcome, and 91.9% decompensated in the subset with a neutrophil percentage ≥ 84%. In patients whose initial oxygen requirement did not exceed nasal cannula, 13% had a poor outcome, but this increased to 43.5% in patients with elevated CRP (≥ 152.5 mg/L) and to 70% in those with kidney disease (Figure 2A). Findings were corroborated in the tree validation model using the 25% partitioned data.
The pruned CART analyses stratified by duration of symptoms showed different predictors, tree structures, and threshold cutoffs (Figure 2B–D). For patients in tertiles 1 and 2, the initial recorded method of oxygen delivery was the leading predictor of decompensation; the need for a non-rebreather or greater oxygen requirement was significantly associated with decompensation. However, in tertile 1, age ≥ 63 years followed by NLR ≥ 5.1 were most useful for identifying high-risk patients, compared with IL-6 ≥ 24.7 pg/mL and D-dimer ≥ 2.4 µg/mL in tertile 2. In tertile 3, 81% of patients with IL-6 ≥ 64.3 pg/mL decompensated.
Among those with IL-6 levels below the 64.3 pg/mL threshold, 68% of those who received a non-rebreather decompensated (non-invasive ventilation was used by only one patient in this group). Elevated CRP (≥ 161.8 mg/L) predicted 28% decompensation among patients with lower supplemental oxygen requirements. All predictors identified in the pruned CART analyses were also independent predictors of decompensation in the corresponding multivariable logistic regression models (Table S1), and the dichotomized threshold cutoff values derived from the pruned CART analyses were strongly and independently associated with poor outcomes in those models (Table 2).
Abbreviations: OSR, oxygen severity rank; NLR, neutrophil to lymphocyte ratio; CRP, C-reactive protein; LDH, lactate dehydrogenase; IL-6, interleukin-6.
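Read as decision rules, the full-cohort pruned tree can be sketched as nested conditions. The percentages are the node-level figures reported above for Figure 2A; the nesting of the CRP and kidney-disease splits follows from the tree structure (each node splits on one variable), and the function and argument names are illustrative, not the published model:

```python
def full_cohort_decompensation_risk(osr, neutrophil_pct, crp_mg_l,
                                    kidney_disease):
    """Approximate risk of the composite poor outcome along the
    full-cohort pruned CART paths. Returns the reported node-level
    proportion; branches without their own reported figure fall back
    to the parent node's value.
    osr: oxygen severity rank (0 none, 1 nasal cannula,
    2 non-rebreather, 3 non-invasive ventilation)."""
    if osr >= 2:  # required at least a non-rebreather
        if neutrophil_pct >= 84:
            return 0.919
        return 0.78  # parent-node figure for the whole branch
    # initial requirement did not exceed nasal cannula
    if crp_mg_l >= 152.5:
        if kidney_disease:
            return 0.70
        return 0.435  # parent-node figure for the high-CRP branch
    return 0.13  # parent-node figure for the low-oxygen branch
```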

Discussion
Given the delay between patients' first symptoms of COVID-19 and subsequent clinical decompensation, a better understanding of the initial clinical and laboratory parameters predictive of poor outcome is critical to guide clinical decision-making and management strategies. Here we evaluated the discriminatory ability of readily available clinical and laboratory parameters obtained at the time of hospital presentation to predict poor outcomes. Similar to previous reports, we found a broad range of predictors to be significantly associated with poor outcomes; however, we detected key differences in predictors, including clinically meaningful laboratory thresholds, that varied based on duration of symptoms at the time of hospital presentation. This corresponds to the evolution of signs and symptoms seen in patients with COVID-19 and likely reflects virologic progression and underlying host factors.
Using CART analysis and subsequent multivariable logistic regression, we were able to incorporate interactions between variables to rank predictors in a step-wise fashion and to estimate the likelihood of decompensation. The resulting models were simple and highlighted the predictive value of laboratory testing and other readily available clinical information as part of the initial evaluation of patients with COVID-19. For example, in adjusted analysis, patients presenting within 4 days of symptom onset had 7.4-, 4.8-, and 2.9-fold higher odds of decompensating if they required a non-rebreather or non-invasive ventilation, were aged ≥ 63 years, or had an NLR ≥ 5.1, respectively. Among patients presenting later in the disease course, >4–8 days and >8 days after symptom onset, the adjusted odds of decompensating were 3.8- and 8.8-fold higher, respectively, if they required a non-rebreather or non-invasive ventilation. While the initial type of supplemental oxygen was a highly important predictor of poor outcome overall and in most subgroup analyses, the additional clinical and laboratory parameters that predicted decompensation depended on patients' duration of symptoms. IL-6 thresholds of ≥ 24.7 and ≥ 64.3 pg/mL were associated with 3.3- and 11.9-fold increased odds of decompensation in tertiles 2 and 3, respectively. D-dimer ≥ 2.4 µg/mL was associated with a 10% increased odds in tertile 2, and CRP ≥ 161.8 mg/L was associated with a 2-fold increased odds of decompensation in tertile 3. Finally, this approach generated clinically meaningful cutoffs for interpreting clinical and laboratory parameters, often differing substantially from usual clinical reference values. For example, while the established upper limit of the normal range of the IL-6 assay used at our institution is 1.8 pg/mL, a higher threshold value of 64.3 pg/mL identified high-risk patients among those presenting after 8 days of symptoms.
Cut-off values were supported by multivariable models but should be further validated in prospective studies.
We suspect that differences in CART models reflect evolving viral and host inflammatory responses accompanying COVID-19 disease progression [23]. Unsurprisingly, the need for higher levels of oxygen supplementation was the most predictive factor of poor outcome in this patient population, as pulmonary involvement is seen in most patients with COVID-19 presenting to the hospital. Based on our results, respiratory failure is an early indicator of decompensation, and the most important predictor in patients presenting within 8 days of symptom onset. Subsequently, mounting inflammatory responses and hypercoagulability may become more prominent, which in a subset of patients can result in development of cytokine storm, progressive ARDS, and other end-organ failure [24]. Correspondingly, elevated inflammatory markers such as IL-6, CRP, and D-dimer were found to be leading predictors of poor outcome in patients presenting later in their disease course. In patients presenting after 8 days of symptoms, elevated IL-6 > 64.3 pg/mL was the most important predictor of decompensation and may reflect development of inflammatory complications. Interestingly, symptom duration also appeared to vary across different patient demographics, e.g., younger patients and those with more medical comorbidities presenting earlier, which may reflect different disease phenotypes [25]. Further studies are needed to better understand the pathophysiologic changes underlying the temporal dynamics of laboratory and other clinical markers for disease progression in patients with COVID-19.
Our results expand on several previous studies that found specific initial laboratory parameters to be significant predictors of poor outcome in patients with COVID-19 in multivariable analyses. These studies similarly found lymphopenia, elevated troponin, renal and hepatic indices, low albumin, and elevated inflammatory markers, such as CRP, D-dimer, procalcitonin, and IL-6, to be independently associated with development of ARDS, need for intensive care unit admission or mechanical ventilation, and mortality [11,[26][27][28][29][30][31][32][33]. Other studies identified parameters such as the CD4/CD8 ratio, which was not included among the initial recommended laboratory tests at our medical center, to be significantly elevated in patients at increased risk for critical illness [34]. Wang et al. developed a laboratory-based model for predicting hospital mortality which included age, initial oxygen saturation, neutrophil and lymphocyte counts, CRP, D-dimer, AST, and glomerular filtration rate [35]. Two studies conducted in China and the UK used Lasso regression to derive predictive models for mortality and/or critical illness, which consisted of demographic and clinical variables as well as laboratory parameters such as NLR or elevated neutrophil count, CRP, LDH, creatinine, and albumin [14,15]. These variables were then used to build clinical risk scores for identifying high-risk patients. More recently, using multivariable logistic regression analysis, researchers developed risk scores for mechanical ventilation and in-hospital death in COVID-19 [16]. Oxygenation, CRP, and LDH levels, along with a history of diabetes mellitus, were deemed significant risk factors for requiring mechanical ventilation; age, male sex, coronary artery disease, diabetes mellitus, chronic statin use, oxygenation, BMI, neutrophil-to-lymphocyte ratio, platelet count, and procalcitonin levels were associated with increased risk of in-hospital mortality.
However, symptom duration was not directly accounted for in any of these models.
Several limitations of our study need to be considered. This was a single-center study, and thus our findings may not be applicable to hospitals in non-urban settings or with different patient populations or different protocols for testing and monitoring of immune markers. Because of the retrospective study design, we were limited to available clinical data that could be readily extracted from the EMR. The follow-up period was a minimum of 28 days, which may not have been sufficient to identify all patients who would meet the primary outcome; however, the number of patients still hospitalized at the time the analysis was completed was relatively small. We assessed a large number of patients presenting to our hospital for evaluation, and while our complete-data and imputed CART analyses were mostly consistent, there may be bias in the resulting trees based on either imputed or complete data. Our findings were corroborated with the validation sample for the larger sample set, but sample size limitations precluded validation of the tertile-based models. In addition to reducing sample sizes, particularly in the tertile analyses, reliance on complete evaluations may have introduced bias towards patients at increased risk for decompensation, as these patients may have been more likely to receive a complete evaluation. We also relied on self-reporting of symptom types and duration, which may have been subject to recall bias; however, substantial differences in outcomes between tertiles were detected, and we do not expect differential misclassification bias to have occurred. Finally, we included only clinical and laboratory data from the initial evaluation, and therefore were not able to include a broad range of dynamic factors and complications that may have contributed to outcomes. However, this was consistent with our goal of developing an early triage tool based on data collected at the time of initial hospital presentation.

Conclusions
Our findings support the continued use of simple, practical clinical decision-making tools for the initial management of patients with COVID-19, especially in hospital settings where rapid triage is needed, capacity is limited, and virtual assessments may be performed by consultants and other medical care providers. Our results indicate that initial laboratory markers play an important role in clinical algorithms for identifying patients at increased risk for having a poor outcome. Because predictive factors differ by duration of illness, the initial patient evaluation should attempt to determine the date of symptom onset. Additional studies are needed to validate our results in a prospective cohort and link them to more precise characterization of underlying pathophysiologic processes.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/jcm10163523/s1, Table S1: Multivariable logistic regression model for the association between baseline demographic, clinical, and laboratory markers and decompensation.

Institutional Review Board Statement:
The study was conducted in accordance with the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of Columbia University Irving Medical Center under an expedited review (protocol number AAAS9622).
Informed Consent Statement: Patient consent was waived as per the Columbia University Irving Medical Center IRB protocol.

Data Availability Statement:
The data presented in this study are available upon request to the corresponding author.