COVID-19 subphenotypes at hospital admission are associated with mortality: a cross-sectional study

Abstract
Background: We have an incomplete understanding of COVID-19 characteristics at hospital presentation and of whether underlying subphenotypes are associated with clinical outcomes and therapeutic responses.
Methods: For this cross-sectional study, we extracted electronic health data from adults hospitalized between 1 March and 30 August 2020 with a PCR-confirmed diagnosis of COVID-19 at five New York City hospitals. We obtained clinical and laboratory data from the first 24 h of each patient's hospitalization. Treatment with tocilizumab and convalescent plasma was assessed throughout hospitalization. The primary outcome was mortality; secondary outcomes included intubation, intensive care unit (ICU) admission and length of stay (LOS). First, we employed latent class analysis (LCA) to identify COVID-19 subphenotypes on admission without consideration of outcomes, and assigned each patient to a subphenotype. We then performed robust Poisson regression to examine associations between COVID-19 subphenotype assignment and outcomes. We explored whether the COVID-19 subphenotypes had a differential response to tocilizumab and convalescent plasma therapies.
Results: A total of 4620 patients were included. LCA identified six subphenotypes, which were distinguished by level of inflammation and by clinical and laboratory derangements, ranging from a hypoinflammatory subphenotype with the fewest derangements to a hyperinflammatory subphenotype with multiorgan dysfunction. Multivariable regression analyses found differences in risk of mortality, intubation, ICU admission and LOS compared with the hypoinflammatory subphenotype. For example, in multivariable analyses the moderate inflammation with fever subphenotype had 3.29 times the risk of mortality (95% CI 2.05, 5.28), while the hyperinflammatory subphenotype with multiorgan dysfunction had 17.87 times the risk of mortality (95% CI 11.56, 27.63), compared with the hypoinflammatory subphenotype. Exploratory analyses suggested that subphenotypes may respond differently to convalescent plasma or tocilizumab therapy.
Conclusion: COVID-19 subphenotype at hospital admission may predict risk of mortality, ICU admission and intubation, as well as differential response to treatment.
KEY MESSAGE: This cross-sectional study of COVID-19 patients admitted to the Mount Sinai Health System identified six distinct COVID-19 subphenotypes on admission. Subphenotypes correlated with ICU admission, intubation, mortality and differential response to treatment.


Introduction
The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) pandemic has resulted in significant morbidity and mortality [1,2]. In the United States, there … patients in the US was estimated to be 21% [4]. A number of studies have identified early clinical markers of severe coronavirus disease 2019 (COVID-19), such as fever, hyperglycemia and elevated inflammatory markers, that may be associated with worse outcomes [5-9]. Recent studies have shifted focus to COVID-19 heterogeneity [10,11]. We posit that COVID-19 heterogeneity at the time of hospital admission reflects underlying subphenotypes with different natural histories, clinical and biological characteristics, outcomes and, possibly, responses to treatment [12-14]. Better characterization of COVID-19 subphenotypes and of their associations with outcomes could inform treatment options. Furthermore, secondary analyses of completed trials using COVID-19 subphenotypes may explain variable responses to therapeutics and improve targeted therapy.
It is now understood that syndromes of critical illness such as acute respiratory distress syndrome (ARDS) and sepsis, both seen in severe COVID-19, are not singular in presentation but are composed of multiple underlying subphenotypes with differing morbidity and mortality risk. Secondary analyses of randomized controlled trials in ARDS employed latent class analysis and consistently identified hyperinflammatory and hypoinflammatory subphenotypes, with the hyperinflammatory subphenotype associated with higher mortality risk [13,15-17]. These subphenotypes may respond differently to therapy, although the evidence is mixed [13,15,16]. Within COVID-19, published studies have evaluated whether individual measures such as oxygen saturation, creatinine, D-dimer or C-reactive protein (CRP) predict disease severity. Machine learning approaches have been applied to estimate risk of COVID-19 mortality and critical illness; however, these approaches are limited by data missingness and sample size requirements [18-24]. More recent studies have examined COVID-19 subphenotypes at the time of ICU admission, when the disease is advanced and successful interventions may be limited [25,26]. Moreover, randomized controlled trials of therapeutics recruit COVID-19 patients broadly and do not enrich for subphenotypes that may be more likely to respond to the therapeutic under study [27-29]. More research is needed to identify subphenotypes of disease severity at hospital presentation to assist with clinical risk stratification and treatment algorithms [30,31].
To address this knowledge gap, we conducted a retrospective analysis to identify COVID-19 subphenotypes on admission, examine whether the identified subphenotypes were associated with COVID-19 outcomes, and explore differential response to therapeutics. Specifically, we leveraged electronic health record data from inpatient COVID-19 encounters within five New York City (NYC) hospitals during the Spring through Summer 2020 surge. Our primary COVID-19 clinical outcome was mortality; secondary outcomes included intensive care unit (ICU) admission, intubation and length of stay (LOS).

Study participants
We extracted electronic health data from all persons hospitalized with a PCR-confirmed diagnosis of COVID-19 at five Mount Sinai Health System hospitals: Mount Sinai Brooklyn, Mount Sinai Queens, The Mount Sinai Hospital, Mount Sinai Morningside and Mount Sinai West. Specifically, we included patients who were aged 18 years or older, were admitted between 1 March 2020 and 30 August 2020, and had a positive SARS-CoV-2 PCR nasal swab within 7 days of admission.

Ethics
The study was approved by the Institutional Review Board at the Icahn School of Medicine at Mount Sinai (20-00547).

Electronic health record data
Electronic health record data were collected from the first COVID-… For each patient encounter, we extracted the date of admission, hospital, sex, age, self-reported race/ethnicity, insurance provider and date of SARS-CoV-2 PCR test. For patients with multiple encounters, data from the first encounter that met inclusion criteria were used in analyses. Information on medical comorbidities was extracted from the electronic health record using International Classification of Diseases, 10th Revision (ICD-10) codes. Comorbidities were grouped into organ-specific categories. Persons with a history of asthma, chronic obstructive pulmonary disease (COPD) and/or obstructive sleep apnea (OSA) were categorized as having pulmonary disease. Persons with a history of hypertension (HTN), coronary artery disease (CAD), congestive heart failure (CHF) and/or myocardial infarction (MI) were categorized as having cardiovascular disease. Persons were also categorized as having a history of cancer or obesity (defined by ICD-10 code or a calculated BMI >30). The number of organ-specific comorbidity categories was then summed for each patient.
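The comorbidity count described above can be sketched as follows. This is a minimal Python illustration, assuming each patient's ICD-10 codes have already been mapped to condition labels; the labels (`"asthma"`, `"htn"` and so on) and the function name are hypothetical, and the ICD-10 code lists used in the study are not reproduced. Each organ-specific category contributes at most one to the count, however many qualifying conditions a patient has.

```python
# Hypothetical condition labels; the study derived conditions from ICD-10 codes,
# and those code lists are not reproduced here.
PULMONARY = {"asthma", "copd", "osa"}
CARDIOVASCULAR = {"htn", "cad", "chf", "mi"}


def organ_comorbidity_count(conditions, bmi=None):
    """Count organ-specific comorbidity categories (0-4) for one patient.

    conditions: set of lower-case condition labels for the patient.
    bmi: calculated body-mass index, or None if unavailable.
    """
    categories = {
        "pulmonary": bool(conditions & PULMONARY),
        "cardiovascular": bool(conditions & CARDIOVASCULAR),
        "cancer": "cancer" in conditions,
        # Obesity by diagnosis code or by a calculated BMI above 30.
        "obesity": "obesity" in conditions or (bmi is not None and bmi > 30),
    }
    # Each category counts once, regardless of how many conditions it contains.
    return sum(categories.values())


print(organ_comorbidity_count({"asthma", "htn", "chf"}, bmi=27.0))  # 2
print(organ_comorbidity_count(set(), bmi=32.5))  # 1
```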
We obtained clinical and laboratory data from the first 24 h of the patient's first hospitalization that met inclusion criteria. Only variables with data available for at least 60% of participants, or variables with a strong biological basis from prior studies [erythrocyte sedimentation rate (ESR), interleukin 6 (IL-6), interleukin 1 beta (IL-1B)], were included. For variables with repeated observations, we identified the worst value recorded within 24 h of admission. Clinical variables included lowest oxygen saturation, lowest systolic and diastolic blood pressure, highest heart rate and highest temperature within the first 24 h. Laboratory variables included inflammatory markers [C-reactive protein (CRP), ESR, IL-6, IL-1B, lactate dehydrogenase (LDH), procalcitonin, ferritin]; hematologic markers [white blood cell count (WBC), hemoglobin, platelets, D-dimer, fibrinogen, prothrombin time (PT), partial thromboplastin time (PTT)]; cardiac markers [troponin, brain natriuretic peptide (BNP)]; and renal and hepatic markers [alanine transaminase (ALT), aspartate aminotransferase (AST), albumin, total bilirubin, sodium, potassium, calcium, bicarbonate, blood urea nitrogen (BUN), creatinine, anion gap, glucose]. Laboratory values above the laboratory-defined limit of detection were assigned the value at the limit of detection.
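A minimal sketch of the worst-value rule for the clinical variables, assuming vital signs are available as timestamped records. The variable names, the `(timestamp, name, value)` layout and the helper names are hypothetical; the direction of "worst" for each laboratory variable would need its own mapping, which the text does not specify. The capping helper illustrates the limit-of-detection rule, assuming out-of-range results arrive as numbers above a known upper limit.

```python
from datetime import datetime, timedelta

# Direction of "worst" for the clinical variables named in the text.
# The variable names are hypothetical placeholders, not the study's field names.
WORST = {"spo2": min, "sbp": min, "dbp": min, "heart_rate": max, "temp": max}


def worst_in_first_24h(admit_time, observations, variable):
    """Return the worst value of `variable` recorded within 24 h of admission.

    observations: iterable of (timestamp, variable_name, value) tuples.
    Returns None when nothing was recorded, leaving missingness to the LCA.
    """
    cutoff = admit_time + timedelta(hours=24)
    values = [value for ts, name, value in observations
              if name == variable and admit_time <= ts <= cutoff]
    return WORST[variable](values) if values else None


def cap_at_detection_limit(value, upper_limit):
    """Assign a result above the assay's upper detection limit the limit itself."""
    return None if value is None else min(value, upper_limit)


admit = datetime(2020, 4, 1, 8, 0)
obs = [
    (admit + timedelta(hours=1), "spo2", 94),
    (admit + timedelta(hours=10), "spo2", 88),
    (admit + timedelta(hours=30), "spo2", 80),  # outside the 24 h window
    (admit + timedelta(hours=2), "temp", 38.4),
]
print(worst_in_first_24h(admit, obs, "spo2"))  # 88
print(worst_in_first_24h(admit, obs, "heart_rate"))  # None
print(cap_at_detection_limit(120000, 100000))  # 100000
```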
Patient outcomes were assessed across all hospital encounters. The primary outcome was mortality; secondary outcomes included intubation and admission to the intensive care unit (ICU). Specifically, patients with an Epic discharge disposition of 'expired' or 'deceased' were classified as deceased. Electronic health record mortality data also included health-record-linked post-discharge deaths. Patients assigned to an ICU bed at any point during any hospitalization were classified as having an ICU admission. Patients recorded as having a surgical intubation, non-surgical airway intubation or endotracheal intubation during any encounter were classified as intubated. Amongst survivors, we also determined the length of stay (LOS) of the initial hospitalization. Data on COVID-19 therapeutics administered during hospitalization, including tocilizumab and convalescent plasma, were also collected across all hospital encounters.
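Patient-level outcomes could be derived from encounter-level records roughly as below. The dictionary keys, the `linked_post_discharge_death` flag and the function name are hypothetical; the sketch only mirrors the rules stated above: death by discharge disposition or linked post-discharge record, ICU admission and intubation on any encounter, and LOS of the first hospitalization among survivors.

```python
from datetime import date

DEATH_DISPOSITIONS = {"expired", "deceased"}


def classify_outcomes(encounters, linked_post_discharge_death=False):
    """Derive patient-level outcomes from all of a patient's encounters.

    encounters: list of dicts ordered by admission date, with hypothetical keys
    'disposition' (str), 'icu_bed' (bool), 'intubated' (bool),
    'admit' and 'discharge' (datetime.date).
    """
    died = linked_post_discharge_death or any(
        e["disposition"].lower() in DEATH_DISPOSITIONS for e in encounters)
    first = encounters[0]
    return {
        "died": died,
        "icu": any(e["icu_bed"] for e in encounters),
        "intubated": any(e["intubated"] for e in encounters),
        # LOS is analyzed among survivors only, for the initial hospitalization.
        "los_days": None if died else (first["discharge"] - first["admit"]).days,
    }


enc = [
    {"disposition": "Home", "icu_bed": False, "intubated": False,
     "admit": date(2020, 4, 1), "discharge": date(2020, 4, 9)},
    {"disposition": "Home", "icu_bed": True, "intubated": False,
     "admit": date(2020, 5, 2), "discharge": date(2020, 5, 12)},
]
print(classify_outcomes(enc))
```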

Covariates
We extracted the hospital in which patients were hospitalized (categorical variable), self-identified race/ethnicity (categorical variable: White, Black, Hispanic, Asian, Other), insurance provider (categorical variable: Private/Medicare, Medicaid/Emergency Medicaid, Other) and onset time, defined as the time in days from the first SARS-CoV-2 PCR-positive admission in the Mount Sinai Health System to the individual patient's admission.
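Onset time is simple date arithmetic. The sketch below assumes that the earliest admission date in its input is the health system's first PCR-positive admission; the function name and the dates are hypothetical.

```python
from datetime import date


def onset_times(admission_dates):
    """Days from the earliest PCR-positive admission to each patient's admission.

    Assumes `admission_dates` covers every PCR-positive admission in the health
    system, so that min() recovers the system's first such admission.
    """
    anchor = min(admission_dates)
    return [(d - anchor).days for d in admission_dates]


print(onset_times([date(2020, 3, 2), date(2020, 4, 15), date(2020, 3, 10)]))
# [0, 44, 8]
```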

Statistical analysis
A two-step approach was used in this analysis. First, we employed latent class analysis (LCA) to identify COVID-19 subphenotypes on admission without consideration of outcomes. The distribution and completeness of clinical and laboratory data were examined. As LCA accommodates missing data, no imputation was performed. Clinical and laboratory variables were categorized into quintiles for LCA. We fit LCA models with 2 to 10 subphenotype classes and then determined the best-fitting model (i.e. the number of subphenotypes). Criteria for selecting the number of subphenotypes included: (1) the consistent Akaike information criterion (cAIC) and adjusted Bayesian information criterion (aBIC), where lower values suggest better fit; (2) entropy, where higher values suggest better class separation; (3) the likelihood ratio; and (4) the number of participants per subphenotype, where models with adequate sample size in each class are preferred. Once the best-fitting model and number of subphenotypes were identified, each participant was assigned to the subphenotype with the highest posterior probability of membership. These subphenotype assignments were then used as the independent variable in subsequent regression analyses.
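The LCA step can be illustrated with a small expectation-maximization (EM) implementation for categorical indicators. This is a dependency-free sketch for intuition, not the software used in the study. Quintile coding uses `statistics.quantiles`, and a missing indicator is simply skipped in that patient's likelihood; this is the usual full-information treatment of missingness in LCA and is assumed here rather than stated in the text. A tiny smoothing constant keeps probabilities away from zero. The fit criteria use the standard formulas aBIC = -2LL + m ln((n+2)/24), cAIC = -2LL + m(ln n + 1) and relative entropy 1 - Σ(-p ln p)/(n ln K), where m is the number of free parameters, n the number of patients, K the number of classes and p the posterior membership probabilities. All function names are hypothetical.

```python
import bisect
import math
import random
import statistics


def to_quintiles(values):
    """Map a continuous variable to quintile indices 0-4; None stays missing."""
    observed = [v for v in values if v is not None]
    cuts = statistics.quantiles(observed, n=5)  # four cut points
    return [None if v is None else bisect.bisect_right(cuts, v) for v in values]


def e_step(data, weights, theta):
    """Posterior class probabilities and total log-likelihood."""
    posteriors, loglik = [], 0.0
    for x in data:
        # Missing indicators (None) simply drop out of the patient's likelihood.
        logs = [math.log(w) + sum(math.log(theta[k][j][q])
                                  for j, q in enumerate(x) if q is not None)
                for k, w in enumerate(weights)]
        top = max(logs)
        norm = top + math.log(sum(math.exp(l - top) for l in logs))  # log-sum-exp
        loglik += norm
        posteriors.append([math.exp(l - norm) for l in logs])
    return posteriors, loglik


def fit_lca(data, n_classes, n_levels=5, n_iter=200, seed=0):
    """EM for a latent class model with categorical indicators."""
    rng = random.Random(seed)
    n_vars = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    theta = []  # theta[k][j][q] = P(indicator j at level q | class k)
    for _ in range(n_classes):
        rows = []
        for _ in range(n_vars):
            raw = [rng.random() + 0.1 for _ in range(n_levels)]
            rows.append([r / sum(raw) for r in raw])
        theta.append(rows)
    for _ in range(n_iter):
        posteriors, _ = e_step(data, weights, theta)
        for k in range(n_classes):  # M-step
            weights[k] = max(sum(p[k] for p in posteriors) / len(data), 1e-12)
            for j in range(n_vars):
                counts = [1e-6] * n_levels  # tiny smoothing keeps every log finite
                for x, p in zip(data, posteriors):
                    if x[j] is not None:
                        counts[x[j]] += p[k]
                total = sum(counts)
                theta[k][j] = [c / total for c in counts]
    posteriors, loglik = e_step(data, weights, theta)
    return weights, theta, posteriors, loglik


def fit_criteria(loglik, posteriors, n_vars, n_levels=5):
    """aBIC and cAIC (lower is better) and relative entropy (higher is better)."""
    n, n_classes = len(posteriors), len(posteriors[0])
    n_params = (n_classes - 1) + n_classes * n_vars * (n_levels - 1)
    abic = -2 * loglik + n_params * math.log((n + 2) / 24)
    caic = -2 * loglik + n_params * (math.log(n) + 1)
    ent = -sum(p * math.log(p) for post in posteriors for p in post if p > 0)
    return {"aBIC": abic, "cAIC": caic, "entropy": 1 - ent / (n * math.log(n_classes))}


def modal_assignment(posteriors):
    """Assign each patient to the class with the highest posterior probability."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


# Toy data: two clearly separated response patterns, with some values missing.
data = ([[0, 0, 1, None]] * 10 + [[0, 0, 1, 0]] * 10
        + [[4, 4, 3, 4]] * 10 + [[4, None, 3, 4]] * 10)
weights, theta, post, ll = fit_lca(data, n_classes=2)
labels = modal_assignment(post)
print(labels[0] != labels[-1], fit_criteria(ll, post, n_vars=4))
```

In practice the fit would be repeated from several random starts for each candidate number of classes (here, 2 to 10), keeping the best log-likelihood before comparing criteria.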
Given that our COVID-19 outcomes of interest were common (occurring in more than 10% of the cohort), we employed bivariate and multivariable Poisson regression models with robust error variance to examine associations between COVID-19 subphenotype assignment and risk of mortality, ICU admission and intubation, each considered separately [32,33], using the R package sandwich [34,35]. Amongst survivors, we employed bivariate and multivariable generalized linear regression to examine associations between COVID-19 subphenotype and LOS. Multivariable models adjusted for onset time, hospital, self-identified race/ethnicity and insurance provider. Finally, we explored whether the COVID-19 subphenotypes had a differential response to tocilizumab and convalescent plasma therapies through introduction of a subphenotype-by-treatment interaction term and in treatment-stratified models.
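Poisson regression with robust (sandwich) error variance, often called modified Poisson regression, estimates risk ratios directly for a binary outcome. The study fit these models in R with the sandwich package. The dependency-free Python sketch below is only an illustration: it assumes the HC0 form of the sandwich estimator (the variant used in the paper is not stated), and its data and helper names are hypothetical. Subphenotype and covariates would enter as dummy-coded columns of `X`, and an interaction term as the product of a subphenotype column and a treatment indicator.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n:] for row in m]


def robust_poisson(X, y, n_iter=50):
    """Modified Poisson regression: log-link Poisson fit with HC0 sandwich SEs.

    X: list of covariate rows, each starting with 1 for the intercept.
    y: list of 0/1 outcomes. Returns (coefficients, robust standard errors).
    """
    p = len(X[0])
    beta = [0.0] * p

    def fitted(b):
        return [math.exp(sum(bj * xj for bj, xj in zip(b, row))) for row in X]

    def information(mu):  # X' diag(mu) X, the Poisson Fisher information
        return [[sum(m * row[a] * row[c] for row, m in zip(X, mu)) for c in range(p)]
                for a in range(p)]

    for _ in range(n_iter):  # Newton-Raphson (equivalently IRLS) iterations
        mu = fitted(beta)
        score = [sum((yi - m) * row[a] for row, yi, m in zip(X, y, mu)) for a in range(p)]
        step = matmul(invert(information(mu)), [[s] for s in score])
        beta = [b + d[0] for b, d in zip(beta, step)]

    mu = fitted(beta)
    bread = invert(information(mu))
    meat = [[sum((yi - m) ** 2 * row[a] * row[c] for row, yi, m in zip(X, y, mu))
             for c in range(p)] for a in range(p)]
    cov = matmul(matmul(bread, meat), bread)  # bread * meat * bread
    return beta, [math.sqrt(cov[a][a]) for a in range(p)]


# Hypothetical data: reference group with 10/100 deaths, comparison group with 30/100.
X = [[1, 0]] * 100 + [[1, 1]] * 100
y = [1] * 10 + [0] * 90 + [1] * 30 + [0] * 70
beta, se = robust_poisson(X, y)
rr = math.exp(beta[1])
lo, hi = math.exp(beta[1] - 1.96 * se[1]), math.exp(beta[1] + 1.96 * se[1])
print(f"RR {rr:.2f}, 95% CI {lo:.2f}, {hi:.2f}")
```

With a single binary exposure the fitted RR equals the ratio of observed risks (0.30/0.10 = 3), which gives a quick check that the fit has converged.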

Latent class analysis
Overall, clinical and laboratory data completeness was high (Supplemental Table S1). Supplemental Figure S1 displays the LCA model fit statistics (cAIC, aBIC, entropy and likelihood ratio) for models with 2-10 subphenotypes. Using these criteria, we determined that a six-subphenotype model provided the optimal fit.

Associations between COVID-19 subphenotype and mortality
Bivariate models suggested that, compared with the hypoinflammatory subphenotype, all subphenotypes had increased risk of mortality (Supplemental Table S2). In multivariable models adjusting for onset time, hospital, self-identified race/ethnicity and insurance provider, all subphenotypes also had increased risk of mortality compared with the hypoinflammatory subphenotype (moderate inflammation and febrile RR 3.29, 95% CI 2.05, 5.28; … 2(A); complete model output in Supplemental Table S2).

Associations between COVID-19 subphenotype and intubation
In bivariate and multivariable models, all subphenotypes had increased risk of intubation compared with the hypoinflammatory subphenotype (multivariable model: moderate inflammation and febrile RR 3.44, 95% CI 2.07, 5.74; … S3).

Exploratory analyses of COVID-19 therapies
Exploratory analyses of effect modification of the association between convalescent plasma (CP) and mortality by COVID-19 subphenotype suggested a differential effect by subphenotype. Within the cohort, n = 93 patients received CP. Compared with the hypoinflammatory subphenotype, the other subphenotypes demonstrated increased risk of death among those who did not receive CP relative to those who did [moderate inflammation with fever: received CP (n = 10) RR 0.85, 95% CI 0.…; … 4].

Discussion
Utilizing a cohort of 4620 COVID-19 patients from the Spring to Summer 2020 NYC surge, these data suggest that there are six distinct COVID-19 subphenotypes at the time of hospital admission. These subphenotypes have varying clinical courses, with differences in associated risk of mortality, intubation, ICU admission and LOS. Further, despite limited treatment options early in the pandemic, this work suggests that COVID-19 subphenotypes may respond differently to therapeutics, and that further characterization of these subphenotypes may be critical for future COVID-19 therapeutic trials and, ultimately, for guiding therapy. These results expand on the published literature in two significant ways. First, they use data from the time of hospital admission, rather than trajectories during hospitalization or at ICU admission, providing the earliest possible timepoint to identify distinct subphenotypes [14,25,26]. Second, much of the current literature focuses strictly on identifying subphenotypes [10,24,38]; this study goes beyond identification to begin to explore the role subphenotypes play in response to potential treatments.
Our admission dataset includes a number of serologic markers (IL-6, IL-1B, ferritin, LDH, ESR, CRP and procalcitonin) with variability across the cohort. The six identified subphenotypes differed predominantly in inflammatory profiles on admission, as defined by these serologic inflammatory markers. For example, the median ferritin level on admission was 724 ng/ml (IQR 326, 1653) and varied amongst subphenotypes. The median ferritin in the hypoinflammatory subphenotype was 148.0 ng/ml (IQR 56.8, 261.5), compared with the hyperinflammatory subphenotypes (hyperinflammation with liver dysfunction: median 1341.0 ng/ml, IQR 770.2, 2465.2; hyperinflammation with renal dysfunction: median 1004.0 ng/ml, IQR 430.0, 2376.5; hyperinflammation with multiorgan dysfunction: median 1441.5 ng/ml, IQR 685.0, 2756.0). We noted similar separation of subphenotypes by IL-6, CRP and LDH in particular. While no single marker (serologic or otherwise) has been identified that predicts disease severity, associated organ involvement or outcome, multiple studies have demonstrated the role of IL-6, IL-8, CRP, LDH, procalcitonin and ferritin in identifying patients at higher risk of poor outcomes [10,39-41]. Interestingly, the inflammatory markers were not uniformly elevated, suggesting nuances in the inflammatory cascade and/or host response that need further investigation.
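Group-wise medians and IQRs such as those reported for ferritin can be computed as below. This is a sketch on hypothetical values with a hypothetical function name; the quantile definition used in the study is not stated, so linear interpolation (Python's `method="inclusive"`, which matches R's default quantile type 7) is assumed.

```python
import statistics
from collections import defaultdict


def median_iqr_by_group(values, groups):
    """Median and interquartile range of a lab value within each subphenotype.

    Missing values (None) are dropped; each group needs at least two observations.
    Quartiles use linear interpolation ('inclusive', i.e. R's default type 7).
    """
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        if value is not None:
            by_group[group].append(value)
    summary = {}
    for group, vals in by_group.items():
        q1, median, q3 = statistics.quantiles(vals, n=4, method="inclusive")
        summary[group] = (median, q1, q3)
    return summary


ferritin = [1, 2, 3, 4, 5, 10, 20, None, 30, 40]
subphenotype = ["a"] * 5 + ["b"] * 5
print(median_iqr_by_group(ferritin, subphenotype))
# {'a': (3.0, 2.0, 4.0), 'b': (25.0, 17.5, 32.5)}
```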
Organ dysfunction occurred predominantly in the subphenotypes with the largest inflammatory derangements. For example, the hyperinflammatory subphenotypes had multiorgan dysfunction or renal or liver failure. Notably, the liver failure group consisted predominantly of younger men (median age 61 years, IQR 52, 68), a group not previously considered high risk [42]. Given the cross-sectional view of clinical and laboratory measures at admission, we cannot temporally determine whether the inflammation directly led to the organ dysfunction or whether the inflammatory markers were elevated due to reduced renal or hepatic clearance [43]. It is notable, however, that the two subphenotypes with moderately elevated inflammatory markers on average did not have renal or hepatic dysfunction. These data suggest that specific inflammatory markers, in conjunction with markers of renal and hepatic injury, may be used to identify patients at risk of clinical deterioration.
While the mortality risk observed for the oldest group with multiorgan derangements is not surprising, this approach also allowed us to identify subphenotypes at increased risk of mortality that may not previously have been considered high risk. For example, we identified a younger subphenotype (median age 59 years, IQR 49, 66) with moderate inflammation and fever that, in multivariable models compared with the hypoinflammatory subphenotype, had 3.3 times the risk of death (RR 3.29, 95% CI 2.05, 5.28), 3.4 times the risk of intubation (RR 3.44, 95% CI 2.07, 5.74) and 2 times the risk of ICU admission (RR 2.00, 95% CI 1.50, 2.67). Members of this group were largely previously healthy, with no (N = 279, 36%) or one (N = 332, 43%) organ-system comorbidity. It is critical that future studies focus on these populations to better understand the pathophysiologic mechanisms of increased risk, enable early identification and initiate appropriate treatment.
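As a reader's consistency check, the reported rate ratios and confidence intervals can be related under the assumption, standard for Poisson regression but not stated in the text, that the intervals are Wald intervals on the log scale. Under that assumption the geometric midpoint of each interval should recover the point estimate, and the interval width implies the standard error of log(RR). The helper name is hypothetical; the numbers are the multivariable estimates quoted above.

```python
import math


def check_reported_rr(rr, lower, upper, z=1.96):
    """Back-calculate quantities implied by a log-scale Wald 95% CI.

    Returns (geometric midpoint of the CI, implied SE of log RR, two-sided p).
    For a log-scale Wald interval the geometric midpoint should recover the RR.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    midpoint = math.sqrt(lower * upper)
    p_value = math.erfc(abs(math.log(rr)) / se / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return midpoint, se, p_value


# Multivariable estimates reported for the moderate inflammation and febrile subphenotype.
reported = {
    "mortality": (3.29, 2.05, 5.28),
    "intubation": (3.44, 2.07, 5.74),
    "ICU admission": (2.00, 1.50, 2.67),
}
for outcome, (rr, lo, hi) in reported.items():
    mid, se, p = check_reported_rr(rr, lo, hi)
    print(f"{outcome}: RR {rr} vs CI midpoint {mid:.2f}, SE(log RR) {se:.3f}, p {p:.1e}")
```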
Further, these data suggest that identifying these subphenotypes at the time of hospital admission may help in designing future COVID-19 therapeutic trials, guiding secondary analyses of existing COVID-19 randomized controlled trials and, ultimately, generating a patient-centered treatment algorithm. This work builds on prior analyses by Calfee et al. suggesting that ARDS subphenotypes respond differently to therapeutic interventions, including fluid management strategies and statins [13,15,16].
Specifically, we found that the hyperinflammatory subphenotype with multiorgan dysfunction may respond differently to tocilizumab and convalescent plasma therapies. Emerging evidence supports this concept and suggests that some treatments may be more efficacious in certain populations [39,44]. The current literature on the efficacy of tocilizumab and convalescent plasma has shown conflicting results. Of five large randomized trials of tocilizumab, which enrolled patients with varying degrees of respiratory failure, only two found a mortality benefit [45-49]. A recent meta-analysis of IL-6 antagonists, including tocilizumab, did, however, demonstrate a mortality benefit [50]. Similarly, studies of convalescent plasma have not shown a mortality benefit but suggest that early administration to mildly ill patients, or use of high-titer plasma, may provide some benefit [51-55]. Our data support secondary analyses of completed randomized controlled trials to better understand whether COVID-19 subphenotypes respond differently to therapeutics. These analyses will be critical to inform patient-centered treatment algorithms.
Our study has several strengths. By leveraging the Mount Sinai electronic medical record data repository, we were able to evaluate a large, diverse sample of patients from multiple hospitals in New York City. We employed a data-driven method to analyze the heterogeneous population of COVID-19 patients, identified underlying subphenotypes and demonstrated associations with key COVID-19 endpoints: mortality, ICU admission, intubation and length of stay. Our exploratory analyses suggest that these subphenotypes may respond differently to therapeutic treatments, which suggests that completed COVID-19 RCTs may benefit from secondary analyses of their datasets even if no effect was found in the overall cohort. Our use of admission data to generate subphenotypes provides a platform for early identification of subphenotypes, which, if replicated in other studies, suggests the potential for selecting therapeutic options based on identified differences early in hospitalization.
We also acknowledge limitations. Variability in the duration of illness prior to hospital presentation is not captured by this dataset. Patients may have presented at different phases of illness, which we know was true in New York City during the first wave of the pandemic, as factors including patient volume and limited resources affected decisions on when to present to the hospital or be admitted. Hospital capacity and available resources may also have affected the aggressiveness of treatment (e.g. palliative care), which we are unable to capture in electronic health records. We adjusted for time since the onset of the pandemic (onset time) to address this. An evaluation of the stability of the subphenotypes over time would provide additional evidence that the subphenotypes are distinct classes, regardless of duration of illness at presentation [56,57]. Exploratory analyses of associations with treatments are limited by potential biases, as these data are retrospective. Access to these treatments was limited in Spring 2020. Certain treatments were not available at every hospital site and were restricted to patients with more severe illness rather than distributed in a randomized fashion, introducing selection bias. Criteria for use changed over time as more studies on these treatments became available, limiting the generalizability of these analyses. Additionally, these data reflect the original SARS-CoV-2 strain and were collected before vaccines were available; later variants and vaccination have changed clinical presentations and may have altered the clinical subphenotypes.
In conclusion, these data identify six distinct, clinically relevant COVID-19 subphenotypes present on admission, which are associated with risk of mortality, intubation, ICU admission and LOS and suggest differential responses to tocilizumab and convalescent plasma. Future studies should validate these subphenotypes in other populations and health systems. Post hoc analyses of randomized controlled trials of tocilizumab, convalescent plasma and other therapeutics are warranted.
a Time in days from the first SARS-CoV-2 PCR-positive admission in the Mount Sinai Health System to the individual patient's admission.
b Comorbidities included history of cancer, obesity, pulmonary disease (asthma, chronic obstructive pulmonary disease (COPD) and/or obstructive sleep apnea (OSA)) and cardiovascular disease (hypertension (HTN), coronary artery disease (CAD), congestive heart failure (CHF) and/or myocardial infarction (MI)).