Cytokine ranking via mutual information algorithm correlates cytokine profiles with presenting disease severity in patients infected with SARS-CoV-2

Although the range of immune responses to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is variable, cytokine storm is observed in a subset of symptomatic individuals. To further understand the disease pathogenesis and, consequently, to develop an additional tool for clinicians to evaluate patients for presumptive intervention, we sought to compare plasma cytokine levels between a range of donor and patient samples grouped by a COVID-19 Severity Score (CSS) based on the need for hospitalization and oxygen requirement. Here we utilize a mutual information algorithm that classifies the information gain for CSS prediction provided by cytokine expression levels and clinical variables. Using this methodology, we found that a small number of clinical and cytokine expression variables are predictive of presenting COVID-19 disease severity, raising questions about the mechanism by which COVID-19 creates severe illness. The variables that were the most predictive of CSS included clinical variables such as age and abnormal chest x-ray as well as cytokines such as macrophage colony-stimulating factor, interferon-inducible protein 10, and interleukin-1 receptor antagonist. Our results suggest that SARS-CoV-2 infection causes a plethora of changes in cytokine profiles and that particularly in severely ill patients, these changes are consistent with the presence of macrophage activation syndrome and could furthermore be used as a biomarker to predict disease severity.


Introduction
In December 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19), emerged in Wuhan, China (Zhu et al., 2020). Although many COVID-19 patients remain asymptomatic, a subset of patients presents with severe illness. Early treatment with dexamethasone appears to improve outcomes in these patients. However, it is not always initially clear which patients would benefit from this therapy (The RECOVERY Collaborative Group, 2020). Moreover, COVID-19 infection can be accompanied by a severe inflammatory response characterized by the release of pro-inflammatory cytokines, an event known as cytokine storm (CS) (Tang et al., 2020; Ragab et al., 2020). Thus far, this COVID-19-associated CS has predominantly been characterized by the presence of IL-1β, IL-2, IL-17, IL-8, TNF, CCL2, and most notably IL-6 (Tang et al., 2020; Merad and Martin, 2020; McGonagle et al., 2020; Wan et al., 2020; Otsuka and Seino, 2020). Severe cases of CS can be life threatening, and early diagnosis and treatment of this condition can lead to improved outcomes. We hypothesize that cytokine profiles combined with clinical information can predict disease severity, potentially giving clinicians an additional tool when evaluating patients for preemptive intervention.

Results
Analysis was performed for 36 PCR-confirmed COVID-19 (+) and 36 COVID-19 (−) human plasma samples (Figure 1-source data 1). The COVID-19 Severity Score (CSS) was developed to categorize patients based on their status upon presentation to the emergency department. CSS is graded as follows: 0 = COVID (−), no symptoms, healthy control (n = 24); 1 = COVID (−), symptoms (n = 12); 2 = COVID (+), discharged from emergency room (n = 15); 3 = COVID (+), admitted, but did not require supplemental oxygen (n = 7); 4 = COVID (+), admitted and required any amount of supplemental oxygen or positive pressure ventilation (n = 8); and 5 = COVID (+), admitted to ICU/step-down (n = 6) (Figure 1). CSS was used as the outcome variable for a mutual information minimum-redundancy maximum-relevance algorithm (Kratzer and Furrer, 2018; Figure 1), with the goal of selecting a subset of variables most predictive of CSS. The algorithm confirmed the predictive value of clinical variables such as age and chest x-ray abnormality and also ranked the information gain provided by each of the 15 cytokines tested. Several cytokines added unique predictive value to the mutual information model beyond what was provided by clinical factors such as age or patient comorbidities. The algorithm also deprioritized factors whose predictive value was redundant with that of the most predictive variables. Macrophage colony-stimulating factor (M-CSF) was ranked second after age because it added the most predictive power to the algorithm with minimal redundancy with age. It ranked ahead of abnormalities on chest x-ray because, while both were relevant in predicting COVID-19 severity, part of the predictiveness of chest x-ray abnormality was also explained by age differences (Figure 2). The top four cytokines combined with age were predictive of the most severe CSS (4-5), with an area under the receiver operating characteristic curve of 0.86 (Figure 2).
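The CSS grading rules above can be written as a small function. The following Python sketch is illustrative only and is not the study's code: the function and parameter names are hypothetical, and the precedence applied when a patient meets more than one criterion (ICU/step-down taking priority over the oxygen requirement) is an assumption. The group sizes are those reported in the text.

```python
# Illustrative sketch only; names and rule precedence are assumptions.

# Group sizes as reported in the text, keyed by CSS.
CSS_GROUP_SIZES = {0: 24, 1: 12, 2: 15, 3: 7, 4: 8, 5: 6}


def covid_severity_score(*, covid_positive, symptomatic=False, admitted=False,
                         oxygen_or_ppv=False, icu_or_step_down=False):
    """Return the CSS (0-5) for one patient at presentation."""
    if not covid_positive:
        # COVID (-): healthy control (0) or symptomatic (1).
        return 1 if symptomatic else 0
    if not admitted:
        return 2  # COVID (+), discharged from the emergency room
    if icu_or_step_down:
        return 5  # assumed to take precedence over the oxygen requirement
    if oxygen_or_ppv:
        return 4  # any supplemental oxygen or positive pressure ventilation
    return 3      # admitted, no supplemental oxygen


# Sanity check on the reported cohort: scores 0-1 are COVID (-), 2-5 are COVID (+).
negatives = CSS_GROUP_SIZES[0] + CSS_GROUP_SIZES[1]
positives = sum(CSS_GROUP_SIZES[k] for k in (2, 3, 4, 5))
print(negatives, positives)  # 36 36
```

For example, `covid_severity_score(covid_positive=True, admitted=True, oxygen_or_ppv=True)` returns 4 under these assumptions.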
Multiple cytokines, including M-CSF (p<0.01), interferon-inducible protein 10 (IP-10) (p<0.01), interleukin 18 (IL-18) (p<0.01), and interleukin-1 receptor antagonist (IL-1RA) (p<0.01), were more relevant in predicting CSS than cytokines more frequently characterized in the context of COVID-19, such as IL-6 (p<0.01). All of these cytokines showed a statistically significant difference in their profiles when segregated by CSS (Figure 3), yet the mutual information algorithm prioritized them differently than would be expected from univariate analyses. This indicates that the mutual information algorithm prioritizes cytokines whose predictive value for COVID-19 severity cannot be fully explained by other clinical variables such as age or medical comorbidities.

Discussion
We found that a small number of clinical variables, when combined with cytokine expression, are predictive of presenting COVID-19 disease severity. Cytokines singled out for relevance by the mutual information algorithm shared a connection to macrophage activation syndrome (MAS), raising questions about the mechanism by which SARS-CoV-2 creates severe illness in a subset of patients. First, we examined the significant contribution of IP-10 to CSS. IP-10 is secreted by monocytes, fibroblasts, and endothelial cells in response to interferon gamma (IFN-γ), which is secreted by T cells (mainly Th1), macrophages, mucosal epithelial cells, and natural killer (NK) cells (Liu et al., 2011). This release of IFN-γ induces several cell types to produce IP-10, which consequently recruits more Th1 cells, contributing to a positive feedback loop. IP-10 is also a chemoattractant for CXCR3-positive cells such as macrophages, dendritic cells, NK cells, and T cells. It has been proposed that macrophages recruited by IP-10, in the presence of persistent IFN-γ production, can lead to MAS (Merad and Martin, 2020; McGonagle et al., 2020; Otsuka and Seino, 2020). MAS is characterized as a state of systemic hyperinflammation often accompanied by CS, which, without intervention, can lead to severe tissue damage and, in extreme cases, death (Otsuka and Seino, 2020). Moreover, the cytokine most relevant in predicting CSS was M-CSF, which is secreted by eukaryotic cells in response to viral infection and stimulates hematopoietic stem cells to differentiate into macrophages. Currently, three separate immune stages are used to describe the progression of COVID-19. The first stage is characterized by a potent induction of interferons that marks the early activation of the immune system and is important in the viral response, and the second stage is characterized by a delayed interferon response (Merad and Martin, 2020). These stages may prime the body for a third stage composed of detrimental hyperinflammation characterized by CS and MAS (Merad and Martin, 2020). This excessive macrophage activation could explain the increase that we observed in IL-1RA, a cytokine abundantly produced by macrophages.
Steroids have shown a survival benefit for COVID-19, likely by suppressing such detrimental hyperinflammation (The RECOVERY Collaborative Group, 2020). Our analysis identified a pattern of cytokine alterations on presentation associated with COVID-19 severity. The ability to identify a cytokine pattern less redundant with known clinical factors such as age and chest x-ray could help better identify patients in need of immunomodulatory treatment without the confounders of current models, in which the measured cytokines correlate as much with age as with severity (Pierce et al., 2020). Further studies should be conducted to clarify the mechanistic role that these cytokines and macrophages play in the various stages of COVID-19 and to correlate them with other hematologic parameters that were not collected in this database. The results of these future studies could identify more targeted immunomodulatory strategies beyond steroid administration, such as treatment with MEK inhibitors (Zhou et al., 2020), as well as the ideal timing of these interventions to maximize therapeutic efficacy. Future studies could also address the size limitations of this study, which was not powered to explore race- or ethnicity-related differences in COVID-19 severity. Finally, we present the application of this mutual information algorithm as a way to evaluate the dataset as a whole and elucidate the most important cytokines in predicting the presenting severity of COVID-19. COVID-19 severity is influenced by many clinical factors, such as age, and this algorithm is able to identify cytokines that contribute information not present in the tested clinical variables. Identifying the most important variables for severe presentation of COVID-19 within a more complete cytokine profile may help determine global immune mechanisms of disease severity.

Biobank samples
COVID-19 (+) and (−) human plasma samples were received from the Lifespan Brown COVID-19 Biobank from Brown University at Rhode Island Hospital (Providence, RI). All biobank samples were collected on patients' arrival in the Emergency Department at Rhode Island Hospital. All patient samples were deidentified but included the available clinical information as described in Results. It is unknown if any patients were blood relatives. The IRB study protocol 'Pilot Study Evaluating Cytokine Profiles in COVID-19 Patient Samples' did not meet the definition of human subjects research by either the Brown University or the Rhode Island Hospital IRBs. All samples were thawed and, immediately before the assay was run, centrifuged at 14,000 rpm for 10 min to remove cellular debris, following the manufacturer's protocol included with the Luminex kit.

Clinical variables
Available deidentified clinical variables were collected from patients and from chart review during their time in the emergency department. Clinical variables were categorized to create combined variables such as the number of chronic conditions or the number of presenting symptoms. The full breakdown of clinical variable categorization can be found in Figure 2-source data 1.

Data analysis
Data analysis and visualization were performed using R (R Development Core Team, 2020). The varrank package (Kratzer and Furrer, 2020) was used to apply a minimum-redundancy maximum-relevance mutual information algorithm. The algorithm quantifies the amount of information each cytokine and clinical variable can provide about the outcome variable, CSS. Each cytokine variable was discretized into two clusters (high or low analyte concentration in pg/mL) using k-means clustering to minimize within-variable entropy and, thus, over-fitting. K-means clustering assigns each data point to the cluster (high or low analyte concentration) with the nearest mean. Clinical variables and cytokine levels were used to predict CSS. The first variable was selected for local optimum relevance by a greedy algorithm. All subsequent variables were ordered to maximize relevancy and minimize redundancy. The ordering was robust to leave-one-out cross-validation. For each cytokine, one-way ANOVA with Tukey's honest significant difference test and Šidák correction for multiple comparisons was used to compare plasma cytokine levels among CSS groups.
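The discretization and ranking steps described above can be illustrated with a short, self-contained sketch. It is written in Python rather than R, does not use or reproduce the varrank package, and is not the study's code. All function names, variable names, and toy values are hypothetical. The score used here (relevance minus mean redundancy with the already selected variables) is one common minimum-redundancy maximum-relevance scheme and is an assumption; the specific scoring scheme used in the study is not stated in this section.

```python
from collections import Counter
from math import log2


def two_means_1d(values, max_iter=100):
    """Label each value 0 (low) or 1 (high) by its nearest cluster mean."""
    lo, hi = min(values), max(values)  # initial centers: the two extremes
    labels = [0] * len(values)
    for _ in range(max_iter):
        labels = [int(abs(v - hi) < abs(v - lo)) for v in values]
        lows = [v for v, lab in zip(values, labels) if lab == 0]
        highs = [v for v, lab in zip(values, labels) if lab == 1]
        if not highs:  # all values identical: a single cluster
            break
        new_lo, new_hi = sum(lows) / len(lows), sum(highs) / len(highs)
        if (new_lo, new_hi) == (lo, hi):  # centers stable: converged
            break
        lo, hi = new_lo, new_hi
    return labels


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def mrmr_rank(features, target):
    """Greedy ordering: most relevant first, then relevance minus redundancy."""
    relevance = {k: mutual_information(v, target) for k, v in features.items()}
    ranked = [max(relevance, key=relevance.get)]
    remaining = set(features) - set(ranked)
    while remaining:
        def score(name):
            # Mean mutual information with the variables already selected.
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in ranked) / len(ranked)
            return relevance[name] - redundancy
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy data only (not study data): eight samples, four outcome classes.
toy_target = [0, 0, 1, 1, 2, 2, 3, 3]
toy_features = {
    "feat_a": [0, 0, 1, 1, 2, 2, 2, 2],
    "feat_a_like": [0, 0, 0, 0, 1, 1, 1, 1],  # fully determined by feat_a
    "feat_b": [0, 0, 1, 1, 0, 0, 1, 1],
}
print(two_means_1d([1.0, 2.0, 1.5, 10.0, 11.0, 9.5]))  # [0, 0, 0, 1, 1, 1]
print(mrmr_rank(toy_features, toy_target))  # ['feat_a', 'feat_b', 'feat_a_like']
```

In the toy example, `feat_a_like` and `feat_b` are equally relevant to the outcome on their own, but `feat_a_like` is ranked last because its information is already contained in `feat_a`. This is the same kind of deprioritization of redundant variables described in Results.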