Classification of patients with COVID-19 by blood RNA endotype: a prospective cohort study

ABSTRACT Although the development of vaccines has considerably reduced the severity of COVID-19, its incidence is still high. Hence, a targeted approach based on RNA endotypes of a population should be developed to help design biomarker-based therapies for COVID-19. We evaluated the major RNAs transcribed in blood cells during COVID-19 using PCR to further elucidate its pathogenesis and determine predictive phenotypes in COVID-19 patients. In a discovery cohort of 40 patients with COVID-19, 26,354 RNAs were measured on day 1 and day 7. Five RNAs associated with disease severity and prognosis were derived. In a validation cohort of 153 patients with COVID-19 treated in the intensive care unit, we focused on prolactin (PRL) and toll-like receptor 3 (TLR3), among RNAs that have a strong association with prognosis, and evaluated the accuracy for predicting survival of PRL-to-TL3 ratios (PRL/TLR3) with the areas under the receiver operator characteristic (ROC) curves (AUC). The validation cohort was divided into two groups based on the cut-off value in the ROC curve with the maximum AUC. The two groups were defined by high PRL/TLR3 (n = 47) and low PRL/TLR3 (n = 106), and the clinical outcomes were compared. In the validation cohort, the AUC for PRL/TLR3 was 0.79, showing superior prognostic ability compared to severity scores such as Acute Physiology and Chronic Health Evaluation II and Sequential Organ Failure Assessment. The high PRL/TLR3 group had a significantly higher 28-day mortality than the low PRL/TLR3 group (17.0% vs 0.9%, P < 0.01). A new RNA endotype classified using high PRL/TLR3 was associated with mortality in COVID-19 patients. IMPORTANCE In this study, whole-blood RNAs (prolactin and toll-like receptor 3) involved in the prognosis of patients with COVID-19 were identified. The RNA endotypes classified by these important RNAs highlight the possibility of stratifying the COVID-19 patient population and the need for targeted therapy based on these phenotypes.

T he global COVID-19 pandemic that began at the end of 2019 has claimed millions of lives, and the toll continues to rise.Its unprecedented impact has caused extreme pressure on health systems and economies as well as a massive health crisis.The complexity of this crisis is compounded by "long COVID, " a condition in which symptoms persist after recovery, reducing the quality of life of survivors and placing an additional strain on healthcare resources (1).
Toll-like receptors (TLRs), a class of pattern recognition receptors, play an important role in determining the severity of COVID-19.SARS-CoV-2, classified as a pathogen-asso ciated molecular pattern, interacts with TLR.This interaction initiates an intracellular signaling cascade that leads to the release of inflammatory cytokines such as interleukin ventilation for >12 days or 28-day non-survivors as late recovery, as per our previous study (8).We divided the COVID-19 patients of the discovery cohort into two groups based on early and late recovery (Fig. 1).

RNA sequencing and RT-qPCR
Total RNA from leukocytes was isolated from patients with COVID-19 and healthy volunteers using a PAXgene Blood RNA System (BD Bioscience, San Jose, CA).Blood samples were collected in collection tubes and stored at −30°C until further analysis.Library preparation was performed using a TruSeq stranded mRNA sample prep kit (Illumina, San Diego, CA) according to the manufacturer's instructions.RNA-seq was performed as previously described (9).Details are presented in the Supplemental Methods.RT-qPCR was performed using a Biomark HD (Fluidigm) microfluidics system as previously described (10,11), with some modifications.Details are presented in the Supplemental Methods.

Measurement of plasma proteins
To validate the results of the RNA analysis, plasma proteins were evaluated using Olink proteomics (Olink Explore 1536) as previously described (12).Details are presented in the Supplemental Methods.

Statistical analysis
In the discovery cohort, the number of samples required to extract candidate RNAs of prognostic relevance was calculated to be at least 12 cases for 90% power with measured RNA of 27,000, |log fold change| of 1.2, and false discovery rate of 0.1 (Fig. S1).In the validation cohort, on the basis of the results of previous comprehen sive molecular analysis studies, we determined that 150 COVID-19 patients would be necessary to determine prognostically relevant factors from candidate factors extrac ted from comprehensive molecular analysis in the discovery cohort (12).A subset of different conditions was created to extract molecules expressed in clinically impor tant phenotypes.In the discovery cohort, we created three subcohorts that could be related to disease progression: healthy volunteers and COVID-19 patients, critical and non-critical, and early recovery phase 1 and phase 2. In each subcohort, the differences in the expression of 26,354 RNAs were evaluated.Statistical analysis of RNA-seq data was performed as previously described (9).Multidimensional scaling using the "cmdscale" command in R version 4.3.1 (13) was performed using log2-normalized RNA fragments per kilobase of transcript per million mapped read values to compare expression between healthy volunteers and COVID-19 patients.Volcano plot analyses were performed to identify significant changes in RNA expression between patients and healthy volunteers, critical and non-critical subcohorts, and phase 1 and phase 2 subcohorts of the early recovery patients.Significance was defined as |minimum log2 fold change| >1.2, and false discovery rate, as <0.1.The top 20 significant RNAs were used for further analysis.
To evaluate the upstream regulators of RNA expression, the data were sub jected to upstream regulator analysis, and z-scores and P values were calcula ted using Ingenuity Pathway Analysis (QIAGEN, Inc., https://www.qiagen.com/us/products/discovery-and-translational-research/next-generation-sequencing/informaticsand-data/interpretation-content-databases/ingenuity-pathway-analysis/) (14).The z-score predicts the activation state of the upstream regulator using RNA expression patterns of the downstream state of that regulator.The upstream regulator was considered activated at |z-score| >2 with P < 0.05.The top 20 significant upstream regulators were used for further analysis.
Based on the results of statistical analysis of RNA-seq results, RT-qPCR of 31 RNAs was performed.The Wilcoxon rank sum test was used for analysis, and P < 0.05 was considered to indicate statistical significance.
The five candidate RNAs from the discovery cohort were validated in the validation cohort.To identify molecules associated with the most important prognostic phenotype, the patients were divided into two groups, 28-day survivors and 28-day non-survivors.The five candidate RNAs of the discovery cohort were compared by the Wilcoxon rank sum test between the two groups.The RNAs that were significantly changed (P < 0.05) between the survivors and non-survivors were extracted as key mRNAs.Receiver operator characteristic (ROC) curves were created using key mRNAs and ratios of prolactin (PRL) to toll-like receptor 3 (TLR3) (PRL/TLR3).The areas under the ROC curves (AUCs) were calculated to evaluate the accuracy for predicting survival.
The validation cohort was divided into two groups based on the cut-off value in the ROC curve with the maximum AUC.The two groups were defined by their clinical phenotypes, namely, the high PRL/TLR3 (n = 47) group and low PRL/TLR3 group (n = 106), and the clinical data of the two phenotypes were compared.Fisher's exact test and the Wilcoxon rank sum test were used to compare phenotypes based on baseline characteristics and 28-day outcomes.Missing values were not imputed in any analyses.Cumulative mortality was determined using Kaplan-Meier curves, and the phenotypes were compared using the log rank test.A two-sided P value <0.05 was considered to indicate statistical significance.For all statistical analyses, a fully scripted data manage ment pathway was created within the R environment for statistical computing, version 4.0.2(R Foundation for Statistical Computing, Vienna, Austria).Categorical variables are reported as numbers and percentages, and significance was calculated using χ 2 or Fisher's exact test.Continuous variables are described using mean and standard error or compared using the Mann-Whitney U test or Wilcoxon rank sum test and described using median and interquartile range values.

Candidate RNAs as prognostic biomarkers for COVID-19 from the discovery cohort
In the present study, we enrolled 40 COVID-19 patients and 16 healthy volunteers in the discovery cohort.The cohort comprised 35 critical and 5 non-critical patients; 12 critical patients were classified as early recovery patients, and 23, as late recovery patients.Compared to the healthy volunteers, the COVID-19 patients were significantly older (P < 0.01) and had higher rates of hypertension and diabetes mellitus (P = 0.01 and P = 0.02, respectively).There were no significant differences in other patient characteristics between the groups.Detailed characteristics of the patients and healthy volunteers are shown in Table 1.We selected 120 candidate RNAs based on fold change and upstream analyses of the three subcohorts (COVID-19 patients vs healthy volunteers, critical patients vs non-critical patients, and early recovery phase 1 patients vs phase 2 patients) (Fig. S2a; Table S1).Because of the important role of TLR7 in the defense mechanism against single-stranded RNA viruses, we performed an exploratory analysis of TLR7 expression but found no significant changes in any of these three subcohorts (Fig. S2b).Thirty-one RNAs exhibited significant change in expression between critical and non-critical patients (Fig. S3).The expression of these 31 RNAs was re-evaluated using RT-qPCR, which revealed significant change in the expression of five RNAs, namely, interleukin (IL)-18R1, galectin-2 (LGALS2), mitogen-activated protein kinase 14, PRL, and TLR3, in the three subcohorts (Fig. S4).Thus, these five RNAs were investigated in the validation cohort (Fig. 2).

Validation of five candidate RNAs using RT-qPCR
To ascertain the robustness of the five RNAs as candidate biomarkers for COVID-19 prognosis, we assessed their expression in an independent validation cohort using RT-qPCR.We included 153 COVID-19 patients in the validation cohort, which comprised 145 critical and 8 non-critical patients; 9 patients (5.9%) died within 28 days after ICU admission.The characteristics of the patients are presented in Table 1.RT-qPCR analysis revealed that the RNA expression of PRL significantly increased and that of TLR3 significantly decreased in the 28-day non-survivors compared with that in the 28-day survivors (Fig. 2).Therefore, PRL and TLR3 were established as the two key RNAs related to the prognosis of COVID-19 patients.

RNA endotypes for discriminating COVID-19 patients with poor vs good prognosis
Considering the characteristics of the expression changes of the two RNAs in patients with poor prognosis, we evaluated the ratio of PRL mRNA expression to that of TLR3 (PRL/TL3) as a biomarker of RNA phenotype.The performance of several protein markers [C-reactive protein (CRP) and fibrin degradation product (FDP)] and severity scores [Acute Physiology and Chronic Health Evaluation II (APACHE II) and Sequential Organ Failure Assessment (SOFA)] were compared with that of PRL/TLR3 using ROC curve analysis.The AUC for PRL/TLR3 was 0.79 (95% confidence interval, 0.69-0.89),whereas those of CRP and FDP were 0.48 (0.29-0.66) and 0.66 (0.48-0.84), respectively.The AUCs for APACHE II and SOFA severity scores were 0.70 (0.58-0.81) and 0.49 (0.26-0.71), respectively.Therefore, PRL/TLR3 was more suitable for the discrimination of prognosis than the protein biomarkers and severity scores (Fig. 3).Hence, based on the ROC curve analysis, the threshold for PRL/TLR3 was set at 294.8.

Comparison of patient outcomes based on PRL/TLR3
We divided the 153 patients into two groups, namely, the high PRL/TLR3 (n = 47) group and low PRL/TLR3 group (n = 106), and compared patient outcomes by group.The RNA expression pattern of PRL and TLR3 in each group is shown in Fig. 4. The RNA endotypes did not exhibit an association with age, sex, comorbidities, PaO 2 /FIO 2 (ratio), or SOFA and APACHE II scores at the time of admission of the patients to the ICU (Table 2; Fig. 5a).The cumulative incidence of mortality at 28 days was 17.0% (n = 8) and 0.9% (n = 1) in the PRL/TLR3 high and low groups, respectively.Kaplan-Meier analysis revealed an association between RNA endotype and 28-day mortality (P < 0.01) (Fig. 5b).Hospital mortality was significantly higher in the PRL/TLR3 high group than in the PRL/TLR3 low group (23.4% vs 9.4%, P = 0.04) (Table 3).However, we did not observe a difference in 28-day ventilator-free days and the duration of ICU stay between the two groups (P = 0.54 and P = 0.99, respectively).

Protein profile cohort
For further analysis, we compared the PRL and TLR3 in plasma of 54 COVID-19 patients and 20 healthy subjects.Compared to the healthy subjects, the COVID-19 patients were significantly older (P < 0.01) and had higher rates of hypertension, diabetes mellitus, lung disease, and kidney disease (P = 0.04, P < 0.01, P = 0.01, and P = 0.02, respectively).There were no significant differences in other patient characteristics between the groups.Detailed characteristics of the patients and healthy volunteers are shown in Table 1.analysis revealed that PRL was significantly elevated in both phase 1 and phase 2 (both, P < 0.01), whereas TLR3 was significantly decreased only in phase 2 (P < 0.01) (Fig. S5).

Laboratory data at inclusion
White blood cells (10

DISCUSSION
In the present study, we identified high PRL/TLR3 and low PRL/TLR3 endotypes of COVID-19 patients in two heterogeneous cohorts using whole-blood samples of the patients at the time of admission to the ICU.We elucidated that the RNA endotypes of COVID-19 patients could accurately predict patient mortality; these endotypes were not easily discernible by clinical characteristics.
The present findings based on the RNA expression of PRL and TLR3 could have pathophysiological implications.PRL is a pituitary hormone involved in lactation, luteal functioning, and reproduction (15).Moreover, leukocytes produce PRL, which plays a major role in immune responses (16)(17)(18).PRL strongly affects the innate and adaptive immune responses, exerting an immunomodulatory effect at the early stages of T-cell activation and increasing the secretion of interferon-γ (IFN-γ) (17).It also promotes cytokine production by monocytes and activates the STAT1 and MAPK pathways in granulocytes (19).Some of these responses were also observed in the present study (Table S1; Fig. S3).Although various studies have reported elevated PRL expression and exacerbation of viral infections such as those caused by human immune deficiency virus and hepatitis C virus, the mechanisms underlying the high PRL expression in COVID-19 are poorly understood (20).The results of our study suggest the potential role of PRL as a pro-inflammatory agent in COVID-19.
TLR3 plays a crucial role in the antiviral response against most viruses based on its ability to sense double-stranded RNA, a common intermediate of replication among many viruses (21).A protective role of TLR3 has been reported in infections caused by organisms closely related with COVID-19 viruses, such as SARS-CoV-1 and Middle East respiratory syndrome coronavirus (22).Moreover, TLR3 mutants are predisposed to COVID-19 and influenza infection severity and associated mortality (23,24).Therefore, the association between reduced TLR3 expression and mortality revealed in the present study may reflect the consequences of impaired immune defense against COVID-19.
The identification of potential risk factors that predict the disease course will be useful for healthcare professionals to efficiently triage patients, personalize treatment, monitor clinical progress, and allocate proper resources at all levels of care to mitigate morbidity and mortality.Prognostic predictions of COVID-19 based on patient background and laboratory test results have been reported, but their accuracy is insufficient (25)(26)(27).In the present study, the prognostic value of the RNA endotypes was higher than that of the existing clinical parameters, suggesting that the investigation of RNA expression in whole-blood samples may accurately assess pathophysiological changes that cannot be determined using clinical parameters.We also analyzed plasma proteins in the study.Although PRL was found to increase early in the course of the disease, TLR3 did not decrease until 1 week after onset of the disease.Therefore, we believe that the RNA expression of PRL and TLR3 is a better predictor of patient prognosis at the start of treatment than PRL and TLR3 in plasma.Furthermore, the differences in prognosis and therapeutic efficacy depending on RNA endotypes have been reported even in acute diseases, such as sepsis (3,28).Therefore, the accumulation of whole-blood RNA data of COVID-19 patients may reveal differences in response of critically ill patients to various therapies (29).
The study has several limitations.First, SARS-CoV-2 clade 20B was predominantly prevalent during the study period, but clade 22B is now the predominant clade.The risk of severe disease has decreased with changes in the prevalent strains (30).Therefore, it cannot be ruled out that differences in endemic strains may influence the results.Second, in sampling the healthy subjects, we were not able to adjust for background factors such as diabetes and hypertension, which are common in COVID-19 patients, and we cannot rule out the possibility that these underlying diseases may have influenced the results.Third, because severe COVID-19 tends to be more common in males (31), this study was a male-dominated cohort.It is possible that PRL is a sex hormone, and thus, the pattern of expression may vary by sex.However, our study has several strengths over previous studies on RNA endotypes (3,28).We improved the accuracy of quantification by retesting the candidate genes obtained by RNA-seq using RT-qPCR.We also validated our results using multiple cohorts; therefore, the results should be considered robust.
In the present study, we identified two key RNAs (PRL and TLR3) associated with the prognosis of COVID-19.A new RNA endotype classified using high PRL/TLR3 was associated with mortality in COVID-19 patients.This can potentially lead to the development of therapeutic interventions based on RNA endotypes in the future.

FIG 2 FIG 3
FIG 2Identification of two RNAs related to mortality of COVID-19 patients from the validation cohort.Among the five candidate RNAs selected from the analysis of the discovery cohort, the expression of PRL was significantly increased, and that of TLR3 was significantly decreased in the 28-day non-survivor subcohort compared with that in the 28-day survivor subcohort.NS, not significant.*P < 0.05.

TABLE 1
Clinical and demographic characteristics of COVID-19 patients and healthy volunteers in the discovery cohort, validation cohort, and protein profile cohort a,b

patients Healthy volunteers COVID-19 patients COVID-19 patients Healthy volunteers
a BMI: body mass index; heart disease: coronary artery disease, congestive heart failure, valvular disease; lung disease: asthma, chronic obstructive pulmonary disease, requiring O 2 at home, and any chronic lung condition; kidney disease: chronic kidney disease, baseline creatinine level >1.5 mg/dL; immunocompromised condition: active cancer, chemotherapy, transplant and immunosuppressant agents, asplenic; P/F: PaO 2 /FIO 2 ; SOFA: Sequential Organ Failure Assessment; APACHE: Acute Physiology and Chronic Health Evaluation.b Data are reported as number (percentage) or median (IQR, interquartile range) as appropriate.P value: for the comparison between COVID-19 patients and healthy volunteers.

TABLE 2
Clinical and demographic characteristics of COVID-19 patients according to PRL/TLR3 ratio a,b obstructive pulmonary disease, requiring O 2 at home, and any chronic lung condition; kidney disease: chronic kidney disease, baseline creatinine >1.5 mg/dL; immunocompro mised condition: active cancer, chemotherapy, transplant and immunosuppressant agents, asplenic; LDH: lactate dehydrogenase; P/F: PaO 2 /FIO 2 ; SOFA: Sequential Organ Failure Assessment; APACHE: Acute Physiology and Chronic Health Evaluation.
a BMI: body mass index; CRP: C-reactive protein; heart disease: coronary artery disease, congestive heart failure, valvular disease; lung disease: asthma, chronic b Data are reported as number (percentage) or median (IQR, interquartile range) as appropriate.P value: comparison between high and low groups.

TABLE 3
Clinical outcomes of COVID-19 patients according to PRL/TLR3 ratio a,b,c Data are reported as number (percentage) or median (IQR, interquartile range) as appropriate.
a ICU, intensive care unit.b c P value: comparison between high and low groups.