Traces of SARS-CoV-2 RNA in Peripheral Blood Cells of Patients with COVID-19

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the third virus that caused coronavirus-related outbreaks over the past 20 years. The outbreak was first reported in December 2019 in Wuhan, China, but rapidly progressed into a pandemic of an unprecedented scale since the 1918 flu pandemic. Besides respiratory complications in patients with COVID-19, clinical characterization of severe infection cases showed several other comorbidities, including multiple organ failure, and septic shock. To better understand the systemic pathogenesis of COVID-19, we interrogated the virus's presence in the peripheral blood cells, which might provide a form of trafficking or hiding to the virus. By analyzing >2 billion sequence reads of high-throughput transcriptome sequence data from 180 samples of patients with active SARS-CoV-2 infection or healthy controls collected from 6 studies, we found evidence of traces of SARS-CoV-2 RNA in peripheral blood mononuclear cells in two samples from two independent studies. In contrast, the viral RNA was abundant in bronchoalveolar lavage specimens from the same patients. We also devised a “viral spike-to-actin” RNA normalization as a metric to compare across various samples and minimize errors caused by intersample variability in total human RNA abundance. Our observation suggests immune presentation and discounts the possibility of extensive viral infection of lymphocytes or monocytes.


Introduction
T he severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the third virus that caused coronavirus-related outbreaks over the past 20 years. The first outbreak occurred in Asia in 2002 hence, the name SARS-CoV, which back then was not related to any of the known viruses (Marra et al., 2003;Rota et al., 2003). Between 2002 and2003, 8098 people became sick with SARS, and of those, 774 died (i.e., a mortality rate of 9.5%). Since 2004, there have been no more reports of SARS cases (via NHS, WHO, and CDC).
The second coronavirus-related outbreak started in the Arabian Peninsula in 2012 (Zaki et al., 2012), causing a fatal disease, Middle East respiratory syndrome (MERS), with a significantly higher mortality rate of 40% of the cases infected by MERS-CoV virus (Zumla et al., 2015).
More recently, in December 2019, the third coronavirusrelated outbreak was first reported in Wuhan, China, by the emergence of the SARS-CoV-2, initially dubbed ''the 2019 novel coronavirus'' (2019-nCoV). The spread of the virus led to a pandemic of an unprecedented scale since the 1918 flu pandemic.
As of mid-June 2021, >175 million confirmed COVID-19 cases globally, and >3.5 million deaths have been reported by the WHO (WHO Dashboard, continuously updated). Besides the respiratory complications in patients with COVID-19, clinical characterization of severe infection cases indicated further comorbidities, including multiple organ failure (liver, kidney, and heart) and septic shock (Cascella et al., 2020;Li et al., 2020;Poston et al., 2020).
Since the first genome sequence of SARS-COV-2 has been determined and made public in January 2020 (Lu et al., 2020), >2 million genomes have been sequenced worldwide and become available through the Global Initiative on Sharing All Influenza Data (Shu and McCauley, 2017). The availability of those genomic sequences allows rapid screening of viral RNA in human tissues and environmental samples [e.g., sewage (Bibby and Peccia, 2013)] using multiomic wet lab technologies and in silico screening tools for publicly available metatranscriptomic samples.
To better understand COVID-19 systemic pathogenesis, we conducted this study to interrogate the presence of the virus in the blood, or any of its components, as it might provide a form of trafficking or hiding to the virus, notably that some precarious studies reported the ability of the virus to infect lymphocytes (Wang et al., 2020b). In contrast, others have suggested that the virus exerts its pathogenesis through ''attacking hemoglobin,'' although this hypothesis has been heavily criticized (Read, 2020). The virus was sporadically reported to be found in the plasma or blood of patients with COVID-19 (Huang et al., 2020;Wang et al., 2020a). Finally, peripheral blood mononuclear cells (PBMCs) were shown to harbor other infectious viruses, such as HIV, HCV, and HBV (Li et al., 2015;Wang et al., 2002).
We computationally analyzed high-throughput sequence data from patients with active COVID-19 in several publicly available RNA-Seq datasets for the reasons mentioned previously. We found only evidence of traces of SARS-CoV-2 RNA in their PMBCs, whereas their bronchoalveolar lavage samples had large amounts of viral RNA.

Detection of viral RNA
Filtered FASTQ sequences were aligned to the SARS-CoV-2 reference genome (GenBank accession NC_045512) by the Burrows-Wheeler aligner (Li and Durbin, 2009). Sambamba (Tarasov et al., 2015) was used to filter generated binary alignment map files for mapped sequences with a quality score >40 and alignment score >90. For further verification, identified SARS-CoV-2 matching sequences were manually inspected, and blastn (Altschul et al., 1990) was used to check them against the NCBI nucleotide ''nt'' database. Blast matches and alignments were visually reviewed. In addition, the matching sequences were annotated by blastx searches against the NCBI ''RefSeq protein'' database.

Viral-to-human expression normalization
To estimate viral RNA abundance in a given sample and make a comparison between samples possible, we normalized the number of any positive SARS-CoV-2 matching hits to the total number of reads within that sample. In addition, we used the spike gene, which is highly specific to SARS-CoV-2, to estimate the extent of viral RNA load in a given sample. We normalized the abundance of spike genes to human actin RNA, being a transcript of a housekeeping gene. This normalization generated a ''spike-to-actin'' ratio that could be used as an accurate metric for viral RNA abundance relative to human RNA because the total number of reads may include nonhuman and nonviral samples.   The abundance of viral RNA is estimated as the number of detected viral RNA reads to the total number of RNA reads in the sample.

Dimension reduction and clustering
For dimension reduction analysis, we used the plotPCA function implemented in the DESeq2 package (Love et al., 2014), after applying the rlog function, which transforms the transcript count data to the log2 scale and normalizes the count data to the sizes of the libraries.

Differential gene expression analysis
Filtered sequences, in FASTQ format, were processed in Salmon (Patro et al., 2017) for the quantification of the expressed transcripts against the human reference GENCODE (Frankish et al., 2019) Release 35 (GRCh38.p13). The DE-Seq2 package (Love et al., 2014) was used for transcript normalization and differential gene expression analysis.
The comparisons performed for the differential expression are (1) healthy controls versus patients with COVID-19 without viral RNA, (2) healthy controls versus patients with COVID-19 with viral RNA, and (3) patients with COVID-19 without viral RNA versus patients with COVID-19 with viral RNA. The cutoff for statistical significance was adjusted p < 0.001. Thus, the complete set of the differentially expressed genes (DEGs) is the union of the identified DEGs from the three comparisons. Patterns of differential gene expression were visualized as a heatmap using the R package pheatmap https://cran.r-project.org/package=pheatmap. Further data processing and visualization were performed in R https://www .r-project.org/. Toppgene (Chen et al., 2009) was used for enrichment analysis for differentially expressed transcripts.
On the contrary, expectedly, we found viral sequences in all the bronchoalveolar lavage fluid (BALF) samples (Xiong et al., 2020) with a median abundance of 1.07% of the total sequence reads, which, despite being a relatively low percentage of the total reads, was sufficient to cover the entire genome of SARS-CoV-2 with coverage exceeding 4000 · for some areas in the viral genome ( Supplementary Fig. S2).
Based on the normalized gene expression values of the PBMC samples (Xiong et al., 2020), principal component analysis (PCA) demonstrated separation between the COVID-19 and healthy control samples. Moreover, the t-SNE analysis showed a further separation between the patient samples in which viral RNA was detected in PBMCs and those with no detected viral RNA (Fig. 1).
In agreement with the PCA analysis, hierarchical clustering delineated the patterns of DEGs and confirmed the separation of the three groups: controls, COVID-19 without SARS-CoV-2 viral RNA, and COVID-19 with SARS-CoV-2 viral RNA (Fig. 3).  Toppgene enrichment analysis for genes that were differentially expressed between healthy controls and COVID-19 samples with viral RNA and between COVID-19 without viral RNA and COVID-19 with viral RNA using provided enriched Gene Ontology (GO) terms under the Molecular Function, Biological Process, and Cellular Component categories (Table 3).

PBMC Condition
When the statistically enriched terms were filtered to adjusted p < 0.05, the following categories stood out. Under the Molecular Function category, there was only one statistically significant enriched term, GO:0003823 (antigen binding). Likewise, only one statistically significant enriched term under the Cellular Component category satisfied the statistical filter, that is GO:0019814 (immunoglobulin complex). Under the Biological Process category, 19 statistically significant enriched terms were shortlisted. Many of these terms were related to immune responses and viral life cycle, including GO:0051707 (response to other organisms), GO:0002250 (adaptive immune response), GO:0045087 (innate immune response), and GO:0002449 (lymphocyte-mediated immunity).
When we calculated a spike-to-actin RNA ratio for each sample, values for all four BALF samples, as well as the one PBMC sample with SARS-CoV-2 transcripts, followed the same pattern of viral RNA abundance ratios and were strongly correlated (Spearman correlation q = 0.96, p = 0.00047; Fig. 4C)

Discussion
Coronavirus-related infections are reported to be associated with hematological changes, including lymphopenia, thrombocytopenia, and leukopenia, by infecting blood cells, bone marrow stromal cells, or inducing autoantibodies (Yang et al., 2003). In a former study characterizing the clinical features of patients with COVID-19, Huang et al. (2020) showed that using reverse-transcription polymerase chain reaction (RT-PCR) allowed them to detect coronavirus in plasma-isolated samples from the patients. Their report preferred to use the term ''RNAaemia,'' rather than ''viraemia,'' which they defined as the presence of viral RNA in the blood because they did not perform tests to confirm the presence of infectious SARS-CoV-2 virions in the blood of the patients.
A handful of early reports detected amplifiable viral RNA in the blood (Peng et al., 2020;Wang et al., 2020a); however, to our knowledge, no studies used an unbiased systematic approach to report SARS-CoV2 RNA in PBMCs (Azghandi and Kerachian, 2020). In their preliminary analysis of RNA isolated from PBMCs, Corley et al. confirmed that they did not detect viral sequences (a preprint 10.1101/2020.04.13. 039263v1). High-throughput sequencing has been repeatedly demonstrated to be a practical approach for the identification and quantification of viruses in the blood (Moustafa et al., 2017) following similar methods as those used in viral metagenomics (Aziz et al., 2015;Breitbart et al., 2003) and uncultivated viral genomics (Roux et al., 2019).
Therefore, we planned to exhaustively mine publicly available RNA-Seq PBMC datasets for SARS-CoV-2 RNA sequences. As of the writing of this report, only one study reported viral RNA in an RNA-Seq PBMC dataset (Xiong et al., 2020). In that study, the authors profiled global gene expression in BALF and PBMC specimens of patients with COVID-19. Predictably, we detected viral RNA in all BALF samples (2 patients, 2 replicates each) with an average abundance of 1.07% of the total RNA reads in those samples, including human RNA (Table 1). We also confirmed the finding by Xiong et al. (2020) and detected the viral RNA sequences in one of their PBMC samples from patients with COVID-19 (Table 1)  Of note, to estimate the viral RNA load in a given sample and to compare between different samples, we normalized viral read counts to the total number of reads in each dataset (Fig. 4). Because the total number of human-related reads may be strongly affected by potential microbial RNA (from pulmonary microbiota, for example) or by extensive viral counts, we also sought to estimate the ratio of viral genomic RNA to human cellular RNA. For that purpose, we used actin RNA, being a transcript for a human housekeeping genenot expected to be seen in bacterial cells or viral genomes and expected to have constitutive expression levels. We estimated the ratio of hits to RNA encoding the spike protein (S) to actin transcripts in the sample. Quite interestingly, the spike-to-actin ratio strongly correlated with viral RNA abundance values (computed as SARS-CoV-2 RNA reads normalized to the total number of reads) (Fig. 4C).
These RNA traces are indeed quite rare; however, they confidently and unambiguously belong to SARS-COV-2 ( Supplementary Fig. S1). One viral RNA read translates into polyprotein (pp1ab, accession NP_828849), the largest protein of coronaviruses and involved in replicating and transcribing the viral genome. The other viral RNA from the same sample translates into surface (spike) glycoprotein (accession YP_009724390), which mediates the entry of the viral RNA into human cells expressing human angiotensinconverting enzyme 2(hACE2) (Ou et al., 2020;Walls et al., 2020). The third viral RNA was from a different sample, and it translated into ORF1a polyprotein (accession number YP_009725295), a leader protein, from which the nonstructural proteins are derived.
In our differential expression analysis of the human transcriptome, we opted to limit the analysis to the samples from the Xiong et al. (2020) study because (1) it includes PBMCs from healthy donors and patients with COVID-19 with and without RNA viral sequences and (2) to exclude possible batch effects across RNA-Seq datasets generated from different studies. Our analysis shows a clear separation between the PBMC transcriptomes from healthy donors and patients with COVID-19 ( Figs. 1 and 3). The analysis also shows a further separation within the COVID-19 PBMC samples between samples without detected viral RNA and the sample with viral RNA, suggesting that the PBMC transcriptome with the presence of the viral RNA differs from that with no detected viral RNA sequences.
Although we do not reject the possibility of crosscontamination or barcode bleeding (Kircher et al., 2012;Mitra et al., 2015) for detecting the viral RNA in two PBMC RNA-Seq samples, such possibility is unlikely, given that control samples had no hits to SARS-CoV-2 RNA. A more likely possibility is that SARS-CoV-2 is being sampled by antigen-presenting cells (most likely dendritic cells) or presented to T lymphocytes, which are in the PBMC population. Because such an event (antigen presentation) may be relatively difficult to detect, this explains why it was detected in only 2 of 118 datasets.
One more possibility, which needs many more samples to consider, is that SARS-CoV-2 may be transiently or coincidentally internalized by one of the mononuclear cell types, suggesting a mechanism for the chronicity of the SARS-CoV-2 infection. This hypothesis requires further testing. However, in light of our analysis, it is difficult to support the early reports that SARS-COV-2 targets T lymphocytes in vivo, as suggested earlier, in a correspondence, based on cell culture experiment with pseudotyped viruses (Wang et al., 2020b).
It is important to emphasize that, after 1.5 years of the start of the COVID-19 pandemic, a scientific consensus has been reached that this disease is not merely respiratory, but is a systemic disease involving multiple organs. Our results about the paucity of detectable SARS-CoV-2 RNA in peripheral blood cells are not contradictory to the systemic nature of the disease or the possibility of viral transport through the circulatory system. What the study reports, quite specifically, is that PBMCs are not specifically targeted by the virus. This finding is supported by reports on the lack of ACE2 expression by those immune cells (Hamming et al., 2004;Salamanna et al., 2020).
Although publicly available SARS-CoV-2 genomic data have accumulated to a historical record (>2 million genome sequences), blood transcriptome and RNA-seq data remain insufficient. With more data becoming publicly available, it will be possible to revisit this hypothesis and others to improve our understanding of the progression, replication, chronicity, and recurrence of SARS-CoV-2 in infected individuals.