Mucosal Gene Expression in Response to SARS-CoV-2 Is Associated with Viral Load

ABSTRACT Little is known about the relationships between symptomatic early severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) viral load and upper airway mucosal gene expression and immune response. To examine the association of symptomatic SARS-CoV-2 early viral load with upper airway mucosal gene expression, we profiled the host mucosal transcriptome from nasopharyngeal swab samples from 68 adults with symptomatic, mild-to-moderate coronavirus disease 19 (COVID-19). We measured SARS-CoV-2 viral load using reverse transcription-quantitative PCR (RT-qPCR). We then examined the association of SARS-CoV-2 viral load with upper airway mucosal immune response. We detected SARS-CoV-2 in all samples and recovered >80% of the genome from 95% of the samples from symptomatic COVID-19 adults. The respiratory virome was dominated by SARS-CoV-2, with limited codetection of other respiratory viruses, with the human Rhinovirus C being identified in 4 (6%) samples. This limited codetection of other respiratory viral pathogens may be due to the implementation of public health measures, like social distancing and masking practices. We observed a significant positive correlation between SARS-CoV-2 viral load and interferon signaling (OAS2, OAS3, IFIT1, UPS18, ISG15, ISG20, IFITM1, and OASL), chemokine signaling (CXCL10 and CXCL11), and adaptive immune system (IFITM1, CD300E, and SIGLEC1) genes in symptomatic, mild-to-moderate COVID-19 adults, when adjusting for age, sex, and race. Interestingly, the expression levels of most of these genes plateaued at a cycle threshold (CT) value of ~25. Overall, our data show that the early nasal mucosal immune response to SARS-CoV-2 infection is viral load dependent, potentially modifying COVID-19 outcomes. IMPORTANCE Several prior studies have shown that SARS-CoV-2 viral load can predict the likelihood of disease spread and severity. A higher detectable SARS-CoV-2 plasma viral load was associated with worse respiratory disease severity. However, the relationship between SARS-CoV-2 viral load, airway mucosal gene expression, and immune response remains elusive. We profiled the nasal mucosal transcriptome from nasal samples collected from adults infected with SARS-CoV-2 during spring 2020 with mild-to-moderate symptoms using a comprehensive metatranscriptomics method. We observed a positive correlation between SARS-CoV-2 viral load, interferon signaling, chemokine signaling, and adaptive immune system in adults with COVID-19. Our data suggest that early nasal mucosal immune response to SARS-CoV-2 infection was viral load dependent and may modify COVID-19 outcomes.

(1). While most SARS-CoV-2 infections are either mild or asymptomatic, in approximately 2% to 10% of cases, it can lead to life-threatening pneumonia and multiple-organ failure (2), resulting in more than 5.5 million official COVID-19 deaths worldwide (3). It has been hypothesized that a dysregulated innate immune activation response induces a cytokine storm that promotes respiratory failure and may lead to acute respiratory distress syndrome (ARDS) (4,5), which has been the main reason for hospital admission and mortality in COVID-19 patients (5). Several studies suggested that SARS-CoV-2 viral load can predict the likelihood of disease spread and severity (6)(7)(8). A higher detectable SARS-CoV-2 plasma viral load was associated with worse respiratory disease severity (8). Recently, we have identified associations between SARS-CoV-2 viral titer and the nasopharyngeal microbiome in adults (9). However, little is known about the relationship between SARS-CoV-2 viral load and airway mucosal immune response. To fill this knowledge gap, we profiled the host mucosal transcriptome from nasopharyngeal samples collected from outpatient adults infected with SARS-CoV-2 during spring 2020 with mild-to-moderate symptoms utilizing a comprehensive metatranscriptomics method we developed recently (10). We examined the association of SARS-CoV-2 viral load with the mucosal immune response.

RESULTS
Patient characteristics and SARS-CoV-2 viral load measurement. Sixty-eight adults with confirmed, symptomatic, mild-to-moderate COVID-19 (based on criteria from the World Health Organization [11]) were enrolled as part of a clinical trial (12). All patient enrollment and sample collection procedures were done in the early phase of the pandemic (spring of 2020). As described in the Materials and Methods, all research samples were collected early in the infection (within 24 h of clinical diagnosis). The patients' baseline characteristics are shown in Table 1. The median (interquartile range [IQR]) age was 36 (27 to 57) years. None of the participants had used antibiotics in the prior 2 weeks. Reverse transcription-quantitative PCR (RT-qPCR) was used to quantify the SARS-CoV-2 viral load. Based on the cycle threshold (C T ) value, we partitioned the subjects into the following tertiles: lower tertile (C T 14.5 to 21.5, n = 23, hereafter referred to as "high viral load group"), median tertile (C T 21.6 to 25.5, n = 22, hereafter referred to as "medium viral load group"), and upper tertile (C T 25.6 to 36, n = 23, hereafter referred to as "low viral load group"). The three groups had no significant differences in baseline characteristics and underlining comorbidities (Table 1). Metatranscriptome captured the nasal virome. Total RNA from the nasopharyngeal swab samples was extracted and further processed as described previously (10). After qualitybased trimming and removal of human and bacterial rRNA reads, we had an average of 98,120,791 (median, 740,58,180) human reads and an average of 2,263,290 (median, 848,478) microbiome reads. The microbiome read bin contains viral, bacterial, fungal, and unclassified reads (for details, see Materials and Methods). Following quality control and initial data processing steps, including a taxonomic classification of reads, the reads classified as viruses were used to profile the respiratory virome. The reads mapped to human transcripts were used to analyze the host response to SARS-CoV-2.
With a stringent cutoff of .80% genome coverage, we identified SARS-CoV-2 in 65 (95.5%) out of 68 samples (see Table S3 in the supplemental material). All SARS-CoV-2 genomes identified were placed in the B.1 lineage in the Nextclade tree (13) (see Table S2 in the supplemental material). In addition to SARS-CoV-2, we identified human Rhinovirus C in four samples, and bacteriophage/prophage RNA transcripts were also recovered from many of the samples, albeit with lower genome coverage (Fig. 1A). Among the four samples codetected with human rhinoviruses, three were mapped to human Rhinovirus C reference genome JN798569.1 and one was mapped to human Rhinovirus C reference genome LC004809.1. Human rhinovirus C was spread across three C T value quantile groups; two were in the C T value quantile 1 group (high viral load), one was in C T value quantile 2 (medium viral load), and one was in C T value quantile 3 (low viral load) groups (Table S3). As anticipated, common respiratory viruses were not codetected in these samples, except human Rhinovirus C was codetected in 6% of the samples. This finding may be due to the implementation of public health measures early in the COVID-19 pandemic, such as masking practices and encouragement of social distancing.
Differential host nasal mucosal gene expression based on SARS-CoV-2 viral load. Differential expression analysis between high and low viral load groups revealed that 48 genes were significantly upregulated and 4 genes were significantly downregulated (based on a threshold of log 2 fold change of .j1j and adjusted P , 0.05) in the high viral load subjects ( Fig. 1B and C; see Table S1 in the supplemental material). Most of the upregulated genes in the high viral load group are involved in the immune response during viral infection, specifically, genes involved in interferon alpha/beta signaling (IFITM3, IFITM1, RSAD2, MX2, IFI6, ISG15, IFI35, IFIT1, USP18, OASL, BST2, ISG20, OAS1, OAS2, OAS3, IRF7, and XAF1) ( Fig. 2A and B) and immunoregulatory interactions between a lymphoid and a nonlymphoid cell (IFITM1, CD300E, LILRB1, and SIGLEC1) (Fig. 2B). Only a few genes were downregulated in the high viral load group, including magnesium transporter 1 (MAGT1), mediator complex subunit 18 (MED18), RNA polymerase II subunit D (POLR2D), and centromere protein C (CENPC) (Fig. 1). There was no significant difference in the gene expression profiles between the high and medium viral load groups and medium and low viral load groups.
We examined the Spearman correlations between SARS-CoV-2 viral load and the mucosal gene expression levels. The genes involved in mucosal inflammation were inversely correlated with viral loads in terms of the C T values (Fig. 3); a lower C T value corresponds to a high viral load, indicating a higher level of infectiousness. Strikingly, there was a moderate negative correlation (r between 20.5 to 20.6; q value of ,0.05) between interferon signaling (OAS2, OAS3, IFIT1, UPS18, ISG20, IFITM1, ISG15, and OASL), chemokine receptor signaling (CXCL 10 and CXCL11), and adaptive immune system (IFITM1, CD300E, and SIGLEC1) genes with SARS-CoV-2 C T values in symptomatic, mild-to-moderate COVID-19 adults (Fig. 3). The expression levels of most of these genes were decreased with a lower viral load and seemed to reach the plateau phase at a C T value of ;25 (Fig. 4). The expression of the POLR2D gene, which encodes the fourth largest subunit of RNA polymerase II, responsible for synthesizing mRNA in eukaryotes, showed a weak positive correlation (r = 0.28, q value = 0.02) with SARS-CoV-2 viral load (Fig. 3).

DISCUSSION
High SARS-CoV-2 viral loads during early infection have been associated with severe disease outcomes (14). We did not find an association between disease severity and viral  (20), and blood (19,(21)(22)(23)(24). Studies using animal models and cell lines have shown that SARS-CoV-2 infection is associated with low type I and type III interferon and elevated interleukin-6 (IL-6) and chemokines that resulted in reduced innate antiviral defense and increased inflammation (25). A metatranscriptomic analysis of human BAL fluid from SARS-CoV-2-infected patients shows robust overexpression of interferon-stimulated genes (ISGs) and other inflammatory genes (20). However, to our knowledge, none of the studies focused on understanding the role of early viral load on the mucosal immune response. Similar to published data, we found the antiviral protein ISG15 was upregulated upon SARS-CoV-2 infection (26), as it can activate downstream pattern recognition receptor MDA5 to establish an antiviral state. However, this pathway is directly antagonized by SARS-CoV-2 to evade host innate immunity via the overexpression of PLpro, a gene with redundant functionality of ISG-deconjugating enzyme USP18 (a negative regulator of type I and III interferons)  analyses. Differential expression analysis was conducted using DESeq2 models, including age, sex, and race as covariates. The red squares indicate significantly upregulated genes, and the blue squares indicate significantly downregulated genes in symptomatic SARS-COV-2 adults with high viral load compared with the low viral load group. Only the top 50 most significantly different genes are labeled. (C) Principal-component analysis of the normalized gene-level read counts. Each dot represents a sample, and the samples are colored based on the C T value; we partitioned the subjects into the following tertiles: lower tertile (high viral load group) shown in red, median tertile (medium viral load group) shown in green, and upper tertile (low viral load group) shown in blue color. (27), as several studies (27,28) show upregulation of USP18 upon SARS-CoV-2 infection. Here, we show USP18 was upregulated in the nose in a viral-load-dependent manner, which for the first time confirms in vitro findings in clinical samples (Fig. 4). Our findings in the human clinical samples support the hypothesis from Vere et al. (29) that targeting USP18 has strong therapeutic potential for COVID-19.
Similar to our findings, several studies have shown that the ISGs, OAS1, OAS2, OAS3, IFITMs, and OASL are robustly activated upon SARS-CoV-2 infection (30,31). ISGs are critical in countering other interferon-responsive viruses like SARS-CoV and Middle East respiratory syndrome (MERS)-CoV (32). Supporting this hypothesis, multiple genome-wide association studies have shown that mutations in ISGs are associated with susceptibility to SARS-CoV-2 and the severity of COVID-19. Similarly, we also found interferon alpha and beta signaling genes BST2, IFIT1, OASL, MX2 (known antivirals), IFI35, IFI6, IFITM1, IFIM3, IRF7, ISG15, ISG20, RSAD2, USP18, and XAF1 (Fig. 4) were upregulated in subjects with a higher viral load. These genes have been linked with COVID-19 or other viral infection restrictions. IFIT genes, particularly IFIT1, work in concert with other ISGs to inhibit viral translation by binding viral mRNAs that mimic host mRNAs due to the presence of 59 cap-1 structures (33,34), which SARS-CoV-2 contains (35). The IFIT proteins also increase the expression of CXCL10 (36), the 2nd most upregulated gene in our analysis, which induces lymphocyte chemotaxis and may FIG 3 The Spearman correlation between C T value and mucosal gene expression. The Spearman correlation coefficient was calculated for normalized gene-level expression counts and C T value (viral load). The Benjamini-Hochberg method was used for multiple comparisons and P value adjustments. The gene expression is moderately correlated with viral load. As the adjusted P , 0.05, the correlation is statistically significant.
inhibit the replication of infected viruses. Only one study has identified IFIT1 in vitro during COVID-19 infection (37).
CXCL10 and CXCL11 are two T helper type 1 (Th1) chemokines that were upregulated in our analysis, and their gene expression was correlated with viral load. Both chemokines are highly produced in bronchial and alveolar epithelial cells. They have been implicated as predictors of COVID-19 severity in vitro, in vivo in an animal model, and in clinical data (38,39). Additionally, our findings of upregulation of RSAD2, OAS1-3, IFITM1, MX2, and CD300E (20,(40)(41)(42) agree with findings published previously for SARS-CoV-2 infection. We further show that the immune modulatory gene sialic acid-binding Ig-like lectin 1 (Siglec-1/CD169) was upregulated in subjects with high viral load. Lectins, especially Siglec-1/CD169, have been known to mediate the attachment of viruses to antigen-presenting cells (APCs) (43). Recent studies show that blockage of Siglec-1 on monocyte-derived dendritic cells (MDDCs) decreased SARS-CoV-2 viral transfer or trans-infection to bystander target cells (43), and Siglec1 was associated with disease severity (44) and enhanced SARS-CoV-2 infection and influenced neutralizing antibodies (45).
Our study has many strengths, such as the inclusion of an overall young population with no underlining comorbidities (Table 1), no recent use of antibiotics or current use of intranasal medications, and samples were collected early in the infection, i.e., within the first 24 h of SARS-CoV-2 diagnosis. Also, as the patient enrollment and sample collection were done in the early pandemic (spring 2020), none of our patients were vaccinated or potentially infected prior to this infection, which could have confounded immune response and gene expression (9,12). All the SARS-CoV-2 genotypes in this study were identified as B.1 lineage (Table S2). Our metatranscriptomics method also captured the virome and showed limited coinfection with other respiratory viruses.
We should also acknowledge several limitations. First, our study was cross-sectional; thus, there is a possibility of reverse causation. Second, residual confounding is possible, as we lacked data on the participants' atopic status or other unknown morbidities that could modify the immune response. Third, though all samples were collected within 24 h after the first diagnosis of COVID-19, it is still impossible to ascertain the days since infection, as the presymptomatic period can vary considerably (3 to 15 days) between patients. Fourth, nasal viral titers drop dramatically after 10 to 15 days of infection (46); thus, prior studies in hospitalized patients have shown blood viral titer to be a better predictor of the disease severity (8). Fifth, we did not have lower respiratory tract samples. However, the upper respiratory tract (URT) is the portal of entry and an active site of replication of SARS-CoV-2, as well as a common harboring site for potential pathogens, and is thus of critical importance in the pathogenesis of this respiratory virus (47). Last, our results cannot be extended to adults with asymptomatic, severe, or critical COVID-19, as only those with symptomatic, mild-to-moderate COVID-19 were included in our study (11). Despite limitations, our study highlights that URT mucosal innate immune response correlates with SARS-CoV-2 viral load.
In summary, we determined that early SARS-CoV-2 viral load was associated with URT gene expression during COVID-19 infection and potentially modified both the innate and adaptive immune response to SARS-CoV-2. Future studies with larger sample sizes, with serial sample collection, and with patients who develop severe disease outcomes will be needed to examine how SARS-CoV-2 interacts with the mucosal gene expression and how these viralhost interactions can impact the clinical progression, severity, and recovery of COVID-19.

MATERIALS AND METHODS
Overview of the study design. For the current study, we included nonhospitalized patients aged $18 years who were diagnosed with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection (confirmed by qualitative PCR) at Vanderbilt University Medical Center or one of its affiliated centers in Nashville, Tennessee. These patients were enrolled as part of a clinical trial examining the effect of several types of nasal irrigations on upper respiratory tract (URT) symptoms and viral load during coronavirus disease 2019 (COVID-19). The detailed methods for this clinical trial have been reported previously (12). Exclusion criteria for these patients included current use of nasal saline irrigations or other intranasal medications, inability to perform nasal irrigations or to collect URT samples in a separate house bathroom or away from household contacts, or need for hospitalization related to SARS-CoV-2 infection. Thus, only patients with mild or moderate COVID-19 (based on criteria from the World Health Organization [11]) were included in the clinical trial. Eligible patients were contacted and enrolled in the study within 24 h of initial diagnosis.
Following adequate training, all participants were asked to obtain a midturbinate swab on the day of enrollment (i.e., before any study intervention for those enrolled in the clinical trial) using a self-collection kit (FLOQSwabs; Copan Diagnostics Inc.). These enrollment samples were used for the current study. The sample collection in the present study occurred between April and June 2020. Each adult provided informed consent for their participation. The Institutional Review Board of Vanderbilt University approved this study.
Severe acute respiratory syndrome coronavirus 2 testing by quantitative reverse transcription-PCR. To measure viral load in SARS-CoV-2-infected patients, we performed quantitative reverse transcription-PCR (RT-qPCR) in the midturbinate swabs. Total RNA was extracted from the swabs using a phenol-chloroform-based method. The swabs were placed in red 1.5-mL RINO screw-cap tubes (NextAdvance) prefilled with RNase-free zirconium oxide beads, and QIAzol lysis reagent (Qiagen) was added. Samples were then homogenized in a Bullet Blender 24 Gold (NextAdvance) system for 3 min at maximum speed. Following homogenization, genomic DNA was eliminated with genomic DNA (gDNA) eliminator columns (Qiagen), and RNA was purified using the RNeasy mini plus kit (Qiagen) following the manufacturer's protocol. The RNA quality was measured using a 2100 bioanalyzer (Agilent Technologies). The United States Centers for Disease Control and Prevention primers and probes designed for the detection of SARS-CoV-2 (2019-nCoV) were purchased from Integrated DNA Technologies (IDT) (48). Both the SARS-CoV-2 nucleocapsid gene region 1 (N1) and nucleocapsid gene region 2 (N2) were targeted to detect SARS-CoV-2. RNase P was also examined as a measure of RNA quality and quantity. RT-qPCR was performed using the SuperScript III one-step RT-PCR system with Platinum Taq DNA polymerase (Invitrogen) as per manufacturer's instructions on a CFX96 touch real-time PCR detection system (Bio-Rad). Plasmid controls for 2 SARS-CoV-2 nucleocapsids and RNase P were also ordered from IDT at a concentration of 66,666 copies/reaction. No-template controls and an extraction negative were used as negative controls. Reactions were prepared using 12.5 mL of SuperScript III master mix (ThermoFisher), 1 mL each of 400-nm forward and reverse primer, 1 mL of 400 nM FAM-labeled probe, 1 mL of Platinum Taq polymerase, 3 mL of template RNA, and 7.25 mL of PCR-certified water (Teknova). RNA was reverse transcribed at 50°C for 15 min, and PCR conditions were run on a 95°C denaturation step for 2 min, followed by 40 cycles of 95°C for 15 s and 55°C for 30 s. The cycle threshold (C T ) values were captured and calculated by the CFX Maestro (Bio-Rad) software and used as a measure of viral load.
RNA extraction, metatranscriptomic library preparation, and sequencing. The nasal swab samples in the self-collection kit (FLOQSwabs; Copan Diagnostics Inc.) were vortexed for 2 min, and then an aliquot of the 250-mL nasal swap sample was used for RNA extraction. The total RNA from these samples was extracted as described previously (10). In brief, a 250-mL aliquot from the nasal swab sample was homogenized in 600 mL QIAzol (Qiagen) and 500 mL of 2.0 mm zirconium oxide beads (Next Advance, Inc.; catalog ZROB05) using a Bullet Blender homogenizer (BB24-AU; Next Advance, Inc.). While the samples were homogenizing, the temperature was maintained at or near 4°C by using the dry ice cooling system in the Bullet Blender system. The homogenate was treated with 100 mL of genomic DNA eliminator solution (Qiagen) to remove the genomic DNA. Next, 180 mL of chloroform was added to the samples for phase separation. The total RNA in the aqueous phase was then purified using RNeasy mini spin columns as the Qiagen RNeasy protocol recommended. RNA integrity and quantification were assessed using a bioanalyzer RNA 6000 Nano/Pico chip (Agilent Technologies, Palo Alto, CA). Eukaryotic rRNAs were depleted using the NEBNext rRNA depletion kit (human/mouse/rat; catalog number E6310X). After rRNA depletion, the samples were checked by using the Agilent Bioanalyzer RNA 6000 Nano/Pico chip to ensure depletion of the 18S and 28S ribosomal peaks. Next, Illumina sequencing libraries were made using the NEBNext Ultra II RNA library prep kit (New England BioLabs [NEB]; number E7775). The quality of the libraries was assessed using an Agilent bioanalyzer DNA high-sensitivity chip. The libraries were then sequenced on an Illumina NovaSeq6000 platform (S4 flow cells run) with 2 Â 150-base pair reads, with a sequencing depth of ;40 million paired-end reads per sample.
(ii) Read binning. Reads were mapped to human rRNA and the human mitochondrial genome using bbmap from bbtools; the mapped reads were discarded. The remaining reads were binned into the human genome, bacterial rRNA, and a bin that contains all microbiome reads, using the seal, from bbtools, with default parameters. The human genome (GRCh38) and SILVA bacterial rRNA database were used as references. Binning resulted in an average of 98,120,791 (median, 740,58,180) reads mapping to the human genome and an average of 2,263,290 (median, 848,478) microbiome reads. The microbiome read bin contains viral, bacterial, fungal, and unclassified reads.
(iii) Taxonomic classification of reads. Reads from the microbiome bin were subjected to taxonomic classification using KrakenUniq (51) with default parameters. The reference NCBI nucleotide database was installed via the kraken2-build script.
(iv) Virome profiling. To produce a high-confidence virome profile, we developed a method that first produces de novo transcriptome assemblies, followed by putative virome identification using BLAST searches, and finally high-confidence virome profiling based on read mapping to reference virus genomes. This workflow was implemented in a bash script. First, reads that were classified as viral by KrakenUniq were extracted using the script krakenuniq-extract-reads, with taxonID 10239 (superkingdom, viruses). If more than 100,000 reads were extracted, they were first normalized to a target depth of 100 using bbnorm from bbtools. Reads were assembled using the metaSPAdes assembler. The resulting contigs were filtered for length, using reformat from bbtools, and only contigs that were at least 300 bp were retained. Nucleotide BLAST (blastn) searches were performed on the resulting contigs against the NCBI nucleotide database with -max_target_seqs and -max_hsps set to 1. From the BLAST results, a list of subjects was compiled, and their genome sequences (fasta) were extracted from the nucleotide BLAST database using the blastdbcmd from BLAST. Each genome sequence was used as a reference, and all the virome reads were mapped using bowtie2. Genome coverage and average read depth statistics were extracted from this mapping using SAMtools (52). The high-confidence virome profile was constructed using these coverage statistics.
(v) SARS-CoV-2 genomes. Full sets of reads (before binning into human, bacterial rRNA, and microbiome) were utilized to produce the SARS-CoV-2 genome sequences. They were mapped to the SARS-CoV-2 reference genome NC_045512 using bowtie2 with default parameters. The consensus sequence was called using SAMtools, and lineages were determined using the Pangolin Web application (https://pangolin.cog-uk.io/). Consensus genome sequences were submitted to Global Initiative on Sharing Avian Influenza Data (GISAID).
(vi) Host response to SARS-CoV-2 infection. The reads identified as originating from human transcripts were mapped to the human genome (hg19) using HISAT2 (53). The read counts for genomic features were quantified using HTSeq (54). The feature counts of all the samples were combined into a single matrix using a custom R script. SARS-CoV-2-positive samples were partitioned into tertiles based on C T values. Differential expression analysis was performed by comparing tertile groups using the DESeq2 package (55). Genes with a significant log 2 fold change with an adjusted P value of ,0.05 were treated as differentially expressed. The lists of differentially expressed genes for each group were analyzed for enrichment of reactome human pathways using Enrichr (56) and were deemed significant when the false discovery rate (FDR) was ,0.05.
Spearman rank correlation. The Spearman rank correlation coefficient was calculated for gene expression counts and C T value (viral load) using the cor.test() function in R. The Benjamini-Hochberg method was used for multiple comparisons and P value adjustments.
Data availability. Raw sequence reads are available in the Sequence Read Archive (SRA) BioProject identifier (ID) PRJNA922078.