Identification of key differentially expressed genes in SARS-CoV-2 using RNA-seq analysis with a systems biology approach

COVID-19 is associated with dysregulation of several genes and signaling pathways. Based on the importance of expression profiling in identification of the pathogenesis of COVID-19 and proposing novel therapies for this disorder, we have employed an in silico approach to find differentially expressed genes between COVID-19 patients and healthy controls and their relevance with cellular functions and signaling pathways. We obtained 630 DEmRNAs, including 486 down-regulated DEGs (such as CCL3 and RSAD2) and 144 up-regulated DEGs (such as RHO and IQCA1L), and 15 DElncRNAs, including 9 down-regulated DElncRNAs (such as PELATON and LINC01506) and 6 up-regulated DElncRNAs (such as AJUBA-DT and FALEC). The PPI network of DEGs showed the presence of a number immune-related genes such as those coding for HLA molecules and interferon regulatory factors. Taken together, these results highlight the importance of immune-related genes and pathways in the pathogenesis of COVID-19 and suggest novel targets for treatment of this disorder.


Introduction
COVID-19 caused by SARS-CoV-2 has produced a pandemic with heavy burden on health system all over of the world. Several investigations have been conducted to unravel the signaling pathways and gene expression patterns altered by this virus [1,2]. These studies would help in identification of the molecular mechanisms responsible for high mortality of this disorder and finding suitable markers for prediction of course of COVID-19. For instance, whole transcriptome analyses have revealed up-regulation of neutrophil-related transcripts and downregulation of T cell related transcripts in SARS-CoV2-affected persons, particularly severely affected ones [3]. Moreover, single-cell and bulk RNA sequencing assays in these patients have revealed inflammatory mechanisms for COVID-19-associated organ damage [4]. In addition, a comprehensive assessment of transcriptome of organs affected by this virus has indicated the role of a group of myeloid-lineage cells with extremely inflammatory but distinctive expression profile in each part in the pathogenesis of COVID-19-related complications [5]. Based on this data, transcriptomic profiling has been suggested as a tool for identification of the pathogenesis of COVID-19 in each affected individual [5].
Based on the importance of expression profiling investigations in identification of the pathogenesis of COVID-19 and proposing novel therapies for this disorder, we have employed an in silico approach to discover differentially expressed genes between COVID-19 patients and healthy controls and their relevance with cellular functions and signaling pathways.

Collection of RNA-seq data and evaluation of dataset quality
In order to acquire the RNA-seq raw counts of GSE184610 (Illumina NovaSeq 6000 (Homo sapiens); GPL24676), we utilized the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/). A total of 138 negative control samples and 204 SARS-CoV-2 positive patient mucosa samples were chosen for further analysis. In Rstudio software (version 4.0.4), we imported the dataset and assessed its quality. This step involves investigating raw counts and normalized counts boxplots and investigating dispersion estimate plot.

The dataset preprocessing and DEGs analysis
We analyzed raw counts data using DESeq2 package [6] and got DEGs using lfcShrink function via normal method. In addition, Bonferroni was used to adjust the P value into the FDR. We used the FDR <0.05 and |log2 FC| > 1 as the cutoff criteria for DEmRNAs and DElncRNAs.

Gene ontology (GO) enrichment analyses
We used the clusterProfiler R package [7] to accomplish Gene Ontology (GO) enrichment analyses to explore the function of the identified up-regulated and down-regulated DEmRNAs. The functional category criteria were established at an adjusted p-value of 0.05 or below.

Kyoto encyclopedia of genes and genomes (KEGG) pathway analysis
To determine the probable roles of these genes, KEGG pathway analysis [8] of significantly up-regulated and down-regulated DEGs was performed.

PPI network construction and hub genes identification
The PPI network for DEGs was built using the STRING database [9]. The interactions parameter was established using the highest level of confidence (confidence score >0.9). The interactions between the proteins were shown using Cytoscape software version 3.9 [10]. The Cytohubba plugin of Cytoscape [11] and the degree method were finally used to identify the top 20 DEGs associated with hub genes.

Receiver operating characteristic (ROC) curve analysis
The area under the curve (AUC) values derived from receiver operating characteristic (ROC) curve analysis were used to assess the diagnostic efficacy of hub genes.

Regulatory network of miRNA-hub genes and TF-hub genes
Finally, the Networkanalyst database [12] was used to create the relationships between the hub genes and the transcription factors (TFs) and microRNAs (miRNAs).

Dataset quality assessment
To normalize the raw counts, we utilized the DESeq2 package and the normalized counts transformation method. The boxplot of the normalized data for each sample is shown in Fig. 1. This boxplot shows that the data quality was acceptable. The dispersion estimate plot ( Fig. 2) displays the gene-wise estimates, the fitted values, and the final maximum a posteriori estimates used in testing. This graphic demonstrates that the DESeq2 model fits the data well. Heatmap of the count Table is shown in Fig. 3. In this figure, we used variance stabilizing transformation (VST) method to show the heatmap of top 30 genes.

DEGs analysis
Based on the RNA-seq data analysis between SARS-CoV-2 positive and control samples by DESeq2 package, we obtained 630 DEmRNAs, including 486 down-regulated DEGs (such as CCL3 and RSAD2) and 144 up-regulated DEGs (such as RHO and IQCA1L), and 15 DElncRNAs, including 9 down-regulated DElncRNAs (such as PELATON and LINC01506) and 6 up-regulated DElncRNAs (such as AJUBA-DT and FALEC). Table 1 lists the top 10 down-regulated and up-regulated DEmRNAs. Table 2 lists down-regulated and up-regulated DElncRNAs.

Table 2
The significantly up-and down-regulated DElncRNAs between SARS-CoV-2 positive patient mucosa and negative control samples.  The variations of lncRNA and mRNA expression between SARS-CoV-2 positive patient mucosa and negative control samples were visualized and assessed using volcano plot (Fig. 4).

GO enrichment analysis of DEGs
In this study, the considerably DEGs were enriched in 774 GO terms. We used Clusterprofiler package to perform analysis. In GO functional enrichment analysis, 774 GO entries satisfy Adjusted P value less than 0.05, most of which are biological processes (BP), followed by cellular component (CC) and molecular function (MF). The first 30 entries are response to virus (BP), cytokine-mediated signaling pathway (BP), positive regulation of response to external stimulus (BP), secretory granule membrane (CC), response to lipopolysaccharide (BP), response to molecule of bacterial origin (BP), defense response to virus (BP), defense response to symbiont (BP), cellular response to biotic stimulus (BP), regulation of response to biotic stimulus (BP), cellular response to lipopolysaccharide (BP), cellular response to molecule of bacterial origin (BP), positive regulation of cytokine production (BP), leukocyte chemotaxis (BP), leukocyte migration (BP), myeloid leukocyte migration (BP), negative regulation of viral genome replication (BP), regulation of innate immune response (BP), immune response-regulating signaling pathway (BP), granulocyte migration (BP), pattern recognition receptor signaling pathway (BP), cell chemotaxis (BP), positive regulation of defense response (BP), neutrophil migration (BP), leukocyte mediated immunity (BP), regulation of viral genome replication (BP), negative regulation of immune system process (BP), myeloid leukocyte activation (BP), viral genome replication (BP) and neutrophil chemotaxis (MF). As a result, the majority of the terms are related to the immune system and the body's viral defense system. In this way, the outcomes support the GO enrichment analysis. In Fig. 8, the UpSet plot visualizes the intersection between top 10 GO terms. It highlights the gene overlap between several gene sets. Fig. 9 indicates the gene-concept network of top 5 GO terms (cytokine-mediated signaling pathway, positive regulation of response to external stimuli, response to polysaccharides, response to molecules of bacterial origin and response to virus).

Pathway analysis
Using Pathview [13] and gage [14] packages in R, KEGG pathways analysis [15,16] of 486 down-regulated and 144 up-regulated DEmR-NAs were performed to identify the potential functional genes (Fig. 10). . X axis shows the count of geneset; Y axis shows the geneset function; Bar color represents the adjusted P.value, ranging from red (most significant) to blue (least significant). Table 3 and Fig. 10. Pathways were related to immune system functions in a manner similar to what we saw in the gene ontology terms.

Validation of hub genes
We used graphpad prism 9.0 to construct ROC curves. The accuracy of the hub genes predictions was assessed using the ROC curve. The diagnostic values of these hub genes were compared using AUC. ROC curves and AUC values of dataset are shown in Fig. 12. The computed AUC values in this study varied from 0.7 to 1, which is considered to have high discriminative power. According to the ROC analysis, the expression levels of 9 hub genes showed high diagnostic value.

Identification of the regulatory network of miRNA-hub genes
The Networkanalyst online database was used to collect miRNAs that target hub genes (Fig. 13). Hsa-miR-146a-5p and has-miR-29b-3p were regarded as significant miRNAs since they interacted with hub genes at the greatest level (degree 3) possible (Fig. 14).

Examination of the regulatory network of TF-hub genes
We obtained TFs targeting hub genes from the Networkanalyst database (Fig. 13). The TF-hub gene network showed that the SPI1 regulates six hub genes. Fig. 6. The dotplots of top 10 enriched functions. X axis shows the count of geneset; Y axis shows the geneset function; Dot color represents the adjusted P value ranging from dark blue (most significant) to red (least significant). Dot size represents the GeneRatio and the larger the size of the dot, the higher the value of the gene ratio.   Fig. 9. Top 5 GO terms as a network plot. These GO terms were connected to genes in this graph. The connection of genes to the corresponding GO is marked with a special color. There are more genes for a specific GO term if the dot relating to it is bigger.

Discussion
COVID-19 has been shown to trigger strong alterations in the blood transcriptome. Most notably, recent studies have demonstrated associations between the severity of COVID-19 and remarkable alterations in the neutrophil-and T cell-related transcripts [3]. Expression of a variety of other transcripts has also been found to be altered in COVID-19 patients [1,2]. In the current investigation, we developed an in silico approach to find the most important altered genes and signaling pathways in COVID-19. Our analysis included both mRNAs and lncRNAs. CCL3, RSAD2, PLAU, CCL2, CCL4, IFIT3, FCER1G, IFITM1, C3AR1 and TNFAIP6 have been among the most significantly down-regulated mRNAs in COVID-19 patients. CCL3, CCL2 and CCL4 are among chemokines whose expressions are induced by IL-1 and TNF-α and are associated with viral infections [17]. C3AR1 is a receptor for C3a, an     IFI35  IFI6  IFIT1  IFIT2  IFIT3  IFIT5  IFITM1  IFITM2  IFITM3  IRF7  ISG15  ISG20  MX1  MX2  RSAD2  STAT2  anaphylatoxin produced during induction of the complement system. Our analyses also revealed down-regulation of a number of lncRNAs such as NEAT1 and CYTOR and up-regulation of AJUBA-DT, FALEC, MZF1-AS1, RNF139-DT, LMO7DN and SNHG31 lncRNAs. NEAT1 has been found to modulate innate immune responses during viral infections [18]. CYTOR is a tumor-associated lncRNA that can modulate inflammatory responses via epigenetic modifications [19]. GOE analysis of differentially expressed genes showed response to virus, cytokine-mediated signaling pathway, positive regulation of response to external stimulus and secretory granule membrane as the most important biological pathways enriched by these genes. Moreover, gene-concept network showed the importance of cytokine-mediated signaling pathway, positive regulation of response to external stimuli, response to polysaccharides, response to molecules of bacterial origin and response to virus in this condition.
Finally, the PPI network of DEGs showed the presence of a number immune-related genes such as those coding for HLA molecules and interferon regulatory factors among hub genes with the highest degree of connectivity.
Taken together, the in silico approach used in the current study highlights the importance of immune-related genes and pathways in the pathogenesis of COVID-19 and suggests novel targets for treatment of this disorder.

Declarations
Ethics approval and consent to participant: Not applicable.

Consent of publication:
Not applicable.

Availability of data and materials:
The analyzed data sets generated during the study are available from the corresponding author on reasonable request.
Funding: Not applicable.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Data will be made available on request.