The pathogenetic influence of smoking on SARS-CoV-2 infection: Integrative transcriptome and regulomics analysis of lung epithelial cells

Corona virus disease (COVID-19) has been emerged as pandemic infectious disease. The recent epidemiological data suggest that the smokers are more vulnerable to infection with COVID-19; however, the influence of smoking (SMK) on the COVID-19 infected patients and the mortality is not known yet. In this study, we aimed to discern the influence of SMK on COVID-19 infected patients utilizing the transcriptomics data of COVID-19 infected lung epithelial cells and transcriptomics data smoking matched with controls from lung epithelial cells. The bioinformatics based analysis revealed the molecular insights into the level of transcriptional changes and pathways which are important to identify the impact of smoking on COVID-19 infection and prevalence. We compared differentially expressed genes (DEGs) between COVID-19 and SMK and 59 DEGs were identified as consistently dysregulated at transcriptomics levels. The correlation network analyses were constructed for these common genes using WGCNA R package to see the relationship among these genes. Integration of DEGs with network analysis (protein–protein interaction) showed the presence of 9 hub proteins as key so called ”candidate hub proteins” overlapped between COVID-19 patients and SMK. The Gene Ontology and pathways analysis demonstrated the enrichment of inflammatory pathway such as IL-17 signaling pathway, Interleukin-6 signaling, TNF signaling pathway and MAPK1/MAPK3 signaling pathways that might be the therapeutic targets in COVID-19 for smoking persons. The identified genes, pathways, hubs genes, and their regulators might be considered for establishment of key genes and drug targets for SMK and COVID-19.


Introduction
The latest human coronavirus is SARS-CoV-2 causes respiratory illness termed COVID-19. This COVID-19 was emerged in the Wuhan city of China in 2019. Research has shown that SARS-CoV-2 viruses currently referred to as COVID-19 may cause health issues such as fever, vomiting, and fatigue in patients infected with this virus. In severe cases, viral respiratory infections may cause the death of a patient due to serious acute respiratory syndrome (SARS) [1][2][3]. The pandemic has been steadily growing since its first emersion in December 2019. There are some studies reported in the literature on the association of COVID-19 and smoking [4][5][6][7][8][9][10]. Among them, Vardavas et al. [4] concluded with their research study that, smoking most certainly contributes to COVID-19's bad course and unfavorable results. In 2021, Umnuaypornlert et al. [5] conducted a meta-analysis among 1248 studies and concluded that smoking greatly raises the chance of COVID-19 severity and mortality. A recent work conducted by Ram Poudal et al. [9] reported that smoking increases the risk of severe COVID-19, including deaths. Findings have also been reported with some evidence of the correlation of variations in incidence and severity of COVID-19 diseases with sex and of smoking with increased ACE2 expression (the recipient for extreme SARS-CoV-2), which may also be a cause of this type [11].
It has been hypothesized from the above reports that smoking might make more vulnerable to COVID- 19 1. Flowchart used in this study. Using edgeR package on RNAseq data of COVID19 and SMK, DEGs of these diseases were identified, After that, common significant DEGs between diseases were identified. Correlation analysis using WGCNA for the common genes to see the characteristics of common genes. Then, identify common significant pathways and GO analysis through the pathway and go analysis on the common significant DEGs. Then, to identify the hub proteins, PPI network was constructed around the common genes, and to identify regulatory TFs and miRNA, DEGs-TFs and DEGs-miRNA networks were also constructed. Finally, the protein drug interaction network was constructed around the hub genes. the high expression of ACE2 expression making it more prone to COVID-19 infection. In addition, a study of meta-analysis observed the significant effect between smokers and no-smokers, where they suggest an increased risk for viral binding and entry of SARS-CoV-2 in lungs epithelial cells of smokers due to significantly and substantially increased pulmonary ACE2 expression [11]. Rao et al. [12] reported some significant genes associated with smoking, alcohol and COVID-19. But, the molecular agents (differentially expressed genes) and pathways related to smoking and COVID-19 have not been identified yet. Thus, identifying the molecular associations at the level of transcriptional changes and pathways is important to identify the impact of smoking on COVID-19 infection prevalence.
Lately, transcriptional signatures of lung epithelial cells infected with COVID-19 have been identified [13]. We have taken this data and analyzed it to identify differentially expressed genes (DEGs) in infected lung epithelial and non-infected cells. Then, we identified DEGs in smokers' epithelial lungs compared to non-smokers. The weighted gene coexpression network analysis (WGCNA) is used to detect the correlated genes in a cluster [14]. It has been widely used for checking the strengthened relationship among the genes [15]. In this research, the network biology approach for the identification of commonly deregulated pathways and molecular signatures in smokers and COVID-19 was adopted. We utilized gene expression data of COVID-19-infected lung epithelial and another dataset of smoking lung epithelial cells to identify common DEGs and pathways overlapped in COVID-19 and smokers. We also constructed the correlation network analysis for these common genes using WGCNA to see the relationship among these genes.

Materials and methods
In this research, we employed a series of analysis methods (see Fig. 1). At first, DEGs of these diseases were identified using edgeR package on RNAseq data of COVID19 and SMK. After that, we identified the common significant genes between COVID19 and SMK to find out the disease-gene associations among them. Next, we applied WGCNA correlation analysis on common genes of COVID-19 and SMK datasets separately to see the characteristics pattern of common genes. Considering these common significant DEGs, we performed signaling pathways and GO analysis and then we constructed PPI network and used the topological analyses to identify hub proteins. To identify regulatory TFs and miRNA, DEGs-TFs and DEGs-miRNA networks were also constructed. Finally, using the identified hub genes obtained from PPI network analysis, a protein-drugs interaction network was constructed.

Datasets employed and statistical methods
RNAseq datasets of COVID-19 and SMK with accession numbers GSE147507 (COVID-19) and GSE47718(SMK) were downloaded from NCBI Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/ geo/) database and were analyzed them to identify the shared significant genes between COVID-19 and SMK [16,17]. The GSE147507 (COVID-19) was RNA-seq transcriptomics data from lung cells infected with SARS-CoV-2 and controls (called mock). The gene expression data was profiled from A549 and NHBE cell within 24-h of infection. Then the samples infected for 24-h were considered and matched mock (controls). While the dataset GSE47718 is RNAseq data derived from the airway epithelium of healthy nonsmokers ( = 10) and smokers ( = 7) patients. We used edgeR package on RNAseq data of COVID19 and SMK to identify the DEGs of these diseases. We considered | | >= 1, and -value < 0.05 was considered statistically significant to identify DEGs.

Analysis methods
For comparing gene expression data of COVID-19 with the SMK dataset, global transcriptomic analysis was applied using RNAseq technologies. By comparing disease to normal, these datasets were produced to identify DEGs associated with their respective pathology. COVID19 and SMK datasets are RNAseq data. We applied edgeR [18]-R Bioconductor package-in raw RNAseq data to identify the DEGs of COVID19 and SMK. After that, by comparing two datasets of DEGs, we identified common DEGs between COVID-19 and SMK. For getting more shared information at the molecular level between COVID-19 and SMK, pathway and GO analyses were performed on shared genes between them by using online visual bioinformatics tools Enrichr (http: //amp.pharm.mssm.edu/Enrichr/). We used Reactome and KEGG pathway databases for pathway analysis. To identify the significant pathways, we considered -< 0.05 as statistically significant. Then, using these common genes, we reconstructed a PPI network utilizing the STRING database [19] via Network Analyst [20]. To identify hub genes from the PPI network topological analysis was applied where degree and betweenness centrality were used.
We know that the TFs and miRNAs affect the expression of transcript levels. So, to see these affect, TF-common significant genes network from the JASPAR database [21] and miRNA-common significant genes network from TarBase [22] and miRTarBase [23] were constructed using NetworkAnalyst tools [20]. For finding significant TFs and miR-NAs, the degree and betweenness centrality (BC) filters were used and identified top 10 TFs and miRNAs. Degree centrality can be defined as the following equation: Here, indicates the number of nodes in the network and represents that node v and j are directly connected. BC is also defined as follows: = total number of shortest paths from node to node , and = total number of paths through node .

Weighted gene co-expression networks construction
To see the clustering nature of the detected common gene between COVID-19 and Smoking datasets, we used weighted gene co-expression network analysis (WGCNA) package [14] of to find the weighted gene co-expression network among these genes. For this purpose, firstly we remove the outlier samples (if there exist) by constructing the sample cluster dendrogram by hclust function for both of the datasets. Then using this in this study, we used the pickSoftThreshold function for finding numerous soft thresholding powers over 2 and picking the value of for which the 2 value is higher. Then we construct the adjacency matrix and Topological Overlap Matrix (TOM) by using the transformed gene expression matrix. The dissimilarity of TOM (dissTOM) was also conducted to construct a network heatmap plot and for further analysis.

Identification of differentially expressed genes
To identify the influence of smoking and COVID-19, we analyzed RNA seq datasets of COVID-19 and SMK obtained from the NCBI-GEO database. We detected there were 739 DEGs (353 up-regulated and 386 down-regulated) in COVID-19 and 3866 DEGs (1916 up-regulated and 1950 down-regulated) in SMK. Then, we performed the crosscomparative analysis to identify common DEGs between COVID-19 and SMK, identified 33 upregulated DEGS common in COVID-19 and SMK, and 26 downregulated DEGs common in COVID-19 and SMK. These common DEGs indicated that they are comorbid.

Functional enrichment of differentially expressed genes common to COVID-19 and SMK
Because of the underlying molecular or biological mechanisms, different diseases are related to each other that can be understood by pathway-based analysis [24]. For these reasons, we find out the common pathways using the common significant genes between COVID-19 and SMK from Enrichr, where we used KEGG and Reactome data as preferred data annotation for enrichment analysis. Significant common pathways are summarized in Table 1. To get further insight into the biological significance of the DEGs, we performed gene ontology analysis (Biological process, Molecular Function and Cellular component) on commonly significant genes for getting the more biological significance of these genes. Identified gene ontologies are summarized in Table 2. Table 1 Molecular pathways enriched by the common differentially expressed genes shared by COVID-19 and SMK diseases. These include significant pathways common to COVID-19 and SMK.

Significant genes identification using WGCNA analysis
In Fig. 6, the gene co-expression network analysis has been executed with 59 DEGs for the COVID-19 dataset including 23 disease samples as well as the Smoking dataset including 7 disease samples. Fig. 6A to Fig. 6C for COVID-19 and Fig. 6D to Fig. 6F for SMK. Fig. 6A visualized the cluster dendrogram of the sample and no outlier samples were detected. To identify the modules through WGCNA, we found the optimized soft thresholding powers = 6 as the scale-free topology criteria (Fig. 6B). With this power value, we constructed the co-expression network and detect 2 modules through the Dynamic Tree Cut technique using deepSplit = 2 and minClusterSize = 20 parameters. We found 12 and 47 genes for gray and turquoise modules respectively for COVID-19. The network heatmap of all genes with these 2 modules has been shown in Fig. 6C. Fig. 6D visualized the cluster dendrogram of the sample and no outlier samples were detected. To identify the modules through WGCNA, we found the optimized soft thresholding powers = 16 as the scale-free topology criteria (Fig. 6E). With this power value, we constructed the co-expression network and detect 2 modules through the Dynamic Tree Cut technique using deepSplit = 2 and minClusterSize = 20 parameters. We found all 59 genes are in the gray module for SMK. The network heatmap of all genes with this module is shown in Fig. 6F.

Discussion
COVID-19 has been emerged as one of the worst pandemics and impacting health worldwide. SMK are more susceptible to COVID-19 infection and higher fatality was reported. In this research, we identified the molecular alterations at the level of transcriptome dynamics that occur in COVID-19-infected lung epithelial and shared transcriptional elements detected in the lung epithelial of SMK. When compared, discovered identified 33 upregulated DEGS common in COVID-19 and SMK and 26 downregulated DEGs common in COVID-19 and SMK. The WGCNA analysis identifies that the common genes showed the exact nature in both COVID-19 and Smoking datasets. The functional annotation of these common transcriptional signatures revealed several molecular pathways enriched by the DEGs. Some of the prominent pathways namely, the Hippo signaling pathway, Amyotrophic lateral sclerosis, Rap1 signaling pathway, Hypertrophic cardiomyopathy, IL-17 signaling pathway, Hematopoietic cell lineage, Salmonella infection, Amoebiasis, TNF signaling, MAPK1/MAPK3 signaling, and Interleukin-6 signaling pathways. It may be possible to consider the drugs that interfere in these pathways to repurpose drugs for COVID-19 infection. Rap1 signaling and MAPK signaling pathways have significant associations with COVID-19 [25]. Another Pathways Hippo signaling has a strong association with COVID-19 infection [26]. One significant pathway common between COVID-19 and SMK is the Interleukin-6 signaling pathway which has a significant impact on the COVID-19 patients as Table 3 Descriptions of drugs.
well as the patients of SMK [27,28]. From some studies, it was reported that the L-17 signaling pathway has significant rules with the patients of COVID-19 and SMK [29][30][31]. Exposing to tobacco smoke contributes to the inflammatory lung phase, increased activation of the mucosa, the release of inflammatory cytokines, tumour necrosis factor , epithelial permeability, excess mucous and compromised clearing [32]. However, a large number of research has to date shown that the most significant cause of COVID-19 patient death is a moderate or extreme cytokine (excessive immune response) floods. Interleukin-6 (IL-6) has a great role in cytokine excess release. As the signal transduction pathway of IL-6 can be disrupted, it will potentially evolve into a new method of treating serious COVID-19 patients [33,34]. Nasir et al. described the research results from cellular and experimental investigations defining the function of IL-6 as a therapeutic target in COVID-19 [35]. Protein-protein interaction network topological analysis provides critical proteins and signaling molecules, and drug targets. The PPI analysis was performed to determine the hub proteins -ITGB3, GLI2, GRIN1, DLG2, SH3GL3, SRC, AKT1, RAC1 and CBL. These hub proteins might be considered as drug targets. Tin Wang et al. [36] reported RAC1 protein as a therapeutic target in Acute Lung Injury induced by severe pneumonia of COVID-19. Another hub protein CBL was reported as a potential drug target for COVID-19 [37]. The hub gene AKT1 is associated with SMK and COVID-19 [36]. In the replication of viruses, particularly those connected to SARS-CoV-2, several SRC family kinases have been found to be active [38]. Another paper suggested that SRC protein is a potential targeted drug for COVID-19 [39]. The hub protein ITGB3 was declared as a potential therapeutic target for COVID-19related stroke [40]. GLI2 variants play a role in the pathogenesis of nonsyndromic cleft lip with or without cleft palate [41] and hazra et al. [42] found possible interaction of GLI2 with COVID-19. GRIN1 involved in neurodevelopmental disease [43]. DLG2 is involved in the developmental and intellectual disability [44].
Lack of lab facilities, the qRT-PCR analysis was not performed for the further validation of our identified significant hub proteins which is the limitation of our work. To provide insights into the transcriptional regulations at the level of transcriptional and post-transcriptional levels, we built a DEGs-TFs and DEGs-miRNAs interaction network. We need to test our findings in the laboratory. Otherwise, covid-19 research is ongoing and still, no research finding is concrete or absolute.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.