Bioinformatics analysis of potential therapeutic targets for COVID-19 infection in patients with carotid atherosclerosis

Background COVID-19, the disease caused by a novel coronavirus, constitutes a great challenge to human health. Cases of COVID-19 infection are still occurring in some countries and regions. Ischemic stroke (IS) is a risk factor for COVID-19, and patients with COVID-19 infection have a markedly elevated risk of stroke. At the same time, patients with long-term IS are vulnerable to COVID-19 infection and have more severe disease, and carotid atherosclerosis is an early lesion in IS. Methods This study used a human induced pluripotent stem cell (hiPSC)-derived monolayer brain cell dataset and a human carotid atherosclerosis genome-wide dataset to analyze COVID-19 infection and carotid atherosclerosis, in order to determine the synergistic effect of SARS-CoV-2 infection on patients with carotid atherosclerosis, to clarify the genes common to both conditions, and to identify common pathways and potential drugs for carotid atherosclerosis in patients with COVID-19 infection. Results Using several bioinformatics tools, we present possible causes of the increased mortality associated with COVID-19 infection in carotid atherosclerosis patients and of the susceptibility of carotid atherosclerosis patients to COVID-19. Potential therapeutic agents for COVID-19-infected patients with carotid atherosclerosis are also proposed. Conclusions Because COVID-19 is a relatively new disease, associations have been proposed between it and several ailments and conditions, including IS and carotid atherosclerosis. More patient-based datasets and studies are needed to fully explore and understand these relationships.


Introduction
In late 2019, COVID-19 began to spread widely around the world, and the World Health Organization subsequently declared it a serious epidemic of the 21st century [1]. As of July 20, 2021, according to the WHO (https://covid19.who.int/), a total of 190,671,330 positive cases of COVID-19 had been confirmed, including 4,098,758 deaths. The five countries with the highest number of confirmed cases were the United States, India, Brazil, Russia, and France. The overall COVID-19 epidemic is now stabilizing worldwide, after almost all countries and regions experienced difficult periods of outbreaks. At this phase, there are still regions with localized outbreaks of severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2), the virus that causes COVID-19.
SARS-CoV-2 is a member of the coronavirus family [2]. Patients with SARS-CoV-2 infection have predominantly respiratory symptoms such as fever, dry cough and dyspnea [3]. However, manifestations of gastrointestinal, cardiovascular, and neurological involvement have been increasingly recognized and widely reported. A growing body of evidence suggests that SARS-CoV-2 infection can cause neurological symptoms and complications. Several medical centers in various countries, including China, have identified a large proportion of patients with SARS-CoV-2 infection who also have ischemic stroke. In addition, patients with ischemic stroke are more susceptible to SARS-CoV-2 and have a higher mortality rate [4][5][6]. It is therefore of great importance to understand the mechanisms linking COVID-19 and ischemic stroke.
Stroke is the leading cause of death and disability worldwide in the 50-74 and 75-and-older age groups [7]. Epidemiological data show that ischemic stroke accounts for 84.4% of all strokes [8], and patients with carotid atherosclerotic disease have an elevated risk of embolic stroke and other major adverse cardiovascular events [9,10]. Current studies show that SARS-CoV-2 binds to the angiotensin-converting enzyme 2 (ACE2) receptor to infect the human nervous system [11]. Cells that express ACE2 may therefore be susceptible targets for SARS-CoV-2 infection, and SARS-CoV-2 contains S proteins that interact strongly with ACE2 [12]. Glial cells and neurons also express ACE2 receptors, which makes them potential targets for SARS-CoV-2 [13][14][15]. This suggests that SARS-CoV-2 may exhibit neurotropism [15][16][17]. Meanwhile, ACE2 receptors located in vascular endothelial cells allow viral attachment and penetration into cells [16]. Local pro-inflammatory mediators (cytokines and chemokines), coagulation cascade factors, growth factors and nitric oxide reduce barrier integrity [18]. In addition, it has been demonstrated that tumor necrosis factor α (TNF-α) and interleukin 1β (IL-1β) released from endothelial cells during viral infection can activate endothelial cells through the NF-κB pathway, ultimately inducing the expression of new adhesion molecules associated with the inflammatory response [19]. Viral infection thus triggers a series of cascade reactions leading to thromboembolic complications, and these findings have sparked interest in and research into the potential relationship between ischemic stroke and COVID-19.
In biomedical research, high-throughput methods have assumed a pivotal role, and microarray data profiling is at the forefront of high-throughput methods for large-scale analysis of gene expression [20]. Microarray studies also assist investigators in gene expression studies [21]. Relevant studies have demonstrated the strong performance of high-throughput sequencing analysis of SARS-CoV in terms of data quality and gene representation [22]. To date, however, there has been no GeneChip data analysis covering both SARS-CoV-2 and carotid atherosclerosis.
This study attempted to identify the biological pathways of SARS-CoV-2 and carotid atherosclerosis and their interrelationships (Fig. 1). Two datasets were selected for analysis: GSE157852 for human SARS-CoV-2 infection and GSE43292 for carotid atherosclerosis gene expression. Both datasets were collected from the Gene Expression Omnibus (GEO) database. First, we identified differentially expressed genes (DEGs) for GSE157852 and GSE43292; the DEGs common to both datasets served as the raw data and the basis of the whole study. Further analysis based on the common DEGs, including gene set enrichment analysis and pathway analysis, was performed to understand the biological processes underlying the expression changes. Subsequently, protein interaction networks were built to identify hub genes from the DEGs and to search for potential therapeutic agents.

Data collection
The GSE157852 dataset investigates the susceptibility of human induced pluripotent stem cell (hiPSC)-derived monolayer brain cells and region-specific brain organoids to SARS-CoV-2 infection, and the GSE43292 dataset investigates gene expression in carotid atherosclerosis. Both were obtained from the GEO database. RNA sequencing for the GSE157852 dataset was performed on the GPL21697 (NextSeq 550) platform, and the GSE43292 dataset was generated on the GPL6244 (Affymetrix Human Gene 1.0 ST Array) platform. The GSE157852 dataset was provided by Jacob et al. [23], and the carotid atherosclerosis dataset (GSE43292) was provided by Ayari et al. [24]. The COVID-19 dataset (GSE157852) provides samples of SARS-CoV-2-infected hiPSC-derived cells. The carotid atherosclerosis dataset (GSE43292) contains 64 samples.
Fig. 1. Methodological workflow of the present research. Two categories of samples (carotid atherosclerosis samples, and SARS-CoV-2-infected hiPSC-derived brain cells and organoids) were taken from the GSE43292 and GSE157852 datasets, respectively. Common DEGs of the two datasets were identified using the R programming language. From the common DEGs, GO identification, KEGG pathway analysis, PPI network construction, TF and miRNA analysis, hub gene identification and module analysis were carried out, and drug molecule identification was performed on the basis of these analyses.

Identification of differentially expressed genes (DEGs) and common genes in COVID-19 and carotid atherosclerosis
Identifying the DEGs of the GSE157852 and GSE43292 datasets was the principal task of this study. Data were analyzed on a Linux system using RStudio. The DESeq2 library was used to process the GSE157852 dataset. For the GSE43292 dataset, samples were divided into two categories, macroscopically intact tissue and atheroma plaque, and the limma library was used to process and analyze the data. A web tool (http://bioinfogp.cnb.csic.es/tools/venny/index.html) was used to identify the genes common to the DEGs of the GSE157852 and GSE43292 datasets.
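The common-gene step described above is, in essence, a set intersection of two DEG lists. The following is a minimal Python sketch of that idea, not the authors' pipeline (which used R and the Venny web tool). It assumes the two DEG lists have already been exported as gene symbols; the function names and the placeholder symbols `G1` to `G4` are hypothetical.

```python
def _normalise(symbols):
    """Strip whitespace and upper-case gene symbols so both lists compare cleanly."""
    return {s.strip().upper() for s in symbols if s and s.strip()}


def common_degs(degs_a, degs_b):
    """Return the gene symbols present in both DEG lists, sorted alphabetically."""
    return sorted(_normalise(degs_a) & _normalise(degs_b))


def venn_summary(degs_a, degs_b):
    """Count the three regions of a two-set Venn diagram."""
    a, b = _normalise(degs_a), _normalise(degs_b)
    return {"only_a": len(a - b), "only_b": len(b - a), "common": len(a & b)}


if __name__ == "__main__":
    # Hypothetical placeholder symbols, not genes from either dataset.
    covid_degs = ["G1", "G2", "G3"]
    athero_degs = ["g3 ", "G4", "G1"]
    print(common_degs(covid_degs, athero_degs))
    print(venn_summary(covid_degs, athero_degs))
```

Normalising case and whitespace before intersecting is a design choice made here because symbol lists exported from different tools often differ in formatting.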

Gene ontology and pathway identification in genome enrichment analysis
Gene set enrichment analysis is the functional analysis of gene sets with general biological functions and chromosomal locations [25]. The results fall into three main categories: biological process, molecular function and cellular component [26]. The aim of enrichment analysis is to better understand the molecular activity of a gene, its cellular role, and the location in the cell where it performs its function. The KEGG pathway database is usually applied to understand metabolic pathways and offers applications beyond gene annotation [27]. For pathway analysis, the WikiPathways [28] and Reactome [29] databases were also used alongside the KEGG pathway database. GO terms and all pathways were obtained through the web-based platform Enrichr (https://maayanlab.cloud/Enrichr/) for the common genes identified in the previous step.
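Over-representation tools of this kind typically score each GO term or pathway by asking how likely the observed overlap between the query genes and the term's genes would be by chance. The sketch below assumes a one-sided hypergeometric test (equivalent to a one-sided Fisher exact test); it illustrates the principle only and does not reproduce Enrichr's internal scoring. The function name and all set sizes in the usage lines are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(overlap, query_size, term_size, background_size):
    """Return P(X >= overlap) for X ~ Hypergeometric.

    background_size: number of genes in the background (assumed value)
    term_size:       number of background genes annotated to the term
    query_size:      number of genes in the query list (e.g. common DEGs)
    overlap:         number of query genes annotated to the term
    """
    if not 0 <= overlap <= min(query_size, term_size):
        raise ValueError("overlap must lie between 0 and min(query_size, term_size)")
    upper = min(query_size, term_size)
    total = comb(background_size, query_size)
    # Sum the probabilities of seeing `overlap` or more annotated genes.
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 9 query genes, a 40-gene term, 20000 background genes.
    print(hypergeom_enrichment_p(overlap=2, query_size=9, term_size=40, background_size=20000))
```

With a query of only a handful of genes, even an overlap of one or two genes can give a small P-value, which is one reason such results are usually read alongside the adjusted P-values.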

Analysis of PPI networks
PPI activity is considered a primary target of cell biology research and functions as a prerequisite for systems biology [30]. Proteins operate intracellularly through interactions with other proteins, and information generated from PPI networks has improved insight into protein function [31]. The common differential genes were imported into STRING 11.0 for protein interaction network analysis, the resulting data were imported into Cytoscape, and the top 10 hub genes were identified using the MCC algorithm of the cytoHubba plugin. The network was also analyzed with the MCODE plugin to obtain the modules MCODE1 and MCODE2.

TF-gene interactions
The interactions between TFs and the identified common DEGs were evaluated to assess the effect of TFs on functional pathways and gene expression levels [32]. TF-gene interactions with the common genes were identified via the NetworkAnalyst platform (https://www.networkanalyst.ca/), a web-based platform for performing gene comparisons, quantification, and differential gene expression analysis for numerous species [33]. The TF-gene interaction network was built from the ENCODE (https://www.encodeproject.org/) database included in the NetworkAnalyst platform.

TF-miRNA coregulatory network
The TF-miRNA coregulatory network was obtained by analyzing the common differential genes on the NetworkAnalyst platform (https://www.networkanalyst.ca/). Identification of candidate drugs is a key component of this research; to this end, the common differential genes were screened and analyzed on the Enrichr platform (https://maayanlab.cloud/Enrichr/).

Statistical analysis
The majority of statistical analyses were performed using the bioinformatics tools mentioned above, and the rest were performed in R (V4.11) with packages such as DESeq2, clusterProfiler, ReactomePA, limma and biomaRt, using default parameters. Differentially expressed genes were identified with the DESeq2 package, applying the Benjamini-Hochberg FDR method to adjust the P-value. Results were considered statistically significant when the P-value was < 0.05.
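For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the procedure can be sketched in a few lines of Python. This is an illustrative re-implementation of the standard step-up procedure, not the code used in the study (which relied on DESeq2 in R); the function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order.

    Each P-value is multiplied by m / rank (rank 1 = smallest P-value), and a
    running minimum taken from the largest rank downwards keeps the adjusted
    values monotone and capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw P-values
    for p, q in zip(raw, benjamini_hochberg(raw)):
        print(f"raw={p:.3f} adjusted={q:.3f}")
```

A gene is then called significant when its adjusted value falls below the chosen cutoff, here 0.05.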

Identification of DEGs and common genes between COVID-19 and carotid atherosclerosis
The GSE157852 dataset was used to identify DEGs in COVID-19, taking padj < 0.05 with log2FoldChange > 1 as highly expressed genes and padj < 0.05 with log2FoldChange < −1 as low expressed genes. A total of 983 differentially expressed genes were obtained, of which 586 were up-regulated and 397 were down-regulated. Using the GSE43292 carotid atherosclerosis dataset, a total of 92 differentially expressed genes were identified, of which 53 genes were upregulated and 49 genes were downregulated. The 983 COVID-19 genes and 92 carotid atherosclerosis genes were compared using the R language, and a total of 9 common genes were identified (TM4SF18, TDO2, MMP7, MYOM1, CNN1, TTLL7, PLD5, DPP4, NPR3). The common DEGs between the two datasets are compared visually in the Venn diagram in Fig. 2. The Venn diagram shows that the common DEGs accounted for 0.8% of the total of 1075 DEGs.
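The thresholds and the Venn percentage quoted above can be checked with a short Python sketch. It assumes each gene comes with a log2 fold change and an adjusted P-value, as in a DESeq2 results table; the function names are hypothetical, and the only figures taken from the text are the cutoffs and the counts 9, 983 and 92.

```python
def classify_deg(log2_fold_change, padj, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Label a gene 'up', 'down' or 'not significant' using the stated cutoffs."""
    if padj is None or padj >= padj_cutoff:
        return "not significant"  # missing or non-significant adjusted P-value
    if log2_fold_change > lfc_cutoff:
        return "up"
    if log2_fold_change < -lfc_cutoff:
        return "down"
    return "not significant"


def common_share_percent(n_common, n_a, n_b):
    """Share of common genes relative to the summed sizes of the two DEG lists."""
    return 100.0 * n_common / (n_a + n_b)


if __name__ == "__main__":
    print(classify_deg(2.3, 0.001))   # passes both cutoffs in the 'up' direction
    print(classify_deg(-1.7, 0.02))   # passes both cutoffs in the 'down' direction
    print(classify_deg(3.0, 0.20))    # large change but padj above the cutoff
    # 9 common genes out of 983 + 92 = 1075 listed DEGs.
    print(round(common_share_percent(9, 983, 92), 1))
```

The denominator used here is the sum of the two list sizes, which is what the figure of 1075 in the text corresponds to.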

GO and pathway findings in gene set enrichment analysis
The Enrichr web tool was employed for gene set enrichment analysis. In this research, we analyzed the GO terms and KEGG pathways of the 9 common DEGs (TM4SF18, TDO2, MMP7, MYOM1, CNN1, TTLL7, PLD5, DPP4, NPR3). GO terms cover biological process, molecular function and cellular component, and we list the top 10 GO terms for each category. As the data in Table 1 show, positive regulation of nitric-oxide synthase activity ranks at the top of the biological processes, and the MYOM1 gene is involved in striated muscle myosin thick filament assembly, myosin filament assembly, and protein kinase A signaling. The cellular component terms, which mainly involve the DPP4 and MYOM1 genes, include endocytic vesicle and focal adhesion. The molecular function terms indicate that peptidase activity acting on L-amino acid peptides, aminopeptidase activity and alpha-actinin binding have important roles in the disease. KEGG pathways, WikiPathways, and Reactome pathways were analyzed as shown in Fig. 3. According to the KEGG pathway database, the protein digestion and absorption pathway and the tryptophan metabolism pathway interact with the largest number of genes (Fig. 3).

Identification and modular analysis of hub genes
The common DEGs were provided as input to STRING, and the file generated from the analysis was imported into Cytoscape for visual representation. The PPI network was created for further analysis, including hub gene detection, in order to identify drug molecules for COVID-19 and carotid atherosclerosis (Fig. 4). The drug compounds suggested later are derived from this network, which makes the PPI analysis the centerpiece of this study. High-density modules were extracted from the PPI network using MCODE, another Cytoscape plug-in. Two modules were obtained: MCODE1, with 37 nodes and 147 edges, and MCODE2, with 17 nodes and 74 edges.

Identification of hub genes and module analysis for proposing therapeutic solutions
To identify the hub genes in the PPI network, cytoHubba, a plugin for Cytoscape, was used. Hub genes were ranked according to a score that reflects the number of gene interactions in the PPI network. Table 2 shows the 10 hub genes identified; TIMP1 is one of the three genes highlighted in the modular network, as these three genes are also common between the two datasets.
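Ranking genes by their number of interactions can be illustrated with a simple degree count over an edge list. The Python sketch below is an assumption-laden simplification: it ranks by plain node degree, whereas the MCC algorithm in cytoHubba scores nodes from the maximal cliques they belong to, so the two rankings need not agree. The function name and the toy edges are hypothetical and do not come from the STRING network.

```python
from collections import defaultdict


def rank_by_degree(edges, top_n=10):
    """Return the top_n nodes of an undirected network ranked by degree.

    Duplicate edges and self-loops are ignored; ties are broken alphabetically
    so the output is deterministic.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # skip self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    ranked = sorted(neighbours.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(node, len(nbrs)) for node, nbrs in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical toy network, not the network from Fig. 4.
    toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
    for gene, degree in rank_by_degree(toy_edges, top_n=3):
        print(gene, degree)
```

Using sets of neighbours rather than raw edge counts means a repeated edge such as ("B", "A") does not inflate a node's score.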

TF gene interactions
TF-gene interactions were collected using NetworkAnalyst. The common differential genes (TM4SF18, TDO2, MMP7, MYOM1, CNN1, TTLL7, PLD5, DPP4, NPR3) were screened for TF-gene identification (Fig. 6). DPP4 is regulated by 13 TF genes, NPR3 by 8, MYOM1 by 16, TDO2 by 2, CNN1 by 15, and MMP7 by 3. These TF genes regulate more than one common differential gene in the network, indicating a high degree of interaction between TF genes and differential genes.

TF-miRNA coregulatory network
The TF-miRNA co-regulatory network was generated using NetworkAnalyst. Its analysis revealed the interactions of miRNAs and TFs with the common DEGs, which may be responsible for regulating the expression of the DEGs. The network consists of 170 nodes and 182 edges, with 47 TF genes and 115 miRNAs interacting with the differential genes. Fig. 7 shows the TF-miRNA co-regulatory network.

Identification of candidate drugs
Ten common drug molecules were identified using the Enrichr platform. The drug candidates were ranked by P-value (Table 3). The analysis showed that ascorbic acid (CTD 00005445), progesterone (CTD 00006624) and cyclosporin A (CTD 00007121) were the three drug molecules that interacted with the most genes. These drugs therefore represent common candidate drugs for COVID-19 and carotid atherosclerosis. Table 3 lists the common drug candidates.

Discussion
Carotid atherosclerosis is recognized as a risk factor for COVID-19. When a person's nervous system is disrupted, it can no longer properly perform its functions. People who have carotid atherosclerosis are at a higher risk of developing COVID-19. This study describes a bioinformatics workflow applied to a dataset on the susceptibility of hiPSC-derived monolayer brain cells and region-specific brain organoids to SARS-CoV-2 infection and to a genome-wide dataset of human carotid atherosclerosis. Bioinformatics methods identified 983 and 92 DEGs from GSE157852 and GSE43292, respectively. Common DEGs between the two datasets were then sought in order to establish the relationship between COVID-19 and carotid atherosclerosis and to screen drug candidates, and 9 common DEGs (TM4SF18, TDO2, MMP7, MYOM1, CNN1, TTLL7, PLD5, DPP4, NPR3) were identified. The study then continued with GO analysis, KEGG pathway analysis, PPI analysis, TF-gene interactions, the TF-miRNA coregulatory network, and drug candidate screening.
The 9 common DEGs identified were used to detect GO terms, which were selected based on P values. Related studies suggest that the ERK1/2-RSK-nNOS signaling pathway may play an important role in Ang II-mediated regulation of central blood pressure, particularly with respect to impaired vasoconstriction and vasodilation [34,35]. SARS-CoV-2 infects the nervous system by binding to angiotensin-converting enzyme 2 (ACE2) receptors, and most ischemic stroke patients have long-standing hypertension and poor vasoconstrictive and vasodilatory function. SARS-CoV-2 invasion of the nervous system through ACE2 may therefore alter nitric-oxide synthase activity and further impair vasoconstriction and vasodilation. Whether this is a possible mechanism for the exacerbation of SARS-CoV-2 infection in patients with ischemic stroke needs to be further explored. Studies have shown that DPP4 is widely expressed in the vascular system, including endothelial cells, cardiomyocytes, smooth muscle cells, macrophages and many other cell types, which means it may contribute to cardiovascular disease [36,37]. Coronavirus entry into host epithelial cells is mediated by interactions between viral envelope S protein homologs and cell surface receptors. Hydrolytic cleavage of the CoV S protein results in recognition of membrane receptors [angiotensin-converting enzyme 2 (ACE2) for SARS-CoV and SARS-CoV-2, and dipeptidyl peptidase 4 (DPP4) for MERS-CoV] by the S1 outer structural domain, while the S2 C-terminal structural domain is involved in cell fusion and viral entry [38][39][40].
KEGG pathways were identified for the 9 common DEGs and include the protein digestion and absorption pathway and the tryptophan metabolism pathway. Increasingly, tryptophan and its metabolites are considered important aspects of the pathophysiology of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), which drove the COVID-19 pandemic [41]. Targeting various aspects of tryptophan metabolism may have major preventive and therapeutic implications for the management of SARS-CoV-2 infection, especially since the emergence of new variants suggests that this virus may persist for multiple years [42]. Systemic low-grade immune-mediated inflammation is associated with atherosclerosis, in which pro-inflammatory cytokines such as interferon-gamma (IFN-γ) play an important role. Upregulation of indoleamine 2,3-dioxygenase (IDO) by IFN-γ decreases serum levels of tryptophan and increases levels of kynurenine metabolites. Increased IDO expression and activity can accelerate the atherosclerotic process [43]. In the WikiPathways results, the pathways interacting with the most genes are NAD Biosynthesis II (from tryptophan) WP2485, Tryptophan catabolism leading to NAD+ production WP4210, and NAD+ biosynthetic pathways WP3645. The Reactome results yield the Tryptophan catabolism Homo sapiens R-HSA-71240 pathway and the Synthesis, secretion, and inactivation of Glucagon-like Peptide-1 (GLP-1) Homo sapiens R-HSA-381771 pathway.
PPI network analysis is the most prominent part of the investigation, because the detection of pivotal genes, the analysis of modules and the identification of drugs all depended on the PPI network. The PPI analysis was built from the TM4SF18, TDO2, MMP7, MYOM1, CNN1, TTLL7, PLD5, DPP4 and NPR3 genes, as these are the common DEGs. According to the PPI network, ALB, FN1, TIMP1, PLG, VWF, SERPINE1, MMP9, INS, SPP1 and MMP3 were designated hub genes because of their high interaction rate or degree values. A retrospective study found a significant decrease in ALB levels in the blood of patients with COVID-19 infection [44], and the levels of hemoglobin (Hb) and albumin (Alb) predict incremental risk for severe respiratory failure in COVID-19 patients with pneumonia [45]. To focus on important regions of the PPI network, a modular analysis of hub genes was carried out. Concentrating on these densely connected regions was intended to yield more effective drug compound recommendations.
TF-gene interactions were obtained for the common DEGs. In the network, MYOM1 shows a high interaction rate with TF genes. Among the regulators, ATF3 and SUZ12 show notable interaction; each has a degree value of 2 in the TF-gene interaction network. ATF3 (activating transcription factor 3), a member of the CREB family of basic leucine zipper transcription factors (TFs), has been demonstrated to be involved in both inflammatory and metabolic pathways [46,47]. ATF3 is expressed at higher levels in human atherosclerotic vessels and is strongly expressed in macrophages within atherosclerotic vessels [48,49]. SUZ12 (Suppressor of zeste 12 protein homolog) is a core component of the Polycomb PRC2-HMTase complex, which has been shown to participate in stem cell maintenance and development [50][51][52]. SUZ12 is upregulated in many human cancers, including gastric [53,54], breast [55] and bladder [51] cancers. For example, the expression of SUZ12 was markedly higher in gastric tumor tissues than in normal tissues [53]. SUZ12 expression was found to correlate with pathological stage, metastatic distance and lower overall survival in gastric cancer (GC) patients.
Regulatory biomolecules serve as potential biomarkers in multiple complex illnesses. The activities of miRNAs and TF genes in the regulation of the common DEGs were visualized in a TF-miRNA co-regulatory network, in which 115 miRNAs and 47 TF genes were identified. Among the most connected TFs, CTCF has the highest degree value, 4. CTCF plays an important transcriptional regulatory role in lung cancer and atherosclerosis [56][57][58]. TFs exert their regulation by binding to target genes, and miRNAs are capable of regulating gene expression through mRNA degradation [59].
Pharmaceutical molecules were proposed for the 9 common DEGs according to the DSigDB database. Among all drug candidates, the current research highlights the 10 most significant drugs. Ascorbic acid (CTD 00005445) is among the top drug candidates for COVID-19 and carotid atherosclerosis. Randomized clinical trials have used ascorbic acid in the treatment of ambulatory patients with SARS-CoV-2 infection [60]. At this stage, no studies have used progesterone in the clinical treatment of SARS-CoV-2. The therapeutic effect of progesterone on atherosclerosis, however, has long been demonstrated [61,62]. These agents can be considered for further validation by chemical experiments. Since SARS-CoV-2 is a new virus, relatively few studies have been done so far, which is why only a small number of samples were available for this analysis. In the future, with more samples, analyses of this kind will become more effective.

Conclusions
To our knowledge, there are no other transcriptome-level studies of SARS-CoV-2 and carotid atherosclerosis to date. We analyzed the DEGs of the two datasets and filtered them by common gene identification in an attempt to characterize the relationship between SARS-CoV-2 infection and carotid atherosclerosis. This type of analysis may also point the way to studying infection in other diseases. The suggested drug targets are plausible because they were derived from the hub genes, which may serve as a starting point for repurposing already approved drugs. Because SARS-CoV-2 is a recent discovery, few studies have been conducted on its risk factors. Dedicated studies on SARS-CoV-2 will become ever more important as additional datasets become available.

Declaration of Competing Interest
The authors declare no potential conflicts of interest.