Identification of Key Candidate Genes and Pathways in Preterm Birth by Integrated Bioinformatical Analysis

Background: Preterm birth (PTB) is a primary cause of neonatal morbidity and mortality, yet its pathogenic mechanisms remain largely unexplored. In the present study, we aimed to identify potential key genes and pathways associated with PTB by bioinformatics analysis. Methods: The GSE46510 dataset was obtained from the GEO database. Differentially expressed genes (DEGs) were identified using the limma package in R software, functional enrichment analysis was performed, and the protein-protein interaction (PPI) network was constructed with Cytoscape software. The network topology was analyzed using MCODE. Results: A total of 335 DEGs were identified from the dataset. The majority of up-regulated DEGs were significantly enriched in inflammatory response, while down-regulated DEGs were mainly enriched in mitotic nuclear division. The top 5 up-regulated hub genes were ITGAM, IL1B, ITGAX, NFKB1, and SOCS3; pathway analysis indicated enrichment in Cytokine-cytokine receptor interaction and Signaling by Interleukins. The top 5 down-regulated hub genes were CXCR4, ANAPC10, ANAPC4, UBE2V2, and UBA3; pathway analysis indicated enrichment in Ubiquitin mediated proteolysis and Phosphorylation of the APC/C. Conclusion: Our study identified genes and pathways in PTB by bioinformatics analysis, which may provide novel insights for unraveling the pathogenesis of PTB.


Introduction
As one of the most common and serious complications of pregnancy, preterm birth (PTB) is defined as delivery before 37 weeks of gestation [1]. Every year, about 15 million babies are born before 37 weeks' gestation worldwide, and this number is still increasing, with rates varying from 5% to 18% [2]. PTB is a primary cause of neonatal morbidity and mortality and can cause serious complications. In addition, PTB may lead to an increased risk of adult-onset chronic diseases, placing a heavy burden on families and society [2,3].
In the past few decades, important advances have been made in research on pregnancy and PTB. For example, the initiation of PTB is closely related to changes in inflammatory mediators and their signaling pathways, such as IL-6, IL-8 and TNF-α [4,6]. Cell-free fetal DNA (cffDNA) can engage TLR-9 and induce an inflammatory response, and high concentrations of cffDNA are associated with an increased risk of spontaneous PTB (sPTB) [7,8]. However, the pathogenic mechanisms of PTB still remain largely unexplored. Therefore, it is urgently necessary to identify potential target genes associated with PTB in order to predict and prevent PTB.
In the present study, we aimed to identify potential genes and miRNAs associated with PTB and to explore the underlying mechanisms of PTB development, based on the GSE46510 dataset from the Gene Expression Omnibus (GEO) database [9]. We assessed the gene expression profiles to identify differentially expressed genes (DEGs) in individuals with an sPTB within 48 h of admission. Furthermore, functional analysis was performed, and a protein-protein interaction (PPI) network was constructed. In addition, the target miRNAs for DEGs were identified accordingly.

Microarray Data
The GSE46510 gene expression dataset was obtained from the GEO database (http://www.ncbi.nlm.nih.gov/geo/); it was generated on the Affymetrix Human Genome U133 Plus 2.0 Array. Peripheral blood was collected at hospital admission, before any medical treatment, from 154 women with threatened preterm labor (TPTL). The 154 samples were divided into two groups: women who did (n = 48) and did not (n = 106) have an sPTB within 48 h of admission.

Data Processing and Screening of DEGs
The original data in CEL format were processed into expression values by the robust multi-array average (RMA) method using the affy package [10] (version 1.52.0; http://bioconductor.org/packages/release/bioc/html/affy.html) in R software. The probe-level data were then annotated using the platform annotation package in R/Bioconductor.
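RMA is generally described as three steps: background correction, quantile normalization across arrays, and median-polish summarization of probes into probe sets. The following is only a minimal Python sketch of the quantile normalization step, not the affy implementation used in this study; the function name `quantile_normalize` and the toy intensities are hypothetical and serve only to illustrate the idea.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length columns (one column per array).

    Each value is replaced by the mean of the values that share its rank
    across all arrays, so every array ends up with the same distribution.
    Ties are broken by position (stable sort); this is a simplification.
    """
    n_arrays = len(columns)
    n_probes = len(columns[0])

    # Mean intensity at each rank position across the arrays.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(sc[i] for sc in sorted_cols) / n_arrays for i in range(n_probes)
    ]

    normalized = []
    for col in columns:
        # Indices of the probes ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy intensities for two arrays and three probes.
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized_toy = quantile_normalize(toy)
print(normalized_toy)
```

After this step both toy arrays contain the same set of values, differing only in which probe carries which value.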

Identification of DEGs
DEGs were identified by the empirical Bayes method using the limma package [11] (version 3.30.3; www.bioconductor.org/packages/release/bioc/html/limma.html) in R software. The cut-off criteria were set as P < 0.05 and |log2 fold change (FC)| > 1.
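The cut-off rule can be illustrated with a short Python sketch. It is not the limma workflow itself: it assumes that a per-gene log2 fold change and P value have already been computed, and it applies the two thresholds to whatever P value is supplied (the text does not state unambiguously whether the P value is raw or multiplicity-adjusted). The names `GeneStat`, `classify_degs` and the placeholder genes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    symbol: str      # gene symbol (placeholder names below)
    log2_fc: float   # log2 fold change, sPTB vs. no sPTB
    p_value: float   # P value as reported by the upstream test


def classify_degs(stats, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split genes into up- and down-regulated DEGs.

    A gene is kept only if P < p_cutoff AND |log2 FC| > lfc_cutoff.
    """
    up, down = [], []
    for g in stats:
        if g.p_value < p_cutoff and abs(g.log2_fc) > lfc_cutoff:
            (up if g.log2_fc > 0 else down).append(g.symbol)
    return up, down


# Hypothetical values, for illustration only.
example = [
    GeneStat("GENE_A", 1.8, 0.001),    # passes both thresholds, up
    GeneStat("GENE_B", -1.3, 0.020),   # passes both thresholds, down
    GeneStat("GENE_C", 0.6, 0.0001),   # significant but fold change too small
    GeneStat("GENE_D", -2.1, 0.300),   # large change but not significant
]
up_genes, down_genes = classify_degs(example)
print(up_genes, down_genes)
```

Requiring both conditions filters out genes with statistically significant but small changes as well as large but unreliable changes.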

Functional and Pathway Enrichment Analyses
The Database for Annotation, Visualization and Integrated Discovery (DAVID, version 6.8, https://david.ncifcrf.gov) provides a comprehensive set of functional annotation tools for investigators to understand the biological meaning behind a large list of genes [12], including Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of up-regulated and down-regulated DEGs. GO enrichment analysis, consisting of biological process (BP), cellular component (CC) and molecular function (MF) terms, was performed with DAVID. Pathway enrichment analysis of the DEGs was performed with DAVID and Reactome (http://www.reactome.org). The cut-off criterion for GO terms and KEGG pathways enriched with DEGs was P < 0.05.
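Enrichment P values of this kind are typically based on an over-representation test. The sketch below computes a plain one-sided hypergeometric P value in Python; it is an illustration only and is not DAVID's exact statistic (DAVID reports a more conservative modified Fisher exact P value, the EASE score). The function name `hypergeom_enrichment_p` and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(list_hits, list_size, background_hits, background_size):
    """One-sided over-representation P value, P(X >= list_hits).

    X is the number of term-annotated genes found when drawing list_size genes
    without replacement from a background of background_size genes, of which
    background_hits carry the annotation.
    """
    total = comb(background_size, list_size)
    max_hits = min(list_size, background_hits)
    tail = sum(
        comb(background_hits, k)
        * comb(background_size - background_hits, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 annotated to the term,
# a gene list of 3 genes, 2 of which carry the annotation.
p_toy = hypergeom_enrichment_p(
    list_hits=2, list_size=3, background_hits=4, background_size=10
)
print(p_toy)
```

A term would be reported as enriched when this P value falls below the chosen cut-off (0.05 in this study).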

PPI Network Construction
PPI information for all DEGs was retrieved from the online database STRING (version 10.5, https://string-db.org) [13]. PPI links with a combined score > 0.4 were retained for constructing the PPI network. The network was constructed with Cytoscape software (version 3.6.0, http://cytoscape.org/) [14], and hub genes were ranked by MCODE.
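The filtering and ranking logic can be sketched in Python as follows. This is a simplified illustration, not the STRING, Cytoscape or MCODE implementation: it assumes combined scores on a 0 to 1 scale, and it ranks nodes by plain degree, whereas MCODE uses a density-based clustering procedure. The function names and the placeholder proteins P1 to P5 with their scores are hypothetical.

```python
from collections import defaultdict


def build_network(edges, min_score=0.4):
    """Build an undirected adjacency map from (protein_a, protein_b, score) tuples.

    Only interactions with a combined score above min_score are kept.
    """
    adjacency = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def rank_by_degree(adjacency):
    """Return (node, degree) pairs, highest degree first, ties sorted by name."""
    pairs = [(node, len(neighbors)) for node, neighbors in adjacency.items()]
    return sorted(pairs, key=lambda item: (-item[1], item[0]))


# Hypothetical interactions and scores, for illustration only.
toy_edges = [
    ("P1", "P2", 0.90),
    ("P1", "P3", 0.70),
    ("P2", "P3", 0.35),  # dropped: score not above 0.4
    ("P1", "P4", 0.41),
    ("P4", "P5", 0.20),  # dropped: P5 never enters the network
]
toy_network = build_network(toy_edges)
toy_ranking = rank_by_degree(toy_network)
print(toy_ranking)
```

Proteins whose interactions all fall at or below the threshold never enter the network, which is why a network can contain fewer nodes than the DEG list it was built from.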

Ethics
All analyses were based on data from a public database, so ethics approval and patient consent were not required.

Identification of DEGs
Expression levels of 19009 genes were assessed in 154 samples. Compared with women who did not deliver within 48 h of hospital admission, 372 genes (2%) showed significant changes in expression level (|log2 fold change| > 1, P < 0.05) in women who delivered within 48 h. A total of 37 genes were excluded due to duplication and low expression level, leaving 335 DEGs, of which 157 were up-regulated and 178 were down-regulated (Figure 1). Hierarchical clustering revealed the expression of the DEGs in blood samples from women who delivered within 48 h of hospital admission and from those who did not (Figure 2).

Figure 2:
Heat map: Hierarchical clustering of differentially expressed genes between women who did not deliver within 48 h of hospital admission and women who delivered within 48 h of hospital admission. "Red" indicates high relative expression, and "green" indicates low relative expression.

DEGs Gene Ontology and Signaling Pathway Enrichment Analysis in Preterm Birth
To investigate the underlying biological associations, gene ontology (GO) analysis of the DEGs was performed with DAVID. The results are shown in Figure 3. In the cellular component group, up-regulated genes were mainly enriched in extracellular exosome, membrane, endosome, early endosome, lipid particle; down-regulated genes were mainly enriched in nucleus, nucleoplasm, nucleolus, nuclear heterochromatin, NatA complex. In the molecular function group, up-regulated genes were mainly enriched in protease binding, protein binding, ATP binding, transcription factor activity, RNA polymerase II core promoter proximal region sequence-specific binding, transferase activity, transferring glycosyl groups; down-regulated genes were mainly enriched in protein binding, RNA binding, voltage-gated cation channel activity, Rab geranylgeranyl transferase activity, nucleic acid binding. These results showed that most of the DEGs were significantly enriched in cell cycle, nucleus and protein binding. Signaling pathway analysis showed that both up-regulated and down-regulated DEGs were mainly enriched in Immune System, the Interleukins signaling pathway and the Chemokine signaling pathway (Figure 4 and Table 2).

Figure 4:
Significantly enriched pathway terms of DEGs in preterm birth. Functional and signaling pathway enrichment analyses of the DEGs were conducted using the online resources KEGG PATHWAY and Reactome ("red" indicates high relative expression, and "green" indicates low relative expression).

PPI Network Analysis and Pathway of Hub Genes
Using the STRING database and Cytoscape software, a total of 251 genes were filtered into the PPI network, which contained 251 nodes and 510 protein pairs with a PPI score > 0.4 (Figure 5). In the PPI network, nodes stand for DEGs, while edges represent interactions between two proteins. Using the MCODE plug-in, 36 central node genes were identified with a degree cutoff ≥ 2 (Figure 6).
Table 4: Signaling pathway enrichment analysis of hub genes in preterm birth (P < 0.05)
Auctores Publishing - Volume 4(2)-046 www.auctoresonline.org ISSN: 2642-9756

Discussion
Over the past several decades, much work has been done on preterm birth, and recent prematurity rates appear to be declining [15]. The prevention of preterm birth is a public health priority because of the potential to reduce the infant and childhood morbidity and mortality related to this condition [16]. PTB is caused by multiple factors, such as microbial-induced inflammation, decidual hemorrhage and vascular disease, disruption of maternal-fetal tolerance, decline in progesterone action, and cell-free fetal DNA [17]. It is critically important to understand the molecular mechanisms of these factors.
In the current study, the GSE46510 dataset was downloaded from the GEO database to identify DEGs between sPTB and non-sPTB samples using bioinformatics analysis. A total of 335 DEGs, including 157 up-regulated and 178 down-regulated DEGs, were identified. These DEGs were classified into three groups by GO terms using the online tool DAVID. Functional and signaling pathway enrichment analyses were conducted using DAVID and Reactome; both up-regulated and down-regulated genes were mostly enriched in Immune System and Interleukin signaling. A protein-protein interaction (PPI) network was then developed using STRING and Cytoscape, in which 180 nodes/DEGs were identified with 518 edges. The most significant module was filtered using the MCODE plug-in; 36 central node genes were identified, and most of the corresponding genes were associated with Immune System, Signaling by Interleukins and Ubiquitin mediated proteolysis.
Through integrated bioinformatical analysis, we identified 36 hub genes. ITGAM, IL1β, ITGAX, NFKB1, SOCS3, CXCR4, ANAPC10, ANAPC4, UBE2V2 and UBA3 ranked at the top of the most changed genes, and their biological functions involve cell adhesion, inflammatory response and the proteasome-mediated ubiquitin-dependent protein catabolic process. ITGAM and ITGAX encode the integrin alpha M and alpha X chains, respectively. ITGAM and ITGAX, also known as CD11b and CD11c, play an important role in the adherence of neutrophils and monocytes to stimulated endothelial cells and in the phagocytosis of complement-coated particles. Gervasi et al. [18] showed that preterm labour was associated with a significant increase in the expression of CD11b, CD15 and CD66b on neutrophils and of CD11b and CD15 on monocytes. CD11a and CD11b mediate binding to ICAM-1, which is up-regulated in the endothelium of the human cervix and myometrium during labour [19]. Once leukocytes emigrate to the myometrium and cervix, chemotaxis of more neutrophils and monocytes is mediated by their own increased expression of IL-8 and MCP-1, respectively [20]. Proinflammatory cytokines (such as IL-1, IL-6, IL-8 and TNF-α) can directly trigger the transition from a uterine quiescent state to a subsequent unscheduled activation of the uterus [21,23].
During labor, the IL-1β level is increased due to the influx of leukocytes into intrauterine tissues, which can enhance the contractile potential of myometrial smooth muscle [24,25]. In addition, it has been demonstrated that IL-1β can increase prostaglandin production and MMP9 expression via the NF-κB signaling pathway [26,27], which are known to induce cervical ripening and myometrial contractions [28,29]. There is higher expression of the subunit of NF-κB in membranes, cervix and myometrium [30,31]. NF-κB can be activated by pro-inflammatory cytokines such as TNF and IL-1β, and by microbial or viral components that activate toll-like receptors (TLRs) [32].
Suppressor of cytokine signaling 3 (SOCS3) is a member of the SOCS family and is induced by various cytokines, including IL6, IL10, and interferon (IFN)-gamma [33]. Inflammatory mediators might enter the circulation and activate placental macrophages, leading to IL-1β release and subsequent SOCS activation as a feedback/response mechanism, which may play a role in the interaction of endothelial cells of the villous placenta with neighboring cells and participate in placental inflammation [34].
Activation of NF-κB involves the phosphorylation of the NFKBIA protein, which is then ubiquitinated and subsequently degraded by proteasomes [32]. Ubiquitin like modifier activating enzyme 3 (UBA3) encodes a member of the E1 ubiquitin-activating enzyme family and regulates cell division, signaling and embryogenesis. Ubiquitin-conjugating enzyme E2 variant proteins constitute a distinct subfamily within the E2 protein family; the protein encoded by UBE2V2 shares homology with ubiquitin-conjugating enzyme E2 variant 1. Both genes are down-regulated in premature births, and they may influence the NF-κB pathway by reducing ubiquitination.
The anaphase-promoting complex (APC/C) is a multimeric RING E3 ubiquitin ligase that is composed of many different subunits (APC1-8, APC9-11, and CDC26) and plays a crucial role in coordinating mitotic progression by targeting numerous regulators for destruction by the 26S proteasome [35,36]. It assists in the ubiquitination of free polyubiquitin chains, leading to MAP3K7 activation, which in turn leads to the activation of NF-κB via its respective activation pathways [37]. Anaphase promoting complex subunit 4 (ANAPC4/APC4) and subunit 10 (ANAPC10/APC10) are subunits of the APC/C; APC10 is the core subunit and plays a critical role in enabling the APC/C to function as an E3 ubiquitin ligase [38]. In the present study, the down-regulated expression of ANAPC10, ANAPC4, UBE2V2 and UBA3 may play a role in preterm birth by influencing ubiquitination and proteasome degradation.
Cytokines are the major inducible products of immune system cells. CXCR4 encodes a CXC chemokine receptor specific for stromal cell-derived factor-1. CXCR4 is activated by NF-κB and regulates decidual leukocyte recruitment during labor [39,40]. A previous study confirmed that CXCR4 is up-regulated in labor and is related to the inflammatory response [41], which is contrary to the results of this study, in which CXCR4 was down-regulated. This discrepancy is of interest, and more research is needed in the future.
There are several limitations in our analysis. First, all the predicted results need to be confirmed by experimental data. Second, the number of PTB samples was small. Third, we only analyzed genes, while transcription factors (TFs) and miRNAs were not predicted. In future studies, large-scale samples are required to validate the expression of the above-mentioned DEGs. Moreover, future investigations should focus on the interactions of DEGs, the regulatory associations between TFs, miRNAs and DEGs, and the possible pathways underlying these gene alterations.

Conclusions
Using bioinformatical analysis, we identified 335 DEGs (157 up-regulated and 178 down-regulated) and ultimately found the 10 most changed hub genes, which were significantly enriched in several pathways mainly associated with inflammatory response, ubiquitination and proteasome degradation. These findings may improve the understanding of the cause and underlying molecular events in preterm birth, and the candidate genes and pathways could be used as therapeutic targets.

Conflicts of Interest
The authors declare that they have no conflicts of interest related to the subject matter or materials discussed in this article.