Functional Modules Analysis and Hub Gene Prognostic Values Evaluation Based on Co-Expression Network in Gastric Cancer

Background: Gastric cancer is one of the most common fatal diseases worldwide, but its mechanism and therapeutic targets are still unclear. In this study, we analyzed the differences in gene modules and key pathways in gastric cancer patients, then examined the mechanism and effective treatment of gastric cancer with microarray data from the Gene Expression Omnibus (GEO) database. Methods: GEO2R was used to identify differentially expressed genes (DEGs), and the String database was employed to construct a protein-protein interaction (PPI) network. We imported the PPI network into Cytoscape to find key nodes, and used the MCODE clustering approach to group genes into modules. ClueGO was then used to enrich and annotate the pathways of key modules. To investigate the relationship between upstream regulators and hub genes, a transcriptional regulatory network was built based on the TFCAT database. Results: 63 characteristic genes of gastric cancer are involved in the regulation of ECM-receptor interaction, focal adhesion, and protein digestion and absorption. SPARC, FN1, BGN and COL1A2 are four key nodes related to tumor proliferation and metastasis, and their expression was strongly associated with poor survival (p<0.05). 13 transcription factors, including PRRX1, change markedly in gastric cancer and may play a key role in hub gene regulation. Conclusions: The present study defined the gene expression characteristics and transcriptional regulatory network of gastric cancer, which improves our understanding of the molecular mechanisms underlying its development and might provide new insights into targeted therapy and prognostic markers for personalized treatment.


Background
Gastric cancer, which is the fifth most common cancer worldwide and the second leading cause of cancer-related death in developed countries, affects about one million people per year (1). Most cases, especially in developed countries, are diagnosed at a late stage, and curative treatment options are mostly restricted to surgery and often supplemented by neoadjuvant or adjuvant chemotherapy (2).
However, despite significant advances in diagnosis and staging as well as the introduction of new treatment protocols, the rate of gastric cancer recurrences is still high. The long-term survival of patients affected by advanced gastric cancer following standard treatments is still far from satisfactory: 5-year survival rate ranges between 5 and 20% and median overall survival (OS) is less than 12 months (3).
Thus, there is an urgent need to improve our understanding of the epidemiology, pathogenesis and molecular mechanism of gastric cancer and to identify more effective, less toxic therapeutic strategies.
Gene expression profiles combined with bioinformatics analysis have shown great promise for exploring diagnostic and prognostic markers for complex diseases. High-throughput platforms for the analysis of gene expression, such as microarrays, have been used to measure the expression levels of genes in cells and tissues, which helps to screen for the crucial genes and pathways associated with disease.
Microarray technology combined with bioinformatics analysis can identify the pathways, biological processes, or molecular functions in which the DEGs are involved (4).
In the current study, using microarray data from the GEO database, we first identified DEGs between gastric cancer and normal tissues, and then detected the crucial genes and pathways associated with gastric cancer using bioinformatics methods. We then investigated the relationship between the crucial genes and gastric cancer patient survival to improve our understanding of gastric cancer and to identify more effective, less toxic therapeutic strategies.

Results
As mentioned earlier, we concentrated on the gastric cancer samples with clinical data available in four training datasets. The detailed information for the patients included in this study, which was obtained from GEO matrix files, is listed in Table 1. The expression of DEGs differed significantly between gastric carcinomas and normal mucosa (Fig. 2).

PPI network of DEGs
The PPI network of DEGs was constructed using the String database (https://string-db.org/) (Fig. 3). Each node is a differentially expressed gene, and node degree is reflected by node size: a larger degree value corresponds to a larger node. We used the MCODE plugin to cluster the DEGs into modules, as shown in Fig. 4.
ClueGO was used to identify the key biological roles of the modules. The genes were involved in platelet degranulation, collagen fibril organization, endodermal cell differentiation, and nuclear chromosome segregation (Fig. 5).

The effect of the expression level of the crucial genes on patient survival
Survival analysis based on TCGA data showed that high expression of the key nodes (SPARC, FN1, BGN, COL1A2) was associated with a significantly lower survival rate in gastric cancer patients (p < 0.05, Fig. 6).

The transcriptional regulatory network of DEGs
We built the transcriptional regulatory network of DEGs using data from TFCAT database, and detected

GO and KEGG pathway enrichment analysis of DEGs
GO enrichment analysis showed that DEGs were significantly enriched in biological processes (BPs). For up-regulated DEGs, the top three BPs were 'Extracellular matrix organization', 'Extracellular structure organization', and 'Collagen catabolic process' (Fig. 8a), whereas the main BPs of down-regulated DEGs included 'digestion', 'detoxification of copper ion' and 'stress response to copper ion' (Fig. 8b).
KEGG pathway enrichment analysis indicated that DEGs were significantly enriched in at least ten pathways. The top three pathways of up-regulated DEGs were 'ECM-receptor interaction', 'focal adhesion' and 'Protein digestion and absorption' (Fig. 9a), while the top three pathways of down-regulated DEGs were 'chemical carcinogenesis', 'metabolism of xenobiotics by cytochrome P450', and 'Drug metabolism-cytochrome P450' (Fig. 9b).

Discussion
In the present study, we explored critical genes in gastric cancer based on bioinformatics methods. We integrated four original microarray datasets associated with gastric cancer (three training datasets, one validation dataset) and identified 2636 DEGs between gastric cancer tissues and normal tissues. PPI network analysis and microarray results (p = 0.0095, 0.03, 0.031, 0.047) indicated that SPARC, FN1, BGN and COL1A2 were key nodes of gastric cancer and were significantly up-regulated in patients with lower survival rates. These results suggest that SPARC, FN1, BGN and COL1A2 promote the occurrence and development of gastric cancer. SPARC can promote the development of some highly metastatic tumors, such as breast cancer and melanoma (5). FN1 is involved in cell adhesion and migration processes including embryogenesis, wound healing, blood coagulation, host defense and metastasis, as well as in various biochemical processes (6), and has been found to suppress apoptosis and promote viability, invasion, and migration in colorectal cancer (CRC) by interacting with ITGA5. We also observed a potential role for COL1A2 in gastric cancer. Collagens are important for cell adhesion and migration, angiogenesis, tissue morphogenesis and scaffolding, and have been implicated in carcinogenesis (7,8). However, little is known about the role of collagens in gastric cancer.
Transcription factors (TFs) control the production rate of messenger RNA by binding to a transcription factor binding site on a DNA template, regulating the expression of genes in response to signals inside or outside the cell. Genes regulated by transcription factors are referred to as target genes (TGs). Transcription factors themselves are proteins encoded by genes, and thus are also regulated by other TFs. In this way, all the regulatory interactions that link transcription factors to their corresponding target genes form the transcriptional regulatory network (TRN) (9). We used the TFCAT database to construct the TRN in gastric cancer and identified 13 TFs involved in gastric cancer progression. Paired-related homeobox 1 (PRRX1), which has been identified as a new EMT inducer (10), was down-regulated in gastric cancer. Although the function of PRRX1 in gastric cancer was not well elucidated, PRRX1 expression levels were upregulated and positively correlated with metastasis in human gastric cancer. In addition, our GO and KEGG pathway analyses indicated that DEGs were involved in 'ECM-receptor interaction', 'focal adhesion' and 'Protein digestion and absorption' (Fig. 9), and the ECM-receptor interaction and focal adhesion pathways may be mediated by the key node COL1A2 and the key TF PRRX1.
With a microarray data set from the GEO database and multiple bioinformatics methods, we explored the crucial genes and pathways in gastric cancer, which not only contributes to elucidating the pathogenesis of gastric cancer, but also provides prognostic markers and potential therapeutic targets for gastric cancer.

Conclusions
In summary, we screened the DEGs from GEO microarray data and constructed a co-expression network to illustrate the underlying mechanism of gastric cancer progression. Four key nodes (SPARC, FN1, BGN and COL1A2) are related to tumor proliferation and metastasis, and their expression was strongly associated with poor survival. The transcriptional regulatory network analysis found that 13 transcription factors, including PRRX1, may play a key role in hub gene regulation. Our study not only contributes to elucidating the pathogenesis of gastric cancer, but also provides prognostic markers and potential therapeutic targets for gastric cancer.

Identification of DEGs
GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) was used to identify DEGs between gastric cancer tissues and non-cancerous tissues. A B-statistic of zero corresponds to a 50-50 chance that the gene is differentially expressed. The B-statistic is automatically adjusted for multiple testing by assuming that 1% of the genes, or some other percentage specified by the user in the call to eBayes, are expected to be differentially expressed. P-values were adjusted by the Benjamini-Hochberg (BH) approach. The DEGs were screened according to adjusted P-value < 0.05 and 2^|log2(fold change)| > 1.
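The screening step can be illustrated with a minimal sketch, assuming per-gene raw p-values and log2 fold changes are already available (for example, exported from GEO2R). The function names `bh_adjust` and `select_degs`, the parameter `min_fold`, and the toy values are hypothetical and are not part of GEO2R or limma; the fold-change test is written exactly as the criterion is stated in the text.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, pvalues, log2fc, alpha=0.05, min_fold=1.0):
    """Keep genes with adjusted p < alpha and 2^|log2FC| > min_fold.

    alpha and min_fold default to the cut-offs quoted in the text;
    both are exposed as parameters because they are study-specific.
    """
    adjusted = bh_adjust(pvalues)
    return [
        gene
        for gene, q, lfc in zip(genes, adjusted, log2fc)
        if q < alpha and 2 ** abs(lfc) > min_fold
    ]


# Toy input (made-up numbers, not data from this study).
toy_genes = ["g1", "g2", "g3", "g4"]
toy_pvalues = [0.01, 0.04, 0.03, 0.5]
toy_log2fc = [2.0, -1.5, 0.0, 3.0]
toy_degs = select_degs(toy_genes, toy_pvalues, toy_log2fc)
```

Raising `min_fold` (for example to 2.0, i.e. at least a two-fold change) gives a stricter filter when required.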

Analysis of the PPI network of DEGs
The String database (www.string-db.org) was used to predict protein-protein associations and construct the PPI network of DEGs. Cytoscape V3.6.0 (http://www.cytoscape.org/) was used to display the PPI network. The Centiscape 2.2 plugin was used to calculate the degree centrality (DC) of gene nodes, and the DEGs were clustered into several modules with the MCODE plugin (11,12).
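Degree centrality, the measure used here to rank nodes, is simply the number of distinct interaction partners of each gene. The sketch below computes it from an undirected edge list; it is an illustration only, not the Centiscape implementation, and the function names and placeholder gene labels are hypothetical.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count distinct neighbours of each node in an undirected edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


def top_hubs(edges, k=1):
    """Return the k highest-degree nodes (ties broken alphabetically)."""
    degrees = node_degrees(edges)
    return sorted(degrees, key=lambda node: (-degrees[node], node))[:k]


# Placeholder edges for illustration only; these are not String interactions.
toy_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
]
toy_degree = node_degrees(toy_edges)
toy_hubs = top_hubs(toy_edges, k=2)
```

Duplicate edges are counted once because neighbours are stored in sets, which matters when an exported interaction table lists each pair in both directions.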

The effect of the expression level of the crucial genes on patient survival
We used the UALCAN database (http://ualcan.path.uab.edu/index.html) to analyze cancer transcriptome and clinical data from The Cancer Genome Atlas (TCGA), including the effect of key node expression levels and clinicopathologic features on patient survival. We then estimated the relationship between the crucial genes and gastric cancer patient survival using Kaplan-Meier curves and the log-rank test.
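For readers unfamiliar with these two methods, the following sketch shows the standard Kaplan-Meier product-limit estimator and a two-group log-rank test. It is a plain textbook implementation under simple assumptions (right-censored data, event flag 1 = death, 0 = censored) and is not the code used by UALCAN; all function names and toy numbers are illustrative.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit)."""
    curve = []
    survival = 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    all_times = list(times1) + list(times2)
    all_events = list(events1) + list(events2)
    diff = 0.0      # observed minus expected events in group 1
    variance = 0.0  # hypergeometric variance summed over event times
    for t in sorted({u for u, e in zip(all_times, all_events) if e}):
        n1 = sum(1 for u in times1 if u >= t)
        n2 = sum(1 for u in times2 if u >= t)
        d1 = sum(1 for u, e in zip(times1, events1) if u == t and e)
        d2 = sum(1 for u, e in zip(times2, events2) if u == t and e)
        n, d = n1 + n2, d1 + d2
        if n < 2:
            continue  # variance is undefined with a single subject at risk
        diff += d1 - n1 * d / n
        variance += n1 * n2 * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    chi2 = diff ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy follow-up times (made-up numbers, not TCGA data).
toy_curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
toy_chi2, toy_p = logrank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
```

In practice, patients would first be split into high- and low-expression groups for a given gene, and the two groups passed to `logrank_test`.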

Constructing the transcriptional regulatory network of DEGs
The transcriptional regulatory network of DEGs was derived from the curated catalog of mouse and human transcription factors database (13). Each TF in the collection is assigned to one or more functional taxa with a confidence judgement by an evaluating scientist based on literature review. The confirmed TFs were used in a sequence-oriented homology analysis to predict additional TFs.
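One way such a network could be assembled is sketched below, under two assumptions that the text does not spell out: that a TF is kept only if it is itself a DEG, and that TF-to-target relationships are available from some separate mapping. The function `build_trn`, its arguments and the placeholder names are hypothetical and do not correspond to any TFCAT interface.

```python
def build_trn(tf_catalog, tf_targets, degs):
    """Return (TFs that are DEGs, sorted TF -> target edges among DEGs).

    tf_catalog: iterable of gene symbols catalogued as TFs (assumed input).
    tf_targets: dict mapping a TF to its target genes (assumed input).
    degs:       iterable of differentially expressed gene symbols.
    """
    deg_set = set(degs)
    # Step 1: TFs from the catalog that are also differentially expressed.
    deg_tfs = sorted(deg_set & set(tf_catalog))
    # Step 2: keep only edges whose target is differentially expressed too.
    edges = [
        (tf, target)
        for tf in deg_tfs
        for target in sorted(tf_targets.get(tf, ()))
        if target in deg_set
    ]
    return deg_tfs, edges


# Placeholder names for illustration; not results from this study.
toy_catalog = {"TF_X", "TF_Y"}
toy_targets = {"TF_X": ["GENE_A", "GENE_Z"], "TF_Y": ["GENE_A"]}
toy_degs = ["TF_X", "GENE_A", "GENE_B"]
toy_tfs, toy_trn_edges = build_trn(toy_catalog, toy_targets, toy_degs)
```

The edge list is directed (TF first, target second), so it can be loaded into a network viewer as a directed graph.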

Gene ontology (GO) and pathway enrichment analysis of DEGs
The Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.8 was used for GO analysis (https://david.ncifcrf.gov/) (14). The Kyoto Encyclopedia of Genes and Genomes (KEGG) knowledge database was utilized for biochemical pathway enrichment. We used R Profiler V3.6.0 to annotate and visualize the KEGG pathways of DEGs. We selected the GO terms and pathways in which the DEGs were mainly enriched, with a cut-off criterion of p < 0.05.
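The idea behind enrichment analysis can be illustrated with a one-sided hypergeometric (over-representation) test, which asks how likely it is to see at least the observed number of DEGs in a term by chance. This is a generic sketch; DAVID reports its own modified score, so values from this function are not expected to match DAVID output exactly. The function name and arguments are illustrative.

```python
from math import comb


def enrichment_pvalue(n_background, n_annotated, n_selected, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution.

    n_background: genes in the background set
    n_annotated:  background genes annotated to the term or pathway
    n_selected:   genes in the DEG list
    n_overlap:    DEGs annotated to the term or pathway
    """
    total = comb(n_background, n_selected)
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 5 in the term, 5 DEGs, all 5 in the term.
toy_enrichment_p = enrichment_pvalue(10, 5, 5, 5)
```

A term would then be retained when its p-value falls below the chosen cut-off, such as the 0.05 threshold used here.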

Statements
Some preliminary data from this manuscript, rather than the whole, were submitted as a poster presentation at a meeting of the European Society for Medical Oncology (ESMO), and the journal Annals of Oncology subsequently summarized the posters presented at ESMO in a supplement issue.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.