Hub Genes and Key Pathway Identification in Colorectal Cancer Based on Bioinformatic Analysis

Colorectal cancer (CRC) is one of the most common malignant tumors. The aim of the present study was to identify key genes and pathways to improve the understanding of the mechanism of CRC. GSE87211, including 203 CRC samples and 160 control samples, was screened to identify differentially expressed genes (DEGs). In total, 853 DEGs were obtained, including 363 upregulated genes and 490 downregulated genes. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis of DEGs were performed to obtain enrichment datasets. GO analysis showed that DEGs were significantly enriched in the extracellular region, cell-cell signaling, hormone activity, and cytokine activity. KEGG pathway analysis revealed that the DEGs were mainly enriched in the cytokine-cytokine receptor interaction, drug metabolism, androgen and estrogen metabolism, and neuroactive ligand-receptor interaction. The Protein-Protein Interaction (PPI) network of DEGs was constructed by using Search Tool for the Retrieval of Interacting Genes (STRING). The app MCODE plugged in Cytoscape was used to explore the key modules involved in disease development. 43 key genes involved in the top two modules were identified. Six hub genes (CXCL2, CXCL3, PTGDR2, GRP, CXCL11, and AGTR1) were statistically associated with patient overall survival or disease-free survival. The functions of six hub genes were mainly related to the hormone and chemokine activities. In conclusion, the present study may help understand the molecular mechanisms of CRC development.


Introduction
Colorectal cancer (CRC) is one of the leading causes of malignant tumors [1,2]. e incidence rate of CRC is high, which seriously affects the patient's health. Sigmoidoscopy has become an effective surgery for treating CRC; however, it is associated with the disadvantages of bleeding, perforation [3], and the low prognosis rate [4,5]. Our understanding of the occurrence and development mechanism of CRC has been greatly improved; however, the cause and potential molecular mechanism of CRC are still unclear [6,7]. erefore, it is necessary to identify molecular hub genes and key pathways for understanding the molecular mechanism and discovering potential biomarkers for CRC.
At present, microarray technology is widely used in molecular mechanism exploration and has a wide range of application in molecular biology. It offers an efficient method for systematically screening tumor-related genes and identifying their regulatory mechanisms with bioinformatics [8,9]. e hub genes are the highly connected nodes in the PPI network, which have high probabilities of engaging in essential biological regulation [10], and have been reported in various types of cancer.
e Protein-Protein Interaction (PPI) network and hub genes analysis are used for revealing crucial biological processes [11], which provide efficient approaches for discovering key molecular mechanisms in cancer biology.
In this study, 853 differentially expressed genes (DEGs) were obtained by screening gene expression microarray dataset GSE87211, which includes 203 CRC samples and 160 control samples. e biological functions, signal pathway enrichment, and PPI network were used to establish the characterization of the DEGs for understanding the molecular mechanism underlying CRC. It might also provide new insights for the study of potential biomarkers of CRC.

Gene Expression Microarray Data Acquisition.
e NCBI Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih. gov/geo) database is a public functional genomics database with high-throughput gene expression data, chips, and microarrays. GSE87211 [12] was downloaded from GEO. e datasets were based on the GPL13497 platform (Agilent-026652 Whole Human Genome Microarray 4 × 44 K v2). GSE87211 contains 203 CRC samples and 160 control samples.

Identification of DEGs.
e DEGs were screened using linear models for microarray data (limma) package in R. Probe sets without corresponding gene symbols were removed, and they were further converted into the corresponding gene symbol according to the annotation information. e mean value of multiple probes for the same gene was calculated. |logFC(fold change)| > 2 and Adj. P value < 0.01 were considered as the threshold to identify the DEGs.

GO and KEGG Enrichment Analyses of DEGs.
e Database for Annotation, Visualization, and Integrated Discovery (DAVID, http://david.ncifcrf.gov) (version 6.8) is an online functional annotation tool to provide a comprehensive understanding of biological information of genes and proteins, including biological process (BP), cellular components (CC), and molecular function (MF). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment was further annotated and viewed online by KEGG Orthology-Based Annotation System (KOBAS, http://www.kobas.cbi.pku.edu.cn) (version 3.0). P value < 0.05 was considered as the threshold.

PPI Network Construction and MCODE Analysis.
e PPI network of DEGs was predicted using the online database Search Tool for the Retrieval of Interacting Genes (STRING, http://string-db.org) (version 11.0). e minimum required interaction score was set to 0.09. e protein nodes that have no interaction with other proteins were removed. Analyzing the functional interactions between proteins may provide insights into the biological mechanisms of action. Key modules and hub genes were further analyzed and visualized with app MCODE plugged in Cytoscape. e top two modules were displayed to show the density of nodes by STRING. e criteria for selection of key genes were as follows: MCODE scores >5.

Survival Analysis of Hub Genes.
e correlation between the key genes and the survival time of CRC patients were analyzed through the survival function in R package by using the clinical information from GSE87211 and TCGA datasets. e overall survival and disease-free survival analyses of each hub gene were performed in Gene Expression Profiling Interactive Analysis (GEPIA, http://gepia.cancer-pku.cn/ index.html) [13]. e RNA expression level of hub genes between CRC samples and control samples was visualized by GEPIA based on the integration of GTEx and TCGA projects in a standard processing pipeline. e protein expression level of hub genes was analyzed using the Human Protein Atlas database.

Identification of DEGs in CRC.
e gene expression profile of GSE87211, including 203 CRC samples and 160 control samples, was downloaded from the GEO database. e median value of each sample was normalized (Supplementary Figure 1). 853 DEGs were identified including 363 upregulated genes and 490 downregulated genes, shown in the volcano plot ( Figure 1(a)). Randomly selected 10 CRC and 10 control samples were clustered together, respectively, according to the expression level of top 50 significant DEGs shown in the heatmap (Figure 1(b)).

GO and KEGG Enrichment Analyses of DEGs.
To figure out the functions of DEGs, GO analysis was performed with DAVID tool.
e DEGs were classified into three functional groups: biological processes (BP), cell component (CC), and molecular function (MF). GO analysis results showed that changes in BP were significantly enriched in digestion, behavior, locomotory behavior, ion transport, chemotaxis, taxis, synaptic transmission; changes in CC were mainly enriched in the extracellular region, extracellular region part, extracellular space, cellcell signaling, proteinaceous extracellular matrix, and extracellular matrix; and changes in MF were mainly enriched in hormone activity, cytokine activity, calcium ion binding, passive transmembrane transporter activity, substrate-specific channel activity, growth factor activity, and channel activity (Figure 2(a)).
KEGG pathway analysis by KOBAS revealed that the DEGs were mainly enriched in cytokine-cytokine receptor interaction, drug metabolism, androgen and estrogen metabolism, neuroactive ligand-receptor interaction, nitrogen metabolism, steroid hormone biosynthesis, and so on (Figure 2(b)). From GO and KEGG results, we could anticipate that functions related to hormone activity, cytokine activity, and cytokine-cytokine receptor interaction might play important role in the progress of CRC.

PPI Network Construction and Key Genes Selection.
To discover the key gene, PPI network with 279 nodes and 1201 edges was constructed with the highest stringent minimum required interaction score of 0.09 (Figure 3(a)). A total of 43 genes in the top two modules were identified as key genes with score ≥5. 32 key genes in Module 1 and 11 key genes in Module 2 are shown as the density of nodes in Figures 3(b) and 3(c) ( Table 1).

Hub Genes Selection and Validation.
To select the hub genes from 43 key genes, overall survival analysis was firstly performed using clinical data obtained from the microarray dataset GSE87211. It showed that none of the key genes had a statistically significant relation to overall survival (Table 2). en, the overall survival and diseasefree survival analyses of each hub gene were performed by GEPIA using the TCGA dataset (Table 2). e alteration of CXCL 2 , CXCL 3 , PTGDR2, and GRP genes was statistically associated with a worse overall survival rate, whereas genes CXCL 11 and AGTR1 were statistically associated with a worse disease-free survival rate (Table 2). Patients with a higher gene expression level of CXCL 2 , CXCL 3 , PTGDR2, and CXCL 11 have a significantly better prognosis compared to those with lower expression level, while patients with lower expression of GRP and AGTR1 were shown better survival rates ( Figure 4). e inconsistent overall survival results between GSE87211 and TCGA might be due to the limited size of the clinical samples.
RNA and protein expression levels of CXCL 2 , CXCL 3 , CXCL 11 , and AGTR1 genes were statistically different between the CRC and control samples, and these results were also consistent with the RNA expression level observed in microarray dataset GSE87211 (Figures 5 and 6). Furthermore, the RNA expression level of six hub genes was significantly different in various types of cancer and suggested

Discussion
CRC is a disease caused by high modality. Gene mutations and abnormal expression have been demonstrated in the progression of CRC. Understanding the molecular mechanism of CRC is critically important for diagnosis and treatment. With the development of microarray and highthroughput sequencing, the alternative expression levels of thousands of genes could be simultaneously screened. Integrating and reanalyzing microarray data provide valuable information including hub genes, biological functions, and signaling pathways, which indicate new clues for the diagnosis and treatment of CRC.
In this study, we extracted the expression data of GSE87211 from the GEO dataset with high-quality data and clinical characters. e previous study reported that the microarray dataset GSE87211 was used for examining the expression level of genes potentially linked with the SNPs and studying the CRC risk loci by SNP array analysis [12]. However, no deep microarray analysis was shown in the previous study. In this study, we reanalyze the microarray data to explore the key genes involved in the molecular mechanism of CRC.  853 DEGs were identified, including 363 upregulated genes and 490 downregulated genes. It showed that DEGs were significantly enriched in terms of extracellular region, cell-cell signaling, cytokine-cytokine receptor interaction, drug metabolism, androgen and estrogen metabolism, neuroactive ligand-receptor interaction, and nitrogen metabolism. 43 key genes were identified with potential clinical value. Six hub genes (CXCL 2 , CXCL 3 , PTGDR2, GRP, CXCL 11 , and AGTR1) showed statistically different expression between the CRC and control samples and statistical correlation with the prognosis of CRC patients. CXCL 11 and CXCL 2 are belonged to the chemokine's superfamily, which are group of small secreted proteins. Chemokines could bind to G protein-coupled transmembrane receptors on target cells and recruit cells of the immune system to a site of infection. CXCR3 is G proteincoupled transmembrane receptor. Together, they play fundamental roles in the development, homeostasis, and function of the immune system. CXCL 11 , induced by IFN-c and IFN-b, has a high affinity with CXCR3. e CXCL 11 / CXCR3 axis regulates the differentiation of naive T cells to T helper 1 cells and regulates immune cell migration, differentiation, and activation [14]. e previous study had  demonstrated that chemokines could exert antitumor effects via recruiting T cells, enhancing immune responses, and suppressing tumor-associated angiogenesis [15]. Chimeric molecules CXCL 10 and CXCL 11 had an impressive antitumor efficacy [16]. However, studies also indicated that CXCL 11 played important roles in promoting the chemotaxis activity of TAM, which was related to the poor prognosis of colorectal cancer. It had been reported that the calculated score based on the expression of CXCL 11 , CXCL 9 , and CXCL 10 could stratify nonmetastatic clear-cell renal cell carcinoma (ccRCC) patients into different risk subgroups [17]. CXCR3 was upregulated and played a predominant role in the tumorigenicity of prostate cancer. It showed that overexpression of CXCR3 stimulated the proliferation and migration of cancer cells in vitro and in vivo, which are related to the progression of malignancies [18]. Secretome study in breast cancer cell lines revealed that CXCR3 was highly upregulated in aggressive cancer cells and revealed a functional role of CXCR3 as a potential target for cancer therapy [19].
CXCL 2 (C-X-C motif chemokine ligand 2) significantly enhanced the migration and invasion ability of hepatocellular carcinoma cells (HCCs). CXCL 2 overexpression profoundly attenuated HCC proliferation and growth and induced apoptosis in vivo by negatively regulating the cell cycle via the ERK1/2 signaling pathway [20]. CXCL 2 is expressed at sites of inflammation and may suppress hematopoietic progenitor cell proliferation. However, CXCL 2 has also been reported to act as an oncogene. e methylation status of CXCL 2 was significantly different between normal and hepatocellular carcinoma tissues. Tissues with higher CXCL 2 expression showed significantly more numbers of tumors, indicating that the regulation mechanism may be controlled by CXCL 2 methylation [21]. CXCL 2 knockdown results showed reduced expression of cancer stem cell proteins, cyclins, and EMT markers, mediating through Gαi-2 and Gαq/11 to promote tumorigenesis [22].
PTGDR2, also named CRTH2, is a prostaglandin D2 receptor and preferentially expresses in CD4 + effector T helper 2 cells. PTGDR2 is the receptor of PGD2. Knockdown of PTGDR2 and PGD2 expression in cancer stem cells (CSCs) resulted in enhanced expression of CSC markers and self-renewal ability. PGD2 inhibited tumor growth, incidence rate, and mesenteric metastasis in vivo. Further study showed that the expression of PTGDR2 reversed these effects, indicating a novel function of PGD2/PTGDR2 signaling on CSC regulation and tumorigenesis in gastric cancer [23]. Numbers of research about PTGDR2 focused on studying colon inflammationrelated diseases. e high expression level of PTGDR2 was predominantly observed in the mild inflammation in ulcerative colitis patients [24]. PTGDR2 also played a proinflammatory role in the TNBS-induced colitis model. Antagonism of PTGDR2 had been shown to promote antiallergic and anti-inflammatory effects in Crohn's disease [25]. PTGDR2, a transmembrane protein, may have more potential as cancer targets.
GRP, gastrin-releasing peptide, mainly regulates numerous functions of the gastrointestinal and central nervous systems and plays an important role in human various cancers. GRP expression may be a predictive of aggressive tumor biomarkers for stratifying stages of colorectal cancer [26]. Downregulation of the GRP reduced the numbers of cancer stem cells in vitro and further abolished tumor development in SCID mice [27]. In small-cell lung cancer (SCLC), the high expression level of GRP was related to high disease burden and negative prognostic signature might be used as a potential diagnosis biomarker [28]. AGTR1, angiotensin II receptor type 1, is a potent vasopressor hormone and acts as an important effector controlling blood pressure and volume in the cardiovascular system. DNA methylation of AGTR1 might be a performing candidate biomarker, screened by paired normal and CRC stool samples [29]. AGTR1 mediated cell movement and promoted lymph node metastasis by activating the FAK/ RhoA pathway in early-stage breast cancer [30]. AGTR1 hypermethylation is a promising biomarker in lung squamous cell carcinoma detection and diagnosis [31].

Conclusions
In conclusion, our present study performed a bioinformatic analysis of DEGs between paired normal and CRC to obtain hub genes, providing certain key pathway in the occurrence and progress of CRC. e real function of these hub genes needs to be explored further to determine the clinical and biological mechanism of CRC.  Data Availability e data of GSE87211 were downloaded from the NCBI Gene Expression Omnibus database (GEO, http://www. ncbi.nlm.nih.gov/geo).