Uncovering potential genes in colorectal cancer based on integrated and DNA methylation analysis in the gene expression omnibus database

Colorectal cancer (CRC) is major cancer-related death. The aim of this study was to identify differentially expressed and differentially methylated genes, contributing to explore the molecular mechanism of CRC. Firstly, the data of gene transcriptome and genome-wide DNA methylation expression were downloaded from the Gene Expression Omnibus database. Secondly, functional analysis of differentially expressed and differentially methylated genes was performed, followed by protein-protein interaction (PPI) analysis. Thirdly, the Cancer Genome Atlas (TCGA) dataset and in vitro experiment was used to validate the expression of selected differentially expressed and differentially methylated genes. Finally, diagnosis and prognosis analysis of selected differentially expressed and differentially methylated genes was performed. Up to 1958 differentially expressed (1025 up-regulated and 993 down-regulated) genes and 858 differentially methylated (800 hypermethylated and 58 hypomethylated) genes were identified. Interestingly, some genes, such as GFRA2 and MDFI, were differentially expressed-methylated genes. Purine metabolism (involved IMPDH1), cell adhesion molecules and PI3K-Akt signaling pathway were significantly enriched signaling pathways. GFRA2, FOXQ1, CDH3, CLDN1, SCGN, BEST4, CXCL12, CA7, SHMT2, TRIP13, MDFI and IMPDH1 had a diagnostic value for CRC. In addition, BEST4, SHMT2 and TRIP13 were significantly associated with patients’ survival. The identified altered genes may be involved in tumorigenesis of CRC. In addition, BEST4, SHMT2 and TRIP13 may be considered as diagnosis and prognostic biomarkers for CRC patients.

Therefore, it is important to understand the pathological mechanism of CRC.
Simons CCJM et al. found that the CpG island methylated phenotype is a major factor contributing to CRC carcinogenesis [13]. Furthermore, gene expression regulation by aberrant DNA methylation is extensively described for CRC. For example, abnormal methylation of septin 9 (SEPT9) is frequently reported in CRC, and the SEPT9 methylation test has been used in early screening for CRC [14][15][16]. In order to further investigate the pathological mechanism of CRC, we performed both integrated analysis and DNA methylation analysis in the Gene Expression Omnibus database to find potential and valuable genes in CRC.

Datasets retrieval
We searched datasets from the GEO dataset with the keywords (Colorectal cancer) AND "Homo sapiens" [porgn:__txid9606]. All selected datasets were gene transcriptome and genome-wide DNA methylation expression data in the CRC tumor tissues and normal controls. Finally, a total of 3 datasets of gene transcriptome data (GSE113513, GSE87211 and GSE89076) and 2 datasets of genome-wide DNA methylation expression data (GSE101764 and GSE129364) were identified (Table 1). Clinical information of above datasets is shown in supplementary Table 1.

Identification of differentially expressed and differentially methylated genes
Firstly, scale standardization was carried out for the common genes in 3 datasets of gene transcriptome data. The metaMA and limma packages were used to identify differentially expressed genes [17]. P values and effect sizes from data were calculated either from classical or moderated t-tests. These p values were combined by the inverse normal method. Benjamini hochberg threshold was used to calculate the false discovery rate (FDR).
Finally, differentially expressed genes were obtained with the criterion of FDR and |Combined.effect size| ≥ 1.5. In addition, quantile standardization was performed for the common genes in 2 datasets of genome-wide DNA methylation expression data. Benjamini hochberg threshold was used to calculate the FDR. COHCAP package in R language was used to identify differentially methylated genes under the threshold of |Δβ| > 0.3 and FDR < 0.05.

Functional analysis of differentially expressed and differentially methylated genes
To understand the function of differentially expressed and differentially methylated genes, we conducted Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis through David 6.8 (https:// david. ncifc rf. gov/). FDR < 0.05 was considered as significant.

PPI network
The BioGRID database was used to retrieve the predicted interactions between top 50 proteins and other proteins. In the network, node and edge represents protein and the interactions, respectively.

Electronic and in vitro validation of differentially expressed and differentially methylated genes
The Cancer Genome Atlas (TCGA) dataset (involved 478 patients with CRC and 41 normal controls) was used to validate the expression of differentially expressed and differentially methylated genes. The expression result of these genes was shown by box plots.
In vitro validation QRT-PCR was also performed. The inclusion criteria of CRC patients was as follows: (1) Patients were diagnosed with CRC according to the pathological examination; (2) Patients underwent radical resection of CRC for the first time and received no chemoradiotherapy before; (3) patients had complete clinical data including medical history of present illness, personal history, family history, detailed physical examination data and  Table 2. The tumor tissue and para-carcinoma tissue of these patients was collected. All participating individuals provided informed consent with the approval of the ethics committee of the local hospital. All the experimental protocol for involving humans was in accordance to guidelines of national/international/institutional or Declaration of Helsinki. Total RNA of the tissue and para-carcinoma tissue was extracted and synthesized DNA by FastQuant cDNA first strand synthesis kit (TIANGEN). Then real-time PCR was performed in the SuperReal PreMix Plus (SYBR Green) (TIANGEN). ACTB and GAPDH were used for internal reference. Relative mRNAs expression was analyzed by log2 (fold change) method.

Diagnosis and prognosis analysis of differentially expressed and differentially methylated genes
We performed the ROC and survival analysis to assess the diagnostic and prognostic value of differentially expressed and differentially methylated genes in the TCGA dataset.

Differentially expressed and differentially methylated genes in the GEO dataset
There were 17,323 common genes in 3 datasets of gene transcriptome data. After scale standardization and differential expression analysis, a total of 1958 differentially expressed genes were identified in CRC. Top 20 differentially expressed genes were listed in Table 3. The heat map of top 100 differentially expressed genes was shown in Fig. 1. Additionally, there were 485,511 common methylation sites in 2 datasets of genome-wide DNA methylation expression data. After quantile standardization and differential methylation analysis, a total  of 2661 differentially methylated sites were screened out in CRC. Correspondingly, there were 858 differentially methylated genes (800 hypermethylated genes and 58 hypomethylated genes) in these differentially methylated sites. The Manhattan and heat map of all differential methylated sites was shown in Fig. 2 and Fig. 3, respectively. Some differentially expressed genes, such as down-regulated GFRA2 was hypermethylated gene. Upregulated MDFI was hypomethylated gene.

Biological function of differentially expressed and differentially methylated genes
All differentially expressed genes were the most significantly enriched in the biological process of DNA replication (Fig. 4A), cytological component of nucleoplasm ( Fig. 4B) and molecular function of protein binding (Fig. 4C). In addition, cell cycle, DNA replication and purine metabolism (involved IMPDH1) were the most remarkably enriched signaling pathways of differentially expressed genes (Table 4). Additionally, all differentially methylated genes were the most significantly enriched in the biological process of homophilic cell adhesion via plasma membrane adhesion molecules (Fig. 5A), cytological component of plasma membrane (Fig. 5B) and molecular function of sequence-specific DNA binding (Fig. 5C). Neuroactive ligand-receptor interaction, calcium signaling pathway, cAMP signaling pathway, cell adhesion molecules (CAMs), PI3K-Akt and Rap1 were the most remarkably enriched KEGG signaling pathways of all differentially methylated genes (Fig. 5D).

Diagnosis and survival prediction of key differentially expressed and differentially methylated genes
Firstly, we performed ROC curve analyses to assess the diagnosis ability of GFRA2, FOXQ1, CDH3, CLDN1, SCGN, BEST4, CXCL12, CA7, SHMT2, TRIP13, MDFI and IMPDH1 in the TCGA dataset (Fig. 9). The AUC of these genes was more than 0.7, which suggested that they had a diagnostic value for CRC. In addition, we further analyzed the potential prognostic value of these genes. The result showed that BEST4, SHMT2 and TRIP13 were considered to be remarkably negatively associated with survival (p < 0.05) time with CRC patients. The survival curves of GFRA2, FOXQ1, CDH3, CLDN1, SCGN, BEST4, CXCL12, CA7, SHMT2, TRIP13, MDFI and IMPDH1 were illustrated in Fig. 10. [18,19]. It is reported that ret. proto-oncogene (Ret) signaling through the combination of GFRA2 and neurturin (NRTN) is associated with the development of enteric nervous system [20]. Macartney-Coxson DP et al. found that GFRA2 was remarkably down-regulated in the process of CRC and possibly related to liver metastasis [21]. In mice, the function inhibition of MyoD family inhibitor (MDFI) promotes the regeneration of the gastrocnemius muscle after injury [22]. In addition, MDFI is over expressed in CRC tumors and high expression of MDFI is associated with tumor metastasis [22]. In this study, we found that down-regulated GFRA2 and upregulated MDFI were differentially expressed-methylated genes in CRC. This indicated that gene methylaton may be associated with gene expression changes. Moreover, GFRA2 and MDFI had a diagnostic value for CRC patients. Our study further demonstrated the key roles of GFRA2 and MDFI in the process of CRC.

GDNF family receptor alpha 2 (GFRA2) plays an important role in immune cells and intermediate monocytes in cancer
Forkhead box Q1 (FOXQ1), a transcription factor, activates target mRNA expression to regulate CRC cell migration, growth, epithelial-mesenchymal transition and chemoresistance [23,24]. It is found that FOXQ1 is over expressed in tumor tissues of CRC and its high expression is significantly related to the stage Fig. 2 The Manhattan of all differential methylation sites in CRC. The x-axis represents the chromosome, the y-axis represents the -log10 (FDR) of differential methylation sites and lymph node metastasis of CRC [25]. In addition, knock-down of FOXQ1 gene reduces the activity of Wnt signaling pathway [25]. These reports suggest that FOXQ1 can be considered as a potential therapeutic target for CRC. Cadherin 3 (CDH3), involved in cellcell adhesion, is used to detect lymph nodes metastatic in patients with CRC [26,27]. It has been demonstrated that hypomethylation is associated with CRC [28]. Furthermore, CDH3 is more frequently demethylated in advanced CRC [29]. In CRC, silencing the CDH3 genes lead to a remarkable decrease in tumor cell viability and proliferation [30]. Claudin 1 (CLDN1) is associated  with CRC tumor invasion, lymph node metastasis and tumor grade and stage [31]. High expression of CLDN1 has been found in primary and metastatic CRC, and CRC cell lines [32][33][34][35]. Additionally, CLDN1 is remarkably hypomethylated in tumor samples of CRC [31]. CLDN1 targeting with the anti-CLDN1 monoclonal antibody reduces growth and survival of CRC cells, which suggest that CLDN1 can be a potential new therapeutic target for CRC [36]. Herein, we found that expression FOXQ1, CDH3 and CLDN1 were top 10 upregulated genes in CRC. Furthermore, FOXQ1, CDH3 and CLDN1 had a diagnostic value for CRC patients. Our findings may provide new insight into the cancer biology of CRC.
Secretagogin, EF-hand calcium binding protein (SCGN) expresses in normal endocrine tissues, such as  hsa04110 Cell cycle  39  4.93E-09 E2F1, E2F3, CDC14A, TTK, PRKDC, PTTG2, CHEK1, CHEK2, CCNE1, CDC45, MCM7, TFDP2, BUB1,  ORC5, ORC6, CCNA2, MYC, TFDP1, ANAPC1, CDK1, RBL1, SKP2, ESPL1, CDC20, MCM2, CDK4,  CDC25C, MCM3, MCM4, CDK2, MCM6, CDC25B, CCNB1, CCND1, HDAC2, CCNB2,  neuroendocrine cells of gastrointestinal tract [37]. In mice, Scgn gene deficient leads to colitis, which highlights the role of Scgn in intestinal immune homeostasis [38]. The expression of bestrophin 4 (BEST4) is decreased in colon tumor, colon adenocarcinoma and rectal adenocarcinoma and CRC [39][40][41][42]. In addition, BEST4 expression is remarkably negatively related to the survival probability of patients with CRC after surgery [42]. C-X-C motif chemokine ligand 12 (CXCL12) plays important roles in the immune system. CXCL12 is associated with promotes CRC tumor cell growth, liver migration, survival rate and recurrence rate [43,44]. It is reported that the CXCL12 gene polymorphism could contribute to CRC by mediating tumor angiogenesis, progression, metastasis and leukocyte migration [45]. It is assumed that the CXCL12-G801A polymorphism can be used to indicate and detect stage T2 CRC [46]. In addition, activation of the CXCL12/C-X-C motif chemokine receptor 4 (CXCR4) axis renders CRC cell less sensitive to radiotherapy [47]. Carbonic anhydrase 7 (CA7) is expressed in  various normal tissues including colon [48]. Decreased expression of CA7 has been found in rectal cancer, rectal adenocarcinoma and CRC [49][50][51]. It is worth mentioning that CRC patients with lower CA7 expression had a remarkable shorter disease-specific survival in early stage tumors [51]. In the present study, we found that SCGN, BEST4, CXCL12 and CA7 were top 10 down-regulated genes in CRC. Both of them had a diagnostic value for patients with CRC. Interestingly, BEST4 was significantly related to survival time of CRC patients. Our result indicated that SCGN, BEST4, CXCL12 and CA7 could be involved in the development of CRC. According to the PPI analysis, we found several high degree proteins encoded by differentially expressed genes, such as serine hydroxymethyltransferase 2 (SHMT2) and thyroid hormone receptor interactor 13 (TRIP13). SHMT2, a key regulator in the serine/glycine metabolism pathway, is involved in cancer proliferation [52,53]. It is revealed that SHMT2 is up-regulated in colon cancer [54]. It is noted that SHMT2 is associated with the occurrence and development of CRC [55]. Moreover, SHMT2 regulation by acetylation plays a crucial role in colorectal carcinogenesis [56]. TRIP13 promotes CRC cell growth, proliferation, invasion, migration and subcutaneous tumor formation [57]. It is found that high expression of TRIP13 is related to poor prognosis in CRC [57]. Additionally, TRIP13 is involved in colorectal adenoma-to-carcinoma progression [58]. In our study, the expression of SHMT2 and TRIP13 was increased in CRC. Significantly, both SHMT2 and TRIP13 had a remarkable diagnostic and prognostic value for CRC.
In addition, we found some significantly enriched signaling pathways of identified genes, including purine metabolism (involved up-regulated inosine monophosphate dehydrogenase 1, IMPDH1), cell adhesion molecules and PI3K-Akt signaling pathway. Spurr IB et al. found that the targeting of de novo purine metabolism was a viable strategy to block Fig. 9 The ROC curves of GFRA2, FOXQ1, CDH3, CLDN1, SCGN, BEST4, CXCL12, CA7, SHMT2, TRIP13, MDFI and IMPDH1 between CRC and normal controls. The ROC curves were used to show the diagnostic ability of these genes with 1-specificity and sensitivity tumor growth in dividing cancer cells [59]. It has been demonstrated that purine metabolism is associated with the tumorigenesis of CRC [60]. The over expression of IMPDH1 has been found in CRC [61]. Some cell adhesion molecules such as selectins and immunoglobulin superfamily proteins play necessary roles in the CRC metastasis [62]. Ngan CY and Zlobec I et al. found that some cell adhesion molecules including E-cadherin and CD44v6 were lost at the invasive front of CRC [63,64]. The PI3K/Akt signaling pathway plays an important role in CRC and inhibition of the pathway is a potential therapeutic strategy of CRC [65,66].

Conclusions
In summary, we have obtained numerous differentially expressed and differentially methylated genes in CRC. Among which, GFRA2 and MDFI, were differentially expressed-methylated genes. It is suggested that DNA methylation may affect the expression changes of gene. Interestingly, GFRA2, FOXQ1, CDH3, CLDN1, SCGN, BEST4, CXCL12, CA7, SHMT2, TRIP13, MDFI and IMPDH1 were considered as the potential diagnostic biomarkers for CRC. In addition, BEST4, SHMT2 and TRIP13 could be used for prognostic detection molecule in CRC patients. However, there are limitations to our study. Firstly, the larger numbers of samples are further needed; Secondly, pyrosequencing and the QRT-PCR of gene methylation are further needed to respectively validate the methylation status and investigate the expression changes of methylated genes. Thirdly, the deeper mechanism study of the CRC is also explored.