Integrated Analysis of Oncogenic Networks in Colorectal Cancer Identifies GUCA2A as a Molecular Marker

Colorectal cancer (CRC) is one of the most common and deadly malignancies in the world. In China, the morbidity rate of CRC has increased during the period 2000 to 2011. Biomarker detection for early CRC diagnosis can effectively reduce the mortality of patients with CRC. To explore the underlying mechanisms of effective biomarkers and identify more of them, we performed weighted correlation network analysis (WGCNA) on a GSE68468 dataset generated from 378 CRC tissue samples. We screened the gene set (module), which was significantly associated with CRC histology, and analyzed the hub genes. The key genes were identified by obtaining six colorectal raw data (i.e., GSE25070, GSE44076, GSE44861, GSE21510, GSE9348, and GSE21815) from the GEO database (https://www.ncbi.nlm.nih.gov/geo). The robust differentially expressed genes (DEGs) in all six datasets were calculated and obtained using the library “RobustRankAggreg” package in R 3.5.1. An integrated analysis of CRC based on the top 50 downregulated DEGs and hub genes in the red module from WGCNA was conducted, and the intersecting genes were screened. The Kaplan–Meier plot was further analyzed, and the genes associated with CRC prognosis based on patients from the TCGA database were determined. Finally, we validated the candidate gene in our clinical CRC specimens. We postulated that the candidate genes screened from the database and verified by our clinical pathological data may contribute to understanding the molecular mechanisms of tumorigenesis and may serve as potential biomarkers for CRC diagnosis and treatment.


Introduction
Colorectal cancer (CRC) is a malignant tumor that ranks third in terms of incidence and second in terms of mortality worldwide [1]. Similarly, the incidence and mortality of CRC rank fifth in China [2]. Despite dramatic reduction in the overall CRC incidence and mortality [3,4], the morbidity rate in China is still rising from 2000 to 2011 [2]. Hence, further research is desperately needed to elucidate the causes for the increasing burden of CRC and to advance treatments for tumor subtypes with low response rates to current therapies. Treatment according to the distinctive tumor biology is an effective means to reduce the mortality of patients with CRC, as well as the detection of biomarkers that enable the stratification of patients with CRC into different prognostic subgroups and in relation therapeutic response [5]. Advances in RNA sequencing technologies and bioinformatics analysis provide novel potential biomarkers and drug targets for tumor treatment [6]. Weighted gene coexpression network analysis (WGCNA), as a systems biology algorithm, can enable the identification of highly coexpressed gene clusters (modules) [7]. en, such interest modules and hub genes related to clinical traits can be screened out and used to identify candidate biomarkers [8].
e robust rank aggregation (RRA) method can be used to integrate multiple sets of chip data gene lists and to perform comprehensive reordering to find the most significant difference genes [9]. RRA prevents cross-platform standardization processing, and the number of samples per chip has no strict limit, which is of great significance for the effective evaluation of the results of different gene expression profiles [10].
In our study, we performed WGCNA on GSE68468 and screened out the key gene modules and hub genes significantly associated with the CRC histology. Additionally, we conducted RRA on six raw data (i.e., GSE25070, GSE44076, GSE44861, GSE21510, GSE9348, and GSE21815) and calculated the top 50 robust DEGs in all the six data by using the library "RobustRankAggreg" package. We analyzed the integrated genes between DEGs and hub genes in the key module by using Kaplan-Meier analysis in the TCGA database and obtained the candidate genes associated with OS. Finally, we validated the candidate gene in our clinical CRC specimens. e candidate genes screened from the database and verified using our clinical pathological data might have significant clinical implications for CRC diagnosis, treatment, and prognosis prediction.

Weighted Gene Coexpression Network Construction and Module Detection.
We performed WGCNA to find the highly correlated genes. e sample dendrogram and trait heatmap are shown in Figure S1A. As shown in Figure S1B, power value 4 was set to guarantee the high-scale independence and low mean connectivity of 13515 genes. We set the dissimilarity as 0.25 to merge similar modules (Figure S1C), and 22 modules were generated (Figure 1(a)). Furthermore, the interaction relationship network of 22 modules was plotted (Figure 1(b)). From those obtained modules, the red module had the deepest association with tumor histology (cor � −0.76, P � 1E − 73), which was selected for further analysis (Figure 1(c)). Additionally, the module memberships in the red module and the gene significance had a high correlation (0.89) and a high P value (<1e − 200) (Figure 1(d)), suggesting its suitability for identifying the hub genes associated with CRC occurrence and metastasis.

Coexpression Network Construction and Hub Gene
Identification. In our study, we obtained 616 genes in the red modules. e hub genes in the red module were filtered by a condition of the weight value being greater than 0.15, and a total of 37 edges were obtained and are visualized in Figure 2. e top 10 hub genes were CA2, MS4A12, DHRS11, GUCA2A, ETHE1, CLCA4, TSPAN1, HSD11B2, AQP8, and CHP2.

RRA Analysis.
We calculated DEGs expressed between the cancerous and adjacent tissues in each dataset and displayed the results by using volcano maps ( Figures S2a-f ). en, DEGs in all six data were recalculated and reorganized using the library "RobustRankAggreg" package. A total of 464 robust DEGs were identified, including 176 upregulated genes and 288 downregulated genes (Table 1), by using adjusted P value < 0.01 and |logFC| ≥ 1 as the cutoff criteria. e 50 most prominent up-and downregulated genes were screened and are visualized in Figures 3(a) and 3(b).

Survival Analysis in the TCGA Database and Validation
in the GEO Database. By taking the intersection of the 50 downregulated DEGs and hub genes in the red module from WGCNA (Figure 4(a)), we obtained 19 interacting genes.
K-M analysis was conducted to evaluate the relationship between gene expression and the overall survival (OS) of CRC, and only GUCA2A was clearly related to the OS of CRC patients in the TCGA database. Patients with a lower GUCA2A expression had a significantly shorter OS than those with a higher expression (P < 0.05) (Figure 4(b)). Obviously, GUCA2A was abnormally expressed in tumor tissues and was significantly different in TCGA and GEO databases (Figures 4(c) and 4(d)). We further validated the aberrant expression of GUCA2A in GSE68468, which contains both CRC tissues and cellular RNA-Seq data. Compared with adjacent normal tissues, the expression of GUCA2A in tumor and metastatic tissues was significantly low, and the normal liver and lung tissues had the lowest expression value (P < 0.05) (Figure 4(e)). e colonic epithelial cell (NCM460) has the highest GUCA2A expression compared with other CRC cells (Figure 4(f )).

Human Tissue Samples and Quantitative Real-Time PCR.
We performed real-time PCR to examine the levels of GUCA2A in 31 paired CRC/adjacent tissues to further validate the dysregulated expression of GUCA2A (Figure 4(g)).
en, we further evaluated the diagnostic value of GUCA2A for CRC tissues and normal counterparts using ROC curve analyses. e results showed that the area under the ROC curve (AUC) was 0.835 (P < 0.001; sensitivity: 0.806; specificity: 0.903).
ese results suggest that GUCA2A downregulation may play an important role in colorectal tumorigenesis and has a potential diagnostic value for CRC patients.

Pathway Analysis.
e pathway enrichment analysis of GUCA2A coexpressed genes was conducted in the Con-sensusPathDB human database. From the 16 pathways shown in Figure 5, the transport of small molecules and metabolism are prominent.

Discussion
Early detection and complete resection before metastasis are considered the only curative therapy for CRC [11]. e fiveyear survival rate of CRC patients was obviously better when the localized disease was detected at an early stage than that of CRC patients with distant metastasis. Cancer is a molecularly heterogeneous disease, and none of the currently identified biomarkers are sensitive and specific enough for reliable CRC screening in the clinical setting. us, identifying novel molecular biomarkers has significant clinical benefits.
In our study, we first performed WGCNA for GSE68468 and screened the pathologically related hub genes. We conducted RRA analysis for six datasets and screened the top 100 robust DEGs, which had a high or low expression in all expression profiles. By taking the intersection, we obtained 19 candidate genes, and only the expression of GUCA2A was associated with the OS of CRC patients in the TCGA database. We found that GUCA2A was prominent and topranking in both the hub gene network ( Figure 2) and robust DEGs (Figure 3(b)), indicating its value in tumorigenesis.
Guanylin (GUCA2A) and uroguanylin (GUCA2B) are two peptide hormones that function as paracrine endogenous ligands for the guanylate cyclase-C (GUCY2C) receptor [12]. A previous study indicated the role of GUCY2C signaling in facilitating mucosal wounding and inflammation mediation, in part, through the control of resistin-like molecule β production [13]. GUCA2A, GUCA2B, and GUCY2C are downregulated in inflammatory bowel disease [14], which may have implications in inflammatory bowel disease pathogenesis. A recent study suggests that GUCY2C can suppress tumor progress [15] in the intestine, and the loss of the GUCY2C signaling cascade increases CRC susceptibility [16]. Intestinal homeostasis disruption and CRC tumorigenesis are associated with a fairly common loss of GUCA2A and GUCA2B [17][18][19]. Bashir et al. revealed the possibility of GUCA2A loss silencing GUCY2C, which leads to microsatellite instability tumor [20].
Consistent with the expression of GUCA2A in our study, the expression of GUCA2B was significantly downregulated in CRC tissues and had a relative high weight in the red module, which further indicates that GUCA2A and GUCA2B may play a consistent role in CRC neoplasia.
In conclusion, we revealed that GUCA2A was downregulated in CRC tissues. Aberrantly expressed GUCA2A can be a candidate marker of poor prognosis in patients with CRC, which may be a therapeutic target for precision medicine in cervical cancer. However, further studies are still needed to explore the underlying molecular mechanism through which GUCA2A plays a role in CRC. However, future in vivo and in vitro experiments are still required to explore the mechanisms underlying the roles of GUCA2A in CRC.           GSE9348  GSE21510  GSE21815  GSE44076  GSE25070  GSE44861   MMP7  THBS2  TGFBI  MMP3  DPEP1  CDH3  KRT23  NFE2L3  GZMB  COL11A1  CXCL1  CEMIP  FAP  SLCO1B3  CLDN1  MYC  IFITM1  TESC  CXCL11  EDNRA  MMP1  FOXQ1  TPX2  COL4A1  TCN1  EGFL6  TRIP13  PMAIP1  CXCL3  PSAT1  CTHRC1  PLS3  INHBA  IFITM2  PHLDA1  CSE1L  BMP4  CRNDE  S100A9  CDK4  XPOT  LY6E  CCL20  TSPAN5  CDC25B LGR5  GSE21510  GSE21815  GSE44076  GSE25070  GSE44861   GUCA2B  GUCA2A  CA4  MS4A12  CA2  AQP8  CLCA1  CLCA4  AKR1B10  CA1  HSD17B2  GCG  CWH43  MT1M  CHP2  FCGBP  ADH1C  CD177  CA12  CHGA  SLC26A3  CEACAM7  PLAC8  CLDN8  PCK1  BEST2  GBA3  MT1H  HMGCS2  SI  GCNT3  EDN3  SLC26A2  TSPAN1  UGT2B17  BCAS1  LRRC19  LGALS2  ANPEP  VSIG2  MT1E  MT1G  NXPE4  KRT20  CA7  MT1F  KLF4  AHCYL2  ITLN1 [7]. en, the modules whose eigengenes were highly correlated (correlation above 0.7) were merged. e gene network was visualized with randomly selected 400 genes. e resulting Pearson correlation matrix was transformed into a matrix of connection strengths (that is, an adjacency matrix) by using a power function (connection). All the modules were assigned to the corresponding color. e relationships of modules and clinical traits (i.e., disease status, histology, and organism part) were calculated. Among these clinical traits, pathology, including normal mucosa, polyps, CRC tissue, CRC with metastases, and normal lung/liver tissues, can reflect the occurrence and metastasis of CRC. e associations of individual genes with the clinical trait, namely, gene significance (GS), and the module eigengenes, namely, module membership (MM), were evaluated. en, the correlation between GS and MM was calculated, and the highly correlated interest module can be used to construct the coexpression network and identify the hub genes.

Coexpression Network Construction and Hub Gene
Identification.
e genes in the key modules screened from WGCNA were further analyzed.
We filtered the hub genes by a condition of the weight value (>0.15) and visualized them using Cytoscape 3.6.0 [21].

Survival Analysis in the TCGA Database.
In order to increase the reliability of the results, the intersection of hub genes in the red module from WGCNA and 50 downregulated robust DEGs was performed and analyzed. Kaplan-Meier (K-M) analysis was conducted to evaluate the relationship between gene expression and the overall survival (OS) of 617 CRC patients in e Cancer Genome Atlas (TCGA) database (https://cancergenome.nih.gov/). Patients were classified into high-or low-expression groups according to the median value. en, genes associated with CRC survival were screened.

Human Tissue Samples and Quantitative Real-Time PCR.
Overall, the 31 CRC and adjacent normal tissues obtained at Jiangsu Cancer Hospital (Nanjing, China) were frozen immediately after surgical resection and kept at 80°C until further analysis. Tumor histopathology was classified according to the World Health Organization Classification of Tumors system. e present study was done with the approval of the local ethics committee.

Node label color
The node label color denotes the type of the gene sets: Neighborhood  RNA isolation (Takara, Dalian, China) was performed according to the manufacturer's instructions. Reverse transcription was conducted with a PrimeScript RT reagent  [30,31] CRC GSE21815 GPL6480 9 132  4.6. Pathway Analysis. We performed pathway enrichment analysis for the coexpressed genes to explore the possible mechanism of the candidate gene in CRC. e coexpressed genes were obtained from the TCGA database, and the top 100 genes with the highest Pearson correlation coefficient were considered to be significantly coexpressed (Table 3). Pathway enrichment analysis was performed by using the ConsensusPathDB human database (http://cpdb.molgen.mpg. de/) [33]. e overrepresentation gene set analysis was utilized, and the following pathway databases were enrolled in our analysis: Manual upload, NetPath, SignaLink, PID, EHMN, HumanCyc, INOH, KEGG, BioCarta, WikiPathways, SMPDB, and PharmGKB. Minimum overlap input list >5 and P value cutoff <0.01 were considered significant enrichment.

Data Availability
e data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
e authors declare that they have no conflicts of interest.