A Five-microRNA Signature as Prognostic Biomarker in Colorectal Cancer by Bioinformatics Analysis

Mounting evidence has demonstrated that a lot of miRNAs are overexpressed or downregulated in colorectal cancer (CRC) tissues and play a crucial role in tumorigenesis, invasion, and migration. The aim of our study was to screen new biomarkers related to CRC prognosis by bioinformatics analysis. By using the R language edgeR package for the differential analysis and standardization of miRNA expression profiles from The Cancer Genome Atlas (TCGA), 502 differentially expressed miRNAs (343 up-regulated, 159 down-regulated) were screened based on the cut-off criteria of p < 0.05 and |log2FC|>1, then all the patients (421) with differentially expressed miRNAs and complete survival time, status were then randomly divided into train group (212) and the test group (209). Eight miRNAs with p < 0.005 were revealed in univariate cox regression analysis of train group, then stepwise multivariate cox regression was applied for constituting a five-miRNA (hsa-miR-5091, hsa-miR-10b-3p, hsa-miR-9-5p, hsa-miR-187-3p, hsa-miR-32-5p) signature prognostic biomarkers with obviously different overall survival. Test group and entire group shown the same results utilizing the same prescient miRNA signature. The area under curve (AUC) of receiver operating characteristic (ROC) curve for predicting 5 years survival in train group, test group, and whole cohort were 0.79, 0.679, and 0.744, respectively, which demonstrated better predictive power of prognostic model. Furthermore, Univariate cox regression and multivariate cox regression considering other clinical factors displayed that the five-miRNA signature could serve as an independent prognostic factor. In order to predict the potential biological functions of five-miRNA signature, target genes of these five miRNAs were analyzed by Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathway and Gene Ontology (GO) enrichment analysis. The top 10 hub genes (ESR1, ADCY9, MEF2C, NRXN1, ADCY5, FGF2, KITLG, GATA1, GRIA1, KAT2B) of target genes in protein protein interaction (PPI) network were screened by string database and Cytoscape 3.6.1 (plug-in cytoHubba). In addition, 19 of target genes were associated with survival prognosis. Taken together, the current study showed the model of five-miRNA signature could efficiently function as a novel and independent prognosis biomarker and therapeutic target for CRC patients.

Mounting evidence has demonstrated that a lot of miRNAs are overexpressed or downregulated in colorectal cancer (CRC) tissues and play a crucial role in tumorigenesis, invasion, and migration. The aim of our study was to screen new biomarkers related to CRC prognosis by bioinformatics analysis. By using the R language edgeR package for the differential analysis and standardization of miRNA expression profiles from The Cancer Genome Atlas (TCGA), 502 differentially expressed miRNAs (343 up-regulated, 159 down-regulated) were screened based on the cut-off criteria of p < 0.05 and |log2FC|>1, then all the patients (421) with differentially expressed miRNAs and complete survival time, status were then randomly divided into train group (212) and the test group (209). Eight miRNAs with p < 0.005 were revealed in univariate cox regression analysis of train group, then stepwise multivariate cox regression was applied for constituting a five-miRNA (hsa-miR-5091, hsa-miR-10b-3p, hsa-miR-9-5p, hsa-miR-187-3p, hsa-miR-32-5p) signature prognostic biomarkers with obviously different overall survival. Test group and entire group shown the same results utilizing the same prescient miRNA signature. The area under curve (AUC) of receiver operating characteristic (ROC) curve for predicting 5 years survival in train group, test group, and whole cohort were 0.79, 0.679, and 0.744, respectively, which demonstrated better predictive power of prognostic model. Furthermore, Univariate cox regression and multivariate cox regression considering other clinical factors displayed that the five-miRNA signature could serve as an independent prognostic factor. In order to predict the potential biological functions of five-miRNA signature, target genes of these five miRNAs were analyzed by Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathway and Gene Ontology (GO) enrichment analysis. The top 10 hub genes (ESR1, ADCY9, MEF2C, NRXN1, ADCY5, FGF2, KITLG, GATA1, GRIA1, KAT2B) of target genes in protein protein interaction (PPI) network were screened by string database and Cytoscape 3.6.1 (plug-in cytoHubba). In addition, 19 of target genes were associated with survival prognosis. Taken together, the current study showed the model of five-miRNA signature could efficiently function as a novel and independent prognosis biomarker and therapeutic target for CRC patients.

INTRODUCTION
CRC is a very common gastrointestinal tumor with high incidence and mortality. It was estimated that more than 1.8 million new colorectal cancer cases and 0.88 million deaths will occur in 2018, accounting for about 1 in 10 cancer about incidence and mortality (1). CRC patients usually show a survival rate of <5 years due to early metastasis. Although treatments (such as surgery, radiotherapy, chemotherapy, and targeted therapy) have been developed fleetly, high recurrence, and poor prognosis remain troubling issues (2). Although various biomarkers have been discovered and were associated with the occurrence, progression and prognosis of colorectal cancer to date (3), their reliability remains controversial. Consequently, it is urgent to screen new potential diagnostic and prognostic biomarkers or therapeutic targets for CRC.
MicroRNAs (miRNAs), a vital component of the noncoding RNA family, are approximately made up of 18-25 nucleotides, which almost function via binding 3 ′ untranslated regions(UTR)or 5 ′ UTR of mRNA to suppress translation and promote mRNA cleavage (4). Along with the advances of human genome-sequencing technology, a great number of miRNAs have been abundantly discovered. Increasing evidence demonstrated that miRNAs regulated various oncogenesis processes including cellular proliferation, angiogenesis, differentiation, and apoptosis by binding oncogenes or tumor suppresser genes (5). Zhang et al. displayed miRNA-519b-3p functioned as a tumor suppressor miRNA to suppress colorectal cancer cell proliferation and invasion by regulating the umtck/wnt signaling pathway (6). Wang et al. exhibited that miRNA-496 accelerated epithelialmesenchymal transition and migration of CRC via targeting RASSF6, which was involved in Wnt-pathway (7). Huang et al. demonstrated miR-506 inhibited cell proliferation, invasion, and migration of CRC via reducing NR4A1 expression (8). Studies on miRNA in colorectal cancer are far more than that, there are also some studies on miRNA as prognostic factors, including single, and multiple combinations. Although TCGA database has been used to construct the miRNA signature prognostic models for colon cancer (9,10), there are still some shortcomings with no miRNAs matures, model validation, and risk assessment.
In the present study, we constructed, verified and assessed a novel five-miRNA signature that predicted effectively over survival of CRC patients derived from TCGA database. Functional enrichment analysis revealed potential biological functions and signal pathways of five-miRNA signature associated with cancer, which enhances our understanding to molecular mechanisms of model in CRC.

Identification of Differentially Expressed miRNAs, mRNA, and Their Combination With Patient Survival Data
We used R language 3.6.1 version edgeR package to compare the miRNA and mRNA expression of tumor group with normal group and normalize the expression profile of miRNAs and mRNA, whose mean value was >1, the screening criteria were corrected p value (FDR) <0.05 and |log2FC|>1 (11). We selected the clinical information of patients with survival time ≥30 days and combined it with differentially expressed and standardized miRNA and mRNA expression profiles.

Grouping of Samples and Construction, Validation, and Evaluation of Prognostic Models
We used the R language 3.6.1 version "caret" package to randomly divide the samples with complete survival information and differentially expressed miRNA expression profiles into two groups (train group and test group), and performed univariate Cox regression analysis of miRNAs for the train group.
In order to reduce the number of miRNAs with similar expression, miRNAs with p value< 0.005 were subjected to a stepwise multivariate Cox regression to construct the prognostic model. In the multivariate Cox regression analysis, we took advantage of the function of "Coxph" and "direction=both" in R language survival package (12). Then, the risk score of a prognostic miRNA signature comprising multiple miRNAs was established based on the summation of the product of each miRNA and its coefficient. Furthermore, we tested the Proportional Hazards Assumption in Cox model. This model was used to evaluate the survival prognosis of each patients in train group, test group, entire group using Kaplan-Meier curve, and log-rank test according to median value grouping of risk score, namely high risk group, and low risk group. The predictive power of the miRNA signature was assessed by calculating AUC of 3 years dependent ROC curve using "survivalROC" package (13).

Independent Prognostic Ability of the miRNA Signature Including Other Clinical Variables
The relationship between the prognostic miRNA signature and patients' overall survival was analyzed in the train Frontiers in Oncology | www.frontiersin.org group by univariate Cox regression, as well as clinical variables (including age, gender, and clinical stage, lymph nodes, distant metastasis). Variables with p value < 0.05 in univariate Cox regression were further used for multivariate Cox regression analysis to determine whether they could function as independent prognostic factors. In order to compare the predictive power of this risk model compared to other clinical characteristics, we have drawn ROC curves for this model risk score and clinical characteristics. In addition, we tested the correlation of each miRNA to clinical features by using the SPSS 21.0 chi-square test, with a p-value of < 0.05 being considered meaningfully.

Target Genes Prediction of miRNA Signature and Their Potential Functions
We downloaded the miRNA prediction database from three miRNA target gene prediction websites including miRTarBase (http://mirtarbase.mbc.nctu.edu.tw/), targetScan (http://www. targetscan.org) and miRDB (http://www.mirdb.org/), and used the Perl language to find the target genes of miRNA signature which are covered in at least 2 databases, meanwhile, utilizing the Venn diagram, and Cytoscape 3.6.1 to map the relationship between the miRNA and these target genes. To clarify whether the target genes of these miRNAs are likely to participate in the progression of colorectal cancer, we taken the intersection of these target genes and differentially expressed genes in colorectal cancer. All of these intersection genes obtained were analyzed by Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathway and Gene Ontology (GO) enrichment analysis through the R language "clusterProfiler" package (14) and the "org.Hs.eg.db" package, The p adjust < 0.05 and q value < 0.05 was set as the cut-off criteria.

Screening of Hub Genes and Survival Related Gene
The PPI network of the STRING database (https://string-db. org/) (15) was applied to unearth the relationship between the target genes, the parameter of settings the medium confidence is 0.400. Then, the network relationship file was downloaded and the top 10 hub genes were identified in accordance with Cytoscape 3.6.1 and its plug-in (degrees ranking of cytoHubba). Meanwhile, The Kaplan-Meier method was used to check whether the intersection gene is related to over survival, log rank test < 0.05.  Frontiers in Oncology | www.frontiersin.org

Statistical Analysis
All statistical analyses are based on R language 3.6.1 version and attached packages.

Identification of Differentially Expressed miRNAs and mRNAs
Based on this screening criteria, miRNA mature expression profiles between 464 tumor samples and 9 normal samples showed 502 differentially expressed miRNAs (DEmiRNAs), of which 343 were up-regulated and 159 were downregulated (Figure 1). mRNA expression profiles between 488 tumor samples and 42 normal samples showed 5,540 differentially expressed mRNAs (DEmRNAs), of which 2992 were up-regulated and 2,548 were down-regulated, displayed in Supplemental Table 1.

Construction of the Predictive Five-miRNA Signature
The entire group (N = 421) with miRNA mature expression profiles was randomly divided into train group (N = 212) (Supplemental Table 2) and test group (N = 209) (Supplemental Table 3). The univariate Cox regression analysis displayed that a total of thirty-two miRNAs were found to be associated with patients' overall survival (p value < 0.05) in the train group. For the reliability of the model, eight miRNAs (p value < 0.005) were selected for further analysis ( Table 2).
Prediction of the Five-miRNA Signature for Over Survival in the Train Group, Test Group, and Entire Group Based on median value grouping of risk score. Kaplan-Meier curves shown high risk group had an obviously poorer overall survival compared to low risk group in the train group (p = 1.001E-02), test group (p = 4.164E-04) and entire group (p = 2.12E-05; Figures 3A-C). The train group shown overall survival of 5 years for patients with high and low risk group were 60.0 and 72.8%, respectively. The test group demonstrated that overall survival of 5 years for patients with high and low risk group were 39.9 and 62.7%, respectively. The entire group displayed that overall survival of 5 years for patients with high and low risk group were 53.0 and 62.8%, respectively.
Evaluation of the Five-miRNA Signature for Over Survival in the Train Group, Test Group, and Entire Group The AUC of 3 years dependent ROC for the five-miRNA signature achieved 0.790, 0.679, 0.744, respectively, in the train group, test group and entire group (Figures 3D-F), which demonstrated the better performance of model in predicting CRC patient survival risk. In addition, in the three groups, the patients with high risk score had higher mortality rates than low (Figures 3G-I).

Independence of the Five-miRNA Signature Considering Other Clinical Factors
Univariate Cox regression analysis exhibited that the five-miRNA signature was evidently associated with patients' overall survival (hazard ratio HR = 1.286, confidence interval 95% CI = 1.164-1.420, p = 6.719E-07; Table 3). Multivariate Cox regression analysis pointed out that the five-miRNA signature remained independent with overall survival considering other conventional clinical factors (HR = 1.326, 95% CI = 1.168-1.505, p = 1.23E-05), such as clinical stage, T stage, Lymph-node status, distant metastasis, which makes it possible to be a prognostic marker for CRC in the future. Meanwhile, distant metastasis was also found to be an independent prognostic factor (HR = 2.976, 95% CI = 1.285-6.891, p = 0.01). The ROC curves for this model risk score and clinical characteristics demonstrated that risk score (0.777), clinical stage (0.810), T stage (0.707), Lymphnode status (0.725), and distant metastasis (0.744) had a high predictive ability (Figure 4). In addition, the results about the correlation of each miRNA to clinical features demonstrated hsa-miR-10b-3p was associated with T stage (p = 0.011), hsa-miR-9-5p was associated with age (p = 0.032), and clinical stage (p = 0.049), hsa-mir-3189 was associated with Metastasis (p = 0.002) and clinical stage (p = 0.042; Table 4), which further  suggested that these miRNAs do have a close relationship with some clinical features.

Prediction of Target Genes for the Five miRNAs
The target genes regulated by the five miRNAs, were predicted in at least 2 databases. To further enhance the reliability of the bioinformatic analysis, the overlapping target genes were identified. The results indicated that 41, 272, 701, 31, and 752 overlapping genes were identified for hsa-miR-5091, hsa-miR-10b-3p, hsa-miR-9-5p, hsa-miR-187-3p, hsa-miR-32-5p, respectively, by the three databases above, which were shown using Venn diagram ( Figure 5) and network map of miRNAtarget genes (Supplemental Figure 1). A total of 1,672 target genes was predicted for the five miRNAs. To clarify whether the target genes of these miRNAs are likely to participate in the progression of CRC, the above obtained 5540 DEmRNAs (upregulated 2992, down-regulated 2548) was used for analysis. The intersection of target mRNAs for down-regulated miRNAs (hsa-miR-5091, hsa-miR-187-3p) and upregulated mRNAs, and target mRNAs for upregulated miRNAs (hsa-miR-32-5p, hsa-miR-10b-3p, hsa-miR-9-5p) and downregulated mRNAs were taken. The results were performed on a total of 246 genes including 12 up-regulated genes, 234 down-regulated genes, respectively (Supplemental Figure 2). The sub network between the five miRNAs and their 246 target genes was shown in Figure 6.

Functional Enrichment Analysis of Target Genes Associated CRC
The results of GO annotation about the target genes associated CRC are 234 (Supplemental Table 5). The top fifteen terms from the GO results: biological process (BP), cellular component (CC), and molecular function (MF) were demonstrated in dotplot (Figures 7A-C). In the three categories, BP analysis mostly include axon development, axonogenesis, and stem cell differentiation, CC analysis was mainly contained synaptic membrane, postsynaptic membrane and neuronal cell body, MF analysis mainly contained metal ion transmembrane transporter activity, transcriptional activator activity and DNA binding, ion channel binding. The results of KEGG pathways about the target genes associated CRC are 18 (Table 5), of which counts > 10 were mainly enriched in the cGMP-PKG signaling pathway, cAMP signaling pathway, Calcium signaling pathway, Neuroactive ligandreceptor interaction In addition, to provide a readable graphic representation of the complex relationship between target genes and relative KEGG pathway, the "pathway-gene network" and "pathway-pathway network" was also shown in Figures 7D-F.

DISCUSSION
Colorectal cancer is a highly malignant tumor, which is particularly prone to liver and lung metastasis, seriously affecting the survival prognosis of patients (16). Therefore, finding a prognostic marker with high specificity and sensitivity is becoming more and more urgent for patients. Extensive evidence displayed miRNAs can regulate the expression of abundant genes, playing critical roles in many biological processes of human malignant tumor (17). Especially, recent studies have revealed that distinct miRNA-expression profiles seriously affected the development and progression of CRC (18,19). At present, several miRNAs are known to be used as potential prognostic    (22), and miR-217 (23). However, overwhelming studies manifested that multiple miRNA signature have bigger advantages than single miRNA on the hand of statistically robust analysis. Thence before our study, there have been a lot of prognostic markers based on multiple miRNA signature in tumors (24)(25)(26), especially colorectal cancer (9,10,27). There are many differences between our research and previous studies yet, such as research methods, sample size, and most importantly, we use miRNA matures and sample groupings to validate the model. In the current study, we download mature miRNA expression profiles and corresponding patients' clinical information of CRC   (31). However, the current research mechanism of hsa-miR-5091 in tumors has not been reported yet, so more experiments in the future need to be carried out to hsa-miR-5091, especially in CRC.
To further understand the regulatory mechanism of the five-miRNA signature in colorectal cancer, the target genes of five miRNAs in the model were predicted by three target gene prediction databases. At the same time, based on the study of colorectal cancer, we obtained the intersection of the target genes of these miRNAs and the differentially expressed genes from the TCGA database, and performed functional enrichment analysis on these intersection genes. The GO annotation of the target genes was mainly associated with axon development, axonogenesis and stem cell differentiation, synaptic membrane, postsynaptic membrane, and neuronal cell body, metal ion transmembrane transporter activity, transcriptional activator activity and DNA binding, ion channel binding. The signal pathways of the target genes mainly enriched in the cGMP-PKG signaling pathway, cAMP signaling pathway, Calcium signaling pathway, Neuroactive ligand-receptor interaction. Ren et al. illuminated that the cGMP/PKG signaling pathway played an essential role on proliferation and survival of human renal carcinoma cells (32). Park et al. displayed that the cAMP signaling pathway regulated by the Epac-Rap1-Akt pathway caused suppression of JNK-dependent HDAC8 degradation, which augments cisplatin-induced apoptosis by inhabiting TIPRL expression in lung cancer cells (33). Monteith GR reviewed that calcium signaling pathway not only played key role on proliferation, invasion and sensitivity to cell death, but also in the establishment and maintenance of multidrug resistance and the tumor microenvironment (34). These signaling pathways show their effects on tumors to varying degrees, and these three signaling pathways are only the tip of the iceberg of the target gene involved in signaling pathway, which prompts that our constructed miRNA prognosis model may be involved in the regulation of tumor signaling pathways.
In order to find key nodes of the miRNA signature model regulating colorectal cancer 10 hub genes (ESR1, ADCY9, MEF2C, NRXN1, ADCY5, FGF2, KITLG, GATA1, GRIA1, KAT2B) were screened according to Cytoscape 3.6.1 and its plug-in (degree ranking of cytoHubba). In addition, the Kaplan-Meier method showed that the expression of 18 genes (AHCYL2, AKR1B10, CBFA2T3, CCNJL, CCR9, CLIC5, DPP10, FAM46C, GATA1, IQGAP2, MAN1A1, MIER1, NR5A2, PHLPP2, PTGER4, RBM47, RPS6KA5, TSPAN11) were positively associated with survival prognosis, however the high expression of SRCIN1 shown a poorer over survival. Surprisingly, GATA1 (GATA binding protein 1) is not only a key gene in the PPI network, but also related to over survival of patients, which encodes s a protein which belongs to the GATA family of transcription factors and promoted erythroid development via adjusting the switch of fetal hemoglobin to adult hemoglobin. Wang et al. pointed out that decreased of GATA-1 was to the benefit of high expression of IRF-3 in lung adenocarcinoma cells by binding with a specific domain of IRF-3 promoter, consequently, alternating the immunomodulatory function in tumorigenesis (35). Thus, the miRNA signature may affect the survival prognosis of colorectal cancer patients and the colorectal cancer progression through regulating GATA1.

CONCLUSION
In summary, our study not only constructed a new predictive model of miRNA signature prognosis through miRNA mature expression profiling, but also by grouping to verify and evaluating the predictive ability of the model, the most important thing is that it can be used as an independent prognostic factors in CRC. In addition, the potential function is inferred by predicting the target genes of the model, which enhance our comprehension to tumorigenesis and progression of CRC. However, this is just a study based on the TCGA database using bioinformatics. We hope that there will be other databases and a large number of experiments to verify the feasibility of this prognostic model in the future and provide a reliable predictor and therapeutic target for CRC patients.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

AUTHOR CONTRIBUTIONS
YZ downloaded the miRNA and mRNA expression information. GY constructed miRNA signature model and performed the statistical analysis using R language software, and wrote the first draft of the manuscript. JY contributed conception and design of the study and checked the manuscript.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fonc. 2019.01207/full#supplementary-material Supplemental Figure 1 | The network map between miRNAs and target genes. The hexagon represents miRNA, the circle stands for mRNA. Red means upregulated, blue means downregulated.
Supplemental Figure 2 | The intersection of target mRNAs for miRNA and differentially expressed mRNAs.
Supplemental Table 1 | Differentially expressed mRNAs between colorectal cancer samples and normal samples.
Supplemental Table 2 | GO annotation of the target genes.
Supplemental Table 3 | Survival information of differentially expressed miRNA train group.
Supplemental Table 4 | Proportional Hazards Assumption in Cox model. Table 5 | GO annotation of the target genes.