Bioinformatic screening for candidate biomarkers and their prognostic values in endometrial cancer

Endometrial cancer is a common gynecological cancer with annually increasing incidence worldwide. However, the biomarkers that provide prognosis and progression for this disease remain elusive. Two eligible human endometrial cancer datasets (GSE17025 and GSE25405) were selected for the study. A total of 520 differentially expressed mRNAs and 30 differentially expressed miRNAs were identified. These mRNAs were mainly enriched in cell cycle, skeletal system development, vasculature development, oocyte maturation, and oocyte meiosis signalling pathways. A total of 160 pairs of differentially expressed miRNAs and mRNAs, including 22 differentially expressed miRNAs and 71 overlapping differentially expressed mRNAs, were validated in endometrial cancer samples using starBase v2.0 project. The prognosis analysis revealed that Cyclin E1 (CCNE1, one of the 82 hub genes, which correlated with hsa-miR-195 and hsa-miR-424) was significantly linked to a worse overall survival in endometrial cancer patients. The hub genes and differentially expressed miRNAs identified in this study might be used as prognostic biomarkers for endometrial cancer and molecular targets for its treatment.


Background
Endometrial cancer (EC), that is, uterine corpus endometrial carcinoma (UCEC), originates from the epithelial malignant tumours in endometrium. With an increase in obesity and an aging population, the incidence and mortality rates of EC are increasing in developed countries [1]. According to the latest statistics of the American Cancer Society [2], over 61, 000 cases were estimated to be diagnosed with EC in 2017. At present, advanced stage EC still accounts for 20-30% of incidents, and the disease relapse is associated with a poor prognosis.
Due to these factors reduce the clinical value of the existing biomarkers in the progress and prognosis of EC, it is crucial to discover new reliable biomarkers as well as to unravel the underlying molecular mechanisms of the EC progression.

Functional and pathway enrichment analysis
The functional and pathway enrichment analyses of DEGs were conducted with DAVID. The upregulated genes were mainly enriched in these biological processes, which were cell cycle, cell division, and DNA replication signalling pathways, while downregulated genes were mainly enriched in skeletal system development, vasculature development, and cell adhesion signalling pathways (Table 3). Moreover, three KEGG pathways were enriched in upregulated genes, including cell cycle, oocyte maturation, and oocyte meiosis signalling pathways (Table 3). There were no KEGG pathways enriched in downregulated genes.

Construction of PPI network and module analysis
A PPI network consisting of 287 nodes and 1840 edges was constructed, which included 212 upregulated and 308 downregulated genes (Fig. 2). Next, 82 genes were screened out as hub genes (Degree of interaction ≥10 were selected as the threshold) [13], there were close correlations among hub genes (Fig. 3, Additional file 1). After analysing the network with the MCODE tool in Cytoscape software, an important module was obtained, including 50 nodes and 1082 edges (Fig. 4). Functional enrichment analyses of biological processes with regard to this module showed that these genes were enriched in cell cycle, cell division, and DNA replication signalling  (Table 4). Three KEGG analysis showed an enrichment in cell cycle, oocyte meiosis, and oocyte maturation signalling pathways ( Table 4).

Discussion
In recent years, although clinical medical scientists have made significant progress in the treatment of EC with surgery and chemotherapy, the incidence and mortality rate of EC is still rising [14]. It is necessary to further understand the etiology and underlying mechanism of the EC progression to improve the prognosis of EC. DEGs differentially expressed genes, EC endometrial cancer, NE normal endometrium, FC fold-change; adj. P-value, adjusted P-value; adjusted P-value was obtained by correcting P-value using the 'Benjamini-Horchberg' method DEMs, differentially expressed miRNAs, EC endometrial cancer, NE normal endometrium, miRNA or miR, microRNA, FC, fold-change; adj. P-value, adjusted P-value; adjusted P-value was obtained by correcting P-value using the 'Benjamini-Horchberg' method In this study, by integrating GSE17025 and TCGA-UCEC datasets, 520 common DEGs were screened out in EC tissues compared with NE tissues. These 520 common DEGs were composed of 212 upregulated genes and 308 downregulated genes. The upregulated DEGs, such signalling pathways, were mainly enriched as cell cycle, cell division, and DNA replication. Skeletal system development, vasculature development, and cell adhesion signalling pathways were enriched among downregulated DEGs. Furthermore, PPI network was built for 82 hub genes. Survival association analysis of these 82 hub genes showed poor prognosis associated with 26 upregulated genes and one downregulated gene for patients with EC. Similarly, 30 common DEMs were analysed from GSE25405 and TCGA-UCEC_miRNA datasets. After integrating 6865 TG-miRNAs with these 520 common DEGs, 71 overlapping DEGs were screened that showed close correlations with 22 common DEMs in EC (Fig. 5, Additional file 2). Moreover, high mRNA expression of CCNE1 (one of the 82 hub genes, which was correlated with hsa-miR-195 and hsa-miR-424) was significantly correlated with worse overall survival in EC patients.
miRNAs are endogenous small non-coding RNAs, which can inhibit gene expression by mRNA degradation/destabilization or through impairing translation [15,16]. The abnormal expression of miRNAs occurs in a variety of tumours and is often associated with altered tumour characteristics, such as changes in tumour cell survival, proliferation, and invasion [17].
Our study outcomes suggested that hsa-miR-200b was also upregulated in EC, and the observation was consistent with the previous study [24]. Hsa-miR-200c has been widely investigated during the last few years. There have been numerous studies demonstrating the association between an aberrant expression level of miR-200c and the prognosis of various human malignancies, such as  [18,25,26], prostate cancer [27], ovarian cancer [28], and endometrial cancer [29]. Some of these studies verified the anti-oncogenic role of miR-200c in certain cancer types, indicating the potential correlation of elevated expression levels of miR-200c and superior prognosis [26,28,29]. In contrast, other studies have suggested that miR-200c serves as an oncogene [18,25,27]. Nevertheless, these findings suggest that miR-200c is a potential biomarker for cancer prognosis. Our results also suggested that hsa-miR-200c was upregulated, and the observation was consistent with the previous study [29]. Recent reports have shown that hsa-miR-429 expression is frequently upregulated in several cancers and may function as an oncogene [30,31] in cancers, such as endometrial carcinoma [30], as observed in this study. One study showed that upregulation of hsa-miR-429 is associated with a decrease in overall survival of serous ovarian cancer [32]; in contrast, other studies have shown that hsa-miR-429 was downregulated in some malignant tumours and had tumour-suppressor function [33,34]. These results indicate that hsa-miR-429 plays different (even opposite) roles in tumorigenesis and cancer progression in different tumours. Hsa-miR-141 is also an important member of the miR-200 family, several previous studies have shown that has-miR-141 was involved in prognosis of cancer [35][36][37]. Some previous studies reported that hsa-miR-424 was downregulated and could have a tumour suppressor role in some cancers [38][39][40]. In line with these observations, our present study also showed that hsa-miR-424 was downregulated [40]. Hsa-miR-195 is a member of the miR-15a, −15b, − 16, − 195, − 424, and − 497 families, which is involved in the occurrence and developmental progress of many malignant tumours and regulation of malignant biological behaviours [40][41][42][43]. In our study, hsa-miR-195 in EC tissues showed lower expression levels compared with NE tissues, which was consistent with the previous study [42]. So far, there are only few reports on the role of hsa-miR-653 in the malignant biological behaviour of tumours.
CCNE1, that is Cyclin E1, belongs to the cyclin family which, through association with cyclin-dependent kinase 2, controls cell cycle progression from G1 to S phase [44]. Previous studies have shown that the upregulation of CCNE1 could contribute to cancer development or tumorigenesis in many cancers [45][46][47][48][49][50], and CCNE1 could serve as a reliable independent prognostic marker [49,50]. miRNAs from multiple families have been identified to target CCNE1 in a number of malignant tumours, such as hepatocellular carcinoma [51], osteosarcoma [52], cervical cancer [53], and bladder cancer [54]. In the present study, survival analysis of the hub genes related to DEMs showed that high expression of CCNE1 could indicate poor prognosis in EC patients.
There are some defects in this article. Such as, the overlapped miRNAs were about only 1/4 to 1/5 between Fig. 3 Protein-protein interaction network of hub genes of the differentially expressed genes in endometrial cancer tissues compared with normal endometrium tissues. Green and red nodes represent upregulated and downregulated genes, respectively. The edges/lines stand for the regulatory association between nodes GSE25405 and TCGA-miRNA, and some of the findings need further experimental validation in future studies.
With regard to the ratio of the overlapped miRNAs is low, the following observations may explain the possible reasons. Firstly, the ethnic origins of the chip and RNA-seq samples were different. The GSE25405 data was composed of Asians, while the TCGA-miRNA data was mainly composed of European Americans and African Americans. Secondly, the sample sizes were also different; while GSE25405 included 48 samples (41  endometrial cancer tissue samples, 7 normal endometrial tissue samples), the TCGA data sample size was larger and (after the author has screened and processed the relevant data) a total of 572 samples were included (539 tissue samples from endometrial cancer patients and 33 normal controls). Last but not least, the efficacy of RNA-seq detection and chip detection were different. It is well known that when detecting genes with higher abundance, the results of RNA-seq and chip may be similar, however, when detecting genes with lower abundance, RNA-seq can more effectively capture relevant information. As for the latter topic, we believe that the outcomes of the present study provide credible base for future research. For example, verifying the expression of   selected miRNA (such as, miR-195 and miR-424.) in endometrial cancer cell lines and endometrial cancer tissue samples through PCR experiments, and in animal models may shed light on the role of these miRNAs in affecting the malignant biological process of endometrial cancer. Further, verifying the differential expression of miRNA in a large number of clinical samples and to analyse its correlation with clinical parameters (such as tumour clinical stage, pathological stage classification, recurrence, metastasis, and prognosis.) will help to determine the diagnosis and prognostic value of these miRNA in endometrial cancer patients. For another example, one or more hub genes can be selected to verify their mRNA and protein expression in endometrial cancer cell lines and endometrial cancer tissue samples. And then study the effect of genes, which were knocked out or overexpressed or mutated, on the biological process of endometrial cancer cell lines (such as, tumour cell proliferation, transformation, migration and invasion, blood vessel formation, and energy metabolism.) and its participation in molecular mechanism of action / signal transmission research. What's more, to establish a subcutaneous transplanted tumour model, to introduce the target gene into the animal body, to observe the effect on tumour growth in the body, and further analyse the molecular mechanism or signal transmission of the target gene to provide potential targets for tumour gene therapy. Lastly, to verify the different expression of genes in a large number of clinical samples and analyse its correlation with clinical parameters to determine the diagnosis and prognostic value of genes in endometrial cancer patients. Next, our clinical research team will select some miR-NAs to verify the relationship between miRNAs and target genes through clinical experiments and their value in the diagnosis and prognosis of endometrial cancer patients.

Microarray expression data
The mRNA and miRNA expression data of the GSE17025 and GSE25405 datasets were respectively downloaded from the GEO database (https://www.ncbi. nlm.nih.gov/geo/). The mRNA dataset GSE17025 contained the data from 103 samples, including 91 EC tissue samples and 12 normal endometrium (NE) samples. mRNA expression profiles in this dataset were measured using the GPL570 [HG.U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array platform [55]. The miRNA dataset GSE25405 contained 41 EC tissue samples and 7 NE tissue samples. In this dataset, the miRNA expression profile was detected using the GPL7731 Agilent-019118 Human miRNA Microarray 2.0 G4470B platform.

Identification of DEGs and DEMs
The Limma package (version 3.36.5) in R/Bioconductor was used to identify differentially expressed genes (DEGs) and differentially expressed miRNAs (DEMs) between EC and NE tissue samples [56]. The adjusted Pvalue (adj.P-value) was obtained by correcting P-value using the 'Benjamini-Hochberg' method, adj.P-value < 0.05 and |log 2 fold change (FC)| > 1 were set as the threshold value [57]. The original probe-level data in Series Matrix Files were converted into gene symbol based on platform annotation files. The expression values of multiple probes corresponding to the same gene were selected by the minimum adj.P-value.

Functional and pathway enrichment analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.ncifcrf.gov) facilitates users to perform biological analysis from data collection [58]. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted with DAVID. FDR < 0.05 was set as statistically significant.

Construction of PPI network and module analysis
PPI network of DEGs was constructed using STRING database (version 11.0, https://string-db.org/) and visualized using Cytoscape (version 3.7.1) [59,60]. The parameter was set as medium confidence score ≥ 0.7, module analyses were conducted using Cytoscape software MCODE package with degree cut-off = 2, node score cut-off = 0.2, max depth = 100 and k-score = 2 [61]. The functional enrichment analyses for these DEGs in the modules were conducted with DAVID.

Prediction of the target gene of miRNA
The target genes for miRNAs (TG-miRNAs) were predicted by employing miRecords (http://c1.accurascience. com/miRecords/), which includes 11 different miRNA target genes predicted databases [62]. A TG-miRNA can only be identified when at least four different prediction databases predict that the gene is a target gene.

Construction of the miRNA-mRNA regulatory network
The intersection of TG-miRNAs and DEGs were considered to be potentially valuable differentially expressed target genes. Pearson correlation analysis was then used in starBase (http://starbase.sysu.edu.cn/) to verify the association between these potentially valuable differentially expressed target genes and DEMs in patients with EC [63]. These significant differentially expression target genes and corresponding miRNAs were used to construct a miRNA-mRNA regulatory network using the Cytoscape software. The Degree of interaction of the node ≥5, which was defined as a hub miRNA.

Survival analysis of hub genes
The overall survival of patients with EC with regard to hub genes was calculated using Kaplan-Meier analysis in OncoLnc (http://www.oncolnc.org/) [64]. The patients were divided into two groups (high vs. low) according to the median values of mRNA expression of the hub gene. The log-rank test was used to examine the significance of the difference between two groups.
Additional file 1. Node-degree of interaction analysis of the 82 hub genes (Degree of interaction ≥10).
Additional file 2. Correlation between differentially expressed miRNAs and target genes in patients with endometrial cancer (Data source: starBase v2.0 project).