A novel mRNA-miRNA-lncRNA competing endogenous RNA triple sub-network associated with prognosis of pancreatic cancer

Background: Recently, increasing evidence has uncovered the roles of mRNA-miRNA-lncRNA network in multiple human cancers. However, a systematic mRNA-miRNA-lncRNA network linked to pancreatic cancer prognosis is still absent. Methods: Differentially expressed genes (DEGs) were first identified by mining GSE16515 and GSE15471 datasets. DAVID database was utilized to conduct functional enrichment analysis. Protein-protein interaction (PPI) network was built using STRING database, and hub genes were identified by Cytoscape plug-in CytoHubba. Upstream miRNAs and lncRNAs of mRNAs were predicted by miRTarBase and miRNet, respectively. Expression, survival and correlation analysis for genes, miRNAs and lncRNAs were performed via GEPIA, Kaplan-Meier plotter and starBase. Results: 734 and 180 upregulated and downregulated significant DEGs were identified, respectively. Functional enrichment analysis revealed that they were significantly enriched in focal adhesion, pathways in cancer and metabolic pathways. According to node degree, hub genes in the PPI networks were screened, such as TGFB1 and ALB. Among the top 20 hub genes, 7 upregulated genes and 2 downregulated hub genes had significant prognostic values in pancreatic cancer. 33 miRNAs were predicted to target the 9 key genes. But only high expression of 8 miRNAs indicated favorable prognosis in pancreatic cancer. Then, 90 lncRNAs were predicted to potentially bind to the 8 miRNAs. SCAMP1, HCP5, MAL2 and LINC00511 were finally identified as key lncRNAs. By combination of results from expression, survival and correlation analysis demonstrated that MMP9/ITGB1-miR-29b-3p-HCP5 competing endogenous RNA (ceRNA) sub-network was linked to prognosis of pancreatic cancer. Conclusions: In a word, we established a novel mRNA-miRNA-lncRNA sub-network, among which each RNA may be utilized as a prognostic biomarker of pancreatic cancer.


INTRODUCTION
Pancreatic cancer ranks as the third leading cause of cancer-related deaths in the United States [1]. Pancreatic cancer is one of the most quickly fatal cancers all over the world, with mortality almost equal to its incidence [2]. Besides, pancreatic cancer lacks of obvious early symptoms, thereby leading to most patients with pancreatic cancer being diagnosed at advanced stages. Over the past years, in spite of huge improvements in surgery, chemotherapy and radiotherapy for pancreatic cancer have been achieved, prognosis is still extremely dismal with nearly 100% of 5-year mortality rate [2,3]. Furthermore, to date, the precise mechanisms how pancreatic cancer occurs and progresses are still not clearly elucidated. It is essential to explore underlying molecular mechanisms and develop effective therapeutic targets and novel prognostic biomarkers for pancreatic cancer.
In 2011, Salmena et al. proposed a novel regulatory mechanism between noncoding RNA (ncRNA) and messenger RNA (mRNA), namely competing endogenous RNA (ceRNA) hypothesis [4]. In this theory, cross-talk between ceRNAs achieves by competitively binding to shared miRNAs [5]. The discovery of ceRNA mechanism has attracted the attention of many researchers and scholars. They conducted a variety of related investigations in respective study field, including cancer. Long noncoding RNAs (lncRNAs) are a class of ncRNA with length more than 200 nucleotides, which have been reported to act as miRNA sponges to decrease miRNA abundance, thus relieving inhibitory effect of miRNA on downstream target genes [6][7][8][9]. Increasing evidence has well documented that lncRNA-miRNA-mRNA ceRNA network plays key roles in multiple human cancers, such as breast cancer [10], gastric cancer [11], liver cancer [12] as well as pancreatic cancer [13]. identified from GSE15471. X axis represents log transformed P value, and Y axis indicates the mean expression differences of genes between pancreatic cancer samples and normal samples. Note: The two volcano plots showed all of the DEGs; the black dots represent genes that are not differentially expressed between pancreatic cancer samples and normal samples, and the green dots and red dots represent the downregulated and upregulated genes in pancreatic cancer samples, respectively. |log2FC| >1 and adj. p-value < 0.05 were set as the cut-off criteria. (C) The intersection of upregulated DEGs of GSE16515 and GSE15471 datasets. (D) The intersection of downregulated DEGs of GSE16515 and GSE15471 datasets. The intersected DEGs were defined as the significant DEGs. AGING However, current knowledge for lncRNA-miRNA-mRNA in human cancers is not enough, including pancreatic cancer.
In this study, we first acquired aberrantly expressed mRNAs by mining two GEO datasets. Subsequently, we conducted functional enrichment analysis for these aberrantly expressed mRNAs. Then, protein-protein interaction analysis was also employed, and hub genes were identified. By combining expression and prognostic roles of hub genes in pancreatic cancer, 7 upregulated genes and 2 downregulated genes were selected for subsequent analysis. Next, upstream miRNAs and lncRNAs were predicted. Besides, we also further evaluated the prognostic roles of these miRNAs and lncRNAs in pancreatic cancer. The correlations of mRNAs, miRNAs and lncRNAs were also determined. Finally, a novel ceRNA regulatory sub-network associated with pancreatic cancer patients' prognosis was successfully established. Intriguingly, each RNA in this ceRNA network may be utilized to indicate prognosis of pancreatic cancer based on our current analytic results. They may also serve as promising diagnostic biomarkers or therapeutic targets for pancreatic cancer in the future.

Screening of significant DEGs in pancreatic cancer
In search of gene expression microarrays regarding pancreatic cancer in the GEO database, two datasets (GSE16515 and GSE15471) were finally included. Subsequently, differential expression analysis was conducted by using GEO2R (|log2FC| > 1 and adj. pvalue < 0.05), and some DEGs in each dataset were discovered. These DEGs from GSE16515 and GSE15471 datasets were shown in Figure 1A and Figure 1B, respectively. Next, we further identified some significant DEGs which were commonly appeared in the two datasets. As shown in Figure 1C and Figure  1D, a total of 734 and 180 upregulated and downregulated significant DEGs in pancreatic cancer were identified. These upregulated and downregulated significant DEGs were concretely listed in Table S1 and  Table S2, respectively. Also, these significant DEGs were selected for subsequent analyses.

Functional enrichment analysis for the significant DEGs
To predict the underlying biological function and corresponding pathways of these significant DEGs, DAVID database was introduced to perform functional enrichment analysis, including three GO terms (BP: biological process; CC: cellular component; MF: molecular function) and KEGG pathway.
For upregulated significant DEGs, as presented in Figure 2A-C, the enriched GO functions included cell adhesion, extracellular matrix organization and wound healing in the BP category; protein binding, calcium ion binding and cadherin binding involved in cell-cell adhesion in the MF category; and extracellular exosome, plasma membrane and membrane in the CC category. Besides, Figure 3A revealed that these upregulated significant DEGs were significantly enriched in some cancer-associated pathways, such as pathways in cancer, focal adhesion, proteoglycans in cancer and small cell lung cancer.
As shown in Figure 2D-E, the enriched GO functions for downregulated significant DEGs included proteolysis, transport and metabolic process in the BP category; protein homodimerization activity, enzyme binding and pyridoxal phosphate binding in the MF category; and integral component of membrane, extracellular exosome and integral component of plasma membrane in the CC category. Similarly, some enriched KEGG pathways were also observed, among www.aging-us.com 2614 AGING which metabolic pathways, pancreatic secretion and glycine, serine and threonine metabolism were the most highly enriched pathways ( Figure 3B).

Establishment and analysis of PPI network
On the basis of the data from STRING database analysis, PPI networks of the upregulated significant DEGs and downregulated significant DEGs were constructed as shown in Figure 4A and Figure 4C, respectively. According to node degree, we identified some hub genes among these significant DEGs. For better visualization, the interactors of top 30 upregulated ( Figure 4B) and downregulated ( Figure 4D) hub genes were re-built using Cytoscape software. Additionally, the top 30 hub genes and their AGING corresponding node degrees were listed in Table 1 and top 10 upregulated hub genes were TGFB1, MMP9, CXCL8 (IL8), ACTB, ITGB1, STAT1, TOP2A, ACTA2, ICAM1 and CDK1, and top 10 downregulated hub genes were ALB, EGF, P4HB, MAT1A, GNMT, ABAT, CBS, CTH, PRSS3 and ECI2. The 20 hub genes were chosen for following analyses.

Identification of key genes in pancreatic cancer
In order to further identify key genes in pancreatic cancer, we determined the expression and prognostic values of the top 10 upregulated and downregulated hub genes using GEPIA and Kaplan-Meier plotter databases, respectively. Combined the results of expression analysis and survival analysis, we found that 7 upregulated hub genes (MMP9, CXCL8, ACTB, ITGB1, STAT1, TOP2A and CDK1) were not only significantly upregulated in pancreatic cancer samples but also the increased expression of the 7 genes indicated poor prognosis in patients with pancreatic cancer ( Figure 5A and Figure 5C-I) , and 2 downregulated hub genes (GNMT and ABAT) were commonly appeared in "low expression" gene set and "good prognosis" gene set ( Figure 5B and Figure 5J-K). In next analyses, we are interested to investigate the 9 key genes, including 7 upregulated hub genes and 2 downregulated hub genes.

Prediction and validation of upstream key miRNAs of key genes
Subsequently, we predicted upstream miRNAs of the 9 key genes by using an experimentally validated microRNA-target gene interactions database, miRTarBase. As mentioned above, in this study, we only included microRNA-target gene interactions that were validated by reporter assay. Finally, we identified a total of 33 miRNAs that could potentially regulate six key genes (ITGB1, MMP9, STAT1, CXCL8, CDK1 and ACTB) expression as presented in Figure 6 and Table S3. Upstream potential miRNAs of three other AGING key genes (TOP2A, GNMT and ABAT) were not observed. Besides, we noticed that all the six key genes were upregulated hub genes, with unfavorable prognostic values in pancreatic cancer. Based on the classical inverse relationship between miRNA and target gene, we hypothesized that the upstream miRNAs   AGING of 6 upregulated key genes should theoretically display favorable prognostic roles. Therefore, we further assessed prognostic values of the 33 predicted miRNAs using Kaplan-Meier plotter database. The results of survival analysis showed that 8 (miR-132-3p, miR-133a-5p, miR-29b-3p, miR-491-5p, miR-192-5p, miR-29c-3p, miR-9-3p and miR-140-5p) out of 33 miRNAs functioned as positive prognostic biomarkers for patients with pancreatic cancer as presented in Figure 7. The 8 miRNAs were defined as the key miRNAs.

DISCUSSION
Pancreatic cancer is notorious for its highly lethal nature and poor prognosis. Extremely poor prognosis of patients with pancreatic cancer greatly promotes us to develop effective treatment measures and excavate novel prognostic indicators. Only in these ways can the outcome of pancreatic cancer patients be improved rapidly. Recent studies have suggested that ncRNAs, including miRNAs and lncRNAs, play important roles in cancer initiation and progression [16][17][18][19][20]  AGING H19 promoted non-small-cell lung cancer development by STAT3 signaling via sponging miR-17 [23]. In pancreatic cancer, many positive results have been also reported. Gao et al. demonstrated that lncRNA ZEB2-AS1 promoted pancreatic cancer cell growth and invasion by regulation of the miR-204/HMGB1 axis [24]; tumor-derived exosomal lncRNA SOX2OT was also observed to enhance EMT and stemness by acting as a ceRNA in pancreatic cancer [25]; Chen et al. indicated that lncRNA AFAP1-AS1 facilitated pancreatic cancer growth and invasion by upregulating the IGF1R oncogene through sequestration of miR-133a [26]; Gao et al. suggested that lncRNA ROR acted as a ceRNA to modulate Nanog expression by sponging miR-145 and predicted poor prognosis in pancreatic cancer [27]. However, integrated and comprehensive analysis of ceRNAs and mRNAs in pancreatic cancer is still not enough. To the best of our knowledge, this is the first study to investigate the specific ceRNA network in pancreatic cancer by way of "mRNA-miRNA-lncRNA" order pattern, instead of lncRNA-miRNA-mRNA order pattern. Inspiringly, a novel mRNA-miRNA-lncRNA triple regulatory network was constructed and each RNA in this network possessed a significant prognostic value in pancreatic cancer.
In this present study, we identified a total of 914 significant DEGs, consisting of 734 upregulated and 180 downregulated DEGs by intersection of DEGs from two GEO datasets, GSE16515 and GSE15471. GO is widely used as functional enrichment analysis for a large number of genes [28]. The results of these significant DEGs related GO analysis demonstrated that they were significantly enriched in some GO terms that were associated with cancer biological behaviors, including cell adhesion [29], wound healing [30] and activation of MAPK activity [31]. KEGG pathway enrichment analysis revealed that multiple enriched pathways were obtained, primarily involving pathways in cancer and metabolic pathways. Besides, GO analysis and pathway analysis also indicated that these significant DEGs were significantly enriched in focal adhesion. It has been well documented that focal adhesion and cell adhesion play key roles in cancer invasion and metastasis, thereby causing cancer progression [32,33]. Thus, these significant DEGs may be involved in modulation of invasion and metastasis of pancreatic cancer.
To systemically analyze the relationships and functions of significant DEGs in pancreatic cancer, we mapped the DEGs into STRING database and obtained PPI networks. A variety of interactions among these significant DEGs were obtained, especially for the upregulated significant DEGs. It has been widely acknowledged that genes with more node degree in the PPI network usually play more roles. Therefore, we screened the hub genes in the two PPI networks according to node degree. For further identifying key genes in pancreatic cancer, the top ten upregulated and downregulated hub genes were selected for further expression and survival analyses. The analytic results demonstrated that 7 upregulated (MMP9, CXCL8, ACTB, ITGB1, STAT1, TOP2A and CDK1) and 2 downregulated (GNMT and ABAT) hub genes may act as the key genes in pancreatic cancer. Intriguingly, most of these key genes have been well investigated in pancreatic cancer. For example, MMP9 participated in pancreatic cancer angiogenesis and invasion [34,35]; CXCL8 promoted invasiveness and angiogenesis in pancreatic cancer [36]; ITGB1 was upregulated in pancreatic cancer and increased ITGB1 indicated a poor outcome [37]; and STAT1 enhanced pancreatic cancer growth and metastasis [38]. These publications partially support the accuracy of our bioinformatic analyses.
MiRNAs and lncRNAs, are involved in regulation of gene expression and function by ceRNA mechanism as previously described. Some upstream miRNAs of the key genes were first predicted. Survival analysis revealed that patients with higher expression of 8 miRNAs (miR-132-3p, miR-133a-5p, miR-29b-3p, miR-491-5p, miR-192-5p, miR-29c-3p, miR-9-3p and miR-140-5p) have better prognosis in pancreatic cancer. The tumor suppressive roles of the 8 miRNAs in pancreatic cancer have been reported. For example, Abukiwan et al. suggested that inhibition of miR-132-3p drove progression of pancreatic cancer [39]; miR-133a-5p was also found to function as a tumor suppressor in pancreatic cancer [40,41]; the group of Wang Lihua showed that miR-29b-3p decreased proliferation and mobility of pancreatic cancer by targeting SOX12 and DNMT3b [42]. Then, we further predicted 90 upstream lncRNAs of these key miRNAs. By combining expression analysis and survival analysis for these lncRNAs in pancreatic cancer using TCGA data, only 4 lncRNAs (SCAMP1, HCP5, MAL2, LINC00511) were defined as the key lncRNAs. SCAPM1 suppressed migration and invasion of pancreatic cancer [43]; MAL2 expression predicted distant metastasis in pancreatic cancer [44]. Regarding to HCP5 and LINC00511, a variety of studies have also suggested that they act as two crucial oncogenes in human cancers [45,46]. Thus, a prognosis-associated mRNA-miRNA-lncRNA network in pancreatic cancer was successfully established. In this network, some pairs have been identified. For example, Lu et al. demonstrated that miR-29c inhibited pancreatic cancer cell growth, invasion and migration by targeting ITGB1 [47]. These reports further imply the accuracy of our current analytic results. Finally, correlation analysis for the RNA pairs in the constructed mRNA-miRNA-AGING lncRNA network revealed that only MMP9/ITGB1-miR-29b-3p-HCP5 sub-network absolutely conformed to ceRNA hypothesis. Certainly, although attractive findings have been obtained by a series of bioinformatic analyses in our current study, more lab experiments and large-scale clinical trials need to be performed in the future.

CONCLUSIONS
In summary, by integrated bioinformatics analysis, we constructed a novel mRNA-miRNA-lncRNA ceRNA triple regulatory network, in which all RNAs possessed significant predictive values for pancreatic cancer prognosis. In addition to the prognostic value of this mRNA-miRNA-lncRNA network in pancreatic cancer, it also provides some key clues for molecular mechanistic investigations of pancreatic cancer in the future. However, our team and other labs should conduct more studies to further validate these findings.

MicroRNA microarray
At the first step, we searched for the datasets that compared gene expression between pancreatic cancer tissues and normal tissues in the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/). Only datasets containing more than 15 cancer samples and 15 normal samples were included. Then, the titles and abstracts of these datasets were screened and full information of the datasets of interest were further evaluated. Finally, only two datasets (GSE16515 and GSE15471), based on the platform of Affymetrix Human Genome U133 Plus 2.0 Array (GPL570), were selected for subsequent analyses. GSE16515 dataset contained 36 pancreatic tumor samples and 16 normal samples, and GSE15471 dataset contained 36 pairs of pancreatic cancer tissues and adjacent normal tissues.

Differential expression analysis
The online analytic tool GEO2R (https://www.ncbi. nlm.nih.gov/geo/geo2r), provided by the GEO database, was utilized to obtain DEGs from the two datasets. |log 2FC| > 1 and adjusted p-value (adj. p-value) < 0.05 were set as the cut-off criteria when we performed the differential expression analysis. Besides, we employed an online tool, VENNY 2.1.0 (http://bioinfogp.cnb. csic.es/tools/venny/index.html), to draw the Venn diagrams. The DEGs that were commonly appeared in both GSE16515 and GSE15471 datasets were redefined as the significant DEGs, including upregulated significant DEGs and downregulated significant DEGs.

Gene ontology and KEGG pathway enrichment analysis
Database for Annotation, Visualization, and Integrated Discovery (https://david.ncifcrf.gov/) was introduced to conduct Gene Ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. The enriched GO terms and KEGG pathways were downloaded from the webpage. p-value < 0.05 was considered as statistically significant. Then, the top 10 enriched GO terms and KEGG pathways were displayed using ggplot2 package of R software [48].

Protein-protein interaction (PPI) network
The PPI interaction networks between the DEGs were constructed by Search Tool for the Retrieval of Interacting Genes (STRING) database (http://stringdb.org/) [49]. Firstly, the DEGs were typed into the database. Then, high-resolution bitmaps were displayed and downloaded from the webpage. Only these interactors with combined confidence score >= 0.4 were shown in the bitmap.

Identification of hub genes
By calculating the degree of connectivity as we previously reported [50][51][52], the hub genes in the PPI networks were identified using CytoHubba, a plugin in Cytoscape software (Version 3.6.1). According to node degree, the top 30 hub genes were displayed in the Cytoscape software (Version 3.6.1).

Gene expression analysis
In the TCGA project, there are only 4 pancreatic normal samples, which are too small sample size for performing the comparison between pancreatic cancer and normal controls. Gene Expression Profiling Interactive Analysis (GEPIA) (http://gepia.cancerpku.cn/detail.php) is a newly developed interactive web server for analyzing the RNA sequencing expression data of 9736 tumors and 8587 normal samples from the TCGA and the GTEx projects [53]. In this study, GEPIA database, containing 179 pancreatic cancer samples and 171 normal samples, was used to analyze expression levels of key genes and lncRNAs in pancreatic cancer. Genes with |log2FC| > 1 and p-value < 0.05 were considered as statistically significant.

Survival analysis
Prognostic values of genes, miRNAs and lncRNAs in pancreatic cancer were analyzed using Kaplan-Meier plotter database, which is capable to assess the effect of AGING 54675 genes on survival using 10461 cancer samples [54]. Pancreatic cancer mRNA RNA-seq and miRNA data from "Pan-cancer" item in Kaplan-Meier plotter database were selected. These genes, miRNAs and lncRNAs were first entered into the database. Then, the hazard ratio (HR) with 95% confidence interval and logrank p-value were automatically calculated and directly displayed on the webpage. Logrank p-value < 0.05 was regarded as statistically significant.

Prediction of miRNA
Upstream miRNAs of key genes were predicted using miRTarbase database [55]. In miRTarbase database, the collected microRNA-target interactions are experimentally validated by reporter assay, western blot, qPCR, microarray and next-generation sequencing experiments. To obtain more accurate prediction results, in this study, we only included microRNA-target interactions that were validated by reporter assay. Prognostic values of these predicted miRNAs were further assessed using Kaplan-Meier plotter database as mentioned above.

Prediction of lncRNA
In this study, miRNet database was used to predict the upstream lncRNAs of miRNAs, which is a an easy-touse tool for miRNA-associated studies [56,57]. "Organism-H.sapies", "Tissue-Pancreas" and "target type-lncRNAs" were set as selection criteria.

Correlation analysis
The correlations of mRNA-miRNA, miRNA-lncRNA and mRNA-lncRNA pairs in pancreatic cancer were evaluated using starBase database, which is an opensource platform for studying the ncRNA interactions from CLIP-seq, degradome-seq and RNA-RNA interactome data [58,59]. p-value < 0.05 was considered as statistically significant.

Availability of data and materials
Please contact author for data and materials requests.

AUTHOR CONTRIBUTIONS
WYL, QZK and WMF designed the study. WYL and WLW wrote the manuscript and performed in silico analysis of the data. BSD performed some in silico analysis. WYL, WLW and QZK revised the manuscript. WMF, BSD, BY and HDL modified language. All authors have read and approved the final manuscript.

CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.

FUNDING
This work was supported by the National Natural Science Foundation of China (81372462, 81572987).

SUPPLEMENTARY MATERIAL
Please browse the links in Full Text version of this manuscript to see Supplementary Tables .   Table S1. The commonly upregulated genes in GSE16515 and GSE15471 datasets.