Identification of novel biomarkers and prognostic value of the location (head, body, or tail) of pancreatic cancer

Objective : This study was designed to identify the differentially expressed mRNA, microRNA (miRNA), and long non-coding RNA (lncRNA) and their functions in pancreatic cancer (PC). Methods: The expression data of PC and normal samples were downloaded from the GEO database. The expression data of pancreatic head (H), body (B), and tail (T) were downloaded from the TCGA database. After data preprocessing, the differential analyses between PC vs. Normal, H vs. B, H vs. T, and T vs. B were performed. Overlapping genes between PC vs. Normal and the different locations (the union of genes among T vs. B, T vs. H, and B vs. H) were selected. The competing endogenous RNAs (ceRNA) network was constructed based on co-expression analysis and prediction of targets, followed by functional enrichment analysis. Construction of an mRNA prognosis risk model and screening of prognostic factors were performed using Cox univariate/multivariate regression analysis, followed by Nomogram model construction. Finally, the gene-drug interactions were predicted for the DE-mRNA. Results: A five-mRNA prognostic model (GRHL2+CACNA1A+GRM1+UPK1B+PKHD1) was constructed, and the risk score was relatively increased with the increased expression of the GRHL2, PKHD1, and UPK1B, and the decreased expression of CACNA1A and GRM1. Compared with pancreatic body/tail cancer, the expression of GRHL2 was increased, while the expression of CACNA1A and GRM1 was decreased in pancreatic head cancer. LncRNA AC006369.2-miR-146a-5p-CACNA1A/GRM1 was a regulatory axis in the ceRNA network. Verapamil was predicted to be an antagonist of CACNA1A. Conclusion: Our results provide a new direction for the accurate diagnosis and treatment of PC and for investigating the mechanism of PC. head, pancreatic body, and pancreatic tail samples were selected based on sample clinical phenotype information. A total of 137 pancreatic head samples, 14 pancreatic body samples, 14 pancreatic tail samples as well as their corresponding gene expression RNAseq log2(count+1) and miRNA expression log2(RPM+1) data were obtained.

Surgical resection is the main treatment for PC patients. The 5-year overall survival (OS) rate is approximately 5% (range, 2% to 9%) [2]. Survival of PC patients is correlated with various factors, including tumor stage, therapy method, and tumor location [4]. PC originates from exocrine/endocrine pancreatic cells and approximately 95% of PC cases display the histologic characteristics of pancreatic ductal adenocarcinoma, which can be divided into pancreatic head and pancreatic body/tail cancer based on anatomy [5,6]. Accumulating evidence has highlighted the differences in pathological properties and course, incidence, therapy, as well as prognosis between pancreatic head cancer and pancreatic body/tail cancer [5,7]. Compared with pancreatic head cancer, pancreatic body/tail cancer displays a more aggressive tumor biology, lower 3-year survival rates, and is less resectable. In addition, it is usually more advanced at diagnosis because of the lack of early symptoms of biliary obstruction [4,8]. Considering that early diagnosis is less likely and the lack of effective therapies, prevention is considered a meaningfully strategy for PC [9]. Hampering this, the etiology of PC is not fully understood even though several risk factors have been identified, such as smoking, genetics, obesity, and others. It is necessary to better understand the etiology and definitively identify the risk factors of PC.
Animal research has shown that the development of PC can be caused by the targeted activation of the KRAS2 oncogene coupled with inactivation of tumor protein P53 or cyclin dependent kinase inhibitor 2A [10]. Sequencing was used to demonstrate that partner and localizer of BRCA2 (PALB2) is a susceptibility gene of PC; its truncating mutation is found in patients with familial PC [11]. In addition, non-coding RNAs including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs, which exceed 200 nucleotides in length), have been reported to have important roles in various biological processes in PC [12,13]. MiRNAs are a type of short ncRNAs that are crucial in PC progression. For example, miR-96 serves as a tumor suppressor by directly targeting KRAS, which can drive PC [14].
Plasma miR-744 is highly expressed in PC, and this expression is related to lymph node metastasis and recurrences, suggesting its potential as a biomarker [15]. LncRNAs have been implicated in various cytological processes, tumor metastasis, and tumor progression [16,17]. Moreover, they can serve as competing endogenous RNA (ceRNA) by competitively binding miRNAs to modulate the expression of miRNA targeted genes [17,18]. One study reported the high expression of lncRNA regulator of reprogramming in PC, and its role as a tumor promoter in PC as a ceRNA to modulate the expression of the Nanog transcription factor by competitively binding miR-145 [19].
In this study, the differentially expressed genes (including miRNA, lncRNA, and messenger RNA [mRNA]) associated with PC and cancer locations (pancreatic head, body/tail) were identified, followed by the construction of a ceRNA network and functional enrichment analysis. In addition, screening for prognostic factors and construction of an mRNA prognosis risk model and Nomogram model construction were performed to identify potential prognostic factors and crucial mRNAs, and to predict their corresponding targeted drugs. The results should provide a theoretical basis and novel biomarkers in the study and treatment of PC.

Data sources
The human PC related microarray datasets GSE86436 (expression data of mRNA and lncRNA) and GSE85589 (expression data of miRNA) were downloaded from the Gene Expression Omnibus (GEO

Data preprocessing and lncRNA re-annotation
For the GSE86436 dataset, the standardized expression profile of mRNA and lncRNA, as well as probe sequences, were downloaded. The sequences of the probes were mapped to the GRCh38 human reference genome and "unique map" probes were selected. Then, the mapped gene for each probe was obtained based on the corresponding position and positive/negative strand information on the chromosome as well as the "Release 25" annotation file. Probes with the annotation of "protein_coding" were the corresponding mRNA corresponding, while probes with the annotations of "antisense", "sense_intronic", "lincRNA", "sense_overlapping", and "processed_transcript" were the corresponding lncRNA probes.
For the GSE85589 dataset, the standardized miRNA expression profile and annotation files were downloaded, followed by annotation of the probes. The probes with no mapped miRNA were removed, and when multiple probes mapped to one miRNA, the mean expression value was considered as the expression value of this miRNA.
For the data downloaded from the TCGA database, the gene expression RNAseq log2(count+1) data were converted to count values. Genes with a count value of 0 in more than half of the samples were filtered, followed by gene annotation based on the "Release 25" annotation file. Similarly, for the miRNA expression log2(RPM+1) data, miRNAs with a log2(RPM+1) of 0 in more than half of the samples were filtered, followed by their annotation based on miRbase database.

Differential analysis
The expression profiles of mRNA, lncRNA, and miRNA downloaded from the GEO database were analyzed to determine differential expression between the PC and normal groups. The corresponding P-value and log fold change (FC) were obtained using the classical Bayes method in the Limma

Co-expression analysis
The DE-mRNAs and DE-lncRNAs were used to calculate the Pearson correlation coefficient (r) of each mRNA and each lncRNA by one-to-one correspondence of the samples. The lncRNA-mRNA interaction pairs were selected with cut-offs of r > 0 and P < 0.05. Similarly, the r-value of each mRNA and each miRNA was also calculated by one-to-one correspondence of the samples, and the miRNA-mRNA interaction pairs were selected with cut-offs of r < 0 and P < 0.05.

ceRNA network construction
The miRNA-lncRNA interaction binding sites were predicted using miRanda software (version 3.3a), and the miRNA-lncRNA pairs were selected with a threshold score > 140 and threshold Energy < -20.

Functional enrichment analysis
To explore the involved function of the DE-mRNAs, lncRNAs, and miRNAs, the clusterProfiler [23] (version 3.8.1, http://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) in the R package was used to enrich the biological processes in Gene Ontology (GO) annotation and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The significantly enriched function terms were selected with the threshold of P.adjust < 0.05 and enriched gene count > 2. Notably, the functions of lncRNAs and miRNA were obtained based on the functional enrichment of the mRNAs in co-expressed mRNA-lncRNA pairs and in the final obtained mRNA-miRNA pairs, respectively.

Construction of mRNA prognosis risk model
The mRNAs in the ceRNA network were considered as the candidate mRNAs. Cox univariate regression analysis was used to calculate the regression coefficient and P-value between each candidate mRNA and the survival time and state. The prognosis-related mRNAs were selected with the P-value threshold < 0.05, combined with hazard ratio (HR) risk (theoretically, up-regulated genes between PC vs. Normal should be risk factors, corresponding to an HR > 1; otherwise HR < 1). The Risk Score was calculated as βgene1*exprgene1 + βgene2*exprgene2 + ... +βgenen*exprgenen, in which β is the prognostic correlation coefficient and exprgene is the expression value of the corresponding gene. The mRNAs were added to the model according to the P-value from small to large one-by-one. The high and low risk samples classified by the average value of the model constructed after adding an mRNA had the greatest significant correlation with survival (log-rank test). The area under the curve (AUC) of the high and low risk samples was maximized according to the expression value of the selected mRNA. In this case, the mRNA model was considered to be associated with prognosis.

Screening of prognostic factors and construction of Nomogram model
The clinical information of PC samples in the surveillance, epidemiology, and end results (SEER) and TCGA databases were downloaded. Clinical factors, including age, gender, location of cancer, clinical stage, and tumor histological grade, and the prognostic risk model score calculated as described above were used as independent variables. OS was used as the dependent variable to perform Cox univariate regression analysis for clinical factors having P < 0.05 to screen for prognostic factors. The Nomogram model was constructed based on the results of multivariate regression analysis, including conversion and assignment of regression coefficients, Nomogram plotting, and calibration curve plotting.

Construction of drug-gene interaction network
The targeted drugs for DE-mRNA were predicted using the DGIdb online database (http://www.dgidb.org/search_interactions) using the parameters settings of FDA Approved and Antineoplastic. The drug-gene interaction network was constructed based on the obtained drug-gene interactions using Cytoscape. Table 1

Co-expression analysis and ceRNA network construction
Co-expression analysis identified 1097 and 1472 lncRNA-mRNA interaction pairs from the GEO and TCGA databases, and a total of 679 overlapped lncRNA-mRNA interaction pairs were screened as the final co-expressed lncRNA-mRNA pairs. Similarly, there were 1020 and 407 miRNA-mRNA negative correlation pairs identified from the GEO and TCGA databases, and a total of 218 overlapped miRNA-mRNA negative correlation pairs were screened. In addition, 131 predicted miRNA-lncRNA pairs and 55995 predicted miRNA-mRNA pairs were selected as described above. A total of 40 miRNA-mRNA pairs were screened among the 218 miRNA-mRNA negative correlation pairs and the 55995 predicted miRNA-mRNA pairs.

Functional enrichment analysis
The functional enrichment analysis for the 204 DE-mRNAs indicated that 44 GO_BP terms and eight KEGG pathways were significantly enriched. They included hsa04972~Pancreatic secretion, GO:0002526~acute inflammatory response, and GO:0055074~calcium ion homeostasis (e.g., GRM1).   Figure 3B.
MiR-146a-5p was associated with various neural signal transduction processes, including long-term depression, synaptic transmission, glutamatergic activity, taste transduction, and others. Those functions were obtained based on the functional enrichment analysis of GRM1 and CACNA1A.

Construction of mRNA prognosis risk model
Cox univariate regression analysis was performed for the 32 mRNAs in the ceRNA network, and a total of 10 mRNAs were selected. These 10 mRNAs were added to the prognosis risk model according to their P-values from small to large one-by-one. As shown in Table 3 Figure 4B). As expected, the Risk model was highly correlated with prognosis, with a higher Risk score indicating lower survival.

Screening of prognostic factors
The clinical information of PC samples in the SEER and TCGA databases were downloaded to perform Cox univariate regression analysis. Age, neoplasm_histologic_grade, and pathologic_N were significantly correlated with prognosis in the two databases (Table 4). Hence, the three clinical factors, together with the Risk score were included in the Cox multivariate regression analysis. Risk score (P = 0.016) and pathologic_N (P = 0.019) were significantly correlated with prognosis (Table 5).

Nomogram model construction
The nomogram assigned different factors to points, followed by addition to obtain the total points corresponding to the survival rate. This clarified the results of the Coxph regression. The Nomogram model was constructed for Risk score and pathologic_N ( Figure 5A). In addition, the consistency index (c-index) of each prognostic factor and composite factor (pathologic_N + Risk score) in the Nomogram model was calculated to fit the Coxph model. As shown in Table 6, composite factor (Nomogram_combined model) fit the Coxph model with a c-index of 0.641 and a maximum significance (P = 3.097E-05). Figure 5B displays the calibration curve of the Nomogram_combined model, which suggested a better prediction ability (close to 45°)

Construction of drug-gene interaction network
The drug-gene interaction network contained 44 drugs, 26 genes (13 up-regulated genes and 13 down-regulated genes), and 55 interactions ( Figure 6). In this network, Tubulin Beta 2B Class IIb (TUBB2B), Interleukin 2 Receptor Subunit Alpha (IL2RA), and Interleukin 6 (IL6) were predicted to be interact with more drugs. Verapamil was predicted to be an antagonist of CACNA1A, a gene in the mRNA prognosis risk model. . These findings were consistent with our analysis. In addition, GRHL2 expression was significantly increased in pancreatic head compared with pancreatic tail, while the mRNA prognosis risk model indicated that the high expression of GRHL2 corresponded to a high Risk score. These findings might contribute to the accurate diagnosis and treatment of PC patients.

Discussion
CACNA1A encodes the α1A subunit of voltage-dependent calcium channels, which regulate the transport of calcium ions and various calcium-related pathways [27]. CACNA1A expression in neuronal tissue is abundant to regulate the release of neurotransmitter [28]. GRM1 encodes one of the metabotropic glutamate receptors (mGluRs) that regulate the glutamatergic neurotransmission by Gprotein-coupled receptors [29]. L-glutamate serves as the main excitatory neurotransmitter in the central nervous system and can activate mGluRs [30]. Regulation of voltage-dependent Ca2 + channels by mGluRs is considered a crucial event in the release of neurotransmitter [31]. Nicotinic acetylcholine receptors are positioned in the cytoplasmic membrane and undergo a conformational change upon the binding of an agonist that leads to the opening of the ion channel followed by the entry of ions into cells [32]. This in turn causes an autocrine neurotransmitter loop and signaling cascades [33]. Nevertheless, PC is a heterogeneous disease involving individual differences in lifestyle, genetic, and environmental factors. For instance, smoking is a well-established causative factor for PC and is related with decreased survival [34,35]. Nicotine promotes PC cell proliferation and migration by stimulating the production and release of stress neurotransmitters followed by activation of downstream signal cascades [36]. Nicotine also triggers the self-renewal of PC stem cells by increasing stress neurotransmitters coupled with decreased γ-aminobutyric acid [37]. Psychological stress can also mediate the release of stress neurotransmitters and γ-aminobutyric acid, as well as their downstream effectors [38,39]. A close association of depression and PC has been reported [40].
Postsynaptic depolarization and calcium ion internal flow are reportedly essential for striatal longterm depression, and the reduction of neurotransmitter (glutamate) release from presynaptic terminals can promote the expression of long-term depression at striatal synapses [41]. In our study, CACNA1A and GRM1 were enriched in long-term depression, glutamatergic synaptic transmission, and calcium signaling pathway. In addition, comparison of pancreatic body and pancreatic tail cancer revealed that CACNA1A and GRM1 were expressed in low levels in pancreatic head cancer, suggesting that CACNA1A and GRM1 are specifically down-regulated genes in pancreatic head cancer. The mRNA prognosis risk model analysis indicated that the Risk score was relatively increased with decreased expression of CACNA1A and GRM1, suggesting a poor prognosis. The collective data support the conclusion that the decreased expression of CACNA1A and GRM1 might contribute to the progression of pancreatic head cancer by mediating the production and release of excitatory neurotransmitter as well as their downstream effectors.
Notably, CACNA1A and GRM1 were target genes of miR-146a-5p. No reports have focused on the associations between miR-146a-5p and these two genes, but the effects of miR-146a in PC have been reported. The expression of miR-146a was shown to be decreased in PC cells, while its overexpression inhibited tumor cell invasion and metastasis [42]. Moreover, animal experiments revealed that decreased expression of miR-146a can promote cell growth by increasing the expression of epidermal growth factor receptor in PC [43]. Presently, lncRNA AC006369.2 interacted with miR-146a-5p, and was co-expressed with CACNA1A and GRM1. Despite the lack of knowledge of lncRNA AC006369.2, we speculate that AC006369.2 might function as a ceRNA in PC to mediate the expression of CACNA1A and GRM1 by competitively binding miR-146a-5p. The AC006369.2-miR-146a-5p-CACNA1A / GRM1 regulatory axis might be a potentially important mechanism. This must be verified in future studies.
The gene-drug network analysis we conducted supports the role of verapamil as an antagonist of CACNA1A. Verapamil is a calcium ion (Ca 2+ ) channel blocker that inhibits the growth of PC cells by blocking Ca 2+ influx [44]. Similarly, Zhao et al. suggested that verapamil represses the proliferation and metastasis, and induces apoptosis of chemotherapy-resistant PC cells [45]. Our results, which indirectly verified the important effect of CACNA1A, are consistent with these reports. In both pancreatic body and pancreatic tail cancer, CACN1A was expressed at a significantly high level compared with pancreatic head cancer. As an antagonist of CACNA1A, we suggest that verapamil might be more useful in the treatment of pancreatic body/tail cancer.
Although several novel points were proposed in our study, there were some limitations. All the results were obtained by bioinformatics analysis, so further experimental verification is needed. Secondly, the predicted ceRNA mechanism and the corresponding functions need to be further explored.
Thirdly, the gene-drug interactions should be further analyzed.

Conclusions
The five-mRNA prognostic model (GRHL2+CACNA1A+GRM1+UPK1B+PKHD1) was dependable in the prediction of PC survival. The high expression of GRHL2 and low expression of CACNA1A and GRM1 might be indicators of poor prognosis for patients with PC, especially pancreatic head cancer.
Verapamil might be more useful in the treatment of pancreatic body/tail cancer. Finally, the AC006369.2-miR-146a-5p-CACNA1A/GRM1 regulatory axis might be a potentially important mechanism in PC progression.            Gene-drug interaction network Red circles represent up-regulated mRNAs, green octagons represent down-regulated mRNAs, green arrows represent agonist effect, yellow arrows represent inhibitor effect, light blue arrows represent ligand effect, and dark blue represents effect of adhesives.