Three hypomethylated genes were associated with poor overall survival in pancreatic cancer patients

Pancreatic cancer (PC) is a highly malignant cancer with poor prognosis and high mortality. Aberrant DNA methylation plays a critical role in the occurrence, progression and prognosis of malignant tumors. In this study, we employed multiple datasets from APGI, TCGA and GEO to perform Multi-Omics analysis, including DNA methylation and expression profiling analysis. Three differentially expressed genes (SULT1E1, IGF2BP3, MAP4K4) with altered status of DNA methylation were identified and then enrolled into prognostic risk score model using LASSO regression. Univariate cox regression analysis indicated that high risk score was significantly associated with poor prognosis. Multivariate cox regression analysis proved the risk score was an independent prognostic factor for PC. In addition, time-dependent ROC curves indicated good performance of our model in predicting the 1-, 3- and 5-year survival of PC patients. Besides, stratified survival analysis revealed that the risk score model had greater prognostic value for patients of late stage with T3/T4 and N+. Pathway enrichment analysis suggested that these three genes might promote tumor progression by affecting signaling by Rho GTPases and chromosome segregation. In summary, three hypomethylated gene signature were significantly associated with patients’ overall survival, which might serve as potential prognostic biomarkers for PC patients.


INTRODUCTION
Pancreatic cancer (PC) is a highly malignant cancer with poor prognosis and high mortality which almost parallels to its disease incidence. It is reported that PC has become the fourth leading cause of cancer-related death in the USA and are predicted to be the second leading cause of cancer death in the USA by 2030 [1]. What's worse, most patients with PC are asymptomatic at early stage. Therefore, it is quite often that PC patients have reached late stage when they are diagnosed, leading to the poor prognosis [2]. Aside from the progress in surgical resection and other adjuvant therapies, exploring efficient methods of early diagnosis is another important way to improve the clinical prognosis of PC patients. Compared to CT scanning and MRI, the more specific and sensitive biomarkers present greater value in early diagnosis, prognosis prediction and even therapeutic treatment.
AGING DNA methylation, as one of the major epigenetic modifications, plays an important role in the occurrence and progression of malignant tumors [3]. Aberrant DNA methylation of CpG islands located in promoter regions, as a critical molecular mechanism of tumor occurrence, often leads to transcriptional silence of tumor suppressor genes and over-expression of oncogenes through hypermethylation and hypomethylation respectively [4]. Notably, increasing studies have identified a series of gene mutations with deregulated DNA methylation from PC tissues [5]. For instance, KRAS (Kirsten rat sarcoma viral oncogene homolog) was found to be the most frequently mutated oncogene in PCs which involves in cellular proliferation, motility, and cytoskeletal remodeling in the earliest stage of pancreatic tumorigenesis [6]. And targeting oncogenic KRAS using exosomes was reported to significantly increase overall survival in PC mice models [7]. AGING Therefore, it is of promising value for diagnosis and treatment of PC patients to identify specific genes with aberrant DNA methylation in PC.
Although the association between DNA methylation and prognosis of PC has been extensively reported, specific prognostic model was rarely reported. In this study, by using the combination of methylation and expression profiling data, we aimed to identify the prognostic significance of differentially expressed genes (DEGs) with altered DNA methylation status in PC and to set up a reliable prognostic model for PC patients.

Identification of DEGs with altered DNA methylation status in PC
We firstly conducted LIMMA analysis to select DEGs using expression data from GSE62452 dataset. A total of 9227 DEGs, including 480 down-regulated genes and 8747 up-regulated genes were identified (Fold-change>1, q-value<0.01) ( Figure 1A). By comparing with the DNA methylation patterns of the ICGC dataset, we further identified 81 down-regulated genes which were hypermethylated (81/480, 16.9%) ( Figure 1B) and 1287 up-regulated genes which were hypomethylated (1287/8747, 14.7%) ( Figure 1C). Next, by performing univariate cox regression analysis between 1368 candidate genes above and survival data of discovery cohort, we obtained 3 prognosis-related genes SULT1E1, IGF2BP3 and MAP4K4, which all reached a statistical significance (P<0.05). The differential expression of above three genes between tumor and normal tissues was further validated in an independent cohort, GSE62452, which contained a total of 69 tumor and 61 normal tissues. Our results revealed that all three genes (SULT1E1, IGF2BP3 and MAP4K4) were all significantly overexpressed in PC tissues, suggesting that they might be potential biomarkers for PC patients ( Figure 2).

Construction and assessment of prognostic risk score model for PC
We selected these three genes to conduct risk score model for PC patients using LASSO regression method. Our risk score formula obtained from discovery cohort was that risk score=0.195 * expression of SULT1E1+ 0.129 * expression of IGF2BP3 + 0.65 * expression of MAP4K4. According to the formula, we calculated the risk score for each patient. Then patients of each cohort were divided into high-and low-risk subgroups by the median value in order to evaluate the prognostic value of risk score by performing univariate and multivariate OS analysis. The results were shown in Table 1. Remarkably, the consistent results of three cohorts (discovery and two validation cohorts) evidently prove that risk score was associated with OS. After adjusted by using AJCC stage and histological grade, our results also reached a statistically significant, suggesting that risk score was an independent prognostic factor for PC. The patients in high risk score group had a two-fold higher risk of death than those in low risk score group (Table 1 and Figure 3A, 3C and 3E). Moreover, we conducted Kaplan-Meier curves to evaluate the impact of risk score on patients' OS time. Our results showed that patients with high risk score had significantly poorer OS than patients with low risk score in three cohorts ( Figure 3B, 3D and 3F). The median survival time (MST) of patients with high risk score was less than 1.5 years (ranged from 13.0-17.0m) while the   AGING median MST of patients with low risk score was more than 2.0 years (ranged from 24.6-29.0m).

Stratified survival analysis
The univariate analysis showed that AJCC stage, T, N and histological grade had relatively significant impact on prognosis (Table 1). Therefore, we further performed stratified survival analysis to evaluate the prognostic values of our risk score model in different subgroups.
According to the results of  (Figure 4). Similarly, we also found the significant prognostic values of risk score in PC patients with N+. The results for N+ subgroup in three cohorts were showed in Figure 5. Especially, for validation-1 cohort, when stratified by T or N, the results of subgroups remained stable, which, to some extent, indicated the reliability and general applicability of our risk score model.

A comparison between our and other models
Recently, Liao et al reported a model containing 9 genes based on TCGA cohort [8]. To compare the prognostic values of our three-gene model and their model, we performed time-dependent ROC curve analysis. The results showed that the Liao's model exhibited a favorable predictive value in predicting 1-year OS, however, its predictive value obviously decreased in predicting 3-and 5-years OS. By contrast, the predictive  AGING values of AJCC stage, which was considered as recommended model, showed a better efficiency in predicting 3-and 5-years OS, but failed to predict 1year OS. The area under ROC curve (AUC) of our three-gene model for 1-, 3-and 5-year overall survival was 0.62, 0.69, and 0.69, respectively, suggesting our three-gene model had a favorable efficiency in predicting both short-and long-term prognosis ( Figure  6).

Biological function prediction
To explore the potential biological function of the three genes, we performed pathway enrichment analysis. Our data suggested that the top three signaling pathways that affected by SULT1E1, IGF2BP3, MAP4K4 and their co-expressed genes were Rho GTPases, chromosome segregation and focal adhesion pathways (Figure 7). Above three signaling pathways were all reported to be involved in tumor progression, providing evidence for further investigating the detailed molecular mechanisms of our three-gene models in PC.

DISCUSSION
Lack of effective and reliable prognostic biomarkers or models remains as a major problem for improving the clinical outcomes of PC patients. In this paper, we aim to explore and evaluate the prognostic values of methylated genes for PC. In brief, we firstly obtained 1368 DEGs with altered DNA methylation status. Three out of them (SULT1E1, IGF2BP3, MAP4K4) were identified as prognosis-related genes and were selected to generate a risk score model. Survival analysis proved our three-gene model was an independent prognostic factor for PC. Stratified analysis further revealed that the risk score model had a greater prognostic value for patients of advanced stage and metastatic lymph nodes.
In conclusion, our three-gene model might serve as a potential predictive tool for PC patients.  All the three genes (SULT1E1, IGF2BP3 and MAP4K4) in our model were confirmed to be upregulated in PC tissues. Our data also revealed that the three-gene model was associated with poor prognosis, especially in patients with advanced stage. Among them, MAP4K4, a serine/threonine kinase involved in activation of the JNK signaling pathway, was overexpressed in many types of human cancer and played an important role in proliferation, migration and invasiveness of cancer cells [9]. It was reported that the expression levels of MAP4K4 in CRC patients with www.aging-us.com 894 AGING lymph node metastasis were higher than that in patients without metastasis [10]. IGF2BP3 was initially identified as an oncofetal gene due to its high abundance in PC tissue and proved relevant with aggressive and invasive phenotype. IGF2BP3 was found to be upregulated in a variety of malignant tumors including lung, esophageal cancers and melanomas, and was capable of promoting tumor growth, drug-resistance and metastasis. Besides, expression of IGF2BP3 was reported to be associated with poor prognosis and metastasis [11]. SULT1E1 was a member of SULT1 family and is best known for inactivating estrogen in humans [12]. SULT1E1 was found to be correlated with estrogen-dependent breast and endometrial cancer while the expression level remains controversial [13]. High SULT1E1 levels were found in breast cancer tissues and associated with a poor prognosis for breast cancer in women [14]. Experimental data showed that SULT1E1 overexpression inhibited proliferation, migration, invasion of breast cancer cells by mediating the adaptive response to estrogen in tumor cells [15]. AGING Different from the above two genes, the role of SULT1E1 in PC was rarely reported and remains obscure. Recently, Seeliger et al reported that estrogen receptor expression was an independent predictor of shorter OS in resected PC patients [16]. Our findings therefore provide insight into the underlying mechanisms of estrogen-related signaling in PC progression.
To date, there was limited prognostic model for patients with metastatic lymph nodes. In this study, we also found that the prognostic efficiency of three-gene model was increased in patients with metastatic lymph nodes. Liang et al reported that MAP4K4 overexpression was associated with increased number of metastatic lymph nodes [17]. Yang et al also found that the expression of SULT1E1 was significantly higher in PC tissues with lymph nodes metastasis than in PC tissues without lymph nodes metastasis [18]. Yet, there was no direct evidence for the association between IGF2BP3 and lymph nodes metastasis in PC. Nevertheless, an experiment performed by Satoru et al showed that the H19-PEG10/IGF2BP3 axis promotes the progression of high lymph node ratio (the ratio of the number of metastatic lymph nodes to the number of dissected lymph nodes) in gastric cancer [19]. Above evidence were consistent with and confirmed our findings of the association between three-gene model and lymph node metastasis.
Several limitations in our study should be pointed out. First, because the clinical information of patients was limited, we could not perform subgroup analysis by stratifying more factors. Second, the censored rate of validation-1 cohort was high, which may comprise the reliability of the Kaplan-Meier estimates. Third, the construction and assessment of this prognostic model was based on public datasets. To further confirm or refuse this model, we warrant large-size, multicenter and prospective clinical cohorts in future.
In summary, we identified 1368 differentially expressed genes with altered DNA methylation and selected three of them (SULT1E1, IGF2BP3 and MAP4K4) to construct a prognostic model. Survival analysis showed that our risk score model exhibited significant prognostic value for PC patients, especially for patients with advanced stage and metastatic lymph node. By comparing with AJCC stage and other models from literature, our model presented greater advantages in stability and accuracy for prognosis prediction and was promising to be applied for clinical prognostic evaluation of PC patients.

Data and sources
The set of array-based DNA methylation data of PC were obtained from the Australian PC Genome Initiative (APGI; https://www.garvan.org.au/research/ cancer/pancreatic-cancer-research). The set of sequence-based mRNA expression data (RNA-seq data) of PC was downloaded from The Cancer Genome Atlas (TCGA) (https://cancergenome.nih.gov/). Another two gene expression arrays of human PC datasets (GSE21501 and GSE62452) were obtained from the Gene Expression Omnibus (GEO) (https://www.ncbi. nlm.nih.gov/gds/) and were served as the discovery cohort (n=102) and validation cohort (n=66), respectively. Moreover, clinicopathological information and survival data of total 345 PC patients from three cohorts (TCGA, GSE21501 and GSE62452) were also obtained for further prognostic analysis.

Identification of DEGs with altered methylation status in PC
The expression dataset and DNA methylation dataset were employed to identify the differentially expressed genes with altered methylation status. The flowchart of this study is shown in Figure 8. Firstly, LIMMA analysis was performed to identify the DEGs by comparing the normalized expression data between PC and adjacent normal tissues. We used GSE62452 to validate the differentially expressed genes because it consisted of 69 tumor tissues and 61 normal tissues. Next, we compared DNA methylation status of genes from the ICGC dataset and divided them into hypermethylated genes and hypo-methylated genes. Then we correlated the level of RNA expression with the degree of DNA methylation and in order to classify genes into two groups, hyper-methylated & down-regulated group and hypo-methylated & up-regulated group. Genes of both groups were selected as candidate genes.

Establishment of prognostic model and receiver operating characteristic (ROC) curve
A univariate Cox regression analysis was firstly performed on all of candidate genes in discovery group to calculate the association between the expression level of each gene and patient's overall survival (OS). Those genes with P-values less than 0.05 were identified as prognosis-related genes (key genes). Then, the selected key genes were further screened and confirmed by the Lasso regression. The prognosis risk score was established with the following formula: Risk score = expression of Gene 1 * β1 + expression of Gene 2 * β2 +…expression of Gene n * βn. In this case, we would be able to generate a risk score for each patient of 3 cohorts based on the normalized expression data. Patients were then divided into high-and low-risk subgroups according to the median cutoff of the prognosis risk score. The prognostic performance was evaluated by using time-dependent receiver operating characteristic (ROC) curve analysis within 1 year, 3 years and 5 years to evaluate the predictive accuracy and sensitivity of our prognostic model.

Overall and stratified survival analysis
Survival data were presented as frequency and percentage both for categorical variables and continuous variables (converted into categorical variables). Univariate cox regression survival analysis was firstly performed to evaluate the prognostic effect of risk score and various clinicopathological features including age, gender, tumor stage, grade, history of chronic pancreatitis, history of alcohol, history of diabetes, family history of cancer. Then the ones of prognostic significance would be put into a cox proportional hazards model for multivariate cox regression survival analysis to further validate if risk score was an independent factor for prognosis. In addition, to further explore the influence of other factors on the prognostic value of risk score, stratified analysis was performed according to prognosis-related clinical features including stage (I/IIA and IIB/III/IV), T stage (T1/T2 and T3/T4), N (N-and N+) and histological grade (G1/G2 and G3/G4). The log-rank test was chosen to determine significant differences of survival curves. A two-sided P value< 0.05 was considered statistically significant. Hazard radio (HR) and 95% confidence interval (CI) were reported if necessary. Statistical analysis was performed using IBM SPSS version 24.0 (IBM Corp, NY, USA). Kaplan-Meier survival was curved by Graphpad prism 7.

Pathway enrichment analysis
The enrichment analysis was conducted to predict the biological function of the three genes. We calculated the Pearson's correlation coefficients between each of three hypomethylated genes and genome and selected the most strongly co-expressed genes (top 200) for each hypomethylated gene. Then we performed pathway enrichment analysis using above genes. A web-based tool, Metascape (http://metascape.org/), was employed to gain insights into the biological functions of these coexpressed genes.