A novel epigenetic signature for predicting the prognosis of pancreatic ductal adenocarcinoma


 Background

Aberrant DNA methylation is often involved in carcinogenesis. This study is designed to establish an epigenetic signature to predict overall survival (OS) of pancreatic ductal adenocarcinoma (PDAC).
Methods

DNA methylation and RNA-seq data of PDAC patients were downloaded from the Cancer Genome Atlas database, Genotype-Tissue Expression (GTEx), and International Cancer Genome Consortium (ICGC) database. Methylation-related differentially expressed genes (DEGs) were identified using an R package MethylMix. Epigenetic signature and nomogram were established by the LASSO and multivariate Cox regression analysis, respectively. In addition, a joint survival analysis of the gene expression and methylation was performed to identify potential prognostic factors for patients with PDAC.
Results

There were a total of 56 methylation-related DEGs by MethylMix criteria. After LASSO Cox regression analysis, we developed an epigenetic signature composed of five genes according to their methylation level. The signature was able to divide patients into high-risk and low-risk groups, and the OS between the high-and low-risk groups was more significantly different in both training and validation cohort. The signature is independent of clinicopathological variables and indicated better predictive power. Moreover, we developed a novel prognostic nomogram that combines risk scores with three clinicopathological factors. The joint survival analysis of gene expression and methylation revealed that 24 genes could be independent prognostic factors for OS in PDAC.
Conclusions

The qualitative signature and nomogram that predict OS at the individualized level and guide therapy for patients with PDAC.

with PDAC. PDAC are heterogeneous and frequently are resistant to standard treatments [2]. At present, TNM staging system is widely used to predict the prognosis of these patients, but the prediction is not su ciently accurate. In addition, no method exists to predict the prognosis of patients with individualized pancreatic cancer. Therefore, establishing a reliable prognostic model for screening high risk patients with poor overall survival is of great clinical signi cance for optimizing individualized treatments and improving the managements of cancer patients.
Epigenetic alterations are considered to be important factors for the initiation and progression of PDAC and potential therapeutic targets [3,4]. Aberrant DNA methylation in oncogenes and tumor suppressor genes affect tumor progression and are related to the survival rate of PDAC patients [5,6]. Several schematics exist to indicated the interrelationship between PDAC genome, epigenome and signaling pathway changes [7][8][9].Tumor suppressor gene hypermethylation or oncogene hypomethylation is considered to be an important mechanism of tumorigenesis [10]. Therefore, the integrated analysis of the methylome and transcriptome is crucial to the understanding of the biological processes PDAC. The present study was to develop a novel epigenetic signature via the analysis integrating multi-omic data including DNA methylation, transcriptome, and clinical outcome of patients with PDAC.

Data and preprocessing
The DNA methylation (n = 189) and clinical data (n = 179) for PDAC used in this study was downloaded from the Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/). To compare the mRNA expression levels using the data from the TCGA (n = 182) and Genotype-Tissue Expression (GTEx) (n = 167) databases, we used the University of California Santa Cruz (UCSC) Xena Browser as described previously [11]. In addition, we downloaded methylation data (n = 228) from International Cancer Genome Consortium (ICGC) database (https://icgc.org/). The TCGA cohort was used for training because it contains methylation and transcriptome data and relatively high-quality clinical records. ICGC data was used as validation cohort to verify the performance of the model. Patients with uncertain TNM stage (n = 7) and tumor grade (n = 4) were excluded, 168 PDAC patients were nally reserved for further study. The clinicopathologic characteristics of the study cohort were shown in Table 1. The current study did not require ethics approval as all study datasets were downloaded and analyzed in accordance with the corresponding data policies of previous databases. Identi cation of methylation-related differentially expressed genes The limma R package was used to identify differentially methylated genes (DMGs) and differentially expressed genes (DEGs). Based on cut-off value, adjusted P values < 0.05 and |fold change (FC)| > 1.5, and adjusted P values < 0.05 and |FC| > 2 were used for obtaining DMGs and DEGs, respectively. We explored the correlation between methylation data and DEG expression using Pearson's correlation analysis. Pearson coe cient between methylation level and gene expression level was calculated to nd signi cantly negatively related genes using the MethylMix package in R [12]. Methylation-regulated DEGs were identi ed using the criterion of Pearson coe cient < − 0.3 with P < 0.05.

Development and validation of signature for prognosis prediction
To identify potential OS-related genes in patients with PDAC, univariate Cox regression analysis was conducted based on these gene methylation level. Genes with P < 0.05 were de ned as candidate genes and were then subjected to LASSO-penalized Cox regression analysis for identifying optimal prognostic genes affecting the OS of PDAC patients. An epigenetic signature was established for predicting OS of PDAC patients by combining the prognostic coe cients of the DMGs and their methylation level.
The formula for calculating the risk scores was as follows: Coef and Methylation represented regression coe cient and methylation level of corresponding gene, respectively. Based on the median risk score value, all patients were classi ed into high risk score group and low risk score group. The Kaplan-Meier survival curves were generated to graphically compare the difference of OS between two groups. The predictive ability of the model was evaluated by timedependent receiver operating characteristic (ROC) curve analysis. In order to evaluate the feasibility of the prognostic model, we built a validation prognostic model in the ICGC cohort. The risk score for each patient was calculated with the same formula as TCGA cohort.

Construction of prognostic nomogram
To provide clinicians with a quantitative method to predict OS of PDAC patients, we established a nomogram integrating risk score and clinicopathological factors. Discrimination between the outcomes of observations and predictions using the area under the curve (AUC) of ROC curve. A calibration map was generated by comparing the nomogram prediction probability and the observation for the 1-year and 3-year OS rates.

Joint survival analysis of DMGs and DEGs
To investigate the prognostic value of methylation level and expression level of the same methylationregulated gene, we performed a joint survival analysis to further explore prognosis-related genes in patients with PDAC.

Statistical analysis
Continuous variables were presented as mean ± standard deviation (SD) or median as appropriate. The tdROC curves were used to assess the predictive performance of prognostic models [13]. Prognostic nomograms were constructed based on the factors of signi cant multivariate prognostic using "rms" package. All statistical analyses were carried out through R software (version 3.6.1, https://www.rproject.org/). P values lower than 0.05 were considered signi cant.

Identi cation of methylation related EDGs
The owchart of this study is demonstrated in Fig. 1. We identi ed 12,102 DEGs (7,447 up-regulated genes and 4,655 down-regulated genes) between 178 PDAC samples and 171 normal samples from the TCGA and GTEx databases. Then, methylation and DEGs data were integrated into the same sample by MethylMix algorithm for correlation analysis. Totally, 56 DEGs were identi ed as methylation-related DEGs (|FC| > 1.5, P < 0.05, Cor < -0.3, Fig. 2).

Construction and validation of epigenetic signature
We obtained 25 prognosis-associated genes by using univariate Cox regression analysis, which were signi cantly correlated to OS in PDAC. Then, we performed LASSO-penalized Cox regression, and the results demonstrated that ve genes were still signi cant prognostic factor for PDAC. An epigenetic signature was established based on the methylation level of the ve genes. The formula was as follows:  Fig. 3a, and the correlation between DNA methylation and gene expression of the ve genes is summarized in Fig. 3b. The distribution of risk score, survival status, and expression pattern of the 5-gene signature are shown in Fig. 4a, b, and c, respectively. We classi ed patients into two groups (high-risk and low-risk groups) according to the median risk score as the cut-off. Patients with high-risk group had worse OS compared to the low-risk group in PDAC (P < 0.05) (Fig. 4d). The AUC value of risk score was 0.776 which showed predictive ability of the signature in predicting survival risk of PDAC patients (Fig. 4e).

Validation of the signature in ICGC cohort
To validate the predictive power of the signature, risk scores were calculated with the same formula for patient in ICGC cohort. The distribution of risk score, survival status, and expression pattern of the 5-gene signature in ICGC cohort are shown in Fig. 5a, b, and c, respectively. Similarly, there were signi cant differences in OS among patients between low-and high-risk groups in PDAC (P < 0.05) (Fig. 5d). The AUC value of the signature was 0.705 which showed predictive ability of the signature in predicting PDAC patients' survival risk (Fig. 5e). Table 2 displays all risk factors between clinicopathological characteristics and prognosis. According to the results based on the multivariate analysis, age, N stage, surgery type, and risk score were con rmed as independent predictors for OS (P < 0.05). The four variables were used to create the nomograms for OS. The prognostic nomogram for 1-and 3-year OS was shown in Fig. 6a. We further constructed ROC curve to verify the accuracy of the nomogram. The AUCs of the nomogram predicting the 1-and 3-year OS were 0.748 and 0.771, respectively (Fig. 6b). The calibration curves, shown no obviously deviations from the reference line, which illustrated optimal agreement between model prediction and actual observations for 1-and 3-year OS (Fig. 6c).

Joint survival analysis
The joint survival analysis combining of methylation and expression of 24 genes was signi cantly associated with OS in patients with PDAC (P < 0.05; Fig. 6). The high-methylation and low-expression survival rate of 16 genes were poor, while the low-methylation and high-expression survival rate of the eight genes were also higher (Fig. 7).

Discussion
PDAC is a highly destructive malignant tumor of the digestive system. It is insidious, di cult to diagnose early, and prone to distant metastasis, which is an important cause of poor prognosis. Pancreatic cancer is associated with late diagnosis, early invasion, and chemoradiotherapy resistance. Therefore, a more accurate quantitative prediction method is urgently needed to assist clinical procedures. Here, we integrate DNA methylation and RNA-sequencing data from three public databases (TCGA, GTEx, and ICGC) to develop a robust qualitative signature for OS of patients with PDAC. This study elucidated ve methylated genes involved in the prognostic character of PDAC, which were used to develop an epigenetic signature. We validated its predictive power in ICGC cohort and proved that the signature is indeed signi cant irrespective of dataset in the independent validation cohort. The proposed risk score is independent of clinicopathological variables and showed a favorable prognostic ability. Moreover, a nomogram combining the epigenetic signature and clinicopathological factors was constructed to visually predict OS of PDAC patients. In addition, the methylation and gene expression combined survival analysis indicated that 24 independent genes can also be used as prognostic factors for stratifying OS in PDAC. Taken together, these results indicate that the signature not only serves as a biomarker of PDAC independent of clinicopathological features, but also to predicts clinical outcomes in PDAC patients.
Many genes show irregular expression due to abnormal CpG island methylation in the regulatory region of DNA, rather than sequence changes [14]. Due to the stable nature of DNA and its ampli cation, DNA methylation can be easily converted from laboratory settings to routine hospital operations. The methylation pro le of gene promoters is different for each type of cancer, suggesting that detection of abnormal methylation can be used as a potential molecular biomarker for cancers. Moreover, epigenetic changes expected to be therapeutic targets as epigenetic changes are reversible [15]. Therefore, detecting DNA methylation can provide new insights for further assessing cancer risk and treatment. Many reports suggested that aberrant DNA methylation is involved in the development and progression of pancreatic cancer. Tumor suppressor genes often show hypermethylation of the promoter, thereby suppressing its expression and contributing to the pathogenesis of PDAC [16,17]. Zhou et al. [17] found that methylation frequencies of CpG sites within promoter of LITAF were signi cantly higher in PDAC tissues, and LITAF promoter hypermethylation was associated with low LITAF expression. Moreover, promoter demethylation dose-dependently increased the LITAF transcription. LITAF demethylation inhibited proliferation and cell cycle progression, and promoted apoptosis of PDAC cells. Similarly, PCDH10 expression was signi cantly down-regulated in pancreatic cancer cells [18]. Methylation in PCDH10 promoter was observed in pancreatic cancer cells, and the expression of PCDH10 treated with Potent DNA methyltransferase 1 inhibitor was signi cantly up-regulated. Overexpression of PCDH10 inhibited the proliferation, migration, invasion ability of PDAC cells and induce apoptosis [18].
Several reports show that hypermethylation of the promoter can also serve as a diagnostic and prognostic biomarker in PDAC [19][20][21]. Nishizawa et al. [19] revealed that methylation level CDO1 promoter was signi cantly higher in PDAC tissues, and hypermethylation was signi cantly associated with worse disease-speci c survival in PDAC. Interestingly, it was signi cantly higher in prospectively collected PDAC cytology samples, including both pancreatic juice and needle aspiration cytology. Promoter methylation of ADAMTS1 and BNC1 in cell-free tumor DNA were detected in early stages of pancreatic cancer, so they may be used as indicators for early diagnosis of pancreatic cancer [21]. If a person is suspected of having pancreatic cancer, his plasma sample may be collected and analyzed for his methylation status. Promoter methylation of other genes in blood can also be used as a potential indicator for early detection of PDAC, such as MUC2 and SPARC [20,22]. In addition, Curia et al. [6] indicated that CpG methylation was observed in PDAC tissue, and hypermethylation level of PCDH10 was signi cantly associated with poor progression-free survival (PFS) rates.
The proposed signature included ve genes (AHNAK2, ARNTL2, OAS2, CLEC2B, and HIST1H2BH). Only AHNAK2 involved in this signature have been previously investigated in PDAC. Several studies in PDAC have con rmed AHNAK2 as an oncogene [23,24]. Lu et al. [24] indicated that AHNAK2 is upregulated in 79 PDAC tissues compared with adjacent normal tissue, and patients with high AHNAK2 expression had worse OS. AHNAK2 protein also be detected in PDAC samples by the immunohistochemical staining [25].
Other genes in the signature have not been reported to related to PDAC biology, the functions and mechanisms of these genes in PDAC need to be further investigated.
Though the epigenetic risk score model is promising, this study inevitably has some limitations that can be explored in the future. First, the prognostic signature or nomogram were established based on the limited number of patients. A larger sample validation cohort is needed to further validate the predictive accuracy of our model. Second, although the signature of the two genes showed favorable predictive ability in PDACC, the mechanism behind it was not yet clear and further researches were needed. Further functional experiments were needed to validate the functions of the ve gene signature.

Conclusion
we have identi ed ve prognostic genes and developed an independently validated epigenetic signature to predict the OS of patients with PDAC via the analysis integrating data of DNA methylation and gene expression. The qualitative signature could robustly predict OS of PDAC patients at the individualized levels.

Availability of data and materials
The datasets used in this study are available from the Cancer Genome Atlas (TCGA) database  Flowchart of construction and validation of the signature.

Figure 2
Heatmap of methylation related differentially expressed genes in pancreatic ductal adenocarcinoma).
The color change from blue to red in the heatmap illustrates the trend from low to high methylation. The epigenetic signature for overall survival prediction of pancreatic ductal adenocarcinoma patients in TCGA cohort. a The distribution of risk score derived from the signature. b The distribution of survival status of each patient. c The methylation level pro le of the signature in the high-risk and low-risk subgroups. d Kaplan-Meier estimate of the OS. Patients were divided into high-risk and low-risk subgroup based on the median of risk score. The difference between the two curves was determined by the twoside log-rank test. e Time-dependent ROC curve of the signature. The epigenetic signature for overall survival prediction of pancreatic ductal adenocarcinoma patients in ICGC cohort. a The distribution of risk score derived from the signature. b The distribution of survival status of each patient. c The methylation level pro le of the signature in the high-risk and low-risk subgroups. d Kaplan-Meier estimate of the OS. Patients were divided into high-risk and low-risk subgroup based on the median of risk score. The difference between the two curves was determined by the twoside log-rank test. e Time-dependent ROC curve of the signature. Kaplan-Meier survival curves of joint analysis of methylation and expression data in pancreatic ductal adenocarcinoma.