Development and clinical validation of A Novel 9-gene prognostic biomarkers based on multi-omics in pancreatic adenocarcinoma

Background: The prognosis of patients with pancreatic cancer remains poor due to the lack of biomarkers for early diagnosis and effective prognosis. Methods: RNA-Seq, SNP, CNV data were downloaded from The Cancer Genome Atlas(TCGA); Univariant cox regression was used to identify prognosis-related genes; GISTIC 2.0 was used to identify signicantly amplied or deleted genes; Lasso method was used to construct risk prognosis model, which was then validated in GEO and ICGC cohorts. rms package was used to evaluate the overall predictive performance of the model by calculating and comparing the C-index values with other models. Experiments of western blot were performed to evaluate the expression of genes. Results: 54 candidate genes were obtained after integrating the genomic mutated genes and prognosis-related genes. The Lasso method was conducted to nally ascertain nine characteristic genes, including UNC13B, TSPYL4, MICAL1, KLHDC7B, KLHL32, AIM1, ARHGAP18, DCBLD1, and CACNA2D4. The 9-gene signature model can help with stratifying samples at risk in the training cohort and external validation cohort. In addition, the overall predictive performance of our model is superior to the others. We found that AIM1 and DCBLD1 were highly expressed in tumor tissues, while ARHGAP18, CACNA2D4, and TSPYL4 were lowly expressed in tumor tissues at the protein and transcription levels. Conclusion: The nine-gene signatures constructed in this study can be used as the novel prognostic markers to predict the survival of PAAD patients.

TP53 and SMAD4 / DPC4 could be referred to predict the postoperative survival of PAAD patients [9]. In addition, the family of non-coding RNA has also shown huge potential for predicting prognosis of cancerous patients [10; 11]. Nonetheless, studies on the effects of these indicators on pancreatic cancer mostly focus on the function of a single or several genes, making it di cult to conduct in-depth and systematic research from a dynamic, holistic, and multidimensional perspective. With the continuous improvement of high-throughput sequencing technology, integrated multi-level approach of genome and transcriptome for systematic research [12; 13; 14] provide novel ideas to explore the prognostic markers of pancreatic cancer.
In the current study, we established a 9-gene signature model based on the integrating and screening of genomic and transcriptomic data, and subsequently validated its validity by using internal and external cohorts. In addition, we found that the 9-gene signature is involved in some important biological pathways of pancreatic cancer. In a nutshell, the 9-gene signature established in this study can effectively predict the prognosis of patients with pancreatic cancer and show promising application in clinical practice.

Methods
Data source and preprocessing TCGA cohort: Normalized RNA-sequencing data as transcripts per million (TPM) and the associated clinical information of the PAAD samples were downloaded from The Cancer Genome Atlas (TCGA) cohort (https://portal.gdc.cancer.gov/. 171 cases with corresponding tumor tissues and clinical information were included in the study. Normalized gene expression data for the TCGA PAAD cohort were log2-transformed for further analysis. GEO cohort Gene expression microarray data of 132 normal and primary pancreatic tumor samples from patients were obtained from the Gene Expression Omnibus (GEO) database (accession number GSE21501). Normalization, quality control, and imputation of array data were performed. Expression data from multiple probes were collapsed by the mean for each sample

Construction of Univariate Cox Proportional Hazards Model
Similarly, Guo J, et al [15], applied univariate cox proportional hazards regression analysis on each gene to screen out those that were signi cantly related to patient's OS in the training cohort, and p <0.01 was selected as the signi cance level.
Data analysis of copy number variation GISTIC is widely used and detects both broad and focal (potentially overlapping) recurring events. We used GISTIC 2.0 [16] software to identify genes exhibiting signi cant ampli cation or deletion. The parameter threshold was de ned as the length of ampli cation or deletion being greater than 0.1 and p <0.05 in the fragments.

Genetic Mutation analysis
To identify signi cantly mutated genes, we used Mutsig 2.0 software to analyze the maf les of TCGA mutation data, at the signi cance level of p <0.05.

Construction of prognostic signatures
We selected the genes that are signi cantly related to patient's OS and have undergone notable ampli cation, deletion, and mutation. The genes were ranked according to their importance using RandomSurvivalForest Package in R [17] following the methods described by Meng, J et al. [18]. The number of iterations for Monte Carlo simulation was set to 100, the number of advancing step was 5, and genes with a relative importance of greater than 0.4 were identi ed as characteristic genes. Furthermore, multivariate Cox regression analysis was performed to construct the following risk scoring model: Where N is the number of prognostic genes, is the expression value of prognostic genes, and is the estimated regression coe cient of genes in the multivariate Cox regression analysis.

Functional enrichment analyses
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using the clusterpro ler package in R [19] to identify over-represented GO terms in three categories (biological processes, molecular function and cellular component), and KEGG pathway. For this analyses, a FDR < 0.05 was considered to denote statistical signi cance.

Construction of risk model for the training cohort
Lasso method is a kind of compressed estimation used to obtain a re ned model by constructing a penalty function [20]. Meanwhile, it can compress some coe cients and set some coe cients to zero. Therefore, it retains the advantage of subset selection with shrinkage, and is a kind of biased estimation suitable for the processing of complex collinear data. It can help with the selection of variables at the time of parameter estimation, so as to better solve the multicollinearity problem in regression analysis. We use the glmnet package in R for lasso cox regression analysis to analyze the trajectory of each independent variable. Gene Set Enrichment Analysis (GSEA) GSEA was performed by the JAVA program (http://software.broadinstitute.org/gsea/downloads.jsp) using the MSigDB14 C2 Canonical pathways gene set collection, which contains 1320 gene sets. Gene sets with a false discovery rate (FDR) value less than 0.05 after performing 1000 permutations were considered to be signi cantly enriched. Sample Collection PAAD and adjacent tissues were collected from 4 patients, immediately placed in liquid nitrogen, and preserved at -80°C. Patients and their families in this study have been fully informed and informed consent was obtained from the participants. This study was approved by the Ethics Committee of Hainan General Hospital.

Statistics
The median risk score in each cohort was used as cutoff to compare the survival risk between high-risk group and the low-risk group, and subsequently plot the Kaplan-Meier (KM) survival curve. A multivariate Cox regression analysis was performed to test whether the genetic markers were independent prognostic factors. Signi cance level was set at P <0.05. All these analyses were performed using R 3.4.3.

Analysis of copy number variation
For the data of copy number variation in TCGA, we used GISTIC 2.0 to identify genes with signi cant ampli cation or deletion, and the parameter threshold was de ned as the fragment with an ampli cation or deletion length greater than 0.1 at the level of p <0.05. The signi cantly ampli ed fragments in the pancreatic cancer genome were listed in Supplement Table 1. For example, signi cantly ampli ed CCAT1 at 8q24.21 (q value = 7.69E-14), GATA6 at 18q11.2 (q value = 7.69E- 14), and RPS16 at 19p13.2 (q value = 3.83E-07) were determined. Gene ampli cation was found for 82 genes in total.
The signi cantly deleted fragments in the pancreatic cancer genome were shown in Supplement Table 2, such as CDKN2A at 9p21.3 (q value = 1.83E-74), SMAD4 at 18q21.2 (q value = 1.58E-41), and SFN at 1p36.11 (q value = 0.00011). After removing redundancy, a total of 476 deleted genes were deleted. The distribution of copy number variation was shown in Figure 1A.
Furthermore, heatmap was plotted to illustrate the top 15 copy number variations ( Figure 1B). Different copy number variation between different stages and grade samples was determined, yet none reached the signi cant level.

Analysis of mutation data
We used Mutsig2 to identify the genes with signi cant mutations. The threshold signi cance level was set at p <0.05, and a total number of 135 signi cantly mutated genes were selected. The pronounced mutated 49 genes (p <0.02) in PAAD patients and their distribution were presented in Figure 2. The left image shows the total number of synonymous and non-synonymous mutations in each patient, and the right shows the number of samples with mutations of genes. Of the identi ed signi cant genes, some have been reported in previous reach that are closely related to the carcinogenesis and progression, such as KRAS, TP53, CDKN2A, and RNF43.
Pathways and biological processes that involve genes with copy number variation and mutation By analyzing the ampli ed and deleted genes which were identi ed by copy number variant in TCGA, as well as targeted integration of mutant genes, we identi ed 693 genes in total that are implicated in the corresponding biological processes and pathways. These genes were signi cantly enriched in 39 KEGG pathways, including MAPK signaling pathway, Hepatocellular carcinoma, TNF signaling pathway and other tumor-related pathways. The Top 20 closed associated KEGG pathways were shown in Figure 3A. Furthermore, the gene sets were integrated into a network diagram ( Figure 3B) based on enrichment condition and overlapped gene sets. The overlapping gene sets tend to cluster, thusly helping with the nding of clusters with similar biological functions.

Construction of risk model based on training cohort
Univariate cox analysis was applied to identify 2625 prognostic genes with p values < 0.01 in training cohort. Through the previous analysis, we have obtained 82 copy ampli cation, 476 copy deletions and 135 signi cant mutations genes. After intersection with 2625 prognostic related genes, 54 candidate genes based on multi-omics were obtained.
Subsequently, we need to narrow down the 54 genes range and build a prognostic model while maintaining high accuracy. We used glmnet package to perform lasso cox regression analysis. As seen in Figure 4A, the gradually decreasing lambda resulted in a higher number of independent variable with coe cients approaching 0. We used 10-fold cross-validation to build the model and to analyze the con dence interval under each lambda. As shown in Figure 4B, the model is optimal when lambda equals to 0.03874. Thusly, we selected 15 genes as the target genes under the condition of lambda equals to 0.03874. Furthermore, we conducted multi-variant cox survival analysis on the 15 genes obtained in the previous step, and retained 9 mRNAs with the lowest AIC value (AIC = 555.89) as the nal model. The detailed information of these 9 mRNAs was shown in Table 3. The prognostic KM curve of the 9 genes is shown in Fig S1. Obviously, 7 genes can signi cantly distinguish the samples of TCGA training cohort into high-risk and low-risk group (p <0.05). One shows marginal signi cance at the level of p = 0.062, whereas the other indicates no signi cance. Among them, the down-regulated expression of four genes, including UNC13B, TSPYL4, MICAL1 and KLHL32, along with the high-regulated expression of ve genes, including KLHDC7B, AIM1, ARHGAP18, DCBLD1 and CACNA2D4, are associated with poor prognosis. Although no signi cant correlation was detected in CACNA2D4, yet its aberrantly higher expression was determined in high-risk groups, which indicates dismal prognosis than low-risk group. The nal version of the 9-mRNA signature formula is as follows: package. We analyzed the e ciency of prognostic classi cation e ciency of the 1 st , 3 rd and 5 th year in both training cohort and testing cohort. As shown in Figure 5A-B, the AUC for the 5 th years is 0.93 in the training cohort and 0.9 in the testing cohort. We conducted z-score transformation of the Riskscore, dividing the samples with Riskscore > 0 into high-risk group, and those with Riskscore < 0 into lower-risk samples, plotted a KM survival curve. As shown in Figure 5C-D, signi cant differences were determined in the training cohort (log rank p <0.0001, HR = 4.534) and testing cohort (log rank p <0.0001, HR = 3). 86 samples were divided into high-risk group and 51 samples into low-risk group.
Robustness of 9-gene signature in the TCGA -PAAD cohort In order to ensure the robustness of the model, we used the same model in the TCGA training cohort and the same cutoff value, and veri ed it in the TCGA-PAAD cohort, and plotted the RiskScore distribution. As shown in Figure 6A, the OS of high-risk groups was signi cantly smaller than that of low-risk groups, which indicates that samples with high RiskScore have worse prognosis. The changed expression of 9 different signature genes with higher risk value was identi ed. We veri ed that the high expression of KLHDC7B, AIM1, ARHGAP18, DCBLD1 and CACNA2D4 indicate high risk, demonstrating these 5 genes as risk factors. Meanwhile, it was found that the high expression of UNC13B, TSPYL4, MICAL1 and KLHL32 indicates low risk, demonstrating these 4 genes as protective factors.
Furthermore, we used timeROC package in R to perform ROC analysis on the RiskScore to carry out prognostic classi cation. The e ciency of the prognostic classi cation in the 1 st , 3 rd and 5 th year was analyzed. As shown in Figure 6B, the AUC is high in the model, reaching above 0.90 in the 5 th year. Finally, we conducted z-score transformation of the Riskscore, dividing samples with Riskscore greater than zero into high-risk groups, and the other into low-risk group, and plotted a KM curve. As shown in Figure 6C, extremely signi cant differences could be found, with log rank p <0.0001 and HR = 3.01 (1.819-4.981). 107 samples were divided into high-risk groups and 64 samples were into low-risk groups.
External Veri cation of the robustness of 9-gene signature We applied the same model and coe cients of the training cohort in the two external validation cohorts.
We also calculated the risk score of each sample based on the expression level of the sample and plotted the RiskScore distribution of the sample.
The results of the GSE21501 cohort are shown in Figure 7. We used the timeROC package to perform ROC analysis on the RiskScore for prognostic classi cation. We analyzed the prognosis e ciency of prognostic classi cation of the 1 st and the 3 rd year, and found high value of AUC in the model. The AUC of the 1 st year is 0.73. We draw a survival curve based on the expression of RiskScore. As shown in Figure  7B, extremely signi cant differences was observed, with log rank p = 0.05, HR = 1.628 (0.997-2.657). Among them, 48 samples were divided into high-risk group and 49 samples were into low-risk group.
Similarly, for ICGC cohort. The prognostic classi cation e ciency of 1 st, and 3 rd year was investigated, respectively. As shown in Figure 7C, the model has a high AUC (the AUC for 1 st year is 0.75). As shown in Figure 7D, signi cant marginal difference was determined, with log rank p = 0.011 and HR = 1.526 (1.098-2.122). 136 samples were divided into high-risk group and 121 sample were low-risk group.

Risk Model and Prognostic Analysis of Clinical Features
Furthermore, the results of survival analysis determined that only Grade and N stage in the TCGA training cohort samples were signi cantly related to OS of pancreatic cancer (Figure 8), whereas signi cant difference were not observed in regard to age, gender, family history, T stage, M stage and smoking history.
After performing a RS analysis on the 9-gene markers, we found that the 9-mRNA signatures can effectively separate young, old, male, female, smoking, family history, Grade + , T3, Stage , N0 and N1 stage patients into high risk and low risk groups (p <0.01). It further demonstrated the good predictive ability of this model in regard to different clinical features (Figure 9).
Univariate and multivariate analysis of 9-gene signature To identify the independence of the 9-gene signature model in clinical application, we conducted Univariate and multivariate COX regression analysis into the relevant clinical information carried by the entire TCGA, GSE21501 and ICGC, and calculated 95% CI of HR and p value. We systematically analyzed the clinical information from TCGA, GSE21501, and ICGC (International Cancer Genome Consortium), including age, gender, and grouping information of 9-gene signature. Uni-variant COX regression analysis found that RS, Age, T, N, and Grade in the TCGA cohort were signi cantly related to survival, yet the corresponding multi-variant COX regression analysis found that RS (HR = 2.43 95% CI = 1.420-4.165, p = 0.001) and age are signi cantly associated with prognosis and exhibit clinical independence.
As for the external cohort of GSE21501, univariate COX regression analysis found that RS and N were signi cantly related to survival, yet the corresponding multi-variant COX regression analysis only found N (HR = 2.01, 95% CI = 1.066-3.778, p = 0.031) and Age are signi cantly associated with prognosis and exhibit clinically independence.
Lastly, as for ICGC, Univariate and multivariate COX regression analysis found that RS was signi cantly associated with survival. The above ndings indicated that our 9-gene signature is a prognostic model independent from other clinical factors and thereby presenting independent predictive capacity in clinical application.

Riskscore-related regulatory pathways
To observe the association between the riskscore and biological function in different samples, we selected the gene expression pro les corresponding to these samples and performed single-sample GSEA analysis using GSVA package in R. We calculated the scores of different functions in different samples to obtain the ssGSEA score for each sample. Furthermore, we calculated the correlation between these functions and Riskscore, and then selected the function with a correlation coe cient greater than 0.5. As shown in Figure 10A, a small part is negatively correlated with the sample's risk score, whereas most of the part shows a positive correlation with the Riskscore.
We selected the 25 KEGG Pathways that have coe cients greater than 0.5, and then performed cluster analysis based on the enrichment scores ( Figure 10B). It can be seen that among these 25 pathways, P53 SIGNALING PATHWAY, CELL CYCLE and other tumor progression-related pathways increase with higher RiskScore, whereas the TASTE TRANSDUCTION and CARDIAC MUSCLE CONTRACTION pathways decrease with higher risk score. This also suggests that the dysregulation of these pathways is closely related to tumor development.

Comparison of risk model with other models
After referring to previously published literature, we selected four prognostic risk models, including 15gene signature (Chen) [21], 7-gene signature (Cheng) [22], 5-gene signature (Raman) [23], and 9-gene signature (Wu) [24] for the comparison with our 9-genes model. To ensure comparability, we calculated the risk score of each PAAD sample in the TCGA using the same method based on the corresponding genes in the 4 models, and evaluated the ROC of each model, and divided the samples into Risk-H and Risk-L groups in accordance with the median risk score. The difference in prognosis between the two groups of samples was calculated.
The ROC and KM curves of the four models are shown in Fig. 10 A-H. Apparently, the AUC of the four gene models are all above 0.70, yet the predictive effects of the four models are inferior to our 9-gene model.
Signi cant difference in the prognosis was also determined between Risk-H and Risk-L groups of samples in the four models (log rank p <0.001).
Furthermore, to compare the prediction performance of these models on PAAD samples, we used "rms" package in R to calculate the concordance index (C-index) of different models. It can be seen that the Cindex of the 9-genes model is the highest (Fig.10I), indicating that the overall performance of the 9-gene model outweighs than the other four models.
External validation of proteins and mRNA of 9 gene signature The protein expressions of 9 genes were analyzed by HPA database. Among them, MICAL1 and UNC13B are negative in tumors and normal tissues, KLHDC7B, KLHL32 are not signi cantly different in tumors and normal controls. ARHGAP18, CACNA2D4, and TSPYL4 are relatively low expressed in tumors, and AIM1, DCBLD1 are relatively high expressed in tumor tissues ( Figure 11).
The differential expression of 9 genes in GSE62452, GSE107610 and TCGA-PAAD were analyzed. Among them, AIM1 and DCBLD1 were highly expressed in tumors, while KLHDC7B, ARHGAP18, CACNA2D4, TSPYL4 and KLHL32 were all low expressed in tumors, and the overall expression trends of 9 genes in the three sets of pancreatic cancer cohorts were roughly the same ( Figure 12).

Genetic alterations of the 9 predictive genes
The mutations of 9 genes was explored in cBioportal database. among which the gene with the highest mutation proportion was UNC13B, accounting for 3%, and the main type of mutation was ampli cation, followed by KLHDC7B (2.3%) and CACNA2D4 (2.5%). (Figure 13)

Clinical validation of 9 genes
Then, we measured the expression levels of UNC13B, TSPYL4, MICAL1, KLHDC7B, AIM1, ARHGAP18, DCBLD1, and CACNA2D4 in four pairs of PADD tumors and normal samples. We found that compared with normal tissues, M1CAL1, KLHDC7B, KLHL32, and UNC13B were not different in human PADD tissues and normal controls. ARHGAP18, CACNA2D4, and TSPYL4 were relatively low expressed in PADD tissues.
DCBLD1, A1M1 are relatively highly expressed in tumor tissues, and the experimental results are almost consistent with our data analysis ( Figure 14)

Discussion
Human malignant cancer indicates an intricate etiology that involves multiple factors and accumulated lesions after multiple-stage. During such process, genetic factors, epigenetic factors, as well as the numerous related genes under their regulation jointly constitute the complex heterogeneity of tumors, bringing major di culties to clinical diagnosis and personalized treatment [25; 26; 27]. The continuous development of high-throughput sequencing technology makes it possible to carry out comprehensive and multi-level researches on tumors at genome level and transcriptome level [26]. Meanwhile, integrating multiple omics data and conducting comprehensive research in combination with patients' clinical data prove superior in ascertaining effective therapeutic targets and prognostic markers [28; 29]. In the present study, we analyzed multi-omics data, including transcriptome, copy number variation, and mutation, to screen and construct a 9-gene signature risk prognosis model that is closely related to the prognosis of PAAD patients. The 9-gene signature has strong robustness, and shows stable and consistent prediction performance in both the TCGA internal validation set and the GEO external validation set.
More importantly, we systematically analyzed the clinical information in TCGA, GSE21501, and ICGC cohorts by performing COX regression analysis, and found that the constructed 9-gene signature model maintains consistent and stable clinical independence under the in uence of various clinical factors. All nine genes were demonstrated as independent prognostic factors, further evincing the high stability of the 9-Gene signature. In light of the above results, su ce it to way that the constructed 9-gene signature has huge potential for clinical application by virtue of its stable and consistent predictive power for prognosis on different platforms and in various cohorts.
A plethora of studies have con rmed that KLHDC7B is overexpressed in breast cancer and is involved in regulating the malignant progression of breast cancer [30; 31]. In addition, KLHDC7B has been shown to be closely related to the prognosis of laryngeal cancer [32]. As an actin-binding protein, AIM1is aberrantly expressed in melanoma [33; 34], prostate cancer [35; 36], and bladder cancer [37] and is closely related to prognosis. ARHGAP18, a RhoGTP activating protein, has been shown to correlate with the prognosis of patients with gastric cancer [38] and breast cancer [39], due to its participation in regulating the metastatic capacity of breast cancer cells [40; 41]. TSPYL4 has been con rmed to correlate with the prognosis of patients with head and neck squamous cell carcinoma [42]. KLHL32 is implicated in regulating the production of ROS and plays a vital role in the progression of breast cancer [43; 44] and melanoma [45]. In addition, DCBLD1, CACNA2D4, and UNC13B have not been reported to be associated with the prognosis of cancerous patients; they were determined, for the rst time, as prognostic markers of pancreatic cancer in this study.
Finally, the results of GSEA enrichment analysis showed that 9-gene signature is closely related to tumor pathway such as P53 SIGNALING PATHWAY CELL CYCLE in pancreatic cancer.
It has been corroborated in numerous studies that P53 pathway and aberrant cell cycle regulation exert considerable impact on the biological behavior of pancreatic cancer cells and the prognosis of PAAD patients [45; 46; 47]. These ndings are consistent with our results. Therefore, pathway enrichment analysis should further conducted to deepen our understanding about the intricate mechanism of the onset and progression of PAAD.
Researchers have attempted to screen and construct prognostic marker models for pancreatic cancer.
Chen et al. constructed a 15-gene signature to predict the prognosis of patients with early pancreatic ductal cancer [21]. Cheng et al. constructed a 7-gene signature for the prognosis of pancreatic cancer based on multiple GEO cohorts [22]. In addition, the 5-gene signature was constructed by by Raman et al.
on the basis of GEO and ICGC cohorts [23]. Wu et al., conducted a 9-gene signatures based on TCGA and GEO cohorts [24]. In order to further validate the advantages of our 9-gene signature, we analyzed the above four models simultaneously. The results show that the 9-gene signature constructed in this study is superior to the other 4 models in regard to predicting the prognosis of patients. Further C-index analysis showed that the overall performance of our model is improved than the other four. In view of the above results, our prediction model based on multi-omics data shows stronger advantages. It can help with the prediction of the individual risk of PAAD patients, and provide constructive guidance for patient evaluation and decisions of clinical intervention.
Then we integrated the HPA protein database, GSE62452, GSE107610 and TCGA-PAAD databases to validate the protein and mRNA levels of 9 genes. We found that AIM1 and DCBLD1 were highly expressed in tumor tissues, while ARHGAP18, CACNA2D4, and TSPYL4 were lowly expressed in tumor tissues. Then we used 4 pairs of PADD tissue and control samples for protein and transcription level analysis, the results were consistent with the analysis Although we conducted the present experiment to screen and verify the potential prognostic markers for PAAD on the basis of large sample multi-omics data, it is not without limitations. The further veri cation in vivo and in vitro of these genes is still needed. In addition, the samples used in the study are retrospectively analyzed, more thoroughly and comprehensively investigations are yet to be done prior to clinical applications.

Conclusion
In summary, we screened and validated a prognostic risk model consisting of 9 genes which exhibited excellent predictive power in both training and validation sets, and all 9 genes are independent prognostic factors. The construction of this multi-omics risk model can improve the ability to predict the prognostic risk of PAAD patients more accurately. Therefore, we recommend that this 9-gene multi-omics model used as a molecular marker to assess the prognostic risk of PAAD patients.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and material
The data used to support the ndings of this study are available from the corresponding author on reasonable request.  Figure 1 A. Signi cantly ampli ed gene fragments and signi cantly deleted gene fragments in the PAAD genome; The abscissa represents the chromosome number, and the ordinate expresses the score of copy number variation of the gene on each chromosome. A score greater than 0 indicates gene ampli cation, otherwise gene deletion. Red represents signi cantly increased copy number, blue represents signi cantly decreased copy number, and gray indicates insigni cant change of copy number. The mutation locations on the chromosome were annotated at the top and the bottom of the Figure. B. CNV heatmap of pancreatic cancer genome. The abscissa represents the sample, the left ordinate represents the percentage of copy number variation, the right ordinate depicts the location of copy number deletion or ampli cation on the chromosome; red represents copy number ampli cation, blue represents copy number deletion.    A. Clustering of KEGG pathways that are signi cantly correlated with riskscore (coe cient greater than 0.5); B: Change of relationship between KEGG pathways that are signi cantly correlated with riskscore (coe cient greater than 0.5) and ssGSEA score (the abscissa represents the sample, and the risk score increases from left to right)  Analysis of protein expression of 9 genes.

Figure 12
Analysis of mRNA expression of 9 genes. A: Differential expression analysis of 9 genes in GSE62452; B Differential expression analysis of 8 genes in GSE107610.C.Differential expression analysis of 9 genes in TCGA cohort.

Figure 13
Analysis of mutations of 9 genes in the cBioportal (A) OncoPrint display of gene mutation distribution; (B) Histogram display of gene mutation distribution