Bioinformatic Analysis of Prognostic and Immune-Related Genes in Pancreatic Cancer

Pancreatic cancer (PC) is a malignant tumor with poor prognosis. The poor effect of surgery and chemotherapy makes the research of immunotherapy target molecules significant. Therefore, identifying the new molecular targets of PC is important for patients. In our study, we systematically analyzed molecular correlates of pancreatic cancer by bioinformatic analysis. We characterized differentially expressed analysis based on the TCGA pancreatic cancer dataset. Then, univariate Cox regression was employed to screen out overall survival- (OS-) related DEGs. Based on these genes, we established a risk signature by the multivariate Cox regression model. The ICGC cohort and GSE62452 cohort were used to validate the reliability of the risk signature. The impact of T lymphocyte-related genes from risk signature was confirmed in PC. Here, we observed the correlation between the T lymphocyte-related genes and the expression level of targeted therapy. We established a five-mRNA (LY6D, ANLN, ZNF488, MYEOV, and SCN11A) prognostic risk signature. Next, we identified ANLN and MYEOV that were associated with T lymphocyte infiltrations (P < 0.05). High ANLN and MYEOV expression levels had a poorer prognosis in decreased T lymphocyte subgroup in PC. Correlation analysis between ANLN and MYEOV and immunomodulators showed that ANLN and MYEOV may have potential value in pancreatic cancer immunotherapy.


Introduction
Pancreatic cancer is a malignant tumor with fewer than 7% of patients surviving the past 5 years [1]. It has one of the worst outcomes among all cancers with a median survival of approximately 6 months [2]. Pancreatic cancer is forecast to be the second most common cancer in all malignant cancers by 2030 [3]. The high mortality is due to extensive metastasis in the early stages and resistance to therapy. Common chemotherapy options for pancreatic ductal adenocarci-noma (PDAC) such as 5-fluorouracil (5FU), nab-paclitaxel, oxaliplatin, or combination therapy like FOLFIRINOX may lead to tumor resistance [4]. With the prevalence of chemotherapy resistance, immunotherapy may be an emerging treatment of pancreatic cancer. An important component of immunotherapy is cytotoxic T lymphocytes, which can kill cancer cells through antigen-antibody binding, currently known as immunotherapy such as immunomodulators IL-2 or chimeric antigen receptor (CAR) T cell therapy [5,6]. However, the effect of recent immune therapy trials was not  3 Computational and Mathematical Methods in Medicine ideal such as checkpoint blockade or engineered T cells, because of the low degree of T cell infiltration in pancreatic cancer [7]. Therefore, our research aims to find molecules, related to clinical significance, T lymphocyte infiltrations, and immune cells of pancreatic cancer patients and explore mechanisms further to serve as potential immunotherapy targets.

Construction and Validation of Prognostic Risk Signature.
The "limma" package in R software (version 4.0.2) was used to read, normalize, and explore the datasets to identify differentially expressed genes (DEGs) [12]. Univariate Cox regression was performed to screen DEGs significantly associated with overall survival (OS) in the TCGA pancreatic cancer dataset. The Least Absolute Shrinkage and Selection Operator (LASSO) regression was used to reduce data size and select the optimal mRNAs [13]. Multivariable Cox regression was conducted to establish the prognostic risk signature based on the results of LASSO regression. The following formula was used to calculate risk score of each patient: risk score = ∑Ex × C, where C is the coefficient, and Ex is the relative expression level of each DEG. The median risk score was used as our cutoff value to divide the PC patients into high-risk and low-risk groups. In addition, Kaplan-Meier survival plot was utilized to analyze the overall survival (OS) difference between the two groups. Second, we evaluate the performance of our model based on ICGC cohort and GSE62452 cohort.
Univariate and multivariate Cox regression analyses were conducted to whether risk score is an independent risk factor of OS in PC patients. Covariates included age, gender, race, stage, T, N, grade, and risk score. The nomogram was formulated to provide visualized risk prediction. The calibration was generated to assess the consistency between actual and predicted survival.

Exploration of Gene-T Lymphocyte Infiltrations
Relationships. We calculated the correlation between the gene expression level and GZMB/CD8A ratio. Genes were considered to be related to the T lymphocyte infiltrations when P < 0:05 and further discussed relationships with immune cell lines. We obtained gene sets of immune cells from a previous study [14] (Table S1). We computed GSVA (gene set variation analysis) scores by R − package "Genome Set Variance Analysis (GSVA)." The top and bottom 25% of GSVA score patients were divided in high and low groups, respectively [15]. By comparing mRNA levels of selected genes in the two groups, we explored the gene-T lymphocyte infiltrations relationships.  Computational and Mathematical Methods in Medicine (http://kmplot.com/analysis/) can analyze the correlation between selected genes and specific tumor prognosis from multiple dimensions [16]. Sources for the databases include Gene Expression Omnibus (GEO), European Genomephenome Archive (EGA), and The Cancer Genome Atlas (TCGA). Therefore, we used this database to analyze the impact of T lymphocyte cells on survival rates in pancreatic cancer. The hazard ratios (HRs) with 95% confidence intervals (CI) and log-rank P values were also calculated.
2.5. Gene Set Enrichment Analysis (GSEA). In this study, we performed single-gene GSEA to explore the potential roles of selected genes included in our risk signature in PC. GSEA generated an initial list on the classification of the genes according to their correlation with each selected gene expres-sion by using the Pearson method. GSEA was performed using GSEA3.0 (http://www.broad.mit.edu/gsea/) [17]. The phenotype label that we put forth was the expression level of the selected gene. The nominal P < 0:05 and the FDR < 0:25 were considered statistically significant.

Exploration of the Association between Genes and
Immunomodulators. The TISIDB database (http://cis.hku.hk/ TISIDB) enables an investigation of the correlations for selected genes with immunomodulators and chemokines [18]. In this study, we used the TISIDB database to investigate correlations with selected genes and immunomodulators.   Computational and Mathematical Methods in Medicine was used to draw the volcano plot and heatmaps. The chisquare test was used to evaluate the differences of clinicopathological parameters between the high-risk and low-risk groups. Survival curves were generated by the Kaplan-Meier method. Univariate, LASSO, and multivariate regression analyses were performed to explore the prognostic risk model. Time-dependent ROC analysis was used to evaluate the accuracy of the models. GSVA was performed to compute the immune cell type scores. The results were considered to be statistically significant for P < 0:05.

Differentially Expressed Genes in Cancer Tissues and
Normal Tissues. The detailed workflow of our study is shown as a chart ( Figure S1). RNA-seq from 186 tumor tissue samples and 36 nontumor samples was downloaded from TCGA. Volcano plot was used to visualize the altered mRNA expression pattern of TCGA PC cohort. A total of 234 DEmRNAs were identified across all the datasets consisting of 168 upregulated and 66 downregulated DEmRNAs (Figure 1(a)). Among them, MUC2 and STAB2 were the most upregulated and downregulated mRNAs in Table S2. Heatmap was used to visualize the expression levels of the significantly differentially expressed mRNAs. The blue color was assigned to the relatively low expression, while the red color represented the relatively high expression (Figure 1(b)).

The Construction and Validation of Prognostic Signature
Based on the PC Cohorts. The mRNAs that were significantly associated with OS were identified by univariate analysis based on TCGA cohort (Table S3). The LASSO regression identified 8 optimal mRNAs including LY6D, FAM83A, ANLN, LAMA3, ZNF488, MYEOV, PLAAT2, and SCN11A ( Figure S2). Then, the above eight mRNAs were further subjected to multivariate Cox regression analysis. The multivariate analysis showed that LY6D, ANLN, ZNF488, MYEOV, and SCN11A were the independent prognostic mRNAs for PC. The risk score for each patient was calculated with the following formula: ð0:1041× LY6DÞ+ ð0:2339 × ANLNÞ+ ð0:1388 × ZNF488Þ + ð0:1260 × MYEOVÞ + ð−0:2197 × SCN11AÞ. Based on the median value of the risk scores, 89 and 88 patients were classified into the high-and low-risk groups, respectively. The OS was significantly shorter in the high-risk group than in the low-risk group (P = 1:013e − 06) (Figure 2 Figure 2(d). The expression pattern of these five prognostic mRNAs between high-risk and low-risk groups is shown in Figure 2(e).
To confirm that the prognostic signature that had similar predictive values in different populations, we then used it to predict OS in two independent external validation cohorts using the median risk score as the cutoff.
A total of 143 patients in the ICGC cohort (validation cohort -1) were classified into a low-risk group (n = 72) and a high-risk group (n = 71), and the OS of the PC patients in   Computational and Mathematical Methods in Medicine the high-risk group was significantly lower than that of the patients in the low-risk group (P = 3:043e − 02; Figure 3(a)). The prognostic signature constructed with the ICGC cohort also showed a favorable predictive ability for the 1-, 2-, and 3-year OS rates, with AUC values of 0.617, 0.641, and 0.659, respectively ( Figure 3(c)). In addition, as shown in Figure 3(b), a total of 66 patients in the GEO cohort GSE62452 (validation cohort-2) were classified into a lowrisk group (n = 33) and a high-risk group (n = 33), and the OS of the PC patients in the high-risk group was significantly lower than that of patients in the low-risk group (P = 7:629e − 03). The result generated by the GSE62452 cohort also showed a favorable predictive ability for the 1-, 2-, and 3-year OS rates, with AUC values of 0.593.0.722, and 0.833, respectively (Figure 3(d)).
Furthermore, we analyzed the expression levels of LY6D, ANLN, ZNF488, MYEOV, and SCN11A in PC tissues. We found that these five genes are highly expressed in tumor tissues ( Figure 4).

The Prognostic Signature Is an Independent Prognostic Factor for Pancreatic Patients by Cox Regression Analyses.
To determine whether the prognostic signature for OS is an independent prognostic factor for PC patients, we performed Cox regression analysis. Univariate Cox regression analysis showed that stage, grade, T stage, N stage, and risk score were significantly associated with OS in PC patients (Figure 7(a)). Multivariate Cox regression analysis showed that risk score was an independent factor influencing PC prognosis (Figure 7(b)).

Nomogram Model Construction and Prediction.
To establish a clinically applicable method for predicting the prognosis of PC patients, we established a prognostic nomogram to predict the survival probability at 1, 2, and 3 years based on the TCGA cohort.
As shown in Figure 7(a), the risk signature and other clinicopathological parameters such as age, gender, race, tumor grade, stage, T stage, and N stage were included in the nomogram model to predict the prognosis of PC. A nomogram-based score for each patient was obtained according to the risk score and clinical parameters on the point scale. The 1-year OS, 2-year OS, or 3-year OS of each PC patient could be predicted by calculating the total nomogram score. The calibration curves showed that the nomogram model we built up exhibited good performance for predicting the 1-year OS of PC (Figure 8(b)).

The Impact of T Lymphocyte-Related Genes in Pancreatic
Cancer. We have confirmed that the expressions of ANLN    Computational and Mathematical Methods in Medicine (P = 5:98e − 06, correlation score = 0:33108779) and MYEOV (P = 0:023, correlation score = 0:16896960) were associated with the GZMB/CD8A ratio ( Figure 9). ANLN was observed to be related to 8 immune cell lines, and MYEOV was related to 9 immune cell lines (Figures 10 and 11).
To elucidate the underlying mechanisms by which the ANLN and MYEOV were associated with different T cell immune infiltrations, we analyzed the effects of somatic cell copy number alternations (CNAs) of the ANLN and MYEOV on T immune cell infiltration. The CNAs of the identified ANLN and MYEOV, including arm-level deletion and arm-level gain, significantly affected the infiltration level of CD4+ T cells (Figure 12).
The results showed that the expression of ANLN of PC in decreased CD4+ memory T cells cohort had poorer OS and RFS, respectively (OS, log rank P = 3:4e − 05; RFS, log rank P = 1:5e − 05) (Figures 13(a) and 13(e)), But there was no significant correlation between the high ANLN and the prognosis of OS or RFS in the enriched CD4+ memory T cells (OS, log rank P = 0:32; RFS, log rank P = 0:077), respectively (Figures 13(b) and 13(f)). For the high MYEOV, there was only no significant correlation between the high MYEOV and the prognosis of RFS in the enriched CD4+ memory T cells (RFS, log rank P = 0:2) (Figure 13(h)).
3.6. Underlying Mechanisms of the ANLN and MYEOV in PC. Interestingly, we found that the expression levels of ANLN and MYEOV have a significant positive correlation ( Figure S4). Then, we analyzed the expression of the proteins encoded by the two genes using clinical specimens from the Human Protein Profiles. ANLN and MYEOV were moderately positive in the PC tissue relative to their expression levels in the normal tissue ( Figure S5). In addition, the single-gene GSEA results show that the two genes have many of the same significantly enriched KEGG pathways ( Figure 14).   (Figure 15(b)). Figure 15(c) shows correlations between ANLN expression and immunostimulators, and     (Figure 15(f)). The greatest correlations between immunomodulators and MYEOV were displayed in Figure S6. Therefore, ANLN and MYEOV may be involved regulating the above immune molecules.

Discussion
The high fatality rate of pancreatic cancer is inextricably related to its own immunosuppressive microenvironment and the obvious reduction of T cell infiltration rate in the tumor [7]. Researchers have discovered that antitumor immunotherapy may be a breakthrough in tumor therapy by targeting to enhance the host's own immunity to tumors.  C10orf54   ACC  BLCA  BRCA  CESC  CHOL  COAD  ESCA  GBM  HNSC  KICH  KIRC  KIRP  LGG  LIHC  LUAD  LUSC  MESO  OV  PAAD  PCPG  PRAD  READ  SARC  SKCM  STAD  TGCT  THCA  UCEC  UCS  UVM   CD27  CD276  CD28  CD40  CD40LG  CD48  CD70  CD80  CD86  CXCL12  CXCR4  ENTPD1  HHLA2  ICOS  ICOSLG  IL2RA  IL6  IL6R  KLRC1  KLRK1  LTA  MICB  NT5E  PVR  RAET1E  TMEM173  TMIGD2  TNFRSF13B  TNFRSF13C  TNFRSF14  TNFRSF17  TNFRSF18  TNFRSF25  TNFRSF4  TNFRSF8  TNFRSF9  TNFSF13  TNFSF13B  TNFSF14  TNFSF15  TNFSF18  TNFSF4  TNFSF9   Computational and Mathematical Methods in Medicine T cells are important components of immunotherapy against tumors. When T cells encounter specific tumor antigens, the single chain variable fragments (scFv) in the antigen recognition region will bind to tumor antigens and then directly activate T cells and stimulate the secretion of cytokines, which can attack and kill tumor cells. Chimeric antigen receptor-modified T cell (CAR-T) therapy is based on this mechanism and has been achieved good results [19]. T cells are the key effectors of the tumor immune response. Tumors that grow in an immunodeficient environment have strong immunogenicity, and solid tumors lacking T cell infiltration usually have poor prognostic significance [20]. Therefore, enhancing the recognition and killing of tumor cells by T cells is a breakthrough in immunotherapy. However, the antigens found in solid tumors so far are all tumor-associated antigens. Such antigens will also be expressed in other normal tissues, which will put a great risk in targeted therapy, called "off-target effect" [21]. Therefore, it is particularly important to choose suitable tumor-associated antigens while continuously exploring the specific antigen of pancreatic cancer. At present, the effective molecules confirmed by researchers are B7-H3, HER2, and so on [22,23]. In this paper, machine learning is used to screen specific molecules that are highly different from ordinary tissues and to explore new molecules that are significantly related to patient clinical characteristics, prognosis, and T lymphocyte infiltration to find ideal immunotherapy targets. Our results consistently demonstrate that the five-mRNA risk signature is very robust for predicting clinical outcome of PC. And our risk signature is validated with two patient cohorts from different sources, which strongly demonstrates its robustness for predicting prognosis of PC. In addition, we have constructed a nomogram model, which is built up based on the five-mRNA risk signature. The proprotein encoded by GZMB (granzyme B) is secreted by natural killer (NK) cells and cytotoxic T lymphocytes (CTL) [24]. CD8A encodes a glycoprotein on the surface of most CTL [25]. The GZMB/CD8A ratio can reflect the degree of immune cytotoxicity and cytotoxic T lymphocyte infiltration, which can be used to predict the response of tumor genes to immune cells [26]. Therefore, GZMB/CD8A ratio-related genes are promising new targets for immunotherapy. We have confirmed that the expressions of ANLN and MYEOV were positively correlated with GZMB/CD8A ratio, which revealed that high expressions of those genes with high T lymphocyte cell infiltration in PC. And the high expressions of ANLN and MYEOV were also related to the poorer prognoses of these patients; so, we speculated that immunotherapy may be more effective in such patients and can significantly improve the prognosis.
Our study also revealed that the CNAs of ANLN and MYEOV significantly affected the CD4+ T cell infiltration level in PC by deleting and gaining aim level, providing insight into the TIME. Therefore, we further analyzed the effects of ANLN and MYEOV level on the OS and RFS of the CD4 + T cell low infiltration group and high infiltration group, respectively. The degree of CD4 + T cell infiltration affected the RFS and OS outcomes with different ANLN expression levels and the RFS outcome with different MYEOV expression levels in PC patients, suggesting that enriched CD4+ T cell infiltration could improve patient prognosis (we have confirmed that the high expression of these two genes leading poor prognosis previously). In addition, we found that the expression levels of the two genes were significantly positively correlated. GSEA analysis revealed that the molecular pathways of ANLN and MYEOV were similar. Therefore, we speculated that these two genes may have parallel effect on the progression of pancreatic tumors.
It is necessary to point out some limitations of current research. First, the prognostic power of the five-mRNA signature was evaluated in only two external datasets. Large scale independent research is necessary to further verify the validity of this signature. Second, our own independent cohort does not provide information on other clinical features such as chemoresistance, radioresistance, and intratumoral heterogeneity. Therefore, we cannot analyze the correlation between this five-mRNA signature and the above clinical features. Further research and more detailed clinical data are needed to explore. Similarly, for ANLN and MYEOV, we need more external data and experiments to prove that they are related to the prognosis and immune microenvironment of pancreatic cancer.

Conclusions
We established a five-gene signature for the prognosis of PC by the public databases. Then, we screened ANLN and MYEOV related to prognosis and the immune microenvironment in pancreatic cancer. ANLN and MYEOV are involved in the progress of pancreatic cancer and are expected to become new markers and therapeutic targets in the future. 21 Computational and Mathematical Methods in Medicine