Development and validation of a novel epigenetic-related prognostic signature and candidate drugs for patients with lung adenocarcinoma

Background: Epigenetic dysregulation has been increasingly proposed as a hallmark of cancer. Here, the aim of this study is to establish an epigenetic-related signature for predicting the prognosis of lung adenocarcinoma (LUAD) patients. Results: Five epigenetic-related genes (ERGs) (ARRB1, PARP1, PKM, TFDP1, and YWHAZ) were identified as prognostic hub genes and used to establish a prognostic signature. According our risk score system, LUAD patients were stratified into high and low risk groups, and patients in the high risk group had a worse prognosis. ROC analysis indicated that the signature was precise in predicting the prognosis. A new nomogram was constructed based on the five hub genes, which can predict the OS of every LUAD patients. The calibration curves showed that the nomogram had better accuracy in prediction. Finally, candidate drugs that aimed at hub ERGs were identified, which included 47 compounds. Conclusions: Our epigenetic-related signature nomogram can effectively and reliably predict OS of LUAD patients, also we provide precise targeted chemotherapeutic drugs. Methods: The genomic data and clinical data of LUAD cohort were downloaded from the TCGA database and ERGs were obtained from the EpiFactors database. GSE31210 and GSE50081 microarray datasets were included as independent external datasets. Univariate Cox, LASSO regression, and multivariate Cox analyses were applied to construct the epigenetic-related signature.

AGING Epigenetic regulation is broadly defined as repression or activation of gene expression via DNA methylation and histone modification, without introducing changes in the DNA sequence per se [8,9]. Epigenetic dysregulation is considered as an essential hallmark in the initiation and progression of different types of cancers [10,11]. Recently, epigenetic dysregulation has been reported to promote carcinogenesis of pulmonary epithelial cells and the progression of LUAD [12]. Epigenetic genes, such as DNMT, HDAC, and PARP1 have emerged as attractive targets for the development of anticancer drugs for NSCLC [13][14][15]. Hence, epigenetic alterations are considered to be an important characteristic in NSCLC development and progression. Although, numerous studies have reported prognostic signature for LUAD [16][17][18][19], only a few researchers have explored the role of epigenetic prognostic markers in LUAD and found out potential drug candidates for targeted therapy.
In the present study, we firstly developed an epigeneticrelated prognostic signature based on TCGA dataset of 497 LUAD patients, and then validated it in both GSE31210 and GSE50081 datasets. More importantly, a new nomogram was created for predicting the overall survival (OS) of LUAD patients based on five epigeneticrelated genes (ERGs). The accurate prediction function of the nomogram was evaluated. Moreover, we further screened out candidate targeted chemotherapy drugs to the five ERGs. In conclusion, our study may provide new ideas for epigenetic-related prognostic biomarkers, thereby, highlighting the need to identify high risk patients and eventually delivering more effective targeted chemotherapeutic drugs for LUAD patients.

Identifying differentially expressed ERGs
The flow diagram of our study is illustrated in Figure  1A. Firstly, we downloaded mRNA data of 497 LUAD samples and 54 corresponding normal lung samples and corresponding clinical information from the TCGA database. Meanwhile, a total of 720 ERGs were obtained from the EpiFactors database. Then, we matched 720 ERGs with LUAD-related mRNA data, 91 differentially expressed ERGs were identified (|log FC| > 1.0, adjusted P value < 0.05), including 12 downregulated and 79 upregulated ERGs (Figure 1).

GO and KEGG enrichment analyses
GO analysis results indicated that 91 ERGs were mainly enriched in 422 Go terms, such as covalent chromatin modification, histone modification, chromatin remodeling, peptidyl-lysine modification, regulation of chromosome organization and so forth (Figure 2A, 2B).
KEGG analysis results demonstrated that 91 ERGs were mainly enriched in 4 pathways including homologous recombination, cell cycle, lysine degradation and fanconi anemia ( Figure 2C, 2D).

Identification of survival-related differentially expressed ERGs
By using univariate Cox regression analysis, we identified 16 ERGs that were significantly associated with prognosis in patients with LUAD (Table 1) (P<0.05; Figure 3A). Of the 16 genes, 14 were identified as risk factors and two were identified as protective factors. UHRF1, HJURP, BUB1, PBK, AURKA, ACTL6A, ASF1B, PRKDC, CDK1, PARP1, UBE2T, YWHAZ, PKM, and TFDP1 were identified as risk factors (P<0.05; HRs, 1.0012-1.0740). Whereas ARRB1 and CBX7 were considered as protective factors (P<0.05; HRs, 0.9512 and 0.8885, respectively). The expression of 16 ERGs in LUAD and normal lung samples were presented in Figure 3B, and the correlation between these ERGs could be seen in Figure  3C. Due to the important prognostic value of the candidate genes, their genetic alterations were analyzed. As shown in Figure 3D, missense mutation is the most common type of mutation, and there are 10 genes with mutation rate ≥3%, among which PRKDC mutation is the most frequent (12%).

Development of epigenetic-related signature
To eliminate highly correlated ERGs and prevent overfitting of the signature, the Lasso regression analysis was performed ( Figure 4A, 4B). Finally, five ERGs were confirmed by multivariate Cox regression analysis and applied to develop an epigenetic-related prognostic signature (Table 1). The signature was established to assess the prognosis of each LUAD patient as follows: Risk score = (-0.03706×expression level of ARRB1) + (0.015474×expression level of PARP1) + (0.004509 ×expression level of YWHAZ) + (0.001132×expression level of TFDP1) + (0.004303×expression level of PKM).
According to above formula, the risk scores of 426 LUAD patients were calculated, and patients were clustered into high and low risk groups by median risk score. The risk score ( Figure 5A) and survival status ( Figure 5B) of 426 LUAD patients are presented. As shown in Figure 5C, patients in the high risk group had significantly poorer OS compared to those in the low risk group. The prognostic signature exhibited good predictive performance and accuracy for predicting the 1-, 3-and 5-year OS ( Figure 5D).

Relationships between risk score and clinicopathological factors
Furthermore, the expression of the five crucial genes and clinicopathological factors in high and low risk groups are presented in Figure 6A. The results indicated that with reduction of the risk score, ARRB1 was gradually increased, while YWHAZ, PKM, PARP1, and TFDP1 were gradually decreased. Significant differences were found for the pathologic T stage between the high and low risk groups. Moreover, a higher risk score was associated with higher pathological stage, N-stage and some histological growth patterns, including adenocarcinoma, adenocarcinoma with mixed subtypes, bronchioloalveolar carcinoma, non-mucinous, and papillary adenocarcinoma ( Figure 6B).
Next, Univariate Cox analysis data demonstrated that pathologic stage, T stage, N stage and the risk score were all associated with OS ( Figure 6C). Multivariate analysis data demonstrated that stage, M and the risk score (P<0.001) could be used as an independent prognostic factors ( Figure 6D). Moreover, YWHAZ is significantly correlated with pathological stage, PKM is significantly correlated with pathological stage and N stage (P < 0.05, Table 2). Then the relationship between the risk score and clinicopathological factors was assessed. The results indicated that patients with high risk scores were    AGING positively correlated with advanced tumor stage, higher T stage, and higher N stage (P < 0.05, Table 2).

Validating the performance of the epigenetic-related signature
Both GSE31210 and GSE50081 datasets, including 174 and 127 LUAD samples, respectively, were used for validation. Consistent with our above results, the Kaplan-Meier curve showed that patients in the high risk group had a worse prognosis ( Figures 7D, 8D). The risk score  Figure 7E) and 0.711, 0.691, 0.735 (GSE50081, Figure 8E), respectively. A nomogram for predicting OS was established based on the five ERGs ( Figure 9A). Moreover, the 1, 3, and 5-year OS predicted by our nomogram were remarkable consistent with the actual observed survival ( Figure 9B-9D). AGING Table 2. Relationship between the expression of epigenetic-related genes and clinical characteristics.

Exploring the expression and prognostic of crucial ERGs
The mRNA expression of the five hub genes in LUAD were analyzed by using the TCGA database. We confirmed that the mRNA expression of PARP1, PKM, TFDP1, and YWHAZ in LUAD tissues were all increased, while ARRB1 was decreased ( Figure 10). Moreover, we explored the protein expression of the five hub genes. The results indicated that the protein of AGING PARP1, PKM, TFDP1, and YWHAZ were increased in LUAD tissues, which were in line with their mRNA expression levels ( Figure 11). Furthermore, we found that the expression of the three high risk genes (PKM, TFDP1, and YWHAZ) were negatively associated with the prognosis in LUAD, while low risk gene ARRB1 was positive correlation with the prognosis by using the GEPIA database ( Figure 12).

DISCUSSION
Lung adenocarcinoma, a molecularly complex and heterogeneous disease, remains the most common causes of cancer-associated deaths worldwide [1,20]. Recently, the development of drugs targeting on EGFR [5] and ALK [7] has promoted the treatment of some patients with LUAD, however, the highly gene mutation heterogeneous of LUAD has limited the benefits to a small number of patients. Notably, there is growing evidence that epigenetic modifications are frequently AGING reversible, and may serve as attractive targets for lung cancer therapy [21,22]. Therefore, it is particularly important to identify suitable epigenetic markers for LUAD diagnosis and prognosis, that can provide valuable support in decision making when considering treatment options.
In this study, we developed a novel and meaningful epigenetic-related prognostic signature (ARRB1, YWHAZ, PKM, PARP1, and TFDP1) for LUAD patients and validated them in two independent datasets from GEO. According our risk score system, LUAD patients were stratified into high and low risk groups, and patients in the high risk group had a worse prognosis. Then, we confirmed that each of the five signatures could be an independent prognostic factor. A new nomogram was constructed based on the five hub genes, which can predict the OS of every LUAD patients. Interestingly, the expression of three high risk genes (PKM, TFDP1, and YWHAZ) had an enormous implication in the prognosis of LUAD patients. Finally, we identified 47 compounds that could serve as candidate targeted chemotherapeutic drugs to the five ERGs for LUAD patients. These drugs included the topoisomerase inhibitor, CDK inhibitor, aurora kinase inhibitor, PARP inhibitor, and so on.
The epigenetic factors included in our signature have been previously demonstrated to be closely related to the development and progression of lung cancer. ARRB1 (also known as β-arrestin-1), a multifunctional adaptor, was initially discovered to promote internalization and desensitization of G protein-coupled receptors (GPCRs) [23,24]. A study by Pillai et al. revealed that ARRB1 AGING promoted the expression of mesenchymal genes through mediation of the E2F1 transcription factor in non-small cell lung carcinoma cell lines (NSCLC) [25]. Likewise, Shen et al. reported that ARRB1 could enhance the chemo-sensitivity of lung cancer through the mediation of DNA damage response [26]. Our results suggested that ARRB1 may be a putative tumor suppressor gene in LUAD. The M2 isoform of pyruvate kinase (PKM2) (also named PKM, PK3, THBP1), an essential enzyme involved in glycolysis, is known to mediate the conversion of glucose to lactate in cancer cells under normoxic conditions [27,28]. Jing Li reported that phosphorylation of PKM2 and inactivation of STAT3 inhibited lung cancer cell proliferation [29]. Inhibitors of   AGING PKM2 could moderately decelerate tumor cell proliferation [30,31]. This finding might be in agreement with our current results, that PKM2 is a high risk gene in the context of LUAD. PARP1 has been reported as an abundant chromatin-associated enzyme able to catalyze the transfer of ADP-ribose units from NAD to substrate proteins [32]. Alternatively, PARP1 might play a important role in the development of various cancers, including cell proliferation, apoptosis, DNA repair, and so forth [33]. Recently, the use of immune checkpoint inhibitors has shown dramatic effect in the prognosis of NSCLC. Sophie Postel-Vinay reported that PARP inhibition enhanced the intrinsic immunity of tumor cells in NSCLC with deficiency in excision repair cross-complementing group 1, a gene which has crucial role in the nucleotide excision repair [34]. They also suggested that PARP inhibition, which did not cause severe bone marrow toxicity, might represent an interesting alternative or complement to platinum-based chemotherapy in combination with anti-PD-(L)1 agents in NSCLC. Transcription factor Dp-1 (TFDP1) is a key player of cell cycle regulation, it is a predominant protein that binds to E2F, another vital transcription protein that participates in the control of cell cycle and action of tumor suppressor proteins. TFDP1 can be candidate master regulators contributing to follicular lymphoma progression [35]. Knockdown of TFDP1 reduced both PITX1 promoter activity and mRNA transcription which caused patients suffering of knee/hip osteoarthritis [36]. Wang et al. revealed that TFDP1/E2F1 transcriptional activity played an important role in NSCLC [37]. YWHAZ (also named 14-3-3ζ), acts as a major hub protein for many signal transduction pathways [38,39]. YWHAZ was frequently shown to be upregulated in several types of cancers, and its overexpression was often correlated with unfavorable prognosis of cancer patients [40][41][42][43]. Ma et al. found that YWHAZ was a credible prognostic biomarker, and might be a therapeutic target in NSCLC [44].
Overall, in concordance with our findings, there is mounting evidence in the literature that tells about the important role of the 5 hub genes (described above) in relation to NSCLC, which further supports the use of these hub genes as prognostic genes for LUAD patients. In recent years, a great deal of knowledge has been accumulated to help identify prognostic signature of different types of cancers, one of which is NSCLC. In our study, we not only identified the five powerful ERGs that are useful as prognostic signature, also provided potential drugs for targeted therapy. Interestingly, PARP inhibitors are currently regarded as a novel class of small molecule therapeutics for lung cancer. Henning Willers and colleagues found that PARP inhibitor by controlling ROS levels upon EGFR TKI treatment, promoted the sensitivity of EGFR-mutated lung cancer to tyrosine kinase inhibitor treatment [45]. We hope our study will provide more choice for those patients not having EGFR mutation.
In the present study, there are some limitations that require mentioning. First, the three datasets are all retrospective and, hence, extrapolation based on these results may be difficult. To that end, the findings of this work should be validated, and further exploration using a larger multicenter prospective observational trial is desirable. Second, our findings have to be enriched by conducting long term in vivo experiments and additional in vitro experiments to investigate the functional role of the epigenetic factors associated with LUAD.
In summary, five ERGs (ARRB1, PARP1, PKM, TFDP1, and YWHAZ) are promising biomarkers for the diagnosis and prognosis of LUAD, which could provide valuable references to identify whether LUAD patients are at high risk. In addition, our findings may provide precise targeted chemotherapeutic drugs.

Study subjects
The RNA-seq data of 497 LUAD and 54 normal lung samples were downloaded from The Cancer Genome Atlas (TCGA) database, and the RNA-seq data of 174 LUAD and 127 LUAD samples were downloaded from GSE31210 and GSE50081 dataset, respectively. A total of 720 epigenetic-related genes (ERGs) were retrieved from the EpiFactors database (http://epifactors.autosome.ru/).

Identification and enrichment analysis of differentially expressed ERGs
Differentially expressed ERGs were obtained by using the Limma package in R software [46]. To further investigate the biological relevance of these genes, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted utilizing R "GOplot" package.

Development of the epigenetic-related signature for patients with LUAD
We collected clinical information of LUAD patients who were followed for less than 2000 days in the TCGA database. Univariate Cox regression analysis was performed by R "survival" package to identify ERGs markedly related to OS. Genetic alterations in the candidate genes were obtained from cBioPortal (https://www.cbioportal.org/). Then, a novel epigeneticrelated prognostic signature was developed by Lasso and multivariate Cox regression analyses [47,48]. The AGING risk scores of LUAD patients were calculated according to the formula: the signature risk score = Ʃ (βi × Expi), where βi, the coefficients, represented the weight of the respective signature and Expi represented the prognostic factors expression value as previously described [49]. According to the signature with identified prognostic factors, a nomogram for predicting the probability of OS was established.

Validating the performance of the epigenetic-related signature
To validate the performance of the signature, the GSE31210 and GSE50081 datasets were considered as the validation cohort. The risk scores for LUAD patients were calculated by using the formula. Survival and ROC curve analyses were performed as described above.

Exploring the expression and prognostic of crucial ERGs
To explore the prognostic value and the expression of these ERGs in LUAD, survival analysis was conducted on GEPIA database (http://gepia.cancer-pku.cn), the expression of these ERGs were confirmed on Human Protein Atlas (HPA) online database (http://www.proteinatlas.org/) [50].

Predicting candidate small molecules for LUAD patients
To predict potential drugs for LUAD patients, we utilized the Broad Institute's Connectivity Map (CMap) to screen candidate molecule drugs as previously described [51,49].

Statistical analysis
In this study, all statistical analyses were conducted using Perl software (version 5.28.1) and R platform (version 3.6.1). The comparison of two paired groups was performed using the Wilcoxon rank-sum test. P value <0.05 was considered as statistically significant.

AUTHOR CONTRIBUTIONS
Shenghui Qin conceived and directed the project, and wrote the manuscript. Zhihao Wang performed data bioinformatics analyses. Kidane Siele Embaye et al. helped with part of English writing and checking. All authors read and approved the manuscript.