14-CpG-Based Signature Improves the Prognosis Prediction of Hepatocellular Carcinoma Patients

Background. Epigenetic dysregulation through altered DNA methylation often occurs during the development and progression of cancer, including hepatocellular carcinoma (HCC). Many single-gene DNA methylation patterns have been explored for predicting HCC prognosis, but models combining multiple CpGs have rarely been evaluated. In the present study, we aimed to develop and validate a CpG-based signature for predicting the prognosis of HCC patients.
Methods. Methylation profiling data from GSE73003, GSE37988, and GSE57958 in the Gene Expression Omnibus (GEO) database and data from 371 HCC patients in The Cancer Genome Atlas (TCGA) were downloaded. The 371 HCC patients were randomly divided into a development cohort (N = 263) and a validation cohort (N = 108). Two algorithms, least absolute shrinkage and selection operator (LASSO) and robust likelihood-based survival analysis, were used to select the CpGs most significantly associated with overall survival (OS), and the selected CpGs were used to develop and validate a methylation-based signature for HCC (MSH). The prognostic efficacy of the MSH was compared with that of AJCC TNM staging and of other CpG-based signatures derived from TCGA data. Finally, a nomogram incorporating the MSH and clinicopathologic factors was developed.
Results. Fourteen differentially methylated CpGs associated with OS were identified in HCC patients. The MSH based on these 14 CpGs divided HCC patients into two distinct subgroups with high and low risk of death, both in the development cohort (median OS 26.35 vs 83.18 months, HR = 3.83, 95% CI: 2.56–5.90, P < 0.0001) and in the validation cohort (40.37 vs 107.03 months, HR = 2.23, 95% CI: 1.22–4.17, P = 0.01). Univariate analysis showed that the MSH was significantly associated with OS, and multivariate analysis showed that the MSH was an independent prognostic factor for OS in both cohorts. Stratified survival analysis indicated that the MSH retained its prognostic value in subgroups defined by AFP, cirrhosis, Child-Pugh A, tumor histologic grade, and AJCC stage. Time-dependent ROC analysis showed that the MSH predicted 3-year and 5-year survival better than AJCC stage and other CpG-based signatures from TCGA. The MSH-based nomogram also performed well in predicting 1-year, 3-year, and 5-year OS (C-index: 0.709).
Conclusion. The 14-CpG-based signature is significantly associated with OS and may serve as a novel prognostic biomarker for HCC patients.


Introduction
Hepatocellular carcinoma (HCC) was estimated to be the sixth most common cancer and the fourth leading cause of cancer-related death worldwide in 2018. Each year, an estimated 841,000 people develop HCC and 782,000 die from the disease [1]. The threat of HCC has not been mitigated, as shown by its rapidly increasing incidence and by a recurrence rate of 50% among patients with early-stage HCC after surgery [2,3]. Late diagnosis and limited treatment options have been suggested to account for the high mortality rate of advanced HCC [4]. In addition to developing new treatments for this deadly disease, researchers are exploring new models for the early diagnosis and prevention of HCC to improve patient prognosis.
Genetic alterations, including mutations and single-nucleotide polymorphisms (SNPs), and aberrant epigenetic regulation are well known to play important roles in the development and progression of HCC [5][6][7]. DNA methylation, one of the major epigenetic mechanisms, has been reported to participate in the formation of many malignant tumors, including HCC [8,9]. Mechanistically, aberrant hypermethylation of promoter CpG islands can silence tumor suppressor genes, whereas hypomethylation can lead to the overexpression of oncogenes [10]. Previous studies have associated hypermethylation of promoter CpG islands with the clinicopathological characteristics and prognosis of HCC patients [11][12][13]. Identifying specific aberrantly methylated CpGs may therefore be valuable for the diagnosis, prognosis, and even treatment of HCC. The prognostic value of many single-gene DNA methylation patterns in HCC has been extensively explored; however, models combining multiple CpGs have rarely been evaluated. In the present study, we identified 14 differentially methylated CpGs related to HCC prognosis. Using methylation profiling data of HCC from The Cancer Genome Atlas (TCGA), we developed a methylation-based signature for HCC (MSH) in a development cohort and validated it in a validation cohort. Finally, we compared the prognostic efficacy of the MSH with that of AJCC TNM staging and other CpG-based signatures derived from TCGA [14].

Ethics Statement.
All data in this study were obtained from online databases, including the Gene Expression Omnibus (GEO) and TCGA. Informed consent was obtained from the patients before the study. The study was also approved by the Ethics Committee of the Shunde Hospital of Southern Medical University.

Methylation Data Collection and Processing from GEO and TCGA.
DNA methylation profiles of primary HCC tumors and their nontumor counterparts were first obtained from GEO (https://www.ncbi.nlm.nih.gov/geo/): GSE73003 (20 paired tumor and nontumor tissues from Japan), GSE37988 (62 paired tumor and nontumor tissues from Taiwan), and GSE57958 (99 paired tumor and nontumor tissues from Singapore). All three datasets were generated on GPL8490 (Illumina Human Methylation 27 BeadChip). GEO2R, an online analysis tool, was then used to identify differentially methylated CpGs, with P < 0.05 as the cut-off criterion. To find the most significant differentially methylated markers, the 1,000 CpGs with the lowest P values in each dataset were selected (Supplementary Materials 1, 2, and 3). Finally, an online tool (http://bioinformatics.psb.ugent.be/webtools/Venn/) was used to identify the CpGs shared by the top 1,000 lists of GSE73003, GSE37988, and GSE57958 (Supplementary Material 4).
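The list-overlap step can be sketched in Python as follows. This is a minimal sketch, assuming each dataset's GEO2R output has been reduced to a hypothetical mapping from CpG ID to P value; it applies the P < 0.05 filter, keeps the CpGs with the lowest P values, and intersects the per-dataset lists, which corresponds to the central region reported by the Venn tool. The toy P values and the reduced list length (k = 2 instead of 1,000) are invented for illustration.

```python
from typing import Dict, Iterable, Set


def top_k_by_pvalue(pvalues: Dict[str, float], k: int = 1000, alpha: float = 0.05) -> Set[str]:
    """Return the k CpGs with the smallest P values among those with P < alpha."""
    significant = sorted((p, cpg) for cpg, p in pvalues.items() if p < alpha)
    return {cpg for _, cpg in significant[:k]}


def shared_top_cpgs(datasets: Iterable[Dict[str, float]], k: int = 1000,
                    alpha: float = 0.05) -> Set[str]:
    """Intersect the per-dataset top-k lists (the central region of a Venn diagram)."""
    top_sets = [top_k_by_pvalue(d, k, alpha) for d in datasets]
    return set.intersection(*top_sets)


# Toy example with hypothetical P values.
gse_a = {"cg1": 0.001, "cg2": 0.01, "cg3": 0.2}
gse_b = {"cg1": 0.003, "cg2": 0.04, "cg4": 0.001}
gse_c = {"cg1": 0.0001, "cg2": 0.5, "cg4": 0.02}
print(shared_top_cpgs([gse_a, gse_b, gse_c], k=2))  # {'cg1'}
```

Only cg1 ranks in the top two of all three toy datasets, so it is the only shared marker.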
After identifying the most significant differential CpGs between HCC tumors and nontumor tissues, we verified these CpGs in HCC patients from TCGA (https://cancergenome.nih.gov/). DNA methylation profiling data of 377 HCC patients were downloaded from TCGA. These data were generated on the GPL13534 platform (Illumina Human Methylation 450 BeadChip), and the methylation level of each CpG was expressed as a β value, calculated as the ratio of the methylated bead-type intensity to the combined locus intensity and ranging from 0 to 1. Clinical characteristics, including sex, age, BMI, AFP, cirrhosis, Child-Pugh stage, adjacent hepatic tissue inflammation, tumor histologic grade, surgical margin status, AJCC TNM stage, and overall survival (OS) time, were also downloaded. Six of the 377 HCC patients were excluded because OS data were missing, leaving 371 HCC patients with available methylation data and clinical parameters for the present study.
The clinical parameters of the HCC patients are summarized in Table 1.
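The β value described above can be written as β = M / (M + U + α), where M and U are the methylated and unmethylated signal intensities of a locus. The sketch below is a minimal illustration, not TCGA's processing pipeline. The offset α = 100 is an assumption: it is the value commonly used for Illumina arrays to stabilize β when both intensities are low, and the text above does not state it.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Methylation beta value M / (M + U + offset), with negative intensities clipped to 0.

    The default offset of 100 is an assumed Illumina convention, not a value from the study.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


print(beta_value(900.0, 0.0))    # mostly methylated locus: 0.9
print(beta_value(300.0, 600.0))  # 0.3
print(beta_value(0.0, 0.0))      # 0.0, the offset prevents division by zero
```

Because the offset is positive, β always lies in [0, 1), which matches the 0 to 1 range stated in the text.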

Identification and Selection of HCC Prognosis-Related CpGs.
The 371 HCC patients were randomly divided into a development cohort (N = 263) and a validation cohort (N = 108) at a ratio of approximately 7:3 using R software. The development cohort was used to identify key HCC prognosis-related CpGs and to develop the MSH. Two different algorithms, least absolute shrinkage and selection operator (LASSO) analysis [15] and robust likelihood-based survival analysis [16,17], were used to select the most significant methylation markers. CpGs selected by both methods were defined as the HCC prognosis-related CpGs.
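The LASSO step can be illustrated with a minimal Python sketch of an L1-penalized Cox model. This is not glmnet's algorithm: glmnet fits a whole path of penalty values by coordinate descent, and the penalty is usually chosen by cross-validation, whereas this sketch uses proximal gradient descent (a gradient step followed by soft-thresholding) at a single fixed penalty λ with the Breslow form of the partial likelihood. The data, λ, step size, and iteration count are illustrative assumptions. Covariates whose coefficients stay exactly zero are the ones LASSO drops.

```python
import math
from typing import List, Sequence


def cox_gradient(times: Sequence[float], events: Sequence[int],
                 X: Sequence[Sequence[float]], beta: Sequence[float]) -> List[float]:
    """Gradient of the mean negative Breslow log partial likelihood."""
    n, p = len(times), len(beta)
    w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    grad = [0.0] * p
    for i in range(n):
        if not events[i]:
            continue  # censored patients contribute only through the risk sets
        at_risk = [j for j in range(n) if times[j] >= times[i]]
        s0 = sum(w[j] for j in at_risk)
        for k in range(p):
            s1 = sum(w[j] * X[j][k] for j in at_risk)
            grad[k] -= (X[i][k] - s1 / s0) / n
    return grad


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|z|: shrinks z toward 0 and sets small values exactly to 0."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_cox(times, events, X, lam: float, lr: float = 0.1, n_iter: int = 2000) -> List[float]:
    """L1-penalized Cox regression by proximal gradient descent at a fixed lambda."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = cox_gradient(times, events, X, beta)
        beta = [soft_threshold(b - lr * gk, lr * lam) for b, gk in zip(beta, g)]
    return beta


# Toy data: column 0 separates early from late deaths; column 1 is uninformative.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
X = [[1.0, 0.5], [1.0, 0.5], [1.0, 0.5], [0.0, 0.5], [0.0, 0.5], [0.0, 0.5]]
print(lasso_cox(times, events, X, lam=0.01))  # column 0 positive, column 1 exactly 0
print(lasso_cox(times, events, X, lam=10.0))  # strong penalty: [0.0, 0.0]
```

In the study, the CpGs retained here would then be intersected with those chosen by the robust likelihood-based method.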

Development and Validation of the MSH.
After the key prognosis-related CpGs were selected, we used them to develop the MSH by multivariable Cox regression analysis, and a risk score was calculated for each HCC patient with this model. Patients were classified into a high-risk group and a low-risk group using the median risk score as the cut-off value. The OS difference between high-risk and low-risk patients was analyzed by Kaplan-Meier analysis. The MSH was then validated in the validation cohort. Univariate and multivariate Cox regression analyses were used to further assess the association of the MSH with OS in the development and validation cohorts. Furthermore, stratified Kaplan-Meier analysis was performed in the total cohort to explore whether other major clinicopathologic factors (AFP, cirrhosis, Child-Pugh stage, tumor histologic grade, and AJCC TNM stage) influenced the prognostic value of the MSH.
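The Kaplan-Meier estimate and the log-rank comparison of the high-risk and low-risk groups can be sketched in plain Python. This is a minimal illustration under the usual conventions (patients censored at time t are still counted as at risk at t), not the GraphPad Prism or R implementation, and the follow-up times are hypothetical. The log-rank statistic sums observed minus expected deaths in one group over all death times and is referred to a chi-square distribution with 1 degree of freedom; it requires at least one death.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (t, S(t)) at each distinct death time."""
    at_risk = len(times)
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths plus censored observations at each time
    surv, curve = 1.0, []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


def logrank_test(times_a, events_a, times_b, events_b) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, P value with 1 df)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e = var = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)
        n_b = sum(1 for ti, _, g in data if g == 1 and ti >= t)
        d_a = sum(1 for ti, e, g in data if g == 0 and ti == t and e)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)  # hypergeometric variance
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df


print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))  # [(1, 0.75), (3, 0.375), (4, 0.0)]
print(logrank_test([1, 2], [1, 1], [3, 4], [1, 1]))
```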

Establishment of a Time-Dependent Receiver Operating Characteristic (ROC) Curve and an MSH-Based Nomogram.
To further assess the predictive accuracy of the MSH, time-dependent ROC analysis was performed in the total cohort. The areas under the ROC curve (AUCs) of the MSH for predicting 1-year, 3-year, and 5-year OS were calculated and compared with those of other models. Moreover, to make the MSH more clinically applicable, an MSH-based nomogram was developed.
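The time-dependent AUC at a horizon t (for example, 36 months for 3-year OS) can be illustrated with a simplified cumulative/dynamic definition: cases are patients who died by t, controls are patients still under follow-up after t, and the AUC is the fraction of case-control pairs in which the case has the higher risk score. This sketch is a deliberate simplification: it drops patients censored before t, whereas R packages such as survivalROC use estimators that account for that censoring. The data are hypothetical.

```python
from typing import Sequence


def time_dependent_auc(times: Sequence[float], events: Sequence[int],
                       scores: Sequence[float], horizon: float) -> float:
    """Naive cumulative/dynamic AUC at `horizon` (patients censored before it are ignored)."""
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5  # tied scores count as half
    return concordant / (len(cases) * len(controls))


# Hypothetical follow-up (months), death indicator, and risk scores.
months = [10, 20, 30, 40, 50, 70]
died = [1, 1, 0, 1, 0, 1]
risk = [2.1, 1.0, 1.6, 0.8, 1.2, 0.4]
# 5 of the 6 case-control pairs are concordant, so this returns 5/6.
print(time_dependent_auc(months, died, risk, horizon=36))
```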

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) Analysis of the MSH.
To explore the biological functions and pathways associated with the MSH, GO and KEGG analyses were conducted. First, the 50 most frequently altered genes related to the 14 genes corresponding to the MSH CpGs were downloaded from cBioPortal (http://www.cbioportal.org). The biological functions of these 50 genes and the 14 genes were then analyzed by GO and KEGG enrichment in the Database for Annotation, Visualization, and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/summary.jsp). The detailed method has been described in our previous study [18].
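Enrichment of a GO term or KEGG pathway in a gene list is commonly assessed with a one-sided hypergeometric (Fisher's exact) test. The sketch below computes that upper-tail P value from first principles; it is an illustration, not DAVID's code. DAVID reports a modified, more conservative Fisher's exact P value (the EASE score), and all counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(hits: int, list_size: int, term_size: int, background: int) -> float:
    """P(X >= hits) for X ~ Hypergeometric(background, term_size, list_size).

    hits:       genes in the list annotated to the term
    list_size:  genes in the list (for example, 14 signature genes plus 50 related genes)
    term_size:  background genes annotated to the term
    background: total annotated genes in the background
    """
    upper = min(list_size, term_size)
    tail = sum(comb(term_size, i) * comb(background - term_size, list_size - i)
               for i in range(hits, upper + 1))
    return tail / comb(background, list_size)


# Hypothetical counts: 6 of 64 listed genes fall in a 250-gene pathway, background 20,000 genes.
print(enrichment_p_value(6, 64, 250, 20000))
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the only rounding happens in the final division.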

Statistical Analysis.
Statistical analysis was performed with R software (version 3.5.1) and GraphPad Prism software (version 6). Univariate and multivariate Cox regression analyses were performed with the survival and survminer packages. The robust likelihood-based survival analysis was performed with the survivalROC and rbsurv packages, and LASSO analysis was conducted with the glmnet and survival packages. Time-dependent ROC analysis was performed with the ROCR and rms packages. The nomogram was constructed with the rms and survival packages and was evaluated by the concordance index (C-index) and calibration plots. Kaplan-Meier curves were generated with GraphPad Prism software and compared using the log-rank test. P < 0.05 was considered statistically significant.
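The concordance index used to evaluate the nomogram is, in Harrell's formulation, the proportion of usable patient pairs whose predicted risks are ordered consistently with their observed survival. A minimal sketch follows, assuming a simple convention: a pair is usable when the patient with the shorter follow-up died, pairs with tied times are skipped, and tied risk scores count as one half. The rms and survival packages handle ties and censoring in their own ways, so their values may differ slightly; the data are hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's C: a higher predicted risk should go with shorter survival."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is a death
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


# Hypothetical data: the risk ordering matches survival except for one pair.
print(harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1]))  # 0.8
```

Here 4 of the 5 usable pairs are concordant; a C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering.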

Basic Characteristics of the 371 HCC Patients.
The flowchart of the present study is shown in Figure 1, and the basic characteristics of the 371 HCC patients are summarized in Table 1.

Construction and Validation of the MSH.
To comprehensively explore the association of the 14 selected CpGs with the prognosis of HCC patients, an MSH was built in the development cohort using the coefficients estimated by multivariable Cox regression analysis (Table 3). The risk score was calculated as the sum of the β values of the 14 CpGs, each weighted by its regression coefficient: risk score = (4.10 × cg00504595) + …, where each CpG name denotes its β value and the remaining coefficients are listed in Table 3. After the risk score of each patient in the development cohort was calculated, patients with a risk score > 1.07 (the median score) were assigned to the high-risk group (N = 132), and the other patients were assigned to the low-risk group (N = 131). The methylation levels of cg00504595, cg04711324, cg06226384, cg07014174, cg08668790, cg15747595, cg18536148, and cg24432073 tended to be higher in the high-risk group than in the low-risk group, whereas the methylation levels of cg16673198, cg18343292, cg21578906, cg23163573, cg24898863, and cg26059632 tended to be lower in the high-risk group (Figure 3(a)). Moreover, patients in the high-risk group had shorter OS than those in the low-risk group (median survival time 26.35 vs 83.18 months, HR = 3.83, 95% CI: 2.56–5.90, P < 0.0001; Figure 4).
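The scoring and grouping rule can be sketched as follows: each patient's risk score is the coefficient-weighted sum of the β values of the signature CpGs, and patients whose score exceeds the median are labeled high risk. This is a minimal sketch; the CpG names, coefficients, and β values below are hypothetical placeholders, and the actual 14 coefficients are those in Table 3.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def risk_scores(betas: Sequence[Dict[str, float]], coefs: Dict[str, float]) -> List[float]:
    """Risk score = sum over CpGs of (Cox coefficient x beta value)."""
    return [sum(coef * row[cpg] for cpg, coef in coefs.items()) for row in betas]


def split_at_median(scores: Sequence[float]) -> Tuple[float, List[str]]:
    """Scores above the median are 'high' risk; the rest (including the median) are 'low'."""
    cutoff = median(scores)
    return cutoff, ["high" if s > cutoff else "low" for s in scores]


# Hypothetical 3-CpG signature and five patients.
coefs = {"cgA": 2.0, "cgB": -1.5, "cgC": 0.5}
patients = [
    {"cgA": 0.80, "cgB": 0.10, "cgC": 0.40},
    {"cgA": 0.20, "cgB": 0.70, "cgC": 0.30},
    {"cgA": 0.50, "cgB": 0.40, "cgC": 0.90},
    {"cgA": 0.90, "cgB": 0.20, "cgC": 0.10},
    {"cgA": 0.30, "cgB": 0.60, "cgC": 0.50},
]
scores = risk_scores(patients, coefs)
cutoff, groups = split_at_median(scores)
print(groups)  # ['high', 'low', 'low', 'high', 'low']
```

A negative coefficient (cgB here) means that higher methylation lowers the score, mirroring the CpGs reported above as less methylated in the high-risk group.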

Prognostic Value of the MSH in HCC Patients.
Having shown that the MSH could categorize HCC patients into high-risk (poor OS) and low-risk (better OS) groups, we further evaluated its prognostic value. Univariate analysis showed that the MSH was significantly associated with OS in the development cohort (HR = 4.3, 95% CI: 2.691–6.871, P < 0.0001; Table 4) and the validation cohort (HR = 1.979, 95% CI: 1.019–3.864, P = 0.044; Table 5). Multivariate analysis further showed that the MSH was an independent prognostic factor for OS in both cohorts (development cohort: HR = 6.355, 95% CI: 2.524–16, P < 0.0001, Table 4; validation cohort: HR = 3.379, 95% CI: 1.054–10.834, P = 0.041, Table 5).

Stratified Survival Analysis Based on Major Clinicopathological Factors in the Total Cohort.
Having found that the MSH was an independent prognostic factor for the OS of HCC patients, we performed stratified analysis in the total cohort to explore the prognostic value of the MSH in patients classified by major clinicopathological factors. The numbers of patients in the high-risk and low-risk groups and the log-rank test results are shown in Table 6.

BioMed Research International
Our results indicated that the MSH retained good prognostic value in subgroups classified by AFP, cirrhosis, Child-Pugh A, tumor histologic grade, and AJCC stage (Figure 5), which, to some extent, supports the reliability and general utility of the MSH.

Predictive Value of the MSH for the OS of HCC Patients and Comparison with Other CpG-Based Models Based on TCGA.
Time-dependent ROC curve analysis was used to assess the predictive value of the MSH in the total cohort and to compare the MSH with other CpG-based models based on TCGA. As shown in Figure 6, the AUCs of the MSH for predicting 1-, 3-, and 5-year OS were 0.643, 0.712, and 0.757, respectively, whereas those of AJCC stage, which is often used as a prognostic model for HCC patients, were 0.657, 0.668, and 0.636, respectively. Thus, the MSH performed better than AJCC stage in predicting 3- and 5-year OS.
Recently, Fang et al. constructed a five-CpG-based prognostic signature using HCC patients from TCGA [14]; the AUCs of their signature for predicting 1-, 3-, and 5-year OS were 0.577, 0.587, and 0.603, respectively. Compared with the signature of Fang et al., our 14-CpG-based prognostic signature showed higher AUCs for predicting 1-, 3-, and 5-year OS. However, validation in external HCC cohorts is still needed.

Development of an MSH-Based Nomogram for OS Prediction in HCC Patients in the Total Cohort.
To make the MSH more clinically applicable, we developed an MSH-based nomogram to predict the 1-year, 3-year, and 5-year OS of HCC patients in the total cohort (Figure 7(a)). Clinicopathological factors, including sex, age, AFP, cirrhosis, Child-Pugh stage, tumor histologic grade, AJCC stage, and surgical margin status, were included in the nomogram. The C-index of the nomogram was 0.709, and the calibration plots showed good agreement between the predicted and actual 1-year, 3-year, and 5-year OS (Figures 7(b)-7(d)), suggesting that the MSH-based nomogram has good predictive value.
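A Cox-based nomogram is a graphical form of the model's linear predictor: the total points on the nomogram are a linear rescaling of lp = Σ coefficient × covariate, and the predicted t-year survival is S(t | x) = S0(t)^exp(lp), where S0(t) is the baseline survival. The sketch below computes that prediction directly. The coefficients, covariate coding, and baseline survival values are hypothetical placeholders, and the sketch assumes S0(t) is defined at lp = 0 (rms may center the linear predictor differently).

```python
import math
from typing import Dict


def linear_predictor(coefs: Dict[str, float], patient: Dict[str, float]) -> float:
    """lp = sum of coefficient x covariate value (what the nomogram's point axes encode)."""
    return sum(coef * patient[name] for name, coef in coefs.items())


def predicted_survival(baseline: Dict[int, float], lp: float) -> Dict[int, float]:
    """S(t | x) = S0(t) ** exp(lp) for each horizon t (in years)."""
    return {t: s0 ** math.exp(lp) for t, s0 in baseline.items()}


# Hypothetical model: MSH high-risk indicator plus an AJCC stage III/IV indicator.
coefs = {"msh_high": 0.9, "ajcc_advanced": 0.7}
baseline = {1: 0.92, 3: 0.78, 5: 0.68}  # hypothetical S0(t) at lp = 0
patient = {"msh_high": 1.0, "ajcc_advanced": 0.0}
print(predicted_survival(baseline, linear_predictor(coefs, patient)))
```

Because exp(lp) > 1 for a high-risk patient, every predicted survival probability falls below the corresponding baseline value.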

Biological Function and Pathways of the MSH.
To explore the biological functions and pathways of the MSH, GO and KEGG analyses were performed. The MAPK signaling pathway and the neurotrophin signaling pathway were associated with these 14 genes (Figure 8(a)); as expected, both pathways have been reported to play important roles in the development and progression of HCC [19][20][21], supporting the biological plausibility of the MSH. In addition, these 14 genes were associated with biological processes such as GO:0006468 (protein phosphorylation), GO:0018105 (peptidyl-serine phosphorylation), and GO:0006351 (transcription, DNA-templated); molecular functions such as GO:0004674 (protein serine/threonine kinase activity), GO:0004672 (protein kinase activity), and GO:0005524 (ATP binding); and cellular components such as GO:0005634 (nucleus), GO:0005622 (intracellular), and GO:0005856 (cytoskeleton) (Figures 8(b)-8(d)).

Discussion
HCC is a highly malignant cancer with a poor prognosis. Improving the clinical outcome of HCC patients remains a great challenge, partly because effective prognostic biomarkers and models are lacking. In the present study, we aimed to develop a methylation-based signature for HCC patients and evaluate its prognostic value. Fourteen candidate CpGs related to OS were identified in the development cohort by two distinct algorithms, namely LASSO analysis and robust likelihood-based survival analysis. Unlike previous studies that used only one algorithm, we used two algorithms to help minimize the possibility of missing important markers [22]. These 14 CpGs were then used to develop an MSH in the development cohort, which was validated in an internal validation cohort.
Our results showed that the MSH could effectively divide HCC patients into two distinct subgroups with high or low risk of death, suggesting potential implications for the clinical management of HCC patients. The MSH was associated with OS and was an independent prognostic factor for HCC patients. Moreover, stratified analysis showed that the MSH retained good prognostic value in subgroups classified by AFP, cirrhosis, Child-Pugh A, tumor histologic grade, and AJCC stage, which, to some extent, supports its reliability and general utility. With the help of the MSH, high-risk HCC patients could be identified and offered more intensive surveillance and even active adjuvant treatment to reduce recurrence and improve prognosis. Conversely, low-risk HCC patients might receive less intensive follow-up and avoid the adverse effects of adjuvant therapies. Therefore, the MSH may be useful for establishing more individualized follow-up schedules and selecting therapeutic strategies for HCC patients after surgery.
AJCC TNM stage is a widely used marker for predicting the prognosis of HCC. To further evaluate the predictive value of the MSH, we used time-dependent ROC analysis to compare the predictive efficacy of the MSH and AJCC stage. The predictive ability of the MSH was stable, and its AUC increased with the prediction horizon (0.643, 0.712, and 0.757 for 1-year, 3-year, and 5-year OS, respectively), suggesting better accuracy for long-term survival prediction, which may be particularly relevant for patients at advanced stages. The MSH was comparable to AJCC stage in predicting 1-year OS (AUC = 0.643 vs 0.657) and outperformed AJCC stage in predicting 3-year and 5-year OS (0.712 vs 0.668 and 0.757 vs 0.636, respectively). In addition, our 14-CpG-based MSH showed higher AUCs than the five-CpG-based signature of Fang et al. [14] in predicting the OS of HCC patients.
Furthermore, we built an MSH-based nomogram to make the MSH more clinically applicable. The C-index and calibration plots showed good agreement between the predicted and actual OS, indicating that the MSH-based nomogram predicts prognosis accurately.
The 14 prognosis-related CpGs correspond to TNFRSF19, RIT2, CACNG5, KRTAP11-1, ZNF154, TSPYL5, CPNE4, MS4A7, TBX4, SLC5A4, SULT1C2, CDKL2, S100A8, and SPRR2A. Among these genes, ZNF154, TSPYL5, CDKL2, and S100A8 have been reported to be associated with HCC. ZNF154, TSPYL5, and CDKL2 were found to be significantly hypermethylated and downregulated in HCC tissues compared with nontumor liver tissues, and the methylation of TSPYL5 and CDKL2 could be used to distinguish HCC tissues from adjacent nontumor tissues [23][24][25][26][27]. Consistent with these findings, hypermethylation of ZNF154, TSPYL5, and CDKL2 was also found in HCC patients in TCGA. We also found that the methylation levels of these genes were higher in high-risk patients than in low-risk patients, suggesting that ZNF154, TSPYL5, and CDKL2 may play an antitumor role in the development and progression of HCC. In contrast, S100A8 was significantly hypomethylated in HCC tissues compared with adjacent normal tissues, and this hypomethylation was associated with shorter OS and progression-free survival (PFS). In addition, overexpression of S100A8 in Huh7 and MHCC-97H hepatoma cells increased cell proliferation, migration, and invasion [28]. Furthermore, increased expression of S100A8 and S100A9 promotes the malignant progression of HCC by activating reactive oxygen species (ROS)-dependent signaling pathways and inhibiting cell death [29]. In our study, S100A8 was also hypomethylated in HCC patients, and its methylation level was lower in the high-risk group than in the low-risk group, consistent with an important role of S100A8 in HCC progression and partly explaining why HCC patients with S100A8 hypomethylation had shorter OS.
Although the roles of the other 10 genes in HCC have not been reported, future characterization of these genes may provide new insights into the development and progression of HCC and reveal potential novel therapeutic targets.
Although our 14-CpG-based signature performed well in predicting the prognosis of HCC patients, several limitations of this study should be noted. First, the prognostic value of the MSH was validated only in an internal cohort from TCGA; external cohorts with larger sample sizes are needed to validate the model. Second, although we explored the potential biological functions and pathways of the MSH, experimental studies are needed to verify the underlying mechanisms. Finally, noninvasive "liquid biopsy" has received increasing attention for its potential to revolutionize cancer diagnosis and treatment [30]; whether these 14 CpGs can be detected in the blood of HCC patients, and whether a signature based on them would retain its prognostic value, requires further validation.
In conclusion, we identified 14 differentially methylated CpGs that were significantly associated with the OS of HCC patients. The MSH built from these 14 CpGs showed better stability and accuracy in prognosis prediction than AJCC stage and other CpG-based signatures from TCGA. The MSH-based nomogram may help clinicians establish more individualized therapeutic strategies for HCC patients after surgery.

Ethical Approval
The study was approved by the Ethics Committee of the Shunde Hospital of Southern Medical University.

Consent
Informed consent was obtained from the patients before the study.