A novel prognostic signature based on four glycolysis‐related genes predicts survival and clinical risk of hepatocellular carcinoma

Abstract Background Hepatocellular carcinoma (HCC) is the most common cancer with limited cure and poor survival. In our study, a bioinformatic analysis was conducted to investigate the role of glycolysis in the pathogenesis and progression of HCC. Methods Single‐sample gene set enrichment analysis (ssGESA) was used to calculate enrichment scores for each sample in TCGA‐LIHC and GEO14520 according to the glycolysis gene set. Weighted gene co‐expression network analysis identified a gene module closely related to glycolysis, and their function was investigated. Prognostic biomarkers were screened from these genes. Cox proportional hazard model and least absolute shrinkage and selection operator regression were used to construct the prognostic signature. Kaplan–Meier (KM) and receiver operating characteristic (ROC) curve analyses evaluated the prediction performance of the prognostic signature in TCGA‐LIHC and ICGC‐LIRI‐JP. Combination analysis data of clinical features and prognostic signature constructed a nomogram. Area under ROC curves and decision curve analysis were used to compare the nomogram and its components. Results The glycolysis pathway was upregulated in HCC and was unfavorable for survival. The determined gene module was mainly enriched in cell proliferation. A prognostic signature (CDCA8, RAB5IF, SAP30, and UCK2) was developed and validated. KM and ROC curves showed a considerable predictive effect. The risk score derived from the signature was an independent prognostic factor. The nomogram increased prediction efficiency by combining risk signature and TNM stage and performed better than component factors in net benefit. Conclusion The gene signature may contribute to individual risk estimation, survival prognosis, and clinical management.


| INTRODUC TI ON
Liver cancer is the sixth most common cancer and the fourth leading cause of cancer-related deaths worldwide. Hepatocellular carcinoma (HCC) accounts for 75%-85% of these cases. 1 Cancer prevention and treatment advances have reduced the overall incidence and mortality rates in recent years. However, liver cancer is gradually increasing and has aggravated the global disease burden. 2 The main risk factors for HCC include chronic hepatitis B or hepatitis C virus infection, aflatoxin exposure, alcohol abuse, smoking, obesity, and others. 3 Patients with HCC are usually diagnosed at intermediate or advanced stages due to occult onset and atypical symptoms.
Curative treatment methods, including surgical resection, liver transplantation, and locoregional ablation, are unable to achieve the desired effect. 4,5 Radiotherapy, chemotherapy, targeted therapy, and immunotherapy are integrated into the comprehensive treatment of HCC, which improves the survival prognosis of patients with intermediate and advanced HCC. 6,7 However, tumor heterogeneity may result in various therapeutic responses and survival outcomes. Even though the histological grade and tumor stage of HCC are the same, biological behavior differences caused by molecular and genetic diversity affect the therapeutic effect. 8,9 Increasing evidence indicates that energy metabolism reprogramming is important in the initiation and progression of HCC; especially, the shift of aerobic glycolysis from oxidative phosphorylation, which is first found in HCC, is a hallmark of liver cancer and is responsible for the regulation of proliferation, immune evasion, invasion metastasis, angiogenesis, and drug resistance in HCC. [10][11][12] Aerobic glycolysis can rapidly produce sufficient adenosine triphosphate to meet the energy requirements of tumor growth and proliferation under hypoxic conditions. The metabolic intermediates of glycolysis can serve as raw materials for the synthesis of proteins, lipids, and nucleic acids. The production and accumulation of lactate and hydrogen ions during glycolysis lead to the acidification of the extracellular microenvironment, which can induce apoptosis of peripheral normal cells and produce an immunosuppressive effect on effector immune cells, prompting the invasion and metastasis of tumor cells. 13 Glycolysis is regulated by glycolytic enzymes and by transcription factors that include c-MYC, hypoxia inducible factor-1 (HIF-1), and p53, which are related to the expression and alteration of multiple oncogenes. 14 Consequently, antiglycolytic therapy might inhibit tumor proliferation and kill tumor cells. Glycolytic rate-limiting enzymes and HIFs are ideal targets for HCC. 15 Key glycolytic enzymes, such as hexokinase 2 (HK2), phosphofructokinase (PFK), and pyruvate kinase 2 (PKM2), are reported tumor markers. 16,17 Their expression and activity can affect the glycolysis of tumors, in turn affecting the proliferation of tumors.
Moreover, inhibition of glycolysis can increase the sensitivity of advanced HCC to sorafenib in cell and animal models. 18 In addition, 18F-fluorodeoxyglucose positron emission tomography/computed tomography (18FDG-PET/CT) enables the location, diagnosis, and follow-up of HCC based on glycolysis. 19 The risk signature for the prognosis of HCC involving glycolysisrelated genes has been investigated previously. [20][21][22] Nevertheless, the function of genes that might interact with glycolysis pathway in HCC has not been explored. Here, we analyzed the role of the glycolysis pathway in HCC tumorigenesis, screened the genes that might interact with glycolysis pathway, and assessed their important prognostic value. The identified genes enabled the construction of a highly efficient prognostic model to predict the survival and clinical risk of HCC. In addition, a nomogram that combined gene signatures and clinical characteristics displayed a higher prediction performance. The results reveal some novel promising targets for HCC that appear to be very important for prognosis evaluation, risk stratification, and reasonable treatment options.

| Data collection and processing
The transcriptome data of TCGA-LIHC and the corresponding clinical information were downloaded from The Cancer Genome Atlas (TCGA (http://cance rgeno me.nih.gov). The transcriptomic gene expression profiles of ICGC-LIRI-JP and GSE14520, and corresponding clinical data were obtained from International Cancer Genome Consortium (ICGC, https://dcc.icgc.org/) and Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/). Due to the unknown identity of patients from the above cohorts, informed consent was waived. Patients with incomplete survival information and a survival period of less than 30 days were excluded. Their detailed information is provided in Table 1. Two hundred genes encoding proteins involved in glycolysis and gluconeogenesis were extracted from the gene set (HALLMARK_GLYCOLYSIS) retrieved from the Molecular Signatures Database of GSEA (https://www.gsea-msigdb. org/gsea/index.jsp).

| Single-sample gene set enrichment analysis (ssGSEA) and weighted gene co-expression network analysis (WGCNA)
ssGSEA was applied to calculate the enrichment score that repre-

| Statistical analysis
R version 4.0.3 software (https://cran.r-proje ct.org/) was the main tool used to conduct the statistical analysis. A p-value of <0.05 was considered statistically significant.

| Role of the glycolysis pathway in HCC
Glycolysis ssGSEA scores were calculated for each sample in GSE14520 and TCGA-LIHC to represent the glycolytic activity using the ssGSEA method from the GSVA package, according to the gene set HALLMARK_GLYCOLYSIS. In GSE14520, it was observed that the glycolysis ssGSEA score was significantly higher in tumor tissues than in matched normal tissues (p < 0.05), and the glycolysis ssGSEA score of the dead patients was significantly higher than that of the surviving patients (p < 0.05) ( Figure 1A). Compared to the patients with lower glycolysis ssGSEA scores, the patients with higher glycolysis ssGSEA scores had worse outcomes (p < 0.05) ( Figure 1B).
In TCGA-LIHC, although no significant difference was found in the comparison of glycolysis ssGSEA scores between tumor tissues and adjacent normal tissues (p = 0.83), glycolysis ssGSEA scores of the patients with different survival outcomes were distinctly different, similar to the results of GSE14520 (p < 0.05) ( Figure 1C). The results of KM survival analysis indicated that the patients with higher glycolysis ssGSEA scores had a shorter survival period than those with lower glycolysis ssGSEA scores (p < 0.05) ( Figure 1D). The results of GSEA analysis in the TCGA-LIHC cohort showed that the normalized enrichment score of gene set HALLMARK_GLYCOLYSIS was more than 1 and both the NOM p-value and FDR q-value were <0.05, indicating that the biological behaviors of glycolysis and gluconeogenesis were more active in tumor tissues than in normal tissues in HCC, and it might promote the tumorigenesis and development ( Figure 1E).

| Novel prognostic signature based on four genes
WGCNA was implemented to construct a gene co-expression network and to identify a highly synergistic gene set based on weighted gene expression correlation. After excluding one discrete sample (GSM363054) and setting the optimal soft threshold to 4, hierar- of UCK2 * 0.334) ( Table 2). All four DEGs were upregulated in HCC and high expression levels were unfavorable indicators for OS of the patients.
Taking the median value of risk scores as the cutoff value, the cohort of TCGA-LIHC was divided into the high-risk and lowrisk groups. The distribution of risk scores, survival status, and expression of the four genes are displayed in Figure 4A. With the increase in risk score, more patients tended to be in the state of death.
The heatmap indicated that four genes were overexpressed in the high-risk group compared to the low-risk group. KM curves demonstrated that the prognosis of the high-risk group was significantly poorer than that of the low-risk group in OS (p < 0.05) ( Figure 4C).
F I G U R E 2 WGCNA analysis of gene expression profiles in HCC. One outlier was detected by sample clustering, sample dendrogram, and trait heatmap (A). Soft threshold equaled 4 when scale independence tended to the maximum and mean connectivity tended to the minimum (B). Hierarchical clustering tree revealing 26 gene modules (C). Correlation between the 26 gene modules and glycolysis ssGSEA scores (D). Gene enrichment analysis of the blue gene module (E)  were more deaths among those at high risk. The heatmap revealed that the expression comparison results of four genes between highand low-risk groups were consistent with those in TCGA-LIHC cohort. KM curves suggested that the high-risk group had a markedly lower OS rate than the low-risk group (p < 0.05) ( Figure 4D). Timedependent ROC curves of the ICGC-LIRI-JP cohort were plotted.

TA B L E 2 Stepwise multivariate Cox regression analysis of survival prognostic genes
The AUC values at 1-, 2-, and 3-years were 0.771, 0.756, and 0.78, respectively ( Figure 4F). PCA and t-SNE analyses illustrated that the prognostic signature could clearly distinguish the high-risk and lowrisk groups according to the sample distribution of the two groups ( Figure 4G,H).

F I G U R E 6
Correlation between the risk scores and HCC patients' histological grades, AJCC TNM stages, and T stages in TCGA-LIHC (A-C) that risk scores could act as independent prognostic factors for OS in patients with HCC (p < 0.05) ( Table 4).
GEPIA results revealed that the expression of all four genes (CDCA8, RAB5IF, SAP30, and UCK2) was related to the OS prognosis of patients with HCC, and patients with higher gene expression levels were more likely to have poorer outcomes (all p < 0.05) ( Figure 7A-D). The mutations of four genes in HCC are shown in Figure 6C. UCK2 had the highest mutation rate (11%), especially in amplification. The other three genes had a mutation rate of <2%, with a rate of 0.5% for CDCA8 ( Figure 7E).

| Nomogram integrating risk signature and clinicopathological features
The candidate factors with prognostic value, including AJCC TNM stage and risk scores calculated by the prognostic signature, were screened out by stepwise multivariate Cox regression analysis and were used to construct a highly accurate predictive nomogram The calibration curves showed considerable agreement between the nomogram-predicted OS and actual OS ( Figure 8B). According to the median value of risk scores calculated by the nomogram, the cohort was divided into the high-risk and low-risk groups. The results of the KM analysis demonstrated that the prognosis of the low-risk group was notably better than that of the high-risk group (p < 0.05) ( Figure 8C). The nomogram could provide a more accurate survival prediction. ROC analysis suggested that the AUC values at 1, 3, and 5 years for the training cohort were 0.825, 0.781, and 0.759, respectively ( Figure 8D). Time-dependent AUC curves suggested that the predictive power of the nomogram was superior to other predictive models, such as the AJCC TNM stage and four-gene based prognostic signature at different time points over the follow-up period.
DCA analysis showed that the nomogram could yield superior net benefits compared with an all-or-none treatment strategy and other predictive models ( Figure 8E).

| DISCUSS IONS
HCC is a common malignant disease with a high incidence and mortality. Most patients with HCC are diagnosed at the terminal stage and have lost the chance to benefit from radical surgery. 31 Although an increasing number of alternative therapy strategies have been developed and can prolong survival time for patients with unresectable HCC, they are not absolutely effective and some patients still have poor responses. 32 The reasons why patients respond differently to the same treatment may reflect the heterogeneity of tumors caused by molecular dysregulation and genetic variation. 33 Therefore, biomarkers related to tumor development and curative effect are worth exploring and could be conducive to early detection, targeted therapy, and personalized management of HCC. [34][35][36] Glucose metabolism reprogramming, especially aerobic glycolysis, is important in the initiation and progression of HCC in a hypoxic environment, such as growth and proliferation, invasion and metastasis, immunosuppression, and drug resistance. 37 The results of ssGSEA comparison indicated that glycolysis was more active in HCC and could promote tumor malignant behavior. The regulation of glycolysis in HCC is complex and may be related to genetic mutation and epigenetic modulation of oncogenes and suppressor genes, noncoding RNAs, signaling pathways, glycolytic enzymes, and other factors. 38 Understanding the mechanism of glycolysis in TA B L E 4 Independent prognostic factor analysis for the overall survival in TCGA-LIHC cohort and ICGC-LIRI-JP cohort  tumor development is essential to find novel biomarkers and promising therapeutic targets, and to improve diagnosis, treatment, and surveillance of HCC.
In the present study, we constructed co-expression gene modules related to glycolysis by WGCNA and identified four potential prognostic biomarkers to develop a novel risk signature for HCC.
Cell division cycle-associated 8 (CDCA8), a regulator of mitosis, is overexpressed in bladder cancer, breast cancer, liver cancer, and other tumors and is involved in their growth and development. High CDCA8 expression may be a predictor of poor prognosis in HCC. 39 Long noncoding RNA RAB5 interacting factor (RAB5IF) is highly ex-

CO N FLI C T O F I NTE R E S T
The authors declare that there is no conflict of interest regarding the publication of this article.

AUTH O R CO NTR I B UTI O N S
Haosheng Jin and Ning Shi contributed to the conception and de-