A Novel Glycolysis-Related Gene Signature that can Predict the Prognosis of Glioblastoma Patients

Background: Glioblastoma (GBM) is one of the most common primary intracranial malignancies, with limited treatment options and poor overall survival (OS). Metabolic changes in GBM have attracted wide attention in recent years, and one of the main metabolic features of cancer cells is the high level of glycolysis. Therefore, it is necessary to identify novel biomarkers associated with glycolysis in GBM. Methods: In this study, we performed gene set enrichment analysis and proled four glycolysis-related gene sets, which revealed 327 genes associated with biological processes. Univariate and multivariate Cox regression analyses were performed to identify genes for constructing a risk signature, and we identied ten mRNAs (B4GALT7, CHST12, G6PC2, GALE, IL13RA1, LDHB, SPAG4, STC1, TGFBI and TPBG) in the Cox proportional hazards regression model for GBM. Results: Based on this gene signature, we divided patients into high-risk (with poor outcomes) and low-risk (with better outcomes) subgroups. Multivariate Cox regression analysis showed that the prognostic power of this ten-gene signature is independent of clinical variables. Furthermore, we validated this model in two other GBM databases (Chinese Glioma Genome Atlas (CGGA) and REMBRANDT). In the functional analysis, the risk signature was associated with almost every step of cancer progression, such as adhesion, proliferation, angiogenesis, drug resistance and even an immune-suppressed microenvironment. Conclusion: The 10 glycolysis-related gene risk signature could serve as an independent prognostic factor for GBM and might be valuable for the targets in GBM therapies. Our ndings may provide novel insights for GBM research and guidance for individual therapy.

In this study, we aimed to discover novel glycolysis-related prognostic markers in GBM patients using gene set enrichment analysis (GSEA) and Cox multivariate regression models to analyse whole genome expression pro le data sets. We pro led the hallmark gene sets in 167 GBM patients with whole mRNA expression data from The Cancer Genome Atlas (TCGA) database. We identi ed 327 mRNAs signi cantly associated with glycolysis and established a ten-gene risk signature that can effectively predict patient outcomes. Notably, the glycolysis-related risk signature could independently identify patients in the highrisk group with poor prognosis. In addition, functional analysis demonstrated that the risk signature was associated with almost every step of cancer progression, such as adhesion, proliferation, angiogenesis, drug resistance and even an immune-suppressed microenvironment.

Clinical information of the patients and genome expression data
The transcriptional pro le and clinical data of patients with low-grade glioma (LGG) and GBM were extracted from the TCGA database (https://cancergenome.nih.gov/). Clinical information, including the total number of patients (n = 696, including 529 LGG patients and 167 GBM patients), gender, age, Karnofsky Performance Status (KPS) score, radiotherapy, chemotherapy, IDH status and MGMT promoter methylation status, was collected for the study. Validation information from the Repository for Molecular Brain Neoplasia Data (REMBRANDT, microarray) and Chinese Glioma Genome Atlas (CGGA, microarray) data sets were downloaded from GlioVis (http://gliovis.bioinfo.cnio.es/). GSEA GSEA (http://www.broadinstitute.org/gsea/index.jsp) was used to explore whether the identi ed sets of genes demonstrated signi cant differences between the two groups [16]. The expression levels of all mRNAs in LGG and GBM were analysed using GSEA_4.0.3. Normalized p-values (p < 0.05) and normalized enrichment scores (NESs) were used to determine functions to be investigated in further analysis. Gene set variation analysis (GSVA) was used to explore biological processes and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways associated with the risk signature [17]. Ten associated gene sets with differences between the high-risk group and the low-risk group in the TCGA data set (GBM HGU133A) were selected using the R package "limma", and an adjusted p-value < 0.05 was considered statistically signi cant.

Prognostic analysis
The associations between the expression level of each mRNA and patient overall survival (OS) were calculated using a univariate Cox model. In the univariate Cox analysis, mRNAs with p-values less than 0.05 were considered statistically signi cant. Afterward, multivariable Cox analysis was used to evaluate the weight of mRNAs as independent predictors of survival. These analyses were conducted using the R package "survival".

Statistical analysis
The candidate genes were classi ed into risk (hazard ratio (HR) > 1) and protective (0 < HR < 1) types. Based on the multivariate Cox regression analysis results, a prognostic risk score formula was established using a linear combination of the expression levels weighted with the regression coe cients. The risk score formula is described as follows: Risk score = expression of gene 1 × β1 + expression of gene 2 × β2 +......+ expression of gene n × βn. We classi ed 167 patients with GBM into high-risk and low-risk subgroups using the median risk score as the cutoff. Kaplan-Meier (KM) curves and the log-rank test were used to validate the prognostic signi cance of the risk score in strati ed analysis. Student's t test was used to examine the differential expression of the optimal genes in LGG and GBM tissues. All of the statistical analyses were performed using SPSS 19.0 and R 3.6.3 software. We also researched the genetic alterations of the prognosis-related genes in GBM through cBioPortal (http://www.cbioportal.org/). The chi-square test was used to assess the relationships between the risk score and clinical parameters, and the Bonferroni method was used to perform pairwise comparisons within multi-group comparisons [18].

Glycolysis-related gene sets differ signi cantly between
LGG and GBM samples The mRNA expression and clinical data of all patients were obtained from TCGA. We found all glycolysisrelated gene sets in the Molecular Signatures Database (MSigDB) version 7.1 to represent well-de ned glycolysis states or processes. GSEA was performed to explore whether the identi ed sets of genes showed signi cant differences between the LGG and GBM samples. Ultimately, we found that four gene sets, including GO_GLYCOLYTIC_PROCESS, HALLMARK_GLYCOLYSIS, KEGG_GLYCOLYSIS_GLUCONEOGENESIS, and REACTOME_GLYCOLYSIS, were signi cantly enriched with normalized p-values < 0.05 (Table 1, Fig. 1). We then selected the four gene sets, which contained 327 speci c genes, for further analysis. Then, the differential expression of the ten genes in LGG and GBM samples was also investigated. Eight genes (B4GALT7, CHST12, GALE, IL13RA1, SPAG4, STC1, TGFBI and TPBG) were signi cantly upregulated in GBM samples, and two genes (G6PC2 and LDHB) were signi cantly upregulated in LGG samples (p < 0.0001, Fig. 2). Association between the risk score and patient outcomes Using the ten-mRNA signature, we calculated the risk scores for each patient with GBM and ranked them in order of increasing risk scores (Fig. 3a). Figure 3b shows the risk score, OS (in years) and life status of 167 patients in the GBM data set, ranked in order of increasing risk scores. Patients with high risk scores had higher mortality rates than patients with low risk scores. Then, the 167 patients in the entire GBM data set were classi ed into the high-risk group (n = 83) and the low-risk group (n = 84) using the median risk score as the threshold. The KM analysis showed a signi cant difference in the outcomes of patients in the high-risk group and the low-risk group (log-rank test p < 0.001; Fig. 3c). Patients in the high-risk group had signi cantly worse survival than those in the low-risk group. To evaluate how well the ten-mRNA signature can predict prognosis, receiver operating characteristic (ROC) curve analysis was carried out. The area under the curve (AUC) for the ten-mRNA signature was 0.771, 0.847 and 0.713 for one-year, three-year and ve-year survival, respectively (Fig. 3d), demonstrating the reliable prognostic performance of the ten-mRNA signature for predicting survival in the entire dataset. To con rm that the gene signature performs better than the single gene biomarkers, we performed KM and ROC curve analyses, and the results supported our hypothesis. When the ten genes were each taken as a single biomarker, their prognostic performance was not better than that of the ten-mRNA signature (Fig. 3e).
The risk score generated from the ten-mRNA signature as an independent prognostic indicator In addition, risk score, age and radiotherapy had remarkable independent prognostic value not only in the univariate analysis but also in the multivariate analysis (p < 0.05), which indicates that the prognostic value of the ten-gene signature is signi cant for survival prediction. These results indicated that the risk score was robust in predicting the prognosis of patients with GBM (Table 3). Validation of the risk signature A total of 237 GBM samples in the CGGA data set and 181 GBM samples in the REMBRANDT data set were collected and used as two validation data sets to assess the performance of the risk signature. The K-M survival curves showed that patients with higher risk scores had poorer prognosis than those with lower risk scores (Fig. 4a, CGGA, p < 0.05; and 4b, REMBRANDT, p < 0.05). The AUCs of the ROC curves for predicting the 1-, 3-and 5-year survival of GBM patients in the CGGA data set were 0.589, 0.603 and 0.618, respectively (Fig. 4c), and those in the REMBRANDT data set were 0.561, 0.614 and 0.593 (Fig. 4d). These results indicated that the risk signature performed well for predicting the survival of GBM patients.

Associations between the risk signature and clinical characteristics
To explore the associations between the risk signature and clinical characteristics, we rst constructed a heatmap to present the distribution trends of gender, age, KPS score, transcriptome subtype, IDH1 status and MGMT promoter methylation status between the low-risk and high-risk groups in the TCGA database.
As shown in Fig. 5, the high-risk group tended to contain more patients older than 65 years, whereas samples with IDH1 mutations were all included in the low-risk group, and samples with different transcriptome subtypes seemed to have distinct distributions in the two risk groups. Meanwhile, there were no signi cant differences between the low-risk and high-risk groups in gender, KPS score, and MGMT promoter status. To be more intuitive, the chi-square test was used to verify the proportion differences of each factor (age, gender, molecular subtypes, MGMT promoter methylation status and IDH1 status) between the low-risk and high-risk groups ( Table 4). The results demonstrated that patients older than 65 years had more high-risk proportions, patients with wild-type IDH1 GBM had more high-risk proportions and patients with GBM of the mesenchymal subtype had the highest high-risk proportions (Bonferroni method). Functional analysis of the risk signature GSVA was used to explore the biological processes and KEGG pathways associated with the risk signature. As shown in Fig. 6a, several biological processes relevant to necrosis, leukocyte migration involved in the in ammatory response, positive regulation of macrophage chemotaxis and regulatory T cell differentiation were enriched in the high-risk group. Regarding KEGG pathways, the high-risk group was positively correlated with apoptosis, focal adhesion, the MAPK and JAK-STAT signalling pathway, the VEGF signalling pathway, ABC transporters and so on (Fig. 6b). In brief, these results revealed that the risk signature was correlated with almost every step of cancer progression.

Discussion
In recent years, cancer research on energy metabolism has attracted attention. Moreover, a large number of researchers have supported tumour metabolism as a critical determinant of glioma progression. In addition to TME-related factors, oncogenic mutations also modulate glioma metabolism to promote tumour cell proliferation and evasion of drug therapy [20,21]. It has been demonstrated that the cancer genotype and the TME shape the metabolic reprogramming of GBM and are thus potential therapeutic targets. Moreover, regulators of GBM metabolism can be useful tools for prognostication, diagnosis and therapy [20]. The Warburg effect is a phenomenon in which metabolism shifts to aerobic glycolysis rather than mitochondrial oxidative phosphorylation, which is a typical biochemical adaptation in GBM. Targeting glycolysis-related regulatory genes is an ideal therapeutic strategy for GBM. Several glycolysis regulatory genes have been reported, including hexokinase 2 (HK2) and PTEN-induced kinase 1 (PINK1), where the inhibition of HK2 and activation of PINK1 in preclinical GBM models have shown therapeutic bene t [22,23].
However, due to the high heterogeneity of GBM, targeting a single gene is often unable to effectively control tumour progression; moreover, single gene expression is easily interfered with by many external factors, and it is di cult for these biomarkers to independently and accurately predict the survival rate of patients. Therefore, in the present study, we constructed a statistical model containing multiple glycolysis-related genes and combined the function of each gene to improve the prediction e ciency. This kind of model has been con rmed in many other solid tumours and is superior to a single biomarker in predicting tumour prognosis [24][25][26].
Ten glycolysis-related biomarker genes (B4GALT7, CHST12, G6PC2, GALE, IL13RA1, LDHB, SPAG4, STC1, TGFBI and TPBG) were found to be statistically and biologically signi cant in the discrimination of LGGs from GBM in the present study, and six genes (CHST12, G6PC2, IL13RA1, LDHB, TGFBI and TPBG) were demonstrated to be signi cantly correlated with the prognosis of GBM patients ( Table 2). Among these biomarker genes, LDHB is a dehydrogenase and a critical switch that regulates glycolysis and OXPHOS. It has been demonstrated that the expression of LDHB alone was not able to predict a difference in OS, but the concomitant expression of LDHB and CCNB1 was able to identify medulloblastoma patients with a signi cantly worse prognosis [27]. Transforming growth factor-beta-induced (TGFBI) is an exocrine protein that has been found to be able to promote the development of glioma, nasopharyngeal carcinoma, bladder cancer and other tumours [28,29]. In a recent study, Guo Sk and colleagues showed that TGFBI was upregulated in glioma cells and played a promoting role in the growth and motility of U87 and U251 cells. Their results suggested that TGFBI has the potential to be a diagnostic marker and to serve as a target for the treatment of gliomas [30]. IL-13 receptor subunits α1 and α2 of the IL-13R complex are overexpressed in GBM. Jing Han and his colleagues showed that high IL13Rα1 with or without IL13Rα2 expression was associated with poor prognosis in patients with high-grade gliomas, but there was no correlation between IL-13Rα1 mRNA and IL-13Rα2 mRNA expression. Their ndings have important implications in understanding the role of IL-13R in the pathogenesis of GBM and potentially other cancers [31]. Although these genes can independently predict tumour prognosis to some extent, our results demonstrated that the ten-mRNA signature has better prognostic signi cance than the corresponding single biomarkers. Moreover, by using KM and ROC curve analyses of GBM, we veri ed our statistical results in the CGGA and REMBRANDT data sets and con rmed that the risk signature performed well in predicting the survival of GBM patients (Fig. 4). Therefore, this glycolysis-related gene signature can predict tumour prognosis more accurately and guide treatment more comprehensively.
We also constructed a heatmap to present the associations between the risk signature and clinical characteristics in the TCGA database. Our results indicated that elderly age, the mesenchymal subtype and wild-type IDH1 were signi cantly correlated with higher risk scores (Table 4) (Fig. 5). Consistent with mainstream views, elderly patients, the mesenchymal subtype and wild-type IDH1 usually predict an unfavourable prognosis [32,33]. Moreover, by using GSVA to explore the biological processes and KEGG pathways associated with the risk signature, we noticed that the risk signature was correlated with almost every step of oncogenesis and tumour progression, including adverse biological processes and signal transduction pathways (Fig. 6). Currently, many studies have elucidated the aggressive behaviours associated with GBM glycolysis and attempted to nd ways to target GBM glycolysis, such as through Myc, PGK1, SIRT3 and HK1 [34][35][36][37]. Therefore, our results once again con rm the reliability of the risk score in predicting the prognosis of GBM and provide new potential targets for targeting glycolysis.

Conclusion
In conclusion, we identi ed and validated a risk signature with 10 glycolysis-related genes associated with the survival of patients with GBM, where higher risk scores indicate unfavourable outcomes.
Moreover, these glycolysis-related genes could be potential prognostic targets in GBM therapies. Our ndings may provide novel insights for GBM research and guidance for individual therapy. Availability of data and materials The datasets used and/or analyzed in the present study are available from the corresponding author on reasonable request.  Verifying that the prognostic value of the risk signature is better than that of each single biomarker with ROC curves.

Figure 4
Evaluating the e ciencies of the risk signature in the CGGA and REMBRANDT data sets. a, b, Kaplan-Meier survival curves showed the prognostic value of the risk signature in the CGGA data set (a. low-risk group, n = 118; high-risk group, n = 119; p<0.05) and REMBRANDT data set (b. low-risk group, n = 91; high-risk group, n = 90; p<0.01). c, d, ROC curves evaluated the e ciency of the risk signature for predicting 1-, 3-and 5-year survival in the CGGA data set (c) and REMBRANDT data set (d).

Figure 6
Functional roles of the risk signature. GSVA showed the biological processes (a) and KEGG pathways (b) associated with the risk signature.