Glycolysis gene expression profilings screen for prognostic risk signature of hepatocellular carcinoma

Metabolic changes are the markers of cancer and have attracted wide attention in recent years. One of the main metabolic features of tumor cells is the high level of glycolysis, even if there is oxygen. The transformation and preference of metabolic pathways is usually regulated by specific gene expression. The aim of this study is to develop a glycolysis-related risk signature as a biomarker via four common cancer types. Only hepatocellular carcinoma was shown the strong relationship with glycolysis. The mRNA sequencing and chip data of hepatocellular carcinoma, breast invasive carcinoma, renal clear cell carcinoma, colorectal adenocarcinoma were included in the study. Gene set enrichment analysis was performed, profiling three glycolysis-related gene sets, it revealed genes associated with the biological process. Univariate and multivariate Cox proportional regression models were used to screen out prognostic-related gene signature. We identified six mRNAs (DPYSL4, HOMER1, ABCB6, CENPA, CDK1, STMN1) significantly associated with overall survival in the Cox proportional regression model for hepatocellular carcinoma. Based on this gene signature, we were able to divide patients into high-risk and low-risk subgroups. Multivariate Cox regression analysis showed that prognostic power of this six gene signature is independent of clinical variables. Further, we validated this data in our own 55 paired hepatocellular carcinoma and adjacent tissues. The results showed that these proteins were highly expressed in hepatocellular carcinoma tissues compared with adjacent tissue. The survival time of high-risk group was significantly shorter than that of low-risk group, indicating that high-risk group had poor prognosis. We calculated the correlation coefficients between six proteins and found that these six proteins were independent of each other. In conclusions, we developed a glycolysis-related gene signature that could predict survival in hepatocellular carcinoma patients. Our findings provide novel insight to the mechanisms of glycolysis and it is useful for identifying patients with hepatocellular carcinoma with poor prognoses.

AGING cancer cell growth. It is well established that cancer cells shift to a pro-anabolic metabolism induced by oncogenes, such as c-Myc [2]. Most notable is the Warburg effect wherein cancer cells increase glycolysis and it is a metabolic hallmark of virtually all cancer cells, characterized by excessive conversion of glucose to lactate. It occurs in most organisms, even though in the presence of oxygen. An increase in glycolysis provides cancer cells with energy and heightened potential for biomass production from glycolytic intermediates [3]. Glycolysis was an early attractive target for cancer therapy given the clinical observation that many tumors exhibit a significant increase in glucose uptake compared with adjacent normal tissue [4]. LDH-A, a will pyruvate (end product of glycolysis) converted to lactic acid metabolic enzymes, is identified as the first metabolic target of the oncogene MYC [5]. Another potential therapeutic target is the glycolytic protein HK2. Many tumor cells overexpress HK2, and preclinical mouse models of genetically engineered NSCLC (non-small cell lung cancer) and breast cancer demonstrate that HK2 inhibition delays tumor progression [6]. Uncontrolled growth, reduced ability to undergo apoptosis and the ability to metastasize are some of the important features of malignancies, regardless of origins of tissues [7]. Therefore, early detection of cancer is always one of the most effective ways to improve the overall survival rate of cancer patients. Clinically, better or alternative methods to identify cancer risk groups are always needed [8]. Thousands of biomarkers that may be associated with survival and prognosis have been explored through various methods. However, we found that limited studies have systematically investigated the metabolic status and its prognostic value in patients with tumor. And most biomarkers have been developed in a wide range, not selected for specific features or single signaling pathways, which could be a new entry point for us to research. Therefore, clarifying the relationship between glycolysis and tumor is crucial for understanding the mechanism of tumorigenesis.
In the present study, we performed GSEA to identify glucose-related gene sets that could distinguish the clinical and molecular variables of several common cancers which were mentioned above, including colon adenocarcinoma (COAD), kidney renal clear cell carcinoma (KIRC), hepatocellular carcinoma (HCC) and breast invasive carcinoma (BRCA). And then, we found the significant mRNAs could be mined only in HCC. We developed a glucose-related prognostic signature with the whole genome expression data from the TCGA database for the patients with HCC in the TCGA dataset. Surprisingly, the local glycolysis-related risk signature could independently classify patients with HCC with a high risk of unfavorable outcome. We first identified a glycolysis-related risk signature for patients with HCC.
These results might provide a new view for the research of HCC and individual treatment.

Glycolysis-related gene sets differ significantly between adjacent cancer samples and tumor samples
The study included four solid tumors, including COAD, KIRC, HCC and BRCA. The mRNA expression and clinical data of all patients were obtained from the Cancer Genome Atlas Database (TCGA). We found all glycolysis-related gene sets on the Molecular Signatures Database v4.0 (http://www.broadinstitute.org/gsea/ msigdb/index.jsp), namely three different gene sets (HALLMARK_GLYCOLYSIS, KEGG_GLYCOLYSIS_ GLUCONEOGENESIS, REACTOME_GLYCOLYSIS). First, we used GSEA to explore whether these three glycolysis-related gene sets differ significantly between adjacent cancer samples and tumor samples ( Figure 1). We found that the HALLMARK_GLYCOLYSIS gene set in HCC was significantly different between paracancerous and tumor samples (FDR=0.0221); the REACTOME_GLYCOLYSIS gene set in COAD was significantly different between paracancerous and tumor samples (FDR=0.0266). The HALLMARK_ GLYCOLYSIS gene set in BRCA was significantly different between the paracancerous sample and the tumor sample (FDR = 0.0152). In the analysis of renal clear cell carcinoma, the FDR values of three gene sets were all found to be greater than 0.05 (Figure 2), indicating that there is no significant difference between the adjacent glycolysis-related gene sets in the paracancerous samples and the tumor samples. Next, the core genes (CORE ENRICHMENT: YES) under the above gene set is screened, that is, the gene whose expression is up-regulated in the tumor tissue. Among the four solid tumors, differentially expressed genes can be screened for COAD, HCC and BRCA. Compared with adjacent tissues, 75 up-regulated mRNAs were screened in COAD tissues; 109 up-regulated mRNAs were screened in HCC tissues compared with adjacent tissues; compared with adjacent tissues, a total of 101 up-regulated mRNAs were screened in BRCA tissues (Table 1).
To verify whether the core gene is involved in glycolysis, we conducted an in-depth study using GO analysis and KEGG pathway enrichment analysis. The results showed that the most enriched biological process term (BP) was related to glucose metabolism process, molecular function term (MF) was related to sugar binding and glucose binding, and KEGG pathway enrichment analysis involved glycolysis/gluconeogenesis, suggesting that the selected core genes are indeed related to glycolysis ( Figure 3). AGING

Identification of glycolysis-related genes associated with patients' survival
We selected the core genes from the GSEA results table and performed a univariate Cox regression analysis to further screen for the mRNAs associated with the patient's OS. It was found that 22 mRNAs were selected for HCC in patients with OS (P<0.001); in the analysis of COAD, no mRNA associated with OS was found; in the analysis of BRCA, four mRNAs were associated with OS (P < 0.1, Table 2), but the area under the curve (AUC = 0.637) of the receiver operating characteristic curve (ROC), indicating diagnostic performance was less than 0.70 ( Figure 4). The efficacy is not high, therefore, no prognostic genes related to glycolysis can be screened in breast invasive cancer. The univariate Cox proportional regression model found that the differentially expressed genes related to glycolysis which were associated with the patient's OS could only been selected in HCC of the remaining three tumors.
Next, we analyzed the mutations of the selected six mRNAs in HCC through clinical HCC samples in the cBioPortal online web database. The results showed that 19 of 309 patients (6%) had mutations. Among them, DPYSL4: one case of gene amplification, one case of missense mutation, one case of truncation mutation; 0.9% of HOMER1 gene mutation, two cases of amplification, one case of missense mutation, one case of truncation mutation; 0.7% of ABCB6 gene mutation; CENPA gene has 2% mutation; CDK1 gene mutation is 0.7%; STMN1 gene mutation accounts for 1.4% ( Figure 5A).
We further analyzed the differential expression of DPYSL4, HOMER1, ABCB6, CENPA, CDK1, STMN1 in cancer and adjacent tissues, and found that compared with adjacent tissues, the six genes in 309 HCC tissues were significantly up-regulated (P<0.01, Figure 5B).

Construction and validation of a six-mRNA signature for predicting patients' outcome
To construct a prognostic signature, these six mRNAs were analyzed using a multivariate Cox regression analysis in the entire dataset with survival. Next, by integrating the expression profiles of six mRNAs and the      Figure 6A). Thus, 309 patients in the entire data set were classified as high risk groups (n = 154) and low-risk groups (n = 155), using the median risk score as the threshold. The Kaplan-Meier (KM) analysis showed a significant difference in the outcome of the patients between the high-risk group and the low-risk group (log-rank test P <0.001; Figure 6C). The high-risk subgroup had significantly worse survival than those in the low risk subgroup. To evaluate how well the six-mRNA signature for diagnosis, the ROC curve analysis was carried out. The AUC for the six-mRNA signature was 0.765 ( Figure 6D), demonstrating the good diagnostic significance of six-mRNA signature for survival prediction in the entire dataset. Figure 6B showed the risk score, OS (in days) and life status of 309 patients in the entire data set, ranked in order of increased risk score, the patients with high-risk scores had higher mortality rates than did the patients with low-risk scores.
We applied chi-square test to reveal the relation between risk score and clinical features (Table 4). It revealed that T, N, M, stage, grade, relative family history concerned the risk score of HCC patients.
We next validated our six-mRNA signature in the training set and validation set to confirm our findings. 309 patients were randomly divided into a training set (n = 151) and a validation set (n = 158). The ID of the two groups please see the Supplementary Table 1. Cross validation showed that the CLF score was 0.82, indicating that the grouping was stable and reliable. Consistent with the results of the entire set, patients in the high-risk group had significantly shorter survival time than those in the low-risk group of the training set (log-rank test P =0.0263; Figure 7A). The validation data set, similar results were observed: the survival rate of patients in the high-risk group was significantly lower than that in the low-risk group (logrank test P =0.0169; Figure 7B).
In order to confirm that the gene signature is more significant than the single gene biomarker, we verified it with the KM analysis and the ROC curve and the results showed that our hypothesis is correct. When the six genes act as a single biomarker respectively, their diagnostic significance was not better than the six-mRNA signature ( Figure 8).

Independence of the risk score for the six-mRNA from other clinical variables
To assess whether the prognostic ability of the six mRNA markers is independent of other clinical parameters including family history, new tumor event after initial treatment, neoplasm cancer status and tumor stage, the univariate and multivariate Cox regression analyses were performed in the entire dataset. The results of the data showed that the risk score of six mRNA is significantly related to patients' survival. Surprisingly, the six-mRNA signature still retained an independent prognostic indicator after adjustment for other clinical parameters in the entire dataset (P-value<0.001, HR = 1.884, 95% CI = 1.287-2.757; Table 5 and Table 6). In addition, we found that T, neoplasm cancer status, new tumor event after initial treatment were also independent prognostic factors, because they had significant differences not only in the univariate analysis, but also in the multivariate analysis, with P values less than 0.05. AGING  Stratified analyses were then performed according to person neoplasm cancer status, new tumor event after initial treatment, grade, respectively. First, all 309 HCC patients were stratified by person neoplasm cancer status into tumor free dataset (n = 135) and with tumor (n = 100) dataset. The six-mRNA signature could classify the tumor free dataset into a high-risk group (n = 84) and a low-risk group (n = 51) with significantly different survival. Similarly, the six-mRNA signature was also able to classify the with tumor dataset into a high-risk group (n = 51) and a low-risk group (n = 49) with significantly different survival ( Figure 9A). Then all patients were further stratified by grade into an early dataset (grade I and grade II, n=182) and a late dataset (grade III and grade IV, n = 123). Similar prognostic power of the six-mRNA signature was significant in both the early dataset and late dataset. Patients in the early dataset were classified into a high-risk group (n = 91) with shorter survival and a low-risk group (n = 91) with longer survival. Similar results were observed in the late dataset ( Figure 9C). Finally, all patients were stratified by the new tumor event after initial treatment into no dataset (n = 123) and yes dataset (n = 122). There was a significant difference in survival rate between high-risk group and low-risk group ( Figure 9B). It is shown from that regression analysis of single variable and multivariable Cox and the result of stratified analysis that the prediction ability of six mRNA mark is independent from other clinical parameter and can predict the survival rate of HCC patients.

Exploration of the differentially expressed genes between high-risk and low-risk patients related to the gene signature
To investigate the differentially expressed genes between high-risk and low-risk patients related to the gene signature, KEGG analysis was carried out on the  Abbreviations: T: Tumor; N: Node (regional lymph node); M: Metastasis risk score associated genes. By differential expression analysis, we extracted the differential expression data of 183 low-risk patients and 184 high-risk patients, and ordered them by P value. We selected the most 200 significant differentially expressed genes for subsequent analysis.
We found that these genes were enriched in terms including "cell cycle", "protein digestion and absorption", and"central carbon metabolism in cancer" (Supplementary Figure 1A). Meanwhile, using these genes with mRNA-signature, we performed PPI (protein-protein interaction) to find the hub gene through String (https://string-db.org/cgi/input.pl?sessionId= FB2xhX96kIxn&input_page_show_search=on) and cytoscape software. The result indicated that several genes for example, "CCNB1, CDK1, PLK1" (Supplementary Figure 1B) were the key genes we hoped to select, and "CENPA, STMN1, KIF2C, PTTG1" also played a crucial role in above pathways. Surprisingly, three genes of them that we identified before. It illustrated that "CENPA, STMN1, CDK1" are not only hub genes which have strong relationship with differentially expressed genes between high-risk and low-risk patients, but also the factors that influence the outcome of patients with HCC. All these results suggest that the novel gene signature reflects the HCC cellular functional characteristics related to the poor prognosis, thus predicts the survival of HCC patients.

The protein of the six-genes were overexpressed in HCC tissues
To further verify the protein expression level of the sixgenes in HCC, immunohistochemistry was performed in TMA containing 55 paired HCC and adjacent tissue. Consistent with our previous findings, the six proteins were highly expressed in most HCC tissues compared with that in adjacent non-tumor liver tissues ( Figure 10). As shown in Figure 10, high expression of STMN1 (29/55), ABCB6 (42/55), HOMER1 (24/55), CENPA (30/55), CDK1 (40/55) and DPYSL4 (41/55) were detected in a majority of HCC tissues, while high expression of STMN1 (8/55), ABCB6 (18/55), HOMER1 (11/55), CENPA (13/55), CDK1 (19/55) and DPYSL4 (17/55) were observed in a minority of adjacent non-tumor liver tissues. The potential association between protein expression and clinical features of HCC was analyzed. We found that highexpressed (STMN1, HOMER1, CENPA) patients had significant low overall survival rate compared with lowexpressed group patients ( Figure 11). These data indicate the survival time of high-risk group was significantly shorter than that of low-risk group, indicating that highrisk group had poor prognosis. We analyzed the correlation of the six proteins in HCC. The results showed that the absolute value of R values was less than 0.3, suggesting that these six proteins are independent of each other ( Figure 12).

DISCUSSION
In recent years, research on energy metabolism has been aroused people's attention, furthermore, human malignancy and energy metabolism are inextricably linked. The metabolism of malignant tumors is characterized by the Warburg effect, in which a metabolic shift from oxidative phosphorylation in the mitochondria towards glycolysis occurs in tumor cells [9]. Although aerobic glycolysis is less efficient for ATP production than oxidative phosphorylation, aerobic glycolysis is essential for tumor growth and    Abbreviations: T, Tumor; N, Node(regional lymph node); M, Metastasis; 95% CI, 95%Confidence Interval survival [10], by stimulating carbon fluxes to biosynthetic pathways. Increase NADPH synthesis and oxidative defense to meet cell proliferation needs [11].
In the case of cancer and energy metabolism, we have thought of biomarker, although there are many studies on cancer and glycolysis, the research that involved biomarkers of cancer related to glycolysis is limited. In our research this time, we tried to identify a glycolysisrelated biomarkers for patients with HCC.
A biomarker is defined as any substance, structure or process that can be measured in the body and influence or predict the incidence of outcome or disease [12].
Recent studies demonstrated that clinicopathological features such as age, sex, tumor death, margin status, and metastatic diagnosis are insufficient to accurately predict patient prognosis [13]. The ideal biomarker is readily available, reliably measurable, cost-effective, minimally invasive and highly accurate [14]. Therefore, in the cancer, more and more mRNAs might be useful as molecular markers of the patients' prognosis, indicating their important clinical significance should be explored [15]. For example, elevated GalNAc-T mRNA expression in melanoma cells appears to be a biomarker for aggressive melanoma [16], Han et al. confirmed that the MAGE family member A9 had significantly higher expression in laryngeal squamous cellcarcinoma and could be used as an independent prognostic factor in patients with laryngeal squamous cell cancer [17]. The expression of HERV-K in the PBMC has to do with the diagnosis of prostate cancer, particularly in elder men and smokers [18]. However, these biomarkers still exist to be insufficient to predict the survival of patients independently. In particular, because the expression of a single gene can be interfered by many factors through various ways, it is difficult to provide powerful functional predictive effects for such information. For this reason, here we use a statistical model to construct a gene signature containing several related genes, combined with the prediction effect of each component gene to improve prediction efficiency. This model is widely used and is superior to single biomarkers in  predicting disease prognosis [19][20][21]. In our study, in order to confirm that the gene signature is more significant than the single gene biomarker, we verified it with the KM analysis and the ROC curve in HCC, and the results showed that our hypothesis is correct. When the six genes act as a single biomarker respectively, their diagnostic significance was not better than the six-mRNA signature ( Figure  7). And for more in-depth research, the six genes and the mechanisms of glycolysis and HCC need further study.
Excluding in HCC, the above-mentioned methods (KM analysis and the ROC curve) combined with univariate and multivariate analysis were also performed in some other common cancers including KIRC, COAD, BRAC, OV to develop a glycolysis-related risk signature as a prognostic biomarker. And only HCC showed the strong relationship with glycolysis. In this regard, we conducted an in-depth discussion. When the results of GO enrichment analysis for target genes related with glycolysis were shown in HCC, we found that the most relevant enriched CC (Cellular Component) terms were associated with the golgi apparatus ( Figure 3). The Golgi apparatus is a major glycosylation site of the cell and plays an essential role in the secretory pathway [22]. And it is the central organelle along the eukaryotic secretory and endocytic pathway [23]. Thus, the golgi apparatus is the most in cells that require a large amount of synthetic protein. In other words, cells with a secretory effect have more golgi bodies, for example, glandular cells (endocrine glands, digestive glands, etc.), as well as organs with high metabolic rates such as the liver. As the largest digestive gland in the human body, the liver can also secrete some immune globulins and some hormones. Considering that all of the above results are based on data analysis, we used immunohistochemical method to detect the protein level of these genes in HCC tissues. The results of immunohistochemistry were consistent with those of our data analysis, which confirmed that the expression level of these proteins in HCC tissues was significantly higher than that in adjacent tissues. In summary, when we develop a glycolysis-related risk signature as a prognostic biomarker, the result that only HCC showed the strong relationship with glycolysis is reasonable. Our findings provide novel insight to the relationship between glycolysis and HCC and we laid a solid foundation for future research.
In conclusions, we first identified and validated a sixgene risk signature related to glycolysis that can predict the outcome of patients with HCC, where higher risk scores indicate unfavorable survival. And we also provide novel insight into the relationship between glycolysis and HCC. This signature could be a promising prognostic targets in clinical practice, it is useful for verifying patients with HCC with poor prognoses. These results might offer a new view for the research of HCC and individual treatment.

Functional enrichment analyses
GSEA (http://www.broadinstitute.org/gsea/index.jsp) was performed to explore whether identified sets of genes showed significant differences between two groups [24]. GSEA does not require the specification of a clear differential gene threshold. The algorithm provides researchers with an overall trend of the actual data to enable the researchers to examine the overall expression of several genes even without prior experience; thus, this approach improves the connection between the mathematical statistics of the expression of the data and the biological meaning [25]. The expression levels of all mRNAs in adjacent noncancerous tissue and tumor samples were analyzed. Normalized p values (P<0.05) were used to determine which gene set to further investigate. Functional enrichment analyses for those filtered genes were performed using the DAVID (The Database for Annotation, Visualization and Integrated Discovery) Bioinformatics Tool (version 6.7) [26]. Significant enrichment results were visualized using EasyChart (http://www.ehbio.com/ImageGP/index.php/ Home/Index/index.html).

Prognostic analysis
The relationship between the expression level of each mRNAs and the patient OS was calculated using a univariate Cox model. In univariate variable Cox analysis, mRNAs with p values less than 0.05 were considered statistically significant. After that, the multivariable Cox analysis was used to evaluate the weight of mRNAs as the independent predictor of survival. These analyses were conducted using the R package of survival.

Statistical analysis
The filtered mRNAs were classified into risk (HR > 1) and protective (0 < HR < 1) types. Subsequently, a prognostic risk score formula was established based on a linear combination of the expression levels weighted with the regression coefficients derived from the multivariate Cox regression analysis.
Risk score = expression of gene 1×β1 + expression of gene 2×β2 +......+ expression of gene n×βn. We classified 309 patients into high-risk and low-risk subgroups using the median risk score as the cutoff. Kaplan-Meier curves and the log-rank method were used to validate the prognostic significance of the risk score in stratified analysis. The Student's t test was performed to examine the differential expression of optimal genes in adjacent normal tissue and GC tissues. All of the statistical analyses were performed using SPSS 16.0 and GraphPad Prism 7.0 software. We also researched the genetic alterations of prognosis-related genes in HCC by cBioPortal web software (http://www.cbioportal.org/). Chi-square test was used to demonstrate the relationship between risk score and clinical parameters. Scikit-learn package of Python was used to do the cross validation.

Tissue microarray construction and immunohistochemistry
After reviewing the hematoxylin and eosin-stained slides, 55 paired representative paraffin blocks (2007-2017) of HCC and adjacent tissue samples were selected. Tissue cores were extracted from each donor block using a 1.5 mm diameter puncture needle, and arrayed into a new paraffin recipient block with a maximum of 60 cores. Sections were obtained from re-prepared blocks, mounted on poly-L-lysine-coated glass slides, and used for immunohistochemical staining. Sections were deparaffinized and exposed to antigen retrieval at 121°C for 5 min using an autoclave (pH 7.8 Tris-EDTA-citrate buffer). Sections were then incubated with primary antibodies against DPYSL4, HOMER1, ABCB6, CENPA, CDK1 or STMN1 (Sangon Biotech, China) at 37°C for 60 min, followed by incubation with biotinylated secondary antibodies at 37°C for 30 min. The sections were then incubated with horseradish peroxidasecoupled streptavidin for additional 30 min (LSAB kit; Dako, Glostrup, Denmark), and stained with DAB. Sections were then dehydrated.

Evaluation of immunohistochemistry
The immunostaining was evaluated under the light microscope (magnification, ×200; select three fields/ view) by two pathologists blinded to the experimental conditions. The intensity of immunoreactivity was scored as follows: zero for no staining, one for weak staining, two for moderate staining, and three for strong staining. The proportion of positive tumor cells was as follows: zero (no positive cells), one (<25% positive cells), two (26-50% positive cells), three (51-75% positive cells), four (>75% positive cells). The score is obtained by calculating the product of intensity of immunoreactivity and proportion of positive tumor cells. Score >= six represents high expression, otherwise it is low expression.