A glycolysis-related gene signature predicts prognosis of patients with esophageal adenocarcinoma

Background: Esophageal adenocarcinoma (EAC) is a growing problem with a rapidly rising incidence and carries a poor prognosis. We aimed to develop a glycolysis-related gene signature to predict the prognostic outcome of patients with EAC. Results: Five genes (CLDN9, GFPT1, HMMR, RARS and STMN1) were correlated with prognosis of EAC patients. Patients were classified into high-risk and low-risk groups calculated by Cox regression analysis, based on the five gene signature risk score. The five-gene signature was an independent biomarker for prognosis and patients with low risk scores showed better prognosis. Nomogram incorporating the gene signature and clinical prognostic factors was effective in predicting the overall survival. Conclusion: An innovative identified glycolysis-related gene signature and an effective nomogram reliably predicted the prognosis of EAC patients. Methods: The Cancer Genome Atlas database was investigated for the gene expression profile of EAC patients. Glycolytic gene sets difference between EAC and normal tissues were identified via Gene set enrichment analysis (GSEA). Univariate and multivariate Cox analysis were utilized to construct a prognostic gene signature. The signature was evaluated by receiver operating characteristic curves and Kaplan–Meier curves. A prognosis model integrating clinical parameters with the gene signature was established with nomogram.


INTRODUCTION
The incidence of esophageal adenocarcinoma (EAC) has increased in recent decades, especially in the Western world. It is the solid tumor with the fastest increase in cases in the United States in the last 30 years [1]. The prognosis of EAC remains poor because it is usually diagnosed late. Although many efforts have been made to improve prevention, early detection, and treatment, the 5year survival rate for patients with EAC is still less than 15% [2]. Therefore, promising biomarkers may have potential for identifying patients at a high risk of EAC and for evaluating their prognoses, since the progression status of EAC patients does not reflect prognoses and treatment responses accordingly.
Numerous biomarkers have been developed and used to predict the prognoses of cancer patients [3,4]. Some of them, such as glypican 3 (GPC3) and HER2/neu, were identified through global and targeted metabolite profiling [5,6]. However, the prognostic efficiency of single-gene biomarkers is limited. Cancer is partly characterized by reprogrammed energy metabolism [7]. EAC is not only a malignant disease but also an energy metabolism disease. It has been recognized that glycolysis pathways are significantly upregulated in the AGING precursor lesions of EAC, which is known as Barrett's esophagus [8]. Increased glycolysis is a Hallmark of cancer metabolism, yet little is known about this phenotype at malignant stages of progression. Gene set enrichment analysis (GSEA) can detect the overall expression of various genes without requiring extensive experience or a clear differential gene threshold. GSEA reveals general trends in the data. Therefore, this emerging computational technology could be applied to statistically analyze gene expression and biological behavior [9]. When applied to studying the glycolytic process, GSEA could provide better understanding of the underlying mechanism of tumorigenesis and the progression of EAC.
In this study, a new gene signature that effectively predicted the outcome of EAC patients was explored by analyzing data from The Cancer Genome Atlas (TCGA) database. We identified three glycolysis-related gene sets (GO glycolytic process, hallmark glycolysis and reactome glycolysis gene sets) and a five-gene risk model (CLDN9, GFPT1, HMMR, RARS and STMN1) to predict the prognostic outcome of EAC patients. CLDN9 has been found to be related to many human malignancies, such as non-Hodgkin's lymphoma, breast cancer, pituitary oncocytomas, laryngeal carcinoma and endometrial cancer, contributing to disease progression and poor prognosis in patients. Wang et al. revealed that CLDN9 was significantly correlated with overall survival and predicted a poorer prognosis in endometrial cancer patients [10]. GFPT1 is the first enzyme of the hexosamine biosynthetic pathway. It transfers an amino group from glutamine to fructose-6phosphate to yield glucosamine-6-phosphate, thus providing the precursor for uridine diphosphate Nacetylglucosamine (UDP-GlcNAc) synthesis, which is essential for all mammalian glycosylation biosynthetic pathways. Zhang et al. recently suggested that the lncRNA ELFN1-AS1 facilitates the proliferation, migration and invasion of esophageal cancer in vitro by promoting GFPT1 expression [11]. It was also recognized that HMMR was elevated in some malignancies, such as non-small-cell lung, breast, bladder, prostate, colorectal and ovarian cancers, resulting in aggressive phenotypes, poor prognosis and disease progression. Recently, Zhang et al. confirmed the relationship of HMMR involved in a glycolysis related nine-gene risk signature and lung adenocarcinoma in development [12]. Arginyl-tRNA synthetase (RARS) is one of the nine synthetase components of a multienzyme complex, and it belongs to a family of cytoplasmic aminoacyl-tRNA synthetases [13,14]. Its fusion with MAD1L1 might contribute to tumorigenesis, cancer stem cell like properties and therapeutic resistance [15]. STMN1 functions as a critical element of regulating microtubule dynamics, which is necessary in the final stage of cell division, and its mutation may lead to uncontrolled cell proliferation [16][17][18]. STMN1 has been reported to be upregulated in several types of cancer tissues and correlated with tumo r aggressiveness [19,20]. Reports have suggested that higher expression of STMN1 predicts worse survival in patients with several types of solid tumors, such as head and neck squamous cell carcinoma [21], gallbladder carcinoma [22], esophageal squamous cell cancer [23], lung carcinoma [24,25], breast cancer [26], and endometrial cancer [27]. Javed Akhtar et al also reported that STMN1 may be a suitable target for future therapeutic strategies in distal esophageal adenocarcinoma. Its overexpression was found to be associated with lymph node metastasis and increased malignancy in distal esophageal adenocarcinoma both in vivo and in vitro [28]. Multiple single genes correlated with glycolysis have been reported as a predictor of EAC prognosis; however, no glycolysisrelated gene signature has been constructed. In our study, we initially identified a gene signature (CLDN9, GFPT1, HMMR, RARS and STMN1) involved in glycolysis and then proved the predictive ability of this gene signature for EAC. Remarkably, a glycolysisrelated gene signature was identified and could be used to evaluate EAC patients' prognosis independently. Furthermore, a comprehensive nomogram based on clinical factors and gene signatures was established to predict the prognosis of EAC patients.

Development of glycolysis-related genes with GSEA
TCGA data analysis procedure of this study is shown in Supplementary Figure 1. As shown in the flow chart, GSEA was conducted to identify whether the five glycolysis-related gene sets were significantly different between EAC and normal specimens. The results showed that GO glycolytic process (NES=2.00, normalized P<0.001, FDR < 0.001, Figure 1A), hallmark glycolysis (NES=1.79, normalized P=0.007, FDR=0.007, Figure 1B), and reactome glycolysis (NES = 2.01, normalized P<0.001, FDR<0.001, Figure 1C) gene sets were significantly enriched in cancer samples. However, gene sets from the Biocarta glycolysis pathway (NES=0.93, normalized P=0.615, FDR=0.615) and KEGG glycolysis gluconeogenesis (NES=1.00, normalized P=0.456, FDR=0.456) did not manifest many meaningful results. After screening upregulated gene expression in cancer samples, 106 core genes from the GO glycolytic process gene set (Figure 2A), 200 core genes from the hallmark glycolysis gene set ( Figure 2B) and 72 core genes from the reactome glycolysis gene set ( Figure 2C) were used in further analysis.

Expression level of the five-gene signature in EAC and normal tissues
An unpaired t test was used to assess the differential expression of 5 genes in 78 EAC tissues and 9 normal tissues. In comparison with normal tissues, results showed that CLDN9, GFPT1, HMMR, RARS and STMN1were upregulated in EAC tissues ( Figure 3).

Mutation status of the five glycolysis-related genes in EAC
We then evaluated alterations in the five selected genes by testing 182 EAC samples in the cBioPortal database (https://www.cbioportal.org/). The Results demonstrated that genes on inquiry were altered in 13 (7.14%) of the sequenced cases. The CLDN9 gene was changed in www.aging-us.com 25832 AGING 2.75% of cases, displaying diverse alterations; 1.65% had amplification and 0.55% had deep mutations in GFPT1; The HMMR gene contained one deep deletion sample and one truncating mutation sample; RARS and STMN1 had 0.55% amplification and 0.55% missense mutations, respectively ( Figure 4).

Determination of glycolysis-related genes related to survival of EAC patients
Univariate Cox hazard analysis was used to identify individual single genes from the three glycolysis-related gene sets that affect the survival of EAC patients, in which we obtained five statistically significant genes: CLDN9, GFPT1, HMMR, RARS and STMN1. The five genes were associated with OS of EAC patients (P < 0.05). Thus, these screened genes were entered into the multivariate regression analysis and were split into a protective role (HMMR) with hazard ratio (HR) < 1 and a risk role (CLDN9, GFPT1, RARS and STMN1) with HR > 1 ( Table 1). The joint risk score of the five genes was calculated by substituting the coefficient into the formula to evaluate the prognosis as follows: risk score = (0.1340 × expression of CLDN9) + (0.0347 ×  expression of GFPT1) + (-0.1031 × expression of HMMR) + (0.0969 ×expression of RARS) + (0.0590 × expression of STMN1). We split patients with EAC in the TCGA cohort into low-and high-risk groups according to the median risk score, based on the fivegene signature. The distribution of survival status and risk score for each patient are displayed in Figure 5A, indicating that patients in the low-risk group had a better survival rate than those in the high-risk group. Additionally, the expression profiles of the five genes are exhibited with a heatmap ( Figure 5C). Similarly, ROC curves manifested that the areas under the curve (AUC) at 5-year was 0.922 ( Figure 5B), indicating good specificity and sensitivity of the fivegene signature in assessing prognostic outcome for patients with EAC.
The five-gene signature-based risk score acts as an independent prognostic factor Univariate and multivariate analyses were utilized to analyze the effect of each clinicopathological feature in comparison with the five-gene signature on survival. The prognostic value of the glycolysis-related risk score for OS in EAC patients was tested in combination with clinical features including age, gender, grade and stage.  Figure 6B).

Validation of the survival predictive ability of the five-gene signature by Kaplan-Meier curve analysis
A better prognosis in the low-risk score group was revealed by the log-rank method and Kaplan-Meier survival curves (P < 0.001) ( Figure 7H). The UICC (Union for International Cancer Control) stage and fivegene signature were significant in predicting the survival rate of EAC patients in the univariate Cox regression analysis, and the above results were confirmed by the K-M method. Patients at grade 3, III-IV stages (UICC stage) and with lymph node and distant metastasis demonstrated a poor prognosis ( Figure 7A-7G), in accordance with the survival curves.
Stratified analysis was carried out as the above results further confirmed the accuracy of our analysis. As shown in the K-M curves, the five-gene signature was a dependable prognostic indicator for EAC patients who were in the low-risk group and had a better prognosis ( Figure 8A-8C). However, in the N0 ( Figure 8D), M1 ( Figure 8E), G1-2 ( Figure 8F) and female ( Figure 8G) subgroups, the risk parameter could no longer independently act as a prognostic marker.

Construction of a nomogram model integrating the glycolysis-related gene signature
As a tool for use in clinical practice, a nomogram model incorporating the gene signature-based risk score with clinicopathological characteristics (age, gender, grade stage) was constructed to evaluate the survival probability of EAC patients for clinicians ( Figure 9A). Survival probability between the two risk groups demonstrated a significant result with P = 0.0001 ( Figure 9B). The nomogram demonstrated a worth noting value of the five-gene signature for individualized survival estimation ( Figure 9C). The performance of the nomogram was assessed with respect to calibration, discrimination, and clinical usefulness with a C-index of 0.8557.

DISCUSSION
Increasing studies have verified the significant roles of gender, age, smoking history, pathological stage, tumor size, and lymph node and distant organ metastasis in predicting patient prognosis. Accordingly, more associated mRNAs are molecularly noted to evaluate and predict the prognosis of EAC, suggesting their evaluable clinical significance in studies [29]. For instance, expression of the glycolytic enzyme PKM2 is positively associated with obesity in EAC patients [30], AGING overexpression of uncoupling protein-2 (UCP2) abrogated cigarette smoking condensate and deoxycholic acid mediated increases in lactate and ATP production in EAC; these links may provide novel strategies for EAC therapy [31]. The promising prognostic survival and treatment response based on tumor metabolism and targeting alterations in cellular energetics of EAC patients is in its full swing [32][33][34]. Studies have examined the prognostic outcomes of EAC patients with cellular glycolysis-related genes. For example, elevated expression of IGF2 mRNA binding protein 2 (IGF2BP2/IMP2) is linked to short survival and metastasis in EAC [35]. Mucin glycoprotein 1 (MUC1) expression increased during progression to EAC and followed tumor invasion [36]. However, these biomarkers were insufficient to independently predict patient prognoses. In particular, multiple factors can affect single gene expression levels. Thus, these biomarkers may be insufficient to be used as prognostic indicators independently and reliably. Therefore, a statistical model made up of genetic markers was used to improve prediction, of which various genes were combined to predict the effect of a single individual gene. A more accurate ability to evaluate the survival outcome of patients with malignancies leads to a widespread application of the model, compared with single biomarkers [37,38].
Tumors are characterized by unrestricted cell proliferation, which not only eliminates cell cycle control but also gives rise to excessive energy metabolism and eventually results in tumor cell replication and differentiation. Warburg's landmark observation that cancer cells predominantly convert large amounts of glucose to lactate even under conditions of adequate O 2 supply is still acknowledged after more than 90 years. Advanced developments in molecular biology and high-throughput molecular analyses have revealed, that the selection for high rates of aerobic glycolysis, which is a prerequisite for unlimited growth, is due to an accumulation of signaling pathways that are altered by gene mutations or changes in gene AGING expression [39]. This suggested that aerobic glycolysis operates in a sophisticated mechanism. Tumor cells proliferate at a speed outpacing the cellular energy supply, thus, redundant nutrient and oxygen consumption by cells could lead to the tumor microenvironment being hypoxic, acidic and lacking of sugar. This phenomenon is more prominent in solid tumors [40]. After nearly a century of unceasing research and exploration, the Warburg effect has been found to take place in various tumors, including breast cancer, lung cancer, gastric carcinoma and colon cancer. Cellular energy metabolic disorders are extensively acknowledged as one of the features of malignancies, although not all tumors demonstrate the Warburg effect. The glycolysis pathway also plays an important role in Barrett's esophagus developing into EAC, which is illustrated by upregulated pyruvate kinase activity [8]. Thus, a statistical model of glycolysis-related gene signatures comprised of various genes has been built to assess cancer prognosis. Research on large biological datasets has been supported by the technology of the quick development of high-throughput genetic sequencing [19]. A large amount of genomic data was obtained individually to develop new prognostic, diagnostic and immunological biomarkers [20]. Recently, a novel predictive signature was identified by analyzing the gene expression levels or mutations with microarray and RNA-sequencing data. Construction and verification were performed with a Cox proportional hazards regression model [41,42]. In this study, using bioinformatics methods, we identified genes (CLDN9, GFPT1, HMMR, RARS and STMN1) associated with cellular aerobic glycolysis and exhibited their prognostic ability in EAC. This study facilitates the preceding comprehension of EAC and provides a foundation for further EAC research. We collected glycolysis-related genes and compared data of EAC and normal samples AGING from the EAC dataset in TCGA. We then identified 3 functions displaying significant differences in GSEA. The predictive effect of the five-gene signature for patients with EAC was analyzed with univariate and multivariate Cox regression. Compared with other recognized prognostic estimating indicators, this selected risk profile should be a more efficient classification marker for patients with EAC and a more powerful and targeted prognostic method in evaluating prognostic outcome. We adopted the top-ranking function to screen genes in association with patient survival prediction, instead of wide-range exploration. Kaplan-Meier survival demonstrated that patients with high-risk parameters showed a poor prognosis. The detection and calculation of risk parameters in EAC patients indicated effective clinical value. However, we only made use of OS to predict patient outcomes, due to an inadequate metastasis number and lacking recurrence information from TCGA database, which is one limitation of the study. In addition, the risk parameter predicted EAC patient prognoses in all subgroups except for the female, G1-2, N0, and M1 subgroups, as was displayed in the stratified analysis, though the P value (0.053) of the N0 group infinitely approximated 0.05. The negative result of M1 may be restricted to specimen inadequacy (n=5). The results of the female and G1-2 subgroups indicate that the risk parameter is influenced by the gender and grade of patients with EAC, and the consequence requires deeper investigation. Furthermore, compared with the potential of a single parameter, the combination of clinical  characteristics and glycolysis risk score provided a higher potential for clinical application and a more precise prognostic value established with the nomogram.
In conclusion, we identified a glycolysis-related gene signature to predict the prognostic outcome of EAC patients. The five-gene signature was an independent prognostic marker for overall survival, with a lower risk parameter indicating better prognosis. Nomogram integrating clinical factors with this gene signature may not only play a role in predicting EAC patients' prognosis in clinical practice, but also provides an enlightenment in the underlying mechanisms of cellular glycolysis in tumorigenesis and the identification of more gene targets for EAC treatment.

Data collection
We obtained clinical information and gene expression data of patients with esophageal adenocarcinoma from The Cancer Genome Atlas (TCGA) database.
Additionally, the following clinical information was recorded: gender, age, stage, survival status and follow-up time. In total, 78 EAC and 9 normal samples were included for the subsequent study. Detailed information on the overall clinicopathologic features is summarized in Table 2. This study complies with the publication guidelines and access rules of TCGA.

Gene set enrichment analysis (GSEA)
To explore whether glycolysis-related genes exhibit statistically significant, concordant differences between EAC and normal samples, five glycolysisrelated gene sets, the Hallmark, BioCarta, KEGG, GO and reactome gene sets, were downloaded from the Molecular Signatures Database and analyzed with GSEA software 4.0.3. For selecting gene sets enriched in every phenotype, a normalized enrichment score (NES) was obtained by performing 1000 gene set permutations for each analysis. Finally, subsequent analysis was performed when the false discovery rate AGING (FDR) < 0.1, normalized P < 0.05 and |NES| > 1.6 of the gene set.

Establishment of the gene signature
To determine core genes that correlate with the prognosis of EAC patients in enriched glycolysis-related gene sets, the mRNA quantification data were matched with the survival status for subsequent analysis. OS-related core genes were identified with the univariate Cox regression analysis (P < 0.05), and subsequent analysis was performed by multivariate Cox regression. A linear joint risk score of gene expression level using regression coefficient β was established. The risk score for each sample was calculated as follows: risk score = (β1 × expression of gene 1) + (β2 × expression of gene 2) + (β3 × expression of gene 3) + (β4 × expression of gene 4) + (β5 × expression of gene 5). The samples were then divided into high-and low-risk groups based on the median risk scores for survival analysis.

Construction and evaluation of the nomogram
The survival probability of EAC patients was compared by the nomogram model, which integrated with thefivegene signature with clinicopathologic features; this was performed by R software (version 4.0.2). Calibration plots and C-index were generated as an assessment of the nomogram performance. The clinical outcome prediction is displayed on the y-axis and x-axis separately in the calibration graph, with which an ideal prediction could be indicated with a 45-degree dotted line. Bootstrapping was used as an internal validation to decrease the bias of the C-index's predictive ability.

Statistical analysis
Statistical analyses were conducted using Excel software (Microsoft Corporation, California) and R software. The prognostic significance of individual indicators was evaluated by univariate and multivariate Cox proportional hazard regression analyses. Kaplan-Meier curves and log-rank tests were utilized to assess the prognostic outcome. Comparison of the different expression levels between the two groups was performed by unpaired t test. Genetic changes in the 5 glycolysisrelated genes in EAC were obtained from the cBioPortal website (http://www.cbioportal.org/). R software (version 4.0.2) was utilized to draw the heatmaps, ROC curves, enrichment, forest and calibration plots. P < 0.05 was regarded as statistically significant.