Identification and prognostic value of metabolism-related genes in gastric cancer

Gastric cancer (GC) is one of the most commonly occurring cancers, and metabolism-related genes (MRGs) are associated with its development. Transcriptome data and the relevant clinical data were downloaded from The Cancer Genome Atlas and Gene Expression Omnibus databases, and we identified 194 MRGs differentially expressed between GC and adjacent nontumor tissues. Through univariate Cox and lasso regression analyses we identified 13 potential prognostic differentially expressed MRGs (PDEMRGs). These PDEMRGs (CKMT2, ME1, GSTA2, ASAH1, GGT5, RDH12, NNMT, POLR1A, ACYP1, GLA, OPLAH, DCK, and POLD3) were used to build a Cox regression risk model to predict the prognosis of GC patients. Further univariate and multivariate Cox regression analyses showed that this model could serve as an independent prognostic parameter. Gene Set Enrichment Analysis showed significant enrichment pathways that could potentially contribute to pathogenesis. This model also revealed the probability of genetic alterations of PDEMRGs. We have thus identified a valuable metabolic model for predicting the prognosis of GC patients. The PDEMRGs in this model reflect the dysregulated metabolic microenvironment of GC and provide useful noninvasive biomarkers.


INTRODUCTION
Gastric cancer (GC) is one of the most common gastrointestinal cancers, with a high incidence in East Asian countries. The pathogenesis of GC is a multifactorial and multi-step process [1]. GC is diagnosed using endoscopy, biopsy, and pathology. Its five-year survival rate is related to the stage of the disease at diagnosis [2]. Early diagnosis has increased due to the application of advanced detection methods, but the prog-nosis of patients with advanced GC remains very poor [3,4]. The occurrence of GC reflects the abnormal regulation of tumor-related genes [1,5,6]. Understanding the molecular mechanisms underlying GC will facilitate the diagnosis and enhance the treatment of GC, and the development of biomarkers for early detection will improve the prognosis of GC patients.
Dysregulation of the metabolic environment in the body plays a key role in cancer. The Warburg effect is a form AGING of glycolysis that occurs in tumors in an aerobic environment [7]. One study found that a long noncoding RNA, lncRNA-MACC1, can enhance the Warburg effect in GC cells and up-regulate expression of glycolytic enzymes [8]. There are also changes in amino acid metabolism in patients with GC; the levels of cysteine, serine, isoleucine, tyrosine, and valine are increased [9,10]. Exploring these metabolic changes in cancer may yield new treatments. In recent years, various metabolic enzymes and their products have become important as potential drug targets [11][12][13]. Drugs developed for the treatment of metabolic disorders may also be effective in the treatment of some cancers [14]. To study the prognostic utility of metabolism-related genes (MRGs) in GC, we established a GC prognostic risk model and studied its clinical application.

Identification of PDEMRGs in GC
Four hundred and seven mRNA samples (375 GC tissues and 32 adjacent nontumor tissues) were analyzed in The Cancer Genome Atlas (TCGA). Through the Wilcoxon signed-rank test, 194 differentially expressed MRGs (DEMRGs) were obtained, including 122 upregulated genes and 72 down-regulated genes of GC tissues compared with adjacent nontumor tissues ( Figure 1A, 1B).
To determine the prognostic DEMRGs (PDEMRGs), univariate Cox regression analysis was used to screen the expression of DEMRGs in a training cohort. Sixteen DEMRGs (8 high-risk genes and 8 low-risk genes) were identified to be related to the overall survival (OS) of GC patients ( Figure 1C).

Establishment and validation of the prognostic risk model
Lasso regression was performed to remove PDEMRGs that are related to each other to prevent the model from overfitting (Figure 2A, 2B). We obtained 13 candidate PDEMRGs (risk genes) to construct the prognostic risk model (Table 1). CKMT2, ME1, GSTA2, ASAH1, GGT5, RDH12, and NNMT were identified as high-risk genes, while POLR1A, ACYP1, GLA, OPLAH, DCK, and POLD3 were identified as low-risk genes.
Patients in the training cohort were divided into a highrisk (n=167) and a low-risk group (n=167) by the median risk score. To identify the prognostic difference between them we made a Kaplan-Meier curve. The OS was poorer in the high-risk group than the low-risk group (p < 0.001) ( Figure 3A). We ranked the risk score of patients in the TCGA dataset (training cohort). The dot chart showed the survival state of patients and a heat map described the expression pattern of the high-risk and low-risk genes in the two groups ( Figure 3C). Seven high-risk genes (CKMT2, ME1, GSTA2, ASAH1, GGT5, RDH12, and NNMT) were upregulated, while six low-risk genes (POLR1A, ACYP1, GLA, OPLAH, DCK, and POLD3) were downregulated. The risk genes showed opposite expression patterns for patients with low-risk scores. The prognostic model was validated in the Gene Expression Omnibus dataset (GEO, verification cohort), and the results were consistent with the training group ( Figure 3B, 3D).
These results indicated that this risk model can accurately predict the prognosis for GC patients.

Independent prognostic value of the risk model
We performed univariate and multivariate Cox regression analysis to determine if the risk score generated by the prognostic model was independent of other clinical indices.
In the training cohort, univariate Cox regression analysis representing age, stage, tumor (T), node (N), and risk score were significantly correlated with the OS (p < 0.05) ( Figure 4A). Multivariate Cox regression analysis indicated the variables of age, gender, and risk score were independently correlated with the OS (p < 0.05) ( Figure 4B). In the verification cohort, univariate and multivariate Cox regression analysis showed the variables of age, T, N, and risk score were significantly associated with the OS (p < 0.05) ( Figure 4C, 4D). The results indicated that the risk model could serve as an independent prognostic factor independent of other clinical indices.
The risk score was more precise than other clinical indices. The AUCs (area under the curve) of the training cohort at risk score, age, gender, T, and N were 0.695, AGING 0.572, 0.536, 0.558 and 0.574, respectively ( Figure 5A). However, the risk score in the verification cohort is not the largest, which is not consistent with the result of the training cohort. This may be related to the relatively small sample size of the verification cohort ( Figure 5C). To better predict the prognosis of GC patients, we established a nomogram model that accurately predicted the OS at 1, 3, and 5 years based on the variables Volcano plot of DEMRGs: red indicates upregulated DEMRGs, green indicates downregulated DEMRG, and black indicates DEMRGs that were not significantly differentially expressed. (C) Forrest plot of PDEMRGs: The red represents high-risk genes (hazard ratios, HR > 1); the green represents low-risk genes (HR < 1). AGING associated with OS (age, gender, grade, stage, T, N, metastasis (M) and risk score) ( Figure 5B, 5D).

Gene set enrichment analyses
Gene Set Enrichment Analysis software (GSEA) was used and identified 60 significantly enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways in the training cohort or verification cohort (Nominal p-value < 0.05). The majority of enrichment pathways were associated with metabolism, including arachidonic acid metabolism, drug metabolism by cytochrome p450, xenobiotics metabolism by cytochrome p450, pyrimidine and purine metabolism, glyoxylate and dicarboxylate metabolism, alanine aspartate and glutamate metabolism, cysteine and methionine metabolism, and fructose and mannose metabolism. Physiological processes such as glycan degradation and ubiquitin-mediated proteolysis significantly inhibit the occurrence and development of cancer, and there are common signaling pathways in cancer, MAPK signaling pathway, and the p53 signaling pathway ( Figure 6).

Clinical utility of the PDEMRGs
Exploration of the PDEMRGs during GC clinical progression indicated that the levels of GGT5 and NNMT were increased with clinical stage. This correlation between the expression levels of these two genes and GC progression may be useful in GC diagnosis ( Figure 7A).
Survival analysis indicated that the expression of GGT5, NNMT, and GLA had a significant association with patient survival (P < 0.05). Higher expression (yellow line) of GGT5 and NNMT indicated poorer prognosis. Lower expression (blue line) of GLA indicated lower patient survival (P < 0.05). The correlation between PDEMRGs and GC prognosis showed that PDEMRGs contribute to the progression of GC ( Figure 7B).

External validation of the PDEMRGs using the online database
The Gene Expression Profiling Interactive Analysis database (GEPIA) corroborated the differences in gene expression between GC and normal gastric tissues. Boxplot showed most genes in the model had differences in GC mRNA expression compared with normal gastric tissues (P < 0.05) ( Figure 8A). Genes such as CKMT2, GSTA2, and RDH12 were downregulated, while POLR1A, ASAH1, GLA, DCK, and POLD3 were up-regulated. Representative protein expression was determined in the Human Protein Atlas ( Figure 8B). The immunohistochemistry of GC genes was positive compared with normal gastric tissues,

AGING
suggesting that the protein expression was increased. These results are consistent with the results of mRNA expression. However, RDH12 was not found in the database.
NNMT, ACYP1, and GLA were significantly overexpressed in GC, while CKMT2, ME1, GSTA2, and RDH12 were significantly under-expressed in the Oncomine database ( Figure 9A). There is no mRNA expression of ASAH1, GGT5, POLR1A, OPLAH, DCK, and POLD3 in GC in the Oncomine database, but these genes have been confirmed to be over-expressed in GC both in the GEPIA and The Human Protein Atlas. Mutations in the form of amplification and deletion was expressed in the high-risk group, while amplification was expressed in the low-risk group. OPLAH had the most common genetic alterations (12%), and amplification was the most frequent genetic alteration ( Figure 9B).
We evaluated the tumor mutation burden (TMB) of the GC dataset and found the mutation count in the high TMB group was higher compared with the low-risk group (p < 0.05) ( Figure 10). This result illustrates that our model can stratify patients for personalized treatment.

DISCUSSION
The development of genome sequencing and combining prognosis-related genes with traditional parameters has advantages in predicting cancer [15,16]. Metabolism-related research is a new research focus [7]. Recent studies have analyzed the association between metabolism-related genes, the risk of GC, and AGING diagnostic value [17]. Wei Zheng found that single nucleotide polymorphisms (SNPs) in trace elementrelated metabolic genes were related to GC risk [18]. Therefore, the expression of PDEMRGs may predict the progression of GC and the prognosis of GC patients. We identified the PDEMRGs based on a training cohort and utilized the PDEMRGs to build a reliable model to predict the OS in GC patients, which we verified with a verification cohort.
Univariate and multivariate Cox regression analysis showed that this model was an independent prognostic factor independent of other clinical indices. The risk score derived with the model was more accurate than other clinical parameters in predicting OS. Nomogram analysis showed that this model combined with clinical indices (age, gender, grade, stage, and TNM) could be used to accurately predict the OS of GC patients at 1, 3, and 5 years. This may help plan short-term follow-up of AGING individualized treatment, so that early intervention can improve prognosis.
GSEA analysis showed that abundant metabolic pathways were related to tumors, which confirmed the close relationship between the model and the metabolic system. Patients in the high-risk group were associated with arachidonic acid and cytochrome P450 metabolic pathways, while patients in the low-risk group were associated with pyrimidine, glyoxylic acid and dicarboxylates, aspartic acid and glutamate, alanine, cysteine, and methionine, fructose and mannose, and purine metabolic pathways. Interestingly, the low-risk group involved more diverse metabolic pathways, and they may benefit from metabolic-related treatments.
We also analyzed the clinical application of the model. The expression of several genes, including GGT5, NNMT and ME1, was increased in stage I and II GC, and there was a negative association with patient survival. This indicates that this model has high prognostic value, especially for the short-term survival of GC patients. The PDEMRGs in the prognostic risk model are associated with the occurrence and development of cancer. For instance, ME1, a wellknown oncogene, promotes GC growth, lung metastasis, and peritoneal dissemination, and overexpression of ME1 correlates with shorter and diseasefree GC survival [19]. In addition, cancer cells expressing NNMT can alter the epigenetic state and increase expression of pro-tumorigenic gene products [20].
DNA replication stress induced by oncogenes is considered a driving factor of tumorigenesis. Research has shown that POLD3 plays a unique role in the process of RS-induced DNA break repair, so targeting POLD3 could provide a priority opportunity to target cancer cells [21][22][23]. DCK negatively regulates the transcriptional activity of NRF2, resulting in a decrease in the expression of antioxidant genes, and negatively regulates intracellular redox homeostasis and ROS production. DCK has a negative regulatory effect on the proliferation and metastasis of pancreatic cancer cells. The low expression of DCK promoted NRF2-mediated antioxidant transcription, which enhanced drug resistance to gemcitabine [24]. OPLAH encodes the 5oxoproline enzyme, which controls the synthesis and degradation of glutathione. Hypermethylation of OPLAH3 is a common feature of some tumors. Naumov et al. sequenced the genomes of 22 pairs of colorectal cancer (CRC) and adjacent tissues and found that OPLAH is the initiation gene of DNA methylation in CRC [25]. Roy et al. found that GLA promotes mitochondrial death and apoptosis and reduces hypoxia. It combines limitation of de novo fatty acid synthesis and the cholinergic anti-inflammatory pathway that confirms anticancer function [26]. Cao et al. reported that ACYP1 was lower in imatinib-resistant gastrointestinal stromal tumor T1 cells [27]. Silencing POLR1A can hinder G1-S cell cycle progression in p53inactivated human cancer cell lines [28]. Guo et al. observed that in cervical squamous cell carcinoma (CSCC) tissues, RDH12 expression was reduced by 74.5%. The expression of RDH12 was negatively associated with tumor size and infiltration depth in cervical cancer [29]. Studies have confirmed that GGT5 gene amplification contributes to non-small cell lung cancer (NSCLC). Cells produce high levels of AGING glutamate and promote glutamine metabolism [30]. Immunohistochemical staining was used to detect the expression of ASAH1 in 120 cases of non-special invasive ductal carcinoma (IDC-NOS). The expression of ASAH1 correlated with lymph node metastasis, suggesting that ASAH1 is a biomarker predictive of lymph node status [31]. Low expression of glutathione S-transferases (GSTs) in the liver reduces the detoxification of chemical carcinogens. GSTA2 was found to be in linkage disequilibrium in Caucasians [32]. It was concluded that CKMT2 was a key regulatory factor in the development of osteosarcoma, and significantly correlated with patient OS [33].
Mutations in the genome can alter gene expression [34]. Research by DeBerardinis and Chandel confirmed that glycolysis is correlated with activated oncogenes and mutated tumor suppressors [11]. When Laskowski and her colleagues studied aging complement factor H-deficient mice, they observed spontaneous hepatic tumor formation in more than 50% [35]. The results illustrate the interaction between aging, genetic mutation, and cancer. We believe that aging is closely related to gastric cancer. Additionally, Hamada et al. found that tumor mutation burden (TMB) is related to the emergence of new antigens [36]. Therefore, we checked whether this model reflected the TMB of GC patients. The results showed the high TMB group was significantly higher than the low TMB group. We found the overall probability of genetic alterations was higher in the low-risk group than in the high-risk group. These findings indicate that this model can be used in patients with different metabolic abnormalities, making individualized therapy strategies possible.
In summary, we used 13 PDEMRGs to build a risk model that accurately predicts the prognosis in patients with GC. In addition, this model reflects the dysregulated metabolic microenvironment in tumor patients, and provides biomarkers for metabolic treatment of these patients. However, further in vitro and in vivo experiments are needed to validate the results of this research.

Identification of differentially expressed metabolismrelated genes
Wilcoxon signed-rank test was used to analyze the differences of 745 annotated MRGs with protein-coding functions.

Establishment of experimental model
We used univariate Cox analysis to initially identify potential PDEMRGs. Lasso penalty Cox regression analysis [37] was used for confirmation. The penalized maximum likelihood estimator with 1000-fold cross validation was used to construct the prognostic risk model. The expression values of PDEMRGs were weighted by the regression coefficient of the Cox regression model to calculate the risk score of every patient. Risk score = (Coefficient mRNA1 ×mRNA1 expression) + (Coefficient mRNA2 ×mRNA2 expression) + ⋯ + (Coefficient mRNAn × mRNAn expression). Taking the median risk score of the training cohort as the cut-off value, all patients with GC were divided into a high-risk group and a low-risk group. R packages "survival" [38] and "survminer" were performed to compare the survival differences between the high-and low-risk group, and a significant p-value was obtained. The verification cohort was used for verification.

Independence of the PDEMRGs
Univariate and multivariate Cox regression analysis were performed to analyze the independent prognosis of GC patients with forwarding stepwise procedure. P < 0.05 indicated statistical significance. The "SurvivalROC" [39] of R package was used to determine the prognostic value of the risk score. The nomogram [40] was constructed by including all independent prognostic factors to predict the survival of GC patients at 1, 3, and 5 years.

External verification of PDEMRGs
To verify the expression of PDEMRGs in this model, the mRNA level was validated by Gene Expression Profiling Interactive Analysis database (GEPIA, http://gepia.cancer-pku.cn/) and the Oncomine database (https://www.oncomine.org/resource/main.html). The AGING Human Protein Atlas database (http://www.protei natlas.org) was used for the protein level. GEPIA was also used to determine the pathological stage and survival of PMRGs in this model. The genetic alterations of PMRGs in this model were evaluated by cBioportal for Cancer Genomics (http://www.cbioportal.org/).

Statistical analysis
All statistical analyses were run through R software version 3.6.1 (https://www.r-project.org/). Differences between variables were evaluated using independent ttests. Log-rank test was used to compare the high-risk with the low-risk group in the Kaplan-Meier curve. Qualitative variables were compared by Pearson χ 2 test or Fisher's exact test. A two-sided P<0.05 was considered statistically significant.

AUTHOR CONTRIBUTIONS
Fang Wen conceived, designed and write the manuscript. Jiani Huang analyzed the data and generated the figures and tables. Xiaona Lu, Wenjie Huang and Yulan Wang performed the literature search and collected data for the manuscript. Yingfeng Bai and Shuai Ruan revised the images. SuPing Gu and Xiaoxue Chen revised the tables and checked the manuscript. Peng Shu directed the manuscript. All authors read and approved the final manuscript.

CONFLICTS OF INTEREST
The authors have declared that no conflicts of interest exist.