Identification of Prognostic Metabolism-Related Genes in Clear Cell Renal Cell Carcinoma

Background Clear cell renal cell carcinoma (ccRCC) is a cancer with abnormal metabolism. The purpose of this study was to investigate the effect of metabolism-related genes on the prognosis of ccRCC patients. Methods The data of ccRCC patients were downloaded from the TCGA and the GEO databases and clustered using the nonnegative matrix factorization method. The limma software package was used to analyze differences in gene expression. A random forest model was used to screen for important genes. A novel Riskscore model was established using multivariate regression. The model was evaluated based on the metabolic pathway, immune infiltration, immune checkpoint, and clinical characteristics. Results According to metabolism-related genes, kidney clear cell carcinoma (KIRC) datasets downloaded from TCGA were clustered into two groups and showed significant differences in prognosis and immune infiltration. There were 667 differentially expressed genes between the two clusters, of which 408 were screened by univariate analysis. Finally, 12 differentially expressed genes (MDK, SLC1A1, SGCB, C4orf3, MALAT1, PILRB, IGHG1, FZD1, IFITM1, MUC20, KRT80, and SALL1) were filtered out using the random forest model. The model of Riskscore was obtained by multiplying the expression levels of these 12 genes with the corresponding coefficients of the multivariate regression. We found that the Riskscore correlated with the expression of these 12 genes; the high Riskscore matched the low survival rate verified in the verification set. The analysis found that the Riskscore model was associated with most of the metabolic processes, immune infiltration of cells such as plasma cells, immune checkpoints such as PD-1, and clinical characteristics such as M stage. Conclusion We established a new Riskscore model for the prognosis of ccRCC based on metabolism. The genes in the model provided several novel targets for the study of ccRCC.


Introduction
Approximately four hundred thousand people are diagnosed with renal cell carcinoma (RCC) worldwide every year. Approximately 70% of these patients have clear cell RCC (ccRCC) [1], one of the most common malignancies of the urinary system [2]. Genetically, the continuous loss of multiple tumor suppressor genes leads to ccRCC [3]. Surgical resection is the main treatment for early-stage ccRCC, but approximately three out of ten patients have metastasis after resection [4]. ccRCC is not sensitive to radiotherapy and chemotherapy [5]; therefore, more effective targeted therapy is needed. Other studies have explored the possible molecular markers or therapeutic targets of ccRCC from different perspectives, such as autophagy associated long noncoding RNAs (lncRNAs) [6], methylation modification of m6A [7], DNA methylation [8], and immune invasion [9]. We set out to find new potential targets from a metabolic perspective.
Mutations in cancer cells can lead to metabolic reprogramming causing abnormal metabolic patterns to meet the different needs from normal cells for cancer proliferation and growth [10]. Increased aerobic glycolysis and impaired oxidative phosphorylation, known as the Warburg effect [11], occur in cancer. e research and development of many anticancer drugs are aimed at the changes in these metabolic pathways [12]. e transformation of renal epithelial cells into ccRCC leads to a decrease in the level of fatty acid oxidation and damage to the mitochondrial structure; the resulting accumulation of glycogen and lipids is the cause of ccRCC transparency [13]. e upregulation of glycolysis pathway genes is a central event in the pathogenesis of ccRCC [14]. e research and development of many anticancer drugs are aimed at the changes in these metabolic pathways [12]. Further studies on the changes in metabolism in ccRCC are needed. e random forest algorithm is a common machinelearning algorithm often used in cancer and biological research as a routine bioinformatics protocol; it was used to reduce the dimensions to screen more important genes. is method has been widely used in many different prognostic models of ccRCC such as chromatin-remodeling genes [15], DNA methylation patterns [16], and microRNA [17]. Cancer is usually accompanied by abnormal changes in metabolic patterns. Previous studies have also established prognostic models for healthy people and ccRCC patients from the perspective of metabolism [18]. Our study found some key gene differences between high-risk and low-risk ccRCC patients that were not previously noted by other researches, providing a new research target for a follow-up study.
It was recognized in as early as the 1960s that there is a relationship between immune infiltration and prognosis in different diseases [19]. e cells involved in immune infiltration are immune cells that appear in tumors and are divided into 22 types, such as T, B, NK, and plasma cells. For example, NK cells play a role in tumor immunity; receptor or coreceptor recognition of ligands on tumor cells can activate NK cells, resulting in targets with insufficient HLA I expression being killed [20]. e metabolism of NK cells is impaired in the tumor microenvironment [21]. In cancer, neutrophils may promote tumor progression, in part by producing reactive oxygen species (ROS) [22]. e rewiring of other metabolic pathways in neutrophils may affect their tumorigenesis/ metastasis promoting function [23]; such pathways are the main components of nontumor constituents in the tumor microenvironment, and different types of malignant tumors often show different features of immune cell subsets [24]. e prognostic significance of T cell tumor infiltration has been widely accepted [25]. Immune checkpoints refer to the set of inhibitory pathways possessed by immune cells to regulate and control the durability of the immune response while maintaining self-tolerance [26]. Many successful immunotherapies targeting these checkpoints are already available to treat ccRCC [27].
In this study, by clustering the cancer genome atlas (TCGA) kidney clear cell carcinoma (KIRC) datasets according to metabolic patterns and screening differentially expressed genes of two clusters, we constructed and verified a Riskscore model and analyzed the relationships between Riskscore and metabolic pathway, immune infiltration, immune checkpoint, and clinical features.

Datasets and Preprocessing.
e workflow of this study is presented in Supplementary Figure S1. e phenotype datasets and RNA sequencing datasets of TCGA Kidney Clear Cell Carcinoma (KIRC) were downloaded from UCSC Xena (https://xenabrowser.net/). en, the number of fragments per kilobase million fragments (FPKM) was converted to transcript/value per million-word node (TPM). e microarray dataset GSE29609 (n � 30) was used as an external validation set accessed from the Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/). Affymetrix was used to generate raw data of the microarray dataset and quantile normalization and background correction of these data were performed using the rapid motor adaptation algorithm in the affy package. e clinicopathologic features (gender, age, grade, stage, and status) of the TCGA KIRC dataset were sorted and are shown in Supplementary Table S1.

KIRC Metabolic Gene Clustered.
e 2752 metabolismrelated genes were obtained from a previous study [28]. A total of 2585 genes were identified in TCGA data. en, 1416 genes were screened by a univariate cox, and KIRC data were classified by the nonnegative matrix factorization clustering method to determine the metabolism-related patterns; datasets of patients were clustered for further analysis.

Immune Infiltration and Pathway Analysis.
e type and number of immune cells in KIRC samples were quantified using the cell-type identification by estimating relative subsets of known RNA transcripts algorithm [29] to compare the differences in immune infiltration in different cluster categories or risk groups. Gene set variation analysis was used to calculate the activity of the metabolic pathway using 114 metabolic pathway sets [29].

Establishment of the Metabolic Riskscore Model.
e limma package was used to identify the genes related to metabolism (P < 0.05 and |logfc| > 1.5); univariate screening was performed, and random survival forest was used for further screening [30]. e Riskscore was the sum of the gene expression value * regression coefficient (Riskscore �  e cell line and the medium above were purchased from Shanghai Zhong Qiao Xin Zhou Biotechnology Co., Ltd. A normal human kidney proximal tubular cell line (CL-0109), HK-2, and its special medium (CM-0109) were purchased from Procell Life Science & Technology Co., Ltd. ese cells were cultured at 5% CO 2 and 37°C. e medium contained 10% FBS (Gibco) and 1% Penicillin-Streptomycin Solution (C0222, Beyotime).

Quantitative Real-Time PCR (qRT-PCR).
Relative RNA expressions of SLC1A1, MALAT1, FZD1, and SALL1 in HK-2 and CAKI-1 cell lines were detected by qRT-PCR. e total RNA of cells was isolated by TRIzol® (15596026, ermo) and reverse-transcribed to cDNA by HiFiScript cDNA Synthesis Kit (CW2569, Cwbio). qPCR amplification was performed using UltraSYBR Mixture (CW2601, Cwbio) with QuantStudio ™ 1 Real-Time PCR System ( ermo), and the cycling conditions were followed by the operating instructions. e sequences of the primers used in this study are shown in Table 1. e expression of β-actin was selected as an internal reference, and the relative RNA expression of genes was calculated by the 2 −ΔΔ CT method (the expression fold of genes in HK-2 was regarded as 1, respectively) 2.7. Statistical Analyses. Before the unpaired Student's t-test was used to compare the differences between the two groups, the Shapiro-Wilk test was used to detect whether the variables were normally distributed. If they did not conform to the normal distribution, the Wilcoxon test was used to compare the differences between the groups. Correlation coefficients were calculated using the Pearson correlation analysis and distance correlation analysis. e datasets of patients were divided into high-risk and low-risk groups based on dichotomy. e data were visualized using ggplot2 (a package for R). Survival curves of subgroups were generated by the Kaplan-Meier method. e statistical significance of differences in each dataset was identified using the log-rank test. Survival curves were generated by survminer (a package for R); heat maps were generated using pheatmap. All statistical analyses above were performed in the environment of R 3.6.1. All statistical tests were two-sided and considered statistically significant when the P value was <0.05. Column charts of relative RNA expressions were drawn by GraphPad Prism 8.0.2.

Two Groups of Patients Clustered by Metabolic Patterns
Had a Different Prognosis. According to the metabolic genes, ccRCC patients were clustered into two groups (Figure 1(a)). It is suitable to divide the samples into 2 clusters rather than more clusters (Supplementary Figure S2). ere were significant differences in the results of survival analysis between the two groups ( Figure 1(b)). e distribution of immune cells in the two groups is shown in Figure 1(c). e results showed a significant difference between the two groups in NK cells activated, T cells follicular helper, B cells memory, neutrophils, dendritic cells activated, T cells CD4 memory activated, eosinophils, macrophages M1, B cells naive, and plasma cells. ere was a significant difference in certainty between the two groups.

e Riskscore Model Was Established according to the
Differentially Expressed Genes: the Higher the Score, the Worse the Prognosis. To study the prognosis of ccRCC based on metabolism, we established a Riskscore model according to the following steps. First, by analyzing the differential expression of metabolism-related genes between the above two categories, 667 candidate genes (Supplementary Table S2) were initially obtained; 408 genes (Supplementary Table S3) were left after univariate screening. Finally, 12 genes were obtained using the random forest algorithm (Figure 2 e Riskscore model for these 12 genes was established using the multivariate Cox method. e Riskscore of each sample was calculated by the sum of multiplying the gene expression in the sample with their coefficient (the weight calculated by the Cox regression model).
en, we analyzed the correlation between Riskscores and the above 12 genes and ranked the samples according to the model score to create a heat map. Figure 2 is Riskscore model could be used to predict the prognosis of ccRCC.

Riskscore and Different Metabolic Patterns between ccRCC Samples.
e results of the correlation analysis between Riskscore and the activities of 114 metabolic pathways are shown in Figure 3. e Riskscore was negatively correlated with carbohydrate metabolism pathways such as glycolysis, gluconeogenesis, pyruvate metabolism, citric acid cycle, oxidative phosphorylation, pentose and glucuronate interconversions, and pentose phosphate and was negatively correlated with amino acid metabolism pathways such as glycine, serine and threonine metabolism, alanine, aspartate and glutamate metabolism, homocysteine biosynthesis, methionine cycle, cysteine and methionine metabolism, and kynurenine metabolism. e Riskscore was also negatively correlated with lipid metabolism pathways such as fatty acid degradation, glycerolipid metabolism, glycerophospholipid metabolism, steroid hormone metabolism, and steroid hormone biosynthesis, and negatively correlated with purine metabolism, pyrimidine metabolism, and purine biosynthesis. Riskscore was also negatively correlated with remethylation, vitamin K, retinol metabolism, and other Table 1: Primers used in this study.

Primers
Sequences  M  M0  M1  N  N0  N1  T  T1  T2  T3  T4   Stage  I  II  III  IV  Grade  G1  G2  G3     pathways. But there were few pathways positively correlated with Riskscore, such as transsulfuration, thromboxane biosynthesis, linoleic acid metabolism, alpha-linoleic acid metabolism, cyclooxygenase arachidonic acid metabolism, and retinoic acid metabolism, etc. It can be seen that our Riskscore is related to most metabolic processes and is mainly negatively correlated, involving almost every category of metabolism.

Relationship between Immune Infiltration and Immune
Regulatory Factors in ccRCC Riskscore. To explore the relationship between Riskscore and immune infiltration, we divided the samples into two groups: high-Riskscore and low-Riskscore (Figure 4(a)). ere were significant Riskscore differences in immune cells, including macrophages M0, macrophages M2, mast cells activated, mast cells resting,

Relationship between Riskscore and Clinical Features of ccRCC.
To analyze the relationship between different clinical characteristics and Riskscore model, we first classified the data according to clinical characteristics. e results of the data analysis showed no significant difference in Riskscore among patients of different ages or genders, but there was a significant difference in Riskscore among samples with different M stage, N stage, T stage, grade, stage, and status ( Figure 5). Univariate and multivariate analyses of clinical features and Riskscores are shown in Table 2. Univariate analysis showed that the Riskscore was significantly correlated with all clinical features except gender, while multivariate analysis showed that the Riskscore was significantly correlated with age and M stage. e results above indicated that Riskscores based on these 12 metabolism-related genes were a feasible prognostic factor in different populations.

Discussion
Metabolic reprogramming usually occurs in tumors. Compared with other cancers, studies tend to regard ccRCC as a metabolic disease [31,32]. Metabonomics experiments have confirmed that there are significant changes in metabolic patterns in ccRCC, such as the rapid destruction of metabolic pathways of energy, amino acids, creatinine, and uric acid [33]. Many studies have evaluated the prognosis and diagnosis of ccRCC from the perspective of metabolic patterns [2] and treatment of ccRCC by reversing the abnormal metabolic pattern [34]. Our study has established a metabolic model to assess the prognosis of ccRCC and identified several genes related to the disease that were not considered previously. Among the genes we selected to construct the Riskscore model, SLC1A1 [35], FZD1 [36], and SALL1 [37] were downregulated in ccRCC, while MALAT1 [38] was  Journal of Oncology upregulated; there are specific clinical trials to verify their relationship with ccRCC. For example, MALAT1 is a lncRNA that acts as a marker in various cancers [39], and miR-182-5p can reduce the proliferation of ccRCC by binding with MALAT1 [38]. C4orf3 [40] and MDK [41] were also used as markers to evaluate ccRCC. Research on the remaining genes in ccRCC is rare, although these genes play important roles in other cancers. It seems that cancers prefer glycolysis that does not require oxygen consumption and does not involve pyruvate metabolism [42], and this phenomenon was confirmed by isotope experiments in ccRCC [43]. Oxidative phosphorylation is not high in ccRCC [4], and the TCA cycle in ccRCC is also reduced, differing from the metabolic pattern of the human brain and lung tumors [43]. e inhibition of gluconeogenesis and increased glycolysis are common in ccRCC [44]. In more than 600 cases of ccRCC, the level of fructose 1,6-bisphosphatase 1 (FBP1), a gluconeogenic enzyme, was reduced and was related to the poor prognosis of this disease [45]. In addition to the glycolysis pathway, the negative trends of our Riskscore model were consistent with those reported above for carbohydrate metabolism. Serine and threonine that often appear at protein kinase phosphorylation sites have hydroxyl groups in their structures.
ere may be cooperation between immune cells that increases the complexity, and even the immune cells may be further divided into more subtypes, so the observation may not be simply related to the number of cells of a given type. e overall survival and progressionfree survival of patients with ccRCC and Hodgkin's lymphoma with severe CD8 + T cell infiltration were significantly  shorter. However, in some ccRCC patients with a "normal" immune environment, oligoclonal CD8 T cells express perforin. A high density of CD8 + T cells is associated with a good prognosis in this subgroup [55]. e trends of our Riskscore model were consistent with those reported above. e growth and progression of cancer are associated with immunosuppression [56]. Studies have found that immune inhibitors such as LAG3 [27,52], BTLA [27], PD-1 (PDCD1) [52], and CTLA-4 [52], or immunostimulators like TNFSF13B [57] play important roles in ccRCC. Our Riskscore was related to the immune checkpoints, which provides a new way to explore the mechanism of ccRCC. e pattern of metabolism in ccRCC cells influences immune cells. For example, sulfatide, a product of ether lipid metabolism, accumulates in ccRCC, which could combine with platelets and evade cytotoxicity mediated by natural killer cells and immune surveillance [58]. In our study, we found that the effect of CD8 T cells was significantly higher in the high-Riskscore group (Figure 4(a)). Effector T cells require a high rate of glucose metabolism, while cancer cells inhibit T cells through using up nutrition and producing harmful components such as lactic acid. Although many CD8 + Tcells are involved in ccRCC, they cannot take glucose or glycolysis effectively [59]. Glutamine addiction is a characteristic of ccRCC; running out of glutamine in the tumor microenvironment leads to the secretion of IL-23 by macrophages, activating Treg responses, and thereby suppressing the anti-tumor toxicity of T cells [60].
Different types of cells are distributed in different parts of many tumors, including ccRCC. e heterogeneity of cell type may directly lead to the heterogeneity of metabolism in different parts of the same tumor; for example, the clear cells in ccRCC should respond to angiogenesis and glycolysis inhibitors, while eosinophilic components in ccRCC may benefit from mTOR or glutaminase inhibition [61]. Although the Riskscore model was based on the expression differences of metabolic genes, and the relationship between the Riskscore model and the prognosis was roughly in line with our prediction after verification, the relationship between our model and metabolic pathways was not the same as the actual changes in ccRCC metabolic patterns, indicating the metabolic complexity of different stages of ccRCC. Our study suggests that these genes might be important, but further studies are needed to clarify and validate the detailed mechanism behind their indicated significance.

Conclusion
In this study, we constructed a Riskscore model with 12 metabolism-related genes. e higher the score, the worse the prognosis. e Riskscore is closely related to metabolism, immune infiltration, and immune checkpoints, which can be used as one of the potential prognostic criteria of ccRCC.

Data Availability
All the data used in this study have been listed in the main text of the article. TCGA KIRC datasets were downloaded from UCSC Xena (https://xenabrowser.net/) as the training dataset for ccRCC, while GSE29609 was downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) as the independent validation dataset.

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this paper.