A 14 immune-related gene signature predicts clinical outcomes of kidney renal clear cell carcinoma

Kidney renal clear cell carcinoma (KIRC) is the leading cause of kidney cancer-related deaths. Currently, there are no studies in tumor immunology investigating the use of signatures as a predictor of overall survival in KIRC patients. Our study attempts to establish an immune-related gene risk signature to predict clinical outcomes in KIRC. A total of 528 patients from The Cancer Genome Atlas (TCGA) database were included in our analysis and randomly divided into training (n = 315) and testing sets (n = 213). We collected 1,534 immune-related genes from the Immunology Database and Analysis Portal as candidates to construct our signature. LASSO-COX was used to find gene models with the highest predictive ability. We used survival and Cox analysis to test the model’s independent prognostic ability. Univariate analysis identified 650 immune-related genes with prognostic abilities. After 1,000 iterations, we choose 14 of the most frequent and stable immune-related genes as our signature. We found that the signature was associated with M stage, T stage, and pathological staging. More importantly, the signature can independently predict clinical prognosis in KIRC patients. Gene Set Enrichment Analysis (GSEA) showed an association between our signature and critical metabolism pathways. Our research established a model based upon 14 immune-related genes that predicted the prognosis of KIRC patients based on tumor immune microenvironments.


INTRODUCTION
Kidney cancer is one of the most common urological tumors worldwide, with approximately 403,262 new cases and 175,098 deaths associated with this form of cancer in 2018 (Bray et al., 2018). Kidney renal clear cell carcinoma (KIRC) is the leading cause of kidney cancer mortality and the most common type, accounting for 85% of kidney cancers (Siegel, Miller & Jemal, 2017;Tseng, 2016). Considering the aging population of KIRC patients and the rising expense of treatment, KIRC is gradually becoming the focus of geriatric cancer (Tan, Filson & Litwin, 2015). Despite the rapid development of cancer treatments, mortality rates in KIRC remain stagnant. With the progression of next generation sequencing and data mining techniques, it is urgent that we explore prognostic biomarkers for KIRC, using molecular characteristics and tumor immune environments to guide patient therapy.
Over the past decade our understanding of immune components, including the impact of tumor microenvironments on patient survival and therapy response (Chen & Mellman, 2017;Grivennikov, Greten & Karin, 2010), has increased. Some studies have found that tumor-infiltrating immune cells are able to serve as either tumor suppressors or promoters in microenvironments. For example, CD8 + T cells have been associated with improved survival in cancer patients (Gajewski, Schreiber & Fu, 2013;Governa et al., 2017), while tumor associated macrophages and regulatory T cells demonstrate the ability to promote tumor development (Nishikawa & Sakaguchi, 2014;Noy & Pollard, 2014). Considering the complexity and significance of tumor immune microenvironments, it is imperative that we investigate immune-related biomarkers for KIRC patients. Recent studies have provided insight into the KIRC immune signature (Geissler et al., 2015;Şenbabaoğlu et al., 2016). Şenbabaoğlu et al. (2016) found mRNA signatures with the potential to be immunotherapeutic biomarkers in KIRC. However, their investigations do not include immune-related genes for analysis nor do they establish a systematic immune-related generisk signature for KIRC patients. Khadirnaikar et al. (2019) utilized immune associated lncRNA (long non-coding RNA) to construct prognostic subtypes in KIRC patients. Our immune clusters were more robust and independent. Additionally, they concentrated on lncRNA, not the overall immune-related genes. Smith et al. (2019) constructed endogenous retroviral signatures for KIRC patients, but they did not investigate the prognostic ability of the signature in different subtypes of patients. Therefore, it is essential that we explore a systematic prognostic signature based on tumor immune environments in KIRC.
In our study, we used RNA-seq data from The Cancer Genome Atlas (TCGA) to find immune-related genes with prognostic ability and to establish an immune-related risk signature for KIRC. To assess the clinical potential of the signature, we investigated the association between the signature, clinical parameters, and patient survival. Gene set enrichment analysis (GSEA) was performed to explore the molecular characteristics of the signature.

Patient cohort
The TCGA database from Xena browser was used to collect the clinical and RNA-seq data of 528 KIRC patients (Goldman et al., 2019). We randomly divided the dataset into training (n = 315) and test sets (n = 213). RNA-seq data was obtained to analyze the transcriptome profiling of RNA expression and were measured using fragments per kilobase of exon per million fragments mapped (FPKM). We performed a log2-based transformation to normalize RNA expression profiles. To ensure detection reliability, genes with more than half of their gene expression equal to zero were rejected in further investigations.

Immune-related genes
We downloaded a comprehensive list of over 1,534 immune-related genes from the Immunology Database and Analysis Portal (ImmPort; https://immport.niaid.nih.gov; (Bhattacharya et al., 2014). These genes cover a diverse range of functions in immunerelated pathways, including T cell receptor signaling pathways, B cell receptor signaling pathways, antigen processing and presentation, chemokine, interleukins, interleukin receptors, chemokine receptors, cytokines, interferons, interferon receptors, cytokines receptors, natural killer cell cytotoxicity, tumor necrosis factor (TNF) family members, TNF family member receptors, transforming growth factor-b (TGF-b) family members, and TGF-b family member receptors.

Establishment of immune-related gene signatures
We used the training set to establish an immune-related risk signature for KIRC and we performed univariate Cox analysis to screen out immune-related genes with prognostic properties (R package survival v2.42-6). After analyzing 1,534 immune-related genes, a total of 650 genes were identified with prognostic abilities. To identify the best gene model for predicting KIRC patient prognosis, genes with a P-value lower than 0.05 were evaluated using the Cox proportional hazards model with a lasso penalty (log lambda = −2.641, alpha = 1, iteration = 1,000, R package = ''glmnet''; (Friedman, Hastie & Tibshirani, 2010). We performed a 10-fold cross-validation to estimate the penalty parameter in the training dataset. The gene model with the highest frequency in 1,000 iterations was chosen as the immune-related risk signature for KIRC. Linear weighing was performed on gene expression value and the Cox coefficient to determine the risk score (Shang et al., 2017).

Assessment of prognostic ability
We used Harrell's c-index to estimate the predictive ability of the immune-related risk signature in the training, testing, and total cohort (Harrell et al. 1982). The independent prognostic ability of the immune-related risk signature was assessed using survival analysis as well as Cox analysis (R package survival, v2.42). We performed multivariable Cox regression to analyze the relationship between immune-related risk signatures and clinicopathological factors.

Gene set enrichment analysis
To identify the biological functions and pathways between high-and low-risk groups, we conducted GSEA to investigate potential biological mechanisms in the Molecular Signatures Database (MSigDB; Subramanian et al., 2005). GSEA was performed using GSEApy, a Python wrapper for gene enrichment (https://pypi.org/project/gseapy/). We selected C2 and C5, including pathway databases and GO terms, from the MSigDB. The gesa sub-command of GSEApy was used in GSEA with default parameters. Enriched gene sets with a false discovery rate (FDR) of less than 0.25 and a P-value of less than 0.05 were considered statistically significant.

Statistical analysis
Boxplots were created with the R package, ggplot2 (v3.0.0). The R package ComplexHeatmap (v1.18.1) was used to create heatmaps (Gu, Eils & Schlesner, 2016). We counted C-index with R packages ''survcomp'' (Harrell Jr et al., 1982;Schröder et al., 2011). The student's t -test was performed for statistical comparison. We chose R to conduct statistical analysis (https://www.r-project.org/). P-values lower than 0.05 were considered statistically significant. The main code of analysis was pushed to github (https://github.com/huchua/KIRC).  Figure 1 illustrates the workflow we used to develop an immune-related gene-risk signature. The immune-related gene signature was constructed within the KIRC training data set, while we applied the testing set to validate the signature. After 1,000 iterations, seven unique gene models were selected ( Fig. 2A, Table S1). The model selected as the immune-related gene-risk signature consisted of 14 genes and ranked the highest in frequency 282 times. Parameter selection in the LASSO-cox model was log lambda = −2.641 and alpha = 1. The univariate and multivariate Cox analysis of the 14 immune-related genes are shown in Tables 1 and 2. The principle component analysis of the 14 immune-related gene signature displayed a different distribution pattern between low-and high-risk groups when comparing the training and testing (Figs. 2B-2C). This indicated that the lowand high-risk groups had different immune phenotypes. In the training and testing, the c-index was 0.7862 and 0.6534, respectively (P < 0.001; Fig. 2D). In a time-dependent receiver operating characteristic curve (ROC) created for training and testing datasets, area under the curve (AUC) values at 1, 3, and 5 years were 0.679, 0.63, 0.627 and 0.65, 0.596, 0.568, respectively (Fig. S2). The 14 immune-related genes are AR, BID, BMP8A, CCL7,  Fig. 3 and Fig. S3. Because of the differences between the log-rank test and the univariate cox analysis, the results of the univariate cox analysis of other genes except GDF1 are very consistent with the results of the K-M (Kaplan-Meier) survival curves.

Correlation between the signature of immune-related genes and clinical parameters
Among the 14 immune-related genes, four genes (TXLNA, SEMA3G, AR, and BID) had a high expression, and 10 genes (IL20RB, CCR10, BMP8A, SEMA3A, CCL7, GDF1, KLRC2, LHB, FGF17, and IL4) had a low expression (Figs. 4A, 4B). The relationship between the signature and clinical factors demonstrated that patients with advanced pathological staging, M stage, and T stage had a higher risk score than those with early stage disease (Figs. 4A, 4B; 5A, 5B, 5D). However, we did not find a correlation between the signature and N stage (Figs. 4A, 4B; 5C).

Engagement of the immune-related gene risk signature in biological pathways and functions
GSEA was used to investigate the signature's biological pathways and functions. There were 177 KEGG pathways and 4,528 GO terms used in our investigation. Our analysis found that the signature was able to engage in a total of 19 enriched KEGG pathways (FDR < 0.25; Table 4). The low-risk signature was significantly correlated with 10 pathways, including the citrate cycle (TCA cycle) pathway, fatty acid metabolism pathway, propanoate metabolism pathway, butanoate metabolism pathway, peroxisome pathway, lysine degradation pathway, valine leucine and isoleucine degradation pathway, proximal tubule bicarbonate reclamation pathway, vasopressin regulated water reabsorption pathway, and the pyruvate metabolism pathway (Table 4). Similarly, eight GO annotations were enriched in the low-risk group (FDR < 0.25; Table 4).

DISCUSSION
Our study used the TCGA database and immune-related genes to establish a KIRC signature consisting of 14 immune-related genes. We found that patients in the high-risk group showed a positive association with M stage, T stage, and advanced pathological staging. Additionally, the signature exhibited strong prognostic abilities and independently predicted KIRC patient prognosis. Functional analysis highlighted the significance of our signature based on its involvement in many important pathways.
In our investigation, we used a total of 14 immune-related genes to construct our signature. In the signature, CCL7 increased the peripheral blood mononuclear cell recruitment in renal cell cancer through the inhibition of let-7d (Riihimaki et al., 2014). Wyler et al. (2014) demonstrated the ability of CCL7 to recruit monocytes through CCR2, promoting renal cell cancer metastasis to the brain. This indicates that CCL7 is a major factor in the development of KIRC in tumor immune microenvironments and may be a potential immunotherapy target. Fibroblast growth factor receptor 17 (FGF17) is another immune-related gene in our signature. FGF17 demonstrates a variety of functions in cancer development. Gauglhofer et al. (2011) found that FGF17 was involved in the paracrine and autocrine signaling of hepatocellular carcinoma and promoted the neoangiogenesis of hepatocellular carcinoma. Heer et al. (2004) showed that FGF17 was overexpressed in prostate cancer and participated in prostate carcinogenesis. Our study showed that FGF17 plays an important role in KIRC based on tumor immunology. However, the underlying mechanism of FGF17 in KIRC immune microenvironments requires further investigation. In our signature, IL-4 is another well-studied immune-related biomarker for KIRC. IL-4, released by immune cells, controls the expression of B7-H1, thus altering T cell responses in KIRC (Quandt et al., 2014). The investigation conducted by Chang et al. (2015) demonstrated that IL-4 expression can predict KIRC patient recurrence and survival outcomes. Collectively, these investigations reinforce the significance of the 14 immune-related gene-risk signature in KIRC.
Our signature is associated with the survival outcome of KIRC patients and clinical parameters, including pathological staging, M stage, and T stage. No correlation was found between the signature and N-stage, possibly due to a lack of N-stage information for many patients. According to the immune-related gene-risk signature, our study found that clinical cohorts in KIRC have different immunerelated risk factors. This signature also reflects differences in tumor immune  cell lung cancer with the C-index of 0.64 . Besides, our signature achieved a similar C-index of testing set with the immune signature in ovarian cancer (0.625) (Shen et al., 2019). More importantly, the C-index of our testing set showed a similar accuracy with the C-index of clinical staging systems in renal cancer (0.62) (Qu et al., 2018).
Our study expands on the signatures association with several important pathways, especially the metabolism pathway. This may reflect the mutual interaction between tumor metabolism and tumor immunology in KIRC. Pearce et al. (2013) showed that metabolic reprogramming can influence the fate and function of T cells in tumors. Our study further indicates the importance of metabolism pathways in KIRC immune microenvironments.
We acknowledge the limitation of our evaluation scheme, including randomly dividing the full data set into training and testing data set, which result in the inherent bias for the specific study. Another limitation of our research is the lack of independent validation data sets in our evaluation scheme. Therefore, our study should be further validated through a prospective cohort data to further illustrate the robustness of the model.

CONCLUSIONS
This investigation utilized RNA-seq data from the TCGA database to construct a 14 immune-related gene-risk signature with the ability to independently predict survival outcomes in KIRC; thus, providing novel clinical applications and possible immune targets for KIRC.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
The authors received no funding for this work.