Identification of immune-related genes that predict prognosis and risk of bladder cancer: bioinformatics analysis of TCGA database

Background: Bladder cancer (BLCA) is the major tumor of the urinary system, and immune-related genes (IRGs) contribute significantly to its initiation and prognosis. Results: A total of 51 prognostic IRGs significantly associated with overall survival were identified. Functional enrichment analysis revealed that these genes were actively involved in tumor-related functions and pathways. Using multivariate Cox regression analysis, we detected 11 optimal IRGs (ADIPOQ, PPY, NAMPT, TAP1, AHNAK, OLR1, PDGFRA, IL34, MMP9, RAC3, and SH3BP2). We validated the prognostic value of this signature in two validation cohorts: GSE13507 (n = 165) and GSE32894 (n = 224). Furthermore, we performed a western blot and found that the expression of these IRGs matched their mRNA expression in TCGA. Moreover, correlations between risk score and immune-cell infiltration indicated that the prognostic signature reflected infiltration by several types of immune cells. Conclusion: We identified and validated an 11-IRG-based risk signature that may be a reliable tool to evaluate the prognosis of BLCA patients and help to devise individualized immunotherapies. Methods: Bioinformatics analysis was performed using TCGA and ImmPort databases. Cox regression was used to identify prognostic signatures. Two external GEO cohorts and western blotting of samples were performed to validate the IRG signature.


INTRODUCTION
Bladder cancer (BLCA), with about 80,470 new patients in the United States in 2019, is the major malignant tumor of the urinary system [1]. Approximately 25% of patients are muscle-invasive or metastatic bladder cancer when they are initially diagnosed [2,3]. Surgical resection is the main treatment for localized BLCA [4], whereas systematic chemotherapy is the preferred treatment for advanced and metastatic BLCA [5]. Despite these aggressive therapies, the five-year overall survival of bladder cancer remains less than 20% [6]. Thus, it is critical to explore alternative treatments and to determine promising prognostic indicators in BLCA.
At present, the molecular mechanism of BLCA has not yet been described, while growing evidence has revealed that immune-related genes (IRGs) and immune cell infiltration, play critical roles in the pathogenesis and progression of BLCA [7,8]. In recent years, immune checkpoint therapies have provided a promising opportunity for the treatment of advanced BLCA patients AGING [9]. Unfortunately, only approximately 20% of platinumrefractory and previously untreated patients may benefit from immunotherapy [10]. Therefore, it is imperative to screen and detect immunotherapy response and prognostic predictors to predict prognosis and risk of bladder cancer.
In this study, we used transcriptome data from TCGA to construct a prognostic signature of 11 differentially expressed IRGs. The prognostic IRG-based signature was further validated in two independent GEO datasets and in proteomics data from our samples. The underlying regulatory mechanisms of IRGs were explored using bioinformatics methods. This immunogenomic signature may be a reliable tool for individualized prediction of prognosis in BLCA patients.

Identification of differentially expressed IRGs
Compared with normal tissues, we identified 4876 differentially expressed genes (DEGs) in BLCA tissues including 3453 upregulated and 1423 downregulated genes ( Figure 1A and Supplementary Figure 1A). We extracted 120 upregulated and 140 downregulated IRGs corresponding to those identified in the ImmPort database ( Figure 1B and Supplementary Figure 1B). In addition, we performed GO functional enrichment and KEGG pathway analyses in 260 differentially expressed IRGs (DEIRGs). The top ten functional annotations were shown in Supplementary Table 1. The DEIRGs were mostly enriched in cell migration, leukocyte migration, extracellular matrix, receptor complex, receptor ligand activity and cytokine activity (Supplementary Figure  2A-2C). Furthermore, cytokine-cytokine receptor interactions were enriched in the KEGG pathways (Supplementary Figure 2D).

Identification of prognosis-associated DEIRGs
To determine possible prognosis-associated DEIRGs, univariate Cox regression analysis of DEIRGs were performed in the present study. After screening, we identified 51 DEIRGs that significantly correlated with the overall survival of BLCA patients ( Figure 2). Similar to the results for the DEIRGs, we found that these prognosis-associated DEIRGs were most enriched in cell migration, cell proliferation, extracellular matrix, receptor ligand activity and growth factor activity (Supplementary Table 2 and Figure 3A-3C). Human cytomegalovirus infection and the MAPK signaling pathway were most enriched in the KEGG analysis ( Figure 3D).

Significant modular analysis based on a PPI network
A PPI network was established utilizing the 51 prognosis-associated DEIRGs, shown in Figure 4A. In the PPI network, 27 hub genes were identified by the MCODE plugin of Cytoscape. When the k-core = 2, four significant module subgroups were obtained and genes between bladder cancer (BLCA) and non-tumor tissues. (B) Volcano plot of differentially expressed immune-related genes between bladder cancer (BLCA) and non-tumor tissues. The green dots represent downregulated genes, the red dots represent upregulated genes, and the black dots represent genes that were not significantly differentially expressed.
were showed in different colors, and the most important modules, including THBS1, PGF, SPP1, TGFB3, ELN,  OXTR, PROK1, AGTR1, TACR1, and EDNRA, were marked in green ( Figure 4B-4E). As shown in Figure  4F, functional annotation indicated the 27 hub genes were mostly related in sprouting angiogenesis, positive regulation of leukocyte chemotaxis, regulation of smooth muscle cell proliferation, MHC class I peptide loading complex, response to testosterone, and embryo implantation.

Construction of a transcription factor (TF) regulatory network
Next, we investigated the relationships between the DEIRGs in BLCA and the cancer-associated transcription factors (TFs). In total, we identified 77 differentially expressed TFs between BLCA (n = 414) and para-cancer tissues (n = 19) ( Figure 5A, Supplementary Figure 3). Subsequently, using correlation scores > 0.4 and p values <0.05 as reference values, combined with AGING 16 TFs and 51 DEIRGs, we constructed a TF regulatory network to illustrate the correlation between TFs and IRGs ( Figure 5B).

Independent prognostic value of the risk model of the cohort
We then performed univariate and multivariate Cox regression analysis to determine the efficacy of risk score derived from our prognostic risk model as an independent predictor after adjusting for other clinical parameters (Table 1). Univariate analysis revealed that age, clinical stage, tumor stage (T), lymph node (N), and risk score were significantly related to the prognosis of BLCA patients (p < 0.05). Multivariate Cox regression analysis demonstrated that the risk score was independently corelated with the overall survival in the cohort (p < 0.001) (Figure 7).

External validation of the prognostic risk model in the GSE13507 and GSE32894 cohorts
To validate the reliability of our IRG-based prognostic risk model, we utilized two external validation cohorts, GSE13507 and GSE32894. Similarly, patients in the high-risk group showed poorer survival than in the low-risk group ( Figure 8A, 8B). In addition, the results   were also shown with a KM curve and a ROC curve ( Figure 8C-8F). The cumulative results imply that the IRG-based prognostic risk characteristics based on IRGs can be used as a reliable prognostic model.

Validation of the IRGs in the proposed signature by western blot
To further validate the proposed signature at the protein level, we performed western blotting on bladder cancer tissues, adjacent tissues, SV-HUC-1, and T24 cells lines. Comparing the gene levels of RNA-seq from TCGA dataset (data not shown) with the expression levels of WB protein, our results indicated that the protein levels of ADIPOQ, NAMPT, TAP1, OLR1, AHNAK, PDGFRA, IL34, MMP9 and RAC3 were consistent with their RNA expression trends ( Figure 9).

Clinical application of the prognostic model
To assess the efficacy of our model in predicting the progression of BLCA, we evaluated the correlations  AGING between the risk signatures and clinical parameters including age, sex, pathological grade and clinical stage in the TCGA data cohort ( Table 2). As the levels of PPY, NAMPT, ADIPOQ, IL34, RAC3, TAP1, AHNAK, PDGFRA and the risk score increased, the pathological grade of BLCA patients increased (p < 0.01) ( Figure 10A-10I). With the increase of the other factors (risk score and PDGFRA and AHNAK levels), the clinical stage and the tumor stage of patients with BLCA also increased (all p < 0.01) ( Figure 10J-10O). These results demonstrate that the dysregulation of the expression of the DEIRGs is significantly associated with the development of BLCA.

Inferred immune cell fractions of the high-and lowrisk groups
To clarify whether our prognostic risk model can reflect the status of tumor immune microenvironment in BLCA patients, we evaluated the correlation between risk scores and immune cell infiltrations estimated by the CIBERSORT algorithm ( Figure 11). The correlations between the risk score of the prognostic signature and immune cell infiltration are shown in Figure 12. As the risk score increased, the numbers of neutrophils, M2 macrophages, and CD8 + T cells in BLCA tissues increased ( Figure 12).

DISCUSSION
Although cancer immunotherapy has expanded the treatment possibilities for BLCA, only a subset of patients responds to immunotherapy [11][12][13][14]. Thus, it is crucial to identify immune-related biomarkers for the progression of BLCA patients to improve the effect of immunotherapy. Recent studies have reported genomewide profiling investigating the role of multiple immune-related signatures in predicting tumor outcomes [15][16][17]. However, very few of these studies gained constructive therapeutic implications. Here, we established a robust immune-related risk signature of BLCA by integrated analysis of transcriptional profiles in TCGA. We also conducted external validation on overall survival rate and cancer-specific survival rate through two GEO datasets (GSE13507, n = 256; GSE32894, n = 308). Moreover, we performed western blotting in bladder tissues and cell lines to validate the IRGs at the protein level. As expected, the western blot demonstrated that the protein levels of the IRGs matched their RNA-seq levels derived from TCGA and GEO datasets.
Here we used univariate Cox regression analysis to assess the relationships between 260 DEIRGs and the prognosis of BLCA patients. We found that the expression of 51 DEIRGs significantly correlated with overall survival. To explore the underlying modulators related to these IRGs, a TF regulatory network comprising 77 differentially expressed TFs and DEIRGs that potentially regulate hub IRGs were constructed. Our results reveled that STAT1, TAP1, TAP2 and CXCL10 played central roles in this network, suggesting they are hub gene regulators. The transporter associated with antigen processing (TAP) is necessary for T-cell recognition [18]. Recent studies have demonstrated that TAP plays a critical role in cell differentiation, proliferation, the development and progression of cancer [19][20][21]. In addition, TAP detection has been recognized as a new independent indicator for the course of chemotherapy and clinical monitoring of several type of cancers [22,23]. STAT1 is considered as an oncogene that promotes cell adhesion, migration, and proliferation in bladder cancer [24]. Furthermore, the TF-IRG regulatory network emphasized that these hub TFs are closely associated with the prognostic signature genes, such as ADIPOQ, PDGFRA, MMP9 and RAC3. Table 2. Relationships between prognostic model associated IRGs and clinical variables of patients with BLCA. We then determined the risk-stratification value of these prognosis-related DEIRGs and identified 11 prognostic IRGs as potential prognostic indicators in clinical practice. Furthermore, our results revealed that the prognostic model was independent of clinical characteristics. Moreover, the signature illustrated the ability to discriminate the overall survival and cancer-specific survival in both the GSE13507 and GSE32894 datasets.

Lymph nodes (N1&2/N0) t(p)
Our IRG-based signature highlighted eleven IRGs, ADIPOQ, PPY, NAMPT, TAP1, AHNAK, OLR1,  AGING PDGFRA, IL34, MMP9, RAC3, and SH3BP2. Several studies showed that upregulation of MMP9 could promote the proliferation and invasion of bladder cancer [25,26]. ADIPOQ has been proposed to be a mediator of obesity-associated metabolism and to have direct effects on the development and progression of various types of malignancies [27,28]. Recent studies have identified NAMPT as a potential bladder cancer biomarker [29]. The subcellular localization of AHNAK exhibited different between BLCA tissues and normal tissues [30]. LOX-1 was upregulated in 57% of bladder cancer cells, and was associated with tumor invasion and metastasis [31,32]. Furthermore, expression of PDGFRA has been reported in BLCA specimens [33,34]. Moreover, upregulation of RAC3 in bladder cancer predicted an adverse clinical outcome and increased tumor immune response [35,36].
Gene functional annotation indicated that our IRGs were involved in cell migration, cell proliferation, cytokine interactions, and chemokine pathways. Cytokines and chemokines, the gene products of transcription factors, played important roles in BLCA progression, and metastasis [37][38][39]. In addition, a PPI network was conducted to elucidate the regulatory mechanisms influence on the IRGs at the protein level. MMP9, NAMPT, CXCL12, ADIPOQ, CXCL10, TAP1, TAP2, and STAT1 figured prominently in the PPI network. Functional annotation of the core genes derived from the PPI network also showed these hub genes were mainly enriched in sprouting angiogenesis, positive regulation of leukocyte chemotaxis, and regulation of smooth muscle cell proliferation.
Our results showed that after adjusting other clinical characteristics including age, clinical stage, T and M, the risk score could be used as an independent predictor. Multivariate analysis revealed that the risk score was independently associated with the overall survival in the cohort with a considerable hazard ratio. To determine the significance of the predictive values of the DEIRGs, we evaluated the correlations between the signatures and the clinicopathological factors age, sex, pathological grade, clinical stage, and tumor stage. We found that the levels of PPY, NAMPT, ADIPOQ, IL34, TAP1, RAC3, PDGFRA and AHNAK, combined with the risk scores, positively correlated with the progression of BLCA. Thus, combining this signature with other clinical factors may serve as a tool for predicting the prognosis of BLCA patients.
Accumulating evidence indicates that immune infiltration plays vital roles in the prognosis of BLCA patients [40,41]. Immune infiltration is an important determinant of treatment response and prognosis in BLCA patients, which is further supported by the findings of our present statistical analyses showing that the risk score positively correlated with the infiltration of tumors with CD8+ T cells, M2 macrophages, and neutrophils. The results establish the reliability of our signature to predict the prognoses of patients with BLCA. Limitations of our study include the intrinsically limited information acquired from bioinformatics analysis of transcriptome data, which may not reflect the entirety of the pathologically significant aspects of the antitumor immune response.
In addition, as a retrospective study, our results still have a bias be-cause of their heterogeneity, and so further preclinical and clinical investigations are required to identify the specific mechanisms of the effects of IRGs on BLCA.
In conclusion, we identified and validated an 11-IRGbased risk signature. The IRGs were mainly involved in tumor-related functions and pathways. Our immunegenomic signature may be a reliable tool to evaluate the prognosis of BLCA patients and help guide individualized immunotherapies. Nonetheless, further experiments are required to verify our present findings.

Clinical samples and data collection
Clinical and transcriptome RNA-seq data of 433 BLCA samples, including 414 BLCA patients and 19 matched normal samples, downloaded from the TCGA cohort. IRGs downloaded from the ImmPort portal database included 2498 IRGs [42]. ImmPort is a curated dataset to promote the reuse of immunological research data generated by intramural and extramural programs of the United States National Institutes of Health, and privately funded investigators [43].

Analysis of DEGs
DEGs were identified via the effective and convenient limma R package. We analyzed differential gene expression of all transcriptional data and IRGs using the cut-off values as follows: FDR < 0.05 and log 2 |FC| > 1. The pheatmap R software package was used to display the DEGs and DEIRGs. Subsequently, prognosisassociated DEIRGs were analyzed using univariate and multivariate Cox analysis.

Functional annotations and PPI network
GO enrichment analysis and KEGG pathway analysis were conducted with the clusterProfiler package [44]. In addition, we used STRING database to predict the PPI network of the prognosis-associated DEIRGs and to AGING evaluate the degree of interactions between proteins [39]. Cytoscape (version 3.5) was utilized to visualize the interactive network data [45]. Cytoscape plug-ins including MCODE, ClueGO, and CluePedia were used as previously described [46][47][48]. Specifically, MCODE was used to screen the most significant modules of hub genes from the PPI networks with selection criteria as follows: node score cut-off = 0.2, degree cut-off ≥ 2, and k-score = 2. The GO and KEGG pathway analyses of the selected hub genes were visualized utilizing ClueGO and CluePedia plug-ins.

Construction of a regulatory network linking TFs and IRGs
Cistrome Cancer is a comprehensive resource for predicting targets of TFs (http://cistrome.org/ CistromeCancer/). Target prediction is based on integration of correlations of expression levels among samples in each cancer included in TCGA, as well as the genomic TF binding patterns included in the ChIPseq data. The Cistrome Cancer database contains a list of 318 TFs [49]. We analyzed differentially expressed TFs and constructed a regulatory network linking TFs and IRGs.

Construction of a prognostic risk model
Gene-weighted values were calculated using the regression coefficients of the multivariate Cox regression model. The equation used in this analysis was as follows: 1 Risk score Coe (gene ) Exp (gene ) n i ii    "Coe (gene i)" represents the regression coefficient of gene i estimated from the multivariate Cox analysis, and "Exp (gene i)" is the expression of gene i. The prognostic risk model was used to calculate the risk score of each patient. Furthermore, KM curve was performed using the "survival" and "survminer" R packages.

External validation of the proposed signature in the GSE13507 and GSE32894 cohorts
We utilized the same risk score formula and cut-off value in two external validation cohorts, GSE13507 and GSE32894, to validate the IRG-based prognostic risk model. The prognostic model was presented in each dataset that contained the differentially expressed genes, a risk plot, and the distribution of risk score. Additionally, we evaluated the area under the ROC curve with the "survival ROC" package to assess the survival differences in the external datasets.

Validation of IRGs in the risk model by western blot
Bladder cancer tissues and normal adjacent tissues were obtained from six patients admitted to Shandong Provincial Hospital Affiliated to Shandong First Medical University. T24 cells were cultured in RPMI 1640, and SV-HUC-1 cells were maintained in F-12K medium. The medium was refreshed every other day. Bladder tissues and urothelial cell lines were lysed with RIPA buffer. 25 µg of protein quantified by BCA kit in the samples were subjected to 6-10% SDS-PAGE and transferred to a polyvinylidene fluoride membrane. The membrane was blocked with 5% skim milk and incubated with primary antibodies at 4° C overnight (Supplementary Table 3). After hybridization of secondary antibodies, the protein expression level was detected by the chemiluminescence method (Amersham Imager 600, GE, USA).

Inference of immune cell infiltration in BLCA tissues
Here we estimated immune infiltration between high and low risk score groups with the LM22 (22 types of immune cells) signature file via the CIBERSORT algorithm. After 1000 permutations of CIBERSORT, the distribution of 22 subtypes of immune cells in each patient was exhibited, along with the p-values, correlation coefficients, and root mean squared error (RMSE), which evaluates the accuracy of the results of each sample.

Statistical analysis
Statistical analysis was conducted using R software, and p < 0.05 indicated a significant difference. To evaluate the accuracy of the prognostic risk model, the "survivalROC" package was used to calculate the AUC. The t test was used to evaluate continuous variables, and the χ2 test was used to compare categorical variables. Multivariate analysis was performed to determine the independent prognostic significance of the immune-related risk signature. The relationships between the risk score and the clinical characteristics were evaluated using the "beeswarm" package.