Identification of transforming growth factor beta induced (TGFBI) as an immune-related prognostic factor in clear cell renal cell carcinoma (ccRCC).

Clear cell renal cell carcinoma (ccRCC) is the most common subtype among kidney cancer, which has poor prognosis. The aim of this study was to screen out novel prognostic biomarkers and therapeutic targets for immunotherapy, and some novel molecule drugs for ccRCC treatment. Immune scores ranged from -1109.36 to 2920.81 and stromal scores ranged from -1530.11 to 1955.39 were firstly calculated by applying ESTIMATE algorithm. Then 17 DEGs associated with immune score and stromal score were further identified. 6 candidate hub genes were screened out by performing overall survival (OS) and disease-free survival analyses based on TCGA-KIRC data, one of which including TGFBI was further regarded as hub gene associated with prognosis by calculating the R2 (R2 = 0.011, P = 0.018) and AUC (AUC = 0.874). The prognostic value of TGFBI was validated by performing OS, CSS, and PFS analyses based on GSE29609 and E-MTAB-3267. CMap analysis suggested that 3 molecule drugs might be novel choice for ccRCC treatment. Further analysis demonstrated that CNVs of TGFBI was associated with OS of patients with ccRCC. TGFBI expression was also correlated with histologic grade, pathologic stage, and immune infiltration level, significantly. TGFBI was the most relevant gene with OS among the candidate hub genes, which might be novel DNA methylation biomarkers for ccRCC. In conclusion, our findings indicated that TGFBI was correlated with prognosis of patients with ccRCC, which might be novel prognostic biomarkers, and targets for immunotherapy in ccRCC. Three small molecule drugs were also identified, which showed strong potential for ccRCC treatment.

AGING renal cell carcinoma (ccRCC) [7,8]. Although surgical treatment was the most effective therapy for localized ccRCC, there was a lack of drugs for adjuvant treatment [9]. What was worse, there was no effective treatment method for advanced ccRCC [10]. Thus, in this study, we tried to find out some novel prognostic biomarkers, which might be novel targets for immunotherapy.
For the first time, in this study, we firstly calculated immune score and stromal score of each case from TCGA-KIRC data, by applying Estimation of Stromal and Immune cells in Malignant Tumor tissues using Expression data (ESTIMATE) algorithm (a method provided by Yoshihara et al.) [11]. Then we screened out 17 differentially expressed genes (DEGs) associated with immune score and stromal score. Based on these DEGs, 3 small molecule drugs were obtained, which showed strong potential for ccRCC treatment. Finally, transforming growth factor beta induced (TGFBI) was screened out by using four kinds of survival analyses and two independent datasets, which were significantly associated with prognosis of patients with ccRCC. 9 different kinds analyses were further performed to explore the potential value of TGFBI in various aspects.
In conclusion, our finding indicated that TGFBI had great effects for assessing prognosis of patients with ccRCC, which might be a novel prognostic biomarker and target for immunotherapy. Moreover, three molecule drugs were screened out, which might be novel choice for clinicians for ccRCC treatment.

Immune scores and stromal scores were correlated with OS
To explore the relationship between immune score (or stromal score) and survival, we performed survival analysis in this part. As Figure 2A showed, 1179.98 was set as the optimal cutoff for grouping. ccRCC patients were divided into high immune score group (n = 234) and low immune score group (n = 296). The result suggested that ccRCC patients with high immune score had worse OS compared with these with low immune score (P = 0.0024, Figure 2B). Meanwhile, an illustration of optimal cutoff identification for stromal score is shown in Figure 2D (stromal score cutoff = 794.10). ccRCC patients were divided into two groups (high stromal score group: n = 208; low stromal group: n = 322). Similarly, low stromal score of cases was associated with better OS (P = 0.0250, Figure 2C). We also explored the correlation between scores and DFS, unfortunately, there was no significant relationship between them, as Figure 2C, 2F suggested.

DEG associated with immune score and stromal score screening
Based on "limma" in R software, 337 DEGs associated with immune score were screened out, including 334 up-regulated DEGs and 3 down-regulated DEGs ( Figure  3A, 3C). Adjust P value and log2FC of each immunerelated DEG were showed in Supplementary Table 2, in detail. Furthermore, 218 DEGs (204 up-regulated and 14 down-regulated) related to stromal score were picked out, accurately. Supplementary Table 3 showed the detailed information of each stromal-related DEG. Finally, 17 DEGs overlapped in immune-related DEGs and stromal-related DEGs were identified for further analysis ( Figure 3E).

Function and pathway enrichment analysis
The results of GO analysis demonstrated that 17 DEGs were significantly enriched in 114 BPs (Supplementary  Table 4), 4 CCs ( Figure 4B), and 8 MFs ( Figure 4C). The top 10 enriched BPs were regulation of cell-cell adhesion, regulation of immune effector process, leukocyte apoptotic process, positive regulation of cellcell adhesion, negative regulation of cell adhesion, AGING regulation of leukocyte cell-cell adhesion, regulation of T cell activation, leukocyte cell-cell adhesion, negative regulation of leukocyte apoptotic process, and protein activation cascade ( Figure 4A). Moreover, DEGs were significantly correlated with four KEGG pathways including complement and coagulation cascades, cytokine-cytokine receptor interaction, staphylococcus aureus infection, and viral protein interaction with cytokine and cytokine receptor suggested by Figure 4D.

Patient living days gradually decreased with an increasing TGFBI expression
We firstly compared the mRNA expression levels of TGFBI between tumors and normal tissues, the result suggested that expressions of TGFBI in tumors were significantly higher than these in normal tissues ( Figure  8A). Moreover, Expression of TGFBI (F = 6.34, P = 3.17E-04, Figure 8B) was significantly associated with tumor stage. High expression of TGFBI always related to higher tumor stage, as Figure 8B showed.
To validate the prognostic value of TGFBI, we performed three different survival analyses (OS, CSS, PFS) for all the candidate hub genes based on GSE29609. As shown in Figure 8D, high expressions of TGFBI (P = 0.030) were associated with worse OS. Meanwhile, there was a trend that patients with high expression of TGFBI had worse CSS (P = 0.066, Figure 8E) and PFS (P = 0.062 suggested by E-MTAB-3267 ( Figure 8C), P = 0.054 suggested by GSE29609 ( Figure 8F)). Summary above, we thought higher TGFBI expression was significantly associated with a shorter survival time, which showed great prognostic value in ccRCC.

Hub gene validation
Based on ccRCC data from Oncomine database, TGFBI expression comparison between tumors and normal tissues across 7 analyses were identified. The result demonstrated that the mRNA expression of TGFBI AGING (P = 9.89E-05, Figure 9A, 9B) was lower in normal tissues compared with ccRCCs. Furthermore, we explored the translational-level expression of hub genes. As Figure 9C, 9D showed, the translational-level expressions of TGFBI were higher in ccRCCs compared with normal tissues.

mRNA expressions of TGFBI in cancer cell lines
Based on GEPIA and CCLE, we explored hub genes expressions in all types of cancer cell lines. The expressions of TGFBI were different in these types of cancers. The expression of TGFBI was higher in GBM, HNSC, KIRC, KIRP, LGG, PAAD, SARC, STAD compared with corresponding normal tissues ( Figure 10A). Moreover, we found that not only the mRNA expression ( Figure 10B) but also the CNV level ( Figure 10C) of TGFBI ranked as the first highest of all types of cancer cell lines. All the results made the hub genes reliable.

Genetical alteration of TGFBI
According to the result, TGFBI altered in 75 (14%) of 533 ccRCC patients ( Figure 11A). And the main type was amplification ( Figure 11B). Meanwhile, there was no differences of TGFBI expression between cases with CNVs and cases without CNVs ( Figure 11C). When talking about the association between CNVs of TGFBI and survival of patients with ccRCC, there was a trend that amplification of TGFBI caused better OS ( Figure 11D) and DFS ( Figure 11E) of patients with ccRCC.

Associations between TGFBI expression and clinical factors
Cases with complete clinicopathological data from TCGA database were included for this analysis. The result demonstrated that TGFBI expression was significantly associated with stromal score (P < 0.001), AGING immune score (P < 0.001), gender (P < 0.001), neoplasm histologic grade (P < 0.001), pathologic stage (P = 0.011), and person neoplasm cancer status (P < 0.001) ( Table 2).

TGFBI expression was correlated with immune infiltration level in ccRCC
Immune infiltration level played important role in survival in tumors. Therefore, we explored the AGING relationship between hub genes and immune infiltration level. We found that TGFBI expression was negatively relevant to tumor purity (cor = -0.250, P = 5.14E-08) and positively related to macrophages (cor = 0.232, P = 7.08E-07), Figure 13A). Summary above we found TGFBI expression was significantly correlated with tumor purity in ccRCC, which suggested that TGFBI played an important role in immune infiltration in ccRCC.

TGFBI might be potential DNA methylation biomarkers for ccRCC
We also explored the relationship between hub genes and methylation in this study by using MEXPRESS. The result demonstrated that the promoter region of TGFBI ( Figure 14) showed higher methylation levels in ccRCC compared with normal tissues.

Hub genes related KEGG signaling pathways
The result of GSEA (

DISCUSSION
As the most common subtype in kidney cancer, ccRCC was difficult to be cured. Patients with ccRCC always had poor prognosis [6]. Because of the lack of effective therapy target and molecule drugs for ccRCC treatment, the objective of this study was to screen out novel prognostic biomarkers and small molecule drugs in ccRCC.
Tumor microenvironment (TME) was the internal environment of tumor cells producing and survival, which described the non-cancerous cells present in the tumor [12]. These cells included but not limited to fibroblasts, immune cells and cells that comprise the blood vessels [13]. Some proteins (such as these produced by all of the cells present in the tumor which support the growth of the cancer cells) were also included in TME [14]. Among all the non-cancerous cells, immune cells and stromal cells were two major types, which had been proved to be correlated with prognosis of cancers [14]. Thus, in this study, we firstly used ESTIMATE algorithm for immune score and stromal score calculation. Then we divided cases from TCGA database into two groups (high-immune score group, and low-immune score group/high-stromal score group, and low-stromal score group). DEGs between the two groups were selected. 337 immune-related DEGs including 334 up-regulated and 3 down-regulated and 218 stromal-related DEGs including 204 upregulated and 14 down-regulated were identified. 17 DEGs overlapped in the two kinds of DEGs were identified for further analysis. 6 genes including C1R (complement C1r), C1S (complement C1s), MMP7 (matrix metallopeptidase 7), SERPINF1 (serpin family F member 1), SLC38A5 (solute carrier family 38 member 5), and TGFBI (transforming growth factor beta induced) were picked out as candidate hub genes by performing OS and DFS analyses based on TCGA-KIRC data. Then we picked out one hub gene (TGFBI) by calculating the R 2 and AUC for each candidate hub gene, which were validated in mRNA level and translational-level. The prognostic value of TGFBI was AGING validated by applying OS, CSS, and PFS analyses based on GSE29609 and E-MTAB-3267. In addition, high expression of TGFBI was correlated to higher tumor stage. The result of TGFBI genetical alteration demonstrated that hub gene altered in 75 of 533 ccRCC patients, and the main type was amplification. Further analysis suggested that TGFBI expression was associated with gender, neoplasm histologic grade, and pathologic stage. Cox regression analysis also indicated that TGFBI was influence feature of OS of patients with ccRCC.
Considering about that immune infiltration level were significantly correlated with survival in tumors, we explored the relationship between hub gene expression level and immune infiltration level in ccRCC. The result demonstrated that TGFBI expression was correlated with tumor purity in ccRCC, which indicated that TGFBI played an important role in immune infiltration in ccRCC. We also explored the correlation between TGFBI and methylation around in the promoter region. The methylation levels of the promoter region of TGFBI were higher in ccRCC compared with normal tissues.
According to the function analyses, 17 DEGs were significantly enriched in 114 BPs, 4 CCs (blood microparticle, external side of plasma membrane, side of membrane, and extracellular matrix) and 8 MFs (main type: serine hydrolase activity). The top 10 BPs were regulation of cell-cell adhesion, regulation of immune effector process, leukocyte apoptotic process, positive regulation of cell-cell adhesion, negative regulation of cell adhesion, regulation of leukocyte cellcell adhesion, regulation of T cell activation, leukocyte cell-cell adhesion, negative regulation of leukocyte apoptotic process, and protein activation cascade. Summary above we found that these DEGs showed strong correlation with immune response and tumor immune microenvironment.

AGING
To provide clinicians some novel choice for ccRCC treatment, CMap analysis was performed and the result indicated that 3 small molecule drugs including vincamine, clenbuterol, betazole showed strong potential for ccRCC treatment.
A literature review for hub genes was carried out in order to understand hub genes deeper and better. TGFBI encoded an RGD-containing protein that binds to type I, II and IV collagens, which played an important role in the regulation of a variety of BPs [15]. Ozawa et al.   found that TGFBI expression in esophageal squamous cell carcinoma was associated with poor prognosis and hematogenous metastasis recurrence [16]. Chen et al. demonstrated that TGFBI was an important factor in epithelial-mesenchymal transformation (EMT) and malignant progression of prostate cancer [17]. Combining the literature review with this study, we believed that TGFBI played an important role in many events during immune response and tumor immune microenvironment.

AGING
However, this study also had certain limitations. Firstly, CD8+ cells played key role in tumor elimination. But in this study, the result (showed in Figure 13B, panel 2) demonstrated that CD8+ T cell infiltration had no or little significant effect on cumulative survival. Perhaps because of the unreasonable grouping and the TCGA data itself. Therefore, we will use novel dataset for this analysis in the short future. Secondly, CSS and PFS analyses of hub genes not showed significant results based on GSE29609 as we expected. Perhaps because of the few numbers of cases in GSE29609, which is related to too few data sets and the data itself. Thirdly, though we designed this research well and used strict thresholds for each part in this study, we did not conduct experiments to verify the results. Therefore, we will explore the related mechanisms of TGFBI in ccRCC through in vivo and in vitro experiments in further analysis.

CONCLUSIONS
In summary, we identified 17 DEGs associated with immune score and stromal score by using TCGA-KIRC data. TGFBI was screened out as hub gene calculating the R2 and AUC for each candidate hub gene, which was validated in mRNA level and translational-level. The prognostic value of TGFBI was validated by applying OS, CSS, and PFS analyses based on GSE29609 and E-MTAB-3267. Further analysis indicated that TGFBI might be a novel prognostic biomarker and therapy target for immunotherapy. In addition, three molecule drugs including vincamine, clenbuterol, and betazole were identified, which showed strong potential to treat ccRCC.

Data collection and data preprocessing
Microarray data of ccRCC (TCGA-KIRC data) was downloaded from The Cancer Genome Atlas (TCGA) database (https://genomecancer.ucsc.edu/). After excluding unqualified samples, totally 530 ccRCC samples from ccRCC patients were used in this study, which contained complete clinical information. Moreover, GSE29609 [18] performed on GPL1708 was retrieved from Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). This dataset including 39 ccRCCs with complete clinical information. Furthermore, E-MTAB-3267 including 53 ccRCCs with complete survival information was retrieved from ArrayExpress database (https://www.ebi.ac.uk/arrayexpress/). Both the two datasets were used for validating our findings. Figure 15 showed the flow diagram of this study. For the TCGA-KIRC data displayed as counts number, normalized and log2 transformation were performed based on R package "DEseq.2" [19]. For GSE29069, the raw expression data was normalization and transformation by using R package "affy" [20]. For E-MTAB-3267, we directly downloaded the normalized expression matrix for ArrayExpress database.

Immune score and stromal score calculation and their correlation with clinical phenotype
Based on TCGA-KIRC data, we firstly calculated immune score and stromal score of each sample by using ESTIMATE algorithm, which was applied by "estimate" [11] in R software. Then we compared the score difference between different gender (female vs male), neoplasm histologic grade (Gx, G1, G2, G3, G4), laterality (left vs right), pathologic stage (I, II, III, IV), and person neoplasm cancer status (tumor free vs with tumor), in order to explore the relationship between immune score or stromal score and important clinical phenotype. Moreover, survival analysis was performed to explore the correlation between immune score (or stromal score) and survival by using R package "survival" [21].

Differentially expressed gene (DEG) screening
To pick out biomarkers associated with immune score and stromal score, we identified DEGs in this part. We firstly used R package "maxstat" for identifying the best score cutoff for grouping samples most significantly by applying a method called maximally selected rank statistics. To screen out DEGs related to immune score, we firstly divided 530 ccRCC samples into high-immune group and low-immune group based on the optimal immune score cutoff calculated by "maxstat". Then "limma" [22] package in R was used for DEG screening. As for DEGs correlated to stromal score, we also divided 530 samples into two groups (high-stromal, and lowstromal) based on the optimal stromal score cutoff and screened out DEGs based on "limma" mentioned above. In this study, genes with Adjust P value < 0.05 and |log2 fold change (FC) | ≥ 1.0 were regarded as DEGs. The AGING DEGs which exhibited the same expression trends in immune related DEGs and stromal related DEGs were picked out for further analysis.

Function enrichment analysis
To explore potential functions of DEGs, Gene Ontology (GO) [23] enrichment analysis and Kyoto encyclopedia of Genes and Genomes (KEGG) [24] pathway analysis were performed through "clusterProfiler" [25] in R software. Gene sets were regarded as significantly enriched gene sets when P < 0.05.

Candidate molecule drug identification
Connectivity map (CMap) [26] could help researchers to quickly identify molecule drugs with high correlation with diseases (https://portals.broadinstitute.org/cmap/). Thus, based on the DEGs we screened out before, we performed Connectivity map (CMap) analysis to explore molecule drugs (which had high correlation with ccRCC). In this study, drugs were thought to be highly correlated to ccRCC when number of instances (n) > 5 and P value < 0.05. Moreover, small molecule drugs with |mean| ≥ 0.40 were thought to have great potential for ccRCC treatment in this study.

Candidate hub gene identification
In this study, we aimed to screen out some stromalimmune related prognostic biomarkers in ccRCC. Therefore, we performed two kinds of survival analyses (overall survival (OS) and disease-free survival (DFS)) for each DEG to screen out candidate hub genes based on Gene Expression Profiling Interactive Analysis (GEPIA) (http://gepia.cancer-pku.cn/) [27]. GEPIA collected 516 ccRCC samples with complete survival information, we divided 516 ccRCCs into highexpression group (n = 258) and low-expression group (n = 258) by evaluating the median value of each DEG. DEGs with P value < 0.05 in both the two survival analyses were thought as candidate hub genes.

Hub gene identification
After screening out candidate hub genes, by using R package "basicTrendline" [28], we firstly calculated the R 2 to see if candidate hub genes were associated with OS. P value < 0.05 was significant in this analysis. Furthermore, in order to evaluate the potential of distinguishing normal tissues and ccRCCs, we plotted receiver operating characteristic (ROC) curve and calculated AUC (area under curve) for each candidate hub gene (based on R package "pROC" [29]). In this study, we thought genes with P value < 0.05 in both the two analyses and AUC ≥ 0.80 were hub genes.

Hub gene expressions comparison in tumors and normal tissues and in different pathologic stages
In this part, we firstly explore the expression levels of hub genes in tumors and normal tissues by using GEPIA. Unpaired t test was performed to measure the statistical significance. Based on TCGA-KIRC data with complete stage information, tumor stage (I, II, III and IV) boxplots were also performed. We used Oneway Analysis of Variance (ANOVA) test to measure the statistical significance.

Prognostic value of hub genes validation
Based on E-MTAB-3267 and GSE29609, three different survival analyses including OS, cancer specific survival (CSS), and progression free survival (PFS) were performed to validate the prognostic value of hub genes. P value < 0.05 was thought significantly.

Oncomine analysis and translational-level expression validation
After screening out the hub genes, we assessed the mRNA expression levels of hub genes in ccRCC and normal tissue based on Oncomine database (https://www.oncomine.org/) [30]. Student t test was used to measure the statistical significance. Moreover, we validated the translation-level expression levels of hub genes by using The Human Protein Atlas database (https://www.proteinatlas.org/) [31].

Cancer cell line encyclopedia (CCLE) analysis
To understand the hub genes better, we explored the mRNA expression levels and copy number variation (CNV) levels of hub genes in 40 types of tumors based on CCLE database (https://portals.broadinstitute.org/ccle/).

Hub gene genetical alteration
To explore mutations and CNVs of hub genes, based on CBio Cancer Genomics Portal (http://www.cbioportal.org/) [32,33], we first explored the genetic alterations of hub genes. Combined with relative mRNA expression levels of hub genes, the correlation between CNVs and mRNA expression levels of hub genes was also explored. Furthermore, we identified the relationship between CNVs and survival (OS, and DFS) of ccRCC patients by performing survival analysis.

Exploring relationship between hub genes and clinical features of patients with ccRCC
Based on TCGA-KIRC data, we divided 530 ccRCCs into high-expression group (n = 265) and low-AGING expression group (n = 265) by evaluating the median value of each hub gene expression. Then χ2 test or ANOVA was performed to analyze the correlations between hub gene expressions and clinical features (age, gender, laterality, neoplasm histologic grade, pathologic stage, and person neoplasm cancer status).

Cox proportional hazards regression analysis
In this part, expression levels of hub genes and some important features (age, gender, laterality, neoplasm histologic grade, pathologic stage, and person neoplasm cancer status) were included for univariable Cox analysis of overall survival (OS) by using TCGA-KIRC data. Factors were included for multivariate Cox analysis when P value < 0.05. To do this, we could make sure if these hub genes were irrelevant to other clinical features for predicting OS of ccRCC.

Exploring correlation between hub genes and immune microenvironment
By using TIMER (https://cistrome.shinyapps.io/timer/) [34], we explored the correlation between hub genes expressions and the abundance of immune infiltrates. Six kinds of tumor-infiltrating immune cells (B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils, and dendritic cells) were included for this analysis [34]. Hub genes were thought to be significant associated with infiltrating level of an immunocyte when |correlation coefficient (cor) | ≥ 0.2 and P value < 0.05. Survival analysis was also performed between high infiltrating levels and low infiltrating levels of immunocyte.

Exploring relationship between hub gene expression and methylation around in the promoter region
Considering about that MEXPRESS [35,36] was a webtool for DNA methylation, expression, and clinical data visualization (https://mexpress.be/), we used this web tool to explore correlation between hub gene expression and methylation level. We calculated the Pearson correlation for difference (between expression value and methylation level) evaluation. For clinical factors contained two levels, P value was calculated to measure the difference. Furthermore, false discovery rate was calculated to compare the difference for clinical parameters which contained more than two levels.

Gene set enrichment analysis (GSEA)
To explore the potential functions of hub genes, GSEA [37] was conducted by using TCGA-KIRC data. 530 ccRCCs were firstly divided into two groups (high-expression, and low-expression) based on the median of hub genes expression levels. We set "c2.cp.kegg.v7.0.symbols.gmt" as the reference gene sets. KEGG signaling pathways with nominal P < 0.05, |ES| > 0.6, gene size ≥ 100 and FDR < 25% were considered to be significantly enriched in this study.

AUTHOR CONTRIBUTIONS
T. L., G. D., and X. Y. conceived and designed the study, G. D., X. Y., and Z. C. performed the analysis procedures, G. D., X. Y., Z. C., R. Z., K. T., J. B., H. W., and T. L. analyzed the results, T. L., and X. Y. contributed analysis tools, G. D. contributed to the writing of the manuscript. All authors reviewed the manuscript.

ACKNOWLEDGMENTS
We would like to acknowledge the TCGA and GEO database developed by the National Institutes of Health (NIH).

CONFLICTS OF INTEREST
No conflicts of interest existed in this study.