Bioinformatics analysis of C3 and CXCR4 demonstrates their potential as prognostic biomarkers in clear cell renal cell carcinoma (ccRCC)

The molecular prognostic biomarkers of clear cell renal cell carcinoma (ccRCC) are still unknown. We aimed at researching the candidate biomarkers and potential therapeutic targets of ccRCC. Three ccRCC expression microarray datasets (include GSE14762, GSE66270 and GSE53757) were downloaded from the gene expression omnibus (GEO) database. The differentially expressed genes (DEGs) between ccRCC and normal tissues were explored. The potential functions of identified DEGs were analyzed by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG). And then the protein - protein interaction network (PPI) was established to screen the hub genes. After that, the expressions of hub genes were identified by the oncomine database. The hub genes’ prognostic values of patients with ccRCC were analyzed by GEPIA database. A total of 137 DEGs were identified by utilizing the limma package and RRA method, including 63 upregulated genes and 74 downregulated genes. It is found that 137 DEGs were mainly enriched in 82 functional terms and 24 pathways in accordance with the research results. Thirteen highest-scoring genes were screened as hub genes (include 10 upregulated genes and 3 downregulated candidate genes) by utilizing the PPI network and module analysis. Through integrating the oncoming database and GEPIA database, the author found that C3 and CXCR4 are not only overexpressed in ccRCC, but also associated with the prognosis of ccRCC. Further results could reveal that patients with high C3 expression had a poor overall survival (OS), while patients with high CTSS and TLR3 expressions had a good OS; patients with high C3 and CXCR4 expressions had a poor disease-free survival (DFS), while ccRCC patients with high TLR3 expression had a good DFS. These findings suggested that C3 and CXCR4 were the candidate biomarkers and potential therapeutic targets of ccRCC patients.


Introduction
Renal cell carcinoma (RCC) is the most common kidney malignancies, which originates in the renal tubular epithelium [1]. Among of RCC, clear cell RCC (ccRCC) is the most important histological subtype, accounting for ∼80% of RCC [2]. The vast majority of RCC are discovered by accident. Less than 5% of RCC are detected by the classic triad (gross hematuria, flank pain and abdominal mass) and are often advanced. Due to resistant to radiotherapy and chemotherapy, surgical resection is still the optimal treatment for RCC [2]. Although the emergence of immunotherapy and targeted therapy has diversified the treatment of RCC, the prognosis of patients with RCC who have lost the opportunity of surgery remains dismal [3]. Therefore, it is particularly important to understand the pathogenesis of RCC and investigate biomarkers to support the treatment and prediction of prognosis.
In recent years, bioinformatics analysis of gene expression microarrays could help identify the potential target genes of diseases and provide the molecular characteristics, regulatory pathways and cellular networks of diseases [4]. The gene expression omnibus (GEO, www.ncbi.nlm.nih.gov/geo/) database is an international public functional genomics database, which stores common array and sequence data. In the past decades, more and more scholars had indicated that tumorrelated genes were discovered by using GEO databases in their researches. For instance, Guo et al. found that 31 mostly changed hub genes were significant enriched in several pathways through integrated bioinformatical analysis, which mainly associated with cell cycle process, chemokines and G-protein coupled receptor signaling pathways in colorectal cancer [5]. Besides, Liang's research results indicated that BCL2, CCND1 and COL1A1 might be the key genes in thyroid papillary carcinoma through bioinformatics analysis [6]. What's more, bioinformatics has been widely used in the diagnosis and prognosis of renal cancer. For example, li et al. found that MMP2, DCN, COL4A1, CASR, GPR4, UTS2, and LDLR can be regarded as potential immunotherapy biomarkers for RCC [3]. And Tao constructed a immunerelated gene-based prognostic index, which can effectively predict the prognosis of patients with renal cancer and the associated immunoinfiltrating cells and provide a new method for predicting the prognosis and targeted therapy of renal cancer [7].
Based on the above researches and methods, the author analyzed the gene expression profile of ccRCC by using the GEO database, and then further analyzed the data to provide valuable hub genes for the following translational and clinical research.

Access to public resources
Three expression profiling datasets (GSE14762 [8], GSE66270 [9] and GSE53757 [10]) were downloaded from the Gene Expression Omni -bus (GEO) database of the National Center for Biotechnology Information (NCBI). The GSE14762 dataset included 11 tumor tissue samples and matched normal tissue samples. The GSE66270 dataset included 14 normal tissue samples and 14 tumor tissue samples. And the GSE53757 dataset included 72 tumor tissue samples and adjacent tissue samples. Among of them, the microarray data of GSE14762 was running at the GPL4866 Plaforms, and the microarray data of GSE66270 and GSE53757 were analyzed at the GPL570 Plaforms. Platforms and series matrix files were downloaded as TXT files. Details for GEO ccRCC data were shown in Table 1.

Detection of DEGs
The R language software (version 3.5.0; https://www.rproject.org/) and annotation package were used to handle the downloaded data files. Probe name in the downloaded data files was changed into international standard name. The package in the Bioconductor (http://www.bioconductor.org/) was used for gene distinguish expression analysis. Robust Multi-array Average (RMA) algorithm was used for the gene expression profile data preprocessing. And quantile normalization was performed to normalize the above data. P < 0.05 and [log2 Fold Change] ≥ 2 were regarded as the DEGs screening threshold. The Robust Rank Aggreg (RRA) analysis (http://cran.r-project.org/) was used to list the upregulated and down-regulated genes. DEGs of three datasets were represented by volcano map and hierarchical clustering heat map.

Gene ontology (GO) and KEGG enrichment analyses
The biological processes (BP), molecular functions (MF) and cellular components (CC) of DEGs were explored by applying two online biological tools. The online website g:Profiler (https://biit.cs.ut.ee/gprofiler/gost) was used for Go analysis. And DAVID 6.8 (https://david.ncifcrf.gov/) was used for KEGG analysis. P < 0.05 was considered as the significant threshold for GO and KEGG pathway analysis.

Expression and survival analysis of hub genes
The meta-analysis function of oncomine database (https://www.oncomine.org/) was used to better validate the expression level of hub genes. Besides, online database GEPIA (http://gepia.cancer-pku.cn/detail.php) was an interactive web server, which can analyze the expression of tumor and normal genes. The purpose of this study was to analyze the relationship between the hub genes expression and the survival analysis of [overall survival (OS) and disease free survival (DFS)].

Microarray data information
The RCC expression microarray datasets (GSE14762, GSE66270 and GSE53757) were standardized by RMA algorithm, and the results were shown in Fig. 1  the two groups of sample data included in each of the three databases were shown by volcano plot (Fig. 2). The cluster heatmaps of the top 100 DEGs from the three microarrays were shown in Fig. 3.

DEGs identification in ccRCC
The three microarray databases of RCC were analyzed and sorted by the limma package (threshold: P < 0.05 and [log2 Fold Change] ≥ 2), and then further analyzed by the RRA method. As a result, 137 DEGs were identified, including 63 overexpressed genes and 74 underexpressed genes ( Table 2). The heatmap of the top 20 overexpressed and under-expressed genes was revealed by R-heatmap software in Fig. 4.

GO and KEGG analysis of DEG
The author futher understood the function of hub genes include BP, CC and MF by using DAVID database. Significant results of the GO enrichment analysis of DEGs in ccRCC are shown in Table 3. As shown in Fig. 5a and b, GO analysis (threshold: P < 0.05 and count≥2) demonstrated that ccRCC hub genes were mainly enriched in 50 terms of BP group, such as response to hypoxia, oxidation-reduction process and proteolysis. In CC group, DEGs were enriched in 21 terms, such as extracellular exosome, plasma membrane and membrane integral component. Similarly in MF group, DEGs were enriched in 11 terms, such as identical protein binding, receptor binding and heparin binding. As shown in Fig.  5c, the result illustrated the relationship between the different functions of cytoscape software. The significantly enriched pathways were submitted to KEGG analysis to further analyze the above DEGs. As shown in Table 4 and Fig. 5d, the significant pathway enrichment of DEGs was indicated by KEGG analysis. These DEGs were enriched in 24 pathways, which mainly related to metabolic pathways, phagosome and other pathways.  . Red grid shows that the genes expression is uoverexpressed, green grid shows that the genes expression is under-expressed, black grid shows that there are no significant difference and gray grid shows that genes are too weak to be detected

PPI network and module analysis
String database was used to generate PPI networks of DEGs in RCC. Figure 6a showed the relationship between the 137 candidate hub genes. Besides, MCODE application was applied to screen out the highest-scoring nodes. And Fig. 6b displayed the module with the highest score (score = 10, node = 11, edges = 50). As a result, MCODE application selected 13 nodes with the highest score, including 10 upregulated candidate genes (C1QA, C1QB, C3, CTSS, CXCR4, FCER1G, ITGB2, TLR2, TLR3 and TYROBP) and 3 downregulated candidate genes (AQP2, KNG1, PLG).

Expression and survival analysis of hub genes
The oncomine database and GEPIA database were applied to further explore the expression and prognosis of the above screened genes. Six analyses were obtained from the oncomine database (Fig. 7). The significant (P < 0.05) expression of 10 genes were suggested by the result of meta-analysis. Figure 8 indicated the OS and DFS of 10 genes. And the result demonstrated that ccRCC patients with high C3 expression had a poor OS, while ccRCC patients with high CTSS and TLR3 expressions had a good OS. Besides, in ccRCC patients, high C3 and CXCR4 expressions indicated a poor DFS, while high TLR3 expression indicated a good DFS. Finally, C3 and CXCR4 were selected to distinguish the prognosis of ccRCC patients.

Discussion
Kidney cancer accounts for about 2 to 3% of adult malignant tumors, and 80 to 90% of adult renal malignancies. In 2012, about 338, 000 kidney cancer cases were newly discovered, accounting for 24% of all tumors; and there were 144,000 death cases, accounting for 17% of all tumors [11]. RCC was the most common kidney malignancies. The early symptoms of RCC were not obvious, and most patients are diagnosed with advanced stage or metastasis [12]. RCC was characteristic of easy recurrence and metastasis because of its complexity of the causes and pathogenesis. Moreover, it was insensitive to the traditional chemoradiotherapy. Under the influence of these reasons, RCC usually leaded to poor clinical outcomes. Hence, it could improve the diagnosis, treatment and prognosis of RCC via understanding more of the biological molecular mechanism.
The sequencing technology and bioinformatics are developing gradually, the collection and analysis of previous data will support to explor the pathogenesis of RCC and discover possible biomarkers for diagnosis andtreatment [13].
Bioinformatics method is a highly efficient research pathway, which could promote the development of related gene or group of disease by analyzing the biological data. At the present, Bioinformatics have been widely used at all areas, including medical research, the design of the discover disease-related genes, clinical diagnosis of disease, individualized treatment of diseases and new molecular targets for drug discovery [14].
One hundred thirty-seven DEGs were identified in this study, including 63 overexpressed genes and 74 underexpressed genes. It was found that these DEGs were mainly enriched in 82 terms and 24 pathways through GO and KEGG analysis. Thirteen highest-scoring genes were screened as hub gene through PPI network. Further verification based on the oncomine platform indicated that 10 hub genes (C1QA, C1QB, C3, CTSS, CXCR4, FCER1G, ITGB2, TLR2, TLR3 and TYROBP) had significantly highly expressed. Finally, through the GEPIA platform, the author found that ccRCC patients with high C3 expression had a poor OS, while ccRCC patients with high CTSS and TLR3 expressions had a better OS. Meanwhile, high C3 and CXCR4 expressions were associated with a poor DFS, while patients with high TLR3 expression had a good DFS.
As a protein coding gene, complement component 3 (C3) is involved in the occurrence and development of many diseases, including C3 deficiency, Autosomal Recessive and Hemolytic Uremic Syndrome, Atypical 5 [15]. And its related pathways are Immune response Table 2 The genes differentially expressed both in GEO database were identified in ccRCC samples Upregulated DEGs EGLN3, CA9, ANGPTL4, IGFBP3, ENO2, NDUFA4L2, SPAG4, HK2, CXCR4, APOC1, NOL3, LAPTM5, LPCAT1, PSMB9, CTSS, TYROBP,  NETO2, RRM2, TMEM45A, CAV2, LOC101928916 /// NNMT, TNFAIP6, PFKP, TLR3, LGALS1, MIR6787 /// SLC16A3, C3, COL23A1,  C1QA, CSTA, CAV1, ITGB2, SEMA5B, PLOD2, C1QB, TRIB3, MS4A6A, PDK1, BIRC3, DDB2, ENTPD1, TREM2    Lectin induced complement pathway and Signaling by GPCR. In previous reports, C3 was demonstrated as a potential prognostic marker for non-small cell lung cancer and may be a new immune marker to differentiate the prognosis of patients with non-small cell lung cancer [16,17]. Besides, Yuan et al. demonstated that overexpressed C3 could activate the JAK2/STAT3 pathway, which affected the progression of gastric cancer [18]. In addition, it had been reported that tumor cell-derived C3 could regulated TAMs through C3a-C3aR-PI3Kγ pathway to suppress the antitumor immunity [19]. CTSS (Cathepsin S) is a protein coding gene. Previous articles in papillary thyroid carcinoma reported that CTSS was highly expressed and related to transformation. These results revealed that the highly expression of CTSS was associated with poor prognosis and lymph    node metastasis [20]. Similarly, it had been reported that CTSS was over-expressed in triple-negative breast cancer, and the inhibition of CTSS could be conducted by inhibiting the growth and metastasis of triple-negative breast cancer [21]. Prof. Dheilly found that follicular lymphoma patients harbor a recurrent hotspot mutation targeting tyrosine 132 (Y132D) in cathepsin S (CTSS) that enhances protein activity. Futher study revealed that it could enhanced the anti-tumor immune responses in Non-Hodgkin Lymphoma by inhibiting CTSS [22]. In this study, the author analyzed the research data and found that CTSS was indeed highly expressed in RCC, but the high expression was associated with better prognosis. The prognosis of patients with high expression was even better, which is an opposite effect between expression and prognosis. The potential reasons for the inconsistent findings need further investigations. As a member of the Toll-like receptor (TLR) family, previous studies had reported that TLR3 was abnormally expressed in a variety of tumors, including breast, ovarian and prostate tumors. But TLR3 was associated with the clinical outcomes of various cancers [23,24]. Francesca revealed that TLR3 could induce apoptpsis in Non-Small-Cell Lung Cancer via boosting the innate immune response [25]. Besides, Fan's result demonstated that TLR3 suppressed the proliferation by downregulating the EGFR/PI3K/AKT pathway in brest cancer [26]. Similarly, TLR3 was also downregulated in hepatocellular carcinoma. And deep reseach showed that overexpression of TLR3 was associated with longer survival [27]. In this study, TLR3 was highly expressed in RCC but it was related to the better prognosis result. Chemokine receptor-4 (CXCR4) belongs to the superfamily of the seven-transmembrane domain, heterotrimeric G-protein-coupled receptors and is associated with cell proliferation, migration, invasion and survival. In the previous reports, it had been demonstrated that CXCR4 was upregulated in sporadic Vestibular schwannomas (VS) as well as in neurofibromatosis type 2 (NF2) tumors [28][29][30]. Besides, SDF-1 (CXCL12)/CXCR4 signaling has been verified to play a vital role in oncobiology, especially in hypoxia adaptation, metastasis and migration [31]. What's more, the CXCR4 antagonists (such as AMD3100, Mozobil®) were widely applied in hematopoietic stem cells, which could dramaticly increase the mobilization efficiency and yields of progenitor cells [32]. The results in this study showed that CXCR4 was over-expressed in RCC and associated with poor prognosis. However, the role of CXCR4 in RCC has been poorly studied. Therefore, the further exploration of the mechanism of CXCR4 in RCC will help people to find new therapeutic targets.

Conclusion
In summary, the author identified two ccRCC-associated candidate genes (C3 and CXCR4) with potential prognostic value via bioinformatics analysis of three expression profile datasets from the GEO database. Additionally, in this study, it have been found that CTSS and TLR3 were abnormally expressed in ccRCC and associated with ccRCC prognosis. However, their expression level is contrary to the prognosis. These novel biomarkers may have important clinical significance for the diagnosis and prognosis of RCC, but their detailed action mechanism in the development of renal carcinoma needs to be further explored. In the following   studies, the author will further verify the expression of the above genes in renal cancer through RT-QPCT. In addition, its downstream target genes and signaling pathways need to be explored and verified by cell experiments in vitro and animal experiments in vivo, which will help the author to better understand its developmental mechanism in renal cancer.