Key genes associated with prognosis and metastasis of clear cell renal cell carcinoma

Background Clear cell renal cell carcinoma (ccRCC) is a tumor that frequently shows the hematogenous pathway and tends to be resistant to radiotherapy and chemotherapy. However, the exact mechanism of ccRCC metastasis remains unknown. Methods Differentially expressed genes (DEGs) of three gene expression profiles (GSE85258, GSE105288 and GSE22541) downloaded from the Gene Expression Omnibus (GEO) database were analyzed by GEO2R analysis, and co-expressed DEGs among the datasets were identified using a Venn drawing tool. The co-expressed DEGs were investigated using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis, and hub genes were determined based on the protein-protein interaction network established by STRING. After survival analysis performed on UALCAN website, possible key genes were selected and verified in ccRCC cell lines and ccRCC tissues (n = 44). Statistical analysis was conducted using GraphPad Prism (Version 8.1.1). Results A total of 104 co-expressed DEGs were identified in the three datasets. Pathway analysis revealed that these genes were enriched in the extracellular matrix (ECM)–receptor interaction, protein digestion and absorption and focal adhesion. Survival analysis on 17 hub genes revealed that four key genes with a significant impact on survival: procollagen C-endopeptidase enhancer (PCOLCE), prolyl 4-hydroxylase subunit beta (P4HB), collagen type VI alpha 2 (COL6A2) and collagen type VI alpha 3 (COL6A3). Patients with higher expression of these key genes had worse survival than those with lower expression. In vitro experiments revealed that the mRNA expression levels of PCOLCE, P4HB and COL6A2 were three times higher and that of COL6A3 mRNA was 16 times higher in the metastatic ccRCC cell line Caki-1 than the corresponding primary cell line Caki-2. Immunohistochemistry revealed higher expression of the proteins encoded by these four genes in metastatic ccRCC compared with tumors from the corresponding primary sites, with statistical significance. Conclusion PCOLCE, P4HB, COL6A2 and COL6A3 are upregulated in metastatic ccRCC and might be related to poor prognosis and distant metastases.


Introduction
CcRCC is a highly invasive malignancy that accounts for 85-90% of renal cell carcinoma [1]. About 17-30% of ccRCC patients suffered from distant metastasis by the time of diagnosis [2]. Since ccRCC is not sensitive to radiotherapy and chemotherapy, radical or partial nephrectomy is still the main treatment, with 20-40% of local recurrence or distant metastasis [3]. With the in-depth studies of ccRCC driving genes, medications towards driving gene such as hypoxia-inducible factor 2a (HIF2a) has been preliminarily veri ed and raised the hope of new targeted therapy [4]. However, the mechanism of ccRCC metastasis is still unclear. Patients with metastasis cannot get the proper treatment to improve their survival. Hence, it is extremely urgent to clarify the mechanism of ccRCC metastasis and obtain effective therapeutic targets.
Recently, microarray technology and bioinformatics analysis have facilitated the exploration of genetic alterations among various cancers. In a variety of tumors, bioinformatics analysis helps researchers to nd out genes speci cally related with distant metastasis. With bioinformatics analysis, researchers found sodium-potassium-chloride cotransporter 1 (NKCC1) in gastric cancer tissue was signi cantly higher than that in normal gastric tissue. NKCC1 could promote the proliferation, invasion, migration and epithelial to mesenchymal transition (EMT) of gastric cancer cells [5]. In oesophageal squamous cell carcinoma (ESCC), U Three Protein 14a (UTP14A) screened by bioinformatics analysis was proved to promote the proliferation and metastasis of ESCC cells by regulating PERK/eIF2a signaling pathway [6].
In order to screen out the key molecular events involved in metastatic ccRCC, we downloaded and analyzed three gene expression microarray datasets on ccRCC from GEO. By analyzing the GO and KEGG enrichment in the co-expressed DEGs, we intended to investigate pathways involving in ccRCC metastasis. We also would like to sort out key genes associated with ccRCC prognosis and metastasis through PPI network, survival analysis and in vitro veri cation.

Identi cation of DEGs
GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r), an interactive web tool for DEGs analysis of GEO series across experimental conditions, was used to analyze the DEGs between primary ccRCC and metastatic ccRCC within three datasets. Gene expression with |log2FoldChange| > 0.5 and P < 0.05 were considered as statistical signi cance. Next, the online Venn drawing tool (http://bioinformatics.psb.ugent.be/webtools/Venn/) was used to obtain the intersecting co-expressed DEGs among these three datasets.

Bio-Informatic analysis of DEGs, hub genes selection and analysis
The GO datasets (biological process (BP), cellular component (CC), molecular function (MF) ) [11] and KEGG pathway [12] functional enrichment analysis of DEGs were conducted by DAVID (The Database for Annotation, Visualization and Integrated Discovery, http://david.ncifcrf.gov/,version6.8) [13] and P < 0.05 was considered to be statistically signi cant. STRING (Search Tool for the Retrieval of Interacting Genes, http://string-db.org, version 11.0) was applied to predict the PPI network of DEGs [14]. An interaction with a combined score > 0.4 among DEGs was recognized as statistical signi cance. MCODE (Molecular Complex Detection) app of Cytoscape(version 3.7.1) [15] was used to cluster the PPI network of DEGs based on topology and to identify the closely connected area [16]. The criteria were as follows: degree cut-off = 2, node score cut-off = 0.2, Max depth = 100 and k-score = 2.
Degree ≥ 10 was set to select the hub genes. Survival analysis of hub genes was conducted on UALCAN (http://ualcan.path.uab.edu/index.html), with a P < 0.01 as statistically signi cant [17].
IHC slides were scanned with a Digital Pathological section Scanner (KF-PRO-005) and then analyzed with Image Pro Plus to obtain the semi-quanti cation expression of each protein. The representative positive areas were selected in 400X magni cation, which was also called the area of interesting(AOI).
The integral optical density (IOD) was measured and calculated as the pixel area. The higher mean IOD of AOIs was considered to have a stronger expression.

Statistical analysis of qRT-PCR and IHC staining
Statistical analysis of qRT-PCR was analyzed by unpaired t-test, and immunohistochemistry semiquanti cation was analyzed by paired t-test of GraphPad Prism (Version 8.1.1). P-value < 0.05 was considered as statistically signi cant.

Identi cation of DEGs in ccRCC
A total of 996 DEGs in GSE85258, 711 DEGs in GSE105288 and 20820 DEGs in GSE2254, respectively, were obtained. Using the Venn diagram, 104 intersecting co-expressed DEGs were identi ed among the three datasets (Fig. 1A).
3.2. Bio-informatic analyses of DEGs and hub genes identi cation GO and KEGG pathway enrichment analyses were applied in the 104 co-expressed DEGs with DAVID online. The GO results revealed that in biological process (BP), DEGs were mainly enriched in extracellular matrix organization, cell adhesion, collagen catabolic process, proteolysis, oxidation-reduction process, positive regulation of cell proliferation, etc. (Fig. 1B). In molecular function (MF), DEGs were predominantly enriched in integrin binding, SMAD binding, collagen binding, cell adhesion molecule binding, heparin binding, protease binding, extracellular matrix structural constituent, peptidase activator activity, and serine-type peptidase activity (Fig. 1C). As for cellular component (CC), DEGs were enriched in extracellular exosome, extracellular space, extracellular region, extracellular matrix, proteinaceous extracellular matrix, collagen trimer, endoplasmic reticulum lumen and basement membrane (Fig. 1D). KEGG pathway analysis indicated that the DEGs were enriched in focal adhesion, ECM-receptor interaction, protein digestion and absorption, PI3K-Akt signaling pathway, platelet activation, complement and coagulation cascades, etc. (Fig. 1E).

PPI network construction,hub gene selection and key genes identi cation
STRING online database demonstrated the PPI of the DEGs. Among them, the most signi cant module containing 17 hub genes was established by the MCODE app of Cytoscape ( Fig. 2A, 2B). GO and KEGG pathway enrichment analyses of 17 hub genes were showed in Table 1. From GO enrichment results, 17 hub genes were mainly enriched in collagen bril organization, cellular response to amino acid stimulus and protein heterotrimerization in BP, proteinaceous extracellular matrix, collagen trimer and extracellular space in CC, and extracellular matrix structural constituent, platelet-derived growth factor binding and peptidase activator activity in MF. KEGG pathways were mainly enriched Protein digestion and absorption, ECM-receptor interaction, focal adhesion and PI3K-Akt signaling pathway (Table 1). 4 key genes, PCOLCE, P4HB, COL6A2 and COL6A3 were sorted out with P-value < 0.01 among the 17 hub genes after survival analysis performed on the UALCAN website. Data showed that patients with higher expression of PCOLCE, P4HB, COL6A2 and COL6A3 had worse survival than those with lower expression (Fig. 2C-2F).

In vitro testi cation of key genes
qRT-PCR showed PCOLCE, P4HB, COL6A2 and COL6A3 were all upregulated in Caki-1 cell lines when compared with its primary counterpart Caki-2 cell line (P < 0.05). (Fig. 3A). Protein expression of these four genes was examined with immunohistochemistry using 22 pairs of primary ccRCC and their corresponding metastatic counterpart tissue. Mean IOD results of semi-quanti cation showed that in most paired tissues, higher expressions of all 4 proteins were observed in metastatic ccRCC tissue rather than primary sites(P < 0.05). In all, higher expressions in metastatic counterparts were observed in 19 out of 22 pairs regarding COL6A2 and PCOLCE, 18 out of 22 pairs regarding COL6A3 and P4HB expression. (Fig. 3B).

Discussion
CcRCC was believed to raise from the proximal tubular epithelium with aggressive behavior that was unparallel with its mild appearance. After surgery, the median survival time of patients with metastasis was only 13 months, and the 5-year survival rate was merely 12.3% [18]. As one of the main genetic events of ccRCC pathogenesis, the mutant VHL gene pathway has become the target of treatment towards ccRCC [4]. Hopefully, with the help of bio-informatic data management and further veri cation, potential key genes could be identi ed and act as new targets for ccRCC patients with metastasis.
In our study, bioinformatic analysis from the 104 co-expressed DEGs showed concrete pathways that might be related to ccRCC metastasis. GO analysis showed DEGs mainly distributed in extracellular space or exosome and were enriched in pathways with extracellular matrix organization, cell adhesion, integrin binding and extracellular matrix structural constituent, etc. KEGG pathway analysis revealed these DEGs enriched in focal adhesion, ECM-receptor interaction and PI3K-Akt signaling pathway. Exosomes, by carrying proteins, ceramide, mRNA or micro-RNA, can induce metastasis through participation in EMT, mesenchymal-epithelial reverting transition (MErT), and neoangiogenesis [19]. For example, in ccRCC cell lines, exosomes containing miR-19b-3p can initiate EMT and accelerate tumor metastasis by reducing PTEN expression [20]. EMT is essential in tumor metastasis through processes such as disorienting cell polarity, disconnection with the basement membrane, acquiring plasticity, invasiveness and anti-apoptosis [21]. Studies have shown that the PI3K-Akt signaling pathway might induce the EMT process directly or through cross-talk with the Wnt/b-catenin signaling pathway, TGF-b, or NF-kB and initiate distant metastasis [22]. In ccRCC, the phosphorylation of AKT, which was a key kinase of the PI3K signaling pathway was reported to be related to EMT [23]. Our results also showed DEGs enriched in the PI3K signaling pathway, suggesting that PI3K signaling pathway-mediated EMT was likely to participate in the metastasis of ccRCC. To sum up, our analysis results have indicated various processes that might be associated with ccRCC metastases.
During the effort of establishing key genes, we rst obtained 17 hub genes from the PPI network analysis. Using survival analysis, PCOLCE, P4HB, COL6A2 and COL6A3 were found related to ccRCC patient survival, which were then further tested with qRT-PCR and IHC staining. We found that these 4 genes presented higher expression in the metastatic counterpart of ccRCC both in cell lines and in the paired ccRCC tissues. Among them, P4HB and COL6A3 have been reported to be associated with ccRCC, whereas PCOLCE and COL6A2 were found to be associated with ccRCC for the rst time. By making primary exploration, we thought these 4 genes might play an important role during the metastases of ccRCC.
To our knowledge, this was the rst time that PCOLCE was found to be related to ccRCC. PCOLCE is a secreted glycoprotein that participates in the maturation of procollagen and ECM reconstruction by enhancing the activity of bone morphogenetic protein-1 (BMP-1) [24][25][26]. PCOLCE was overexpressed in osteosarcoma, which could promote lung metastasis via twist family bHLH transcription factor 1 (TWIST1). Reducing PCOLCE expression could prevent the migration, invasion and lung metastasis of osteosarcoma cells [27]. BMP-1 was reported to be up-regulated in gastric cancer associated with poor survival and distant metastasis [28]. Importantly, BMP-1 is elevated in ccRCC and promoted proliferation, migration, and invasion in ccRCC cell lines [29]. Our study showed PCOLCE, as an active enhancer of BMP-1, was higher in metastatic ccRCC, which raised the hypothesis that PCOLCE might be involved in ccRCC metastasis through regulation on BMP-1. Therefore, PCOLCE might act as a potential research target on ccRCC metastasis and more data was needed for further research.
P4HB, a beta subunit of Prolyl 4-hydroxylase, might help cancer cells survive from apoptosis induced by endoplasmic reticulum (ER) stress [30,31]. P4HB promoted proliferation, invasion, migration and angiogenesis in glioma through mitogen-activated protein kinase (MAPK) signaling pathway [32]. In gastric cancer, P4HB was proved to play an important role on regulating invasion and migration in the hypoxia-inducible factor 1a (HIF1a) network [33]. Previous studies showed that overexpression of P4HB in ccRCC was associated with poor outcome [34]. In our study, we further found that P4HB was speci cally elevated in metastatic ccRCC. Hence, we considered P4HB might play a key role in modulating ccRCC metastasis.
Also, COL6A2 was found to be related to ccRCC progression for the rst time. COL6A1-3 were three subunits of collagen VI [35,36]. Strong evidence had demonstrated COL6A1-3's role in promoting the progression of tumors [37]. COL6A2 was dysregulated and associated with poor prognosis by involving in TGFβ1 pathway in serous ovarian cancer [38,39]. COL6A3 may take part in the tumorigenesis and progression of cholangiocarcinoma through E2F1/LMCD1-AS1/miR-345-5p/COL6A3 axis and might act as a prognostic factor for pancreatic cancer [40]. Up-regulation of COL6A1 in ccRCC was associated with poor prognosis of patients and promoted tumor growth in vivo [41]. COL6A3 was reported to be overexpressed in metastatic renal cell carcinoma (RCC) and associated with poor survival [42]. To date, there is no report mentioning the relation between COL6A2 and ccRCC. As COL6A2 and COL6A3 were identi ed as hub genes in our results, an in-depth research is worth trying to clarify their mechanism on ccRCC metastasis.
With meaningful ndings in our study as to establishing 4 possible key genes related to ccRCC metastasis, there are many limitations. Firstly, we only conducted 22 paired ccRCC samples and 2 cell lines to verify the bio-informatic ndings, the size of which needs to be enlarged to obtain more reliable results. Secondly, the mechanism of four key genes in regulating ccRCC cells was still unclear. Further exploration should be conducted to clarify the function and potential mechanism of the four key genes both in vitro and in vivo.

Conclusion
In conclusion, our studies established 4 key genes (PCOLCE, P4HB, COL6A2 and COL6A3) that were related to patient survival and might be involved in ccRCC metastasis. These 4 key genes may be involved in the ECM-receptor interaction, focal adhesion and PI3K-Akt signaling pathway, etc. In vitro, experiments have shown increased expression of these 4 genes in metastatic ccRCC both in cell lines and tissues. Further investigation was worth taking out to evaluate the signi cance and mechanism of these key genes.