Identification of potential novel differentially-expressed genes and their role in invasion and migration in renal cell carcinoma

Clear cell renal cell carcinoma (ccRCC) remains one of the most common cancer types globally, and while it has been extensively studied, the molecular basis for its pathology remains incompletely understood. Herein, we profiled three previously published datasets (GSE66272, GSE100666, and GSE105261) in a single integrated analysis aimed at identifying disease-associated patterns of gene expression that may offer mechanistic insight into the drivers of this disease. We pooled expression data from 39 normal kidney samples and 39 kidney tumors, leading us to identify 310 differentially expressed genes (DEGs) that were linked to kidney cancer in all three analyzed datasets. Of these genes, 133 and 177 were up- and down-regulated, respectively, in cancer samples. We then incorporated these DEGs into a protein-protein interaction network with the STRING and Cytoscape tools, and we were able to identify signaling pathways significantly enriched for these DEGs. The relationship between DEG expression and ccRCC patient survival was further evaluated using a Kaplan-Meier approach, leading us to identify TIMP1 as an independent prognostic factor in ccRCC patients. When TIMP1 expression was disrupted in ccRCC cell lines, this impaired their migratory and invasive capabilities. In summary, we employed an integrative bioinformatics approach to identify ccRCC-related DEGs and associated signaling pathways. Together these findings offer novel insight into the mechanistic basis for ccRCC, potentially helping to identify novel therapeutic targets for the treatment of this deadly disease.


AGING
As is the case for other solid tumors, ccRCC develops in a stepwise manner over time, with initial genetic aberrations resulting in increasingly pathological changes in cell phenotype that eventually result in oncogenic transformation [5]. Many studies to date have employed an RNA sequencing (RNA-Seq) approach to survey cancer-related changes in gene expression at the whole genome level, yielding comprehensive and complex datasets [6]. By conducting systematic and integrative analyses of the relationship between differentially expressed genes and differentially engaged signaling pathways in tumor samples, it is possible to gain novel insights into ccRCC progression and therapeutic sensitivity. As such, leveraging currently available RNA-seq datasets can serve as a powerful tool for identifying better biomarkers of ccRCC that may help to guide its diagnosis or to better plan treatment in affected patients. At present, targeted therapies for ccRCC are limited due to a relatively poor understanding of which genes function as key drivers of disease [7]. Given that RNA-seq can aid in the identification of such key genes, further comprehensive transcriptomic analyses are essential in order to fully understand this complex and deadly disease.
In the present study, we conducted an integrative analysis of three previously published Gene Expression Omnibus (GEO) datasets (GSE66272, GSE100666, and GSE105261), and we utilized the 'limma' R package designed by the Bioconductor project [8] as well as a Venn diagram program in order to identify cancerrelated differentially expressed genes (DEGs) shared among these three datasets. We then evaluated the enrichment of these DEGs in specific gene ontology (GO) annotated categories and in defined KEGG pathways using the DAVID tool. We further constructed a protein-protein interaction (PPI) network incorporating these DEGs, and we analyzed this network with the MCODE (Molecular Complex Detection) algorithm as a means of identifying significant gene modules therein. The top 20 hub genes within this network were then screened using cytoHubba, and were imported into the Kaplan Meier plotter database to evaluate their prognostic relevance. We further confirmed the ccRCC-related differential expression of key hub genes by evaluating their expression levels in ccRCC and normal kidney tissue samples in the GEPIA database. Through these approaches, we identified Tissue Inhibitor of Metalloproteinases 1 (TIMP1) as a gene potentially associated with ccRCC patient prognosis. Together, our results offer novel insights into the mechanistic basis for ccRCC development, and may suggest novel diagnostic and/or prognostic biomarkers that may be valuable therapeutic targets for the future treatment of ccRCC patients.

Identification of DEGs in renal cell carcinoma
In this study, 39 ccRCC samples and 39 paracancerous normal kidney tissue samples were analyzed. Using the limma software package, 4224, 3879 and 549 DEGs were extracted from GSE66272, GSE100666 and GSE105261, respectively. Figure 1A-1C show the DEGs in two sets of sample data from each of the three microarrays. Then use Venn diagram software to identify the DEGs common to these three datasets. A total of 310 common DEGs were detected in the ccRCC organization, including 177 genes were down-regulated (logFC <1) and 133 genes were up-regulated (logFC> 1) ( Figure 1D).

Gene ontology of DEGs and KEGG pathway analysis in renal cell carcinoma
In order to analyze the biological classification of DEGs, DAVID was used for functional and pathway enrichment analysis. GO analysis showed that BP changes in DEGs were significantly enriched in extracellular matrix organization, excretion, response to drugs, angiogenesis and response to hypoxia. The changes of CC were mainly concentrated in exosome, extracellular region, apical plasma membrane, extracellular space, and basolateral plasma membrane. The changes of MF mainly focus on protein homodimerization activity, ATPase binding, the same protein binding, extracellular matrix binding and extracellular matrix structure composition. Analysis of the KEGG pathway suggested that the DEGs were significantly enriched in carbon metabolism, biosynthesis of antibiotics, aldosterone-regulated sodium reabsorption, collecting duct acid secretion and phagosome (Table 1).

Modular analysis via DEGs protein-protein interaction (PPI) network
To determine the modular structure, STRING online database (available online: http://string-db.org) and the Cytoscape software were used the combined 310 DEGs to construct the PPI network of DEGs ( Figure 1E) with the most important module obtained by using Cytoscape ( Figure 1F). The functional analyses of genes involved in this module were analyzed using DAVID. The results of GO analysis showed that the changes of BPs of important module genes were significantly rich in collagen catabolic process, extracellular matrix organization, response to peptide hormone, cellular response to amino acid stimulus and platelet degranulation ( Table 2). The CC changes of important module genes are mostly concentrated in extracellular AGING region, collagen trimer, endoplasmic reticulum lumen, extracellular matrix, and collagen type IV trimer ( Table  2). The MF changes of important module genes are mainly concentrated on extracellular matrix structural constituent, platelet-derived growth factor binding and protein binding (Table 2). KEGG pathway analysis showed that important modular genes were mostly enriched in receptor interaction, protein digestion and absorption, PI3K-Akt signaling pathway, focal adhesion and amoebiasis (Table 2).    Figure 2A). Table 3 shows the names, abbreviations and functions of these genes. The cBioPortal online platform was used to analyze the network of hub gene and their coexpressed genes ( Figure 2B). The bioprocess analysis of central genes and the enrichment analysis of the Kyoto Encyclopedia of Genes and Genomes (KEGG) are shown in Figures 2C, 2D. Hierarchical clustering indicated that analysis of hub genes essentially allowed kidney cancer samples to be distinguished from non-cancer samples ( Figure 2E).
In order to determine whether the pivotal genes in Clear cell renal cell carcinoma had clinical relevance, correlation analysis was performed with clinical correlation to kidney cancer outcomes according to the report of the Cancer Genome Atlas (TCGA) on the renal cancer data set. Using data from GEPIA, it was noted that ccRCC patients exhibiting genomic alteration in TIMP1 demonstrated a reduction in overall and disease-free survival (P=6.8x10 -7 for overall survival and P=3.7 x10 -5 for disease-free survival) ( Figure 3A, 3B). Therefore, it was considered that the hub gene TIMP1 may play a key role in the progression of clear cell renal cell carcinoma.

Analysis of the most significant hub gene
Based on RNA sequence data from the TCGA database, TIMP1 mRNA expression was compared between kidney tumor samples and adjacent normal tissues. Transcriptional level data demonstrated that TIMP1 expression was highly in 533 ccRCC tissues compared with 72 normal tissues ( Figure 3C). Consistent with this finding, Oncomine analysis of cancerous and normal tissues showed that TIMP1 was greatly overexpressed in ccRCC samples from different data sets ( Figure 4A). As shown in Figure 4B, TIMP1 mRNA expression in ccRCC samples was significantly correlated with late clinical stage, and the highest TIMP1 mRNA expression was observed in stages 3 and 4. Similarly, the relationship between TIMP1 mRNA expression and different pathological grade was determined, which suggested that TIMP1 mRNA expression was significantly correlated with pathological grade ( Figure  4C). In general, increased expression of TIMP1 mRNA in ccRCC patients was significantly correlated with advanced clinicopathologic parameters.
GSEA was used for hallmark analysis of TIMP1. The results suggest that the most significantly involved pathways included epithelial-mesenchymal-transition, inflammatory-response and IL6/JAK/STAT3 signaling, as displayed in Figures 5A-5C. In addition, the expression profiles of the 100 most significant genes are displayed in a heat map ( Figure 5D).

TIMP1 expression correlates with interstitial phenotype in ccRCC cells
GSEA enrichment analysis indicated that TIMP1 was involved in the EMT process. In order to understand the relationship between TIMP1 and the mesenchymal phenotype in ccRCC cells, TIMP1 and EMT represent markers (such as E-cadherin and N-) in 2 different ccRCC cell lines (A498 and Caki-1) and HK2 cells (normal renal cell lines) were examined by qRT-PCR and Western blot ( Figure 6A-6D). As consistently shown by qRT-PCR and Western blot analysis, TIMP1 was abundantly expressed in A498 and Caki-1 mesenchymal cells, which exhibit higher N-cadherin and lower E-cadherin expression than HK2 cells, indicating that that TIMP1 level may be related to the mesenchymal phenotype of ccRCC cell line.

Knockdown of TIMP1 induces MET of A498 and Caki-1 cells
To study the effect of TIMP1 knockdown on EMT in ccRCC cells, lentiviruses with TIMP1 shRNA or NC were transfected into A498 and Caki-1 cells. Stably passaged A498-KD, A498-NC, Caki-1-KD and Caki-1-NC cells were thus obtained. qRT-PCR demonstrated a dramatic decrease in TIMP1 mRNA expression in A498 and Caki-1 cells following lentivirus transfection ( Figure 7A, 7B). Western blot analysis confirmed that TIMP1 protein was inhibited in both A498 and Caki-1 cells ( Figure 7C). Through TIMP1 knockdown, Western blot analysis and qRT-PCR confirmed the enhanced expression of epithelial markers (e.g. E-cadherin) and interstitial markers (e.g. N-cadherin) expression is reduced in A498-KD and Caki-1-KD cells ( Figure 7C-7G). These results suggest that the knockdown of TIMP1 in A498 and Caki-1 cells induced transition to the epithelial phenotype.

Knocking down TIMP1 inhibits migration and invasion of A498 and Caki-1 cells
The cell migration and invasion characteristics are considered to be important consequences of EMT [9,10], so the effect of TIMP1 knockdown on the migration and invasiveness of A498 and Caki-1 cells was studied using transwell migration and invasion assays. The migration rate and invasion rate of A498-KD and Caki-1-KD cells decreased compared with A498-NC and Caki-1-NC cells, respectively ( Figure  8A-8D).

DISCUSSION
ccRCC is among the most common malignancies around the world. Early diagnosis and treatment of ccRCC can greatly improve prognosis and survival of patients with ccRCC. Although the underlying pathogenesis and molecular mechanisms of ccRCC development and progression have been studied using multiple omics technologies, the global mortality rate for ccRCC is still high over past decades. A series of studies reported that ccRCC is accompanied with accumulation of abnormal agents at cellular and molecular level, such as epigenetic, transcriptomic, miRNA, proteomic and metabolomic alternations [11][12][13][14]. These multiple omics studies have attempted to identify early diagnostic biomarkers of renal cancer, and highlight potential heterogeneity and molecular commonalities between different stages of renal cancer. It is well known that ccRCC exhibits important molecular heterogeneity, involving multiple changes of genetic and protein expression levels. Therefore, a series of appropriately-selected candidate biomarkers representative of all such tumors may be essential for identification. With the development of bioinformatic analysis, a number of molecules in ccRCC have been screened as potential novel prognostic biomarkers, but few of them has been systematically validated. Neither of them has been compared with each other to confirm additional studies that are required to identify candidate biomarkers. Potential biomarkers for early detection and prediction of ccRCC progression are still largely unknown.  TIMP1 has been demonstrated to be involved in the progression of tumors involving lung adenocarcinoma, glioma, prostate cancer, breast cancer, colorectal cancer and a number of other cancers [15]. TIMP1 expression is often remarkably enhanced in the in the late stages of such tumors, in addition to in patients with endometrial, breast or brain cancer who have a shorter time prior to relapse [16][17][18]. Consistent with this, a lack of TIMP1 immunostaining is related to improved prognosis in patients with lymph node-positive high-grade breast cancer [19]. In contrast, an increase in TIMP1 levels in tumor tissues was linked to a remarkable declined overall survival in breast cancer patients receiving standard adjuvant chemotherapy [16]. While TIMP1 has been shown to be overexpressed in many malignant tumors, the prognostic value and potential function of TIMP1 in ccRCC is still unclear. In the present study, the effect of different expression levels of TIMP1 in ccRCC on prognosis was explored. We found that the increased expression of TIMP1 in ccRCC was positively correlated with malignant behavior, and high AGING levels of TIMP1 predicted high risk of recurrence and reduced overall survival. This observation reveals a novel role for TIMP1 that influences ccRCC progression through potential DNA damage variants. Meanwhile, through functional enrichment and GSEA analysis, we found that TIMP1 was enriched in several hallmark signaling pathways, including epithelial-mesenchymal transition (EMT), IL6-JAK-STAT3 signaling pathway, and inflammatory response in ccRCC samples.
Inflammation is observed as a basic biological behavior and confirmed as a hallmark of many malignant tumors [20]. Cancer-associated inflammation involves crosstalk between malignant and non-malignant cells in an autocrine and paracrine fashion through mediators such as cytokines, chemokines and prostaglandins [21]. With the combined force of genetic alteration, inflammatory tumor microenvironments contribute towards tumor growth and distant metastasis [22]. Furthermore, the carcinogenic effect of inflammatory response could be inhibited by using anti-inflammatory drugs [23]. For instance, treatment with the anti-inflammatory drug dexamethasone significantly inhibits malignant transformation by inhibiting EMT, the process epithelial cells undergo to gain migration and invasion capabilities [24]. In addition, tumor microenvironments consist of multiple different inflammatory cells and mediators. The development of inhibitors targeting these risk factors markedly delays the development and metastasis of tumors [21]. Therefore, learning more about the pro-inflammatory mechanisms of TIMP1 may provide promise strategy for treatment of ccRCC.
Studies have demonstrated that TIMP1 may be involved in promotion of progression of colorectal cancer, possibly serving as a prognostic hallmark for colorectal cancer [25,26]. In addition, due to the difficulty in diagnosing pancreatic cancer, studies have demonstrated that TIMP1 may serve as an early diagnostic marker for pancreatic cancer.
The expression of TIMP1 in renal cell carcinoma regulates the IL6-JAK-STAT3 pathway. Previous studies have illustrated that IL-6 was related to tumorigenesis and EMT of non-small cell lung cancer [27], distant dissemination of prostate cancer and breast cancer [28], regeneration and drug resistance of breast cancer stem cells [29]. Within tumor microenvironments, IL-6/JAK/STAT3 signaling contributes to growth and metastasis of tumor cells by inhibiting the anti-tumor immune response [30]. These findings suggest that suppression of IL-6 pathway would be a novel strategy for treatment of refractory tumors.
Multiple studies have demonstrated that EMT had premetastatic effect in progression of tumor cells [31,32].
EMT is an embryonic program that relaxes cell-cell adhesion complexes and confers elevated migration and invasion capabilities. After undergoing EMT, Cancer cells are more aggressive and invasive. Moreover, these cells obtain stem-like characteristics, and are resistance to apoptosis [33]. Research data indicates that the EMT program can also facilitate cancer cells to produce proinflammatory factors [34], and the inflammation in turn promotes EMT program in tumors [34]. Therefore, these two phenomena may form positive feedback in the first steps of tumor formation and transfer alliance. Consistent with this, subcutaneous injecting gastric quiescent cancer stem cells to nude mice would induce EMT-like changes and form larger xenografts [35]. In addition, primary tumors may also be resistant to treatment due to the EMT/inflammation alliance [36]. GSEA analysis has shown that elevated expression of TIMP1 is involved in the biological processes of EMT, which may explain why enhanced expression of TIMP1 is correlated with declined overall survival of ccRCC patients. Future studies should now focus on the mechanism by which TIMP1 regulates the biological process of EMT, perhaps providing a target for the treatment of kidney cancer.
The present study attempted to identify candidate DEGs in primary ccRCC and TIMP1 was screened out. Furthermore, we demonstrated that overexpression of TIMP1 may be associated with invasion and migration of ccRCC cells. However, the underlying mechanisms of TIMP1 pathway in ccRCC was failed to investigate in this study. Therefore, more research in future should explore the detailed mechanisms of hub genes in progression of ccRCC.

CONCLUSIONS
In conclusion, our study identified hub genes and genes expressed differentially in healthy compared with primary ccRCC samples from GEO datasets, which may provide novel biomarkers for diagnosis and treatment of ccRCC. In addition, the study is the first to reveal that TIMP1 promotes progression of ccRCC and predicts poor prognosis of ccRCC patients. The study may provide novel and promising insights for subsequent research for elucidation of the molecular mechanisms of ccRCC. In this regard, large clinical cohort study and further elucidation are imperative to uncover the underlying mechanisms and pathogenesis for ccRCC patients.

Gene expression profiling
We downloaded the publicly available GSE66272, GSE100666, and GSE105261 gene expression datasets from NCBI-GEO. These studies compared gene expression profiles in kidney tumor and normal kidney tissue samples. GSE66272 data were generated with the GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array), and were based on 27 tumor and 27 normal tissue samples. GSE100666 data were generated with the GPL16951 platform (Phalanx Human OneArray Ver. 6 Release 1), and were based on 3 ccRCC and 3 normal tissue samples. GSE105261 data were generated with the GPL10558 platform (Illumina HumanHT-12 V4.0 expression BeadChip), and were based on analyses of 9 tumor and 9 paracancerous normal kidney tissue samples.

Standardization and elucidation of DEGs
After downloading matrix files for each GEO dataset, we utilized a Robust Multi-Array Average (RMA) algorithm in order to adjust data for background signals, and to conduct quantile normalization and final oligonucleotidesper-transcript summarization with a median polish algorithm. The R Impute package was used to impute missing values based on a k-nearest neighbor (KNN) approach. Bayes methods were used to adjust for batch effects with the sva R package, as published previously, with R v3.6.1 being used for all analyses [37].
DEGs were identified via comparing ccRCC and normal kidney tissue gene expression profiles based on adjusted P-values, fold change values, and Benjamini and Hochberg FDR values. We removed any probe sets that did not correspond to a gene symbol, while genes with multiple probe sets were averaged. DEGs were considered to be significant if they met the criteria: |log2FC| >1.00, P <0.05. An online Venn diagram program was then used to identify shared DEGs among these three datasets, with data being uploaded in a TXT format. Genes were up-and down-regulated if logFC values were > 0 and < 0, respectively.

Functional enrichment analysis
The DAVID (http://david.ncifcrf.gov) (version 6.7) database was used to conduct functional annotation and pathway enrichment analyses for identified DEGs [38]. The KEGG database was used for pathway enrichment analysis of these DEGs in an effort to gain high-level insights into their potential functional relevance [39]. GO enrichment analyses were conducted to assess the enrichment of these DEGs in specific biological processes, molecular functions, and cell components (BPs, MFs, and CCs, respectively) [40]. Herein, DAVID was utilized for these enrichment analyses.

PPI network analysis
DEGs were grouped into a PPI network with the STRING database (http://string-db.org) [41], with a score of >0.4 being used as a significance cutoff for network construction. The network was then visualized and analyzed with the Cytoscape tool (v3.7.1) [42]. The Cytoscape Molecular Complex Detection (MCODE) plugin (v1.5.1) was further used for topological clustering within the network as a means of identifying significantly interconnected gene modules within the overall PPI network [43]. The following criteria were used for significant module identification: degrees of cut-off = 2, node score cut-off = 0.2, Max depth = 100, K-score = 2.

Hub gene selection and analysis
Hub genes were identified as genes that had a ≥10 degree of connectivity. We then utilized cBioPortal (http://www.cbioportal.org) [44] to identify genes that were co-expressed with these hub genes. We additionally utilized the ClueGO Cytoscape plugin, which allows for the visualization of non-redundant terms associated with gene clusters in networks that have been grouped based on functionality [45]. We utilized ClueGO v2.5.4 and CluePedia v1.5.4 to identify and visualize biological processes associated with these hub genes [46]. We additionally performed hub gene hierarchical clustering.

Survival analysis
Differences in survival between groups were evaluated via a Kaplan-Meier approach, with disease-free survival (DFS) as the primary endpoint. DFS was defined as the time between curative treatment and progression, death, or second-line treatment. As a secondary endpoint in these analyses, we assessed overall survival (OS) from diagnosis/treatment to death or most recent follow-up. After comparing survival between groups using Kaplan-Meier curves, 95% confidence intervals (95% CIs) were generated and curves were compared via log-rank tests. The sum of hub gene weights was used to determine overall scores.

Data processing of Gene set enrichment analysis (GSEA)
A GSEA approach was used to analyze TCGA data with the Category v2.10.1 package. In individual analyses, Student's t-tests were used to generate scores to assess consistent changes in DEG expression within pathways of interest. A 1000x permutation test was employed to identify pathways that were significantly altered, and the potential for false-positive result detection was controlled by correcting P-values using the Benjamini and Hochberg false discovery rate approach [47]. Genes were found to be significantly related when adj. P was < 0.01 and FDR < 0.25. R v3.6.1 was used for these statistical analyses.

Lentiviral preparation
HEK 293T cells (American Type Culture Collection, USA) were co-transfected with lentiviral vectors and packaging plasmids using Lipofectamine 3000 (Invitrogen, Shanghai, China) based on provided directions. Following a 48 h incubation, lentiviruscontaining supernatants were harvested.

Lentiviral transduction
A total of 3×105 ccRCC cells were added to 6 cm tissue culture dishes and were incubated overnight, after which 200 μl of an appropriate lentiviral supernatant and polybrene (8 μg/ml) were added to each well. Cells were then incubated at 37°C, and on day three the viruscontaining media was removed. Cells were then rinsed using PBS, and puromycin-containing media (2 μg/ml) was added. After overnight incubation, a subset of cells were collected for western blotting in order to identify stably transduced cells for further experimental utilization. Infection of target cells with lentiviral particles.

Western blotting
RIPA buffer was used to lyse harvested cells. Protein was then separated via SDS-PAGE, transferred to NC membranes (GE Health Care Life Science). Blots were blocked with 5% skim milk in TBST, followed by an overnight incubation with appropriate primary antibodies at 4°C. Antibodies were: anti-TIMP1 (Abcam, ab61224), and anti-Actin (Zen bioscience, 220529). Blots were then probed for 1 h with appropriate secondary antibodies (1:5000), and chemiluminescence was then used for protein visualization.

Transwell assay
A 24-well Transwell system (Greiner Bio-one, Switzerland) was employed for assays examining ccRCC cell invasive activity. The upper surface of these Transwell inserts was first coated using Matrigel (BD Bioscience, USA), after which 200 uL of serum-free media containing 2×105 cells was added to the upper chamber. Next, 500 uL of media containing 10% FBS was added to the lower compartment, and cells were incubated for 48 h at 37°C. A swab was then used to carefully remove cells remaining on the upper surface, while cells that had migrated to the lower surface were treated with 100% methanol for fixation followed by 0.1% crystal violet staining. A total of five fields of view per insert were then visualized at random via microscope (Olympus, Japan; 200x), with the number of cells in each field being quantified.

Ethics approval
The Ethics Committee of Qilu Hospital of Shandong University approved the study.

AUTHOR CONTRIBUTIONS
The work presented here was carried out in collaboration among all authors. LXG defined the research theme, discussed analyses, interpretation, and presentation. SSN, QX, and WS drafted the manuscript, analyzed the data, developed the algorithm and interpreted the results. LY, WWF and ZYF co-worked on associated data collection, cohort validation and helped to draft the manuscript. ZZL helped to perform the statistical analysis. SBK helped to perform the reference collection. All authors read and approved the final manuscript.

CONFLICTS OF INTEREST
The authors declare no conflicts of interest.