For robust big data analyses: a collection of 150 important pro-metastatic genes

Metastasis is the greatest contributor to cancer-related death. In the era of precision medicine, it is essential to predict and to prevent the spread of cancer cells to significantly improve patient survival. Thanks to the application of a variety of high-throughput technologies, accumulating big data enables researchers and clinicians to identify aggressive tumors as well as patients with a high risk of cancer metastasis. However, there have been few large-scale gene collection studies to enable metastasis-related analyses. In the last several years, emerging efforts have identified pro-metastatic genes in a variety of cancers, providing us the ability to generate a pro-metastatic gene cluster for big data analyses. We carefully selected 285 genes with in vivo evidence of promoting metastasis reported in the literature. These genes have been investigated in different tumor types. We used two datasets downloaded from The Cancer Genome Atlas database, specifically, datasets of clear cell renal cell carcinoma and hepatocellular carcinoma, for validation tests, and excluded any genes for which elevated expression level correlated with longer overall survival in any of the datasets. Ultimately, 150 pro-metastatic genes remained in our analyses. We believe this collection of pro-metastatic genes will be helpful for big data analyses, and eventually will accelerate anti-metastasis research and clinical intervention.


Background
Cancer metastasis is the greatest cause of death in almost all types of malignancies [1]. Multiple factors from the tumor and the host contribute to the formation and progression of distant secondary tumors [1,2], but most mechanistic studies to date have focused on the metastatic potential of tumor cells. It is believed that the metastasis of single cancer cells begins with the cells gaining the ability to migrate and invade. Cancer cells can gain motility in several ways, including epithelial-mesenchymal transition (EMT) and fusion of cancer cells with highly mobile bone marrow-derived cells [3,4]. In metastases formed by clusters of tumor cells, EMT may not be necessary [5]; however, the layer of endothelial cells enveloping the entire tumor cluster/embolus seems critical for the survival of tumor clusters [6].
The ability to identify cancer patients with a high risk of metastasis is essential in the era of precision medicine. In addition to combinations of clinicopathologic parameters, also known as clinical prognostic classifiers in some circumstances, molecular profiling based on high-throughput technologies is expected to allow for a more accurate and robust prediction of metastatic potential in patients. How to effectively analyze the big data generated from high-throughput screening is an emerging issue for many bioinformaticians. We hypothesize that, with optimal weighting of the impact of each individual gene, a collection of key pro-metastatic genes could be used to generate a prognostic tool that identifies the metastatic potential of a specific tumor and reveals novel signaling pathways underlying metastasis.
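As a rough illustration of the weighting idea in the hypothesis above, the following Python sketch computes one score per tumor sample as a weighted sum of standardized gene expression values. It is a minimal sketch under stated assumptions, not a validated prognostic tool: the function names, the toy expression values, and the weights are all hypothetical, and in practice the weights would have to be estimated from data (for example, from a survival model) rather than chosen by hand.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize expression values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A gene with constant expression carries no information.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def metastasis_scores(expression, weights):
    """Return one weighted score per sample.

    expression: dict mapping gene -> list of expression values (one per sample)
    weights:    dict mapping gene -> hypothetical weight for that gene
    Genes that have no weight are ignored.
    """
    genes = [g for g in expression if g in weights]
    if not genes:
        raise ValueError("no weighted genes found in the expression data")
    n_samples = len(expression[genes[0]])
    scores = [0.0] * n_samples
    for gene in genes:
        z = zscores(expression[gene])
        if len(z) != n_samples:
            raise ValueError(f"sample count mismatch for gene {gene}")
        for i, z_i in enumerate(z):
            scores[i] += weights[gene] * z_i
    return scores


# Toy data: three samples, two genes; values and weights are made up.
toy_expression = {
    "EZH2": [1.0, 2.0, 3.0],
    "MMP9": [2.0, 2.0, 8.0],
}
toy_weights = {"EZH2": 0.5, "MMP9": 1.0}
toy_scores = metastasis_scores(toy_expression, toy_weights)
```

Standardizing each gene before weighting keeps highly expressed genes from dominating the score simply because of their scale; other normalizations would be equally defensible.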

Main text
The increased investigation of cancer metastasis in recent years has identified over 200 pro-metastatic genes. In this review, we aim to identify a group of key pro-metastatic genes with in vivo functional evidence and reasonable clinical relevance for application to big data analyses. Figure 1 summarizes the analytic procedure of this review. First, we carefully selected 285 genes from the literature through searching PubMed based on the following criteria: (1) author-provided evidence of promoting migration and/or invasion of cancer cells; (2) author-provided evidence of promoting metastasis in vivo using animal models; (3) when a gene has been reported as pro-metastatic in several articles, all articles reporting the link were reviewed, and the most convincing studies are listed as the key references in
Although different tumor types are believed to rely on different molecular mechanisms for metastasis, 23 common pro-metastatic genes have been identified in our analyses as associated with poor prognosis in both cancer types. Among them, we are most interested in 11 genes that are not only statistically significant in terms of prognostic impact but also associated with distinct overall survival curves in both cohorts, suggesting the genes' profound biological impacts on tumor progression. For the other 12 genes, although their biological impacts on tumor progression were found to be significant in log-rank tests in both cohorts, the survival curves of high versus low expression groups crossed at some time points. The 11 most interesting genes are BIRC5 (Survivin), CXCL1, CXCL8 (IL8), E2F1, ETV4, EZH2, MMP1, MMP9, MYB, PTTG1, and YBX1. Figure 2 shows the survival curves of patients with either ccRCC or HCC expressing these 11 genes. Our findings suggest that different tumor types may partially share some common metastatic mechanisms, therefore strengthening the rationale for applying the list of 150 pro-metastatic genes to big data analyses. Interestingly, 4 of these 11 genes encode secreted proteins, namely, CXCL1, CXCL8, MMP1, and MMP9, which are ideal pharmaceutical targets for blocking cancer metastasis.
Fig. 2 The survival curves of two cohorts of cancer patients comparing the mRNA expression levels of 11 genes. The data were retrieved from The Cancer Genome Atlas (TCGA) database. The survival curves were plotted using the Kaplan-Meier method and compared using the log-rank test. Consistently, among all 11 genes presented in this figure, elevated gene expression levels significantly associate with shorter overall patient survival (P < 0.05) in both tumor types. ccRCC clear cell renal cell carcinoma, HCC hepatocellular carcinoma
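The comparison of high versus low expression groups by log-rank test, and the exclusion of genes whose elevated expression correlates with longer overall survival, can be sketched in plain Python as follows. This is an illustrative sketch only: the median cutoff for defining high and low expression, the 0.05 threshold as a default argument, the function names, and the toy cohort are assumptions made here, not details taken from the analysis reported in this review.

```python
import math
from statistics import median


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*:  follow-up times; events_*: 1 if death observed, 0 if censored.
    Returns (chi2, p_value, observed_a, expected_a) for group A.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_a = exp_a = var = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        obs_a += d_a
        exp_a += d * n_a / n
        if n > 1:
            var += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    if var == 0:
        return 0.0, 1.0, obs_a, exp_a
    chi2 = (obs_a - exp_a) ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square tail, 1 degree of freedom
    return chi2, p_value, obs_a, exp_a


def classify_gene(expr, times, events, alpha=0.05):
    """Split patients at the median expression (an assumption) and test survival.

    Returns 'adverse' if high expression goes with shorter survival,
    'favorable' if it goes with longer survival, else 'not significant'.
    """
    cut = median(expr)
    high = [i for i, x in enumerate(expr) if x > cut]
    low = [i for i, x in enumerate(expr) if x <= cut]
    _, p_value, obs_high, exp_high = logrank(
        [times[i] for i in high], [events[i] for i in high],
        [times[i] for i in low], [events[i] for i in low],
    )
    if p_value >= alpha:
        return "not significant"
    # More deaths than expected in the high group means shorter survival.
    return "adverse" if obs_high > exp_high else "favorable"


def keep_gene(calls_per_cohort):
    """Drop a gene if high expression is 'favorable' in any cohort."""
    return "favorable" not in calls_per_cohort


# Toy cohort of 20 patients (made-up numbers): the 10 patients with the
# highest expression die first, the 10 with the lowest expression die later.
toy_expr = list(range(11, 21)) + list(range(1, 11))
toy_times = list(range(1, 21))
toy_events = [1] * 20
toy_call = classify_gene(toy_expr, toy_times, toy_events)
```

With these helpers, a gene called 'adverse' in both cohorts would correspond to the genes associated with poor prognosis in both tumor types, while `keep_gene` mirrors the exclusion rule. Crossing survival curves, as noted for 12 of the 23 genes, are not detected by this test and would need a separate check.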
Although not covered in this review article, emerging data on the regulatory roles of non-coding RNA in metastasis have linked different pro-metastatic genes into signaling cascades [7-9]. Further investigation into the roles of non-coding RNA in metastasis is warranted.

Conclusions
In summary, we present here a collection of 150 important pro-metastatic genes for big data analyses. We expect more key molecules to be identified, validated, and added to the list in the near future, thereby accelerating efforts to prevent and treat cancer metastasis.