Abstract
Despite multiple publications, molecular signatures predicting the course of hepatocellular carcinoma (HCC) have not yet been integrated into routine clinical decision-making. Given the diversity of published signatures, the optimal number of genes, the best combinations, and the benefit of functional associations among genes in prognostic signatures remain to be defined. We investigated a large number of randomly chosen gene sets (ranging from 1 to 10,000 genes) to cover the full range of prognostic gene sets, using 242 transcriptomic profiles of patients with HCC. Depending on the selected size, 4.7 to 23.5% of all random gene sets exhibited prognostic potential by separating patient subgroups with significantly different survival. This was further substantiated by investigating functionally annotated gene sets and signaling pathways, which yielded a comparably high number of significantly prognostic gene sets. However, combining multiple random gene sets using “swarm intelligence” significantly improved predictability for approximately 63% of all patients. In these patients, approx. 70% of all random 50-gene sets yielded a consistent and stable prediction of survival. For all other patients, a reliable prediction seems highly unlikely with any selected gene set. Using a machine learning approach with independent validation, we demonstrated high reliability of random gene sets and swarm intelligence in HCC prognosis. Ultimately, these findings were validated in two independent patient cohorts and on independent technical platforms (microarray, RNASeq). In conclusion, we demonstrate that using the “swarm intelligence” of multiple gene sets for prognosis prediction may be not only superior but also more robust.
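The random gene set screen described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the split rule (mean expression above or below the cohort median), the two-group log-rank test, the threshold `alpha`, and all function names are assumptions chosen for illustration. The sketch draws random gene sets of a fixed size and reports the fraction that separate patients into groups with significantly different survival.

```python
import math
import random
from statistics import mean, median


def logrank_chi2(times, events, groups):
    """Two-group log-rank statistic (chi-square, 1 degree of freedom).

    times: follow-up time per patient; events: 1 = death observed, 0 = censored;
    groups: 0/1 group label per patient.
    """
    diff = 0.0      # observed minus expected events in group 1
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if e and ti == t)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if e and ti == t and g == 1)
        diff += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return diff ** 2 / variance if variance > 0 else 0.0


def logrank_p_value(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def split_by_gene_set(expression, gene_set):
    """ASSUMED split rule: group 1 if mean expression over the set exceeds the cohort median."""
    scores = [mean(row[g] for g in gene_set) for row in expression]
    cut = median(scores)
    return [1 if s > cut else 0 for s in scores]


def fraction_prognostic(expression, times, events, set_size, n_sets,
                        alpha=0.05, seed=0):
    """Fraction of random gene sets whose split gives a log-rank p-value below alpha."""
    rng = random.Random(seed)
    n_genes = len(expression[0])
    hits = 0
    for _ in range(n_sets):
        gene_set = rng.sample(range(n_genes), set_size)
        groups = split_by_gene_set(expression, gene_set)
        if logrank_p_value(logrank_chi2(times, events, groups)) < alpha:
            hits += 1
    return hits / n_sets
```

Here `expression` is a patients-by-genes list of lists. Calling `fraction_prognostic` for several values of `set_size` would give a size-dependent estimate analogous in spirit to the 4.7 to 23.5% range reported above, although the numbers depend entirely on the data and on the assumed split rule.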
Key messages
- Molecular signatures predicting the course of HCC have not yet been integrated into clinical routine
- Depending on the selected size, 4.7 to 23.5% of all random gene sets exhibit prognostic potential, independent of the technical platform (microarray, RNASeq)
- Using “swarm intelligence” significantly improved predictability for approximately 63% of all patients
- In these patients, approx. 70% of all random 50-gene sets resulted in a consistent and stable prediction of survival
- Overall, “swarm intelligence” is superior and more robust for predictive purposes in HCC
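One way to picture the “swarm intelligence” idea is as a vote across many gene sets. The sketch below is an assumption-laden illustration, not the published procedure: it takes the per-patient calls of each gene set as given, and it uses a hypothetical agreement threshold of 70% (`min_agreement`), borrowed loosely from the figure quoted above, to label a patient as "good", "poor", or "undetermined".

```python
from collections import Counter


def swarm_vote(calls_per_gene_set, min_agreement=0.7):
    """Combine per-gene-set calls into one consensus call per patient.

    calls_per_gene_set: one list per gene set, each holding a "good" or "poor"
    call for every patient (same patient order in every list).
    min_agreement: ASSUMED fraction of gene sets that must agree on a call.
    """
    n_sets = len(calls_per_gene_set)
    n_patients = len(calls_per_gene_set[0])
    consensus = []
    for i in range(n_patients):
        votes = Counter(calls[i] for calls in calls_per_gene_set)
        label, count = votes.most_common(1)[0]
        if count / n_sets >= min_agreement:
            consensus.append(label)
        else:
            # No sufficiently stable majority across gene sets
            consensus.append("undetermined")
    return consensus
```

With four gene sets, a patient called "poor" by three of them (75%) would be labeled "poor", while a patient with a two-to-two split would be labeled "undetermined", mirroring the good, poor, and undetermined prognosis groups mentioned in the supplementary material.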
Acknowledgements
The authors thank Dr. Snorri Thorgeirsson, NIH/NCI, Bethesda, MD, for his generous support and for providing clinical parameters for the GSE4024 and GSE1898 data sets. S.R. was supported by the German Research Foundation (DFG) CRC SFB/TR 209 Liver Cancer project B01.
Author information
Contributions
Study concept and design: TI, RS, TM, SST, and AT; acquisition of data (public expression data), analysis, and interpretation of data: TI, RS, TM, SM, SR, MPE, ME, and AT; drafting of the manuscript: TI, RS, TM, SM, HJS, WH, ME, and AT; critical revision of the manuscript for important intellectual content: TI, RS, TM, SM, SR, SST, MPE, HJS, WH, ME, and AT; statistical analysis: RS; obtained funding: WH, ME, HJS, and AT.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Supplemental figure 1
Differences in clinical characteristics between patients with good and poor prognosis. (PNG 69 kb)
Supplemental figure 2
Unsupervised clustering based on 50-gene sets, separating prognostic subgroups with high significance (p < 0.0001). (PNG 1126 kb)
Supplemental table 1
Randomly chosen gene sets ranging from 1 to 10,000 genes (compare Fig. 1) were investigated for prognostic capability in patients with HCC. Fivefold repetition demonstrated stable results. (PDF 56 kb)
Supplemental table 2
Summary of the theoretically possible number of gene sets and the percentage of gene sets evaluated in 500,000 (5 × 100,000) re-iterations. (PDF 41 kb)
Supplemental table 3
Analysis of gene sets and signaling pathways obtained from KEGG, Biocarta, Reactome, and PID as well as GO terms (Biological Process (BP), Cellular Component (CC), and Molecular Function (MF)) for prognostic capability in patients with HCC. (PDF 5756 kb)
Supplemental table 4
397 randomly chosen 50-gene sets evaluated for survival prediction at a significance level of p = 0.0001. (PDF 299 kb)
Supplemental table 5
Reduced data set containing only data from patients whose samples were assigned to either the good or the poor prognosis group. Randomly chosen gene sets ranging from 1 to 10,000 genes (compare Fig. 1) were investigated for prognostic capability in patients with HCC. Fivefold repetition demonstrated stable results. (PDF 94 kb)
Supplemental table 6
Average clinical characteristics of patient groups with good, poor, or undetermined prognosis. (PDF 12 kb)
Supplemental table 7
Exemplary listing of 100 re-iterations of the machine learning approach, with results for validation of our swarm intelligence approach. Left: heatmap of the learning approach for the randomly chosen gene set. Middle: survival analysis (Kaplan–Meier) of learning samples. Right: survival analysis (Kaplan–Meier) of test samples. The full procedure comprised 5 independent runs of 1000 re-iterations each. (PDF 24297 kb)
About this article
Cite this article
Itzel, T., Spang, R., Maass, T. et al. Random gene sets in predicting survival of patients with hepatocellular carcinoma. J Mol Med 97, 879–888 (2019). https://doi.org/10.1007/s00109-019-01764-2