Random gene sets in predicting survival of patients with hepatocellular carcinoma

  • Original Article
Journal of Molecular Medicine

Abstract

Despite multiple publications, molecular signatures predicting the course of hepatocellular carcinoma (HCC) have not yet been integrated into routine clinical decision-making. Given the diversity of published signatures, the optimal number of genes, the best combinations, and the benefit of functional associations among genes in prognostic signatures remain to be defined. We investigated a vast number of randomly chosen gene sets (ranging in size from 1 to 10,000 genes) to cover the full range of prognostic gene sets on 242 transcriptomic profiles of patients with HCC. Depending on the selected size, 4.7 to 23.5% of all random gene sets exhibited prognostic potential by separating patient subgroups with significantly different survival. This was further substantiated by investigating gene sets and signaling pathways, which also yielded a comparably high number of significantly prognostic gene sets. However, combining multiple random gene sets using “swarm intelligence” significantly improved predictability for approximately 63% of all patients. In these patients, approximately 70% of all random gene sets containing 50 genes gave the same, stable prediction of survival. For all other patients, a reliable prediction seems highly unlikely with any selected gene set. Using a machine learning and independent validation approach, we demonstrated a high reliability of random gene sets and swarm intelligence in HCC prognosis. Ultimately, these findings were validated in two independent patient cohorts and on independent technical platforms (microarray, RNASeq). In conclusion, we demonstrate that using “swarm intelligence” of multiple gene sets for prognosis prediction may be not only superior but also more robust for predictive purposes.

Key messages

  • Molecular signatures predicting HCC have not yet been integrated into clinical routine

  • Depending on the selected size, 4.7 to 23.5% of all random gene sets exhibit prognostic potential, independent of the technical platform (microarray, RNASeq)

  • Using “swarm intelligence” resulted in a significantly improved predictability for approximately 63% of all patients

  • In these patients, approx. 70% of all random gene sets containing 50 genes resulted in equal and stable prediction of survival

  • Overall, “swarm intelligence” is superior and more robust for predictive purposes in HCC
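
The idea of combining many gene sets into one consensus call per patient can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `swarm_vote`, the labels, the patient data, and the use of a fixed agreement threshold are all assumptions made for illustration. The 0.7 threshold only echoes the roughly 70% agreement reported above and is not a published cutoff.

```python
from collections import Counter


def swarm_vote(calls, min_agreement=0.7):
    """Combine per-gene-set prognosis calls for one patient.

    calls: list of "good"/"poor" labels, one per gene set (hypothetical input).
    Returns the majority label if its share of votes reaches min_agreement,
    otherwise "undetermined". The threshold rule is an assumption.
    """
    if not calls:
        return "undetermined"
    label, count = Counter(calls).most_common(1)[0]
    return label if count / len(calls) >= min_agreement else "undetermined"


# Hypothetical calls from ten random gene sets for three patients.
patients = {
    "P1": ["good"] * 8 + ["poor"] * 2,
    "P2": ["poor"] * 9 + ["good"] * 1,
    "P3": ["good"] * 5 + ["poor"] * 5,
}

# One consensus label per patient.
consensus = {pid: swarm_vote(calls) for pid, calls in patients.items()}
print(consensus)
```

In this sketch, patients whose gene sets disagree too often fall into an "undetermined" group, which mirrors the observation that a reliable prediction seems unlikely for a subset of patients regardless of the gene set chosen.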

Acknowledgements

The authors thank Dr. Snorri Thorgeirsson, NIH/NCI, Bethesda, MD, for his generous support and for providing clinical parameters for the GSE4024 and GSE1898 data sets. S.R. was supported by the German Research Foundation (DFG) CRC SFB/TR 209 Liver Cancer project B01.

Author information

Contributions

Study concept and design: TI, RS, TM, SST, and AT; acquisition of data (public expression data), analysis, and interpretation of data: TI, RS, TM, SM, SR, MPE, ME, and AT; drafting of the manuscript: TI, RS, TM, SM, HJS, WH, ME, and AT; critical revision of the manuscript for important intellectual content: TI, RS, TM, SM, SR, SST, MPE, HJS, WH, ME, and AT; statistical analysis: RS; obtained funding: WH, ME, HJS, and AT.

Corresponding author

Correspondence to Andreas Teufel.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplemental figure 1

Differences in clinical characteristics between patients with good and poor prognosis. (PNG 69 kb)

Supplemental figure 2

Unsupervised clustering based on gene sets containing 50 genes, separating prognostic subgroups with high significance (p < 0.0001). (PNG 1126 kb)

Supplemental table 1

Randomly chosen gene expression gene sets ranging from 1 to 10,000 genes (compare Fig. 1) were investigated for prognostic capability in patients with HCC. Fivefold re-iteration demonstrated stable results. (PDF 56 kb)

Supplemental table 2

Summary of the theoretically possible number of gene sets and the percentage of gene sets evaluated when performing 500,000 (5 × 100,000) re-iterations. (PDF 41 kb)
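
As a rough illustration of the arithmetic behind this table, the number of distinct gene sets of size k drawn from n genes is the binomial coefficient C(n, k), and the evaluated share is the number of draws divided by that count. The sketch below is an assumption-laden example: the total of 20,000 genes is a hypothetical figure (the number of genes on the platform is not stated here), and `fraction_sampled` is an illustrative helper, not part of the study.

```python
import math


def fraction_sampled(n_genes, set_size, n_draws=500_000):
    """Return (number of possible gene sets, share covered by n_draws).

    Assumes every draw yields a distinct gene set, so the share is an
    upper bound on the fraction actually evaluated.
    """
    total = math.comb(n_genes, set_size)  # C(n_genes, set_size)
    return total, n_draws / total


N_GENES = 20_000  # hypothetical number of genes on the platform

total_pairs, share_pairs = fraction_sampled(N_GENES, 2)
total_50, share_50 = fraction_sampled(N_GENES, 50)

print(f"size 2:  {total_pairs} possible sets, share {share_pairs:.4%}")
print(f"size 50: about 10^{len(str(total_50)) - 1} possible sets, share {share_50:.3e}")
```

Even for small set sizes the evaluated share is tiny, which is why repeated sampling runs are used to check that the observed proportions are stable.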

Supplemental table 3

Analysis of gene sets and signaling pathways obtained from KEGG, Biocarta, Reactome, and PID as well as GO terms (Biological Process (BP), Cellular Component (CC), and Molecular Function (MF)) for prognostic capability in patients with HCC. (PDF 5756 kb)

Supplemental table 4

397 randomly chosen gene sets containing 50 genes each, evaluated for survival prediction with a significance level of p = 0.0001. (PDF 299 kb)

Supplemental table 5

Reduced data set containing only data from patients whose samples were assigned to either the good or poor prognosis group. Randomly chosen gene expression gene sets ranging from 1 to 10,000 genes (compare Fig. 1) were investigated for prognostic capability in patients with HCC. Fivefold re-iteration demonstrated stable results. (PDF 94 kb)

Supplemental table 6

Average clinical characteristics of patient groups with good, poor, or undetermined prognosis. (PDF 12 kb)

Supplemental table 7

Exemplary listing of 100 re-iterations of the machine learning approach and results for validation of our swarm intelligence approach. Left: Heatmap of the learning approach for the randomly chosen gene set. Middle: Survival analysis (Kaplan–Meier) of learning samples. Right: Survival analysis (Kaplan–Meier) of test samples. The full procedure contained 5 independent runs of 1000 re-iterations each. (PDF 24297 kb)

About this article

Cite this article

Itzel, T., Spang, R., Maass, T. et al. Random gene sets in predicting survival of patients with hepatocellular carcinoma. J Mol Med 97, 879–888 (2019). https://doi.org/10.1007/s00109-019-01764-2

  • DOI: https://doi.org/10.1007/s00109-019-01764-2
