Genetics and Genomics

wMKL: multi-omics data integration enables novel cancer subtype identification via weight-boosted multi-kernel learning

Abstract

Background

Cancer is a heterogeneous disease driven by complex molecular alterations. Cancer subtypes determined from multi-omics data can provide novel insight into personalised precision treatment. It is recognised that incorporating prior weight knowledge into multi-omics data integration can improve disease subtyping.

Methods

We develop a weighted method, termed weight-boosted Multi-Kernel Learning (wMKL), which incorporates heterogeneous data types as well as flexible weight functions to boost subtype identification. Given a series of weight functions, we propose an omnibus combination strategy that integrates the different weight-related P-values to improve subtyping precision.
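The abstract does not state which omnibus rule wMKL uses to combine the weight-related P-values. As an illustration only, the sketch below assumes a Cauchy combination rule (Liu and Xie, 2020), which merges several P-values for the same feature into one. The function name `cauchy_combine` and the example P-values are hypothetical and are not taken from the wMKL package.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine P-values with a Cauchy combination rule (illustrative sketch).

    Weights must be non-negative; they are normalised to sum to 1.
    Equal weights are used when none are given.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("P-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Map each P-value to a standard Cauchy variate and take the weighted mean.
    t = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # The upper tail of the standard Cauchy distribution gives the combined P-value.
    return 0.5 - math.atan(t) / math.pi


# Hypothetical P-values for one feature, one per weight function.
p_by_weight_function = [0.04, 0.20, 0.01]
combined = cauchy_combine(p_by_weight_function)
print(f"combined P-value: {combined:.4f}")
```

A rule of this kind has a closed-form result and needs no resampling, which is convenient when it has to be applied to every feature in every omics layer.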

Results

wMKL models each data type with multiple kernel choices, which reduces the sensitivity of the results to the choice of kernel parameters and the robustness problems that follow from it. Furthermore, wMKL integrates different data types by learning the weights of the kernels derived from each data type, recognising that data types contribute unequally to the final subtyping performance. The proposed wMKL outperforms existing weighted and non-weighted methods. The utility and advantage of wMKL are illustrated through extensive simulations and applications to two TCGA datasets. Novel subtypes are identified, followed by extensive downstream bioinformatics analysis to understand the molecular mechanisms that distinguish the subtypes.
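The idea of representing each data type by several kernels and fusing them with weights can be sketched as follows. This is a minimal illustration under stated assumptions, not the wMKL algorithm: the Gaussian kernel family, the bandwidth grid `sigmas`, the toy matrices and the uniform placeholder weights are all hypothetical, and in wMKL the kernel weights are learned rather than fixed.

```python
import math


def gaussian_kernel(samples, sigma):
    """Gaussian (RBF) kernel matrix; each row of `samples` is one patient."""
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            kernel[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return kernel


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices; weights are normalised to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    n = len(kernels[0])
    fused = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                fused[i][j] += (weight / total) * kernel[i][j]
    return fused


# Toy data: two omics layers measured on the same four patients (hypothetical values).
mirna = [[0.0, 0.1], [0.1, 0.0], [2.0, 2.1], [2.1, 2.0]]
mrna = [[1.0], [1.1], [3.0], [3.2]]

# Several bandwidths per data type, so no single kernel parameter has to be chosen.
sigmas = [0.5, 1.0, 2.0]
kernels = [gaussian_kernel(layer, s) for layer in (mirna, mrna) for s in sigmas]

# Placeholder: uniform weights. A multi-kernel learner would estimate these instead.
weights = [1.0] * len(kernels)
fused = combine_kernels(kernels, weights)
```

The fused similarity matrix could then be passed to a clustering step such as spectral clustering to assign patients to subtypes.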

Conclusions

The proposed wMKL method provides a novel strategy for disease subtyping. wMKL is freely available at https://github.com/biostatcao/wMKL.

Fig. 1: Flowchart of wMKL illustrated with three omics data types, namely miRNA, mRNA and DNA methylation.
Fig. 2: Clustering results of PRCC.
Fig. 3: Clustering results of LUAD.
Fig. 4: Correlation analysis of differential features between miRNA and mRNA and between methylation and mRNA for PRCC.

Data availability

The TCGA data analysed in this study can be accessed through the Genomic Data Commons Data Portal (http://cancergenome.nih.gov/). wMKL is implemented in the R package wMKL, which is freely available on GitHub (https://github.com/biostatcao/wMKL).


Acknowledgements

The authors would like to express their gratitude to an associate editor and three anonymous reviewers for their valuable insights and constructive feedback, which significantly contributed to the enhancement of this paper.

Funding

HC was supported by the National Natural Science Foundation of China (71403156), the Fundamental Research Programme of Shanxi Province (202303021211130), the China Scholarship Council (No. 201908140151) and the Startup Foundation for Doctors of Shanxi Medical University (BS201722). HY was supported by the National Natural Science Foundation of China (81872717). YC was supported by Michigan State University.

Author information

Contributions

HC and YC designed the study; HC performed simulations and real data analysis with assistance from CJ, ZL, HY, YZ and YC; HC developed the software tool with assistance from RF; HC and YC wrote the manuscript with input from all other authors. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Yuehua Cui.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics approval and consent to participate

Not applicable.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cao, H., Jia, C., Li, Z. et al. wMKL: multi-omics data integration enables novel cancer subtype identification via weight-boosted multi-kernel learning. Br J Cancer 130, 1001–1012 (2024). https://doi.org/10.1038/s41416-024-02587-w

  • DOI: https://doi.org/10.1038/s41416-024-02587-w
