Abstract
Identifying cancer driver genes is essential for personalized therapy. Most driver genes are mutated at intermediate (2–20%) or even lower frequencies, which makes drivers with low-frequency mutations difficult to detect. Other forms of genomic aberration, such as copy number variations (CNVs) and epigenetic changes, may also reflect cancer progression. In this work, we propose iPDG, a method for identifying potential cancer driver genes based on molecular data integration. DNA copy number variation, somatic mutation, and gene expression data from matched cancer samples are integrated. In combination with the iKGGE method, the "key genes" of a cancer are identified, and the change in their expression levels is used as auxiliary evidence of whether a mutated gene is a potential driver. For each mutated gene, we define a mutational effect that takes into account copy number variation, the mutated gene itself, and its neighboring genes in the network. The method consists of two main steps. The first step is data preprocessing: DNA copy number variation and somatic mutation data are integrated, the integrated data are mapped onto a given interaction network, and a diffusion kernel is used to form the mutation effect matrix. The second step obtains the key genes with the iKGGE method and constructs the connection matrix from the gene expression data of the key genes and the mutation effect matrix of the mutated genes. Experiments on TCGA breast cancer and glioblastoma multiforme datasets demonstrate that iPDG is effective not only in identifying known cancer driver genes but also in discovering rare potential driver genes. Functional enrichment analysis shows that these genes are clearly associated with the two cancer types.
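The preprocessing step can be illustrated with a small sketch. It is not the authors' implementation: the four-gene path network, the toy mutation and CNV calls, the value of `beta`, and the rule that a gene counts as aberrant when it is mutated or has a nonzero CNV call are all assumptions made for illustration. The abstract does not say how the two data types are combined. The kernel follows the standard graph diffusion kernel, K = exp(-beta * L), with L the graph Laplacian.

```python
# Toy sketch only: the network, the data, beta, and the OR-integration rule
# are illustrative assumptions, not values or rules taken from the paper.

def integrate(mutation, cnv):
    """Mark a gene aberrant in a sample if it is mutated OR has a nonzero CNV call."""
    return [[1.0 if (m or c) else 0.0 for m, c in zip(m_row, c_row)]
            for m_row, c_row in zip(mutation, cnv)]

def matmul(a, b):
    """Plain-Python matrix product, adequate for tiny examples."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def diffusion_kernel(adj, beta, terms=40):
    """Approximate K = exp(-beta * L), L = D - A, with a truncated Taylor series."""
    n = len(adj)
    # Scaled negative Laplacian: -beta * (D - A)
    m = [[-beta * ((sum(adj[i]) if i == j else 0.0) - adj[i][j])
          for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    kernel = [row[:] for row in identity]
    term = identity
    for k in range(1, terms):
        # After this update, term holds m^k / k!
        term = [[x / k for x in row] for row in matmul(term, m)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel

# Hypothetical interaction network: a path G0 - G1 - G2 - G3
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]

# Rows are samples, columns are genes (toy data)
mutation = [[1, 0, 0, 0],
            [0, 1, 0, 0]]
cnv = [[0, 0, 0, 0],
       [0, 0, 0, -1]]   # -1 stands for a copy number loss

aberration = integrate(mutation, cnv)
kernel = diffusion_kernel(adj, beta=0.5)
effect = matmul(aberration, kernel)   # samples x genes mutation effect matrix

for row in effect:
    print([round(x, 3) for x in row])
```

In the first sample only G0 is aberrant, yet every gene receives a nonzero score that decays with network distance from G0. This is the sense in which a mutation's effect reaches its neighbor genes. For networks of realistic size, a library routine such as `scipy.linalg.expm` would be a more practical way to compute the matrix exponential than the Taylor series used here.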
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant Nos. 61672011, 61472467 and 61471169), and the Collaboration and Innovation Center for Digital Chinese Medicine of 2011 Project of Colleges and Universities in Hunan Province.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zhang, W., Wang, SL. A Novel Method for Identifying the Potential Cancer Driver Genes Based on Molecular Data Integration. Biochem Genet 58, 16–39 (2020). https://doi.org/10.1007/s10528-019-09924-2
DOI: https://doi.org/10.1007/s10528-019-09924-2