Mining Massive Genomic Data for Therapeutic Biomarker Discovery in Cancer: Resources, Tools, and Algorithms

Big Data Analytics in Genomics

Abstract

Cancer research is undergoing an evolution driven by high-throughput technologies that make it possible to collect molecular information for the entire genome at the DNA, RNA, protein, and epigenetic levels. Because of the complex nature of cancer, several organizations have launched comprehensive molecular profiling of thousands of cancer patients, using multiple high-throughput technologies to investigate cancer genomics, transcriptomics, proteomics, and epigenomics. To speed up bench-to-bedside translation, additional efforts have been made to profile hundreds of preclinical cell line models coupled with systematic screening of anticancer agents. The result is an explosion of genomic data that shifts the bottleneck from data generation to data analytics. In this chapter, we first introduce the different types of genomic data and the publicly accessible data repositories that can be used to search for therapeutic targets for cancer treatment. We then introduce software tools frequently used for genomic data mining. Finally, we summarize working algorithms for the discovery of therapeutic biomarkers.

Author information

Corresponding author

Correspondence to Hua Li.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Tong, P., Li, H. (2016). Mining Massive Genomic Data for Therapeutic Biomarker Discovery in Cancer: Resources, Tools, and Algorithms. In: Wong, KC. (eds) Big Data Analytics in Genomics. Springer, Cham. https://doi.org/10.1007/978-3-319-41279-5_10

  • DOI: https://doi.org/10.1007/978-3-319-41279-5_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41278-8

  • Online ISBN: 978-3-319-41279-5

  • eBook Packages: Computer Science (R0)
