Mining Massive Genomic Data for Therapeutic Biomarker Discovery in Cancer: Resources, Tools, and Algorithms

Tong, Pan; Li, Hua

doi:10.1007/978-3-319-41279-5_10

Abstract

Cancer research is evolving rapidly, driven by high-throughput technologies that make it possible to collect molecular information across the entire genome at the DNA, RNA, protein, and epigenetic levels. Because cancer is a complex disease, several organizations have launched comprehensive molecular profiling of thousands of cancer patients, using multiple high-throughput technologies to investigate cancer genomics, transcriptomics, proteomics, and epigenomics. To speed up bench-to-bedside translation, additional efforts have profiled hundreds of preclinical cell line models and coupled them with systematic screening of anticancer agents. The resulting explosion of genomic data has shifted the bottleneck from data generation to data analytics. In this chapter, we first introduce the different types of genomic data and the publicly accessible data repositories that can be used to search for therapeutic targets for cancer treatment. We then introduce software tools frequently used for genomic data mining. Finally, we summarize working algorithms for the discovery of therapeutic biomarkers.

Author information

Authors and Affiliations

Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Pan Tong
School of Biomedical Engineering, Bio-ID Center, Shanghai Jiao Tong University, Shanghai, China
Hua Li

Corresponding author

Correspondence to Hua Li.

Editor information

Editors and Affiliations

Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
Ka-Chun Wong

About this chapter

Cite this chapter

Tong, P., Li, H. (2016). Mining Massive Genomic Data for Therapeutic Biomarker Discovery in Cancer: Resources, Tools, and Algorithms. In: Wong, KC. (eds) Big Data Analytics in Genomics. Springer, Cham. https://doi.org/10.1007/978-3-319-41279-5_10

DOI: https://doi.org/10.1007/978-3-319-41279-5_10
Published: 25 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41278-8
Online ISBN: 978-3-319-41279-5
eBook Packages: Computer Science, Computer Science (R0)
