Abstract
As one of the most recent advanced technologies developed for biomedical research, next generation sequencing (NGS) has opened new opportunities for the scientific discovery of genetic information. NGS is particularly useful in elucidating a genome for the analysis of DNA copy number variants (CNVs). The study of CNVs is important because many genetic studies have concluded that cancer development, genetic disorders, and other diseases are often associated with CNVs on the genome. One way to analyze NGS data for detecting the boundaries of CNV regions on a chromosome or a genome is to formulate the task as a statistical change point detection problem in the read count data. We therefore provide a statistical change point model to help detect CNVs using NGS read count data. We use a Bayesian approach to incorporate possible parameter changes in the underlying distribution of the NGS read count data, and we derive posterior probabilities for the change point inferences. Extensive simulation studies show the advantages of the proposed methods. The proposed methods are also applied to a publicly available lung cancer cell line NGS dataset, and CNV regions on this cell line are successfully identified.
Appendix
Derivation of the posterior for the constant prior
Given the priors in (5) and (6), the joint posterior is found to be:
where
where
Hence, if we can derive I1 and I2, respectively, we can multiply them to get the results for
Let
Let
Similarly, we have
where Y̅1, SS1, Y̅2 and SS2 are defined as in (short-symbol).
Let
and
Derivation of the posterior distribution for the Jeffreys prior
With the Jeffreys prior given in (prior2-lambda) and the change point location prior of (5), the joint posterior probability is given by
and then the posterior probability of the position k is
where
After integration and algebraic simplification, we obtain:
Therefore,
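As a numerical illustration of a posterior over the change point position k of the kind derived in this appendix, the following minimal Python sketch may help. It is not the authors' formula, since the displayed equations are not available here. It rests on assumptions made for illustration only: the data are normally distributed within each segment, both the mean and the variance may change at k, each segment carries a Jeffreys-type prior proportional to 1/σ², and the prior on k is uniform. Under those assumptions the segment parameters integrate out in closed form, and each segment contributes n^(-1/2) Γ((n-1)/2) (π SS)^(-(n-1)/2), where n is the segment length and SS its sum of squared deviations from the segment mean. The function names, the `min_len` parameter and the simulated data are all hypothetical.

```python
import math
import random


def segment_log_marginal(y):
    """Log marginal likelihood of one normal segment (assumed model).

    Mean and variance are integrated out under a prior proportional to
    1/sigma^2, up to a constant that is identical for every segment.
    """
    n = len(y)
    mean = sum(y) / n
    ss = sum((v - mean) ** 2 for v in y)
    ss = max(ss, 1e-12)  # guard against log(0) for a constant segment
    half = (n - 1) / 2
    return -0.5 * math.log(n) + math.lgamma(half) - half * math.log(math.pi * ss)


def change_point_posterior(y, min_len=2):
    """Posterior probability of each change point position k.

    k is the number of observations before the change, so the segments
    are y[:k] and y[k:]. A uniform prior on k is assumed, and each
    segment must contain at least `min_len` (>= 2) observations.
    """
    n = len(y)
    ks = list(range(min_len, n - min_len + 1))
    logs = [segment_log_marginal(y[:k]) + segment_log_marginal(y[k:]) for k in ks]
    top = max(logs)  # subtract the maximum before exponentiating
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return {k: w / total for k, w in zip(ks, weights)}


# Simulated example (hypothetical data): the mean shifts after 60 observations.
random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(60)]
y += [random.gauss(3.0, 1.0) for _ in range(60)]

posterior = change_point_posterior(y)
k_map = max(posterior, key=posterior.get)
print("most probable change point position:", k_map)
```

Because the same improper prior is used on both segments for every candidate k, the unspecified prior constants cancel when the weights are normalized. The sketch recomputes segment statistics for each k, which is quadratic in the series length. Cumulative sums would make it linear, at the cost of a less direct correspondence with the formula.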
©2015 by De Gruyter