Abstract
We present a comprehensive comparative analysis of five differential methylation (DM) identification methods developed for bisulfite sequencing (BS) data: methylKit, BSmooth, BiSeq, HMM-DM, and HMM-Fisher. We summarize the features of these methods from several analytical aspects and compare their performance using both simulated and real BS datasets. Our main findings are as follows. First, parameter settings can substantially affect the accuracy of DM identification. Compared with the default settings, modified parameter settings yield higher sensitivity and/or lower false positive rates. Second, all five methods are more accurate at identifying simulated DM regions that are long and have small within-group variation, but their results show low concordance, probably because of the different approaches they use for DM identification. Third, HMM-DM and HMM-Fisher yield relatively higher sensitivity and lower false positive rates than the other methods, especially in DM regions with large variation. Finally, among the three methods that involve methylation estimation (methylKit, BSmooth, and BiSeq), BiSeq best represents the raw methylation signals. Based on these results, we suggest that users select a DM identification method according to the characteristics of their data and the advantages of each method.
Acknowledgments
This work is supported by Dr. Shuying Sun’s start-up funds and the Research Enhancement Program provided by Texas State University. We are very grateful to the three anonymous reviewers for their comments and suggestions, which helped us greatly improve this manuscript.
Supplemental Material:
The online version of this article (DOI: 10.1515/sagmb-2015-0078) offers supplementary material, available to authorized users.
©2016 by De Gruyter