Abstract
Epistasis is ubiquitous in genetics and is considered one of the main factors in current efforts to detect the missing heritability of complex diseases. Simulation is a critical tool for developing methods that detect and study epistasis more effectively. Here we present epiSIM (epistasis SIMulator), a simulator that can reproduce some of the statistical properties of genetic data. EpiSIM expands the range of epistasis models offered by current simulators, covering both models that display marginal effects and models that display none. One or more of these epistasis models can be embedded simultaneously in a single simulated data set, where they jointly determine the phenotype. In addition, epiSIM does not depend on any outside data source to generate linkage disequilibrium patterns and haplotype blocks. We demonstrate the wide applicability of epiSIM through several data simulations, and examine its properties by comparing it with representative current simulators and by comparing the data it generates with real data. Our experiments show that epiSIM is a valuable addition and complement to existing epistasis simulators. The software package is available online at https://sourceforge.net/projects/episimsimulator/files/.
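To make the distinction between models with and without marginal effects concrete, the following minimal Python sketch builds a two-locus penetrance table and checks that each single-locus (marginal) penetrance is identical, so neither SNP is detectable on its own. It is an illustration only and not epiSIM's implementation: the table values, the minor allele frequency of 0.5, the assumptions of Hardy-Weinberg equilibrium and independent loci, and all function names are hypothetical choices made for this example.

```python
import random

# Hypothetical two-locus penetrance table: rows are genotypes at SNP A
# (0, 1, 2 copies of the minor allele), columns are genotypes at SNP B.
# Values are chosen for illustration; they are not taken from epiSIM.
NO_MARGINAL_TABLE = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]


def hwe_genotype_freqs(maf):
    """Genotype frequencies for 0, 1, 2 minor alleles under Hardy-Weinberg equilibrium."""
    return [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]


def marginal_penetrances(table, freqs_b):
    """Penetrance of each SNP A genotype, averaged over SNP B genotypes."""
    return [sum(f * pen for f, pen in zip(freqs_b, row)) for row in table]


def simulate_samples(table, maf_a, maf_b, n, seed=0):
    """Draw n individuals as (genotype_a, genotype_b, status), status 1 = case.

    Assumes the two loci are independent (no linkage disequilibrium).
    """
    rng = random.Random(seed)
    freqs_a = hwe_genotype_freqs(maf_a)
    freqs_b = hwe_genotype_freqs(maf_b)
    samples = []
    for _ in range(n):
        ga = rng.choices([0, 1, 2], weights=freqs_a)[0]
        gb = rng.choices([0, 1, 2], weights=freqs_b)[0]
        status = 1 if rng.random() < table[ga][gb] else 0
        samples.append((ga, gb, status))
    return samples


if __name__ == "__main__":
    # With MAF 0.5 at both loci, every marginal penetrance is the same,
    # which is what "no marginal effects" means for this table.
    print(marginal_penetrances(NO_MARGINAL_TABLE, hwe_genotype_freqs(0.5)))
    cases = sum(s[2] for s in simulate_samples(NO_MARGINAL_TABLE, 0.5, 0.5, 10000))
    print("cases drawn:", cases)
```

Because this table is symmetric, the same check applies to SNP B. A model that displays marginal effects would yield unequal values from `marginal_penetrances`; changing the allele frequencies away from 0.5 has that effect on this particular table.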
Acknowledgments
We are grateful to the anonymous reviewers, whose suggestions and comments significantly improved this paper. This work was supported by the Fundamental Research Funds for the Central Universities (Research on Pathogenic Patterns of Complex Diseases Based on DNA Methylation and SNP); the National Natural Science Foundation of China (Grant Nos. 61070137, 61070143); the Major Research Plan of the National Natural Science Foundation of China (Grant No. 91130006); the Key Program of the National Natural Science Foundation of China (Grant No. 60933009); and the Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 61100164).
Conflicts of interest
The authors have declared that no competing interests exist.
Electronic supplementary material
Below is the link to the electronic supplementary material.
13258_2013_81_MOESM1_ESM.rar
Additional file 1 Title: current version of epiSIM (version 1.0). Description: the archive includes the current version of epiSIM, a detailed manual of its usage, and the simulation data sets used in this study. (RAR 18,266 kb)
About this article
Cite this article
Shang, J., Zhang, J., Lei, X. et al. EpiSIM: simulation of multiple epistasis, linkage disequilibrium patterns and haplotype blocks for genome-wide interaction analysis. Genes Genom 35, 305–316 (2013). https://doi.org/10.1007/s13258-013-0081-9
DOI: https://doi.org/10.1007/s13258-013-0081-9