EpiSIM: simulation of multiple epistasis, linkage disequilibrium patterns and haplotype blocks for genome-wide interaction analysis

  • Research Article

Genes & Genomics

Abstract

Epistasis is a ubiquitous phenomenon in genetics and is considered one of the main factors in current efforts to account for the missing heritability of complex diseases. Simulation is a critical tool for developing methodologies that can detect and study epistasis more effectively. Here we present a simulator, epiSIM (epistasis SIMulator), that can simulate some of the statistical properties of genetic data. EpiSIM extends the range of epistasis models offered by current simulators, covering both models that display marginal effects and models that display no marginal effects. One or more of these models can be embedded simultaneously in a single simulated data set, where they jointly determine the phenotype. In addition, epiSIM generates linkage disequilibrium patterns and haplotype blocks without relying on any external data source. We demonstrate the wide applicability of epiSIM by performing several data simulations, and we examine its properties by comparing it with representative current simulators and by comparing the data it generates with real data. Our experiments demonstrate that epiSIM is a valuable addition to, and a useful complement to, existing epistasis simulators. The software package is available online at https://sourceforge.net/projects/episimsimulator/files/.
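The abstract distinguishes epistasis models that display marginal effects from those that do not. As a minimal illustration, and not epiSIM's own implementation or input format (those are described in the package manual), the sketch below represents a two-locus model as a hypothetical 3×3 penetrance table indexed by minor-allele counts. It checks whether either SNP shows a marginal effect, assuming Hardy-Weinberg equilibrium and no linkage disequilibrium between the two SNPs, and draws case/control labels from the table. The table values, minor allele frequencies, and function names are all illustrative assumptions.

```python
import math
import random

# Genotypes are coded as minor-allele counts 0, 1, 2 at each SNP.
# All names and values here are hypothetical, chosen for illustration only.

def hwe_genotype_freqs(maf):
    """Genotype frequencies for 0, 1, 2 minor alleles under Hardy-Weinberg equilibrium."""
    q = maf
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]

def marginal_penetrance(table, maf_a, maf_b):
    """P(case | genotype at SNP A) and P(case | genotype at SNP B),
    assuming the two SNPs are independent (no LD between them)."""
    fa = hwe_genotype_freqs(maf_a)
    fb = hwe_genotype_freqs(maf_b)
    by_a = [sum(fb[j] * table[i][j] for j in range(3)) for i in range(3)]
    by_b = [sum(fa[i] * table[i][j] for i in range(3)) for j in range(3)]
    return by_a, by_b

def has_no_marginal_effect(table, maf_a, maf_b, tol=1e-9):
    """True if every single-SNP genotype has the same marginal penetrance at both SNPs."""
    by_a, by_b = marginal_penetrance(table, maf_a, maf_b)
    return (all(math.isclose(x, by_a[0], abs_tol=tol) for x in by_a)
            and all(math.isclose(x, by_b[0], abs_tol=tol) for x in by_b))

def simulate_case_control(table, maf_a, maf_b, n, seed=0):
    """Draw n individuals: genotypes under HWE, disease status from the penetrance table."""
    rng = random.Random(seed)
    fa = hwe_genotype_freqs(maf_a)
    fb = hwe_genotype_freqs(maf_b)
    samples = []
    for _ in range(n):
        ga = rng.choices([0, 1, 2], weights=fa)[0]
        gb = rng.choices([0, 1, 2], weights=fb)[0]
        case = rng.random() < table[ga][gb]
        samples.append((ga, gb, int(case)))
    return samples

# Hypothetical XOR-like model: penetrance is high only when exactly one SNP is heterozygous.
XOR_LIKE = [[0.1, 0.9, 0.1],
            [0.9, 0.1, 0.9],
            [0.1, 0.9, 0.1]]

# Hypothetical model with a marginal effect: risk rises with the minor-allele count at SNP A.
ADDITIVE_A = [[0.1, 0.1, 0.1],
              [0.3, 0.3, 0.3],
              [0.5, 0.5, 0.5]]

if __name__ == "__main__":
    print(has_no_marginal_effect(XOR_LIKE, 0.5, 0.5))    # True
    print(has_no_marginal_effect(ADDITIVE_A, 0.5, 0.5))  # False
    print(has_no_marginal_effect(XOR_LIKE, 0.3, 0.3))    # False
```

With a minor allele frequency of 0.5 at both SNPs, every single-SNP genotype in the XOR-like table has a marginal penetrance of 0.5, so single-locus tests have nothing to detect. At a frequency of 0.3 the same table does produce marginal effects, so a "no marginal effect" model is only defined relative to given allele frequencies.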

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Acknowledgments

We are grateful to the anonymous reviewers whose suggestions and comments contributed to the significant improvement of this paper. This work was supported by “the Fundamental Research Funds for the Central Universities” (Research on Pathogenic Patterns of Complex Diseases Based on DNA Methylation and SNP); the National Natural Science Foundation of China (Grant No. 61070137, 61070143); the Major Research Plan of the National Natural Science Foundation of China (Grant No. 91130006); the Key Program of the National Natural Science Foundation of China (Grant No. 60933009); the Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 61100164).

Conflicts of interest

The authors have declared that no competing interests exist.

Author information

Corresponding authors

Correspondence to Junliang Shang or Junying Zhang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

13258_2013_81_MOESM1_ESM.rar

Additional file 1 Title: current version of epiSIM (version 1.0). Description: the archive includes the current version of epiSIM, a detailed manual of its usage, and the simulation data sets used in this study. (RAR 18,266 kb)

About this article

Cite this article

Shang, J., Zhang, J., Lei, X. et al. EpiSIM: simulation of multiple epistasis, linkage disequilibrium patterns and haplotype blocks for genome-wide interaction analysis. Genes Genom 35, 305–316 (2013). https://doi.org/10.1007/s13258-013-0081-9

  • DOI: https://doi.org/10.1007/s13258-013-0081-9
