Phylogenetic Methods for Genome-Wide Association Studies in Bacteria

  • Protocol
Bacterial Pangenomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 2242)

Abstract

Genome-wide association studies in bacteria have great potential to improve our understanding of the genetic basis of many biologically important phenotypes, including antibiotic resistance, pathogenicity, and host adaptation. Such studies must, however, account for the specific features of bacterial genomics, especially population structure, homologous recombination, and genomic plasticity. A powerful way to tackle this challenge is to use a phylogenetic approach, which builds on long-standing methodology for the evolutionary analysis of bacterial genomic data. Here we present both the theoretical and practical aspects of using phylogenetic methods for bacterial genome-wide association studies.

Author information

Corresponding author

Correspondence to Xavier Didelot.

Copyright information

© 2021 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Didelot, X. (2021). Phylogenetic Methods for Genome-Wide Association Studies in Bacteria. In: Mengoni, A., Bacci, G., Fondi, M. (eds) Bacterial Pangenomics. Methods in Molecular Biology, vol 2242. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1099-2_13

  • DOI: https://doi.org/10.1007/978-1-0716-1099-2_13

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1098-5

  • Online ISBN: 978-1-0716-1099-2

  • eBook Packages: Springer Protocols
