Abstract
Genome-wide association studies in bacteria have great potential to deliver a better understanding of the genetic basis of many biologically important phenotypes, including antibiotic resistance, pathogenicity, and host adaptation. Such studies must, however, account for the specific features of bacterial genomics, especially population structure, homologous recombination, and genomic plasticity. A powerful way to tackle this challenge is to use a phylogenetic approach, which builds on long-standing methodology for the evolutionary analysis of bacterial genomic data. Here we present both the theoretical and practical aspects of using phylogenetic methods for bacterial genome-wide association studies.
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
Cite this protocol
Didelot, X. (2021). Phylogenetic Methods for Genome-Wide Association Studies in Bacteria. In: Mengoni, A., Bacci, G., Fondi, M. (eds) Bacterial Pangenomics. Methods in Molecular Biology, vol 2242. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1099-2_13
DOI: https://doi.org/10.1007/978-1-0716-1099-2_13
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1098-5
Online ISBN: 978-1-0716-1099-2
eBook Packages: Springer Protocols