Assembling a Reference Phylogenomic Tree of Bacteria and Archaea by Summarizing Many Gene Phylogenies

  • Protocol
Environmental Microbial Evolution

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2569))

Abstract

Phylogenomics is the inference of phylogenetic trees from multiple marker genes sampled from the genomes of interest. A major challenge in phylogenomics is incongruence among the evolutionary histories of individual genes, which can be widespread in microorganisms because of the prevalence of horizontal gene transfer. This protocol describes how to build a phylogenetic tree of a large number of microbial genomes using a broad sampling of marker genes that are representative of whole-genome evolution. It highlights the use of a gene tree summary method, which can effectively reconstruct the species tree while accounting for topological conflicts among individual gene trees. The pipeline scales to tens of thousands of genomes while retaining high accuracy. We discuss multiple software tools, libraries, and scripts that enable convenient adoption of the protocol. The protocol is suitable for microbiology and microbiome studies based on public genomes and metagenomic data.
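
As a rough illustration of how a gene tree summary method can account for topological conflict, the sketch below tallies, for one set of four taxa, which unrooted quartet topology each gene tree supports. Quartet counting is one common basis for summary methods; the abstract does not say which method the protocol uses, so this is an assumption made for illustration. The toy trees, the taxon names (A to E), and the function names are all hypothetical.

```python
from collections import Counter


def leaf_sets(tree):
    """Return (leaves of this subtree, leaf sets of all its internal nodes).

    A tree is a nested tuple; a leaf is a string (hypothetical toy encoding).
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def quartet_topology(tree, quartet):
    """Return the split (e.g. AB|CD) the tree induces on four taxa.

    Returns None if a taxon is missing or the quartet is unresolved.
    """
    q = frozenset(quartet)
    leaves, clades = leaf_sets(tree)
    if len(q) != 4 or not q <= leaves:
        return None
    for clade in clades:
        inside = clade & q
        if len(inside) == 2:
            # A clade holding exactly two of the four taxa defines the split.
            return frozenset([inside, q - inside])
    return None


def quartet_counts(gene_trees, quartet):
    """Count how many gene trees support each topology of one quartet."""
    counts = Counter()
    for tree in gene_trees:
        topology = quartet_topology(tree, quartet)
        if topology is not None:
            counts[topology] += 1
    return counts


# Four toy gene trees; the third conflicts with the others,
# as might happen after a horizontal gene transfer.
gene_trees = [
    ((("A", "B"), "C"), "D"),
    (("A", "B"), ("C", "D")),
    ((("A", "C"), "B"), "D"),
    (("A", ("E", "B")), ("C", "D")),
]

counts = quartet_counts(gene_trees, ("A", "B", "C", "D"))
best = counts.most_common(1)[0][0]
for topology, n in counts.items():
    print(" | ".join("".join(sorted(side)) for side in topology), n)
```

The majority topology in `best` is the one a quartet-based summary would favor for these four taxa. A real summary method combines such evidence over all quartets and searches for the species tree that agrees with the most of them, which this sketch does not attempt.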

Author information

Corresponding author

Correspondence to Qiyun Zhu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Zhu, Q., Mirarab, S. (2022). Assembling a Reference Phylogenomic Tree of Bacteria and Archaea by Summarizing Many Gene Phylogenies. In: Luo, H. (eds) Environmental Microbial Evolution. Methods in Molecular Biology, vol 2569. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2691-7_7

  • DOI: https://doi.org/10.1007/978-1-0716-2691-7_7

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2690-0

  • Online ISBN: 978-1-0716-2691-7

  • eBook Packages: Springer Protocols
