Review
Gene tree discordance, phylogenetic inference and the multispecies coalescent

https://doi.org/10.1016/j.tree.2009.01.009

The field of phylogenetics is entering a new era in which trees of historical relationships between species are increasingly inferred from multilocus and genomic data. A major challenge for incorporating such large amounts of data into inference of species trees is that conflicting genealogical histories often exist in different genes throughout the genome. Recent advances in genealogical modeling suggest that resolving close species relationships is not quite as simple as applying more data to the problem. Here we discuss the complexities of genealogical discordance and review the issues that new methods for multilocus species tree inference will need to address to account successfully for naturally occurring genomic variability in evolutionary histories.

The problem of gene tree discordance

Until recently, the state of the art for molecular phylogenetic studies typically involved (i) sequencing a gene in individual representatives of a collection of species; (ii) inferring a ‘gene tree’ (see Glossary) for the sequences; and (iii) declaring the gene tree to be the estimate of the tree of species relationships. With the increasing abundance of molecular data and the recognition that evolutionary trees from different genes often have conflicting branching patterns 1, 2, 3, 4, 5, 6, 7

The multispecies coalescent

Coalescent theory 1, 2, 17, which models genealogies within populations, can be used to investigate probabilities that gene trees have branching patterns (topologies) that differ from a species tree topology. The basic model, which we call the ‘multispecies coalescent,’ generalizes the Wright-Fisher model of genetic drift 18, 19, 20, applying it to multiple populations connected by an evolutionary tree.
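As a minimal illustration of how such probabilities can be computed, the sketch below treats the simplest case: three species with one sampled lineage each, and a species tree ((A,B),C) whose internal branch has length T in coalescent units. Under these assumptions the classical three-taxon result gives a probability of 1 − (2/3)e^(−T) that the gene tree matches the species tree. That formula is not quoted in this excerpt, and the function names are hypothetical, chosen only for illustration. A small simulation of the same process is included as a check.

```python
import math
import random


def concordance_probability(t_internal: float) -> float:
    """Probability that a three-taxon gene tree matches the species tree.

    t_internal is the internal branch length in coalescent units
    (an assumption of this sketch; one lineage is sampled per species).
    """
    return 1.0 - (2.0 / 3.0) * math.exp(-t_internal)


def simulate_concordance(t_internal: float, n_loci: int = 20000, seed: int = 1) -> float:
    """Estimate the same probability by simulating independent loci."""
    rng = random.Random(seed)
    concordant = 0
    for _ in range(n_loci):
        # Two lineages in the internal branch coalesce at rate 1
        # per coalescent time unit.
        if rng.expovariate(1.0) < t_internal:
            concordant += 1  # sister lineages coalesced first: concordant
        elif rng.randrange(3) == 0:
            # All three lineages reach the root population; each of the
            # three pairs is equally likely to coalesce first, and only
            # one of those pairs yields the species tree topology.
            concordant += 1
    return concordant / n_loci


if __name__ == "__main__":
    for t in (0.1, 1.0, 5.0):
        print(t, concordance_probability(t), simulate_concordance(t))
```

With a short internal branch the concordance probability approaches 1/3, meaning that all three topologies are nearly equally likely. At T = 1 it is roughly 0.75, and at T = 5 it exceeds 0.99.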

The coalescent for a single population traces the ancestries of a subset of individual copies

Conceptual basis for discordance

Given enough time measured in coalescent time units (Box 2), lineages within a population coalesce with high probability. After ∼5N_e generations along species tree branches, where N_e is the effective number of chromosomes, lineages are likely to have coalesced within each population, and monophyly of lineages (and, therefore, congruence between gene trees and the species tree) is probable 3, 25, 29, 41, 42. With shorter branches, multiple gene lineages tend to persist into deeper portions of

Gene tree probabilities

Probability calculations for properties of gene trees given a species tree are important for understanding the magnitude of genealogical discordance, for predicting the behavior of phylogenetic algorithms and for assessing the fit of the multispecies coalescent. Such computations rely on the concept of coalescent histories, which for a given gene tree and species tree topology represent the sequences of species tree branches on which gene tree coalescences can occur (online Supplementary Box S1

Species tree inference

Discordant gene trees contain information about features of the species tree, such as its topology, divergence times and population sizes. Conflicting gene trees therefore provide a basis for inferring species trees using procedures that do not simply equate the estimated species tree with a single estimated gene tree. A desirable property for methods that estimate species trees is statistical consistency: an estimator should converge on the true species tree as more individuals, longer DNA

Conclusions

Conflicts between gene trees estimated at different loci have sometimes been seen as obstacles for inferring phylogenies. However, we suggest that gene tree conflict provides an opportunity to obtain information regarding the processes that have shaped organismal genomes. Researchers have used conflicting gene genealogies to infer ancestral population parameters such as population size and divergence times 30, 72, and to examine species divergence processes 11, 36. It is only recently, however,

Acknowledgements

We thank M. DeGiorgio, S. Edwards, M. Slatkin and two anonymous reviewers for comments. This work was supported by grants from the National Science Foundation (DEB-0716904), the Burroughs Wellcome Foundation and the Alfred P. Sloan Foundation.

Glossary

Ancestral polymorphism
the existence of more than one allele at a locus in an ancestral population; through incomplete lineage sorting, polymorphisms can persist through species divergences, resulting in misleading similarities of DNA sequences that do not necessarily reflect population relationships.
Anomalous gene tree (AGT)
a gene tree topology that is more probable than the gene tree topology that matches the species tree topology.
Anomaly zone
for a given species tree topology, the set of

References (80)

  • F. Tajima. Evolutionary relationship of DNA sequences in finite populations. Genetics (1983)
  • R.R. Hudson. Testing the constant-rate neutral allele model with protein sequence data. Evolution Int. J. Org. Evolution (1983)
  • M. Nei. Molecular Evolutionary Genetics (1987)
  • P. Pamilo et al. Relationships between gene trees and species trees. Mol. Biol. Evol. (1988)
  • J. Felsenstein. Phylogenies from molecular sequences: inference and reliability. Annu. Rev. Genet. (1988)
  • W.P. Maddison. Gene trees in species trees. Syst. Biol. (1997)
  • I. Ebersberger. Mapping human genetic ancestry. Mol. Biol. Evol. (2007)
  • J. Syring. Widespread genealogical nonmonophyly in species of Pinus subgenus Strobus. Syst. Biol. (2007)
  • K. Takahashi. Phylogenetic relationships and ancient incomplete lineage sorting among cichlid fishes in Lake Tanganyika as revealed by analysis of the insertion of retroposons. Mol. Biol. Evol. (2001)
  • W.B. Jennings et al. Speciational history of Australian grassfinches (Poephila) inferred from thirty gene trees. Evolution Int. J. Org. Evolution (2005)
  • B.C. Carstens et al. Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: an example from Melanoplus grasshoppers. Syst. Biol. (2007)
  • D.A. Pollard. Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting. PLoS Genet. (2006)
  • J.F.C. Kingman. On the genealogy of large populations. J. Appl. Probab. (1982)
  • M. Nordborg. Coalescent theory
  • J. Hein. Gene Genealogies, Variation and Evolution (2005)
  • J. Wakeley. Coalescent Theory (2009)
  • J.C. Avise. Phylogeography (2000)
  • D.J. Funk et al. Species-level paraphyly and polyphyly: frequency, causes and consequences, with insights from animal mitochondrial DNA. Annu. Rev. Ecol. Evol. Syst. (2003)
  • N. Takahata. Gene genealogy in three related populations: consistency probability between gene and population trees. Genetics (1989)
  • J.H. Degnan et al. Gene tree distributions under the coalescent process. Evolution Int. J. Org. Evolution (2005)
  • J.H. Degnan et al. Discordance of species trees with their most likely gene trees. PLoS Genet. (2006)
  • M. Slatkin et al. The concordance of gene trees and species trees at two linked loci. Genetics (2006)
  • W.P. Maddison et al. Inferring phylogeny despite incomplete lineage sorting. Syst. Biol. (2006)
  • N.A. Rosenberg. The shapes of neutral gene genealogies in two species: probabilities of monophyly, paraphyly, and polyphyly in a coalescent model. Evolution Int. J. Org. Evolution (2003)
  • B. Rannala et al. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics (2003)
  • L. Liu. Estimating species trees using multiple-allele DNA sequence data. Evolution Int. J. Org. Evolution (2008)
  • J. Felsenstein. Inferring Phylogenies (2004)
  • W.J. Ewens. Mathematical Population Genetics (2004)
  • J. Wakeley. The effects of subdivision on the genetic divergence of populations and species. Evolution Int. J. Org. Evolution (2000)
  • J. Hey et al. The study of structured populations – new hope for a difficult and divided science. Nat. Rev. Genet. (2003)