Deep metazoan phylogeny: When different genes tell different stories

https://doi.org/10.1016/j.ympev.2013.01.010

Abstract

Molecular phylogenetic analyses have produced a plethora of controversial hypotheses regarding the patterns of diversification of non-bilaterian animals. To unravel the causes for the patterns of extreme inconsistencies at the base of the metazoan tree of life, we constructed a novel supermatrix containing 122 genes, enriched with non-bilaterian taxa. Comparative analyses of this supermatrix and its two non-overlapping multi-gene partitions (including ribosomal and non-ribosomal genes) revealed conflicting phylogenetic signals. We show that the levels of saturation and long branch attraction artifacts in the two partitions correlate with gene sampling. The ribosomal gene partition exhibits significantly lower saturation levels than the non-ribosomal one. Additional systematic errors derive from significant variations in amino acid substitution patterns among the metazoan lineages that violate the stationarity assumption of evolutionary models frequently used to reconstruct phylogenies. By modifying gene sampling and the taxonomic composition of the outgroup, we were able to construct three different yet well-supported phylogenies. These results show that the accuracy of phylogenetic inference may be substantially improved by selecting genes that evolve slowly across the Metazoa and applying more realistic substitution models. Additional sequence-independent genomic markers are also necessary to assess the validity of the phylogenetic hypotheses.

Highlights

► Deep metazoan phylogeny was tested using non-overlapping multi-gene matrices.
► Different partitions produce conflicting phylogenies.
► Level of saturation and LBA artifacts depend on gene sampling strategy.
► Ctenophora-basal and the sponge paraphyly correlate with higher saturation.
► Genes involved in translation support the Coelenterata and monophyly of Porifera.

Introduction

The historical sequence of early animal diversification events has been the subject of debate for approximately a century. Morphological character analyses leave a degree of uncertainty concerning the evolutionary relationships among the five major metazoan lineages: Porifera, Placozoa, Ctenophora, Cnidaria, and Bilateria (Collins et al., 2005). In the last few years, this debate has been fueled by a plethora of conflicting phylogenetic hypotheses generated using molecular data (Dunn et al., 2008, Erwin et al., 2011, Philippe et al., 2009, Pick et al., 2010, Schierwater et al., 2009, Sperling et al., 2009). The persisting controversy includes questions concerning the earliest diverging animal lineage (Porifera vs. Placozoa vs. Ctenophora), the validity of the Eumetazoa (Bilateria + Cnidaria + Ctenophora) and Coelenterata (Cnidaria + Ctenophora) clades, and relationships among the main lineages of Porifera (sponges; reviewed in Wörheide et al. (2012)). These questions are fundamental for understanding the evolution of both animal body plans and genomes (Philippe et al., 2009).

In 2003, Rokas and co-authors (Rokas et al., 2003a) showed that the evolutionary relationships between major metazoan lineages cannot be resolved using single genes or a small number of protein-coding sequences. Because of the high stochastic error, the analyses of the individual genes resulted in conflicting phylogenies. These authors also observed that at least 8000 randomly selected characters (>20 genes) are required to overcome the effect of these discrepancies (Rokas et al., 2003b). However, the authors’ subsequent attempt at resolving the deep metazoan relationships using a large dataset containing 50 genes from 17 metazoan taxa (including six non-bilaterian species) was not successful (Rokas et al., 2005). By contrast, the analysis of the identical set of genes robustly resolved the higher-level phylogeny of Fungi, a group of approximately the same age as the Metazoa (Yuan et al., 2005). Based on this result, these authors concluded that because of the rapidity of the metazoan radiation, the true phylogenetic signal preserved on the deep internal branches was too low to reliably deduce their branching order (Rokas and Carroll, 2006). However, this conclusion did not discourage scientists from further attempts at resolving this difficult phylogenetic question using the traditional sequence-based phylogenetic approach. The main strategy of the subsequent studies was increasing the amount of data, including both gene and taxon sampling. In 2008, a novel hypothesis of early metazoan evolution was proposed by Dunn et al. (2008) based on the analysis of 150 nuclear genes (21,152 amino acid [aa] characters) from 71 metazoan taxa (however, with only nine non-bilaterian species among them). According to this hypothesis, ctenophores represent the most ancient, earliest diverging branch of the Metazoa. 
This evolutionary scenario did not gain any support from the analysis of another large alignment that contained 128 genes (30,257 aa) and a larger number of non-bilaterian metazoan species (22; Philippe et al., 2009). This study revived the Coelenterata and Eumetazoa hypotheses (Hyman, 1940) and placed the Placozoa as the sister-group of the Eumetazoa. Another scenario for early metazoan evolution was proposed by Schierwater et al. (2009) based on the analysis of a dataset that included not only nuclear protein-coding genes but also mitochondrial genes and morphological characters (a “total evidence” dataset). This study reconstructed monophyletic “Diploblasta” (i.e., non-bilaterian metazoans) with a “basal” Placozoa as the sister-group of the Bilateria.

Recently published metazoan phylogenies differ in their taxon and gene sampling and in their application of phylogenetic methods and thresholds, including the use of different models of amino acid substitution. Any of these factors may be a source of the observed incongruity among the proposed deep metazoan phylogenies (Dunn et al., 2008, Philippe et al., 2009, Schierwater et al., 2009). Comparative analyses of the three above-described multi-gene alignments showed that the observed conflict can be partially attributed to the presence of contamination, alignment errors, and reliance on simplified evolutionary models (Philippe et al., 2011) or to long branch attraction artifacts caused by insufficient ingroup taxon sampling (Pick et al., 2010). Correcting the alignment errors in the datasets by Dunn et al. (2008) and Schierwater et al. (2009) and applying an evolutionary model that best fit these data altered both the tree topology and basal node support but failed to resolve the incongruences among the three phylogenies.

The objective of the present study is to further assess the causes of inconsistency between deep (non-bilaterian) metazoan phylogenies obtained using phylogenomic (large multi-gene) datasets with a main emphasis on the effect of gene sampling. We approached this question with multiple comparative analyses of a novel phylogenomic dataset with two multi-gene sub-matrices that have identical taxon samplings, comparable lengths, and missing data percentage but different gene contents. We also increased the taxon sampling by adding new data from non-bilaterian lineages, including seven Porifera species, one Ctenophora species, and a novel placozoan strain.
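The comparison design described above, two sub-matrices cut from one supermatrix with identical taxon sampling but different gene content, can be illustrated with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the toy sequences, the gene names and boundaries, the class labels, and the choice to count `-`, `?`, and `X` as missing data are assumptions, not the study's data or pipeline.

```python
# Hypothetical toy supermatrix: taxon -> concatenated amino acid sequence.
supermatrix = {
    "Taxon_A": "MKVLA-GTRSPQ",
    "Taxon_B": "MKIL??GTKSPQ",
    "Taxon_C": "MRVLAAGT----",
}

# Hypothetical gene boundaries (0-based, end-exclusive) and gene classes.
genes = [
    ("rpl1", 0, 4, "ribosomal"),
    ("geneX", 4, 8, "non-ribosomal"),
    ("rps2", 8, 12, "ribosomal"),
]

# Assumption: gaps, '?' and 'X' all count as missing data.
MISSING = set("-?X")


def split_by_class(matrix, gene_table):
    """Build one sub-matrix per gene class, keeping every taxon in each."""
    parts = {}
    for _name, start, end, gene_class in gene_table:
        sub = parts.setdefault(gene_class, {taxon: "" for taxon in matrix})
        for taxon, seq in matrix.items():
            sub[taxon] += seq[start:end]
    return parts


def missing_fraction(matrix):
    """Fraction of all cells in the matrix that are missing characters."""
    total = sum(len(seq) for seq in matrix.values())
    missing = sum(ch in MISSING for seq in matrix.values() for ch in seq)
    return missing / total if total else 0.0


partitions = split_by_class(supermatrix, genes)
for gene_class, sub in partitions.items():
    length = len(next(iter(sub.values())))
    print(gene_class, "taxa:", len(sub), "length:", length,
          "missing:", round(missing_fraction(sub), 3))
```

Because every taxon is carried into both sub-matrices, differences between trees inferred from the two partitions can be attributed to gene content rather than to taxon sampling, provided that lengths and missing-data fractions are also comparable.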

Data acquisition

New data were generated for nine species of non-bilaterian metazoans, including one ctenophore, Beroe sp., an unidentified placozoan species (Placozoan strain H4), and seven sponges: Asbestopluma hypogea, Ephydatia muelleri, Pachydictyum globosum, Tethya wilhelma (all from class Demospongiae), Crateromorpha meyeri (class Hexactinellida), Corticium candelabrum (class Homoscleromorpha), (Expressed Sequence Tag [EST] libraries), and Sycon ciliatum (class Calcarea; EST and genomic data). The data

Different gene matrices tell different stories

The ProtTest analyses indicated that LG + Γ + I was the evolutionary model that best fit the majority of the single-gene alignments in a Maximum Likelihood (ML) framework. However, a further statistical comparison (cross-validation test; Stone, 1974) extended to more complex evolutionary models rejected the LG in favor of GTR (scores of 383 and 61 in favor of GTR for the ribosomal and non-ribosomal matrices, respectively), which, in turn, was outperformed by both the Bayesian CAT (with a score

Why do different genes tell different stories?

The multiple conflicting metazoan phylogenies presented here and in previous publications (Dunn et al., 2008, Erwin et al., 2011, Philippe et al., 2009, Pick et al., 2010, Schierwater et al., 2009, Sperling et al., 2009, Srivastava et al., 2010) have one feature in common: they have long terminal and short internal branches. Frequently, such a topology is a sign of ancient rapid radiations, which are closely spaced diversification events that occurred deep in time (Rokas et al., 2003a; Rokas et

Conclusions

This study shows an extreme sensitivity of the higher-level metazoan phylogeny to the gene composition of the phylogenomic matrices. The gene sampling strategy determines the level of saturation and LBA biases in the resulting phylogenies. According to our results, a careful a priori (i.e., post-sequencing and before analyses) selection of genes that evolve slowly across all metazoan lineages helps to decrease systematic errors and recover the phylogenetic signal from the noise. Using this

Author contributions

G.W. conceived the research and obtained the funding; T.N. and G.W. designed the research; T.N. and F.S. analyzed the data; M.A., Mn.A., M.E., J.H., B.S., W.M., M.W. and G.W. provided data; M.M., M.N., and J.V. provided samples; M.M. contributed to manuscript revision; and T.N. and G.W. wrote the paper.

Acknowledgments

We thank S. Leys, B. Bergum, Ch. Arnold, M. Krüß, and E. Gaidos for providing samples; M. Kube and his team (MPI for Molecular Genetics, Berlin, Germany) for library construction; I. Ebersberger and his team (Center for Integrative Bioinformatics, Vienna, Austria) for data processing; and K. Nosenko for the artwork. This work was financially supported by the German Research Foundation (DFG Priority Program SPP1174 “Deep Metazoan Phylogeny,” Projects Wo896/6 and WI 2216/2-2). M.A. and Mn.A.

References (84)

  • J.L. Thorley et al. Testing the phylogenetic stability of early tetrapods. Journal of Theoretical Biology (1999)
  • G. Wörheide et al. Deep phylogeny and evolution of sponges (Phylum Porifera)
  • F. Abascal et al. ProtTest: selection of best-fit models of protein evolution. Bioinformatics (2005)
  • J. Bergsten. A review of long-branch attraction. Cladistics (2005)
  • R.B. Bevan et al. Calculating the evolutionary rates of different genes: a fast, accurate estimator with applications to maximum likelihood phylogenetic analysis. Systematic Biology (2005)
  • C. Bleidorn et al. On the phylogenetic position of Myzostomida: can 77 genes get it wrong? BMC Evolutionary Biology (2009)
  • H. Brinkmann et al. Archaea sister group of bacteria? Indications from tree reconstruction artifacts in ancient phylogenies. Molecular Biology and Evolution (1999)
  • S. Capella-Gutiérrez et al. TrimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics (2009)
  • J.M. Carlton et al. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science (2007)
  • C.I. Castillo-Davis et al. The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Research (2004)
  • Codon Usage Database....
  • A.G. Collins et al. Phylogenetic context and basal metazoan model systems. Integrative and Comparative Biology (2005)
  • I. Comas et al. From phylogenetics to phylogenomics: the evolutionary relationships of insect endosymbiotic gamma-proteobacteria as a test case. Systematic Biology (2007)
  • M. Donoghue et al. The suitability of molecular and morphological evidence in reconstructing plant phylogeny
  • C.W. Dunn et al. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature (2008)
  • I. Ebersberger et al. A consistent phylogenetic backbone for the fungi. Molecular Biology and Evolution (2011)
  • R.C. Edgar. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research (2004)
  • D.H. Erwin et al. The Cambrian conundrum: early divergence and later ecological success in the early history of animals. Science (2011)
  • J. Felsenstein. A likelihood approach to character weighting and what it tells us about parsimony and compatibility. Biological Journal of the Linnean Society (1978)
  • J. Felsenstein. Parsimony in systematics: biological and statistical issues. Annual Review of Ecology and Systematics (1983)
  • M. Fourment et al. PATRISTIC: a program for calculating patristic distances and graphically comparing the components of genetic change. BMC Evolutionary Biology (2006)
  • J. Gatesy et al. Hidden likelihood support in genomic data: can forty-five wrongs make a right? Systematic Biology (2005)
  • A. Graybeal. Evaluating the phylogenetic utility of genes – a search for genes informative about deep divergences among vertebrates. Systematic Biology (1994)
  • E. Haeckel. Generelle Morphologie der Organismen (1866)
  • D.M. Hillis. Taxonomic sampling, phylogenetic accuracy, and investigator bias. Systematic Biology (1998)
  • B.R. Holland et al. Outgroup misplacement and phylogenetic inaccuracy under a molecular clock – a simulation study. Systematic Biology (2003)
  • H. Hori et al. The rates of evolution in some ribosomal components. Journal of Molecular Evolution (1977)
  • J. Hughes et al. Dense taxonomic EST sampling and its applications for molecular systematics of the Coleoptera (beetles). Molecular Biology and Evolution (2006)
  • L. Hyman. The Invertebrates: Protozoa through Ctenophora (1940)
  • N. King et al. The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature (2008)
  • I. Landais et al. Annotation pattern of ESTs from Spodoptera frugiperda Sf9 cells and analysis of the ribosomal protein genes reveal insect-specific features and unexpectedly low codon usage bias. Bioinformatics (2003)
  • N. Lartillot et al. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Molecular Biology and Evolution (2004)

1 Current address: Swire Institute of Marine Science, School of Biological Sciences, The University of Hong Kong, Hong Kong