Research paper
Rooted phylogeny of the three superkingdoms
Introduction
Charles Darwin's formulations of descent with modification and natural selection were rooted in the nineteenth century's Earth sciences and in the perspective of long-term, continuous evolution of biological diversity inferred from the geological record [1], [2]. Nevertheless, a concurrent but contrasting view of abrupt evolutionary transition was championed by Georges Cuvier, who, among others, was influenced by observations of cataclysmic shifts of geological strata associated with abrupt breaks in the fossil succession [2]. Likewise, Louis Agassiz recognized the ice ages as powerful environmental signatures capable of punctuating the fossil record [2]. Though the Darwinian view now dominates evolutionary thought, six well-documented major global extinctions have been identified in the most recent 800 MY of the geologic record [3], [4]. Five of these have been identified as global events in a paleontologic record that is also punctuated by numerous less extensive extinctions [3], [4]. The sixth event, the so-called “Snowball Earth” of the late Neoproterozoic, may have produced survivors that were the ancestors of the Cambrian radiation [4]. Is the footprint of such a catastrophic event recognizable in the phylogenies of modern organisms?
When conditions are as inhospitable to life as in the Snowball Earth scenario, the culling of species might be so extreme that few clades survive to propagate when conditions become more tolerable. If all but one or a few clades had been eliminated in such a mass extinction, the survivors, though originally crown organisms, would appear from a present-day perspective to be the root ancestors of a new tree of phylogenetic diversity.
One sure sign that the present crown of the phylogenetic tree had been re-rooted and diversified after a mass extinction would be the identification, at the root of the modern tree, of an ancestor with an incongruously complex body plan, such as that of a frog. Frogs are no one's idea of the first cellular common ancestor of the global phylogenetic tree. Equally incongruous would be a genome that encodes three-fourths of all the compact protein domains so far identified in proteomes of the three modern superkingdoms. Such a genome is no one's idea of the genome of the ur-ancestor, i.e. the first cell and the root of Earth's first phylogenetic tree. It is just such an incongruously complex genome that we have reconstructed for the most recent universal common ancestor (MRUCA) of the modern crown of global phylogeny. Accordingly, we suggest that the modern crown is a re-diversified tree rooted in complex survivors of mass extinction events that occurred some time before the Cambrian radiation.
The data supporting these interpretations were obtained by phylogenetic analysis of roughly 1700 compact protein domains, each representing a cohort of structural and functional homologs that were identified by hidden Markov model annotation at the superfamily level in hundreds of genomes [5], [6]. The resulting genome content cladogram (tree) links archaea and bacteria (the akaryotes) as sister clades that diverge from a last akaryote common ancestor (LACA). In parallel, several eukaryote sister clades diverge from a last eukaryote common ancestor (LECA). Here, LACA and LECA diverge independently from a more complex MRUCA. Reconstructions of the proteomes of the three ancestors, LACA, LECA and MRUCA, confirm the independent divergence of akaryotes and eukaryotes.
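To make the notion of "genome content" concrete, the following is a minimal sketch of how superfamily occurrences could be encoded as presence/absence characters and compared between genomes. The genome names, the superfamily identifiers and the use of a Jaccard distance are illustrative assumptions only; they are not taken from the study, and this is not the tree reconstruction procedure used here.

```python
from itertools import combinations

# Hypothetical toy data: genome name -> set of superfamily identifiers.
# Names and identifiers are placeholders, not values from the study.
genome_content = {
    "archaeon_A": {"sf1", "sf2", "sf3"},
    "bacterium_B": {"sf1", "sf2", "sf4"},
    "eukaryote_E": {"sf1", "sf3", "sf5", "sf6"},
}


def presence_absence_matrix(content):
    """Return (sorted superfamily list, {genome: [0/1, ...]})."""
    superfamilies = sorted(set().union(*content.values()))
    matrix = {
        genome: [1 if sf in sfs else 0 for sf in superfamilies]
        for genome, sfs in content.items()
    }
    return superfamilies, matrix


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two sets of superfamilies."""
    union = a | b
    if not union:
        return 0.0  # two empty genomes are treated as identical
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(content):
    """Return {(genome1, genome2): distance} for every unordered pair."""
    return {
        (g1, g2): jaccard_distance(content[g1], content[g2])
        for g1, g2 in combinations(sorted(content), 2)
    }


superfamilies, matrix = presence_absence_matrix(genome_content)
distances = pairwise_distances(genome_content)
```

In such an encoding each superfamily is one binary character per genome, so the matrix rows are the kind of input that a character-based or distance-based tree-building method would take; the choice of method is left open in this sketch.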
Speculations concerning the endosymbiotic origins of mitochondria and chloroplasts, based on the previous identification of bacteria as the root of sequence-based gene trees [7], [8], [9], [10], are not supported by the present data. Instead, genome content-based trees confirm the numerous challenges to the bacterial rooting of modern phylogeny, along with the rejection of the evolutionary schemes such trees claim to support [11], [12], [13], [14], [15], [16], [17], [18], [19]. In brief, the data suggest that most of the protein elements necessary for the construction of cells of the three superkingdoms, including eukaryote organelles, were already expressed in the bottlenecked population that re-rooted the phylogenetic tree following a cataclysmic collapse of the biosphere. According to our phylogenetic reconstructions, bacteria and archaea are not identifiable as ancestors of eukaryotes. Instead, they diverge from a common ancestor independently of the eukaryotes as highly specialized, fast-growing unicellular organisms that have evolved efficient simplicity as the hallmark of their cellular architectures [20], [21] to survive predation by their relatively complex eukaryote cousins [19].
Data sources
Structural and functional annotations of proteins from completely sequenced genomes were obtained from the SUPERFAMILY (1.75) database. Here, annotations are based on hidden Markov models (HMM) that identify recurrent protein domains at the superfamily level of the SCOP (Structural Classification of Proteins) hierarchy [22]. In this hierarchy, the domains correspond to stable tertiary folds that have been identified by X-ray crystallographic and/or NMR spectroscopic methods [22]. At the
Phylogenomic approach
One general reason for abandoning sequence-based reconstructions for deep rooting of phylogeny is that, contrary to their label, they are not, strictly speaking, “sequence-based”. Instead, most of the sequence information is lost in reconstructions because they are in reality “alignment composition-based”, which enhances their vulnerability to distortion over long evolutionary distances [14], [32], [33]. We have chosen instead to reconstruct phylogeny based on genome content of SFs for several
A view from the crown
The present genome content trees (Fig. 5) identify archaea and bacteria (akaryotes) as sister clades that diverge from an akaryote common ancestor, LACA. Several eukaryote sister clades diverge from a eukaryote common ancestor, LECA. In effect, LACA and LECA descend independently in parallel from the most recent universal common ancestor (MRUCA), which is not a bacterium but a very complex ancestor with a proteome featuring homologies to many eukaryote SFs as well as to many akaryote SFs.
The
Acknowledgments
We thank Minglei Wang, K. M. Kim, and G. Caetano-Anolles for teaching us about superfamilies; S. G. E. Andersson, Otto Berg, Björn Canbäck, M. A. Huynen, David Penny, Susannah Porter and I. Winkler for often scathing criticism; the Swedish Science Council (VR) for support to A. T.; the Nobel Committee for Chemistry of the Royal Swedish Science Academy and the Royal Physiographic Society, Lund for support to CGK.
References (58)
- et al.
SCOP: a structural classification of proteins database for the investigation of sequences and structures
J. Mol. Biol.
(1995)
- et al.
Accounting for evolutionary rate variation among sequence sites consistently changes universal phylogenies deduced from rRNA and protein-coding genes
Mol. Phylogenet. Evol.
(1999)
- et al.
Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure
J. Mol. Biol.
(2001)
The holy grail of the perfect character: the cladistic treatment of morphometric data
Cladistics
(1993)
- et al.
The origins of modern proteomes
Biochimie
(2007)
Evolutionary aspects of whole-genome biology
Curr. Opin. Struct. Biol.
(2005)
- et al.
Lateral gene transfer
Curr. Biol.
(2011)
- et al.
A minimal estimate for the gene content of the last universal common ancestor – exobiology from a terrestrial perspective
Res. Microbiol.
(2006)
On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life
(1859)
The Growth of Biological Thought
(1982)
Biodiversity; past, present, and future
J. Paleontol.
A neoproterozoic snowball earth
Science
Structural and functional constraints in the evolution of protein families
Nat. Rev. Mol. Cell. Biol.
The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny
Proc. Nat. Acad. Sci. U.S.A.
On the origin of mitosing cells
J. Theor. Biol.
The hydrogen hypothesis for the first eukaryote
Nature
Where is the root of the universal tree of life?
BioEssays
The rooting of the universal tree of life is not reliable
J. Mol. Evol.
Evolutionary Genomics Leads the Way, Evolutionary Genomics and Systems Biology
Bushes in the tree of life
PLoS Biol.
Why would phylogeneticists ignore computerized sequence alignment?
Syst. Biol.
Resolving difficult phylogenetic questions: why more sequences are not enough
PLoS Biol.
The origin of eukaryotes and their relationship with the archaea: are we at a phylogenomic impasse?
Nat. Rev. Microbiol.
Genomics and the irreducible nature of eukaryote cells
Science
Costs of accuracy determined by a maximal growth rate constraint
Q. Rev. Biophys.
Translational accuracy and the fitness of bacteria
Annu. Rev. Genet.
Phylogeny determined by protein domain content
Proc. Nat. Acad. Sci. U.S.A.
Structure is three to ten times more conserved than sequence - a study of structural response in protein cores
Proteins