
Removal of Noisy Characters from Chloroplast Genome-Scale Data Suggests Revision of Phylogenetic Placements of Amborella and Ceratophyllum

Journal of Molecular Evolution

Abstract

It is widely appreciated that noisy, highly variable data can impede phylogeny reconstruction. Researchers have long omitted problematic data, such as third-codon positions and variable regions, from phylogenetic analyses. In analyses of the phylogenetic relationships of the angiosperms, however, inclusion of complete gene sequences in genome-scale alignments has become common practice. Here we demonstrate that this practice can be misleading. We show that support for the basal-most position of Amborella trichopoda among the angiosperms in the chloroplast genomic data rests on only a tiny subset (< 1% of the total alignment length) of the most variable alignment positions, which exhibit a mean maximum likelihood (ML) distance among the angiosperm operational taxonomic units (OTUs) of approximately 36 substitutions/site. Excluding these positions makes the basal Amborella branch disappear. Likewise, the recently reported sister-group relationship of Ceratophyllum to the eudicots depends on the presence of the 2% most variable positions in the genomic alignment, which exhibit, on average, 20 substitutions/site in comparisons among the angiosperm OTUs. These observations highlight the need to exclude a certain proportion of saturated alignment positions from phylogenomic analyses.
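The idea of excluding the most variable alignment positions can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract characterizes positions by ML distance, whereas this example substitutes a much simpler proxy, the fraction of sequence pairs that differ at a column. The function names `column_variability` and `drop_most_variable`, the dictionary-based alignment format, and the toy sequences are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def column_variability(column):
    """Fraction of sequence pairs that differ at one alignment column.

    Assumption: gaps and ambiguity codes are ignored; this is a crude
    stand-in for the ML-distance-based variability in the abstract.
    """
    residues = [c.upper() for c in column if c.upper() in "ACGT"]
    pairs = list(combinations(residues, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def drop_most_variable(alignment, fraction):
    """Remove the top `fraction` of columns ranked by variability.

    `alignment` maps a taxon name to its aligned sequence (equal lengths).
    Returns the trimmed alignment and the sorted indices of removed columns.
    """
    seqs = list(alignment.values())
    if not seqs:
        return {}, []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")

    scores = [column_variability([s[i] for s in seqs]) for i in range(length)]
    n_drop = int(length * fraction)
    # Rank columns from most to least variable; ties keep alignment order.
    ranked = sorted(range(length), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])

    trimmed = {
        name: "".join(ch for i, ch in enumerate(seq) if i not in dropped)
        for name, seq in alignment.items()
    }
    return trimmed, sorted(dropped)


if __name__ == "__main__":
    toy = {"A": "ACGTA", "B": "ACGAA", "C": "ACCCA", "D": "ATGGA"}
    trimmed, dropped = drop_most_variable(toy, 0.2)
    print(dropped)
    print(trimmed)
```

In practice one would repeat tree inference on alignments trimmed at increasing fractions and watch whether support for a given branch persists, which is the kind of sensitivity the abstract reports for Amborella and Ceratophyllum.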

Author information

Corresponding author

Correspondence to Vadim V. Goremykin.

About this article

Cite this article

Goremykin, V.V., Viola, R. & Hellwig, F.H. Removal of Noisy Characters from Chloroplast Genome-Scale Data Suggests Revision of Phylogenetic Placements of Amborella and Ceratophyllum. J Mol Evol 68, 197–204 (2009). https://doi.org/10.1007/s00239-009-9206-9

  • DOI: https://doi.org/10.1007/s00239-009-9206-9
