
  • Brief Communication

Repeat-sequence turnover shifts fundamentally in species with large genomes

Abstract

Given the 2,400-fold range of genome sizes (0.06–148.9 gigabase pairs, Gbp) in seed plants (angiosperms and gymnosperms), combined with broadly similar gene content (approximately 0.03 Gbp), the repeat-sequence content of the genome might be expected to increase with genome size, so that the largest genomes would consist almost entirely of repetitive sequences. Here we test this prediction, using the same bioinformatic approach for 101 species to ensure consistency in what constitutes a repeat. We reveal a fundamental change in repeat turnover in genomes larger than about 10 Gbp, such that species with the largest genomes are only about 55% repetitive. Given that genome size influences many plant traits, habits and life strategies, this fundamental shift in repeat dynamics is likely to affect the evolutionary trajectory of species lineages.
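The prediction being tested can be made concrete with simple arithmetic. The sketch below assumes, as a deliberate simplification that is not the authors' model, that everything outside a constant gene space of about 0.03 Gbp is repetitive, and computes the repeat fraction this naive expectation implies across the seed plant genome size range.

```python
# Naive expectation only: assumes all DNA outside a constant gene space is
# repetitive. This is the hypothesis under test, not a fitted model.
GENE_SPACE_GBP = 0.03  # approximate gene content of seed plants (from the abstract)


def expected_repeat_fraction(genome_size_gbp: float,
                             gene_space_gbp: float = GENE_SPACE_GBP) -> float:
    """Fraction of the genome expected to be repetitive if only genes are non-repetitive."""
    if genome_size_gbp <= gene_space_gbp:
        return 0.0
    return 1.0 - gene_space_gbp / genome_size_gbp


if __name__ == "__main__":
    # Smallest, two intermediate and largest seed plant genome sizes (Gbp).
    for gs in (0.06, 1.0, 10.0, 148.9):
        print(f"{gs:>7.2f} Gbp -> expected repeat fraction {expected_repeat_fraction(gs):.4f}")
```

Under this assumption a 148.9 Gbp genome would be more than 99.9% repetitive, which is the expectation that the roughly 55% reported for the largest genomes analysed runs against.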

Fig. 1: Content of repeats present in more than 20 copies in the genomes of 101 seed plant species, ranging in genome size from 0.063 to 88.55 Gbp and encompassing much of the known range of genome sizes in seed plants.
Fig. 2: Repeat dynamics across the range of plant genome sizes.

Data availability

The genomic DNA data analysed were either already available in the European Nucleotide Archive (ENA) (https://www.ebi.ac.uk/ena/browser/home) or were generated by Illumina sequencing during the work described in this article and archived in the ENA (Supplementary Table 3). The ENA accession identifier for each sample, the source of the plant material and the sequencing platform are given in Supplementary Table 3a. Genome size data were taken from estimates reported in the Plant DNA C-values Database release 7.1 (https://cvalues.science.kew.org/) or from source publications not yet included in the database; in each case the source reference is provided in Supplementary Table 3a, column S, and the species whose genome sizes were estimated here are listed in Supplementary Table 3b (further information is available in Methods, ‘Flow cytometry and genome size data’).

Code availability

Most of the code used to analyse these data is integral to the published, established software packages described in Methods, with parameter settings given as appropriate. New code was written to filter out all low-quality sequence reads, reads containing adapter sequences and reads with similarity to the plastid and mitochondrial genomes; it is available in the Git repository https://bitbucket.org/repeatexplorer/re_utilities.
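The filtering code itself lives in the re_utilities repository and is not reproduced here. As a rough illustration of the three filters described above, the Python sketch below uses simple stand-ins: a per-read quality threshold, an exact search for an adapter prefix, and shared k-mers with organelle reference sequences in place of a proper similarity search. All names, thresholds and the k-mer approach are hypothetical choices for illustration, not the repository's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Set

# Illustrative stand-ins only: thresholds, k-mer size and the exact-match
# approach are hypothetical and are not taken from re_utilities.
ADAPTER = "AGATCGGAAGAGC"  # common Illumina adapter prefix; use the adapters of your library prep
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


@dataclass
class Read:
    name: str
    seq: str          # upper-case nucleotide sequence
    qual: List[int]   # per-base Phred quality scores


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All k-mers of seq (empty set if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_organelle_index(references: Iterable[str], k: int) -> Set[str]:
    """k-mers of plastid/mitochondrial reference sequences on both strands."""
    index: Set[str] = set()
    for ref in references:
        index |= kmers(ref, k) | kmers(revcomp(ref), k)
    return index


def passes_quality(read: Read, min_q: int = 10, min_fraction: float = 0.95) -> bool:
    """Keep reads in which at least min_fraction of bases have Phred >= min_q."""
    good = sum(1 for q in read.qual if q >= min_q)
    return good >= min_fraction * len(read.qual)


def has_adapter(read: Read, adapter: str = ADAPTER, probe_len: int = 12) -> bool:
    """Exact search for the adapter's first probe_len bases on either strand."""
    probe = adapter[:probe_len]
    return probe in read.seq or revcomp(probe) in read.seq


def looks_organellar(read: Read, index: Set[str], k: int, max_shared: float = 0.5) -> bool:
    """Flag reads whose k-mers are mostly found in the organelle index."""
    read_kmers = kmers(read.seq, k)
    if not read_kmers:
        return False
    return len(read_kmers & index) / len(read_kmers) > max_shared


def filter_reads(reads: Iterable[Read], organelle_index: Set[str], k: int) -> Iterator[Read]:
    """Yield only the reads that pass all three filters."""
    for read in reads:
        if not passes_quality(read):
            continue
        if has_adapter(read):
            continue
        if looks_organellar(read, organelle_index, k):
            continue
        yield read
```

In practice, organellar reads would more likely be detected with an alignment or similarity search against plastid and mitochondrial assemblies; exact k-mer sharing is used here only because it keeps the sketch dependency-free.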

Acknowledgements

We thank the Natural Environment Research Council (NE/G020256/1), the Czech Academy of Sciences (RVO:60077344) and the Ramón y Cajal Fellowship (RYC-2017-2274), funded by the Ministerio de Ciencia y Tecnología (Gobierno de España), for support. In addition, the work was supported by the European Regional Development Fund–European Social Fund project ELIXIR-CZ–Capacity Building (no. CZ.02.1.01/0.0/0.0/16_013/0001777) and the ELIXIR-CZ research infrastructure project (LM2015047) for access to computing and storage facilities. We also thank the Natural Environment Research Council for funding a studentship to S.D. and the China Scholarship Council for funding W.W. Finally, we thank R.A. Nichols for helpful advice and J. Marquardt for supplying DNA of H. non-scripta.

Author information

Contributions

A.R.L., I.J.L., J. Macas and P. Novák conceived the experiment and designed, implemented and coordinated the project. P. Novák conducted genomic sequence analysis, P. Neumann conducted (retro)transposon protein-coding domains analysis, J.P. and J. Mlinarec provided material and flow cytometry analysis, and M.S.G. provided the statistical analysis. J. Mlinarec, L.J.K., S.D., W.W., A. Kovařík, A. Koblížková and J.P. provided sequence data and experimental advice. All authors were involved in writing the manuscript.

Corresponding authors

Correspondence to Jiří Macas, Ilia J. Leitch or Andrew R. Leitch.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Distribution of genome sizes (GS).

Distribution of genome sizes (GS) in (a) 10,770 angiosperms (out of c. 350,000 known species) and (b) 506 gymnosperms (out of c. 1,000 known species).
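The sampling depth behind the two panels differs greatly between the groups. The short calculation below, using only the counts given in the caption (the species totals are approximate), expresses each as a percentage of known species.

```python
# Counts from the Extended Data Fig. 1 caption; species totals are approximate ("c.").
SAMPLING = {
    "angiosperms": (10_770, 350_000),
    "gymnosperms": (506, 1_000),
}


def coverage_percent(n_with_estimate: int, n_species: int) -> float:
    """Percentage of known species that have a genome size estimate."""
    return 100.0 * n_with_estimate / n_species


if __name__ == "__main__":
    for group, (n_gs, n_total) in SAMPLING.items():
        print(f"{group}: {coverage_percent(n_gs, n_total):.1f}% of c. {n_total:,} species")
```

Roughly 3% of angiosperms but about half of gymnosperms have a genome size estimate, so the gymnosperm distribution is far more completely sampled.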

Extended Data Fig. 2 Genome proportion of the different classes of repeats.

Genome proportion of the different classes of repeats, based on copy number, fitted against ln-transformed genome size. The grey regression lines show the estimated trends for all 101 species fitted with a beta regression (see also Supplementary Table 4). Orange lines show the slope estimated by phylogenetic generalized least squares (PGLS), using a phylogeny with proportional branch lengths (phy P) fitted with an Ornstein–Uhlenbeck process. We also tested a phylogeny with branch lengths transformed to a cladogram (phy C); the results were similar (not shown).
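Beta regression treats a proportion y in (0, 1), such as the genome proportion of a repeat class, as beta-distributed with mean μ and precision φ, with logit(μ) = β0 + β1·x, where here x would be ln(genome size). The Python sketch below writes out that log-likelihood under the mean/precision parametrization commonly used for beta regression. The function and parameter names are our own, and in practice the maximisation would be handed to an optimiser or to an established package such as betareg for R rather than coded by hand.

```python
import math
from typing import Sequence


def logistic(z: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def beta_regression_loglik(beta0: float, beta1: float, phi: float,
                           x: Sequence[float], y: Sequence[float]) -> float:
    """Log-likelihood of y_i ~ Beta(mu_i * phi, (1 - mu_i) * phi),
    with logit(mu_i) = beta0 + beta1 * x_i."""
    if phi <= 0:
        raise ValueError("precision phi must be positive")
    total = 0.0
    for xi, yi in zip(x, y):
        if not 0.0 < yi < 1.0:
            raise ValueError("beta regression requires 0 < y < 1")
        mu = logistic(beta0 + beta1 * xi)
        a, b = mu * phi, (1.0 - mu) * phi  # shape parameters of the beta density
        total += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(yi) + (b - 1.0) * math.log1p(-yi))
    return total
```

Maximising this function over (β0, β1, φ) gives the fitted trend; a negative β1 would indicate that the repeat proportion declines with ln(genome size).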

Extended Data Fig. 3 Transposable element analysis of 77 seed plant species.

Analysis of 77 seed plant species (69 angiosperms (1 early-diverging angiosperm, 53 eudicots and 15 monocots) and 8 gymnosperms) showing how the proportion of the genome occupied by transposable element-related protein-coding domains varies with ln-transformed genome size. The regression lines show the slopes estimated from a beta regression and from a PGLS with an Ornstein–Uhlenbeck process. The trend resembles that seen for the whole repetitive fraction (see Extended Data Fig. 2a).
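The PGLS fits replace the independent-errors assumption of ordinary regression with a covariance matrix derived from the phylogeny. The sketch below shows the generalized least squares estimator β = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y with an Ornstein–Uhlenbeck-style correlation, which we assume takes the form exp(−α·d_ij) for patristic distance d_ij (to our understanding, the form used by the corMartins structure in the R package ape). The four-species tree and the fixed α are hypothetical; in a real analysis α would be estimated together with the coefficients, for example with gls in the R package nlme.

```python
import numpy as np


def ou_correlation(dist: np.ndarray, alpha: float) -> np.ndarray:
    """Assumed OU correlation exp(-alpha * d_ij) from a matrix of patristic distances."""
    return np.exp(-alpha * np.asarray(dist, dtype=float))


def gls_fit(x, y, V) -> np.ndarray:
    """Intercept and slope minimising (y - Xb)' V^-1 (y - Xb)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # design matrix: intercept, ln(GS)
    Vinv_X = np.linalg.solve(V, X)             # V^-1 X without forming the inverse
    Vinv_y = np.linalg.solve(V, y)             # V^-1 y
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)


# Hypothetical ultrametric tree ((A:1,B:1):1,(C:1,D:1):1) as patristic distances.
DIST = np.array([
    [0.0, 2.0, 4.0, 4.0],
    [2.0, 0.0, 4.0, 4.0],
    [4.0, 4.0, 0.0, 2.0],
    [4.0, 4.0, 2.0, 0.0],
])
```

With V equal to the identity matrix the estimator reduces to ordinary least squares; larger α weakens the correlation between distant species and moves the fit towards that case.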

Extended Data Fig. 4 Genome proportion of repeats in eudicots, monocots and gymnosperms.

Genome proportion of repeats in four categories (sequences with ≤20 copies, and low-copy (21–500), middle-copy (501–10,000) and high-copy (>10,000) sequences) fitted against ln-transformed genome size, separately for eudicots, monocots and gymnosperms. See also Supplementary Table 4, which shows the significant relationships in these datasets.
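The four categories amount to fixed bins on copy number. The sketch below shows that binning with the boundaries given in the caption and sums genome proportions per bin. It assumes a per-repeat copy-number estimate is already available; how copy number was derived in the study (the supplementary tables describe the categories as based on the number of mutual similarity hits) is not reproduced here, and the input pairs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def copy_category(copies: int) -> str:
    """Assign a repeat to one of the four copy-number categories of Extended Data Fig. 4."""
    if copies < 1:
        raise ValueError("copy number must be at least 1")
    if copies <= 20:
        return "<=20 copies"
    if copies <= 500:
        return "low (21-500)"
    if copies <= 10_000:
        return "middle (501-10,000)"
    return "high (>10,000)"


def proportions_by_category(items: Iterable[Tuple[int, float]]) -> Dict[str, float]:
    """Sum genome proportions of (copy number, genome proportion) pairs per category."""
    totals: Dict[str, float] = defaultdict(float)
    for copies, proportion in items:
        totals[copy_category(copies)] += proportion
    return dict(totals)
```

Note that the boundaries are inclusive at the upper end of each bin, so a repeat with exactly 500 copies counts as low-copy and one with 501 copies as middle-copy.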

Supplementary information

Supplementary Information

Supplementary Tables 1–8 and Figs. 1–4.

Reporting Summary

Supplementary Table Excel file 1

The data and methods used in previously published work to estimate repeat genome proportions (GP) are provided in an Excel spreadsheet (filename: Novak_Supplementary_Table_1 (2 Sept).xlsx).

Supplementary Table Excel file 2

Details of the materials used in sequencing and repeat genome proportions (GP) and genome-size (GS) data: (a) shows the 101 plant species analysed for total repeat GP and the GPs of each of the four repeat categories based on the number of mutual similarity hits. It also shows the GPs of transposable elements (TEs), genome sizes (GS, bp/1 C) of the species analysed and the sources of that data; (b) shows the species in which the GS data were obtained in this work; and (c) lists the technical and biological replicates examined with the sources of the data (filename: Novak_Supplementary_Table_3 (2 Sept).xlsx).

About this article

Cite this article

Novák, P., Guignard, M.S., Neumann, P. et al. Repeat-sequence turnover shifts fundamentally in species with large genomes. Nat. Plants 6, 1325–1329 (2020). https://doi.org/10.1038/s41477-020-00785-x

  • DOI: https://doi.org/10.1038/s41477-020-00785-x
