Abstract
The white poplar (Populus alba) is widely distributed in Central Asia and Europe. In China, natural populations of white poplar occur in the Irtysh River basin, and the species can also be cultivated successfully in northern China. In this study, we sequenced the genome of P. alba using single-molecule real-time (SMRT) sequencing technology. The de novo assembly of P. alba has a total size of 415.99 Mb and a contig N50 of 1.18 Mb. A total of 32,963 protein-coding genes were identified, and 45.16% of the genome was annotated as repetitive elements. Genome evolution analysis indicated that P. alba and Populus trichocarpa (black cottonwood) diverged ~5.0 Mya (3.0 to 7.1 Mya). The distributions of fourfold synonymous third-codon transversion (4DTV) rates and synonymous substitution rates (Ks) supported the occurrence of the salicoid whole-genome duplication (WGD) event (~65 Mya). Twelve natural populations of P. alba in the Irtysh River basin in China were sequenced to explore their genetic diversity. The average pooled heterozygosity of these P. alba populations was 0.170±0.014, lower than that of populations in Italy (0.271±0.051) and Hungary (0.264±0.054). Tajima's D values were predominantly negative, which might signify an excess of low-frequency polymorphisms and a bottleneck followed by expansion in the P. alba populations examined.
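The abstract summarises the assembly by its contig N50 and the population data by pooled heterozygosity and Tajima's D. The Python sketch below shows how these three statistics are conventionally computed. It is a minimal illustration under stated assumptions, not the authors' pipeline. The helper names (`contig_n50`, `pooled_heterozygosity`, `segregating_sites_and_pi`, `tajimas_d`) are hypothetical. Pooled heterozygosity is assumed to be the window-based Hp statistic, 2ΣnMAJΣnMIN/(ΣnMAJ+ΣnMIN)², computed from major and minor allele read counts. Tajima's D uses the original formula for individual haplotypes, without the corrections for pooled sequencing that pool-seq tools such as PoPoolation apply.

```python
import math
from collections import Counter
from typing import Sequence


def contig_n50(lengths: Sequence[int]) -> int:
    """Return the length L such that contigs of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("assembly has no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # cumulative length has reached 50% of the assembly
            return length
    raise AssertionError("unreachable")


def pooled_heterozygosity(major_counts: Sequence[int], minor_counts: Sequence[int]) -> float:
    """Assumed Hp definition: 2*sum(nMAJ)*sum(nMIN) / (sum(nMAJ) + sum(nMIN))**2 over one window."""
    n_maj, n_min = sum(major_counts), sum(minor_counts)
    total = n_maj + n_min
    if total == 0:
        raise ValueError("no allele counts in window")
    return 2 * n_maj * n_min / total ** 2


def segregating_sites_and_pi(haplotypes: Sequence[str]) -> tuple[int, float]:
    """Count segregating sites S and mean pairwise differences pi in aligned haplotypes."""
    n = len(haplotypes)
    n_pairs = n * (n - 1) // 2
    seg_sites, diff_pairs = 0, 0
    for column in zip(*haplotypes):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
        same_pairs = sum(c * (c - 1) // 2 for c in counts.values())
        diff_pairs += n_pairs - same_pairs  # pairs that carry different alleles at this site
    return seg_sites, diff_pairs / n_pairs


def tajimas_d(n: int, seg_sites: int, pi: float) -> float:
    """Tajima's (1989) D for n sequences, S segregating sites and mean pairwise differences pi."""
    if n < 4 or seg_sites == 0:
        # for n = 2 or 3 the variance terms vanish, so D is undefined
        raise ValueError("need n >= 4 and at least one segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    theta_w = seg_sites / a1  # Watterson's estimator of theta
    return (pi - theta_w) / math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


if __name__ == "__main__":
    print(contig_n50([100, 200, 300, 400]))  # 300
    print(pooled_heterozygosity([9, 8], [1, 2]))  # 0.255
    # Two singleton mutations among six haplotypes (hypothetical toy data)
    haps = ["AAAAAAAAAA"] * 4 + ["TAAAAAAAAA", "ATAAAAAAAA"]
    s, pi = segregating_sites_and_pi(haps)
    print(s, round(pi, 4), tajimas_d(len(haps), s, pi) < 0)  # 2 0.6667 True
```

The toy example places two singleton mutations among six haplotypes. Rare variants raise Watterson's θ, which is based on S, more than they raise π. D therefore comes out negative, which is the pattern the abstract interprets as an excess of low-frequency polymorphisms.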
Acknowledgements
We thank Dr. Jian Wang for assisting with population sampling in the Irtysh River basin. This work was supported by the National Science Fund for Distinguished Young Scholars (31425006) and the Chinese Academy of Forestry (CAFYBB2018ZX001).
Cite this article
Liu, YJ., Wang, XR. & Zeng, QY. De novo assembly of white poplar genome and genetic diversity of white poplar population in Irtysh River basin in China. Sci. China Life Sci. 62, 609–618 (2019). https://doi.org/10.1007/s11427-018-9455-2
DOI: https://doi.org/10.1007/s11427-018-9455-2