Reshuffling of the ancestral core-eudicot genome shaped chromatin topology and epigenetic modification in Panax

Wang, Zhen-Hui; Wang, Xin-Feng; Lu, Tianyuan; Li, Ming-Rui; Jiang, Peng; Zhao, Jing; Liu, Si-Tong; Fu, Xue-Qi; Wendel, Jonathan F.; Van de Peer, Yves; Liu, Bao; Li, Lin-Feng

doi:10.1038/s41467-022-29561-5

Article
Open access
Published: 07 April 2022

Nature Communications volume 13, Article number: 1902 (2022)

Abstract

All extant core-eudicot plants share a common ancestral genome that has experienced cyclic polyploidizations and (re)diploidizations. Reshuffling of the ancestral core-eudicot genome generates abundant genomic diversity, but the role of this diversity in shaping hierarchical genome architecture, such as chromatin topology and gene expression, remains poorly understood. Here, we assemble chromosome-level genomes of one diploid and three tetraploid Panax species and conduct in-depth comparative genomic and epigenomic analyses. We show that chromosomal interactions within each duplicated ancestral chromosome are largely maintained in extant Panax species, despite ca. 100–150 million years of evolution from a shared ancestor. Biased genetic fractionation and divergence in epigenetic regulation during polyploidization/(re)diploidization processes generate remarkable biochemical diversity of secondary metabolites in the Panax genus. Our study provides a paleo-polyploidization perspective on how reshuffling of the ancestral core-eudicot genome leads to a highly dynamic genome and to the metabolic diversification of extant eudicot plants.

Introduction

Polyploidy, or whole genome duplication (WGD), is a ubiquitous phenomenon in angiosperms^1,2, and all extant flowering plants likely evolved from a polyploid ancestor³. Recurring polyploidization and (re)diploidization events in flowering plants have led to highly dynamic plant genomes^4,5,6,7. However, how these often-cyclical genome doubling and diploidization processes have contributed to angiosperm evolution and diversification remains poorly characterized.

Evidence indicates that all extant core-eudicots share an ancient WGD, usually referred to as the γ-triplication event^3,8. The ancestral core-eudicot genome was subsequently restored to a diploid karyotype, and most extant core-eudicot species experienced additional paleo-polyploidization events during their independent diversification^1,2. Frequent genome doubling followed by independent diploidization and chromosomal rearrangement has provided extant core-eudicots with structural genomic and phenotypic diversity^5,9. An example of such genome structural evolution is cotton (Gossypium), where, following polyploidization, both large-scale genomic reorganization (i.e., chromosome fusion and fission) and evolution of individual gene repertoires (i.e., biased genetic fractionation) have generated phenotypic novelty and species diversification^10,11,12. However, relatively little is understood about how reshuffling of the duplicated ancestral core-eudicot genome affected the genome plasticity of extant core-eudicot plants, or about the mechanisms responsible for the genetic and epigenetic partitioning of a duplicated genome. The genus Panax (Araliaceae) includes four diploids, three tetraploids and one species complex¹³. This genus has been shown to share the core-eudicot γ triplication and to have undergone two additional Panax-specific WGDs (Pg-β and Pg-α)^14,15. Moreover, as a medicinally important genus, all ginseng species contain a large number of secondary metabolites^16,17. These attributes make the ginseng genus an ideal system to elucidate how reshuffling of the duplicated (or, more precisely, triplicated) ancestral core-eudicot genome has affected genome structure, epigenetic regulation and secondary metabolite diversity of extant plant species after repeated polyploidization and (re)diploidization events.

In this study, we assemble chromosome-level reference genomes of one diploid Panax species (which experienced γ and Pg-β) and three closely related tetraploid Panax species (which experienced γ, Pg-β and an additional, more recent Pg-α duplication). We infer the evolutionary history of the seven ancestral core-eudicot chromosomes (Eu1-Eu7) in the four Panax species. Based on this paleo-polyploidization framework for the genus Panax, our genome-wide comparisons of three-dimensional (3D) genome architecture, cytosine methylation and gene expression dynamics (mRNA, lncRNA and small RNA) further reveal that reorganization of the ancestral genome structure is associated with the reconfiguration of chromatin topology and with divergence in epigenetic regulation in extant Panax genomes. Our study thus provides a genome-wide view of how polyploidization and subsequent (re)diploidization contribute to the structural plasticity of the genome and the metabolomic diversity of extant Panax species.

Results

Genome assembly, gene annotation and quality control

Our chromosome analyses confirmed the diploid (2n = 2x = 24) and tetraploid (2n = 4x = 48) karyotypes of the four Panax species (Supplementary Fig. 1). Genome sizes of the four species were estimated by both genome survey (Table 1 and Supplementary Fig. 2) and flow cytometry (Supplementary Fig. 3). To obtain a reliable inference of karyotype evolution, we employed three different strategies to assemble the four Panax genomes de novo (Supplementary Notes 1–2). The resulting assemblies were 1.96 Gb and 2.02 Gb for P. stipuleanatus and P. japonicus, with contig N50 values of 2.88 Mb and 1.58 Mb, respectively (Table 1). The assembled genomes were larger for P. ginseng (3.36 Gb) and P. quinquefolius (3.57 Gb), with contig N50 values of 19.75 Mb and 0.87 Mb, respectively. Genome annotation of the four Panax species identified 41,224–74,307 protein-coding genes (Table 1).
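
Contig N50, reported above for each assembly, is the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. A minimal Python sketch of that definition follows; the contig lengths are made-up illustrative values, not data from this study.

```python
def n50(contig_lengths):
    """Return the N50 (bp) of an iterable of contig lengths (bp)."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    if not lengths:
        raise ValueError("no contig lengths supplied")
    half_total = sum(lengths) / 2
    covered = 0
    for length in lengths:
        covered += length
        # The first contig at which the cumulative length reaches half the
        # total assembly length defines N50.
        if covered >= half_total:
            return length


# Hypothetical toy assembly: 300 bp in total, so N50 is the contig that
# pushes the running sum to at least 150 bp.
print(n50([100, 80, 60, 40, 20]))  # 80
```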

**Table 1: Statistics of genome features of the four Panax species.**

Assessments of genome quality revealed high gene completeness (BUSCO = 93.00–95.14%) and genome contiguity (LAI = 7.13–16.24) for all four species (Table 1). In particular, the genome contiguity of P. stipuleanatus (LAI = 12.85) and P. japonicus (LAI = 16.24) was comparable to that of the model species Arabidopsis thaliana (LAI = 15.62) and Vitis vinifera (LAI = 14.58)^18,19, even though the two Panax species have much larger genomes. Genome collinearity analyses showed that, although the four species vary dramatically in genome size, they maintain high collinearity across the 12 orthologous chromosomes (Supplementary Fig. 4). Based on genome collinearity and sequence homoeology to diploid relatives, we further partitioned the 24 chromosomes of each of the three tetraploid species into two subgenomes (Supplementary Table 1). Phylogenetic inference based on orthologous genes revealed that subgenome B of the three tetraploid species clustered with the diploid P. notoginseng, whereas the A subgenomes formed a monophyletic clade (Supplementary Fig. 5). Together, these genomic features corroborate the quality of the genome assemblies of the four Panax species.
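
The LTR Assembly Index (LAI) used above scores how completely LTR retrotransposons, which are hard to assemble, are recovered. Its uncorrected ("raw") form, as defined with the LTR_retriever package, is the share of all LTR-retrotransposon sequence that lies in intact elements; the reported LAI further adjusts this value for genome-wide LTR identity, a correction this sketch deliberately omits. The lengths below are hypothetical.

```python
def raw_lai(intact_ltr_bp, total_ltr_bp):
    """Raw LAI: intact LTR-RT length as a percentage of total LTR-RT length.

    No correction for LTR identity is applied here, so the value is not
    directly comparable with published (corrected) LAI scores.
    """
    if total_ltr_bp <= 0:
        raise ValueError("total LTR-RT length must be positive")
    if not 0 <= intact_ltr_bp <= total_ltr_bp:
        raise ValueError("intact length must lie between 0 and the total")
    return 100.0 * intact_ltr_bp / total_ltr_bp


# Hypothetical assembly: 150 Mb of intact elements out of 1,000 Mb of LTR-RTs.
print(raw_lai(150_000_000, 1_000_000_000))  # 15.0
```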

Reconstruction of the ancestral karyotype in the modern Panax genome

The polyploidization history of the four Panax species was estimated by calculating synonymous substitution rates (Ks) between homologous gene pairs. Our results confirmed that the genus Panax experienced the core-eudicot-shared γ triplication and two additional lineage-specific duplications (Pg-α and Pg-β)^14,15,20 (Supplementary Fig. 6). We also recovered other previously inferred lineage-specific paleo-polyploidizations (e.g., Dc-β) in Daucus carota (carrot) and Lactuca sativa (lettuce)²¹. Karyotype evolution of the ancestral core-eudicot genome in extant Panax species was inferred by analyzing genome collinearity between grape (representing the putative post-γ core-eudicot genome) and the six species above (the four Panax species, carrot and lettuce, representing extant core-eudicot genomes). Our genome-wide comparisons identified more collinear orthologous genes (referred to as ancestral genes) in the four Panax species (16,010–31,729) than in the carrot (13,985) and lettuce (14,087) genomes (Supplementary Table 2). In particular, these ancestral genes tend to be retained in Panax genomes as large contiguous genomic blocks, with each collinear block containing on average about 17–20 ancestral genes (95% confidence interval (CI)) (Supplementary Figs. 7–16 and Supplementary Table 2). In contrast, collinear blocks in the carrot (95% CI: 12–13) and lettuce (95% CI: 11–12) genomes contained fewer ancestral genes (t-test, p value < 0.01). Together, our results indicate that the ancestral core-eudicot genome has been well preserved in modern Panax genomes, even after several rounds of polyploidization and diploidization.
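
The block-level statistic above (ancestral genes per collinear block) depends on how orthologous anchor genes are chained into blocks. The sketch below is a deliberately simplified stand-in for dedicated collinearity software such as MCScanX, not the pipeline used here: anchors are hypothetical (grape gene index, Panax gene index) pairs for one chromosome pair, a block is extended while both indices advance by at most a hypothetical `max_gap` genes in a consistent orientation, and the mean block size is reported with a normal-approximation 95% CI (the interval method used in the study is not stated in this section).

```python
from statistics import NormalDist, mean, stdev


def chain_blocks(anchors, max_gap=5):
    """Chain (ref_index, query_index) anchor pairs into collinear blocks.

    A block is extended while the reference index increases by 1..max_gap
    and the query index changes by 1..max_gap in a consistent direction
    (+1 for the same orientation, -1 for an inverted block).
    """
    blocks, current, direction = [], [], 0
    for ref, qry in sorted(anchors):
        if current:
            prev_ref, prev_qry = current[-1]
            d_ref, d_qry = ref - prev_ref, qry - prev_qry
            step = 1 if d_qry > 0 else -1
            within_gap = 0 < d_ref <= max_gap and 0 < abs(d_qry) <= max_gap
            if within_gap and direction in (0, step):
                current.append((ref, qry))
                direction = step
                continue
            blocks.append(current)  # close the previous block
        current, direction = [(ref, qry)], 0
    if current:
        blocks.append(current)
    return blocks


def mean_block_size_ci(blocks, min_size=2, level=0.95):
    """Mean anchors per block (blocks with >= min_size anchors), normal-approx CI."""
    sizes = [len(b) for b in blocks if len(b) >= min_size]
    m = mean(sizes)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(sizes) / len(sizes) ** 0.5
    return m, (m - half_width, m + half_width)


# Hypothetical anchors: a forward block, an inverted block and a lone anchor.
anchors = [(1, 10), (2, 11), (3, 12), (4, 13), (20, 50), (21, 49), (22, 48), (40, 5)]
blocks = chain_blocks(anchors)
print([len(b) for b in blocks])  # [4, 3, 1]
print(mean_block_size_ci(blocks))
```

Real collinearity tools also score gaps and require a minimum number of anchors per block (commonly five), so `min_size=2` here only keeps the toy example small.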

Based on genome collinearity analyses, we identified 26 post-γ and four post-Pg-β chromosomal fusion events in the Panax genomes; 10 and 20 of the post-γ events were also detected in the lettuce and carrot genomes, respectively (Supplementary Figs. 8–16). Given the shared ancestral (γ) and lineage-specific (e.g., Pg-α and Pg-β) polyploidization/(re)diploidization histories of the six extant core-eudicot species^20,21,22, we propose an evolutionary framework in which the 21 post-γ ancestral core-eudicot chromosomes (the seven ancestral chromosomes triplicated by the γ hexaploidization: A1-A7, B1-B7 and C1-C7) were rearranged into eight pre-Pg-β ancestral chromosomes (Ar1-Ar8) through the 26 identified post-γ chromosomal fusions (Fig. 1 and Supplementary Figs. 8–16). Thereafter, the post-Pg-β genome (Ar1a-Ar8a and Ar1b-Ar8b) was restructured into the ancestral Panax genome (Pa1-Pa12) via four post-Pg-β chromosomal fusion events (Fig. 1). Among the extant Panax genomes, we also identified three chromosomal rearrangements: one segmental inversion on P. stipuleanatus chromosome 4, one reciprocal translocation between P. stipuleanatus chromosomes 8 and 9, and one inversion on P. notoginseng chromosome 6 (Fig. 1 and Supplementary Figs. 17 and 18).

**Fig. 1: Evolutionary rearrangements of the ancestral core-eudicot genome to generate the genome of extant ginseng species.**

The above inferences reveal the evolutionary transformation of the seven ancestral core-eudicot chromosomes (Eu1–Eu7) into 42 homoeologous genomic regions (referred to as duplicated ancestral core-eudicot chromosomes) in extant Panax genomes after the γ triplication (3×) and Pg-β duplication (2×) (Fig. 1). We then allocated all identified collinear genes to the 42 duplicated ancestral core-eudicot chromosomes (Fig. 2a and Supplementary Data 1). In the P. stipuleanatus genome, for example, biased genetic fractionation of gene duplicates was a general phenomenon on all 42 duplicated ancestral chromosomes, with only 993 (6.7% of the total) ancestral genes retaining more than half (>3) of their six possible copies (Supplementary Data 1). In addition, genes duplicated by the more recent Pg-β duplication (6484 gene pairs) were less fractionated than those derived from the more ancient γ triplication (4143 gene pairs). Further genome-wide comparisons of fractionation patterns confirmed that gene duplicates derived from the same ancestral gene showed different retention rates along the ancestral core-eudicot chromosomes (Supplementary Fig. 19). For example, even though the six homologous genomic regions (marked in purple in Fig. 1) in the extant P. stipuleanatus genome (chromosomes 2, 4, 6, 7, 8 and 11) were duplicated from the same ancestral core-eudicot chromosome Eu2, the numbers of retained ancestral genes differed dramatically among the γ-derived triplicates (i.e., among Ps1-1/Ps1-2/Ps1-3 or Ps2-1/Ps2-2/Ps2-3) (Supplementary Fig. 19 and Supplementary Data 2). In contrast, Pg-β-derived duplicates (i.e., between Ps1-1 and Ps2-1, Ps1-2 and Ps2-2, or Ps1-3 and Ps2-3) showed similar fractionation rates along the ancestral core-eudicot chromosomes.
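
Because each of the seven ancestral chromosomes was triplicated by γ and then duplicated by Pg-β, it gives rise to 3 × 2 = 6 homologous regions (7 × 6 = 42 in total), so an ancestral gene can be retained in at most six syntenic copies. The sketch below summarizes a hypothetical presence/absence table for one ancestral chromosome (columns named after the Ps1-1 to Ps2-3 regions of Eu2; the values are invented), counting genes that keep more than half (>3) of their six copies and the retention rate of each region.

```python
REGIONS = ["Ps1-1", "Ps1-2", "Ps1-3", "Ps2-1", "Ps2-2", "Ps2-3"]


def summarize_retention(presence):
    """presence: dict ancestral_gene -> six 0/1 flags, one per region in REGIONS.

    Returns (number of genes keeping >3 copies, that number as a fraction of
    all genes, per-region retention rate).
    """
    n_genes = len(presence)
    n_high = sum(1 for flags in presence.values() if sum(flags) > len(REGIONS) / 2)
    per_region = {
        region: sum(flags[i] for flags in presence.values()) / n_genes
        for i, region in enumerate(REGIONS)
    }
    return n_high, n_high / n_genes, per_region


# Invented example with four ancestral genes.
example_presence = {
    "gene1": [1, 1, 1, 1, 0, 0],
    "gene2": [1, 0, 0, 1, 0, 0],
    "gene3": [0, 1, 0, 0, 1, 0],
    "gene4": [1, 0, 0, 0, 0, 0],
}
print(summarize_retention(example_presence))
```

Comparing per-region rates within Ps1-1/Ps1-2/Ps1-3 (γ triplicates) against the Ps1-x/Ps2-x pairs (Pg-β duplicates) gives the kind of contrast described above.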

**Fig. 2: Global alignment of ginseng genomic regions to the grape genome and chromatin topology of the paleo-polyploidization-derived homologous regions in ginseng.**

We next examined how these ancestral genes, duplicated in the different WGDs, evolved during the diversification of extant Panax species. Our pan-genomic analyses assigned these ancestral genes to 29,499 orthogroups, only 1836 (6.2% of the total) of which were specific to individual genomes among the seven extant Panax genomes (one diploid genome and six tetraploid subgenomes) (Supplementary Table 3). Further collinearity comparisons revealed that 6874 (32.6–49.4%) of these ancestral genes have been retained across the seven extant Panax genomes as collinear orthologous genes (Supplementary Table 4 and Supplementary Data 3). Among the three tetraploid species, we identified 13,679 (52.1% of the total) and 14,550 (57.6%) collinear orthologous genes in subgenomes A and B, respectively (Supplementary Table 5 and Supplementary Data 4). Notably, although the three tetraploid species show high genome collinearity (see Supplementary Fig. 4), P. japonicus has a substantially smaller genome (2.02 Gb) than P. ginseng (3.36 Gb) and P. quinquefolius (3.57 Gb) (see Table 1). This difference can be explained, at least partially, by the different evolutionary histories of long terminal repeat (LTR) retrotransposons. For example, compared with P. ginseng, the larger genome of P. quinquefolius was likely due to a post-speciation (<1 MYA) burst of unknown LTRs (Supplementary Figs. 12–13). In contrast, although P. japonicus experienced the shared pre-speciation LTR burst, the majority of its Copia, Gypsy and other unknown retrotransposons expanded more recently. Distinct expansion patterns of the three retrotransposon families were also observed in the two diploid species, P. stipuleanatus and P. notoginseng (Supplementary Figs. 20–21). Together, these features suggest that biased fractionation, together with broad-scale chromosomal rearrangements, has resulted in extensive diversification of genome structure in extant Panax species.
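
Timing statements such as a post-speciation (<1 MYA) LTR burst usually rest on the divergence between the two LTRs of each intact element, which are identical at insertion. A common estimate is T = K / (2r), with K the Jukes-Cantor-corrected divergence between the two LTRs and r the neutral substitution rate. The sketch below assumes r = 1.3 × 10⁻⁸ substitutions per site per year, a rate originally estimated in rice and widely reused in LTR dating; it is a placeholder, since the rate applied to Panax is not given in this section.

```python
import math


def ltr_insertion_time(identity, rate=1.3e-8):
    """Approximate insertion age (years) of an intact LTR retrotransposon.

    identity: fraction of identical aligned sites between the 5' and 3' LTRs.
    rate: substitutions per site per year (placeholder assumption).
    """
    p = 1.0 - identity  # observed proportion of differing sites
    if not 0.0 <= p < 0.75:
        raise ValueError("divergence outside the Jukes-Cantor range")
    k = -0.75 * math.log1p(-4.0 * p / 3.0)  # Jukes-Cantor corrected distance
    return k / (2.0 * rate)  # both LTR copies accumulate substitutions


for ltr_identity in (1.0, 0.995, 0.99):
    age_myr = ltr_insertion_time(ltr_identity) / 1e6
    print(f"identity {ltr_identity}: ~{age_myr:.2f} Myr")
```

Because T scales with 1/r, any burst date derived this way shifts proportionally if a different substitution rate is assumed.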

Association between core-eudicot genome repatterning and chromatin topology

The evolutionary rearrangement of chromosomes and genome content can profoundly affect chromatin topology²³. Evidence from cotton and other plant species has shown that polyploidization reshapes the chromatin topology of newly formed polyploid genomes^24,25. However, it is largely unknown how the reshuffling of the ancestral core-eudicot genome affected the remodeling of chromatin topology in extant eudicot genomes. Based on the paleo-genomic framework described above, we investigated whether the chromatin topology observed in the extant P. stipuleanatus genome is associated with the evolutionary history of the ancestral core-eudicot genome. We reasoned that, if all the duplicated ancestral core-eudicot chromosomes had been fully preserved, the 42 homologous genomic regions in extant Panax genomes would have maintained similar chromatin topologies. However, our inference of ancestral core-eudicot karyotype evolution revealed considerable DNA-based structural variation in the extant Panax species. We therefore asked whether the chromatin topologies of duplicated ancestral chromosomes were randomly reestablished in the extant P. stipuleanatus genome. To this end, we studied the 3D genome architecture of P. stipuleanatus at the chromosome level. Genome-wide comparisons of chromatin interactions at 100-kb resolution identified 12 interaction blocks corresponding to the 12 Panax chromosomes (Supplementary Fig. 22a). Because homologous regions derived from the same ancestral chromosome are now distributed across different extant chromosomes, the significantly higher level of intra- than inter-chromosomal interaction (t-test, p < 0.01) indicates remodeling of chromatin topology in the Panax genome during the polyploidization/(re)diploidization processes.
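
The intra- versus inter-chromosomal contrast can be computed directly from a bin-level contact matrix. A minimal sketch, assuming an already normalized, symmetric matrix (for example at 100-kb bins) held as nested lists plus one chromosome label per bin; it averages log2 contact values separately for intra- and inter-chromosomal bin pairs and skips empty pairs. The study's exact normalization and its t-test are not reproduced, and the toy matrix is invented.

```python
import math


def _mean(values):
    return sum(values) / len(values) if values else float("nan")


def intra_inter_means(matrix, chrom_of_bin):
    """Mean log2 contact value for intra- vs inter-chromosomal bin pairs.

    matrix: symmetric nested list of normalized contact frequencies.
    chrom_of_bin: chromosome label for each bin, in matrix order.
    Only pairs i < j are used (each pair once, diagonal excluded), and
    non-positive values are skipped because log2 is undefined for them.
    """
    intra, inter = [], []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):
            value = matrix[i][j]
            if value <= 0:
                continue
            target = intra if chrom_of_bin[i] == chrom_of_bin[j] else inter
            target.append(math.log2(value))
    return _mean(intra), _mean(inter)


# Invented 4-bin example: two bins on each of two chromosomes.
toy_matrix = [
    [0, 8, 1, 1],
    [8, 0, 1, 2],
    [1, 1, 0, 4],
    [1, 2, 4, 0],
]
print(intra_inter_means(toy_matrix, ["chr1", "chr1", "chr2", "chr2"]))  # (2.5, 0.25)
```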

To further examine this phenomenon, we compared the chromatin topologies of homologous genomic regions derived from γ and Pg-β using the same interaction matrix. The post-Pg-β chromosome pairs (Ar1a-Ar8a and Ar1b-Ar8b in Fig. 1) did not show significantly stronger chromosomal interactions (estimated as log2-normalized frequencies of valid read pairs between genomic regions) than other pairs of duplicated chromosomes (t-test, p = 0.45). For example, the overall chromatin interaction between chromosomes Chr2 (Ar3a) and Chr6 (Ar3b) (ranging from −7.083 to −8.808) was similar to that of other inter-chromosomal comparisons (e.g., Chr2 vs. Chr3 (Ar4a) and Chr6 vs. Chr3; ranging from −7.236 to −9.051) (Supplementary Fig. 22b). Similarly, most of the eight post-Pg-β chromosome pairs also differed in their active (A)/inactive (B) chromatin compartments (Supplementary Figs. 23–25 and Supplementary Data 5). This trend was even more evident in the distribution of sub-megabase topologically associating domain-like (TAD-like) structures, whose total number and length varied dramatically between post-Pg-β chromosome pairs (Fig. 2b, Supplementary Figs. 26–28 and Supplementary Data 6). Nevertheless, the epigenetic modification patterns of TAD-like structures were broadly consistent with previous observations^26,27, with TAD-like regions showing cytosine hypermethylation and lower gene expression than border regions (Supplementary Fig. 29). These features indicate that chromatin topology remodeling of the duplicated ancestral core-eudicot chromosomes has further increased the 3D genome diversity of extant Panax species.
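
This section does not say how TAD-like structures were called. One widely used approach is the insulation score: for every boundary between adjacent bins, average the contacts in a small square window that straddles the boundary just off the diagonal; domain boundaries show up as local minima because few contacts cross them. The sketch below is a plain-Python illustration of that idea with a hypothetical window size and an invented two-domain matrix, not the study's caller.

```python
import math


def insulation_scores(matrix, window=2, pseudocount=1e-9):
    """log2-scaled insulation score for the boundary after each bin.

    For boundary i (between bins i and i + 1) the raw score is the mean of the
    window x window square with rows i-window+1..i and columns i+1..i+window.
    Boundaries whose square runs off the matrix are None. Raw scores are
    divided by the mean of all defined raw scores before taking log2.
    """
    n = len(matrix)
    raw = [None] * n
    for i in range(window - 1, n - window):
        cells = [
            matrix[r][c]
            for r in range(i - window + 1, i + 1)
            for c in range(i + 1, i + window + 1)
        ]
        raw[i] = sum(cells) / len(cells)
    defined = [s for s in raw if s is not None]
    if not defined:
        return raw  # matrix too small for this window
    mean_raw = sum(defined) / len(defined)
    return [
        None if s is None else math.log2((s + pseudocount) / (mean_raw + pseudocount))
        for s in raw
    ]


def boundary_bins(scores):
    """Indices whose score is a strict local minimum (candidate boundaries)."""
    return [
        i
        for i in range(1, len(scores) - 1)
        if None not in (scores[i - 1], scores[i], scores[i + 1])
        and scores[i] < scores[i - 1]
        and scores[i] < scores[i + 1]
    ]


# Invented matrix: two 5-bin domains with 10 contacts inside and 1 across.
domain = [0] * 5 + [1] * 5
toy = [[10 if domain[i] == domain[j] else 1 for j in range(10)] for i in range(10)]
print(boundary_bins(insulation_scores(toy)))  # [4]
```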

Notably, the degree of intra-chromosomal interaction was broadly consistent with the reorganization patterns inferred from comparison with the ancestral core-eudicot chromosomes (Eu1–Eu7) (Fig. 3a, b and Supplementary Figs. 23–25). In general, chromatin interactions between homologous genomic regions derived from the same ancestral core-eudicot chromosome were stronger than those between genomic regions from different ancestral chromosomes. For example, extant P. stipuleanatus chromosomes 2 and 6 are both homologous to the post-γ ancestral chromosomes B2 (purple frame) and C6b (blue frame) (Fig. 3b). Chromatin interactions within the two segments were significantly stronger than those between the two segments and other genomic regions (t-test, p < 0.01). In line with this, A/B compartment switching regions broadly overlapped with ancestral chromosome fusion/fission sites (Fig. 3a, b and Supplementary Figs. 23–25). However, we did not find a similar correlation between TAD-like structures and the ancestral core-eudicot karyotype (Fig. 3c), possibly owing to localized DNA sequence divergence. Together, these results suggest that, although the chromatin topologies (A/B compartments and TAD-like structures) of duplicated ancestral core-eudicot chromosomes were reestablished in extant ginseng genomes, their intra-chromosomal interactions have been largely maintained during the polyploidization/(re)diploidization processes.
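
The compartment comparisons above rely on assigning each bin to A or B. A common recipe takes the leading eigenvector of the Pearson correlation matrix of an observed/expected (O/E) contact matrix and orients its sign so that gene-richer bins form compartment A. The sketch below assumes the O/E matrix is already computed, finds the eigenvector by power iteration (a correlation matrix is positive semidefinite), and then reports compartment switch points lying within a hypothetical `max_dist` bins of given ancestral fusion/fission sites. All inputs are invented, and this is not necessarily the study's procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def leading_eigenvector(mat, iterations=200):
    """Dominant eigenvector of a symmetric positive semidefinite matrix."""
    n = len(mat)
    v = [1.0 + i for i in range(n)]  # deterministic, non-uniform start vector
    for _ in range(iterations):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def call_compartments(oe_matrix, gene_density):
    """Label each bin 'A' or 'B' from an observed/expected contact matrix."""
    corr = [[pearson(row_i, row_j) for row_j in oe_matrix] for row_i in oe_matrix]
    ev = leading_eigenvector(corr)
    if pearson(ev, gene_density) < 0:  # orient so that A = gene-rich bins
        ev = [-x for x in ev]
    return ["A" if x > 0 else "B" for x in ev]


def switches_near_sites(labels, sites, max_dist=1):
    """Switch points (labels[i] != labels[i + 1]) within max_dist bins of a site."""
    switches = [i for i in range(len(labels) - 1) if labels[i] != labels[i + 1]]
    return [s for s in switches if any(abs(s - site) <= max_dist for site in sites)]


# Invented 6-bin O/E matrix with a plaid A/B pattern and matching gene density.
pattern = "AABBAA"
toy_oe = [[2.0 if a == b else 0.5 for b in pattern] for a in pattern]
labels = call_compartments(toy_oe, gene_density=[10, 9, 2, 1, 8, 11])
print(labels)                            # ['A', 'A', 'B', 'B', 'A', 'A']
print(switches_near_sites(labels, [3]))  # [3]
```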

**Fig. 3: Three-dimensional (3-D) genome architecture, cytosine methylation and gene expression patterns in extant *Panax stipuleanatus* genome.**

Epigenetic regulation divergence of the duplicated ancestral genes

Reorganization of the ancestral core-eudicot chromosomes resulted in a nested pattern of duplicated genes and genomic regions and an altered chromatin topology. We then examined whether this repatterning of the ancestral core-eudicot genome also promoted divergence in the epigenetic regulation of gene duplicates in extant Panax genomes. By comparing patterns of gene expression and cytosine methylation, we found that, after the large-scale reorganization of duplicated ancestral chromosomes, a large proportion of the retained gene duplicates showed tissue-biased expression (39.7% of the total) (Supplementary Data 7) and differential cytosine methylation (21.1%) (Supplementary Data 1). Taking the Pg-β-duplicated segment C6b as an example, the above comparisons revealed remodeling of the chromatin topologies of the two homologous regions in the extant Panax genome (see Fig. 3a–c). Here, our biased fractionation analyses further showed that only 69 (24.7% of the total) ancestral genes on this segment retained both Pg-β-derived copies (Supplementary Data 7). In contrast, 98 (35.1%) and 112 (40.1%) ancestral genes reverted to singleton status (i.e., lost one of their two copies) on the two C6b duplicated segments, respectively. Both the singleton genes and the retained duplicates differed in patterns of gene expression and cytosine methylation (Fig. 3d, e), which indicates that the duplicated ancestral genes have diverged in epigenetic regulation in extant Panax species.
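
The three retention classes reported for segment C6b partition its 279 ancestral genes (69 + 98 + 112), which is where the percentages come from. A small sketch that assigns the classes from per-gene presence calls on the two Pg-β copies and recomputes the percentages from the reported counts; which copy is labelled "a" or "b" is an arbitrary assumption, as the text does not name them.

```python
from collections import Counter


def classify_pair(on_copy_a, on_copy_b):
    """Retention class of one ancestral gene across two Pg-beta copies."""
    if on_copy_a and on_copy_b:
        return "both retained"
    if on_copy_a:
        return "singleton on copy a"
    if on_copy_b:
        return "singleton on copy b"
    return "lost from both"


def class_percentages(counts):
    """Percentage of each class, rounded to one decimal place."""
    total = sum(counts.values())
    return {cls: round(100 * n / total, 1) for cls, n in counts.items()}


# Counts reported in the text for segment C6b (copy labels a/b are arbitrary).
c6b_counts = Counter({"both retained": 69, "singleton on copy a": 98, "singleton on copy b": 112})
print(class_percentages(c6b_counts))
```

With real data, the counts would come from something like `Counter(classify_pair(a, b) for a, b in calls)`, where `calls` is a hypothetical list of (present on copy a, present on copy b) pairs.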

We next addressed how these ancestral genes, duplicated by distinct WGDs, interact in the highly plastic Panax genome. Gene co-expression network analyses revealed that the retained genes derived from the seven ancestral core-eudicot chromosomes exhibited similar degrees of functional connection in extant Panax genomes (Supplementary Fig. 30 and Supplementary Table 6). In leaf tissue, for example, although the total numbers of genes identified in the leaf-related regulatory module varied among the seven ancestral core-eudicot chromosomes (from 217 to 388), functionally important genes (e.g., those related to photosynthesis, kinases and synthases) made nearly equal contributions to leaf development processes (t-test, all p values > 0.01) (Fig. 4a, Supplementary Fig. 30 and Supplementary Table 7). A similar pattern was observed in the overall dynamics of epigenetic regulation: the duplicated ancestral core-eudicot chromosomes did not show dramatic differences in gene expression and cytosine methylation patterns in extant Panax genomes (Fig. 4b, c and Supplementary Fig. 31).
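
How the co-expression network was built is not detailed in this section (weighted approaches such as WGCNA are common). As a deliberately simpler illustration of "functional connection", the sketch below links two genes when the absolute Pearson correlation of their expression profiles reaches a hypothetical threshold, reports each gene's degree, and treats connected components as crude modules. Gene names and values are invented.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def coexpression_network(expr, threshold=0.9):
    """expr: gene -> expression values across the same samples.

    Returns an adjacency dict (gene -> set of neighbours); the degree of a
    gene is len(adjacency[gene]).
    """
    adjacency = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            adjacency[g1].add(g2)
            adjacency[g2].add(g1)
    return adjacency


def modules(adjacency):
    """Connected components of the thresholded network, as sorted gene lists."""
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            gene = stack.pop()
            if gene not in component:
                component.add(gene)
                stack.extend(adjacency[gene] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Invented expression profiles across four samples.
toy_expr = {
    "photoA": [1, 2, 3, 4],
    "photoB": [2, 4, 6, 8],
    "kinaseA": [1, 3, 1, 3],
    "kinaseB": [2, 6, 2, 6],
}
net = coexpression_network(toy_expr)
print({gene: len(nbrs) for gene, nbrs in net.items()})
print(modules(net))
```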

**Fig. 4: Evolutionary dynamics of the ancestral core-eudicot genes in extant *Panax stipuleanatus* genome.**

The above observations indicate that, although genomic regions duplicated from the same ancestral core-eudicot chromosome (Eu1–Eu7) showed dramatic biased genetic fractionation and divergence in epigenetic regulation, genes retained on each of the duplicated genomic regions made similar functional contributions to tissue development in extant Panax species. We then examined whether the genes retained on these duplicated ancestral core-eudicot chromosomes were involved in similar molecular functions after the repeated polyploidizations/(re)diploidizations. As expected, biased fractionation has resulted in a complementary retention pattern of the duplicated ancestral genes, i.e., only 6.7% of ancestral genes have retained all copies of their duplicates (see Supplementary Data 1). In particular, the retained genes on each of the duplicated ancestral core-eudicot chromosomes are involved in similar KEGG pathways, with 62.6–78.6% of KEGG pathways shared by more than half (>3) of the duplicated ancestral chromosomes (Supplementary Fig. 32). Likewise, most balanced-expression genes also shared similar molecular functions among the duplicated ancestral core-eudicot chromosomes (Supplementary Fig. 33). In contrast, tissue-dominant or tissue-suppressed genes showed functional divergence among the duplicated ancestral core-eudicot chromosomes. This pattern is associated with gene function: balanced-expression genes were mainly enriched in basic cellular activities (e.g., TCA cycle, mRNA surveillance and ubiquitin-mediated proteolysis), whereas tissue-dominant or tissue-suppressed genes were functionally related to environmental adaptation (e.g., photosynthesis and nitrogen metabolism) (Supplementary Fig. 34). These observations suggest that the complementary retention of gene duplicates may have, at least partly, mitigated functional redundancy in extant Panax species.
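
The KEGG statistic above (the share of pathways found on more than half of the six regions duplicated from one ancestral chromosome) reduces to counting. A sketch with invented pathway annotations for the six regions of one ancestral chromosome; pathway names are used only as labels, and the threshold of four regions encodes ">3".

```python
from collections import Counter


def shared_pathway_fraction(pathways_by_region, min_regions=4):
    """Fraction of annotated pathways present in at least min_regions regions.

    pathways_by_region: region name -> iterable of pathway names found among
    the genes retained on that region.
    """
    counts = Counter(p for paths in pathways_by_region.values() for p in set(paths))
    shared = sorted(p for p, n in counts.items() if n >= min_regions)
    return len(shared) / len(counts), shared


toy_regions = {
    "Ps1-1": {"TCA cycle", "mRNA surveillance", "photosynthesis"},
    "Ps1-2": {"TCA cycle", "mRNA surveillance"},
    "Ps1-3": {"TCA cycle", "nitrogen metabolism"},
    "Ps2-1": {"TCA cycle", "mRNA surveillance", "ubiquitin mediated proteolysis"},
    "Ps2-2": {"mRNA surveillance", "ubiquitin mediated proteolysis"},
    "Ps2-3": {"TCA cycle", "ubiquitin mediated proteolysis"},
}
print(shared_pathway_fraction(toy_regions))  # (0.4, ['TCA cycle', 'mRNA surveillance'])
```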

Notably, regulatory non-coding RNAs (lncRNAs and small RNAs) exhibited higher expression divergence than protein-coding genes among the duplicated ancestral core-eudicot chromosomes (Fig. 4d and Supplementary Fig. 35). Non-coding RNAs are RNA molecules transcribed from the genome but not translated into proteins²⁸. Both lncRNAs and small RNAs play crucial regulatory roles in a variety of biological processes by modulating gene expression at the transcriptional and post-transcriptional levels²⁹. Here, the high expression divergence of non-coding RNAs suggests that DNA-based structural reorganization may have affected the birth and death (expression and silencing) of these RNA molecules among the duplicated ancestral core-eudicot chromosomes. Together, our findings suggest that, although the evolutionary dynamics of individual ancestral genes varied dramatically at both the genetic and epigenetic levels, overall patterns of molecular function and epigenetic regulation remained relatively stable among the duplicated ancestral core-eudicot chromosomes in extant Panax species.
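
The section does not define its expression-divergence measure. One simple, illustrative choice is the coefficient of variation (CV) of expression across the homologous copies of a gene, averaged per RNA class; the sketch below uses it with invented values to show how lncRNA and mRNA classes could be compared.

```python
from statistics import mean, pstdev


def copy_cv(copy_expression):
    """Coefficient of variation of expression across homologous copies."""
    m = mean(copy_expression)
    return pstdev(copy_expression) / m if m > 0 else 0.0


def class_divergence(families):
    """Mean CV over gene families; each family lists expression per copy."""
    return mean(copy_cv(f) for f in families)


# Invented expression values (e.g., TPM) for copies on duplicated regions.
mrna_families = [[10, 12, 11], [5, 5, 6]]
lncrna_families = [[0, 8, 1], [3, 0, 0]]
print(class_divergence(mrna_families), class_divergence(lncrna_families))
```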

Evolutionary contributions of the duplicated ancestral core-eudicot genes to metabolomic diversity

The evolutionary role of paleo-polyploidization in genome evolution and phenotypic diversification of angiosperms has long been of interest^1,30,31,32. Our comparative analyses revealed that repeated polyploidization/(re)diploidization processes have made genome structure and epigenetic regulation highly dynamic in extant Panax species. Here, we further investigated whether this reshuffling of the ancestral core-eudicot genome has also promoted phenotypic diversification. Our results showed that, although post-polyploidization genome contraction is observed at both the chromosomal and individual gene levels, gene families related to secondary metabolism were significantly expanded in the Panax genus and other selected eudicot species, especially those involved in the phenylpropanoid, sesquiterpenoid and triterpenoid biosynthesis pathways (Supplementary Figs. 36–38). Plant secondary metabolites are low-molecular-weight organic compounds that not only function as signal molecules regulating plant growth and development but also mediate responses to various biotic and abiotic stresses³³. In eudicots, the biosynthesis of most secondary metabolites, such as terpenoids, steroids and cyanogenic glycosides, involves enzymes of the cytochrome P450 (CYP) superfamily³³. We therefore explored the evolutionary roles of polyploidization-derived CYPs in the diversification of plant secondary metabolites.

CYPs constitute the largest enzyme family in plant metabolism, and all angiosperm CYPs were derived from 11 ancestral genes with variable patterns of post-polyploidy retention and additional duplication³⁴. Here, we identified candidate genes of nine major Arabidopsis CYP clades in Panax and other representative eudicot species (Supplementary Fig. 39). As expected, these extant core-eudicot species possessed distinct copy numbers of the nine CYP clades (Supplementary Fig. 39). In the ginseng genus, for example, the highly variable copy numbers of CYPs among WGD-derived genomic regions (e.g., post-Pg-β chr2 and chr6) are possibly due to independent retention of duplicated CYPs during the polyploidization/(re)diploidization processes (Supplementary Fig. 40 and Supplementary Data 8). At the level of epigenetic regulation, compared with the genome-wide proportions of genes showing hypermethylation (mean: 57.1% (95% CI: 49.6–50.1%)) and balanced expression (60.3%) (Supplementary Data 7), CYPs were preferentially hypomethylated (mean: 24.0% (95% CI: 20.1–27.9%)) and exhibited tissue-biased expression (79.3%) (Supplementary Data 9). This suggests that biased genetic fractionation and divergent epigenetic regulation of duplicated CYPs may have promoted the diversification of secondary metabolites in extant Panax species.
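
Statements that CYPs are "preferentially" hypomethylated or tissue-biased compare a gene set with a genome-wide background, which can be framed as a one-sided hypergeometric (Fisher-type) enrichment test. The section does not state which test was applied, so the following is only an illustrative standard-library sketch (`math.comb`, Python 3.8+) with made-up counts.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N genes, K marked, n drawn).

    k: marked genes in the set of interest (e.g., CYPs that are hypomethylated)
    n: size of the set of interest (e.g., all CYPs)
    K: marked genes genome-wide; N: all genes considered.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(n, K) + 1))
    return upper / comb(N, n)


# Made-up counts: 80 of 200 genes in a set are marked, versus 8,000 of 40,000 genome-wide.
print(hypergeom_upper_tail(80, 200, 8_000, 40_000))
```

Python integers are unbounded, so the exact binomial coefficients stay accurate even for genome-scale counts before the final division.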

We next focused specifically on the ginsenosides, the major triterpene saponins found almost exclusively in Panax species^16,35. Triterpene saponins constitute one of the largest and most structurally diverse groups of plant specialized metabolites and play important roles in, for example, plant antifungal and antibacterial activities^36,37,38. In eudicots, the CYP716 subfamily (belonging to the CYP85 clade) is a major contributor to the diversification of triterpenoid biosynthesis³⁹. By analyzing the paleo-polyploidization history of the CYP716 subfamily, we found that the ginseng genus contains five CYP716 subgroups (A, D, E, S and Y), with the A and Y subgroups preserving both ancestral duplicates (Fig. 5a and Supplementary Fig. 41). The five CYP716 subgroups not only neo-functionalized for oxidation and hydroxylation at different carbon positions (Fig. 5a) but also showed expression-level sub-functionalization in leaf and root tissues (Fig. 5b). More importantly, neo-functionalization and expression-level sub-functionalization of the lineage-specific protopanaxadiol synthase (Y-subgroup) and protopanaxatriol synthase (S-subgroup) genes, together with the eudicot-common oleanolic acid synthase gene (A-subgroup) and other key genes of ginsenoside biosynthesis (e.g., UGTs), have facilitated the evolution of the immense structural and functional diversity of ginsenosides in the Panax genus. Our metabolic analyses further confirmed that both dammarane-type (synthesized via the S and Y subgroups) and oleanane-type (synthesized via the A subgroup) ginsenosides differ in concentration between leaf and root tissues (Fig. 5c and Supplementary Figs. 42 and 43).
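
Expression-level sub-functionalization between leaf and root, as described for the CYP716 subgroups, is often quantified with a tissue-specificity index. The sketch uses tau (0 for uniform expression across tissues, 1 for expression in a single tissue) as an illustrative choice, not necessarily the measure used in the study; the expression values are invented, and tau is frequently computed on log-transformed expression in practice.

```python
def tau(expression):
    """Tissue-specificity index tau for one gene across n >= 2 tissues."""
    if len(expression) < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        return 0.0  # not expressed in any tissue; treated as non-specific here
    return sum(1 - x / peak for x in expression) / (len(expression) - 1)


# Invented leaf/root expression values for two hypothetical CYP716 genes.
print(tau([2.0, 8.0]))  # 0.75: mostly root-expressed
print(tau([5.0, 5.0]))  # 0.0: balanced between leaf and root
```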

**Fig. 5: Functional diversification of the CYP716 subfamily and triterpenoid biosynthesis genes.**

Discussion

Polyploidy is a universal phenomenon in the evolutionary history of angiosperms^1,30. Studies from across the phylogenetic spectrum of angiosperms have clearly illustrated the critical roles of polyploidization in genome evolution and species diversification^1,2,5. Genome collinearity analyses have also shown that reshuffling of ancestral genome blocks following polyploidy has produced great diversity of genome architecture in extant plant species^5,9,10. In this study, focusing on the genus Panax, our comparative analyses revealed that, although the ancestral core-eudicot genome has experienced up to three rounds of WGD (γ, Pg-β and Pg-α), all of the duplicated ancestral core-eudicot chromosomes (Eu1-Eu7) are still preserved, at least to some extent, in extant Panax species. In particular, Panax species retain a more conserved ancestral core-eudicot genome than some other extant eudicots, such as carrot and lettuce, although each of these species experienced additional paleo-polyploidization events after the γ hexaploidy event.

Using this paleo-genomic framework, we further examined whether reshuffling of the ancestral core-eudicot genome has caused remodeling of chromatin topology in extant Panax genomes. In neo-polyploid species such as cotton and wheat, recent polyploidization has reshaped chromatin topology between subgenomes only at a small scale^24,25. By contrast, our comparisons showed that the chromatin topologies (A/B compartments and TAD-like structures) of the duplicated ancestral core-eudicot chromosomes (Eu1–Eu7) were extensively remodeled in extant Panax species. Of special significance, we show that, although all seven ancestral core-eudicot chromosomes underwent substantial changes in DNA-based genome structure, chromatin interactions within the same ancestral chromosome have been largely maintained in extant ginseng genomes after cyclic polyploidization-diploidization. These findings provide a paleo-polyploidization perspective on how reshuffling of the ancestral core-eudicot genome has shaped the reestablishment of chromatin topology in extant eudicot species.

Genome contraction after polyploidization is common at both the chromosomal and the individual-gene level. Evidence from diverse neo-polyploid species and experimental aneuploid lines has illustrated that gene dosage balance and divergence in molecular function or gene expression determine the post-polyploidization retention of duplicated genes⁴⁰. Here, we show that biased genetic fractionation, together with divergent epigenetic regulation, likely played important roles in reshaping the duplicated ancestral core-eudicot chromosomes during polyploidization and (re)diploidization. In particular, although reshuffling of the ancestral core-eudicot genome has resulted in diverse genome structures and divergent epigenetic regulation, genomic regions derived from the seven ancestral chromosomes likely make similar functional contributions to tissue development in extant Panax species.

It has been proposed that the highly dynamic nature of genome architecture is a key factor in the diversification of polyploid species^5,41,42. Here, we use plant secondary metabolites as an example to illustrate the important roles of paleo-polyploidization/(re)diploidization processes in promoting the metabolic diversity of extant eudicot plants. Our results revealed that preferential retention and regulatory divergence of CYP genes have promoted the qualitative diversification of secondary metabolites in extant Panax species. Notably, neo- and sub-functionalization within the CYP716 subfamily has likely generated the immense structural and functional diversity of ginsenosides in the Panax genus. Plant secondary metabolites are important regulators of plant development and of responses to various biotic and abiotic stresses³³. Our findings suggest that different evolutionary events have played important roles in the biochemical diversity of CYPs and, consequently, in the ecological adaptation of Panax species.

Methods

Plant materials, DNA and RNA extraction, and karyotype characterization

To address the evolutionary reorganization of the ancestral core-eudicot genome in extant Panax species, we collected samples of one diploid (P. stipuleanatus, 2n = 2x = 24) and three tetraploid (P. ginseng, P. japonicus and P. quinquefolius, 2n = 4x = 48) species (Supplementary Note 1). Genomic DNA was extracted from fresh mature leaves using the TianGen plant genomic DNA kit (Tianjin, China). Total RNA was extracted from leaf, stem and root tissues using the TianGen plant RNA kit. Genomic DNA and RNA for genome assembly and gene annotation were obtained from the same individual of each species. Total RNA (mRNA, lncRNA and small RNA) for gene expression comparisons was isolated from leaf, stem and root tissues of three individuals per species using the TianGen plant RNA kit (Tianjin, China). The haploid genome size of the four species was estimated by flow cytometry with three technical replicates. Karyotypes of the four species were visualized using an OLYMPUS BX53 microscope (Olympus Corporation, Japan).

Genome sequencing, assembly and gene annotation

Three de novo assembly strategies were employed to reconstruct the reference genomes of the four Panax species (Supplementary Note 1). Briefly, short-insert libraries (350 bp) of P. stipuleanatus and P. japonicus were constructed by Illumina Novaseq (Tianjin, China) and sequenced on the Illumina NovaSeq platform (Illumina, USA). Next, ~20 kb SMRTbell libraries were generated for each of the two species and sequenced on the PacBio RSII platform (PacBio, USA). In addition, DNA fragments longer than 50 kb were used to construct a 10× GemCode library with a Chromium instrument (10× Genomics), which was sequenced on the Illumina NovaSeq platform. Finally, digested genomic DNA was used to construct Hi-C libraries for the two species, which were also sequenced on the Illumina NovaSeq platform. In contrast, two alternative strategies based on the Nanopore platform (Nanopore, UK) were employed to assemble the reference genomes of P. ginseng and P. quinquefolius, respectively. De novo assembly and genome quality control are detailed in the Supplementary Information (Supplementary Notes 1 and 2; Supplementary Fig. 44). Gene models were predicted based on de novo prediction, homology-based identification and Unigene clusters. Repeat elements were characterized using LTR-FINDER⁴³ and RepeatScout⁴⁴ and annotated using RepeatMasker⁴⁵ with the parameters “-nolow -no_is -norna -engine wublast”.

Ancestral karyotype inference and ancestral gene characterization

The ancestral core-eudicot karyotype was reconstructed by identifying collinear genomic blocks among Vitis vinifera (grape)¹⁸, Daucus carota (carrot)⁴⁶, Lactuca sativa (lettuce)⁴⁷ and the four Panax species with ColinearScan⁴⁸, following a pipeline developed in previous studies^22,49. Grape was chosen because it has the most conserved genome among extant core-eudicot plants and has not experienced additional polyploidization events since its split from the common core-eudicot ancestor¹⁸. Carrot and lettuce are the species most closely related to Panax that have assembled reference genomes. Protein sequences and genome annotations of these three species were obtained from Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html). The paleo-polyploidization histories of these species are well documented: all experienced the γ triplication shared by core eudicots, followed by additional lineage-specific duplications/triplications (e.g., Dc-α and Dc-β)^1,12,14,20. In addition, we inferred the karyotype of the ancestral Panax genome by identifying collinear genomic blocks among the four extant Panax species. Homologous genes among the four Panax species were identified using BLAST⁵⁰ with an e-value cutoff of 10⁻⁵ and then used to identify collinear genes. The maximal gap length between neighboring genomic blocks was set to 50 genes^12,51,52,53. Large gene families with >30 members were excluded from the genome collinearity analysis. Putative ancestral core-eudicot genes were defined as genes shared by at least two subgenomes of the four Panax species. Based on the polyploidization history of the four Panax species, we defined the top two homologous genes (duplicated by Pg-β) as in-paralogs and the other four homologous genes (duplicated by γ) as out-paralogs. The order of the ancestral core-eudicot genes along Panax chromosomes was determined using ColinearScan⁴⁸.
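
The collinearity step can be illustrated with a simplified sketch. It is not ColinearScan, which scores blocks with its own statistical model; it only shows how the two thresholds above (50-gene maximal gap, exclusion of families with >30 members) act on BLAST-derived anchor pairs. Anchors are assumed to be tuples (gene_a, gene_b, pos_a, pos_b), where pos_* are gene ranks along each chromosome, and the 50-gene gap is assumed to apply between neighboring anchors within a block. The function names, the `family_of` input and `MIN_ANCHORS` are hypothetical.

```python
from collections import Counter

MAX_GAP = 50      # maximal gap (in genes) between neighboring anchors, from the text
MAX_FAMILY = 30   # families with more members than this are excluded, from the text
MIN_ANCHORS = 3   # hypothetical minimum block size (not stated in the text)


def drop_large_families(anchors, family_of, max_family=MAX_FAMILY):
    """Remove anchors whose genes belong to families with > max_family members.

    anchors: list of (gene_a, gene_b, pos_a, pos_b); family_of: gene id -> family id.
    """
    size = Counter(family_of.values())
    return [x for x in anchors
            if size[family_of[x[0]]] <= max_family and size[family_of[x[1]]] <= max_family]


def chain_anchors(anchors, max_gap=MAX_GAP, min_anchors=MIN_ANCHORS):
    """Greedy first-fit chaining of anchors into forward-oriented collinear blocks.

    Anchors are visited by increasing pos_a; each one extends the first block whose
    last anchor lies 1..max_gap genes before it on both chromosomes, otherwise it
    starts a new block. Reverse-oriented blocks can be found by negating pos_b.
    """
    blocks = []
    for anchor in sorted(anchors, key=lambda x: (x[2], x[3])):
        a, b = anchor[2], anchor[3]
        for block in blocks:
            last_a, last_b = block[-1][2], block[-1][3]
            if 0 < a - last_a <= max_gap and 0 < b - last_b <= max_gap:
                block.append(anchor)
                break
        else:
            blocks.append([anchor])
    return [blk for blk in blocks if len(blk) >= min_anchors]
```

A greedy first-fit chain like this is easy to follow but can split blocks that a scoring-based chainer would merge; it is meant only to make the gap and family-size filters concrete.
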
Based on the collinear genomic regions, overall genome collinearity was visualized using WGDI (https://github.com/SunPengChuan/wgdi). We also calculated the synonymous substitution rate (Ks) between collinear homologous genes using the YN00 program in the PAML (v4.9h) package with the Nei-Gojobori method⁵⁴. The median Ks value of each collinear genomic region was used to infer the polyploidization history. In brief, we used a kernel-smoothing density function to generate the Ks distribution curve and then fitted Gaussian peaks to the curve using the Gaussian approximation function in WGDI. Orthologous gene families of the four Panax species were identified using OrthoFinder⁵⁵, and gene family expansion and contraction were analyzed using CAFE with default parameters⁵⁶. Full-length LTR retrotransposons in the Panax species were characterized using LTRharvest⁵⁷ and LTR-FINDER⁴³. The insertion time of each LTR retrotransposon family was estimated using the formula age = K/(2r), where K is the Kimura 2-parameter distance and r is the mutation rate of 1.3 × 10⁻⁸ substitutions per site per year for these Panax species⁵⁸.
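
The LTR dating formula can be made concrete with a short sketch. It assumes, as is common practice but not stated above, that K is measured between the 5′ and 3′ LTRs of the same element from a pre-computed pairwise alignment, and that gapped or ambiguous columns are skipped. K is the Kimura 2-parameter distance, K = -(1/2)ln(1 - 2P - Q) - (1/4)ln(1 - 2Q), where P and Q are the proportions of transitions and transversions; the age then follows from age = K/(2r) with r = 1.3 × 10⁻⁸. Function names are illustrative.

```python
import math

MUTATION_RATE = 1.3e-8  # substitutions per site per year, value from the text


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Columns with gaps or ambiguous bases are skipped. Raises ValueError when no
    sites are comparable or the distance is undefined (saturation).
    """
    purines = {"A", "G"}
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x != y:
            if (x in purines) == (y in purines):
                transitions += 1      # A<->G or C<->T
            else:
                transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / compared, transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def ltr_insertion_age(ltr5, ltr3, rate=MUTATION_RATE):
    """Insertion age in years: age = K / (2r)."""
    return k2p_distance(ltr5, ltr3) / (2 * rate)
```

With one transition in a 100-site alignment, K ≈ 0.0101 and the estimated age is roughly 3.9 × 10⁵ years.
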

Chromatin topology and DNA methylation

A total of 1,408,300,221,000 bp of Hi-C data (718.5× genome coverage) was generated for P. stipuleanatus on the Illumina NovaSeq platform (Illumina, USA). Clean reads were mapped to the reference genome using Bowtie2 (version 2.2.3)⁵⁹, and only uniquely mapped, valid read pairs were retained for subsequent interaction analyses. The Hi-C interaction matrix was constructed following previously described pipelines^27,60. Briefly, we used the ICE method to remove potential Hi-C biases caused by restriction fragment length, GC content and read mappability. Interaction matrices at various resolutions (genome partitioned into bins of different sizes) were constructed using HiC-Pro⁶¹ and visualized using HiCExplorer⁶². Based on the interaction maps⁶³, genome-wide Hi-C resolutions were set to 100 kb for A/B compartments and 20 kb for TAD-like structures. Active (A) and inactive (B) compartments were identified at 100 kb resolution using the matrix2compartment module of the Cworld software⁶⁴. In brief, the expected score within the matrix was calculated as a lowess-smoothed average over the intra-chromosomal interactions, and bins were assigned to the two compartment types according to the values of the first principal component (eigenvector): positive and negative values denoted compartments A and B, respectively. Enrichment of genomic features and epigenetic marks in each 100 kb bin was summarized, compared between compartments A and B, and visualized with ggplot2⁶⁵. TAD-like structures were identified with the insulation score method at 20 kb resolution⁶⁶ using default parameters, and their exact boundaries and interior regions were delineated using a hidden Markov model⁶⁷. The resulting TAD-like structures were then converted into a format suitable for visualization. Cytosine methylation (5mC) was called using Nanopolish⁶⁸.
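
The three matrix operations named above can be sketched in NumPy on a single intra-chromosomal matrix. This is an illustration under simplifying assumptions, not the HiC-Pro, Cworld or published insulation-score implementations: ICE is written as plain iterative correction of row sums; the expected contact frequency is the mean of each diagonal rather than a lowess-smoothed average; the arbitrary sign of PC1 is fixed with an optional, hypothetical per-bin track such as gene density; and the insulation score is a simplified variant that averages contacts between the `window` bins on either side of each bin. All function names are hypothetical.

```python
import numpy as np


def ice_normalize(matrix, max_iter=200, tol=1e-6):
    """Iterative correction (ICE): rescale rows/columns until all non-empty
    bins have equal coverage. Returns the corrected matrix and per-bin biases."""
    m = np.array(matrix, dtype=float)
    bias = np.ones(m.shape[0])
    for _ in range(max_iter):
        cov = m.sum(axis=1)
        mask = cov > 0
        if not mask.any():
            break
        cov[mask] /= cov[mask].mean()   # relative coverage of each bin
        cov[~mask] = 1.0                # leave empty bins untouched
        m /= np.outer(cov, cov)         # divide out the current bias estimate
        bias *= cov
        if np.abs(cov[mask] - 1.0).max() < tol:
            break
    return m, bias


def ab_compartments(matrix, orient_by=None):
    """A/B assignment from PC1 of the correlation of the observed/expected matrix."""
    m = np.asarray(matrix, dtype=float)
    n = m.shape[0]
    oe = np.zeros_like(m)
    for d in range(n):                  # expected = mean contact at distance d
        idx = np.arange(n - d)
        diag = m[idx, idx + d]
        if diag.mean() > 0:
            oe[idx, idx + d] = oe[idx + d, idx] = diag / diag.mean()
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.nan_to_num(np.corrcoef(oe))
    _, vecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    pc1 = vecs[:, -1]                   # eigenvector of the largest eigenvalue
    if orient_by is not None and np.corrcoef(pc1, orient_by)[0, 1] < 0:
        pc1 = -pc1                      # PC1 sign is arbitrary; orient it
    return np.where(pc1 > 0, "A", "B"), pc1


def insulation_score(matrix, window=10):
    """Simplified insulation score: mean contact between the `window` bins
    upstream and downstream of each bin, log2-normalized to the mean.
    Minima mark candidate TAD-like boundaries; edge bins are NaN."""
    m = np.asarray(matrix, dtype=float)
    n = m.shape[0]
    raw = np.full(n, np.nan)
    for i in range(window, n - window):
        raw[i] = m[i - window:i, i + 1:i + window + 1].mean()
    with np.errstate(divide="ignore"):
        return np.log2(raw / np.nanmean(raw))
```

Local minima of the returned insulation profile are candidate TAD-like boundaries; in the study these were refined with a hidden Markov model rather than simply read off the profile.
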
Expression patterns of protein-coding genes were estimated using DESeq2⁶⁹.

Expression patterns of protein-coding genes and small RNAs

Clean non-coding RNA reads were mapped onto the reference genomes with HISAT2⁷⁰, and the mapped non-coding RNA reads were assembled using StringTie⁷¹. GffCompare (http://ccb.jhu.edu/software/stringtie/gffcompare.shtml) was used to compare the assembled transcripts with annotated protein-coding genes⁷². Long non-coding RNAs (lncRNAs) were then identified as transcripts longer than 200 nucleotides that lack protein-coding potential⁷³. lncRNAs that were expressed in only one replicate and had TPM < 1 were excluded. Small RNA clean reads were aligned to the reference genomes and analyzed using ShortStack (version 3.8.3) with default settings⁷⁴. The identified non-coding RNAs were extracted with custom Perl scripts.

The total number of metabolites in leaf and root tissues of the four species was estimated using non-targeted metabolomics. The overall quantity of metabolites was estimated and normalized to the total peak area of each sample. Variable importance in projection (VIP) scores from PLS-DA, together with ANOVA and fold change (FC), were used to identify the variables contributing to classification. Tissue-biased metabolites were defined as those with VIP > 1, p < 0.05, and FC ≥ 2 or FC ≤ 0.5.
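
The tissue-bias criteria can be sketched as a simple filter. The code assumes VIP scores (from PLS-DA) and p values (from ANOVA) have already been computed by other software, interprets the total-peak-area normalization as dividing each sample's peak areas by their sum, and takes fold change as the ratio of mean normalized abundances between two tissues. Function names are hypothetical.

```python
import numpy as np


def normalize_total_area(peaks):
    """Divide each sample's peak areas (columns) by that sample's total peak area."""
    peaks = np.asarray(peaks, dtype=float)
    return peaks / peaks.sum(axis=0, keepdims=True)


def fold_change(tissue_a, tissue_b):
    """Ratio of mean normalized abundance (metabolites x replicates) in tissue A vs B."""
    return np.mean(tissue_a, axis=1) / np.mean(tissue_b, axis=1)


def tissue_biased(vip, p_values, fc, vip_min=1.0, alpha=0.05):
    """True for metabolites with VIP > 1, p < 0.05 and FC >= 2 or FC <= 0.5."""
    vip, p_values, fc = (np.asarray(x, dtype=float) for x in (vip, p_values, fc))
    return (vip > vip_min) & (p_values < alpha) & ((fc >= 2.0) | (fc <= 0.5))
```

Applying the three criteria jointly means a metabolite with a large fold change is still discarded if it does not separate the tissues in the PLS-DA model (VIP ≤ 1).
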

Total RNA of three Panax species (P. stipuleanatus, P. ginseng and P. quinquefolius) was extracted from root, stem and leaf tissues using an RNA extraction kit (Tiangen, Beijing, China) according to the manufacturer’s instructions. RNA libraries were constructed by Novoseq (Tianjin, China) and sequenced on the Illumina NovaSeq platform (Illumina, CA, USA). Clean reads were aligned to the reference genomes using HISAT with default parameters⁷⁰. Raw mapped read counts were calculated using the prepDE.py script provided with StringTie⁷¹, and differences in the transcription level of each gene were estimated using DESeq2⁶⁹. Differentially expressed genes (DEGs) were defined as genes showing at least a twofold difference in transcription level (p < 0.05) between samples. Genes were functionally annotated based on the KEGG and GO databases⁷⁵. Venn diagrams were drawn with the R package VennDiagram⁷⁶. Expression patterns of protein-coding genes were estimated following a previous study⁷⁷. In brief, the relative expression level of each gene in each tissue was calculated from its transcripts per million (TPM) values in the three tissues (leaf, root and stem) of a triad as follows:

$$\mathrm{Expression}(\mathrm{Leaf})=\frac{\mathrm{TPM}(\mathrm{Leaf})}{\mathrm{TPM}(\mathrm{Leaf})+\mathrm{TPM}(\mathrm{Root})+\mathrm{TPM}(\mathrm{Stem})} \tag{1}$$

$$\mathrm{Expression}(\mathrm{Root})=\frac{\mathrm{TPM}(\mathrm{Root})}{\mathrm{TPM}(\mathrm{Leaf})+\mathrm{TPM}(\mathrm{Root})+\mathrm{TPM}(\mathrm{Stem})} \tag{2}$$

$$\mathrm{Expression}(\mathrm{Stem})=\frac{\mathrm{TPM}(\mathrm{Stem})}{\mathrm{TPM}(\mathrm{Leaf})+\mathrm{TPM}(\mathrm{Root})+\mathrm{TPM}(\mathrm{Stem})} \tag{3}$$

where TPM(Leaf), TPM(Root) and TPM(Stem) are the expression levels of the gene in leaf, root and stem tissues, respectively, and Eqs. (1), (2) and (3) give its relative expression in each of these tissues. The normalized expression value was calculated for each of the three tissues and for the average across all expressed tissues. The relative contributions of each tissue per triad were used to plot ternary diagrams with the R package ggtern⁷⁸.
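
Eqs. (1)–(3) amount to dividing each gene's TPM vector by its sum, so the three relative values add up to 1 and can be plotted as a single point in a ternary diagram. A minimal NumPy sketch, assuming a genes × tissues TPM matrix with columns in the order leaf, root, stem (the function name is hypothetical):

```python
import numpy as np


def relative_expression(tpm):
    """Relative expression per tissue, Eqs. (1)-(3).

    tpm: array (n_genes, 3) of TPM values, columns ordered leaf, root, stem.
    Each row is divided by its total; genes with zero total TPM become NaN.
    """
    tpm = np.asarray(tpm, dtype=float)
    total = tpm.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return tpm / total
```

For example, a gene with TPM values (10, 0, 30) in leaf, root and stem has relative expression (0.25, 0, 0.75), placing it near the stem vertex of the ternary plot.
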

Cytochrome P450 gene family member identification

Members of the CYP450 superfamily in the ginseng genus and other selected species were identified using BLAST. CYP450 members previously identified in Arabidopsis (288 members; https://drnelson.uthsc.edu/cytochromeP450.html) and P. ginseng (484 members)¹⁴ were downloaded and used as references in this analysis. The predicted proteins of each species were aligned to the references using blastp (-outfmt 6 -evalue 1e-5 -num_threads 20 -num_alignments 100), and hits with identity ≥ 40% were kept as candidate hits. When the P. ginseng CYP450 data set (KPG) was used as the reference, candidate hits were retained only if identical amino acid sites covered ≥ 40% (Panax species) or ≥ 20% (other species) of the reference sequence. Finally, these retained hits were compared with the candidate hits obtained using the Arabidopsis CYP450 data set as the reference, and only overlapping hits in which identical amino acid sites covered ≥ 40% of the query's own sequence and ≥ 20% of the Arabidopsis reference sequence were kept for further analyses.
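
The two-step filter can be sketched as follows. It assumes the default 12 columns of BLAST+ tabular output (-outfmt 6) and approximates the number of identical amino acid sites as pident × alignment length / 100, because the default columns do not include `nident`; protein lengths are supplied as dictionaries. Function names are hypothetical, and the thresholds are those stated above.

```python
import csv

# Default BLAST+ tabular (-outfmt 6) columns.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def parse_outfmt6(lines):
    """Yield hits from -outfmt 6 text lines as dicts with numeric fields converted."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["evalue"] = float(hit["evalue"])
        yield hit


def identical_sites(hit):
    """Approximate identical residues as percent identity x alignment length."""
    return round(hit["pident"] * hit["length"] / 100)


def passes_cutoffs(hit, max_evalue=1e-5, min_pident=40.0):
    """Alignment-level cutoffs: e-value <= 1e-5 and identity >= 40%."""
    return hit["evalue"] <= max_evalue and hit["pident"] >= min_pident


def kpg_step(hits, ref_len, is_panax):
    """Step 1 (P. ginseng KPG reference): identical sites must cover >= 40%
    (Panax) or >= 20% (other species) of the reference protein."""
    min_cov = 0.40 if is_panax else 0.20
    return {h["qseqid"] for h in hits
            if passes_cutoffs(h) and identical_sites(h) / ref_len[h["sseqid"]] >= min_cov}


def arabidopsis_step(hits, query_len, ref_len, retained):
    """Step 2 (Arabidopsis reference): keep queries retained in step 1 whose
    identical sites cover >= 40% of the query and >= 20% of the reference."""
    return {h["qseqid"] for h in hits
            if h["qseqid"] in retained and passes_cutoffs(h)
            and identical_sites(h) / query_len[h["qseqid"]] >= 0.40
            and identical_sites(h) / ref_len[h["sseqid"]] >= 0.20}
```

Applying the Arabidopsis step only to queries already retained against KPG reproduces the intersection described above; if exact identity counts are needed, `nident` can be requested as an extra output column instead of the approximation.
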

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The raw total sequence reads and four genome assemblies have been deposited into the National Center for Biotechnology Information under the BioProject number PRJNA752920 with BioSample accessions SAMN20855168, SAMN20855173, SAMN20855195 and SAMN20855167. We also deposited the genome assemblies of the four Panax species in the Genome Warehouse of the National Genomics Data Center^79,80, Chinese Academy of Sciences/China National Center for Bioinformation, under the project number PRJCA006678. Source data are provided with this paper.

References

Van De Peer, Y., Mizrachi, E. & Marchal, K. The evolutionary significance of polyploidy. Nat. Rev. Genet. 18, 411–424 (2017).
Soltis, D. E. et al. Polyploidy and angiosperm diversification. Am. J. Bot. 96, 336–348 (2009).
Ruprecht, C. et al. Revisiting ancestral polyploidy in plants. Sci. Adv. 3, e1603195 (2017).
Schubert, I. & Lysak, M. A. Interpretation of karyotype evolution should consider chromosome structural constraints. Trends Genet. 27, 207–216 (2011).
Wendel, J. F., Jackson, S. A., Meyers, B. C. & Wing, R. A. Evolution of plant genome architecture. Genome Biol. 17, 37 (2016).
Van de Peer, Y., Ashman, T. L., Soltis, P. S. & Soltis, D. E. Polyploidy: an evolutionary and ecological force in stressful times. Plant Cell 33, 11–26 (2021).
Fox, D. T., Soltis, D. E., Soltis, P. S., Ashman, T. L. & Van de Peer, Y. Polyploidy: a biological force from cells to ecosystems. Trends Cell Biol. 30, 688–694 (2020).
Jiao, Y. et al. A genome triplication associated with early diversification of the core eudicots. Genome Biol. 13, R3 (2012).
Qiao, X. et al. Gene duplication and evolution in recurring polyploidization-diploidization cycles in plants. Genome Biol. 20, 38 (2019).
Paterson, A. H. et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 492, 423–427 (2012).
Wang, X. et al. Comparative genomic de-convolution of the cotton genome revealed a decaploid ancestor and widespread chromosomal fractionation. N. Phytologist 209, 1252–1263 (2016).
Wendel, J. F., Lisch, D., Hu, G. & Mason, A. S. The long and short of doubling down: polyploidy, epigenetics, and the temporal dynamics of genome fractionation. Curr. Opin. Genet. Dev. 49, 1–7 (2018).
Zuo, Y. J., Wen, J. & Zhou, S. L. Intercontinental and intracontinental biogeography of the eastern Asian – eastern North American disjunct Panax (the ginseng genus, Araliaceae), emphasizing its diversification processes in eastern Asia. Mol. Phylogenet. Evol. 117, 60–74 (2017).
Kim, N. H. et al. Genome and evolution of the shade-requiring medicinal herb Panax ginseng. Plant Biotechnol. J. 16, 1904–1917 (2018).
Shi, F. X. et al. The impacts of polyploidy, geographic and ecological isolations on the diversification of Panax (Araliaceae). BMC Plant Biol. 15, 297 (2015).
Han, J. Y., Hwang, H. S., Choi, S. W., Kim, H. J. & Choi, Y. E. Cytochrome P450 CYP716A53v2 catalyzes the formation of protopanaxatriol from protopanaxadiol during ginsenoside biosynthesis in Panax ginseng. Plant Cell Physiol. 53, 1535–1545 (2012).
Li, M. R. et al. Genome-wide variation patterns uncover the origin and selection in cultivated ginseng (Panax ginseng Meyer). Genome Biol. Evol. 9, 2159–2169 (2017).
Jaillon, O. et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449, 463–467 (2007).
Lamesch, P. et al. The Arabidopsis information resource (TAIR): Improved gene annotation and new tools. Nucleic Acids Res. 40, D1202–D1210 (2012).
Zhang, D. et al. The medicinal herb Panax notoginseng genome provides insights into ginsenoside biosynthesis and genome evolution. Mol. Plant 10, 903–907 (2017).
Wang, J. et al. Sequential paleotetraploidization shaped the carrot genome. BMC Plant Biol. 20, 52 (2020).
Song, X. et al. The celery genome sequence reveals sequential paleo-polyploidizations, karyotype evolution and resistance gene reduction in apiales. Plant Biotechnol. J. 19, 731–744 (2021).
Wang, L. et al. Altered chromatin architecture and gene expression during polyploidization and domestication of soybean. Plant Cell 33, 1430–1446 (2021).
Wang, M. et al. Evolutionary dynamics of 3D genome architecture following polyploidization in cotton. Nat. Plants 4, 90–97 (2018).
Concia, L. et al. Wheat chromatin architecture is organized in genome territories and transcription factories. Genome Biol. 21, 104 (2020).
Wang, C. et al. Genome-wide analysis of local chromatin packing in Arabidopsis thaliana. Genome Res. 25, 246–256 (2015).
Liu, C. et al. Prominent topologically associated domains differentiate global chromatin packing in rice from Arabidopsis. Nat. Plants 3, 742–748 (2017).
Waseem, M., Liu, Y. & Xia, R. Long non-coding RNAs, the dark matter: an emerging regulatory component in plants. Int. J. Mol. Sci. 22, 1–26 (2021).
Yu, Y., Zhang, Y., Chen, X. & Chen, Y. Plant noncoding RNAs: Hidden players in development and stress responses. Annu. Rev. Cell Developmental Biol. 35, 407–431 (2019).
Van De Peer, Y., Maere, S. & Meyer, A. The evolutionary significance of ancient genome duplications. Nat. Rev. Genet. 10, 725–732 (2009).
Soltis, P. S. & Soltis, D. E. Ancient WGD events as drivers of key innovations in angiosperms. Curr. Opin. Plant Biol. 30, 159–165 (2016).
Soltis, D. E., Visger, C. J. & Soltis, P. S. The polyploidy revolution then…and now: Stebbins revisited. Am. J. Bot. 101, 1057–1078 (2014).
Hartmann, T. From waste products to ecochemicals: Fifty years research of plant secondary metabolism. Phytochemistry 68, 2831–2846 (2007).
Bak, S. et al. The Arabidopsis Book - Cytochromes P450. Am. Soc. Plant Biol. 9, 56 (2011).
Nelson, D. & Werck-Reichhart, D. A P450-centric view of plant evolution. Plant J. 66, 194–211 (2011).
Han, J. Y., Kim, H. J., Kwon, Y. S. & Choi, Y. E. The Cyt P450 enzyme CYP716A47 catalyzes the formation of protopanaxadiol from dammarenediol-II during ginsenoside biosynthesis in Panax ginseng. Plant Cell Physiol. 52, 2062–2073 (2011).
Papadopoulou, K., Melton, R. E., Leggett, M., Daniels, M. J. & Osbourn, A. E. Compromised disease resistance in saponin-deficient plants. Proc. Natl Acad. Sci. USA 96, 12923–12928 (1999).
Augustin, J. M. et al. Molecular activities, biosynthesis and evolution of triterpenoid saponins. Phytochemistry 72, 435–457 (2011).
Miettinen, K. et al. The ancient CYP716 family is a major contributor to the diversification of eudicot triterpenoid biosynthesis. Nat. Commun. 8, 14153 (2017).
Birchler, J. A. & Veitia, R. A. Gene balance hypothesis: connecting issues of dosage sensitivity across biological disciplines. Proc. Natl Acad. Sci. USA 109, 14746–14753 (2012).
Dubcovsky, J. & Dvorak, J. Genome plasticity a key factor in the success of polyploid wheat under domestication. Science 316, 1862–1866 (2007).
Leitch, A. R. & Leitch, I. J. Genomic plasticity and the diversity of polyploid plants. Science 320, 481–483 (2008).
Zhao, X. & Hao, W. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, i351–i358 (2005).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinforma. 25, 4.10 (2009).
Iorizzo, M. et al. A high-quality carrot genome assembly provides new insights into carotenoid accumulation and asterid genome evolution. Nat. Genet. 48, 657–666 (2016).
Reyes-Chin-Wo, S. et al. Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce. Nat. Commun. 8, 14953 (2017).
Wang, X. et al. Statistical inference of chromosomal homology based on gene colinearity and applications to Arabidopsis and rice. BMC Bioinforma. 7, 447 (2006).
Guo, H. et al. Gene duplication and genetic innovation in cereal genomes. Genome Res. 29, 261–269 (2019).
Boratyn, G. M. et al. BLAST: a more efficient report with usability improvements. Nucleic Acids Res. 41, W29–W33 (2013).
Wang, X., Shi, X., Hao, B., Ge, S. & Luo, J. Duplication and DNA segmental loss in the rice genome: implications for diploidization. N. Phytologist 165, 937–946 (2005).
Wang, X. et al. Genome alignment spanning major poaceae lineages reveals heterogeneous evolutionary rates and alters inferred dates for key evolutionary events. Mol. Plant 8, 885–898 (2015).
Wang, J. et al. Hierarchically aligning 10 legume genomes establishes a family-level genomics platform. Plant Physiol. 174, 284–300 (2017).
Yang, Z. PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Emms, D. M. & Kelly, S. OrthoFinder: Phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
De Bie, T., Cristianini, N., Demuth, J. P. & Hahn, M. W. CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22, 1269–1271 (2006).
Ellinghaus, D. et al. An efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinforma. 9, 18 (2008).
Wicker, T. et al. Impact of transposable elements on genome structure and evolution in bread wheat. Genome Biol. 19, 103 (2018).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Yaffe, E. & Tanay, A. Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet. 43, 1059–1065 (2011).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Ramírez, F. et al. High-resolution TADs reveal DNA sequences underlying genome organization in flies. Nat. Commun. 9, 189 (2018).
Belton, J. M. et al. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58, 268–276 (2012).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag New York, 2009).
Crane, E. et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244 (2015).
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Wick, R. R., Judd, L. M. & Holt, K. E. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 20, 129 (2019).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Zhao, X. et al. Global identification of Arabidopsis lncRNAs reveals the regulation of MAF4 by a natural antisense RNA. Nat. Commun. 9, 5056 (2018).
Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012).
Axtell, M. J. ShortStack: comprehensive annotation and quantification of small RNA genes. RNA 19, 740–751 (2013).
Harris, M. A. et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 32, 258–261 (2004).
Chen, H. & Boutros, P. C. VennDiagram: a package for the generation of highly-customizable Venn and Euler diagrams in R. BMC Bioinforma. 12, 35 (2011).
Ramírez-González, R. H. et al. The transcriptional landscape of polyploid wheat. Science 361, 662 (2018).
Hamilton, N. E. & Ferry, M. ggtern: Ternary diagrams using ggplot2. J. Stat. Softw. 87, 3 (2018).
Chen et al. Genome warehouse: a public repository housing genome-scale data. Genom. Proteom. Bioinforma. 21, S1672–0299 (2021).
Xue, Y. et al. Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2022. Nucleic Acids Res. 50, D27–D38 (2021).


Acknowledgements

We thank James A. Birchler for his suggestions on the gene balance dosage hypothesis. This study was supported by the Natural Science Foundation of China (#31970235 to L.F.L. and 31991211 to B.L.) and the Shanghai Pujiang Program (#19PJ1401500 to L.F.L.). T.L. has been supported by a Vanier Canada Graduate Scholarship and a doctoral training fellowship from Fonds de Recherche du Québec–Santé. Y.V.d.P. acknowledges funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (No. 833522) and from Ghent University (Methusalem funding, BOF.MET.2021.0005.01).

Author information

Authors and Affiliations

Faculty of Agronomy, Jilin Agricultural University, 130118, Changchun, China
Zhen-Hui Wang
Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, School of Life Sciences, Fudan University, 200438, Shanghai, China
Zhen-Hui Wang, Xin-Feng Wang, Ming-Rui Li & Lin-Feng Li
Key Laboratory of Molecular Epigenetics of the Ministry of Education (MOE), Northeast Normal University, 130024, Changchun, China
Zhen-Hui Wang, Peng Jiang, Jing Zhao & Bao Liu
McGill University and Genome Quebec Innovation Center, Montreal, QC, H3A 0G1, Canada
Tianyuan Lu
School of Life Sciences, Jilin University, 130061, Changchun, China
Si-Tong Liu & Xue-Qi Fu
Department of Ecology, Evolution & Organismal Biology, Iowa State University, Ames, IA, 50011, USA
Jonathan F. Wendel
Department of Plant Biotechnology and Bioinformatics, Ghent University and VIB Center for Plant Systems Biology, Gent, Belgium
Yves Van de Peer
Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa
Yves Van de Peer
College of Horticulture, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, 210095, Nanjing, China
Yves Van de Peer

Contributions

L.F.L., Y.V.d.P., J.F.W., and B.L. conceived the project and coordinated research activities; Z.H.W., M.R.L., S.T.L., X.Q.F., and P.J. collected and maintained the plant materials; J.Z. carried out the chromosome experiments; Z.H.W., T.L., X.F.W., M.R.L., P.J., S.T.L., and X.Q.F. conducted the comparative genomic and epigenomic analyses; Z.H.W., X.F.W., Y.V.d.P., J.F.W., B.L., and L.F.L. wrote the paper. All authors discussed the results and approved the paper.

Corresponding authors

Correspondence to Yves Van de Peer, Bao Liu or Lin-Feng Li.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Yuannian Jiao and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Supplementary Data 8

Supplementary Data 9

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wang, ZH., Wang, XF., Lu, T. et al. Reshuffling of the ancestral core-eudicot genome shaped chromatin topology and epigenetic modification in Panax. Nat Commun 13, 1902 (2022). https://doi.org/10.1038/s41467-022-29561-5

Received: 13 September 2021
Accepted: 23 March 2022
Published: 07 April 2022
DOI: https://doi.org/10.1038/s41467-022-29561-5
