Genome evolutionary dynamics followed by diversifying selection explains the complexity of the Sesamum indicum genome

Whole genome duplication (WGD) and tandem duplication (TD) provide two critical sources of raw genetic material for genome complexity and evolutionary novelty. Little is known about the complexity of the Sesamum indicum genome after it diverged from a common ancestor with the paleodiploid Vitis vinifera and further experienced WGD and TD events. Here, we analyzed the functional divergence of different classes of inter- and intra-genome gene pairs from ancestral events to uncover multiple layers of evolutionary dynamics acting during the process of forming the modern S. indicum genome. Comprehensive inter-genome analyses revealed that 60% and 70% of syntenic orthologous gene pairs were retained among the two subgenomes in S. indicum compared to V. vinifera, although there was no evidence of significant differences under selection pressure. For the intra-genomic analyses, 4,696 of 5,932 duplicated gene pairs experienced fractionation, with the remaining 1,236 duplicated gene pairs having undergone functional divergence under diversifying selection. Analysis of the TD events indicated that 2,945 paralogous gene pairs, from 1,089 tandem arrays of 2–16 genes, experienced functional divergence under diversifying selection. Sequence diversification of different classes of gene pairs revealed that most TD events occurred after the WGD event, with others following the ancestral gene order, indicating ancient TD events at some time prior to the WGD event. Our comparison-of-function analyses for different classes of gene pairs indicated that the WGD and TD evolutionary events were both responsible for introducing genes that enabled exploration of novel and complementary functionalities, whilst maintaining individual plant ruggedness. In this study, we first investigated functional divergence of different classes of gene pairs to characterize the dynamic processes associated with each evolutionary event in S. indicum.
The data demonstrated massive and distinct functional divergence among different classes of gene pairs, and provided a genome-scale view of gene function diversification explaining the complexity of the S. indicum genome. We hope this provides a biological model to study the mechanism of plant species formation, particularly in the context of the evolutionary history of flowering plants, and offers novel insights for the study of angiosperm genomes.


Background
Whole genome duplication (WGD) has been an important driving force in accelerating angiosperm diversification, and is recognized as the primary source of novel genomic material contributing to genome complexity and evolutionary novelty [1]. The prevalence of WGD in flowering plants has been detected by analyzing the evidence of ancestrally inherited gene duplicates [2,3]. Phylogenetic analyses of gene duplicates have uncovered two ancient WGD events that occurred at the root of the seed plants (ζ), and at the base of the angiosperms (ε) prior to the divergence of monocots and eudicots. These events were estimated to have occurred around 319 ± 3 and 192 ± 2 million years ago (Mya), respectively [4]. Within the eudicot lineage, phylogenetic analyses indicate an ancient whole genome triplication (WGT) event (γ) that predated the split of the asterid and rosid lineages approximately 130 Mya [2, 5-7]. Based on evolutionary relationships between plant species in the asterid clade, the S. indicum genome has been estimated to have diverged from the Solanum lineage approximately 125 Mya (89.8 to 185.8 Mya), and from U. gibba approximately 98 Mya (68.6 to 145.2 Mya). By comparing the distribution of synonymous substitutions per synonymous site (Ks) for duplicated genes from WGD events among S. indicum, U. gibba and Solanum lineages, the most recent WGD in the lineage leading to S. indicum was estimated to have occurred approximately 71 (±19) Mya, possibly at the same evolutionary stage and in parallel with the WGT event within the Solanum lineage [8] (Fig. 1).
There are many genome-sequenced plant species in the rosid clade, but relatively few in the asterid clade. In the rosid clade, Vitis vinifera was the fourth flowering plant species for which a complete genome sequence was established. After comparison with its close relatives, V. vinifera was considered a true diploid that had not undergone recent genome duplication [9]. V. vinifera was therefore thought to contain ancient genomic loci or ancestral gene orders, which could be used to enable the discovery of ancestral traits and genomic features of flowering plants. In the asterid clade, prior to the release of the Sesamum indicum (sesame, Pedaliaceae) draft genome [8], several genomes were publicly available, including Solanum tuberosum (potato), Solanum lycopersicum (tomato), Utricularia gibba (floating bladderwort) and Mimulus guttatus (monkey flower), which had experienced WGD or WGT events or near-doubling of chromosome numbers within their genomes [6, 7, 10-12]. Therefore, V. vinifera represents a paleodiploid species that is close to plant species in the asterid clade, and which experienced the older eudicot genome triplication event (γ) [6-8, 10, 13]. As a result, the paleodiploid V. vinifera in the rosid clade has maintained a complement of single-copy genes or single-copy syntenic regions at a whole-genome scale compared to other taxa within the asterid clade. Previous comparison of the two modern genomes, S. indicum and V. vinifera, has led to identification of two non-overlapping subgenomes in the S. indicum genome, which provides a rich source of genomic data to study orthologous genes between V. vinifera and S. indicum, as well as duplicated genes in S. indicum [8].
Fig. 1 Ancestral polyploid events and corresponding timeline within the asterids lineage. Rectangles represent whole genome duplication events and ovals tandem duplication events. WGT: whole genome triplication. WGD: whole genome duplication. TD: tandem duplication. Question mark (?) represents undetermined occurrence time of tandem duplication event
The WGD event contributed duplicated genes leading to the increase of gene dosage in S. indicum. A previous study indicated that duplicated genes mainly originate from four different processes: ectopic recombination, replication slippage, retro-transposition and WGD [14]. Following duplication, these genes may experience different evolutionary fates under diversifying selection pressures, including conserved function, sub-functionalization [15,16], neo-functionalization [17,18] and loss [19]. Under diversifying selection, duplicated genes from the three WGD events in the A. thaliana lineage underwent functional divergence indicative of sub- and neo-functionalization, which has been evaluated using protein-protein interactions in modern A. thaliana populations [20]. Moreover, the relative gene expression of paralogous genes across tissues demonstrates that 98% of duplicate pairs have sub-functionalized in a tissue-wise manner following WGD events [21]. Tandem duplication (TD) is a ubiquitous phenomenon in flowering plants, which can also bring about an increase of gene dosage [22,23]. Compared to other duplication events, TDs occur more frequently and involve smaller-scale duplication within the genome [24,25]. TD events are prevalent in many flowering plants and are a characteristic feature of many gene families related to key traits or phenotypes, including the genes coding for nucleotide binding site (NBS) proteins, cytochrome P450s and receptor-like kinases [26-28].
The tandem duplicated genes generated by TD events have experienced functional divergence under diversifying selection. From expression difference analysis of the NBS-encoding gene family in Brassica rapa and B. oleracea, paralogous genes from tandem arrays contributed more towards functional divergence than orthologous genes between B. rapa and B. oleracea over their evolutionary history [26].
Both WGD and TD events can contribute to an increase in gene dosage, which may enhance the biological function of duplicated genes. However, duplicated genes from the WGD event or paralogous genes from TD events may subsequently display functional divergence, which is not explained by the gene-dosage balance hypothesis [29]. Several questions therefore arise: How does the gene-dosage balance hypothesis influence gene evolution in S. indicum? How has gene function changed over the evolutionary history of the S. indicum genome? What is the complexity of the S. indicum genome after it diverged from a common ancestor with V. vinifera (species divergence event), and experienced WGD and TD events?
In this study, we first compared the two S. indicum subgenomes and the V. vinifera genome to obtain syntenic orthologous gene pairs. Secondly, we inferred duplicated gene pairs in the S. indicum subgenomes attributable to the WGD event. Thirdly, for each tandem array within the S. indicum genome, we took every possible two-gene combination of its members to constitute paralogous gene pairs. Using these different classes of gene pairs from the S. indicum specific ancient evolutionary events, we investigated their functional divergence by employing InterPro annotation to trace the evolutionary dynamics of the S. indicum genome under diversifying selection. From comparison of functional divergence for different classes of gene pairs, we characterized the dynamics associated with each evolutionary event to determine the complexity of the S. indicum genome. The data demonstrate massive and distinct functional divergence among different gene pairs, and provide a genome-scale view of gene function diversification that can be traced to ancient evolutionary events. We propose that these insights into the dynamics of S. indicum genome evolution serve as an important model for studying the evolutionary biology of flowering plants.

Results
Influence of whole genome duplication on the S. indicum genome
S. indicum has experienced a WGD event approximately 71 (±19) Mya, which resulted in two subgenomes (Subgenome1 and Subgenome2) compared to the V. vinifera genome [8]. Employing S. indicum and V. vinifera genomes, we used blastp to reconstruct orthologous gene pairs between the two species with an E-value threshold of 1e-20 [30]. We then employed the MCScanX program to identify orthologous genomic regions with the parameters (e = 1e-20, u = 1 and s = 15) between S. indicum and V. vinifera genomes [31]. After manual curation, Subgenome1 covered approximately 57.22 Mb (7,450 genes) represented by 82 syntenic blocks in common with the V. vinifera genome. Meanwhile, Subgenome2 covered approximately 68.66 Mb (7,958 genes) represented by 87 syntenic blocks in common with the V. vinifera genome (Additional file 1: Table S1). Together, the two subgenomes represented 45.9% of the assembled S. indicum genome, which included 56.7% of the 27,148 genes currently annotated within the S. indicum genome (Table 1). Each chromosome pseudomolecule apart from 'LG16' contained a few syntenic regions, indicating that the S. indicum genome has experienced chromosome fragmentation and reassortment following the WGD LG01' was associated with Subgenome1, as was the shortest syntenic region of 63 Kb on 'LG15' (Fig. 2).
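The E-value filtering step described above can be sketched as follows. This is a minimal illustration, assuming the blastp results were written in the 12-column tabular format (BLAST+ `-outfmt 6`), which the text does not state; the gene identifiers are hypothetical placeholders and the function name `filter_hits` is ours.

```python
def filter_hits(lines, max_evalue=1e-20):
    """Keep (query, subject, evalue) for hits at or below the E-value cutoff.

    Assumes tab-separated BLAST tabular output in which column 11
    (index 10) holds the E-value; this format is an assumption.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            kept.append((query, subject, evalue))
    return kept


# Hypothetical example records (gene names are placeholders).
example = [
    "Sin_0001\tVv_0001\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-120\t450",
    "Sin_0002\tVv_0002\t40.0\t120\t72\t3\t1\t120\t5\t124\t1e-5\t60",
]
hits = filter_hits(example)
```

Only the first record passes the 1e-20 threshold; the retained pairs would then be passed to a synteny detection tool such as MCScanX.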

Functional divergence of syntenic orthologous gene pairs
Based on the syntenic relationships, we identified 5,932 V. vinifera genes that had orthologous genes located within the S. indicum subgenomes. This comparison involved 3,656 and 3,512 syntenic orthologous genes in Subgenome1 and Subgenome2, respectively. InterPro annotation enabled us to annotate these syntenic orthologous genes with functional descriptions [32]. We allocated each of the syntenic orthologous gene pairs to one of three classes, depending on their functional divergence status: (A) conserved function, with shared identical InterPro entries, (B) sub-functionalization, with shared partially identical InterPro entries, and (C) neo-functionalization, with completely different InterPro entries. The A, B and C functional divergence classes were used as evidence of collinearity between orthologous gene pairs in the S. indicum genome and corresponding ancient genomic loci in the V. vinifera genome. Of the 3,656 syntenic orthologous gene pairs shared between Subgenome1 and the V. vinifera genome, 2,681 (73.3%) shared the same InterPro entries, indicating that these genes retained conserved functions with the ancient genomic loci present in V. vinifera. A total of 471 (12.9%) orthologous gene pairs retained partially identical InterPro entries, indicating they had undergone sub-functionalization following the S. indicum split from a common ancestor with the paleodiploid V. vinifera. A further 154 (4.2%) orthologous gene pairs had unrelated InterPro entries, suggesting that these genes had undergone neo-functionalization in S. indicum (Additional file 2: Table S2). Of the 3,512 syntenic orthologous gene pairs between Subgenome2 and the V. vinifera genome, 3,117 were represented by InterPro entries, with 2,537 (81.4% of those annotated) sharing identical InterPro entries, indicating that these retained the same function as the corresponding ancient genes in the V. vinifera genome.
A total of 449 (12.8% of those annotated) shared partial InterPro entries and 132 (4.2% of those annotated) had distinct InterPro entries, suggesting that the function of these two classes of S. indicum genes had undergone sub-functionalization and neo-functionalization compared to orthologues in V. vinifera (Table 2, Additional file 3: Table S3).
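The A, B and C classes defined above amount to a comparison of two sets of InterPro entries. The sketch below is one way to express that rule; the handling of unannotated genes and the InterPro identifiers used in the example are illustrative assumptions, not details taken from the study.

```python
def classify_pair(entries_a, entries_b):
    """Assign a gene pair to a functional divergence class.

    A: identical InterPro entry sets (conserved function)
    B: partially overlapping sets (sub-functionalization)
    C: no shared entries (neo-functionalization)
    Pairs where either gene lacks annotation are reported separately
    (an assumption; the text only says such pairs were not classified).
    """
    a, b = set(entries_a), set(entries_b)
    if not a or not b:
        return "unannotated"
    if a == b:
        return "A"
    if a & b:
        return "B"
    return "C"


# Illustrative identifiers only.
conserved = classify_pair({"IPR000719"}, {"IPR000719"})
partial = classify_pair({"IPR000719", "IPR011009"}, {"IPR000719"})
diverged = classify_pair({"IPR000719"}, {"IPR001128"})
```

Treating the annotations as sets means the order and multiplicity of domains are ignored, which matches the "identical", "partially identical" and "completely different" wording of the class definitions.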

Selection pressure on syntenic orthologous gene pairs
For coding sequences, the strength of selection pressure is measured by the ratio of the rates of nonsynonymous substitution over synonymous substitutions (Ka/Ks) [33,34]. We calculated Ka/Ks of S. indicum and V. vinifera syntenic orthologous gene pairs to determine whether they had experienced different selective pressures during the process of functional divergence. After filtering gene pairs with low sequence similarity, we found that the remaining 3,306 syntenic orthologous In comparison, we found that overall, 3,116 syntenic orthologous gene pairs from the A, B, C classes associated with Subgenome2 had a mean Ka/Ks ratio of 0.121 (median value: 0.128), which is lower than the mean Ka/Ks ratio observed between Subgenome1 of the S. indicum genome and the V. vinifera genome. However, this difference was not statistically significant (Mann-Whitney U test, P = 0.3371 > 0.05). A similar pattern of Ka/Ks ratios was found (A: 0.131, median value: 0.119; B: 0.142, median value: 0.13 and C: 0.179, median value: 0.166), and a similar inference of purifying selection for any two classes (Mann-Whitney U test, P(A:B) = 0.003246 < 0.05; P(B:C) = 1.016e-08 < 0.05; P(A:C) = 0.0001182 < 0.05) with the consensus evolutionary pattern under diversifying selection among different classes of functional divergence.
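The class-to-class comparisons above rely on the Mann-Whitney U test applied to Ka/Ks values. The text does not say which software was used, so the following is only a standard-library sketch of the test using the normal approximation, without continuity or tie correction; P-values from a dedicated statistics package may differ slightly. The Ka/Ks values in the example are invented for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Invented Ka/Ks values for two hypothetical classes.
u_stat, p_value = mann_whitney_u([0.10, 0.12, 0.11], [0.20, 0.25, 0.22])
```

A Ka/Ks ratio below 1 is conventionally read as purifying selection, so the test here asks whether one class sits systematically higher than another within that range.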

Fractionation of duplicated gene pairs
Using the V. vinifera genome to represent the reference ancient genome, we extracted two syntenic subgenomes from S. indicum in order to detect the evolutionary fate of duplicated genes following the WGD event. Duplicated gene pairs located on two syntenic subgenomes in the S. indicum genome will tend to fractionate following a WGD event (Fig. 3a). Based on loss and retention of duplicated gene pairs, we found that 4,696 duplicated gene pairs (79.16%) experienced fractionation, with 2,420 gene pairs retained in Subgenome1 and 2,276 in Subgenome2. There are 1,236 gene pairs co-retained in the two subgenomes in S. indicum (Table 3). We next focused on the fractionation of duplicated gene pairs in S. indicum, as represented by colored boxes in Fig. 3a.
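The fractionation counts above follow from classifying each ancestral locus by which subgenome still carries a copy. A minimal sketch of that bookkeeping is given below; the function names and the toy retention table are hypothetical, while the three counts are the ones reported in the text.

```python
from collections import Counter


def retention_class(in_sub1, in_sub2):
    """Classify one ancestral locus by where a copy is retained."""
    if in_sub1 and in_sub2:
        return "co-retained"
    if in_sub1:
        return "Subgenome1 only"
    if in_sub2:
        return "Subgenome2 only"
    return "lost from both"


def summarize(retention):
    """retention maps a reference gene ID to (in_sub1, in_sub2) booleans."""
    return Counter(retention_class(s1, s2) for s1, s2 in retention.values())


# Toy input with placeholder gene IDs.
toy = {"Vv_1": (True, True), "Vv_2": (True, False), "Vv_3": (False, True)}
toy_summary = summarize(toy)

# Consistency check on the counts reported in the text.
SUB1_ONLY, SUB2_ONLY, CO_RETAINED = 2420, 2276, 1236
fractionated = SUB1_ONLY + SUB2_ONLY
total = fractionated + CO_RETAINED
fraction_pct = round(100 * fractionated / total, 2)
```

With the reported counts, 2,420 + 2,276 = 4,696 fractionated pairs out of 5,932, which reproduces the stated 79.16%.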
The functional analysis of duplicated genes in S. indicum provides a novel approach to detect asymmetric evolution within duplicated subgenomes. In order to compare the functions of single-copy genes retained in the different S. indicum subgenomes, we employed the InterPro entries to describe the gene function based on the characteristics of conserved domains. This enabled us to identify 923 (28.37% of the genes in Subgenome1) InterPro functional entries for single-copy genes within Subgenome1 and 863 (28.5% of the genes in Subgenome2) from Subgenome2. Interestingly, we found 804 InterPro functional entries shared between the two subgenomes. These results suggest that the two S. indicum subgenomes still retained many genes with identical function, although they had experienced fractionation of duplicated gene pairs. The specific InterPro entries found in Subgenome1 were enriched for gene families or conserved functional domains of the GNAT domain, Ubiquitin carboxyl-terminal hydrolases family 2, Glycoside hydrolase, family 5, actin-binding, cofilin/tropomyosin type, peptidase S54, rhomboid, peptidase M20, cation-transporting P-type ATPase, N-terminal, cation-   Table S4). The duplicated genes with conserved function or sub-functionalization in different subgenomes were mainly enriched in conserved domains or motifs of protein kinases and transcription factors, which represented a larger proportion of all duplicated gene pairs. The neo-functionalized duplicated gene pairs experienced severe functional divergence, although these genes still had InterPro entries in common, which mainly related to the conserved domains or motifs of zinc finger and transcription factors.
These results suggested that WGD events had primarily brought about an increase in protein kinases and transcription factors involved in biological processes of signal transduction system, protein phosphorylation and signal transduction, carbohydrate biosynthesis and metabolism, as well as transcriptional regulation [35].

Influence of tandem duplication events in the S. indicum genome
Tandem duplication events will lead not only to the expansion of gene families, but also an increase of gene dosage in the form of tandem arrays [36]. Tandem duplicated genes in the S. indicum genome have previously been reported and were available in the PTGBase database [37]. We used this set and curated them based on the characteristics of conserved domains or motifs of gene families. This provided a set of 2,745 tandem duplicated genes distributed in 1,089 tandem arrays of 2-16 genes for further analysis (Additional file 5: Table S5).
In the S. indicum genome, 2,570 of the tandem duplicated genes, representing 94% of all tandem duplicated genes, were distributed in 1,008 tandem arrays and anchored on the 16 linkage groups (LG), with an uneven distribution. The highest proportion was anchored on LG06, with 290 tandem duplicated genes distributed in 118 tandem arrays of 2-9 genes. LG06 contained 2,745 protein-coding genes, of which tandem duplicated genes represented 10.56%. In contrast, LG13 contained 21 tandem duplicated genes distributed in 9 tandem arrays of 2-4 genes. LG13 contained 522 protein-coding genes, of which tandem duplicated genes represented 4.02%. The largest tandem array consisted of 16 genes on LG12, and these genes were involved in the molecular functions of oxidoreductase activity and flavin adenine dinucleotide binding [35] (Fig. 4).

Functional divergence between the members of tandem arrays
For each tandem array, we took every possible combination of two genes to constitute paralogous gene pairs for the investigation of functional divergence. For example, a tandem array of three genes (a, b and c) generates three paralogous gene pairs (a-b, a-c and b-c). In total, we obtained 2,945 paralogous gene pairs among all tandem arrays. Of these, 197 paralogous gene pairs (6.7%) were not represented by InterPro entries. We therefore used the InterPro annotation to determine the functional divergence of the remaining 2,748 paralogous gene pairs in tandem arrays, of which 2,308 (78.4%) shared identical InterPro entries and were classified into the A class of conserved function. These were mainly recognized as members of gene families or conserved domains of Auxin-induced protein, ARG7, Cytochrome P450 and Cytochrome P450, E-class, group I. A further 425 (14.4%) shared partially identical InterPro entries and were classified as sub-functionalized; these were grouped into the gene families or conserved domains of Protein kinase domain, Serine/threonine-/dual specificity protein kinase, catalytic domain and Tyrosine-protein kinase, catalytic domain. Only 15 gene pairs showed complete functional divergence, and these were also grouped into the gene families of protein kinases. This analysis indicates that the majority of tandem duplicated genes have a conserved function. Whether sub-functionalized or neo-functionalized, the functionally diverged pairs represented a small proportion of all paralogous gene pairs in tandem arrays. This suggests that most tandem duplicated genes in S. indicum display a bias towards conserved function, and that the tandem duplicated genes were subject to weaker selection pressure (Additional file 6: Table S6).
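The pairing rule described above, every two-gene combination within a tandem array, can be written directly with the standard library. The sketch below is illustrative; the function name is ours and the gene labels are the a, b, c placeholders from the example in the text.

```python
from itertools import combinations
from math import comb


def paralogous_pairs(tandem_array):
    """Return every unordered two-gene combination from one tandem array."""
    return list(combinations(tandem_array, 2))


# The three-gene example from the text.
pairs_abc = paralogous_pairs(["a", "b", "c"])

# An array of n genes yields n * (n - 1) / 2 pairs, so the largest
# reported array (16 genes) contributes comb(16, 2) pairs on its own.
pairs_in_largest_array = comb(16, 2)
```

Because the number of pairs grows quadratically with array size, large arrays contribute far more pairs than their share of genes, which is worth keeping in mind when reading per-pair percentages.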

Gene functional differences between duplicated and tandem duplicated genes
WGD and TD events provide abundant genomic materials and bring more opportunities for species to adapt to changing environments under selection pressure. Since these two events occurred during different stages of evolutionary history, we propose that the InterPro annotation is able to detect gene functional differences between the two events. Approximately 1,059 InterPro entries were used to annotate 2,472 duplicated genes from the WGD event in S. indicum, and 634 InterPro entries for 2,745 tandem duplicated genes from TD events in S. indicum, providing evidence that the WGD event introduced greater gene complexity with distinct functional ingredients compared to the TD events. From the comparative analysis of annotation between WGD and TD events, we obtained 344 overlapping InterPro entries between duplicated and tandem duplicated genes. The remaining 715 InterPro entries were used to annotate 46% of all duplicated genes, which consisted of specific InterPro entries for annotation of duplicated genes. Removing the overlapping InterPro entries, the remaining 290 InterPro entries were specific for tandem duplicated genes, which were used to annotate 26.4% of all tandem duplicated genes (Fig. 5a). Gene numbers were compared following Log2 normalization between duplicated and tandem duplicated genes, which were annotated by the overlapping InterPro entries. For function comparison, the members of gene families for protein kinase represented the largest proportion of all tandemly duplicated genes, consistent with that in duplicated genes, although the gene number was different between the two datasets. Members of the Cytochrome P450 gene family were overrepresented within tandem duplicated genes, whereas transcription factors represented a larger proportion of duplicated genes (Fig. 5b).
Members of the transcription factor WRKY and Kinesin gene families were classified according to specific InterPro entries for duplicated genes (Fig. 5c), with disease resistance recognition genes (R genes) enriched within the specific InterPro entries within tandem duplicated genes (Fig. 5d). Thus the different evolutionary events affecting the S. indicum genome have given rise to a different complement of genes with distinct functional ingredients. From this analysis, it appeared that the resulting gene composition generated by WGD and TD events differed as a result of selection. Some families with relatively high retention frequencies for TD events have relatively low retention frequencies for the WGD event, and vice versa [38].
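The split into shared, WGD-specific and TD-specific InterPro entries reported in this section is a set partition. A minimal sketch follows; the function names are ours, the toy identifiers are placeholders, and the three totals are the ones given in the text.

```python
def partition_entries(wgd_entries, td_entries):
    """Split two collections of InterPro entries into
    (WGD-specific, shared, TD-specific) sets."""
    wgd, td = set(wgd_entries), set(td_entries)
    return wgd - td, wgd & td, td - wgd


def specific_counts(n_wgd, n_td, n_shared):
    """Derive the event-specific counts from the totals and the overlap."""
    return n_wgd - n_shared, n_td - n_shared


# Toy example with placeholder identifiers.
wgd_only, shared, td_only = partition_entries(
    {"IPR_A", "IPR_B", "IPR_C"}, {"IPR_C", "IPR_D"}
)

# Totals reported in the text: 1,059 (WGD), 634 (TD), 344 shared.
n_wgd_specific, n_td_specific = specific_counts(1059, 634, 344)
```

The reported totals give 715 WGD-specific and 290 TD-specific entries, matching the figures quoted for Fig. 5a.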

Sequence diversification of different classes of gene pairs from evolutionary events
In order to detect sequence diversification of different classes of gene pairs arising from specific evolutionary indicum and V. vinifera was a diploid relative to V. vinifera and that the lineages which gave rise to the two subgenomes of the modern sesame genome diverged from each other well after the split of the S. indicum and V. vinifera lineages. The maximum Ks for paralogous gene pairs from tandem arrays was 0.3, which was the lowest among the different classes of gene pairs from the evolutionary events, indicating that most of the TD events may have occurred more recently than the WGD event, and also later than the S. indicum split from a common ancestor with V. vinifera (Fig. 6).

Dating of tandem duplication events in S. indicum
The analysis of sequence divergence of different classes of gene pairs indicates that most of the TD events represented the most recent events in the evolutionary history of S. indicum, and likely occurred after the WGD event.
In order to date the evolution of tandem duplicated genes, we combined the different classes of gene pairs from the WGD and TD events. 126 tandem duplicated genes distributed in 63 two-gene tandem arrays were located on the S. indicum subgenomes, which had 118 syntenic orthologous genes in V. vinifera, suggesting that these genes were located on syntenic blocks in the subgenomes compared to V. vinifera genome, and may be inherited from their ancestral gene orders (Additional file 7: Table S7). There was no evidence for the remaining 2,619 tandem duplicated genes being associated with the ancient genomic loci, indicating that these tandem duplicated genes might be generated after the WGD event. With the WGD event recognized as a reference point, tandem duplicated genes can then be divided into two classes: 126 tandem duplicated genes which were generated before the WGD event, and 2,619 after. Of the first set 72 tandem duplicated genes are distributed within 36 two-gene tandem arrays and are located within Subgenome1, with the remainder (54) distributed on 27 two-gene tandem arrays in Subgenome2. From these results, we concluded that the TD events had not occurred at a particular evolutionary stage but had been a continuous process over a long historical period, which is consistent with the description of the Brassica genus [39].

Evolutionary patterns of certain gene families followed by whole genome duplication and tandem duplication events
The analysis of gene functional differences between duplicated and tandem duplicated genes in S. indicum showed that some gene families were affected more by the WGD event, whereas others were more affected by TD events. In order to detect the evolutionary patterns of gene families following the WGD and TD events, certain gene families containing conserved domains of DNA-binding WRKY, NB-ARC and Cytochrome P450 were chosen to investigate the consequences of each event.
The DNA-binding WRKY gene family is one of the largest families of transcriptional regulators in plants and forms an integral part of signaling pathways modulating many plant processes [40]. Based on the InterPro annotation, 72 WRKY genes were detected in the S. indicum genome. It appears that 21 (29.2%) of these were generated by the WGD event, and none by TD events. The NB-ARC (NBS-encoding) gene family is a major class of disease resistance recognition genes (R genes), and plays an important role in defense against pests and pathogens, thus improving the adaptability of plants to biotic stress [41]. Approximately 171 NBS-encoding genes were detected by the InterPro annotation in the S. indicum genome, of which 83 (48.5%) were generated by TD and none by WGD. Cytochrome P450 monooxygenases constitute a large superfamily of heme-thiolate proteins prevalent in prokaryotes and eukaryotes [42,43], and are involved in biosynthesis of fatty acids, structural polymers (lignins), pigments (anthocyanins), accessory pigments (carotenoids), defense-related compounds (some phytoalexins), and UV protectants (flavonoids and sinapoyl esters) [44]. According to the InterPro annotation, 307 cytochrome P450 genes were extracted, representing 1.13% of the gene complement within the S. indicum genome, of which six were generated by WGD and 126 by TD, indicating a greater influence of TD than of the WGD event for this gene class in S. indicum (Table 4).

Discussion
Functional divergence by diversifying selection
The study of functional divergence for different classes of gene pairs has been explored in the context of the three ancestral WGD events leading to the contemporary Arabidopsis genome. Different proportions of duplicated gene pairs from these sequential WGD events have indicated functional divergence using the number of identified protein-protein interactions as a proxy. Differences between duplicated gene pairs based on Gene Ontology annotation have reinforced this evidence of functional divergence from protein-protein interactions, and have been interpreted as indicative of adaptation to different cellular components [20]. Comparison of functional divergence between the two S. indicum subgenomes and the V. vinifera genome indicates that 73.3% of gene pairs in Subgenome1 and 72.2% in Subgenome2 have retained a conserved function between members of gene pairs, with the remainder displaying evidence of sub-functionalization or neo-functionalization. Functional analysis of diverged gene pairs indicates enrichment for different functional classes. The analysis of selection pressures indicated that the syntenic orthologous gene pairs can be assigned to those with conserved function, sub-functionalization and neo-functionalization, resulting from different selection pressures. S. indicum has experienced distinct genomic events at different evolutionary stages, with each resulting in extensive changes in composition of gene pairs. Moreover, it appears that some duplicated gene pairs subsequently emerged with a distinct evolutionary fate under diversifying selection, including sub-functionalization and neo-functionalization. Taken together, these results suggest that these classes of genomic event led to the introduction of extensive novel genomic materials resulting in different classes of gene pairs, with evidence of adaptive evolution under diversifying selection.
This appears to have provided novel opportunities for species adaptation to changing environments.

Gene functional compensation followed by whole genome duplication and tandem duplication events
Based on the functional differences, 1,059 InterPro entries were used to annotate duplicated genes and 634 to annotate tandemly duplicated genes in S. indicum. Of these, 344 InterPro entries were shared, with the remainder allocated to WGD-specific and TD-specific events. This analysis indicated that the WGD-specific InterPro entries were mainly grouped into gene families involved in plant development and growth, whereas the TD-specific InterPro entries were mainly classified into gene families related to environmental influence. Based on genome-wide comparative analysis of NBS-encoding genes between Brassica species and Arabidopsis, Yu et al. (2014) demonstrated that the TD events led to an increase in gene dosage of NBS-encoding genes resulting in gene amplification, which may have some advantages for plant parasite defense [26]. The TD events giving rise to expansion of the NBS-encoding gene family are also likely to have benefited the resistance of S. indicum to diseases and pests, and improved its adaptation to a changing environment. The WGD and TD events have brought specific genes with different functional features to the S. indicum genome, which appear to have been essential genomic ingredients for plant growth and development.
Where the WGD event has not brought sufficient functional components to meet the need for survival or increased fitness, the TD events have been a valuable mechanism to generate additional genomic ingredients to maintain plant fitness. We infer that this reflects a critical mechanism of functional compensation in plant evolutionary history, in which genes from the two sources compensate for one another and jointly maintain the ruggedness of S. indicum.
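The partition of InterPro entries described above can be illustrated with a minimal sketch. The three counts (1,059, 634 and 344) are taken from the text; the derived WGD-specific and TD-specific totals are simple arithmetic rather than figures reported here, and the `partition` helper and its placeholder accessions are hypothetical names introduced only for illustration.

```python
# Counts reported in the text.
wgd_entries = 1059  # InterPro entries annotating WGD-derived duplicated genes
td_entries = 634    # InterPro entries annotating tandemly duplicated genes
shared = 344        # entries present in both annotation sets

# Derived by subtraction (not stated explicitly in the text).
wgd_specific = wgd_entries - shared
td_specific = td_entries - shared
distinct_total = wgd_specific + td_specific + shared


def partition(wgd_ids, td_ids):
    """Split two collections of accessions into shared, WGD-only and TD-only sets.

    Hypothetical helper: the accession lists would come from whatever
    annotation tables are available for the two classes of duplicates.
    """
    wgd_ids, td_ids = set(wgd_ids), set(td_ids)
    return wgd_ids & td_ids, wgd_ids - td_ids, td_ids - wgd_ids


# Toy example with placeholder accessions (not real InterPro identifiers).
both, wgd_only, td_only = partition(["A", "B", "C"], ["B", "C", "D"])

print(wgd_specific, td_specific, distinct_total)
```

Under these counts, the WGD-specific set is more than twice the size of the TD-specific set, which is consistent with the WGD event having contributed the broader range of functional categories.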

Gene evolutionary dynamics arising from evolutionary events
The ancestral S. indicum genome diverged from a common ancestor with the ancestral V. vinifera genome and inherited evidence of the ancestral gene order. Subsequently, the ancestral S. indicum genome experienced a WGD event around 71 (±19) Mya, which introduced extensive additional genomic material and led to genome-wide chromosome fragmentation and rearrangement. TD events, which increase gene dosage and contribute to the expansion of gene families, have occurred over a long evolutionary period, although most of them occurred after the WGD event. The WGD and TD events increased gene dosage and improved the corresponding gene function, which would increase the likelihood of plant survival in changing environments. This can be explained by the gene-dosage balance hypothesis [29]. Subsequently, some duplicated genes or tandemly duplicated genes experienced sub-functionalization or neo-functionalization under diversifying selection, which does not fit the gene-dosage balance hypothesis. The gene-dosage balance hypothesis may therefore apply only to certain periods in the evolutionary history of the S. indicum genome. Following each evolutionary event, functional components of the S. indicum genome underwent functional divergence and, at the same time, generated novel functional components. The WGD and TD events have independently supplied novel genomic material, each complementing the other in terms of functional components, and both contributing to the additional functional features and ruggedness of the species.

Conclusions
The availability of the S. indicum genome sequence provides an opportunity to characterize the S. indicum genome and to compare it with the genomes of its close relatives through a comparative genomics approach. Tracing the evolutionary history of S. indicum indicates that WGD and TD events occurred after the predecessors of S. indicum and V. vinifera diverged from a common ancestor. These events have also provided an extensive genomic resource for investigating the complexity of the S. indicum genome. According to the syntenic relationship between S. indicum and V. vinifera, 60% and 70% of syntenic orthologous gene pairs were retained in Subgenome1 and Subgenome2 of S. indicum, respectively, compared to V. vinifera. Based on selection pressure analysis, there was no evidence of significant differences between the subgenomes of S. indicum compared to V. vinifera. For the intra-genomic analyses, 5,932 duplicated gene pairs were retained as 3,656 and 3,512 single-copy genes in Subgenome1 and Subgenome2, respectively, compared to V. vinifera, which indicates that duplicated gene pairs in S. indicum have experienced fractionation. The 1,236 duplicated gene pairs co-retained in both subgenomes of S. indicum have undergone functional divergence under diversifying selection. Comparison of the WGD and TD events shows that most tandemly duplicated genes were generated after the WGD, with others following the ancestral gene order, indicating ancient tandem duplication at some time prior to the WGD. Our comparison-of-function analyses revealed that the WGD and TD evolutionary events were both responsible for introducing genes that enabled exploration of novel and complementary functionalities. Importantly, comparing gene families related to particular traits or phenotypes, and exploiting them further, may help to uncover the evolutionary processes behind those traits in S. indicum and to explain the phenotypic diversity that arises from the complexity of its genome.
We hope this provides a valuable biological model to study the mechanism of plant species formation, particularly in the context of the evolutionary history of flowering plants, and offers a novel insight for the study of angiosperm genomes.