Genes gained and lost during the evolution and domestication of Brassica napus

Background: Brassica napus is one of the most important sources of vegetable oil for human nutrition and biofuel. It is an allotetraploid formed about 7,500 years ago by hybridization between B. rapa and B. oleracea. Earlier studies show that the allopolyploidization process was accompanied by rapid and intensive changes, and that abundant homoeologous exchanges between the subgenomes have accumulated during its short evolutionary history.

Results: By comparing 19 artificially synthesized and 30 natural genotypes, we assessed possible changes in gene ratio, diversity and functional groups during the evolution and domestication of this species. This comparison revealed that gene ratio and diversity between the two subgenomes have hardly changed. However, large numbers of genes have been lost and many new genes gained. Compared with the artificial genotypes, the natural ones contain much lower proportions of genes conferring resistance and tolerance to biotic and abiotic stresses but much higher proportions of genes associated with seed development and metabolic processes. The diploid donor of the A subgenome of B. napus contributed more genes involved in agronomic traits, and the C subgenome donor contributed more genes related to cellular development and metabolic processes.

Conclusions: Our results show that genes conditioning resistance and tolerance to both biotic and abiotic stresses have suffered stronger selection during the evolution and domestication of B. napus, and that changes in different aspects of the allotetraploid, including gene content and genome size, are not random but dictated by its two diploid donors.

events. Wheat [3] and Brassica [4] are among the species for which rapid and massive changes in DNA sequences were detected in newly synthesized allopolyploid genotypes. However, such changes were not detected in either cotton [5] or Arabidopsis [6]. Further, asymmetric evolution, measured with different characteristics including gene number, genetic diversity or methylation, has been widely reported for various species of allopolyploids. It is widely believed that different subgenomes of a polyploid species may respond differently to internal as well as external stimuli [7]. Possible evolutionary advantages of subgenome asymmetry have been proposed in various species [8][9][10][11][12][13].
Brassica napus is one of the most important oil crops, with a world annual production of over 60 million tonnes [14,15], and it has also been widely used as a model species for studying allopolyploid evolution [15,16]. It is a recent allopolyploid that evolved from spontaneous hybridization between B. rapa (2n = 20; genome AA) and B. oleracea (2n = 18; genome CC), which occurred some 7,500 years ago [15, 17-20].
The initial report on rapid and extensive changes in DNA sequences in the early generations after polyploidization was based on a study analysing artificially synthesized allopolyploid genotypes of B. napus and B. juncea with several DNA probes [15]. Sequencing and assembling the genome of the diploid B. oleracea [7] and comparing it with that of B. rapa (The Brassica rapa Genome Sequencing Project Consortium 2011) detected asymmetrical evolution in B. napus [7].
Sequencing the genome of the allotetraploid B. napus revealed that abundant homeologous exchanges may have occurred between its two subgenomes [20].
To enrich the genetic diversity depleted by interspecific hybridization and intensive selection, artificially synthesized genotypes analogous to the natural allotetraploid B. napus were obtained and used in breeding programs [21][22][23][24]. Unlike the natural genotypes, the artificial genotypes have hardly been exposed to evolution and domestication. Thus, differences between the artificial and the natural genotypes reflect, to a large degree, changes accumulated during the evolution and domestication of this species. Further, recent studies have shown the significance of dispensable genome components [25][26][27] and thus the potential issues with using a single genotype to represent a species. The sequences available for multiple natural and artificial allotetraploid genotypes of B. napus offer a unique opportunity to investigate possible changes in genes accumulated during the evolution and selection of this important crop species.
Further, the availability of sequences from multiple genotypes for each of the two diploid donor species [7, 28-30] makes it possible to differentiate genes contributed by either of them in the allopolyploids.

Results
Genes lost and gained during the evolution and domestication of the allotetraploid
Hits for about 90% of the reads from the 30 natural and 19 artificial B. napus genotypes were found in the reference genome of B. napus, and they contain a total of 46,941 genes (Fig. 1). Reads without hits in the reference genome accounted for about 9.2% of the reads from the natural genotypes and 11.7% of those from the artificial genotypes. Assembling these unmapped reads with SOAPdenovo identified 24,505 unique genes from the natural genotypes and 24,879 unique genes from the artificial genotypes (Additional file 1: Table S1, Fig. 1). Thus, genes shared between the artificial and natural genotypes accounted for about 48.7% of all genes detected in this study, and genes unique to either the artificial or the natural genotypes accounted for about 51.3%.

Difference in functional groupings of genes between natural and artificial genotypes
GO annotation assessments against the 24,505 unique genes from the natural genotypes identified 80 enriched GO terms (Fig. 2, Additional file 2: Tables S2, S3). They include terms involved in plant development, such as 'transferase activity', 'macromolecule metabolic process', 'cellular macromolecule metabolic process', 'phosphorus metabolic process', 'seed development', and 'fruit development'. A similar assessment against the 24,879 unique genes from the artificial genotypes identified 48 enriched GO terms. Unlike those from the natural genotypes, most of these enriched GO terms are related to responses to stresses. Specifically, they include 'response to stress', 'chromosome organization', 'cell cycle', 'cell cycle process', 'cell division' and 'mitotic cell cycle process'.
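The shared and unique percentages quoted at the start of this Results section follow directly from the three gene counts reported there. A minimal sketch of that arithmetic is given here; the variable names are illustrative and are not taken from the original analysis.

```python
# Gene counts as reported in the text.
shared_genes = 46_941        # genes covered by reads that hit the reference genome
unique_natural = 24_505      # genes assembled from unmapped reads, natural genotypes
unique_artificial = 24_879   # genes assembled from unmapped reads, artificial genotypes

total_genes = shared_genes + unique_natural + unique_artificial
shared_pct = 100 * shared_genes / total_genes
unique_pct = 100 * (unique_natural + unique_artificial) / total_genes

print(f"total genes detected: {total_genes}")
print(f"shared: {shared_pct:.1f}%  unique: {unique_pct:.1f}%")
```

The two percentages sum to 100 because every detected gene is counted either as shared or as unique to one group.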
KEGG pathway analyses found that, compared with those in the artificial genotypes, significantly enriched pathways in the natural genotypes are involved in 'Phosphatidylinositol signaling system', 'RNA degradation', and 'Inositol phosphate metabolism'. These pathways have been shown to regulate pollen tube growth, root hair tip growth, flowering and maturation [31][32][33][34]. In contrast, the pathways 'Aminoacyl-tRNA biosynthesis', 'Propanoate metabolism' and 'Ribosome biogenesis in eukaryotes' were significantly enriched among the unique genes in the artificial genotypes (P < 0.05) (Additional file 3: Table S4, Fig. 3). Previous studies show that these pathways are predominantly involved in drought stress and disease response [35][36][37][38]. These results indicate that many genes related to stress responses have been lost and that genes related to seed development and metabolic processes have significantly increased during the evolution and domestication of this species.
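The text does not state which statistical test produced the enrichment P values. A one-sided hypergeometric test is a common choice for over-representation analysis, and it is assumed here purely for illustration. The function name and the toy counts below are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance of seeing at least k annotated genes in a list of n,
    drawn without replacement from N background genes of which K are annotated."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical toy example: 20 background genes, 5 carry the term,
# and 3 of the 5 genes in the input list carry it.
p_value = hypergeom_enrichment_p(k=3, n=5, K=5, N=20)
print(f"P = {p_value:.4f}")
```

In practice a correction for multiple testing across all GO terms or pathways would normally follow; that step is omitted from this sketch.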

Difference in B. napus genes derived from the two different diploid donors
To minimize possible influences of dispensable genome components (DGCs), transcriptomic data from 30 B. rapa and 26 B. oleracea genotypes were used to assess genes derived from either of these two diploid donors in the formation and domestication of the allotetraploid B. napus. With an identity threshold of 95%, 39,583 non-redundant CDS (coding sequences) were identified from the B. rapa genotypes and 42,521 from the B. oleracea genotypes. Aligning these CDS against the genome of the allotetraploid B. napus found that the C subgenome shared 52.8% of its genes with its diploid donor B. oleracea and that the A subgenome shared 47.2% of its genes with its diploid donor B. rapa.
Clearly, the ratio of shared genes between the A subgenome and its diploid progenitor B. rapa was substantially lower than that between the C subgenome and its diploid progenitor B. oleracea (Table 1). GO term analysis showed significant differences in functional groups of genes derived from the two diploid progenitors. Of the genes shared between the natural and artificial genotypes, the numbers of enriched GO terms for two ('molecular function' and 'biological process') of the three functional classes were significantly higher for genes derived from B. rapa than for those from B. oleracea. The genes derived from B. rapa were enriched in terms related to transcription factors, plant-type development and negative regulation of several biological processes. Of the genes unique to the artificial allotetraploid genotypes, the numbers of enriched GO terms from B. oleracea were significantly larger than those from B. rapa (Table 2). The most significantly enriched GO terms included cellular development, response to abiotic stimulus, metabolic process and positive regulation of several biological processes (Fig. 4; Additional file 4: Tables S5, S6, S7).

Table 2 Numbers of enriched GO terms among "retained" and "lost" genes derived from the two diploid donors. "Retained" genes are those shared between natural and artificial B. napus; "lost" genes are those unique to the artificial allotetraploid genotypes.

Discussion
Earlier studies detected rapid and extensive changes in DNA sequences immediately following the polyploidization events [15], and asymmetric evolution with abundant homeologous exchanges has been reported from sequencing the genomes of the allotetraploid [20] and its diploid donor species [7, 28]. Thus, the gene ratio between the two subgenomes was expected to differ between the artificial and natural genotypes.
Similar to previous reports, we also detected subgenome asymmetry in this study and found that the A subgenome contains fewer genes than the C subgenome in each of the allopolyploid genotypes assessed. Our results also show that the genes contributed to the allotetraploid B. napus by its two donors differ in functional categories and have been subjected to different selection pressures during evolution and domestication. However, little change in gene ratios between the artificial and natural genotypes was detected, although the two groups of genotypes differ dramatically in the time they have been exposed to evolution and domestication. The most likely explanation for this lack of difference in gene ratio between the two groups is that changes, including homeologous exchanges and gene conversion events during evolution and domestication [19,20], may not be random but dictated by the founding genomes. This likelihood is further supported by the fact that the differences in both genome size and relative diversity between the two subgenomes of B. napus [20] also exist between its two diploid donors [7, 14,39,40].
Although sequences from multiple genotypes were used for each of the species investigated, they are unlikely to have captured all genes present in any of them. Clearly, only expressed genes can be detected from the transcriptomic sequences, and the expression of many genes can be tissue-specific [41][42][43]. Importantly, obtaining all genes from any of the genotypes used was not the intention of this study, and it is not required for estimating either gene ratios between subgenomes or the relative enrichment of genes in a given category.
Differences in gene functional groupings between the artificial and natural genotypes showed that genes conferring resistance and tolerance to both biotic and abiotic stresses were reduced and that genes involved in vegetative and reproductive growth were increased during the evolution and domestication of B. napus. The change in genes conferring resistance to biotic stresses detected here is similar to that found in an earlier study that assessed a selected panel of resistance genes [27]. The tendency for genes conferring disease resistance to suffer stronger selection than genes in other categories during domestication has also been reported in other species, including common bean (Phaseolus vulgaris) [44] and barley [45]. The reduced numbers of genes conferring tolerance to biotic and abiotic stresses during evolution and domestication provide further evidence that investment in defence requires allocation of limiting resources, and hence trade-offs with other traits [46], or that defence strategies could have harmful pleiotropic effects [47][48][49].
It is of note that as many as 51% of the genes detected in this study were unique to either the artificial or the natural genotypes of B. napus. From the pan-genome point of view, they represent the dispensable genome component (DGC). This percentage is more than double that reported for this species in an earlier study [25].
However, it is comparable with those reported in maize [50,51] and rice [52]. Clearly, DGCs would vary depending on not only the number of genotypes but also the relative diversity of the genotypes assessed. The size of the DGC would increase with the increase in the numbers of individuals assessed for a given species and assessing genotypes with a wider diversity range should lead to larger DGCs. An earlier study showed that the artificial genotypes are more diverse than the natural ones [25]. Thus, the inclusion of both artificial and natural genotypes in this study must have contributed to the large DGC obtained.
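The dependence of DGC size on the number and diversity of genotypes can be illustrated with plain set operations. In the sketch below, the core genome is taken as the genes present in every genotype and the DGC as everything else in the pan-genome. The gene identifiers and genotype contents are hypothetical toy data, not results from this study.

```python
def core_and_dispensable(gene_sets):
    """Split a pan-genome into core genes (present in all genotypes)
    and dispensable genes (present in some but not all)."""
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan - core


# Hypothetical gene content of three genotypes.
genotype_1 = {"geneA", "geneB", "geneC"}
genotype_2 = {"geneA", "geneB", "geneD"}
genotype_3 = {"geneA", "geneE"}          # a more divergent genotype

core_2, dgc_2 = core_and_dispensable([genotype_1, genotype_2])
core_3, dgc_3 = core_and_dispensable([genotype_1, genotype_2, genotype_3])

print(f"2 genotypes: core={len(core_2)}, DGC={len(dgc_2)}")
print(f"3 genotypes: core={len(core_3)}, DGC={len(dgc_3)}")
```

Adding a genotype can only shrink the core or leave it unchanged, and can only enlarge the pan-genome or leave it unchanged, so the DGC never decreases as more genotypes are included. A divergent genotype, like the third one here, enlarges it the most.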
Interestingly, genes in the DGC are similar to those 'lost' during evolution and domestication in that both contain higher proportions of genes conferring disease resistance and lower proportions of genes related to vegetative and reproductive growth [44,45]. The results from this study indicate that this similarity is unlikely to arise because most of the genes lost during evolution belong to DGCs. The comparison of the artificial and natural genotypes of B. napus showed that the numbers of 'lost' and 'gained' genes during evolution and domestication are similar (Fig. 1; Additional file 1: Table S1). It may not be difficult to understand why wild genotypes contain higher percentages of genes conferring tolerance to biotic and abiotic stresses as traits

Conclusions
By comparing artificial and natural genotypes of B. napus, we found that, although gene ratios between the two subgenomes have hardly changed, genes conditioning resistance and tolerance to both biotic and abiotic stresses have suffered stronger selection, and the proportions of genes associated with seed development and metabolic processes have increased during the evolution and domestication of this allotetraploid species. Our results also show that the functional groups of genes contributed by the two diploid donors seem to differ, with the A subgenome donor contributing more genes involved in agronomic traits and the C subgenome donor contributing more genes related to cellular development and metabolic processes. The available evidence points to the likelihood that changes accumulated during the evolution of B. napus are not random but strongly affected by the genomes of its two diploid donors.

Genome and transcriptome sequences used
Sequences used in this study were all in the public domain. They included the genome sequences from 30 natural genotypes of B. napus and 19 artificially synthesized genotypes analogous to the natural allotetraploid B. napus (artificial B. napus) [16]. The artificial genotypes were obtained from independent hybridizations between the A-subgenome donor B. rapa and the C-subgenome donor B. oleracea. These genomic sequences were paired-end 100-bp reads generated on an Illumina HiSeq2000 (Illumina Inc., San Diego, CA) (Additional file 5:

Genomic sequence analyses
The genomic sequences were assessed using FastQC (version 0.11.5) [53], and then trimmed and filtered using SolexaQA++ [54]. Low-quality reads were removed using the criteria Q < 30 and length < 50 bp. Following quality control, the remaining reads from the 30 natural and 19 artificial genotypes were aligned against the reference genome of B. napus (GCF_000686985.2_Bra_napus_v2.0) using Bowtie2 [55] v2.2.6 (--end-to-end --sensitive). Unmapped reads were assembled using SOAPdenovo2-r240 [56] with the default settings and a k-mer size of 37. GapCloser (included in the SOAPdenovo2 package) was used to reduce gaps among assembled scaffolds.
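The read-filtering criteria above (Q < 30, length < 50 bp) can be pictured with a small sketch. It is not SolexaQA++ itself. It assumes one plausible reading of the criteria: each read is trimmed to its longest contiguous stretch of bases with quality of at least 30, and the read is discarded if that stretch is shorter than 50 bp. The function names and the example read are hypothetical.

```python
def longest_high_quality_segment(quals, min_q=30):
    """Return (start, end) of the longest contiguous run with every score >= min_q."""
    best_start = best_end = 0
    run_start = None
    for i, q in enumerate(quals):
        if q >= min_q:
            if run_start is None:
                run_start = i          # a new high-quality run begins here
            if (i + 1 - run_start) > (best_end - best_start):
                best_start, best_end = run_start, i + 1
        else:
            run_start = None           # a low-quality base ends the current run
    return best_start, best_end


def trim_and_filter(seq, quals, min_q=30, min_len=50):
    """Trim a read to its best segment; return None if the result is too short."""
    start, end = longest_high_quality_segment(quals, min_q)
    if end - start < min_len:
        return None
    return seq[start:end]


# Hypothetical 100-bp read: 60 good bases, one poor base, then 39 good bases.
read = "ACGT" * 25
scores = [35] * 60 + [10] + [35] * 39
kept = trim_and_filter(read, scores)
print(len(kept) if kept else "discarded")
```

Here the single poor base splits the read into runs of 60 and 39 bases, so the 60-base prefix is kept. A read whose longest run fell below 50 bp would be dropped.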
To remove contaminants, the assembled scaffolds were checked using the stand- (100 amino acids). Detailed information of reads used for mapping, assembly and annotation can be found in Table S1.

Transcriptomic sequence analyses
Following the same quality-control procedure used for the genomic sequences, transcriptomic sequences from each of the two diploid species were assembled de novo using Trinity (version 2.5.1) [59] with a k-mer size of 25 and a minimum contig length of 200 bp. Redundant sequences were then removed using the cd-hit package (version 4.6.4) [60] with a sequence similarity threshold of 95%. CDS were then predicted from the assembled sequences using TransDecoder (version 5.0.2) (https://github.com/TransDecoder/TransDecoder/releases).

Identification of shared genes and their functional prediction
To estimate the genes shared between each subgenome and its respective diploid donor, BLAST+ (version 2.7.1) [61] was used to identify syntenic genes with an E-value cutoff of 1e-5, a minimum coverage of 100 bp and a minimum identity of 85%. The sequences from the diploid genotypes were first used as queries in a BLAST search against the sequences of the allotetraploid genotypes, and a reciprocal analysis was then carried out using the polyploid sequences as queries against the diploid sequences. A custom script [62] was used to retrieve all sequences for which the polyploid or diploid sequences gave the best hits. These sequences were used to calculate the percentages of genes shared between a given subgenome and its diploid donor. GO
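The reciprocal search described above can be sketched as follows. This is not the authors' custom script [62]. It assumes BLAST tabular output (`-outfmt 6`, default 12 columns), interprets the 100-bp coverage criterion as alignment length, and keeps only strict reciprocal best hits, which may be stricter than the original procedure. All function names and the toy hit tables are hypothetical.

```python
import csv
import io


def best_hits(blast_tsv, max_evalue=1e-5, min_len=100, min_identity=85.0):
    """Map each query to its highest-bitscore subject among hits passing the filters.

    Columns used (0-based): 0 query, 1 subject, 2 % identity,
    3 alignment length, 10 E-value, 11 bit score.
    """
    best = {}
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue                       # skip blank and comment lines
        query, subject = row[0], row[1]
        identity, aln_len = float(row[2]), int(row[3])
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue or aln_len < min_len or identity < min_identity:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Pairs (q, s) where s is q's best hit and q is s's best hit."""
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)


def row(q, s, ident, length, evalue, bits):
    # Only six fields are used above; the remaining columns are placeholders.
    return "\t".join(map(str, [q, s, ident, length, 0, 0, 1, length, 1, length, evalue, bits]))


# Hypothetical toy hit tables: diploid (d*) versus polyploid (p*) sequences.
diploid_vs_polyploid = "\n".join([
    row("d1", "p1", 98.0, 300, 1e-50, 500),
    row("d1", "p2", 90.0, 300, 1e-30, 300),
    row("d2", "p2", 80.0, 200, 1e-20, 150),   # fails the 85% identity filter
])
polyploid_vs_diploid = "\n".join([
    row("p1", "d1", 98.0, 300, 1e-50, 500),
    row("p2", "d1", 90.0, 300, 1e-30, 300),
])

forward = best_hits(io.StringIO(diploid_vs_polyploid))
reverse = best_hits(io.StringIO(polyploid_vs_diploid))
shared_pairs = reciprocal_best_hits(forward, reverse)
print(shared_pairs)
```

The share of a subgenome's genes that appear in such pairs would then give the percentage of genes shared with its diploid donor.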

Supplementary Information
Additional file 1:      Table S6: GO terms for genes found in the artificial genotypes of B. napus and derived from either of its diploid donors. Table S7: GO enrichment analysis of lost and retained genes derived from its two diploid donors.
Additional file 5:     Enriched KEGG pathways which are significantly different between the natural and artificial genotypes of B. napus. The numbers of KEGG pathways are given on the X-axis, and those marked with '*' indicate significant difference between input genes and backgrounds.

Figure 4
Heatmaps for allotetraploid genes derived from either of the diploid donor species. 'Lost genes' are those present only in the artificial genotypes, and 'retained genes' are those found in both the natural and artificial allotetraploid genotypes. The three heatmaps represent the three GO categories: 'A' for cellular component, 'B' for molecular function, and 'C' for biological process. Rows represent different GO terms, and colors represent the ratio of input gene number to background gene number for each GO term. The blue and black brackets on the left mark GO terms for genes from B. rapa and B. oleracea, respectively. All GO terms showed significant differences between input genes and backgrounds.