An updated explanation of ancestral karyotype changes and reconstruction of evolutionary trajectories to form Camelina sativa chromosomes

Camelina sativa, a member of lineage I of Brassicaceae, was formed by two hybridization events involving three species (three sub-genomes). The three sub-genomes diverged from a common ancestor that was likely derived from the ancestral karyotype of lineage I (Ancestral Crucifer Karyotype, ACK). The evolutionary trajectories of the C. sativa chromosomes are currently unknown. Here, we adopted a previously proposed telomere-centric theory to explain karyotype evolution in C. sativa. By characterizing the homology between A. lyrata and C. sativa chromosomes, we inferred the ancestral diploid karyotype of C. sativa (ADK), which comprises 7 ancestral chromosomes, and reconstructed the evolutionary trajectories leading to the extant C. sativa genome. The process involved 2 chromosome fusions. We found that sub-genomes Cs-G1 and Cs-G2 may share a closer common ancestor with each other than with Cs-G3. Together with other lines of evidence from Arabidopsis, we propose that Brassicaceae plants, and possibly eudicots in general, follow a chromosome fusion mechanism favoring end-end joining of different chromosomes, rather than the mechanism favoring the formation of circular chromosomes and nested chromosome fusion that is preferred by monocots. The present work will contribute to understanding the formation of C. sativa chromosomes and provides insight into Brassicaceae karyotype evolution.

The Ancestral Crucifer Karyotype (ACK, n = 8), improved from AK, is recognized as the ancestral state of lineage I (Fig. 1 and 4a), based on the fact that the most common base chromosome number is eight [5]. In addition, the Proto-Calepineae Karyotype (PCK, n = 7) was reconstructed as the ancestral karyotype of six Brassicaceae tribes (Fig. 1). PCK is inherited in three of the six tribes (Calepineae, Conringieae, and Noccaeeae), which belong to lineage II, whereas the remaining three tribes (Eutremeae, Isatideae, and Sisymbrieae) are characterized by an additional translocation compared with PCK; this karyotype is referred to as the translocation Proto-Calepineae Karyotype (tPCK, n = 7). By comparing the three ancestral sub-genomes of Chinese cabbage (B. rapa) with PCK and tPCK, Cheng et al. provided evidence that tPCK represents the ancestral karyotype of the mesohexaploid B. rapa, the genus Brassica, and the tribe Brassiceae [6].
Throughout the evolutionary history of the plant kingdom, polyploidization has repeatedly led to genome doubling or tripling, genome repatterning, and gene loss, which characterize genome instability and fractionation [7][8][9]. Interestingly, chromosome numbers can be greatly reduced, returning to a typical range after rounds of polyploidization. For instance, despite two extra Brassicaceae-common duplications (BCD) [10], A. thaliana has only five base chromosomes. It was proposed that chromosome number reduction (CNR) was often the result of reciprocal translocations, which recombined two chromosomes into a larger one and a smaller one, with the smaller chromosome being lost during meiosis [11]. For example, ACK and PCK share five chromosomes with the same structure (AK1-4 and AK7), whereas AK6/8 and AK5/6/8 in PCK were formed by reciprocal translocations among AK5, AK6, and AK8, reducing the chromosome number from eight to seven [5]. Notably, an alternative telomere-centric model suggests that the removal of telomeres caused chromosome fusion and chromosome number reduction during karyotype evolution, and it explains the molecular dynamics of chromosome reformation [12]. Based on the telomere-centric model, ancestral karyotypes and evolutionary trajectories of chromosomes have been reconstructed for Arabidopsis, grasses, and legumes [12][13][14].
The ancestral diploid karyotype of C. sativa (derivative of ACK, dACK) was previously reconstructed based on synteny and colinearity between C. sativa and Arabidopsis species [17]. However, the dynamic changes underlying its formation and the evolutionary trajectories of the extant C. sativa chromosomes have not been well inferred. Here, using the theory of telomere-centric genome repatterning, we inferred a different ancestral diploid karyotype of C. sativa (ADK), compared it with the previous inference, and reconstructed the ancestral karyotypes and evolutionary trajectories of the extant C. sativa genome. The present work will contribute to understanding the formation and evolution of the chromosomes in C. sativa and other Brassicaceae plants.

Inference of ancestral diploid karyotype of C. sativa
To understand the evolutionary trajectories of ADK before the divergence of the three C. sativa sub-genomes, we analyzed the syntenic conservation and chromosome repatterning between the genome of the ancestor of lineage I and that of C. sativa. Here, we took the A. lyrata genome as the reference for the ancestral genome of lineage I because of the significant colinearity between their genomes (Fig. S1 and Table S1) and the high similarity between their karyotypes. By searching for homologous genes between them, we drew homologous gene dot-plots (Figs. 2 and 3) and showed the orthologous correspondence between the ancestral genome of lineage I and the C. sativa genome.
In the homologous gene dot-plots of the two genomes, produced directly from BLASTP hits and further highlighted by integrating inferred colinear genes, every chromosome in the ancestral genome has three homoeologous chromosomes or groups of homoeologous chromosome regions in the C. sativa genome. We found that 5 ACK chromosomes had nearly perfect orthologous correspondence with at least one complete chromosome in C. sativa (Fig. 3a, b, c, d, and e). This shows the integrity of each of these 5 chromosomes in ADK (correspondingly defined as ADK chromosomes 1, 2, 5, 6, 7), which directly inherited the chromosome structure of ACK (AK chromosomes 1, 3, 6, 7, 8) without prominent DNA rearrangements.
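The correspondence described above can be illustrated with a small tally. The following is a minimal sketch, not the authors' script: it assumes colinear gene pairs are already available as (ancestral chromosome, ancestral gene, C. sativa chromosome) tuples, and the function name and the 0.5 coverage threshold are hypothetical choices made for illustration.

```python
from collections import defaultdict


def homoeologous_chromosomes(colinear_pairs, min_fraction=0.5):
    """List C. sativa chromosomes covering a large share of each ancestral chromosome.

    colinear_pairs: iterable of (ancestral_chr, ancestral_gene, cs_chr) tuples.
    min_fraction: hypothetical threshold (an assumption, not from the paper) on the
    fraction of an ancestral chromosome's colinear genes found on one Cs chromosome.
    """
    genes_per_ancestral = defaultdict(set)
    genes_per_pair = defaultdict(set)
    for anc_chr, anc_gene, cs_chr in colinear_pairs:
        genes_per_ancestral[anc_chr].add(anc_gene)
        genes_per_pair[(anc_chr, cs_chr)].add(anc_gene)

    result = defaultdict(list)
    for (anc_chr, cs_chr), genes in genes_per_pair.items():
        fraction = len(genes) / len(genes_per_ancestral[anc_chr])
        if fraction >= min_fraction:
            result[anc_chr].append(cs_chr)
    # Sort chromosome names so the output is deterministic.
    return {anc_chr: sorted(cs_list) for anc_chr, cs_list in result.items()}
```

With real data, an ancestral chromosome that was inherited intact would be expected to return three C. sativa chromosomes, one per sub-genome, whereas a rearranged one would return fewer full-length partners at the same threshold.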
Notably, the orthologous correspondence between AK2, 4, 5 and Cs4, 16 (Cs-G1) is nearly the same as that between AK2, 4, 5 and Cs6, 7 (Cs-G2) (Fig. 3g and h), indicating that Cs-G1 and Cs-G2 share two ancestral chromosomes, which were formed mainly through reciprocal translocation of arms (RTA) and end-end joining (EEJ) among AK2, 4, 5. By searching for shared gene synteny between the A. lyrata and C. sativa genomes, we further found that the crossing-over position in AK4 lies between genes AL482377 (corresponding C. sativa ortholog: Csa16g006880.1) and AL321151 (Csa04g046610.1), and that in AK5 lies between genes AL486375 (Csa04g046590.1) and AL486377 (Csa16g006870.1). Either of the following two evolutionary trajectories could explain the changes in these chromosomes. In the relatively more complex trajectory, AK2 and AK4 crossed over near one telomere of each, resulting in EEJ to produce AK2/4 and a satellite chromosome consisting of two telomeres (and possibly little DNA); a cross-over then occurred between AK5 and neo-AK2/4, which experienced one extra translocation and a pericentric inversion, resulting in RTA between the two chromosomes to produce AK5/4 (ADK3) and AK2/4/5 (ADK4) (Fig. 4c). In the alternative trajectory, a cross-over between AK4 and AK5 resulted in RTA to produce AK5/4, forming ADK3, and an intermediate AK4/5. AK4/5 and AK2 then crossed over near one telomere of each, resulting in EEJ to produce AK2/4/5 and, likely, a satellite chromosome consisting of two telomeres (and possibly little DNA). The neo-chromosome AK2/4/5 experienced one extra translocation and a pericentric inversion to form ADK4 (Fig. 4d). Whichever trajectory actually occurred, the satellite chromosome was likely lost, eventually reducing the chromosome number from 8 in ACK to 7 in ADK.
Inferring evolutionary trajectories from ADK to extant C. sativa karyotype
Shared chromosome structural patterns can help in understanding phylogenomic relationships. In the homologous gene dot-plots, the orthologous correspondence between AK7 (ADK6) and Cs10, 11, 12 (Fig. 3d) suggests that one paracentric inversion is common to Cs10 and Cs11, which belong to Cs-G1 and Cs-G2, respectively, but is absent from chromosome Cs12 of Cs-G3. This suggests that Cs-G1 and Cs-G2 did not diverge directly from ADK, but share a common ancestor carrying one paracentric inversion relative to ADK6. The formation of the three sub-genomes and the C. sativa genome could have occurred as follows: the ancestral diploid of C. sativa first differentiated into species A and B, and species A then differentiated into species C and D after one paracentric inversion occurred in ADK6 (Fig. 5). In species C, crossing-over between ADK6 and ADK7 occurred near one telomere of each chromosome, resulting in chromosome end-end joining (EEJ) to produce ADK6/7 and a satellite chromosome consisting of two telomeres and little DNA. In species D, ADK5 independently experienced one paracentric inversion (Fig. 3c and 5). In species B, crossing-over occurred between ADK3 and ADK4, which experienced one translocation, resulting in reciprocal translocation of arms (RTA) to produce ADK3/4 and ADK4/3, which experienced one pericentric inversion (Fig. 3j and 5). Also in species B, RTA between ADK5 and ADK7 produced ADK5/7 and ADK7/5 (Fig. 3j and 5). The crossing-over position in ADK5 lies between genes AL489681 (corresponding C. sativa ortholog: Csa20g058860.1) and AL351869 (Csa02g002270.1), in the region where the centromere of ADK5 is located, and that in ADK7 lies between genes AL494932 (Csa20g041660.1) and AL494934 (Csa02g033470.1), in the region where the centromere of ADK7 is located.
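The reasoning about a shared paracentric inversion can be sketched in code. This is an illustrative toy, not the procedure used in the study: it assumes that, for each C. sativa chromosome, the orthologous reference genes have been converted to integer ranks along the reference chromosome, that the chromosome is already oriented like the reference, and that shared inversions have identical boundaries. The function names and the `min_genes` parameter are hypothetical.

```python
def inverted_segments(ref_ranks, min_genes=3):
    """Find reference spans covered by runs of strictly decreasing rank.

    ref_ranks: reference gene ranks listed in the order of their orthologs
    along one C. sativa chromosome. A colinear region gives increasing ranks;
    an inverted segment shows up as a decreasing run.
    Returns a list of (lowest_rank, highest_rank) tuples.
    """
    segments = []
    run_start = 0
    for i in range(1, len(ref_ranks) + 1):
        # A run ends at the end of the list or when the ranks stop decreasing.
        if i == len(ref_ranks) or ref_ranks[i] >= ref_ranks[i - 1]:
            if i - run_start >= min_genes:
                run = ref_ranks[run_start:i]
                segments.append((min(run), max(run)))
            run_start = i
    return segments


def inversion_carriers(ranks_by_chromosome, min_genes=3):
    """Map each inverted span to the set of chromosomes that carry it."""
    carriers = {}
    for chrom, ranks in ranks_by_chromosome.items():
        for segment in inverted_segments(ranks, min_genes):
            carriers.setdefault(segment, set()).add(chrom)
    return carriers
```

An inversion carried by the homoeologs from two sub-genomes but not the third, as described for Cs10 and Cs11 versus Cs12, is most simply explained by a single event in a common ancestor of the two carriers.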
An initial hybridization event between species C (Cs-G1) and D (Cs-G2) resulted in a tetraploid genome, and an additional hybridization event between this tetraploid and species B (Cs-G3) eventually formed the extant hexaploid genome of C. sativa [17] (Fig. 5).
During the formation of the C. sativa karyotype, 14 chromosomes of C. sativa inherited the structures of ADK chromosomes. One paracentric inversion occurred in Cs-G2 to produce one new chromosome, whereas two RTAs, together with one translocation and one pericentric inversion, occurred in Cs-G3 to produce four new chromosomes. EEJ occurred in Cs-G1 to produce one new chromosome and one satellite chromosome. The loss of the satellite chromosome reduced the chromosome number from 21 to 20.

Discussion
A telomere-centric theory proposes that karyotype changes likely involve the production of free-end chromosomes, which are eventually inserted into other chromosomes, and/or the end-end joining of different chromosomes [12,13]. Specifically, a chromosome may adopt a circular form in which a cross-over occurs near its two telomeres; resolution of the cross-over may produce a telomere-free chromosome and a satellite chromosome consisting of two telomeres and little DNA. The telomere-free chromosome may invade another chromosome and eventually be merged into it, a process referred to as nested chromosome fusion (NCF). Alternatively, two chromosomes may cross over near one telomere of each chromosome, resulting in chromosome end-end joining (EEJ) and the formation of a satellite chromosome. In addition, reciprocal translocation of arms (RTA) may also occur. The loss of the satellite chromosome explains chromosome number reduction. Here, we used the telomere-centric model to update the explanation of Brassicaceae karyotype evolution.
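The effect of these mechanisms on chromosome number can be checked with simple bookkeeping. The sketch below is an illustration under stated assumptions rather than a model from the study: it assumes each EEJ or NCF merges two chromosomes into one and that the resulting satellite chromosome is lost, while RTA, translocations, and inversions leave the count unchanged. The dictionary and function names are hypothetical.

```python
# Net change in chromosome number per event type (assumption: the satellite
# chromosome produced by EEJ or NCF is always lost afterwards).
COUNT_CHANGE = {
    "EEJ": -1,            # end-end joining of two chromosomes
    "NCF": -1,            # nested chromosome fusion
    "RTA": 0,             # reciprocal translocation of arms
    "translocation": 0,
    "inversion": 0,
}


def chromosome_number_after(start, events):
    """Apply a list of rearrangement events to a starting chromosome number."""
    number = start
    for event in events:
        if event not in COUNT_CHANGE:
            raise ValueError("unknown event type: %r" % event)
        number += COUNT_CHANGE[event]
    return number


# ACK (n = 8) to ADK: one EEJ and one RTA, plus one translocation and one
# pericentric inversion, as described in the Results.
ACK_TO_ADK = ["EEJ", "RTA", "translocation", "inversion"]

# Three ADK-derived sub-genomes (3 x 7 = 21) to extant C. sativa: one EEJ,
# two RTAs, one translocation, and inversions.
ADK_TO_CS = ["inversion", "EEJ", "inversion", "RTA", "translocation", "inversion", "RTA"]
```

Under these assumptions, `chromosome_number_after(8, ACK_TO_ADK)` gives 7 and `chromosome_number_after(3 * 7, ADK_TO_CS)` gives 20, matching the chromosome numbers reported in the text.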
Although the karyotypes that we inferred have the same chromosome numbers at key evolutionary nodes as those in the previous study [17], the karyotypes and the chromosome formations are updated. The ancestral diploid karyotype of C. sativa inferred by the previous study involved chromosome correspondence only between AK2 and AK4, and ignored that AK5 should also have taken part in the formation of ADK3 and ADK4, which is strongly suggested by the homologous gene dot-plots between the A. lyrata and C. sativa genomes (Fig. 3j, h; Fig. 4). Compared with the previous study [17], we further inferred the evolutionary trajectories from ADK to the extant C. sativa karyotype, which involved one EEJ and two RTAs. Previously, the three sub-genomes of C. sativa were regarded as having no genome fractionation bias [17]. However, the sharing of two ancestral chromosomes by Cs-G1 and Cs-G2, both having evolved from ADK chromosomes, provides clear evidence of their higher similarity and lower fractionation compared with the other sub-genome.
Chromosome number reduction (CNR) in Brassicaceae plants has consistently followed the end-end joining (EEJ) mechanism rather than the nested chromosome fusion (NCF) mechanism. Both NCF and EEJ can generate satellite chromosome(s), the loss of which results in CNR. Interestingly, the occurrence of the two CNR mechanisms shows a clear preference by plant family. The numbers of occurrences of the two mechanisms in the grass family are summarized as follows: starting from the 12 common ancestral chromosomes, 7 NCFs and 0 EEJs occurred to produce the 5 extant Brachypodium chromosomes, 5 NCFs and 1 EEJ to form the wheat chromosomes, 1 NCF and 0 EEJs to form the foxtail millet chromosomes, and 13 NCFs and 4 EEJs to form the maize chromosomes. In summary, 23 NCFs and 5 EEJs occurred independently to form the extant grass chromosomes, showing that NCF was significantly preferred over EEJ (chi-square test, P ≈ 0.02395). In contrast, the CNR during the formation of the A. thaliana chromosomes from eight ancestral chromosomes involved three EEJs and no NCF [12]. Similarly, in the formation of ADK and of the extant hexaploid genome of C. sativa, EEJ is the only mechanism that causes CNR. This shows an exclusive preference for EEJ in Brassicaceae. A significant preference for EEJ over NCF was also observed in legumes. Although the sampled families are still too limited, it seems that CNR in eudicots prefers EEJ, whereas CNR in monocots prefers NCF.
Homologous gene dot-plots are commonly used to infer gene colinearity within a genome or between genomes, and multiple layers of gene colinearity suggest the occurrence and ploidy levels of polyploidization [10,22]. With the assistance of homologous gene dot-plots, it was shown that the cucurbits share a tetraploid ancestor that had been overlooked by multiple genome sequencing efforts [22], highlighting their indispensable value in genome structure analysis. In addition, homologous gene dot-plots can intuitively show chromosome changes and traces of genome repatterning [12][13][14]. A recent effort characterized gene colinearity among more than ten legumes and reconstructed the karyotypes of ancestral nodes during the divergence of legumes as well as the evolutionary trajectories of legume chromosomes [14]. Here, we exploited the gene colinearity patterns in homologous dot-plots to infer karyotype evolution and even phylogenetic relationships in Brassicaceae plants, further consolidating their use in ancestral karyotype inference.

Conclusions
Using the telomere-centric model, we inferred the ancestral diploid karyotype of C. sativa (ADK), which comprises 7 ancestral chromosomes, and reconstructed the karyotype evolutionary trajectories leading to the formation of the C. sativa genome. The process involved 2 chromosome fusions. Through the analysis of chromosome structure and karyotype evolution, we found that sub-genomes Cs-G1 and Cs-G2 may share a closer common ancestor with each other than with Cs-G3. The present work will contribute to understanding the formation and evolution of the chromosomes in C. sativa and other Brassicaceae plants.

Dot-plot generation
We used BLASTP [24] to search for homologous gene pairs (E-value < 1e-5) between every possible pair of chromosomes in the two genomes. The best, second-best, and other matches passing this E-value threshold were displayed in different colors to help distinguish orthology from paralogy, or layers of paralogy resulting from recursive WGD events. Dot-plots were produced using in-house Python scripts.
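The hit-ranking step can be sketched as follows. This is a minimal illustration and not the in-house script used in the study: it assumes BLASTP results in the standard 12-column tabular format (`-outfmt 6`), ranks hits per query by bit score, and leaves the choice of plotting colors to the caller. The function name and the rank labels are hypothetical.

```python
import csv
import io
from collections import defaultdict

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def classify_hits(blast_tabular, cutoff=E_VALUE_CUTOFF):
    """Label each query-subject pair as 'best', 'second', or 'other'.

    blast_tabular: text in BLAST tabular format (12 columns: qseqid, sseqid,
    pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue,
    bitscore). Ranking by bit score is an assumption of this sketch.
    """
    best_score = {}
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if query == subject or evalue >= cutoff:
            continue  # drop self-hits and hits failing the E-value cutoff
        key = (query, subject)
        # Keep only the highest-scoring alignment for each gene pair.
        if key not in best_score or bitscore > best_score[key]:
            best_score[key] = bitscore

    hits_by_query = defaultdict(list)
    for (query, subject), bitscore in best_score.items():
        hits_by_query[query].append((bitscore, subject))

    classified = []
    for query, hits in hits_by_query.items():
        hits.sort(key=lambda hit: (-hit[0], hit[1]))
        for rank, (_, subject) in enumerate(hits):
            label = "best" if rank == 0 else "second" if rank == 1 else "other"
            classified.append((query, subject, label))
    return classified
```

Each labeled pair can then be drawn as one dot at the gene-order coordinates of the two genes, with a distinct color per label.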

Circos diagram generation and inference of breakpoint positions
Homologous pairs detected by BLASTP were used as input for ColinearScan 1.01 [25] to obtain syntenic regions between the A. lyrata and C. sativa genomes. The maximum gap length (mg) was set to 50 intervening genes between neighboring colinear genes on both chromosomes. The Circos diagram of the two genomes was produced with TBtools [26] based on the colinearity. We then examined the syntenic regions involved in RTA to identify their boundaries. Each breakpoint was placed between the boundaries of the two adjacent syntenic regions.
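The boundary search can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes syntenic regions have already been summarized as blocks along one reference chromosome, each with its flanking reference genes and the C. sativa chromosome it maps to. The `SyntenicBlock` type, the function name, the gene-order indices, and the `hypothetical_gene_*` names are all invented for this example; only the two flanking gene identifiers come from the text.

```python
from typing import List, NamedTuple, Tuple


class SyntenicBlock(NamedTuple):
    ref_start: int    # gene-order index of the first reference gene in the block
    ref_end: int      # gene-order index of the last reference gene in the block
    first_gene: str   # reference gene at the start of the block
    last_gene: str    # reference gene at the end of the block
    target_chr: str   # C. sativa chromosome the block maps to


def infer_breakpoints(blocks: List[SyntenicBlock]) -> List[Tuple[str, str]]:
    """Return pairs of reference genes flanking each inferred breakpoint.

    A breakpoint is reported wherever two neighboring blocks on the reference
    chromosome map to different target chromosomes.
    """
    ordered = sorted(blocks, key=lambda block: block.ref_start)
    breakpoints = []
    for left, right in zip(ordered, ordered[1:]):
        if left.target_chr != right.target_chr:
            breakpoints.append((left.last_gene, right.first_gene))
    return breakpoints


# Illustrative blocks on AK4. The indices, the block order, and the outer
# genes are hypothetical; the inner flanking genes are those named in the text.
AK4_BLOCKS = [
    SyntenicBlock(0, 99, "hypothetical_gene_a", "AL482377", "Cs16"),
    SyntenicBlock(100, 199, "AL321151", "hypothetical_gene_b", "Cs4"),
]
```

For the illustrative input, `infer_breakpoints(AK4_BLOCKS)` returns the gene pair AL482377 and AL321151, that is, the interval within which the crossing-over position in AK4 is placed.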