The plastid and mitochondrial genomes of Vavilovia formosa (Stev.) Fed. and the phylogeny of related legume genera

The plastid and mitochondrial genomes of Vavilovia formosa (Stev.) Fed. were assembled on the base of the data of high-throughput sequencing of DNA isolated from a sample from North Osetia, Russia, using Illumina and PacBio platforms. The long PacBio reads were sufficient for reliable assembling organellar genomes while the short Illumina reads obtained from total DNA were unacceptable for this purpose because of substantial contamination by nuclear sequences. The organellar genomes were circular DNA molecules; the genome of mitochondria was represented by two circular chromosomes. A phylogenetic analysis on the basis of plastid genomes available in public databases was performed for some representatives of the tribes Fabeae, Trifolieae and Cicereae. As was expected, the V. formosa branch proved to be sister to the Pisum branch, and the tribe Fabeae was monophyletic. The position of Trifolium L. appeared sensitive to the phylogeny reconstruction method, either clustering with Fabeae or with the genera Medicago L., Trigonella L. and Melilotus Mill., but the internodes between successive divergences were short in all cases, suggesting that the radiation of Trifolium, other Trifolieae and Fabeae was fast, occurring within a small time interval as compared to further evolution of these lineages. The data on the relatedness of the plastid genomes of Trifolium and Fabeae correlate with the similarity of N2-fixing symbionts in these legumes represented by Rhizobium leguminosarum biovars trifolii and viciae, while the symbionts of Medicago, Melilotus and Trigonella belong to the Sinorhizobium meliloti and S. medicae species, which are distant from Rhizobium.


Introduction
Vavilovia formosa (Stev.) Fed. is a small perennial herbaceous legume confined to highlands of the Caucasus and Anterior Asia (Davis, 1970;Vishnyakova et al., 2016). Although morphologically variable, it is traditionally considered the only member of the monotypic genus Vavilovia A. Fedorov. Morphological and molecular data suggest it to be the closest genus to Pisum L. (peas, annual plants), to which the important crop Pisum sativum L. belongs. Both genera belong to the tribe Fabeae Rchb. Recently, H. Schaefer et al. (2012) reconstructed a phylogeny of this tribe and showed that Pisum and Vavilovia form a clade inside the speciose genus Lathyrus L., making it paraphyletic. They propose to subsume Pisum and Vavilovia to Lathyrus but the corresponding nomenclatorial change for Pisum is not yet broadly accepted and that for Vavilovia has not been formally made.
At present, genomic research is extensively conducted in both fundamental (e. g., phylogenetics, phylogeography, evo lutionary theory and taxonomy, comparative and functional genomics) and applied (e. g., QTL analysis, association mapping, and markerassisted selection) aspects. Organellar genomes are among the most popular research objects, since their relatively small size allows their sequencing through the highthrough put approach rather easily. Since mitochondria and plastids have the apparently symbiotic origin (from αproteobacteria and cyanobacteria, respectively), their re search may shed light on the coevolution of plants with microorganisms that gene tical ly and functionally interact with these organelles. Specifically, the N 2 fixing symbionts of legumes, rhizobia, form organellelike compartments, symbiosomes, inside the plant cells. Being metabolically integrated, they apparently coevolve with plastids and mitochondria (de la Peña et al., 2018).
Thus far, plastid genomes have been sequenced in many legume genera, including Vicia, Lathyrus, Pisum, Lens, Cicer, Trifolium, Medicago etc., whereas complete mitochondrial genomes are available only for 14 species of Fabales (ncbi. nlm.nih.gov, accessed on March 5, 2018). This is due to the nature of plant mitochondrial DNA, which generally occurs as a set of interconverting subgenomic molecules as a result of homologous recombination between repeted regions (reviewed in Gualberto et al., 2014). For this reason, plant mitochondrial genomes are more difficult to assemble than those of plastids (Smith, Keeling, 2015).
The plastid and mitochondrial genomes of V. formosa are not available yet, in spite of explosive interest to this species in the recent decade (Akopian, Gabrielyan, 2008;Mikić et al., 2009Mikić et al., , 2013Mikić et al., , 2014Sinjushin et al., 2009;Akopian et al., 2010Akopian et al., , 2014Atlagić et al., 2010;Oskoueiyan et al., 2010;Sinjushin, Belyakova, 2010;ZemerskiŠkorić et al., 2010;Zorić et al., 2010;Vishnyakova et al., 2013Vishnyakova et al., , 2016Safronova et al., 2014Safronova et al., , 2015. This interest was motivated by V. formosa being although the most distant but still a pea crop wild relative, which may harbor some genes useful for pea prebreeding and somehow transferrable to pea. The phylogenetic tree of most species of the tribe Fabeae has been extensively and reliably reconstructed by H. Schaefer et al. (2012), but the positions of the genera evolutionary closest to this tribe are problematic. According to the traditional taxo nomy, the tribes most related to Fabeae are Cicereae Alefeld, with the only genus Cicer L., and then Trifolieae (Bronn) Benth., with the genera Trifolium s.l., Medicago L. s.l., Trigo nella L., Melilotus Mill., Parochetus Buch.Ham. ex D. Don. and Ononis L. (Yakovlev, 1991). These genera comprise the socalled 'vicioid clade' in the sense by M.F. Wojciechowski et al. (2004). However, in the phylogenetic tree reconstructed by M.F. Wojciechowski et al. (2004) based on the plastid gene matK there was a highly supported divergence between the branches including (i) the genera Ononis, Medicago, Trigonella, and Melilotus and (ii) the genera Trifolium, Vicia, Pisum, Lathyrus, and Lens, the last four genera belonging to Fabeae. Interestingly, this classification is correlated to the plant symbiotic affinities to N 2 fixing nodule bacteria, since the plants from the second branch are inoculated by closely related symbionts: Trifolium by Rhizobium leguminosarum bv. trifolii, Vicia, Pisum, Lathyrus, Lens by R. leguminosarum bv. vicieae), while Medicago, Trigonella, and Melilotus have distant symbionts from the Sinorhizobium genus (Biondi et al., 2003;Dudeja, Nidhi, 2013). The genus Cicer (traditionally attributed to the monophyletic tribe Cicereae) appeared sister to branches (i) and (ii) taken together. The more proximal branch contained the genus Galega L. (traditionally attributed to the large tribe Galegae (Bronn) Torr. ex Gray), and the first divergence again was formed by a Trifolieae representative Parochetus.
The phylogenetic analysis of Fabeae by H. Schaefer et al. (2012), based on five plastid and one nuclear (ribosomal ITS) sequence, included an outgroup consisting of three Trifolium species, three Medicago species, one species Melilotus, and one Ononis species. Although the entire tree including the outgroup is inevitably unrooted, there was a node with bootstrap support 100 uniting the genus Trifolium with the tribe Fabeae, which agrees with the result by M.F. Wojciechowski et al. (2004).
Their coherent results meant that the evolutionary lineage including Fabeae and Triolium is sister to the lineage including most of the rest of the tribe Trifolieae in the traditional sense, thus making Trifolieae paraphyletic. It is noteworthy that the phylogenetic marker used by M.F. Wojciechowski et al. (2004) and five of six markers used by H. Schaefer et al. (2012) were plastid sequences. However, a Maximum Likelihood phylogenetic analysis based on 28 nuclear sequences showed Trifolium to cluster with Medicago and form a branch opposed to Pisum (Kreplak et al., 2019, Fig. 2, b), which is a pattern corresponding to the traditional systematics.
The plastid and mitochondrial genomes of Vavilovia formosa (Stev.) Fed. and the phylogeny of related legume genera In view of this controversy of the phylogenetic position of Trifolium, it was interesting to consider a phylogenetic tree reconstructed from complete or nearly complete plastid genomes.
In this work we (i) for the first time report the complete DNA sequence of both plastid and mitochondrial genomes of Vavilovia formosa and (ii) use the plastid genome to reconstruct the phylogeny of several legume genera. The obtained data allow us to address the correlation between the plastid based phylogeny of legumes and their symbiotic affinities presumably reflecting the tight functional and coevolutionary interactions of plastids with temporal N 2 fixing organelles, symbiosomes (de la Peña et al., 2018).

Materials and methods
Plant material. Vavilovia seeds were provided by the Gorsky State Agrarian University in Vladikavkaz. They represent a V. formosa population in the North Ossetian State Natural Reserve, North Ossetia, the Caucasus, Russia.
DNA isolation and high throughput sequencing. DNA from Vavilovia plant tissues was isolated with AxyPrep™ Multisource Genomic DNA Miniprep kit according to manu facturer's recommendations. Whole genome sequencing was done on the Illumina and PacBio platforms in the Macrogen genome sequencing company (Korea).
Organellar genome assembly. To assemble the plastid genome of Vavilovia, the reads were filtered with the Mirabait utility of the MIRA4.0 package (Chevreux et al., 1999) using the sequence of the Pisum sativum chloroplast genome (NC_014057) as a probe. A subset of sequences longer than 10 kb was searched to find a read containing the starting point of the assembly, the trnH gene. Then a read overlapping the initial read was selected, the reads were merged, and the dataset was searched for a next read to elongate the assembly. The assembly was elongated in such a manner, until it closed into a circle. It was used as a reference sequence for mapping the Vavilovia plastid genome with MIRA4.0 (Chevreux et al., 1999).
Two assemblies were made, one starting from long PacBio reads and the other from short Illumina reads. It is commonly accepted that the best results are gained by combination of these two types of reads (see e. g. Gnerre et al., 2011). The comparison of the two assemblies of Vavilovia plastid genome revealed that there appeared various regions with a lot of discrepancies. While the assembly of long reads corresponded well to the reference sequence, the assembly of short reads had a number of mismatches, such as nucleotide substitutions and short indels.
To understand the origin of the discrepancy, some of such regions were checked more carefully. For example, the region corresponding to nucleotide positions 39,400-40,000 of the reference sequence contained 17 mismatches within about 120 bp of alignment, possibly due to the nuclear origin of the reads involved into the assembly. Since short Illumina reads (up to 150 bp) do not permit to investigate their genomic environment, all long reads of our dataset that shared homology with that region were checked whether they belonged to the chloroplast genome indeed. A sample of 939 PacBio reads longer than 10 kb was filtered with the Mirabait utility (Chevreux et al., 1999) using the abovementioned stretch of 600 bp of the reference sequence. In total, 89 reads were filtered out. Of them, 80 lay entirely in the plastid genome, while the other 9 matched the assembly partially, sharing with it DNA stretches of variable lengths, 300 to 16,000 bp. These 9 reads were used as a query for a BLAST search of the nonredundant database at ncbi.nlm.nih.gov (Altschul et al., 1990). Three of them appeared to correspond entirely to the mitochondrial genome. This is quite natural, since the mitochondrial genome shares about 2.5 kb of homologous DNA stretches with the chloroplast genome, as evidenced from the comparison of the Vicia faba L. mitochondrial genome (KC189947) with the Pisum sativum plastid genome (NC_014057).
The remaining six reads contained a 300-500 bp region of homology with plastid/mitochondrial DNA, but the rest part had either no homology in the nonredundant database or 1000-1500 bp stretches of homology with genomic clones of some leguminous plants. Most probably, these reads re presented nuclear copies of plastid genes. The DNA stretch corresponding to the region 54,400-55,200 of the reference sequence had 15 mismatches per 600 bp of the assembly. A total of 88 reads (longer than 10 kb) that had homology to this region were filtered out. Four of them matched the plastid genome partially, with 8-16 kb corresponding to the plastid genome and 600-2700 bp with no significant similari ty in the nonredundant database. Other two randomly taken regions had no obvious discrepancies in the assembly made of short reads with the reference sequence. Seventyeight reads (longer than 10 kb) were filtered out that passed across the region 60,000-60,500. One of them contained a stretch of 2700 bp that did not match the plastid DNA. All of the 119 long reads passing across the region 80,000-80,500 entirely matched the plastid DNA.
Based on the above observations, a conclusion was inferred that discrepancies in the assemblies made from long vs. short reads arose due to the presence of nuclear copies of plastid DNA of various lengths, from about 300 to 16,000 bp, with the mean of about 7,000 bp. Therefore, the assembly of long PacBio reads was considered more appropriate for plastid genome reconstruction.
The resulting assembly was reasonably consistent. The total amount of mismatches was 0.25 %, and the average coverage depth was 78. These values suggested that the PacBio reads were sufficient for reliable assembling an organellar genome, while the short Illumina reads obtained from total DNA were unacceptable for this purpose because of substantial contamination by nuclear sequences.
The mitochondrial genome was assembled in a similar manner, with the original filtering of reads using the V. faba mitochondrial genome (KC189947). The assembly consisted of two ring chromosomes with average coverage depth 84 and 59, and the total number of mismatches was 0.37 %.
The plastid genome of V. formosa was assigned the accession number MK604478, and the two chromosomes of its mitochondrial genome got the accession numbers MK48602 and MK48603 in public databases.
Alignment of plastid genomes for phylogenetic analysis. We undertook phylogenetic analysis of the plastid genomes available in public databases of some representatives of the tribes Fabeae, Trifolieae and Cicereae. The plastid genome sequence of Vavilovia formosa in general was not collinear to those of other Fabaceae, differing from them by a large number of structural rearrangements. To make alignment, homologous DNA stretches were found by Blastn software at ncbi.nlm.nih.gov and manually put in the order and orientation corresponding to the Vavilovia plastid genome. Each next stretch of homology was sought in the portion of the plastid genome of a species to be aligned that was not yet included in the reconstruction. Then the plastid genome of V. formosa and reconstructions of the plastid genomes of the other species were aligned with ClustalW (Larkin et al., 2007) incorporated into the MEGA6 package (Tamura et al., 2013).
The obtained plastid genome reconstructions of the genera in question covered 77 to 93 % of the Vavilovia plastid genome, whereas that of Glycine max, used as an outgroup, covered 70 % (Table).
Phylogenetic analysis. Bayesian MCMC analysis was per formed with the use of BEAST 2.4.3 software (Drummond, Rambaut, 2007). The GTR+I+G model was chosen using jModelTest 2.1.10 (Guindon, Gascuel, 2003;Darriba et al., 2012). An uncorrelated lognormal relaxed clock model and a Yule process of speciation were applied. One MCMC ana lysis was run for 100 million generations. Trees were visualized using the program FigTree 1.4.3 (http://tree.bio.ed.ac.uk/soft ware/figtree/) by A. Rambaut.
The Maximum Likelihood reconstruction of the phylogeny was made with the aid of the MEGA6 package (Tamura et al., 2013) using the Kimura 2p parameter, GTR+I+G model of mutation rates, and bootstrap test with 100 replications. Interestingly, it appeared impossible to construct a single master molecule of the Vavilovia mitochondrial genome. Instead, two 'chromosomes' were obtained (Fig. 1). However, this is quite consistent with the dynamic nature of plant mitochondrial genomes (Gualberto et al., 2014). Another curious fact concerns the Nad5 gene, which appeared to belong to both 'chromosomes', since its exons 1-3 reside in the first, larger 'chromosome', whereas exons 4-5 are in the second 'chromosome', thus requiring transsplicing to produce the entire coding sequence. A similar situation has been described in Silene L., where some species possess up to 128 mitochondrial 'chromosomes', with exons of many genes present in more than one 'chromosome' (Sloan et al., 2012).

Results and discussion
Phylogenetic analysis of the mitochondrial genomes of Vavilovia and related genera is impossible yet, as of the studied genera (see Table) complete mitochondrial genomes are presently available only for M. truncatula, V. faba and G. max (ncbi.nlm.nih.gov accessed on 22 August 2019).
The structure of the plastid genome of Vavilovia formosa. The content of the plastid genome of V. formosa is shown schematically in Fig. 2.
The total length is 122,196 bp, which is similar to the plastid genome size of Pisum, 122,180 bp in P. sativum (HG966674) and 120,837 bp in P. fulvum (MG458702). Expectedly, the gene content appeared very similar to that of Pisum. A notable difference is that the Vavilovia plastid genome has a tandem triplication of the tRNA gene for methionine. The three copies differ by nucleotide substitutions and a 5 bp long insertion/ Accession numbers, coverage information, and percentages of similarity to the V. formosa plastid genome in the plastid genome reconstructions studied The plastid and mitochondrial genomes of Vavilovia formosa (Stev.) Fed. and the phylogeny of related legume genera deletion. It is not known whether all the copies are functional. In addition, the gene order in Vavilovia differs from that of Pisum by 10 rearrangements. Phylogenetic analysis involving plastid genomes of some related legume genera including Vavilovia. Figure 3 shows the obtained Bayesian phylogeny reconstruction for representatives of the tribes Fabeae, Trifolieae and Cicereae based on the plastid genome reconstructions and using the soybean plastid genome reconstruction as an outgroup. As expected for so long sequences, all nodes of the obtained tree are well supported by high posterior probabilities. The tree topology is also expectable, and it corresponds to the phylogenetic reconstructions based on shorter sequences: the Vavilovia branch is sister to the Pisum branch, as in (Sinjushin et al., 2009;Oskoueiyan et al., 2010;Schaefer et al., 2012), and their united branch is sister to Lathyrus clymenum as in (Schaefer et al., 2012); Lens culinaris L. is sister to the two involved Vicia spp.; and the tribe Fabeae is monophyletic, as in (Wojciechowski et al., 2004;Schaefer et al., 2012). Thus, the phylogenetic position of Vavilovia is unequivocal.
As showing Medicago, Trigonella and Melilotus to be a sister branch to that uniting Trifolium and Fabeae, thus making the traditional tribe Trifolieae paraphyletic. The phylogeny reconstructed here by the Bayesian analysis of the complete (or nearly complete) plastid genomes is expected to be more reliable, and it is consistent with the aforementioned results by M.F. Wojciechowski et al. (2004) andH. Schaefer et al. (2012). However, one can notice that although the node uniting Trifolium with Fabeae has a robust support of the posterior probability of 0.86 (see Fig. 3), the branch leading to it after the divergence from Medicago is very short. The same is seen in the trees by H. Schaefer et al. (2012).
At the same time Trifo lium and other representatives of the traditional Trifolieae -Medicago, Melilotus, Trigonella, formed a united branch in the Maximum Likelihood tree with the highest possible bootstrap support (100), which is sister to Fabeae (Fig. 4). A similar pattern, with Medicago and Trifolium forming a branch sister to Pisum, was constructed by K. Kreplak et al. (2019), who made a phylogenetic reconstruction based on 28 nuclear sequences using the same Maximum Likelihood method. However, the branch leading to the traditional Trifolieae, including Trifolium, is again very short, both in our tree (see Fig. 4) and in the tree by (Kreplak et al., 2019, Fig. 2, b).
The fact that the positions of Fabeae, Trifolium, and other Trifolieae in the tree depend on the method of phylogeny re-   The plastid and mitochondrial genomes of Vavilovia formosa (Stev.) Fed. and the phylogeny of related legume genera construction but each time reveal a short branch suggests that the radiation of Medicago, Melilotus, Trigonella, Trifolium, and Fabeae was fast. Its duration was short as compared to further evolution of these lineages; that is, the last common ancestors of Fabeae and Trifolum, of Fabeae and the traditional Trifolieae, and of Trifolium and the rest of traditional Trifolieae existed at close times. H. Schaefer et al. (2012) reconstructed the crown age of Fabeae as Middle Myocene (23-16 mya). The relative lengths of branches in the reconstructed phylogenetic trees suggest that the radiation to Fabeae and Trifolieae took place ca 1.5-1.8 myr earlier, that is in the Oligocene. For the time being there is no unequivocal reason to reconsider the traditional taxonomy uniting Trifolium and Medicago as opposed to Fabeae.
The data on relatedness of the plastid genomes of Trifolium and Fabeae correlate to the similarity of N 2 fixing symbionts in these legumes represented by Rhizobium leguminosarum biovars trifolii and viciae (Dudeja, Nidhi, 2013), while the symbionts of Medicago, Melilotus and Trigonella belong to the Sinorhizobium meliloti and S. medicae species, which are distant from Rhizobium (Biondi et al., 2003). This might arise from the functional relationship between rhizobial bacteroids and host plant plastids, where nitrogen fixed by the bacteroids is included into the amino acids and amides synthesized inside plastids of the infected cells of the nodules formed by the host plant (de la Peña et al., 2018). In view of this interaction, one can suggest that the related rhizobial symbionts were acquired by Trifolium and Fabeae plants due to compatibility with the similar plastid genomes.

Conclusion
Thus, phylogenetic analysis of a sample of the available pla stid genomes of representatives of related legume genera, in cluding Vavilovia, reported here, confirmed the expected   phylogenetic position of Vavilovia itself but challenged the presumed position of Trifolium and conjectured a certain coevolution between the plastids and bacterial symbionts of legumes, possibly because of their functional interaction.