Complete pan-plastome sequences enable high resolution phylogenetic classification of sugar beet and closely related crop wild relatives

As the major source of sugar in moderate climates, sugar-producing beets (Beta vulgaris subsp. vulgaris) have a high economic value. However, the low genetic diversity within cultivated beets requires introduction of new traits, for example to increase their tolerance and resistance attributes – traits that often reside in the crop wild relatives. For this, genetic information of wild beet relatives and their phylogenetic placements to each other are crucial. To answer this need, we sequenced and assembled the complete plastome sequences from a broad species spectrum across the beet genera Beta and Patellifolia, both embedded in the Betoideae (order Caryophyllales). This pan-plastome dataset was then used to determine the wild beet phylogeny in high-resolution. We sequenced the plastomes of 18 closely related accessions representing 11 species of the Betoideae subfamily and provided high-quality plastome assemblies which represent an important resource for further studies of beet wild relatives and the diverse plant order Caryophyllales. Their assembly sizes range from 149,723 bp (Beta vulgaris subsp. vulgaris) to 152,816 bp (Beta nana), with most variability in the intergenic sequences. Combining plastome-derived phylogenies with read-based treatments based on mitochondrial information, we were able to suggest a unified and highly confident phylogenetic placement of the investigated Betoideae species. Our results show that the genus Beta can be divided into the two clearly separated sections Beta and Corollinae. Our analysis confirms the affiliation of B. nana with the other Corollinae species, and we argue against a separate placement in the Nanae section. Within the Patellifolia genus, the two diploid species Patellifolia procumbens and Patellifolia webbiana are, regarding the plastome sequences, genetically more similar to each other than to the tetraploid Patellifolia patellaris. Nevertheless, all three Patellifolia species are clearly separated. In conclusion, our wild beet plastome assemblies represent a new resource to understand the molecular base of the beet germplasm. Despite large differences on the phenotypic level, our pan-plastome dataset is highly conserved. For the first time in beets, our whole plastome sequences overcome the low sequence variation in individual genes and provide the molecular backbone for highly resolved beet phylogenomics. Hence, our plastome sequencing strategy can also guide genomic approaches to unravel other closely related taxa.

Background As the crop plant Beta vulgaris (sugar beet) has a high economic value [1], continuous crop development is essential to enhance stress tolerances and resistances against pathogens. The White Silesian Beet provided the germplasm for sugar beet [2] and is a derivative of wild sea beet (B. vulgaris subsp. maritima). This leaves, similar to the situation in many domesticated crops, only a narrow genetic base for sugar beet breeding [3]. Additionally, early sugar beet breeding has focused mainly on increasing yield. This caused strong domestication bottlenecks and removed many useful traits that may benefit plant fitness [3,4]. The higher genetic variation in crop wild relatives of sugar beet offers potential that might be harnessed to introduce desired traits. Thus, giving insight into the genomic basis of wild beets is progressively moving into the focus of beet breeding research [5,6].
Phylogenetically, wild beets belong to the Betoideae (order Caryophyllales) and are separated in the genera Beta and Patellifolia [7,8], with all cultivated beets belonging to the genus Beta [9]. The genus Beta is then further subdivided into at least two sections, Beta and Corollinae. In general, the section Beta is widespread across Western Europe, whereas Corollinae species are generally distributed across the eastern Mediterranean area and South-West Asia [1,8] (Additional file S1A). Despite the long history of different systematic treatments [1,[7][8][9][10][11][12], the phylogenetic relationships of Beta and Patellifolia species are still a matter of ongoing debate (as reviewed in [7]). Especially the subdivision of the genus Beta into three sections (Beta, Corollinae, and Nanae) is discussed, with the pending suggestion to integrate B. nana into the section Corollinae, hence disbanding the section Nanae [7,8,13]. Similarly, as the Beta section Corollinae harbors a highly variable polyploid/hybrid complex, including di-, tri-, tetra-, penta-, and hexaploid forms, the species boundaries are far from resolved. Regarding the sister genus, there is still an ongoing discussion on whether the morphologically variable Patellifolia comprise three distinct species (P. patellaris, P. procumbens, and P. webbiana) or only two or even one [8,10,12]. Resolving the unclear wild beet relationships may inform beet improvement programs and contribute to the development of new, better equipped beets.
The plastome is well-suited for the reconstruction of phylogenies due to high structural conservation, a conserved evolutionary rate, uniparental inheritance, and high abundance of DNA across all species [14,15]. Historically, systematic information was obtained from plastome sequence restriction site variants, inversions, single nucleotide variants (SNVs), or spacers in single genes. Although this has led to a range of wild beet phylogenies resolving relationships on the level of genera and sections, these are often based on only a few species and contain collapsed branches due to low genetic variation. In contrast, the investigation of whole plastome sequences may enhance the resolution of phylogenetic relationships [14,[16][17][18]. As gene sequences and intergenic regions can be included and combined, whole plastome sequence analyses enable the detection of wellsupported phylogenetic relations on the species-and even on the accession level. Thus, plastid genomics may offer a route to clarify many of the pending questions regarding the wild beet phylogeny.
The plastome sequence of most angiosperms comprises a total of 79 protein-coding genes, 4 rRNA genes, and 30 tRNA genes [19]. The quadripartite structure is characteristic for plastome sequences comprising a large (LSC) and a small single-copy region (SSC) as well as two inverted repeats (IRs) [20,21], all contributing to a total length of 120 kb to 210 kb [20]. This difference in size can be mainly attributed to the IRs that range from 6 kb to 76 kb in length [21][22][23]. The relative orientation of the SSC between the IRs differentiates two structural variants which occur simultaneously in a single cell and might have been previously mistakenly annotated as differences between species [24].
The Caryophyllales, including B. vulgaris, contain canonical plastomes, harboring all hallmarks typical for angiosperm plastome sequences as described above [25,26]. For the wild beet species, until now, no plastome assembly is available, and of our investigated species, only the plastome sequence of B. vulgaris subsp. vulgaris was published previously [27]. A detailed, plastome-and mitogenome-based evolutionary positioning of species outside of the section Beta is still missing but needed to answer some of the unresolved issues in beet systematics.
Here, we resolve the phylogenetic relationships within the Betoideae at high resolution through genome-wide highly conserved. For the first time in beets, our whole plastome sequences overcome the low sequence variation in individual genes and provide the molecular backbone for highly resolved beet phylogenomics. Hence, our plastome sequencing strategy can also guide genomic approaches to unravel other closely related taxa. Keywords: Sugar beet, Beta vulgaris, Beta, Corollinae, Patellifolia, Chloroplast, Plastome assembly, Phylogeny, Phylogenomics comparison based on complete plastome assemblies and reads from both, the plastome and the mitogenome. Eleven different Beta and Patellifolia members, spanning the previously neglected plastome sequences of the Corollinae section and the Patellifolia genus, are included in our analyses. For this, whole plastome sequences of up to two accessions per species are sequenced, assembled, and compared. This novel contribution to the Betoideae pan-plastome intends to clarify the phylogenetic relationships of wild beets on a species-level and provides an important resource for further studies of beet wild relatives.

Results
Our pan-plastome dataset comprises 18 different accessions, including a biological replicate of B. corolliflora, which leads to 19 plastome assemblies in total (see Methods, Table 2). To provide a basis for comparative plastome analysis, all plastome sequences were fully assembled. Out of those, 17 were split into three scaffolds (LSC, SSC, and IR), apart from Bmar1 (four scaffolds) and Bnan2 (six scaffolds). Collapsed IR regions were confidently identified in all plastome assemblies based on a doubled average read coverage in comparison to the single copy regions as well as a gene content that is characteristic and expected for the IR region. Average read coverage and assembly length are shown in Table 1. The distribution of these values is shown in Additional file S1B. Circular and linear plots of a representative selection of plastome assembly sequences are provided in Additional file S1C.
Comparing the plastome assemblies of all Beta vs. Patellifolia species, the average length of the four Patellifolia plastome sequences (avg. 151,621 bp) is higher than for the 15  The final plastome assemblies were subsequently annotated and the alignment identity of all regions included in the phylogenetic analysis was assessed for gene regions and intergenic regions, respectively (Fig. 1). The alignment identity is significantly higher for gene sequences when compared to intergenic regions. This significant difference was obtained when amaranth, quinoa, and spinach were included as outgroups (Additional file S2A) (avg. gene/intergenic regions 90.73/83.55%; Mann-Whitney-U test; p ≈ 2e-10) as well as without outgroup reference sequences (Additional file S2B) (avg. gene/ intergenic regions 97.26/94.93%; Mann-Whitney-U test; p ≈ 1e-09). Rrn genes show high similarity among all plastome genes, whereas ycf1 and rpl22 show the greatest variance between all investigated accessions. The intergenic region between the genes ycf4 and cema contributes most to the differences in the alignment.
The distributions of SNVs ( Fig. 2A) and InDels (Fig. 2B) throughout the plastome sequences were further investigated. InDels are mostly absent from gene regions and SNVs are in general more frequent than InDels. Further, some clear hotspots of SNVs and InDels can be detected in the intergenic regions psbK-psbI, ycf4-cema, rpl33-rps18, psaj-rpl33 and rpl32-ccsa.
As InDels with a length, which is a multiple of three, do not influence the reading frame [28], we expected that the proportion of these InDels (which are multiples of three) is higher in gene regions compared to the proportion in intergenic regions. Indeed, we observed that 43.4% of the InDels in gene regions were a multiple of three, whereas this applies to only 29.6% of the InDels in intergenic regions (Fisher's exact test; p ≈ 3e-8; Fig. 3 [arrows]).
Phylogenetic relations of the Betoideae subfamily were inferred from colored de Bruijn graph-based splits (Fig. 4) as well as by an alignment-based maximum likelihood (ML) analysis (Fig. 5). Mitochondrial and plastid read derived kmers were used to calculate phylogenetic splits, and annotated gene sequences as well as intergenic regions derived from the plastome assemblies were used for the ML analysis. A clear separation of Patellifolia, B. section Beta and B. section Corollinae samples is visible in all four phylogenetic trees. In comparison to the ML tree based on fully assembled plastome sequences (Fig. 4C, black), the tree based on splits derived from the same dataset (fully assembled plastome sequences) (Fig. 4C, dark green) shows only one difference among the B. intermedia and B. corolliflora accessions. The phylogenetic relationships derived from kmers show a few additional differences (Fig. 4C, light green and pink).
These differences comprise for example the assignment of the four Patellifolia accessions/species to a clade consisting of both P. patellaris accessions and a separate clade formed by P. procumbens and P. webbiana for the assembly-based phylogenies (Fig. 4C, black and dark green), whereas the kmer-based phylogenies show a separate clade for P. webbiana and a second clade comprising the other three Patellifolia accessions/species (Fig. 4C, light green and pink). The calculation of the weighted F1 score, the weighted symmetric set distance and the Robinsons-Foulds distance shows that there is a high identity between all splitstree results (based on cp_reads, mt_ reads and cp_assemblies) (Additional file S1D). The usage of different input data formats (reads vs. assemblies) has a larger impact than the usage of different datasets (chloroplasts vs. mitochondria) meaning that these splitstree results show a higher divergence. For the ML-based tree (Alignment sites / patterns: 216442 / 2441; Gaps: 0.44%; Invariant sites: 86.73%), the phylogenetic relationships on the species level are highly supported (high bootstrap values) (Fig. 5). A phylogenetic tree based on the diagnostic set of 53 gene sequences and intergenic regions matches this phylogeny (Additional file S1E).
To investigate the contribution of different regions to the phylogeny, in addition to the whole plastome sequences (genic and intergenic regions combined), sequence matrices for (I) all gene regions, (II) all intergenic regions, (III) whole coding sequences, (IV) first and second codon position and (V) third codon position only, were constructed and used for the inference of phylogenetic relationships. Even though the topology of the phylogeny derived from whole coding sequences is highly similar (Additional file S1F), especially for the codon position-based matrices, alternative branches can be observed. However, substantially more nodes are only poorly supported with bootstrap values up to below 20.
The alignment of the 18S rRNA gene sequence for B. corolliflora (Bcor1) as representative for the Corollinae and B. vulgaris subsp. vulgaris as member of the section Beta revealed not a single genomic difference and therefore no signal which could be used to resolve their phylogenetic relations.
The results presented here show that especially for closely related species of the same subfamily, a higher number of gene sequences and more variable intergenic regions provide greater phylogenetic resolution.

Our proposed beet whole-plastome phylogeny is superior to single gene phylogenies
Here, we present the plastome sequence assemblies of 18 accessions covering most of the species' diversity within the beet genera Beta and Patellifolia and representing an important resource for future studies. All newly generated plastome sequences are highly similar including 79 protein coding genes and four rRNA genes distributed across a mean length of 150,519 bp (± 892 bp). A previously published B. vulgaris subsp. vulgaris (KR230391) plastome sequence comprises a length of 149,722 bp [27]. This is almost identical to our Bvul assembly which differs by only 1 bp in length (149,723 bp). This difference occurs in a stretch of five or six guanines in the IR region between the genes rrn23 and rrn16 -either a sequencing error or a biological difference. As expected, all Betoideae assemblies show high similarity as all angiosperm plastomes are highly conserved and the species investigated here are closely related, while most differences are located in intergenic regions.
Between the cultivated beet and wild beet accessions, most chloroplast genes are highly conserved, one example being rpl2 with only a low number of polymorphisms ( Fig. 1) [22]. The intergenic regions are significantly Bootstrap support values are shown above each branch. The resulting phylogeny is based on the variation in 83 genes and 76 intergenic regions from the plastid genomes of 18 accessions and species (plus outgroup). Different background colors represent the sampling location and the origin of the breeding line (Bvul), respectively. Inset: Actual branch lengths based on the ML analysis less conserved containing more InDels and are therefore more suitable for phylogenies on a lower taxonomic level [22]. Among the beet plastomes, ycf1 is the most informative genic region (Fig. 1A), which was also detected in other plant groups, such as the Tropaeolaceae, the orchids, and the Malvales [29][30][31]. Additionally, the rpl22, matk, clpP1, ycf2, psaC and ndhF genes were reported to be highly divergent [29,31,32], which is mainly consistent with our findings. Here, the rrn-genes, ndhB (already classified as gene with low divergence [29,31]), rps12 and rps7 can be found among the gene loci with lower variance among the investigated species.
Plastomes in general show very similar sequences with most differences occurring in non-coding regions [33]. For beet and wild beets, the most informative intergenic regions are ycf4-cema, accd-psai and rpl32-ccsa (Fig. 1B). However, for ycf4-cema the high difference between 'reference' and 'without reference plastome-' alignment identity should be noticed. For the Malvales, the regions psaB-psaA, psbF-psbE, rpl2-rpl23 and ndhH-ndhA were identified as lowly divergent regions [31]. We confirmed these for beets, except for the latter (ndhH-ndhA) that showed higher divergence among the wild beet plastomes. In contrast, the intergenic regions with the highest divergence in the Malvales (ndhD-ccsA and rps19-rpl2) [31] accounted for less differences among the wild beets. Nevertheless, the percent identity values among all intergenic regions are relatively similar in beets, especially when excluding the polymorphisms in the outgroup reference genomes.
As high variability among the investigated sequences is required to resolve phylogenetic relations of closely related species [34], the retrieved plastome sequences from beets and wild beets provide an excellent resource to approach systematic treatment of the Betoideae. To further increase sequence variability, we also integrated the more diverged intergenic regions into the analysis.
In addition to the plastome-inferred ML-based phylogeny, a mitogenome-and plastome-inferred read-based phylogenetic tree was constructed based on phylogenetic splits. The resulting trees can be considered reliable due to three distinct, robust properties: (I) all splits networks show a tree-like appearance (Fig. 4A, B), (II) the usage of different kmers leads to the same tree and (III) the usage of the geometric mean leads to the removal of samples in case no kmer occurs in the sample and both parameters, geom and geom2, lead to the same results (with the exception of k = 11, which results in a cloud-shaped network).
Differences between the phylogenetic trees derived from different strategies and datasets (Fig. 4C) may be explained by chance and/or by the use of the method (all splitstree results show high identities as indicated by all comparison metrics (Additional file S1D) while differences are larger when using different strategies [reads vs assemblies] in comparison to different datasets  [25,35].
Reasons for this may be recent species hybridization or incomplete lineage sorting [36,37]. Therefore, the mitogenome seems to be mostly useful at higher taxonomic levels [25] and might not be the most suitable system for the beet and wild beet accessions investigated in this study. In summary, the trees based on plastome assemblies (ML and splits) are likely the most reliable as the same phylogenetic relations are the outcome of different established strategies including the widely used ML method. Compared to our pan-plastome assembly, the information derived from individual genic and intergenic regions are insufficient to fully resolve the beet phylogeny, highlighting the power of our plastome approach: I) Investigation of specific regions of the plastome sequence (genic and intergenic regions, coding sequences, codon positions) revealed a few alternative branches for the codon position-based sequence matrix (Additional file S1F). These are marked by short internal branch lengths due to the close relationships of the species within this subfamily. Nevertheless, these conflicting relationships are only weakly supported. This is explained by the lower genetic diversity and therefore insufficient phylogenetic signal when using a smaller amount of sequences and total sequence length. II) An approach based on high-quality single-copy nuclear genes would require a minimum coverage of about 10x [25]. Moreover, nuclear genes are often part of gene families and influenced by whole genome duplication events [14]. Using our available data, nrDNA sequences were selected for the phylogenetic reconstruction. Especially the ITS and ETS regions were previously used for the investigation of phylogenetic relationships, but entire nrDNA repeats (18S-ITS1-5.8S-ITS2-26/28S) were also already assembled for multiple phylogenetic studies [15,38]. Unfortunately, the assembled nrDNA sequences constructed here are not useful to infer confident phylogenetic relationships as the coverage is very low (1.9x -8.5x) and intragenomic polymorphisms of different nrDNA repeat sequences might limit the reliability of the phylogeny [15]. The low bootstrap values and the low coverage make this phylogenetic tree unreliable. Therefore, we do not show these results here. Another possible explanation, apart from the low coverages, is that the biparental nature of the nuclear genome may be problematic for the inference of phylogenetic relationships [14]. Previous studies already suggest that phylogenies based on nrDNA and few selected plastid sequences only weakly support relationships [30]. The 18S rRNA gene sequences of representatives of the sections Corollinae and Beta are completely identical containing no phylogenetic signal to separate them.
Summarizing, our plastome-derived phylogeny benefits from the incorporation of genic and intergenic regions as well as the 'nature' of the plastome itself (as described in the Background section). Despite the low available read coverage and the low genetic diversity within our beet dataset, this leads to a highly confident phylogenetic tree. Further, the use of higher alignment lengths and the use of nucleotides instead of amino acids are favored to construct well supported phylogenies [39]. Therefore, we conclude that for resolving the relationships of cultivated and wild beets, our whole-plastome-based approach is the most reliable.

Implications for the systematic placements within the Betoideae subfamily
With efforts tracing back half a century, resolving the phylogeny of the subfamily Betoideae has been already a major undertaking [1,[8][9][10][11]40]. However, in most cases only few selected sequence regions were targeted, leading to unresolved relations at shallower taxonomic levels or with focus on specific species or sections, i.  [11]. They find, based on a representative geographical sampling, that B. macrocarpa is a distinct lineage from the two investigated B. vulgaris subspecies. Despite this interesting finding, the suggested phylogeny did not focus on other important species and accessions of the Betoideae subfamily and might be further improved by the analysis of sequences with higher diversity to reach higher bootstrap values, which we achieved using intergenic and genic regions of the whole plastome sequences.
With our pan-plastome-informed datasets, we have been able to confirm many of the observations before and added an unprecedented resolution at the species-level. More in detail, we conclude that: I) Among the section Beta, the plastome sequences of B. patula, B. vulgaris subsp. vulgaris and B. vulgaris subsp. maritima are highly similar as indicated by the slightly lower bootstrap values (96/100) for these three beets. As this section harbors wild beets in relatively close geographical proximity across the coastal Mediterranean area, the detected similarity can be explained by (natural) crossing and gene flow due to close geographical proximity or accidental cross-pollination during cultivation as wild beet and cultivated beet groups are easily cross-compatible [3]. Further, B. vulgaris subsp. vulgaris and some B. vulgaris subsp. maritima accessions are even phenotypically highly similar [41]. The phylogenetic relationships among species can also be influenced by the geographical distribution, mating systems and polyploidization [11].  Fig. 5). The investigation of the whole genome sequences of these polyploid species may help to resolve these parental contributions.
III) Among the Patellifolia members, P. procumbens and P. webbiana can be phylogenetically distinguished: Patellifolia was previously classified as B.
section Procumbentes and there is still an ongoing taxonomic debate whether P. patellaris, P. procumbens and P. webbiana can be considered as separate species. However, due to molecular and morphological traits, Patellifolia are now mostly considered a separate genus which is divided into three distinct species [8,10,12]. The relationships among the Patellifolia species could not be resolved in previous studies [1]. In the phylogenetic tree presented here, P. procumbens and P. webbiana seem to be closely related (however still distinguished with high support) and clearly separated from the two P. patellaris accessions. The branch lengths distinguishing B. patula and the B. vulgaris subsp. vulgaris/maritima, which are both considered separate species, are highly similar (0.0001). The same branch length separates P. procumbens and P. webbiana. Therefore, our phylogenetic analysis indicates that the three Patellifolia species are distinguishable on a molecular level.
Comparing the results presented here with earlier studies, the previous investigation of Betoideae was substantially extended and refined. The phylogenetic relationships were resolved in more detail and not only based on the monophyletic groups. This is especially important for the species of the B. section Corollinae which were investigated in depth. Using the whole plastome sequences, including intergenic regions, it was possible to further resolve the phylogenetic relationships with higher bootstrap support due to the extraction of higher sequence variance and phylogenetic signal within the subfamily.

Conclusions
We provide 19 plastome assemblies for 18 different beet and wild beet accessions, which can also be re-used for future investigations of beets and other Caryophyllales species, and harnessed these to revisit systematic issues within the genera Beta and Patellifolia. This analysis advanced our understanding of the phylogenetic relationships of the subfamily Betoideae in four ways: I) Analysing sequences of intergenic regions of the whole plastome assemblies made it possible to reveal the phylogeny of closely related species with high reliability. Our phylogenetic tree shows a clear separation of the wild beet genera Beta and Patellifolia, as well as of the two sections Beta and Corollinae.  . maritima) was not observed, likely due to the high sequence identity possibly explained by the close geographical proximity and the fact that these species are easily cross-compatible. III) All three Patellifolia species are clearly separated in our phylogenetic analysis, while P. procumbens and P. webbiana are more closely related to each other than to P. patellaris. These results, including the investigation of the branch lengths, point to a molecular separation within the Patellifolia species. IV) Finally, the taxonomic classification of B. nana as a member of the Corollinae was further supported.

Plant material, genomic DNA extraction, and DNA sequencing
Seeds of Betoideae species were obtained from KWS Saat SE, Einbeck, Germany (B. vulgaris subsp. vulgaris genotype KWS2320) and from the Leibniz Institute of Plant Genetics and Crop Plant Research Gatersleben (IPK), Germany (all other accessions with accession numbers listed in Table 2 and Additional file 2). The material of the KWS Saat SE, Einbeck and IPK Gatersleben was transferred under the regulations of the standard material transfer agreement (SMTA) of the International Treaty.
Apart from B. vulgaris subsp. vulgaris, 17 other Beta and Patellifolia accessions shown in Fig. 6 were analysed. The exact sampling location of the investigated accessions was extracted from the GBIS/I (Genebank information system; IPK) [48] (Fig. 6).
The plants were grown under long day conditions in a greenhouse and were obtained and grown in accordance with German legislation. Genomic DNA was isolated from young leaves using the NucleoSpin ® Plant II protocols from Macherey & Nagel. Each high-quality gDNA (200 ng) was fragmented by sonication using a Bioruptor (Fa. Diagenode) and subsequently used for library preparation with the TruSeq Nano DNA library preparation kit (Fa. Illumina). End repaired fragments were size selected by AmpureXp Beads (Fa. Beckmann-Coulther) to an average size of around 700 bp. After end repair, A-tailing and ligation of barcoded adapters, fragments were enriched by eight cycles of PCR. The final libraries were quantified using PicoGreen (Fa. Quant-iT) on a FLUOstar plate reader (Fa. BMG labtech) and quality checked by HS-Chip on a 2100 Bioanalyzer (Fa. Agilent Technologies). Before sequencing all libraries were pooled depending on the genome size and ploidy of each accession and sequenced 2 x 250 nt on a HiSeq1500 in rapid mode over two lanes using onboard cluster generation. Processing and demultiplexing of raw data was performed by bcl2fastq-v2.19.1 to generate FASTQ files for each accession.

Plastome assemblies and annotation
Trimmomatic (v0.39) [50] was applied to remove adapter sequences (ILLUMINACLIP:adapters. fa:2:30:10:2:keepBothReads) and to ensure high quality of the reads (SLIDINGWINDOW:4:15 MINLEN:50 TOPHRED33). FastQC (v0.11.9) [51] was used for quality checks. The trimmed reads were subjected to GetOrganelle (v1.7.0) [52] to generate plastome assemblies as suggested for Embryophyta plant plastome sequences (−R 15; −F embplant_pt). The SPAdes [53] kmer settings were set to -k 21, 45, 65, 85, or 105. The contig coverage information and other graph characteristics are used by GetOrganelle to construct the final assembly graphs, which were plotted and visually assessed using Bandage (v0.8.1) [54]. The assemblies suggested a circular sequence, however, circular plastome molecules might only comprise a small proportion of all molecules in the cell, whereas other plastome molecules may occur in branched or linear configurations [55][56][57]. The assemblies were submitted in the FASTA format, retaining the possibility to reuse the submitted assemblies as circular or linear sequences. The complex assembly graph of Bmar1 was not automatically resolved. Therefore, single contigs of Bmar1 were sorted manually based on the structure of the other assemblies to enable comparative analyses as described in the following section.
Structural annotation of all plastome assemblies was performed with GeSeq (v2.01) [58]. The BLAT [59] search parameters ' Annotate plastid trans-spliced rps12' and 'Ignore genes annotated as locus tags' were used together with a 'Protein search identity' of 25 and a 'rRNA, tRNA, DNA search identity' of 85. For HMMER profile search 'Embryophyta chloroplast (CDS+rRNA)' was selected and 'MPI-MP chloroplast references (Embryophyta CDS + rRNA)' was chosen as reference. The resulting annotation files in the gff format were directly used for further analyses. To avoid confusion, we want to make aware of the fact that psbN and pbf1 are two different names for the same gene.

Construction of phylogenetic trees
The workflow for the alignment-based phylogenetic analysis is available in Additional file S1G. The position of each gene was extracted from the GFF files obtained through GeSeq. Next, adjacent genes with conserved microsynteny across all investigated samples (including amaranth, quinoa, and spinach as outgroup) were identified and the interleaved intergenic regions of these neighbouring genes were extracted. For overlapping genes, the extraction of an intergenic region was not possible.
Using the gene sequences and intergenic regions of all samples, gene/region specific alignments were performed  [1,10,11,49] using MAFFT (v7.299b) [60]. High accuracy was ensured using the L-INS-I method. To align sequences with different orientations, the parameter '--adjustdirection' was used. The alignments were trimmed using trimAl (v1.4.rev22) [61]. The gap threshold was set to '-gt 0.8' , whereas the threshold for the minimum average similarity was set to '-st 0.001' . Then, the single alignments were concatenated and the resulting alignment matrix was inspected using SeaView [62]. Manual adjustment was not necessary.
Location-based clustering of the clades in the tree was performed manually based on the sampling locations (Additional file S2C).
To identify a reduced set of gene sequences and intergenic regions for the construction of a phylogenetic tree distinguishing all accessions, the sequences were iteratively added with increasing alignment identities until all species were separated by informative positions.
To investigate the region dependent phylogeny different additional data matrices were constructed for (I) all gene regions, (II) all intergenic regions, (III) complete coding sequences, (IV) first and second codon position and (V) third codon position only. Therefore, coding sequences for all 79 protein coding genes were extracted from the Genbank annotation files of our plastome assemblies. Start and stop codons were removed and extracted sequences were processed as described above for ML analysis.
To extract the 18S rRNA gene sequence from B. corolliflora (Bcor1) as representative of the Corollinae, SOAP-denovo2 assemblies were generated using the trimmed reads as input. SOAPdenovo2 (v2.04) [65] was tested with different kmer sizes ranging from 67-127 in steps of 10. The resulting assembly with the highest N50 length was used for further investigations. The reference 18S rRNA gene sequence for B. vulgaris subsp. vulgaris was retrieved from the NCBI (GeneID = 809573). The 18S rRNA gene sequence for Bcor1 was identified via BLAST and then extracted from the SOAPdenovo2 assembly, consecutively adding the following overlapping BLAST hit with the smallest e-value. Next, these extracted sequences were combined for a 18S gene sequence reconstruction. The assembled 18S rRNA gene sequence and the corresponding reference gene sequence were aligned via MAFFT and inspected using SeaView.

Mitogenome and plastome phylogeny based on kmer-derived phylogenetic splits
SANS serif (v2.1_04B) [66,67], a method based on colored de Bruijn graphs, was selected for the reconstruction of additional phylogenies using variable input data (mitochondrial reads, chloroplastic reads, and full plastome assemblies). This method does not require prior assembly of the reads and is therefore especially suitable for the mitochondrial sequences which could not be fully assembled using GetOrganelle due to the relatively low available sequencing depth and also higher complexity of the mitogenome in comparison to the plastome.
Reads were assigned to the plastome or the mitogenome, respectively, after mapping with BWA-MEM (v0.7.13) [68] against the sugar beet reference genome sequence, including the respective sugar beet chloroplast (KR230391.1) and sugar beet mitochondrial sequence (BA000009.3), which were published independently from this study. This enabled the extraction of reads mapping with higher confidence to the chloroplast/mitochondrial sequence in contrast to mapping to the nucleome (with e.g. a few mismatches) and vice versa. Therefore, 'samtools view' , with the -b and -h options, was used after indexing the BAM file. The resulting BAM file was then further processed using 'samtools collate' and converted to the FASTQ format to extract the corresponding reads (chloroplast or mitochondria, respectively) using 'samtools fastq' . After that, for the read-derived phylogenetic analyses, a colored de Bruijn graph was constructed using Bifrost (v1.0.5) [69] to filter kmers which only occur once in the dataset. This graph was then used as input for SANS serif. In addition, a phylogeny was reconstructed using the newly constructed plastome assemblies as direct input for SANS serif.
Different parameters were applied to test the robustness of the results. These arguments include different mean weight functions (−m; geom vs. geom2), the number of splits in the output list (−t; all vs. 10n) as well as various kmer sizes (11, 21, 31 and 61). The SANS serif output file was then converted to the nexus format (san-s2nexus.py) and subsequently visualised using Splitstree5 [70]. To analyse the discrepancies between trees derived from different methods, SANS serif was used with the option 'strict' to generate an output file in the newick format which was then visualised using FigTree [64]. Further, the SANS serif script 'comp.py' was used to calculate weighted (length of the edges/size of the splits is taken into account) precision and recall (combined in F1 scores) while using each tree as reference/ground truth in an 'all vs. all' comparison. In this use case, precision means 'the total weight of all correctly predicted splits divided by the total weight of all predicted splits' . Further, weighted symmetric set distances and Robinsons-Foulds