The complete chloroplast genomes of seventeen Aegilops tauschii: genome comparative analysis and phylogenetic inference

The D genome progenitor of bread wheat, Aegilops tauschii Cosson (DD, 2n = 2x = 14), which is naturally distributed in Central Eurasia, ranging from northern Syria and Turkey to western China, is considered a potential genetic resource for improving bread wheat. In this study, the chloroplast (cp) genomes of 17 Ae. tauschii accessions were reconstructed. The cp genome sizes ranged from 135,551 bp to 136,009 bp, and all genomes displayed the quadripartite structure typical of angiosperms. Within these genomes, we identified a total of 124 functional genes, including 82 protein-coding genes, 34 transfer RNA genes and eight ribosomal RNA genes, with 17 duplicated genes in the IRs. Although the comparative analysis revealed that the genomic structure (gene order, gene number and IR/SC boundary regions) is conserved, a few variant loci were detected, predominantly in the non-coding regions (intergenic spacer regions). The phylogenetic relationships determined based on the complete genome sequences were consistent with the hypothesis that Ae. tauschii populations in the Yellow River region of China originated in South Asia rather than in Xinjiang province or Iran, a finding that could contribute to more effective utilization of wild germplasm resources. Furthermore, we confirmed that Ae. tauschii was derived from monophyletic speciation rather than hybrid speciation at the cp genome level. We also identified four variable genomic regions, rpl32-trnL-UAG, ccsA-ndhD, rbcL-psaI and rps18-rpl20, showing high levels of nucleotide polymorphism, which may accordingly prove useful as cpDNA markers in studying the intraspecific genetic structure and diversity of Ae. tauschii.

[...] to 10 μg of extracted DNA was sheared to generate fragments, and the quality of the DNA was determined using an Agilent Bioanalyzer 2100 (Agilent Technologies). Thereafter, we generated a paired-end sequencing library, which was constructed from ~400 bp fragments obtained using a Genomic DNA Sample [...]

[...] (Table S3). Among these variable loci, rpl32-trnL-UAG (0.00478), ccsA-ndhD (0.00483), rbcL-psaI (0.00492), and rps18-rpl20 (0.00635), which are located in intergenic regions (the former two in the SSC region and the latter two in the LSC region), displayed the highest nucleotide polymorphisms (Figure 5).
Moreover, primers for these four regions were verified to amplify effectively (Table S4), so these regions are potential markers for studying the intraspecific genetic structure and diversity of Ae. tauschii.

[...] of IR regions. We found that the 17 assembled genomes were similar in size, ranging from 135,551 bp to 136,009 bp (Table 2), with accessions AY22 and AY46 having the largest and the smallest genomes, respectively. The IR regions of all 17 genomes had an identical length (21,548 bp), and differences in genome length can largely be attributed to variation in the non-coding regions, particularly in the size of the intergenic regions (Table 2).

[...] found that the tree we constructed is similar in topological structure to that presented by Bernhardt (2017).
Most researchers consider that Ae. tauschii, the donor of the D genome of common wheat, could be derived from monophyletic speciation (Dvorak et al., 1998 [...] Thus, we propose that Ae. tauschii is derived from monophyletic speciation rather than ancient hybridization. In brief, these phylogenetic results will serve as a reference framework for future studies on Triticeae or Ae. [...]

To identify the genetic divergence among the assembled genomes, we determined the nucleotide variability (Pi) of coding genes, intron regions and intergenic regions using DnaSP. The results revealed that the sequence divergence of the IR regions was lower than that of the LSC and SSC regions, which has also been noted in other angiosperms and may be attributed to copy correction of the IR regions via gene conversion (Khakhlova and Bock, 2006). The IR sequences of the 17 accessions were identical in this study. Intraspecies variability was detected predominantly in the non-coding regions (intergenic spacer regions) of the LSC and SSC (Figure S2 and Figure 5). We also found that four variable loci (rpl32-trnL-UAG, ccsA-ndhD, rbcL-psaI and rps18-rpl20) located in non-coding regions showed notably high levels of nucleotide polymorphism; two of these (rpl32-trnL-UAG and ccsA-ndhD) are located in the SSC region and the other two (rbcL-psaI and rps18-rpl20) in the LSC region (Figure 5). Previous studies have identified Ae. tauschii accessions using the chloroplast non-coding sequences trnF-ndhJ, trnC-rpoB, atpI-atpH, and ndhF-rpl32 ([...]
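
For readers unfamiliar with the statistic, nucleotide variability (Pi) is commonly defined as the average number of pairwise nucleotide differences per site among a set of aligned sequences. The following Python snippet is a minimal sketch of that definition only; it is not DnaSP's implementation. The function name `nucleotide_diversity`, the toy sequences, and the choice to skip any alignment column containing a gap or ambiguous base are illustrative assumptions.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Return Pi: mean pairwise differences per site over aligned sequences.

    Assumption (for illustration): alignment columns in which any sequence
    has a gap or an ambiguous base are excluded from the calculation.
    """
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    seqs = [s.upper() for s in aligned_seqs]

    # Keep only columns where every sequence carries an unambiguous base.
    valid_cols = [i for i in range(length)
                  if all(s[i] in "ACGT" for s in seqs)]
    if not valid_cols:
        raise ValueError("no comparable sites in the alignment")

    # Count differences over all unordered pairs of sequences.
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a[i] != b[i] for i in valid_cols) for a, b in pairs
    )
    return total_diffs / (len(pairs) * len(valid_cols))


if __name__ == "__main__":
    # Hypothetical toy alignment of one intergenic region, three accessions.
    toy_region = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
    print(f"Pi = {nucleotide_diversity(toy_region):.5f}")
```

Applied separately to each coding, intron and intergenic alignment, a calculation of this kind would yield per-region values comparable in form to those reported for the four variable loci, although the exact figures depend on how gaps and window sizes are treated.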