Introduction

Triticeae is one of the most economically important tribes of the grass family, Poaceae, and includes such globally significant species as bread wheat (Triticum aestivum L.), barley (Hordeum vulgare L.), and rye (Secale cereale L.). Efforts to analyze the genomes of species constituting this tribe to uncover the evolutionary relationships among them have been on for decades1,2,3,4,5. These research efforts, beyond attempting to clarify the taxonomy of Triticeae, are largely propelled by the overwhelming role of wild species of this tribe as potential sources of essential alleles for the improvement of their cultivated relatives, especially bread wheat5,6,7,8.

Analysis of genome differences and phylogenetic relationships in Triticeae has mostly been conducted using cytogenetic approaches that rely on meiotic chromosome pairing in hybrids of wide crosses. However, chromosome pairing is affected by diverse factors, and the reliability of failed chromosome pairing as an indicator of genome dissimilarity has been questioned9,10,11. Molecular cytogenetic methods such as C-banding, fluorescence in situ hybridization and genomic in situ hybridization have also helped to generate useful information on genome differences and phylogenetic relationships in the Triticeae1,2,3,12,13. Other molecular approaches, including isozyme analysis, variations in low-molecular-weight glutenin subunit and DNA marker systems have provided some explanations on Aegilops-Triticum relationships, the origin and differentiation of Aegilops species, and intra- and inter-specific variations in the D and U genome clusters of Aegilops species14,15,16,17. Also, a combination of morphology, organelle and nuclear genes reportedly gave insights into the phylogenetic relationships among diploid taxa in Triticeae18. Nevertheless, diploid progenitors of Ae. crassa, Ae. vavilovii, Ae. juvenalis, Ae. columnaris and Ae. triaristata, and the exact progenitor of the B genome of hexaploid wheat and other polyploid Triticum species are still in dispute2,16,19,20,21,22. Also, there are opposing opinions regarding the donors of A genomes of polyploid species in the Emmer (AB group) and Timopheevi (AG group) lineages of wheat23,24,25,26,27,28,29. In the classification of Aegilops species, the justification for including Ae. speltoides in section Sitopsis is still under discussion29,30,31,32. To validate the results so far obtained and fill remaining gaps, more molecular data, especially those with a wide genomic coverage, are needed31.

DArTseq is one of the cheap and easy but efficient genotyping-by-sequencing platforms which allow genome-wide marker discovery through restriction enzyme-mediated genome complexity reduction and sequencing of the restriction fragments33,34,35. In this study, we applied DArTseq genotyping to analyze the genomes of 34 species in the tribe Triticeae. DArTseq generates two types of data: SNP and SilicoDArT (http://www.diversityarrays.com/). The former is the nucleotide polymorphisms found in the tag sequences and the latter is the presence/absence variation (PAV) of the tag sequences. The choice of which data to use depends on the research objective. Using both types of data, we clarified the extent of genomic similarity among Aegilops species in different sections and clusters and the evolutionary relationships between diploid and polyploid species in Aegilops and Triticum species. We confirmed the already known genomic constitutions of some polyploid species of Aegilops and Triticum species, and provided evidence-based explanation for the origin of the unidentified (X) genomes2,16,19,20,21 in Ae. crassa, Ae. columnaris, Ae. vavilovii, Ae. triaristata, and Ae. juvenalis. The consistency of the outcomes of this study with previous reports and the flexibility of DArTseq genotyping make this marker system suitable for routine applications to analyze plant genomes.

Results

Diploid analyzers of polyploid Aegilops and Triticum species

To determine the putative progenitors of each of the polyploid species of Aegilops, SilicoDArT markers in the diploid genomes of all the Aegilops species were used as genome analyzers (Table 1). For each of the diploid species, species-specific markers were selected by filtering markers present in one species but absent in all the others. This made dominant SilicoDArT markers preferred in this analysis, as codominant SNP markers do not give information on PAVs. The progenitors of the polyploid species were estimated based on the proportions of diploid markers that are retained in each polyploid genome (diploid-polyploid monomorphism). Because the number of the species-specific markers is affected by genetic similarity among the diploid species, especially the Sitopsis species, the genomes of the polyploid species of Aegilops were first analyzed with all the markers in each diploid genome of Aegilops species (Fig. 1a) before being analyzed with diploid species-specific markers (Fig. 1b). This allowed us to determine the suitability of the species-specific markers in estimating the progenitors of the polyploid species. The use of species-specific markers as analyzers reduced the background noise produced by monomorphic markers among the diploid species (Fig. 1b).

Table 1 SilicoDArT markers of diploid analyzers of polyploid Aegilops species.
Figure 1
figure 1

Estimation of the putative diploid progenitors of 12 polyploid Aegilops species based on the proportions of diploid species’ SilicoDArT markers retained in the genomes of the polyploid species. (a) Analysis based on all the markers of the diploid species. (b) Analysis based on the species-specific markers of the diploid species.

Having confirmed the adequacy of the species-specific markers of the analyzers, polyploid genomes of Triticum species were analyzed with only species-specific diploid analyzers (Table 2). Therefore, the conclusions made regarding the progenitors of the polyploid species (Aegilops and Triticum) are based on the species-specific markers of the analyzers. To select analyzers for the polyploid genomes of Triticum species, the genomes of 16 bread wheat-related diploid species were screened on the basis of the proportions of homoeology of their SilicoDArT markers to the total SilicoDArT markers in each of the three genomes of bread wheat. The homoeology of the diploid genomes to each of the genomes of bread wheat was estimated based on the number of markers of the diploid species assigned to each genome of bread wheat (Table 2). This estimation was possible because, in this study, we used DArTseq platform optimized for bread wheat. A diploid species with at least 10% homoeology to any of the three genomes of hexaploid wheat was selected as an analyzer for the corresponding genomes of each of the six polyploid Triticum species. With this criterion, a total of 13 analyzers were selected. Species-specific markers of the 13 selected diploid analyzers were then filtered for the analysis of the putative progenitors of the genomes of the polyploid species (Table 2).

Table 2 Selection of diploid analyzers of polyploid Triticum species with respect to homoeology of their SilicoDArT markers to those of A, B and D genomes of bread wheat.

Genomic differentiation and evolutionary relationships among polyploid and diploid species of Aegilops

Before applying the genome analyzers to determine the progenitors of the polyploid species, we used a total of 28,264 polyploid species-specific SilicoDArT markers, ranging from 187 in Ae. juvenalis to 4,759 in Ae. cylindrica (Table 3), to confirm genomic difference among the 12 polyploid species of Aegilops. The polyploid species-species markers were selected in the same manner as the diploid species-specific markers. The relatively low numbers of specific markers in the genomes of Ae. crassa and Ae. juvenalis is obviously because large proportions of their genomes (D and U genomes) are shared by the other species (Table 3). With the possibility of genomic adjustments during polyploidization22,36 and the assumption that the original progenitors of the polyploid species may be different from the accessions of the diploid species used in this study, only diploid analyzers with considerably higher proportions of monomorphism with the polyploid species were taken as the putative progenitors of the polyploid species. Our analysis confirmed the putative diploid progenitors of Ae. ventricosa (DvNv), Ae. cylindrica (CcDc), Ae. kotschyi (SkUk), Ae. biuncialis (UbMb), Ae. triuncialis (UtCt), Ae. ovata (UgMg), and Ae. variabilis (SpUp)30 (Fig. 1). Noteworthy is that the proportions of the markers of three Sitopsis species (Ae. bicornis, Ae. longissima, and Ae. sharonensis) retained in the genomes of the two polyploid species with S-related genomes (Ae. kotschyi and Ae. variabilis) were not reasonably different. This made it difficult to decide which of the Sitopsis species donated the S-related genomes to Ae. kotschyi and Ae. variabilis, although Ae. bicornis and Ae. longissima, respectively, seem to be the most likely candidates. This observation confirms the likelihood of a common ancestry of the Sitopsis species31. Therefore, the original progenitor of the S-related genomes of the polyploid species may have been an ancient relative of the Sitopsis species, which is probably extinct. Although we adopted the polyploid species-specific markers (Table 3) to differentiate Ae. kotschyi (SkUk) from Ae. variabilis (SpUp) and Ae. biuncialis (UbMb) from Ae. ovata (UgMg), these pairs of species have identical genomic constitutions (same progenitors; Fig. 1), and therefore may be considered to be variants/subspecies of the same species in each case.

Table 3 Species-specific SilicoDArT markers of 12 polyploid species of Aegilops.

The unidentified diploid progenitors of five polyploid Aegilops species (Ae. triaristata [UnXn, UnXnNn], Ae. crassa [DcrXcr, Dcr1Dcr2Xcr], Ae. juvenalis [XjDjUj], Ae. vavilovii [XvaDvaSva], and Ae. columnaris [UcXc])2,16,19,20,21 are most likely to be traced to Ae. speltoides or Ae. mutica. The competing proportions of monomorphic markers between each of the two diploid species (Ae. speltoides and Ae. mutica) and the genomes of the five polyploid species (Fig. 1) strongly suggest that the unidentified genomes may have been donated by an ancient species closely related to these two diploid species or their direct ancestor(s). Based on these results, we have proposed modifications in the genomic representations of Ae. crassa, Ae. juvenalis, Ae. vavilovii, Ae. columnaris, and Ae. triaristata (Table 4), in which the genomes of Ae. speltoides and Ae. mutica are jointly represented as Ts. This does not suggest that the two diploid species are genomically the same, but shows that we could not, in this study, clearly determine which of the two species may have donated the controversial genome to the five polyploid species. In this proposal, the genome of tetraploid Ae. crassa is considered to be constituted of Ae. tauschii and Ts genomes, while the genome of hexaploid Ae. crassa is designated as D1D2Ts (tetraploid Ae. crassa x Ae. tauschii). Aegilops vavilovii is considered to have evolved from the hybridization of tetraploid Ae. crassa and Ae. searsii, granted that Ae. crassa and Ae. vavilovii have similar morphological traits and are reported to be sympatric37. Furthermore, our analysis shows that Ae. juvenalis has Ts, D and U genomes; hexaploid Ae. triaristata, clearly lacks Ae. comosa genome but has the genomes of Ae. umbellulata, Ae. uniaristata and Ts, and Ae. columnaris is composed of Ae. umbellulata and Ts genomes.

Table 4 Proposed modifications in the genomic representations of five polyploid Aegilops species.

Cluster analysis of diploid and polyploid Aegilops species

A phylogenetic tree (Fig. 2a) constructed with 15,512 frequently called SNP markers separated the diploid Aegilops species into the already reported sections30, except that Ae. speltoides did not cluster with other species in the section Sitopsis, which has been reported by other researchers18,29,32 (see online Supplementary Table S1 for information of the markers). Aegilops umbellulata (section Aegilops) seemed more distant from the others, whereas Ae. speltoides (section Sitopsis) appeared closer to Ae. mutica (section Amblyopyrum), and relatively distant from other species of section Sitopsis. Among Sitopsis species, Ae. longissima and Ae. sharonensis appeared genomically more proximal to each other than to others. The polyploid species of Aegilops formed two clusters clearly based on the putative common diploid progenitors, Ae. tauschii (D cluster) and Ae. umbellulata (U cluster) (Fig. 2b). Aegilops juvenalis, bearing both D and U genomes, clustered closely with Ae. crassa and Ae. vavilovii in the D cluster, indicating a possible evolutionary link between its (Ae. juvenalis) genome and the two species in the D cluster. This again suggests the likelihood of the presence of a diploid genome, perhaps Ts, common to Ae. juvenalis, Ae. crassa and Ae. vavilovii (Fig. 1).

Figure 2
figure 2

Reconstruction of the evolutionary relationships among Aegilops species on the basis of 15,512 SNP markers. (a) 11 diploid species. (b) 12 polyploid species. The sections of the diploids and two main clusters of the polyploids are labelled in brown1,16,31.

Genomic and evolutionary relationships in the Aegilops-Triticum species

We used species-specific SilicoDArT markers of 13 bread wheat-related diploid species (Table 2) to determine the elementary donors of the A, B and D genomes in six polyploid Triticum species. As described for the estimation of the progenitors of the polyploid species of Aegilops, the proportions of species-specific markers of the diploid species shared with the genomes of the polyploid species enabled the determination of the progenitors of the genomes of the polyploid Triticum species (Figs 36).

Figure 3
figure 3

Determination of the donor of the A genomes of six polyploid Triticum species using 11 diploid species-specific SilicoDArT markers assigned to the A genome of bread wheat.

Figure 4
figure 4

Determination of the donor of the B/G genomes of six polyploid Triticum species using 10 diploid species-specific SilicoDArT markers assigned to the B genome of bread wheat.

Figure 5
figure 5

Comparison of the B/G genome chromosomes of polyploid Triticum species and the chromosomes of Ae. speltoides and Ae. searsii. The B/G chromosomes are numbered 1B–7B for convenience. This does not change the genomic representation of T. araraticum: AAGG.

Figure 6
figure 6

Determination of the donor of the D genomes of two hexaploid Triticum species using 11 diploid species-specific SilicoDArT markers assigned to the D genome of bread wheat.

The genome of T. urartu was the closest to the A genomes of all the polyploid species analyzed (Fig. 3), suggesting that T. urartu is the most likely donor of the A genome in each of them. The considerable similarity between the A genome of each of the polyploids and T. boeoticum –another A genome species suggests a common ancestry of T. boeoticum and T. urartu. Similarly, Ae. searsii seems to be the most closely related to the B/G genomes of the polyploid species (Fig. 4). However, the proportion of Ae. speltoides markers assigned to the reference B genome is higher than those of every other diploid species analyzed (Table 2). This strongly suggests an evolutionary link between the genome of Ae. speltoides and the B/G genomes of the polyploids. This link is further supported by an almost equal similarity of the genomes of the two diploid species to the G genome of T. araraticum (Fig. 4). Using the species-specific markers of Ae. speltoides and Ae. searsii as analyzers, we found that chromosome 4 S of each of the diploid species are almost equally similar to chromosome 4B/G of each of the polyploids (Fig. 5). But chromosomes 2 S, 3 S and 7 S of Ae. speltoides appeared to be more similar to the corresponding chromosomes of T. araraticum than those of Ae. searsii are. These observations give the impression that the B/G genomes of polyploid Triticum species are likely to be recombinant genomes with varying contributions from Ae. speltoides and Ae. searsii. Analysis of the D genomes of the three hexaploid species unambiguously traced them to Ae. tauschii as the sole donor (Fig. 6).

A further analysis using 66, 434 SNP markers consistently called in the six polyploid genomes (see online Supplementary Table S2 for information of the markers) indicated 72% similarity (monomorphism) across their A genomes, B/G genomes and the combined AB/AG genomes. However, higher similarity was observed among the AB genomes: hexaploids, 94%; tetraploids, 90%; hexaploid and tetraploid genomes combined, 84%. The slight differences in the proportions of monomorphic markers in the different groups of the AB genomes suggest that the AB genomes of the hexaploid species originated from the same tetraploid species, whereas those of the tetraploid species may have evolved from different accessions of the elementary A and B genome progenitors (T. urartu and Ae. speltoides/Ae. searsii, respectively). The lower similarity (84% as compared to 94%) across the hexaploid and tetraploid AB genomes may reflect further modification of AB genomes in hexaploid species resulting from their interaction with the D genome.

Discussion

The clustering patterns of Aegilops species were largely consistent with the established classifications25,38,39. Diploid species separated on the basis of their known sections in the genus; polyploid species were delineated following the presence of common diploid progenitor genomes (D and U genomes) among them1,16,30. However, as reported previously18,29,32, Ae. speltoides appeared distant from other species in the section Sitopsis; hence, its inclusion in the section needs to be reconsidered.

Markers specific to each of the 12 polyploid species clearly showed considerable polymorphisms among these genomes, including the genomes of species which arose from the same diploid progenitors. This suggests that genetic modifications, such as chromosomal alterations2, may have occurred during independent evolutionary events of those species with identical progenitors. Therefore, without these specific markers, it would be difficult to genomically differentiate Ae. kotschyi (SkUk) from Ae. variabilis (SpUp) and Ae. biuncialis (UbMb) from Ae. ovata (UgMg) because, from the stand point of our result (Fig. 1) and previous studies30,31, the species in each pair evolved from the same progenitors. Although each of the species in these two sets are recognized as independent, on the basis of differences in cytoplasm progenitors and/or nuclear genome variation31,40, this classification does not seem to be clearly justified. Therefore, in our opinion, each pair should be regarded as variants/subspecies of the same species.

The reported unknown diploid genomes, initially represented as modified M genome and later changed to X, in the genomes of Ae. triaristata, Ae. crassa, Ae. juvenalis, Ae. vavilovii, and Ae. columnaris2,16,19,20,21,30 is traceable to Ae. mutica or Ae. speltoides. The small proportions (<10%) of Ae. comosa-specific markers shared with the five polyploid species (Fig. 1b) is insufficient to infer the existence of remnants of Ae. comosa genome in the polyploid genomes. Assuming Ae. comosa was originally involved in the evolution of the polyploids, species-specific elements from other progenitors may have spread and eventually masked Ae. comosa-specific elements41. Our data suggest that ancient or ancestral forms of Ae. speltoides or Ae. mutica, which are probably extinct, donated the unidentified genomes to the five polyploid species. From our analysis, it appears that all the polyploid species originally had a genome of such an ancient species (Fig. 1). This observation agrees with the hypothesis that Ae. mutica (syn. Amblyopyrum muticum) and Ae. speltoides, both allogamous species with ancestral traits, diverged earlier than other Aegilops species and may therefore be the ancestors of the other Aegilops species22. Therefore, each diploid Aegilops species may have retained a substantial portion of the common ancestral genome (Ae. speltoides/Ae. mutica or their ancestor). The difference in the representation of the common progenitor in each of the polyploid species can result from the peculiar evolutionary event(s) of each species. Polyploids that arose from the hybridization of the common diploid ancestor with other diploid species should have larger portions of the genome of the common ancestor than those that did not directly evolve from the common ancestor.

We validated the putative diploid progenitors of the A and D genomes of polyploid Triticum species to be T. urartu and Ae. tauschii, respectively42,43,44,45,46. We have also provided information that may help to explain the complex nature of the B/G genomes. The genomes of both Ae. speltoides and Ae. searsii are similar to the B/G genomes, especially to that of T. araraticum (Fig. 4), a relatively less advanced tetraploid genome47,48; thus, the B/G genomes of polyploid Triticum species may have evolved from an ancestral genome that later differentiated into those of Ae. speltoides and Ae. searsii. Alternatively, the B/G genome may have arisen from the hybridization of Ae. speltoides and Ae. searsii before the emergence of the AB/AG genome at different times. The above considerations support earlier postulations that the B genome is the most modified of the three genomes of hexaploid wheat, whereas the A and D genomes are substantially similar to those of T. urartu and Ae. tauschii, respectively22. The previously suggested origin of the A genome of T. araraticum from T. boeoticum23,24 is probably invalid (Fig. 3). Our results agree with the hypothesis that both Emmer and Timopheevi lineages of polyploid wheats have the same sources of elementary A and B/G genomes25,26,27,28,29. However, a common ancestry of the A-genome species cannot be ruled out and the A genomes of polyploid Triticum species may have evolved from a common ancestor of T. urartu and T. boeoticum before the differentiation of the two species. Although no karyotypic differences have been detected between these diploid A-genome species25,49, low fertility of interspecific F1 hybrid plants of these two species has been reported50. The latter study, consistent with our result, confirms that the two species are genomically different. As previously documented47,48, our study suggests that the A and G genomes in T. araraticum are less modified than the A and B genomes in the Emmer lineage.

This study has demonstrated that DArTseq genotyping can be applied to conduct a large scale analysis of plant genomes, mostly because it allocates markers to individual chromosomes, which can be easily extracted and analyzed. This genomic sequence-based platform ensures a wide genomic coverage and is not subject to criticism associated with the factors that affect meiotic chromosome pairing in hybrids of distant crosses, which forms the main anchor of cytogenetic systems of genome analysis9,10,11,51,52,53,54. Also, the number of informative markers generated by DArTseq outstrips what is possible with conventional DNA marker procedures14,15,16,17, making it more robust and reliable. Genotyping of all the available accessions of species in tribe Triticeae using this platform would clarify the genomic relationships between the cultivated and wild species. This information would make the use of the available gene pools for breeding much more precise and would also help to clarify Triticeae taxonomy. As polyploidy and interspecific hybridization are key events in the evolution of higher plants55, this genome analysis approach would be useful in other groups of plant species, especially polyploids with unclear phylogeny.

Methods

Plant materials

All 23 Aegilops species, eight Triticum species, and three distant relatives of wheat were analyzed. Except bread wheat, represented by two cultivars ‘Chinese Spring’ (CS) and ‘Norin 61’ (N61), each species was represented by one accession (Table 5). Seedlings were raised in Greenhouses of the Arid Land Research Center, Tottori University and Laboratory of Plant Genetics, Kyoto University, Japan. Depending on growth rate and plant size, fresh leaves were harvested from each 2–4-week-old seedlings and genomic DNA samples were isolated and purified using the cetyl trimethyl ammonium bromide method. Quality check, quantification and concentration adjustment for sequencing and genotyping were accomplished with NanoDrop2000C Spectrophotometer (ThermoScientific). The concentration of each sample was adjusted to 50 ng/μL.

Table 5 List of plant materials.

Genotyping-by-sequencing and data analysis

Purified DNA samples (1 μg for each sample) were sent to Diversity Arrays Technology Pty Ltd (http://www.diversityarrays.com/) for sequencing and marker identification. Sequences of the genomic representations were aligned to the wheat_ChineseSpring10 reference genome and wheat_ConsensusMap_version_4. This is because our analysis is based on DArTseq platform optimized for hexaploid wheat. DArTseq is a genotyping-by-sequencing system which utilizes Next-Generation-Sequencing platforms (HiSeq. 2500 in our case) to sequence the most informative representations of genomic DNA samples to aid marker discovery. In comparison to the array version of DArT, DArTseq results in higher marker densities56. The high marker number generated by this system gives it an edge over previous molecular marker procedures applied for genomic analysis of Triticeae species14,15,16,17. It, therefore, serves as a cheap alternative to genome analysis by whole genome sequencing, where the sequence information of genomes intended to be analyzed are not available. Two types of data are generated by DArTseq: SNP and SilicoDArT. SNP markers are nucleotide polymorphisms present in the restriction fragments, while SilicoDArT markers represent PAV of the restriction fragments. Therefore, codominant SNP markers are scored “0” (reference allele homozygote), “1” (SNP allele homozygote) and “2” (heterozygote: presence of both reference and SNP alleles), while dominant SilicoDArT markers are scored in a binary fashion, with “1” representing presence of the restriction fragment with the marker sequence and “0” designating its absence.

Frequently called SNP markers (>0.9 call rate) were used for phylogenetic tree constructions and differentiation of the genomes of polyploid species of Aegilops, whereas SilicoDArT markers (>0.7 call rate) were used for the determination of putative progenitors of the polyploid Aegilops and Triticum species. This reduction in call rate was made to accommodate more markers, ensure wider genomic coverage and reduce bias. To estimate the phylogenetic relationships among the 11 diploid and 12 polyploid Aegilops species, the raw genotypic data of the two sets (diploid and polyploid) were subjected to cluster analysis. Pearson’s correlation coefficient (r) was used as similarity index, and the genetic distances among the species were estimated by transforming the r values to distance values, using d = 100(1 – r) (http://genomes.urv.cat/UPGMA/)57. Species-specific SilicoDArT markers of the polyploid species of Aegilops were used to differentiate their genomes, while species-specific SilicoDArT markers of diploid species of Aegilops were used to estimate the diploid-polyploid evolutionary relationships among all the Aegilops species. Diploid Triticum and Aegilops species whose total SilicoDArT markers showed at least 10% homoeology to the total SilicoDArT markers in any of the three genomes of hexaploid wheat were selected as analyzers to determine the putative progenitors of the corresponding genomes of each polyploid Triticum species. Species-specific SilicoDArT markers of these selected diploid species were used as analyzers to determine the putative progenitors of each polyploid Triticum species. In determining the progenitors of all the polyploid species (Aegilops and Triticum), the proportions of the species-specific markers of the diploid analyzers retained in the genomes of the polyploid species were used as a basis to draw conclusions on genomic proximity and evolutionary relationships among the species. Species-specific markers of Ae. speltoides and Ae. searsii were further used to examine the relationship between the seven B/G-genome chromosomes of each of the polyploid Triticum species and those of the two diploid species. The two diploid species were chosen based on the close proximity of their genomes to the B/G genomes of the polyploid species.