Untangling Nucleotide Diversity and Evolution of the H Genome in Polyploid Hordeum and Elymus Species Based on the Single Copy of Nuclear Gene DMC1

Numerous hybrid and polyploid species are found within the Triticeae. It has been suggested that the H subgenome of allopolyploid Elymus (wheatgrass) species originated from diploid Hordeum (barley) species, but the role of hybridization between polyploid Elymus and Hordeum has not been studied. It is not clear whether gene flow across polyploid Hordeum and Elymus species has occurred following polyploid speciation. Answering these questions will provide new insights into the formation of these polyploid species and the potential role of gene flow among polyploid species during polyploid evolution. To address these questions, disrupted meiotic cDNA1 (DMC1) data from the allopolyploid StH Elymus are analyzed together with diploid and polyploid Hordeum species. Phylogenetic analysis revealed that the H copies of the DMC1 sequence in some Elymus species are very close to the H copies in some polyploid Hordeum species, indicating either that the H genomes in these Elymus and polyploid Hordeum species originated from the same diploid donor or that gene flow has occurred among them. Our analysis also suggested that the H genomes in Elymus species originated from a limited gene pool, while the H genomes in Hordeum polyploids originated from broad gene pools. Nucleotide diversity (π) of the DMC1 sequences on the H genome from polyploid species (π = 0.02083 in Elymus, π = 0.01680 in polyploid Hordeum) is higher than that in diploid Hordeum (π = 0.01488). The estimates of Tajima's D departed significantly from the equilibrium neutral model at this locus in diploid Hordeum species (P < 0.05), suggesting an excess of rare variants in diploid species that may not have contributed to the origin of the polyploids.
Nucleotide diversity (π) of the DMC1 sequences in Elymus polyploid species (π = 0.02083) is higher than that in polyploid Hordeum (π = 0.01680), suggesting that the degree of relationship between the two parents of a polyploid might be a factor affecting nucleotide diversity in allopolyploids.


Introduction
Hybridization and polyploidization have played a central role in the history of plant evolution and have contributed greatly to plant diversification and speciation [1,2]. With the advances in molecular methods over the last two decades, much attention has been drawn to the evolutionary consequences of polyploidy for both genome size and genome content [3,4]. The origin and evolution of polyploid genomes have also been a focus of plant evolutionists [1,5]. Increasing evidence has demonstrated the complex and dynamic nature of polyploids. Many polyploids have been shown to have multiple origins in space and time [1,5], together with introgression (or gene flow) [6-8]. Both multiple origins [9,10] and gene flow [6-8] have been considered causes of shared polymorphism across ploidy levels and/or phylogenetic incongruence among loci. However, whether gene flow among independently formed lineages occurs regularly following polyploid speciation has rarely been tested in polyploid taxa [11].
The tribe Triticeae contains several important cereal crops such as wheat, barley and rye, as well as forage crops. The tribe combines a wide variety of biological mechanisms and genetic systems, which make it an excellent group for research in evolution, genetic diversity, taxonomy, and speciation in plants [12]. According to the classifications of Löve [13] and Dewey [14], Hordeum and Elymus are two relatively large genera in the tribe Triticeae.
The genus Hordeum comprises 31 species (including cultivated barley, H. vulgare ssp. vulgare) and exists at the diploid, tetraploid, and hexaploid levels with a basic chromosome number x = 7. Based on cytogenetic analyses, the diploid species in Hordeum were classified into four monogenomic groups, the H, I, Xa, and Xu genome groups [15,16], a classification supported by isoenzyme analysis [17] and molecular data [18-22]. The H genome group is not only the largest genome group in Hordeum (including 14 diploid species, 7 tetraploid species, 4 hexaploid species, and 2 species existing at three ploidy levels (2×, 4×, 6×), and distributed widely from central Asia to the Americas), but is also widely present in polyploid species of Elymus, Stenostachys and Pascopyrum.
The genus Elymus includes approximately 50 allotetraploid species that combine the H and St genomes and are distributed throughout the non-tropical areas of the world, from northern Greenland to Tierra del Fuego in southernmost South America [23]. The St haplome originated from the genus Pseudoroegneria [14]. It has been confirmed that the H haplomes in Elymus were contributed by different Hordeum diploids [24-30]. Phylogenetic analyses based on phosphoenolpyruvate carboxylase, β-amylase, granule-bound starch synthase I [29] and disrupted meiotic cDNA (DMC) [30] suggested a few Hordeum diploid species as potential H-genome donors to Elymus species. The tetraploid H. jubatum might have been involved in the origin of StH Elymus [29]. However, the role of polyploid Hordeum species in the origin of StH Elymus remains to be studied. It is not clear whether gene flow across polyploid Hordeum and Elymus species has occurred following polyploid speciation. Recent studies led to the conclusion that polyploids probably originated multiple times [1,5], and multiple origins are often considered a potential source of increased genetic variation in polyploids. However, the amount of genetic variation contributed by the diploid progenitors and the degree of gene flow among the independent origins are the two major factors determining genetic diversity in polyploids. Yet the extent and role of gene flow among polyploids in evolution remain enigmatic.
In the present study, DMC1 data from the allopolyploid StH Elymus are analyzed together with diploid and polyploid Hordeum species. The objectives of this analysis are: (1) to explore the possible role of polyploid Hordeum species in the origin of StH Elymus; (2) to determine whether gene flow has occurred between polyploid Hordeum and Elymus; and (3) to examine the level of nucleotide polymorphism in the H genomes from Elymus, Hordeum diploids and polyploids. Answering these questions will provide new insights into the formation of these polyploid species and the potential role of gene flow among polyploid species during polyploid evolution.

Samples
The present study includes 18 tetraploid StH Elymus species (22 accessions) and 9 polyploid Hordeum species. All diploid Hordeum species and other diploid Triticeae species representing the St, W, P, and E genomes were included in the analyses (Table 1). Bromus arvensis and B. sterilis were used as outgroups. The single-copy nuclear gene disrupted meiotic cDNA (DMC1) has been applied to phylogenetic analyses in Triticeae species. The DMC1 sequences used in this study were collected from previously published sources [21,30-35].

Data Analysis
Multiple sequence alignments are made using Clustal X with default parameters [36] with manual adjustment. Phylogenetic analysis using the maximum-parsimony (MP) method is performed with the computer program PAUP* ver. 4 beta 10 [37]. All characters are specified as unweighted and unordered, and gaps are excluded in the analyses. Most-parsimonious trees are obtained by performing a heuristic search using the Tree Bisection-Reconnection (TBR) option with MulTrees on, and ten replications of random addition sequences with the stepwise addition option. Multiple parsimonious trees are combined to form a strict consensus tree. Overall character congruence is estimated by the consistency index (CI), and the retention index (RI). In order to infer the robustness of clades, bootstrap values with 1000 replications [38] are calculated by performing a heuristic search using the TBR option with MulTrees on.
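The two congruence measures mentioned above can be illustrated with a minimal Python sketch. It uses the standard ensemble definitions, CI = M/S and RI = (G − S)/(G − M), where M, G and S are the minimum possible, maximum possible and observed numbers of steps summed over characters. The function name `ensemble_indices` and the per-character step counts are hypothetical and are not taken from the DMC1 data set; PAUP* computes these values internally.

```python
def ensemble_indices(characters):
    """Return (CI, RI) for a tree.

    `characters` is a list of (min_steps, max_steps, observed_steps) tuples,
    one per character. All values here are illustrative assumptions.
    """
    m = sum(c[0] for c in characters)  # minimum possible steps, summed
    g = sum(c[1] for c in characters)  # maximum possible steps, summed
    s = sum(c[2] for c in characters)  # steps observed on the tree, summed
    ci = m / s
    ri = (g - s) / (g - m)
    return ci, ri


# Hypothetical step counts for three characters.
toy_characters = [(1, 3, 1), (1, 2, 2), (2, 4, 3)]
ci, ri = ensemble_indices(toy_characters)
print(f"CI = {ci:.3f}, RI = {ri:.3f}")
```

A CI or RI close to 1 indicates little homoplasy on the tree; lower values indicate more.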
In addition to maximum parsimony analysis, maximum-likelihood (ML) analysis is performed. For ML analysis, 8 nested models of sequence evolution were tested for the data set using PhyML 3.0 [39]. The general time-reversible (GTR) [40] substitution model gave the largest ML score compared to the other 7 substitution models: JC69, K80, F81, F84, HKY85, TN93 and custom (data not shown). As a result, the GTR model was used for the ML analysis. The ML analysis was performed using the Mac OS X UNIX version of GARLI v. 0.95 [41]. The runs were set for an unlimited number of generations, with automatic termination after 10,000 generations without a significant topology change (lnL increase of 0.01). Thirty analyses were run with random starting tree topologies, and the tree with the best score was used to represent the gene tree. Branch support (BS) was estimated based on 100 ML bootstrap replicates in GARLI.
Nucleotide diversity was estimated by Tajima's π [42] and Watterson's θ [43] statistics. The former measure quantifies the mean percentage of nucleotide differences among all pairwise comparisons for a set of sequences, whereas the latter is simply an index of the number of segregating (polymorphic) sites. Tests of neutral evolution were performed as described by Tajima [42], and Fu and Li [44]. The above calculations were conducted with the software program DnaSP v5 [45].
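As a rough illustration of the two diversity estimators and of Tajima's D described above, the following Python sketch computes per-site π, Watterson's θ and D for a toy alignment. It is not the DnaSP implementation; the function name `diversity_stats`, the toy sequences, and the choice to drop every alignment column that contains a gap or ambiguous base are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt

VALID_BASES = set("ACGT")


def diversity_stats(seqs):
    """Per-site pi, Watterson's theta and Tajima's D for aligned sequences."""
    seqs = [s.upper() for s in seqs]
    n = len(seqs)
    # Keep only columns without gaps or ambiguous bases (assumed treatment).
    cols = [col for col in zip(*seqs) if set(col) <= VALID_BASES]
    sites = len(cols)
    # S: number of segregating (polymorphic) sites.
    S = sum(1 for col in cols if len(set(col)) > 1)
    # k: mean number of differences over all pairs of sequences.
    n_pairs = n * (n - 1) / 2
    diffs = sum(
        sum(1 for i, j in combinations(range(n), 2) if col[i] != col[j])
        for col in cols
    )
    k = diffs / n_pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    # Constants from Tajima's (1989) variance approximation.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    var = e1 * S + e2 * S * (S - 1)
    d = (k - S / a1) / sqrt(var) if var > 0 else float("nan")
    return {
        "sites": sites,
        "S": S,
        "pi": k / sites,
        "theta_w": S / (a1 * sites),
        "tajima_d": d,
    }


# Toy alignment (hypothetical): one singleton variant and one gapped column.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACG-ACGT"]
print(diversity_stats(toy))
```

A negative D arises when π is smaller than θ, which is the pattern expected under an excess of rare variants.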

Results
The alignment of 89 DMC1 sequences (including two outgroups), 1221 bp in length, showed 794 constant sites, 221 variable but parsimony-uninformative sites, and 206 parsimony-informative sites. Parsimony analysis using Bromus arvensis and B. sterilis as outgroups produced 740 equally parsimonious trees with a consistency index (CI) of 0.693 and a retention index (RI) of 0.848. Maximum likelihood analysis across 30 GARLI runs generated likelihood scores ranging from -lnL = 6349.08703 to -lnL = 6355.83219. The ML tree with BS values is shown in Figure 1.
The two sequence copies from each of E. caninus, E. cordilleranus, E. hystrix, E. sibiricus, E. virginicus and E. wawawaiensis were well separated into two distinct groups, one grouped with the sequences from the H genome and the other with the St genome from Pseudoroegneria (Fig. 1). Unexpectedly, the sequence (GQ855194) from E. transhyrcanus formed a clade with Lophopyrum elongatum and Thinopyrum bessarabicum with 90% BS in ML and 78% BS in MP. The second copy of the sequence from E. transhyrcanus (GQ855193) was   (Fig. 1). Based on the grouping of the sequences in the phylogenetic analysis, we further analyzed nucleotide variation of the DMC1 gene separately in the H genome from Hordeum polyploids, Hordeum diploids, and Elymus. Some of the putative H copies of sequences from Elymus and Hordeum polyploids were not clearly placed in the H clade. These sequences (PI 537323L, GQ855194 and W6-13828K from Elymus; H6198s2, H2013s2, H1418s2 and H2144s2 from Hordeum) were excluded from the nucleotide diversity analysis. Estimates of nucleotide polymorphism, π and θw, are shown separately for the H genome of Elymus, Hordeum polyploid and Hordeum diploid species (Table 2). The number of polymorphic sites (56) in the H genome of polyploid Hordeum is much lower than that (90) in its diploid donor species. The estimates of nucleotide diversity in the H genome of the diploid Hordeum studied were θw = 0.02693 and π = 0.01488. The estimates of nucleotide diversity in the H genome of the polyploid Hordeum studied were θw = 0.0168 and π = 0.01734. For the H genome in Elymus species, the number of polymorphic sites and the estimates of θw and π were 80, 0.02774 and 0.02083, respectively. The Tajima [42] and Fu & Li [44] tests were conducted on each data set. The Tajima

Discussion
The origin of the H genome in StH-genome Elymus species based on the single-copy nuclear gene DMC1 has previously been discussed [30]. DMC1 sequence data also showed a reticulate relationship between American polyploid species and diploid Hordeum [35]. However, the relationship of the H genome in polyploid Hordeum and Elymus has not previously been explored.
Maximum parsimony analysis grouped 24 sequences from Hordeum diploid and polyploid species together with 94% bootstrap support, and maximum likelihood analysis grouped these sequences together with high support of 97% (Fig. 1). Only three diploid Hordeum H-genome taxa, H. brevisubulatum subsp. violaceum, H. brachyantherum subsp. californicum, and H. bogdanii, grouped with the sequences from the Elymus H genome, indicating that the H genomes in Elymus originated from a limited number of Hordeum diploid species, whereas the H genomes in polyploid Hordeum species were contributed by a relatively large number of Hordeum diploids. One concern is that fewer sequences from the Elymus H genome than from Hordeum polyploids were analyzed here, which may bias the comparison. However, a phylogeny of Elymus StStHH allotetraploids based on three nuclear genes and a relatively large sample of Elymus species suggested that a single diploid Hordeum species, H. brachyantherum subsp. californicum (Syn: H. californicum Covas & Stebbins), is the possible H-genome donor to Elymus species [29], which also indicates that the H genome in Elymus species originated from a limited number of Hordeum diploids. In contrast, published data indicate that many Hordeum diploid species have contributed to the origin of polyploids in this genus; more than 10 diploid species have been suggested as potential donors to polyploids in Hordeum [22,35,46]. Taken together, it can be concluded that the H genomes in Elymus species originated from a limited gene pool, while the H genomes in Hordeum polyploids originated from broad gene pools.
Polyploid formation is a prominent mode of speciation in flowering plants. Recent molecular data indicate that polyploid speciation is often more complex than initially thought [47], which is also the case in the tribe Triticeae [8,22,28,29,48,49]. Molecular studies suggest that polyploid species in many genera have multiple origins rather than a single origin [1,5,47,50]. It has been suggested that the fates of polyploid populations of independent origin vary depending on the amount of genetic variation initially contributed by the diploid progenitors [50]. Studies have demonstrated that genetic diversity in polyploids is often similar to or higher than that in their diploid progenitors [47,51,52]. It is therefore worth comparing the nucleotide diversities among the H genomes from Elymus, polyploid Hordeum and diploid Hordeum species. This may offer an opportunity to address the potential evolutionary outcomes of polyploidization. Nucleotide diversity (π) of the DMC1 sequences from polyploid species (π = 0.02083 in Elymus, π = 0.01680 in polyploid Hordeum) is higher than that in diploid Hordeum (π = 0.01488). The estimates of Tajima's D departed significantly from the equilibrium neutral model at this locus in diploid Hordeum species (P < 0.05), suggesting an excess of rare variants in the diploid species. These rare variants may not have contributed to the origin of the polyploids. Phylogenetic analyses indeed indicated that not all diploids have contributed to the origin of polyploids in Hordeum and Elymus.
Table 2 notes: N is the number of sequences analyzed; h is the number of haplotypes; n is the number of sites (excluding sites with gaps/missing data); s is the number of segregating sites; π is the average pairwise diversity; θw is the diversity based on the number of segregating sites. *: significant at α = 0.05. doi:10.1371/journal.pone.0050369.t002
Why is the genetic variation in polyploids higher than in diploids, even though the polyploids detected here originated from a limited number of diploids? It has been demonstrated that polyploidization results in genome-wide gene duplication, which not only enables allopolyploids to tolerate more genomic variation than their progenitors but also provides new opportunities for functional diversification between homoeologous genes over time [52-54]. After a gene is duplicated, one of the copies may accumulate mutations; if these mutations are not deleterious, they will not be removed by natural selection. Nucleotide diversity of this gene copy in polyploids will then be higher than that in their progenitors. Recent studies on the evolutionary rates of duplicated genes in polyploids compared to their diploid relatives showed that the evolutionary rates appear to differ among homoeologous locus pairs [55-58]. Barrier et al. [59] found that the floral regulatory genes APETALA1 (ASAP1) and APETALA3 (ASAP3/TM6) are evolving much faster in the polyploid species than in the diploids. Analysis of nucleotide sequence diversity (π) of RPB2 revealed that π on the St genome in tetraploid Elymus was higher than that in the diploid Pseudoroegneria St genome [60]. The degree of relationship between the two parents of a polyploid has been suggested as a general factor affecting the amount of genomic sequence variation in allopolyploids [54]. A study on interspecific crosses of Brassica found that the overall amount of genomic change in AC (or CA) tetraploids was much lower than that in AB (or BA) tetraploids. This was because the A (B. rapa) and C (B. oleracea) genomes are genetically much closer to each other than the A and B (B. nigra) genomes are [61].
A study on the timing and rate of genome variation in triticale following allopolyploidization also suggested that the degree of relationship between the parental genomes was the key factor determining the rate of genomic sequence variation during intergeneric allopolyploidization [54]. It has been well demonstrated that Hordeum is a monophyletic genus [21,22] and that its polyploids originated from diploid species within the genus. In contrast, the Elymus StH genomic species originated from the St genome donor Pseudoroegneria and H genome donor Hordeum species. The parental genomes in polyploid Hordeum are therefore genetically much closer to each other than the St and H genomes are. The degree of relationship between the two parents of a polyploid might be a factor affecting nucleotide diversity in allopolyploids. This might explain why nucleotide diversity (π) of the DMC1 sequences in Elymus polyploid species (π = 0.02083) is higher than that in polyploid Hordeum (π = 0.01680). This speculation needs to be studied further.
One of the objectives of this study is to explore the possible role of polyploid Hordeum species in the origin of StH Elymus and whether gene flow has occurred between polyploid Hordeum and Elymus species.  (Fig. 1). The diploid H. pusillum and the tetraploid H. jubatum were suggested as the parents of H. arizonicum [22,46,62]. cpDNA analysis suggested that H. pusillum could be the maternal parent of H. arizonicum [63]. A previous analysis of DMC1 data suggested that H. brachyantherum subsp. californicum might be one diploid genome donor of H. arizonicum, and the second genome donor of H. arizonicum might be the common ancestor of H. brachyantherum subsp. brachyantherum, and showed that one copy of the DMC1 sequences of H. arizonicum falls outside the Hordeum clade of the tree [35]. The DMC1 data here suggest that, if the St genome was not the donor of one genome copy in H. arizonicum, gene flow has occurred between H. arizonicum and some StH-genome Elymus species during their evolution.
Analysis of β-amylase data revealed that one of the H. jubatum genomes was placed together with Elymus species [29]. A role for H. jubatum in the evolutionary history of Elymus has been suggested: a tetraploid similar to H. jubatum might have been involved in the history of Elymus, either through introgression between Elymus and H. jubatum or through a direct contribution from an H. jubatum-like species to Elymus [29]. Our DMC1 data showed that one of the H. jubatum genomes grouped with H. tetraploidum, H. fuegianum, T. caput-medusae and Aust. velutinum, with Elymus virescens as sister to this group. Our result not only is consistent with the suggestion that H. jubatum was involved at some stage in the history of StStHH Elymus [29], but also extends it by suggesting that several polyploid Hordeum species might have been involved in the evolution of StStHH Elymus through gene flow among them.
A study on polyploid formation in Tragopogon (Asteraceae) indicated a lack of gene flow among polyploid plants of independent origin, even when they co-occur, suggesting potential reproductive barriers among separate lineages in polyploid species [50]. Sequence analysis of 12 nuclear loci representing 6 genes in tetraploid Capsella bursa-pastoris and its close diploid relative C. rubella showed that polyploid speciation need not result in immediate and complete reproductive isolation, and that post-polyploidization hybridization and introgression can contribute significantly to genetic variation in newly formed polyploids [64]. Molecular characterization of a diagnostic DNA marker for domesticated tetraploid wheat indicated gene flow from wild tetraploid wheat to hexaploid wheat [65]. Our results suggest that gene flow among different polyploids in Triticeae species might play an important role in polyploid speciation and evolution.