Composition, variation, expression and evolution of low-molecular-weight glutenin subunit genes in Triticum urartu

Wheat (AABBDD, 2n = 6x = 42) is a major dietary component for many populations across the world. Bread-making quality of wheat is mainly determined by glutenin subunits, but it remains challenging to elucidate the composition and variation of low-molecular-weight glutenin subunits (LMW-GS) genes, the major components for glutenin subunits in hexaploid wheat. This problem, however, can be greatly simplified by characterizing the LMW-GS genes in Triticum urartu, the A-genome donor of hexaploid wheat. In the present study, we exploited the high-throughput molecular marker system, gene cloning, proteomic methods and molecular evolutionary genetic analysis to reveal the composition, variation, expression and evolution of LMW-GS genes in a T. urartu population from the Fertile Crescent region. Eight LMW-GS genes, including four m-type, one s-type and three i-type, were characterized in the T. urartu population. Six or seven genes, the highest number at the Glu-A3 locus, were detected in each accession. Three i-type genes, each containing more than six allelic variants, were tightly linked because of their co-segregation in every accession. Only 2-3 allelic variants were detected for each m- and s-type gene. The m-type gene, TuA3-385, for which homologs were previously characterized only at Glu-D3 locus in common wheat and Aegilops tauschii, was detected at Glu-A3 locus in T. urartu. TuA3-460 was the first s-type gene identified at Glu-A3 locus. Proteomic analysis showed 1-4 genes, mainly i-type, expressed in individual accessions. About 62% accessions had three active i-type genes, rather than one or two in common wheat. Southeastern Turkey might be the center of origin and diversity for T. urartu due to its abundance of LMW-GS genes/genotypes. Phylogenetic reconstruction demonstrated that the characterized T. urartu might be the direct donor of the Glu-A3 locus in common wheat varieties. Compared with the Glu-A3 locus in common wheat, a large number of highly diverse LMW-GS genes and active genes were characterized in T. urartu, demonstrating that this progenitor might provide valuable genetic resources for LMW-GS genes to improve the quality of common wheat. The phylogenetic analysis provided molecular evidence and confirmed that T. urartu was the A-genome donor of hexaploid wheat.


Background
Wheat flour can be made into a wide variety of foods due to the unique viscoelastic properties of dough [1,2]. These viscoelastic properties result from gluten proteins, which account for about 80% of the total grain proteins [3,4]. Wheat gluten is composed of two main components: glutenin and gliadin. Glutenin plays a major role in dough's elasticity, while gliadin contributes mainly to dough's viscosity [5]. According to their relative mobility in sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE), glutenin proteins are generally divided into high-molecular-weight glutenin subunits (HMW-GSs) and low-molecular-weight glutenin subunits (LMW-GSs) [6]. LMW-GSs have molecular weights ranging from 20 kDa to 45 kDa, making up 60% of glutenin proteins and one third of seed storage proteins [3,7]. Based on the first amino acid of the mature proteins, LMW-GSs have been classified into three types: i-(isoleucine), m-(methionine) and s-(serine) [8].
LMW-GSs are encoded by a multi-gene family whose members are located at Glu-A3, Glu-B3 and Glu-D3 loci on the short arms of homologous chromosomes 1A, 1B, and 1D, respectively [9]. Without a complete genome sequence, it is hard to determine the exact members of LMW-GS gene family in a wheat variety. In the past decade, the LMW-GS gene family members were characterized in only a few wheat varieties, including Norin 61, Glenlea and Xiaoyan 54 [10][11][12]. Twelve to 19 LMW-GS genes were identified from individual varieties using complementary methods, including cDNA or DNA BAC library screening and proteomic analysis. Recently, a new molecular marker system was developed to identify LMW-GS gene family members which used highresolution capillary electrophoresis to separate fragments of gene members with three conserved primer sets (LMWGS1, LMWGS2 and LMWGS3) [13]. Using this marker system, more than 15 LMW-GS genes were detected from single wheat variety [13]. This marker system was also used as a complementary tool for the allelic determination of LMW-GS genes at Glu-B3 locus in wheat cultivars and segregating populations [14]. A full-length gene-cloning method based on this marker system has been used to clone 16 or 17 LMW-GS genes in individual bread wheat genotypes [15]. Both the marker system and the gene cloning method were applied to investigate the composition of LMW-GS genes in large populations, including Aroona nearisogenic lines and the micro-core collections (MCC) of Chinese wheat germplasm [16,17], demonstrating their efficiency in dissecting this complex gene family in common wheat.
Wild progenitors and relatives could provide tremendous genetic variability to broaden the gene-pool of common wheat [18]. In the past decades, several important agronomic genes have been well characterized, such as the stem rust resistance gene Sr47, the leaf rust resistance genes (Lr41, Lr42 and Lr43) from Aegilops tauschii, the high grain protein content (Gpc-B1) gene from tetraploid wheat and the chromosome arm 1RS containing both disease resistance and high yield genes from rye [19][20][21][22]. T. urartu is the wild diploid wheat from the Fertile Crescent region, and has long been considered as the A-genome donor in polyploid wheat species [23]. Isozyme, RAPD and AFLP markers have detected large genetic variations in T. urartu populations [24,25]. Recently, a set of genes were also characterized in T. urartu, e.g., the powdery mildew resistance gene (PmU), and the grain length controlling gene (TuGASR7) [26][27][28]. Abundant variability of storage proteins in T. urartu, was detected in gliadin proteins and HMW-GSs using electrophoretic procedures or nucleotide sequence analysis [29,30]. Several variants with repetitive domain length polymorphism were also observed in LMW-GS genes [31]. However, the detailed composition and genetic diversity of LMW-GS genes in T. urartu remain unknown.
Dissecting the composition and diversity of LMW-GS genes in T. urartu is prerequisite to broadening the genetic resources for bread-making quality improvement in common wheat; unraveling the genetic diversity of T. urartu will facilitate its gene and germplasm conservation. In this study, a systematic molecular analysis of LMW-GS genes in T. urartu was conducted using complementary approaches, including highthroughput molecular marker system, gene cloning, two-dimensional electrophoresis (2-DE), liquid chromatography tandem mass spectrometry (LC-MS/MS), matrix assisted laser desorption/ionization time of flight tandem mass spectrometry (MALDI-TOF/TOF-MS) and SDS-PAGE. The gene composition, variation, organization and expression pattern were extensively investigated in 157 accessions collected from the Fertile Crescent region, which is widely considered as the center of origin and diversity of T. urartu [25,32]. Genetic diversity of LMW-GS genes and genotypes in T. urartu and their evolutionary clues pertaining to wheat species of different ploidy were further discussed.
To further characterize the LMW-GS genes represented by these DNA fragments, 50 typical accessions, covering all 15 genotypes, were subjected to gene cloning using the full-length gene cloning method (Additional file 1: Table S1) [15]. Generally, six or seven LMW-GS gene sequences were cloned in each accession, which matched well with six or seven DNA fragments detected with the marker system. Totally, 148 LMW-GS sequences were obtained and deposited in GenBank (KM065455-KM065457, KM085178-KM085322); these sequences were derived from eight LMW-GS genes (i.e., TuA3-385, TuA3-391, TuA3-397, TuA3-400, TuA3-460, TuA3-502, TuA3-538 and TuA3-576) determined in the T. urartu population due to the redundancy and large number of allelic variants ( Table 2). Among these genes, only two or three variants Table 1 LMW-GS genes and their allelic variants identified in T. urartu population using the LMW-GS gene molecular  marker system   Gene  Allelic variants a  LMWGS1 b  LMWGS2 b  LMWGS3 b   TuA3-385  TuA3-385  385  492  383   TuA3-391  TuA3-373  TuA3-391  TuA3-392  373  391  392  480  484  501  371  375  390   TuA3-397  TuA3-397  397  504  396   TuA3-400  TuA3-400  TuA3-402  400  402  506  509  399  402   TuA3-400  TuA3-460  TuA3- A single gene could be detected by no less than one primer set, and the correspondence among fragments detected by these three primer sets was established by their theoretical sizes. LMW-GS genes and allelic variants were named in accordance with the sizes of their corresponding fragments amplified by the LMWGS1 primer set practically or theoretically, and the major allelic variant was designated as the gene whereas the remainders as its allelic variants [17]. b Three primer sets of the LMW-GS gene molecular marker system [13]. c Not amplified by the specific primer sets. Figure 1 Electropherograms of DNA fragments detected in accession PI428198 using the LMW-GS gene molecular marker system. The horizontal axis shows the detected fragment sizes, and the vertical axis displays the signal intensities during the capillary electrophoresis. The orange peaks were size standard DNA fragments in the GeneScan 1200 LIZ and each blue peak represents a LMW-GS gene. were detected for each of the TuA3-385, TuA3-391, TuA3-397, Tu-A3-400 and TuA3-460 genes. In contrast, at least seven variants were identified for each of the TuA3-502, TuA3-538 and TuA3-576 (Table 2). All allelic variants resulted in 15 genotypes at the Glu-A3 locus in T. urartu, which was consistent with the genotypes based on the size of DNA fragment in the marker system (Table 2).

m-type LMW-GS genes
TuA3-385 gene with two variants, TuA3-385a and TuA3-385b, was widely distributed in the T. urartu population. Both variants were supposed as pseudogenes due to immature stop codons at their repetitive domains (Additional file 1: Table S2; Additional file 2: Figure S1). Another common gene, TuA3-391 contained three allelic variants: TuA3-373, TuA3-391 and TuA3-392 (Additional file 1: Table S2; Additional file 2: Figure S2). All three allelic variants of TuA3-391 gene were pseudogenes because of immature stop codons at either their repetitive or C-terminal I domains. TuA3-397 was also a common gene in T. urartu population. TuA3-397a was the major allelic variant (87.26%), but it might be a pseudo-gene. The other variant, TuA3-397b might encode an m-type LMW-GS for its intact open reading frame (ORF) (Additional file 1: Table S2; Additional file 2: Figure  S3). TuA3-400 gene was seldom detected in T. urartu population; its two allelic variants (TuA3-400 and TuA3-402) were only detected in four and five accessions, respectively (Additional file 1: Table S2). These two allelic variants shared 99% identity despite two 3-bp InDels and several SNPs (Additional file 2: Figure S3). The TuA3-402 allelic variant was a pseudo-gene due to the immature stop codon at its C-terminal II domain, whereas TuA3-400 was supposed to be active for its intact ORF.
Another i-type gene TuA3-538 had seven variants: TuA3-535, TuA3-538a/b/c/d/e and TuA3-657 in the T. urartu population. All the variants were supposed to be active genes for their intact ORFs, except for the TuA3-538d variant. The TuA3-538a/b/c/d/e variants distinguished themselves mainly by different repeat number of CAG and CAA motifs at the C-terminal II domain in addition to several SNPs throughout their coding regions. The TuA3-535 variant shared >99% identity with each of the TuA3-538a/b/c/d/e variants. Compared with the other allelic variants of the TuA3-538 gene, the long fragment of TuA3-657 was mainly derived from two insertions (24-bp and 87-bp) at the repetitive domain (Additional file 2: Figure S5). Among these allelic variants, the TuA3-538a, TuA3-538b and TuA3-538c were widely distributed in the T. urartu population, occupying 85.35% of variants. TuA3-535, TuA3-538d, TuA3-538e and TuA3-657 were rare, present in a few accessions (Additional file 1: Table S2).

Expression of LMW-GS genes in T. urartu
The bread-making quality of wheat flour is attributed greatly to the composition of LMW-GSs and the number of expressed genes [12,16]. To investigate the expression pattern of LMW-GS genes in T. urartu, four accessions from four genotypes, U2 (PI428202), U9 (PI428255), U10 (PI428270) and U8 (PI428335), in turn containing one, two, three and four genes with intact ORFs, were selected and subjected to proteomic analysis. All the spots on 2-DE gels of PI428202, PI428255 and PI428270, and the spots of LMW-GSs of PI428335 were identified by LC-MS/MS or MALDI-TOF/TOF MS ( Figure 2).
Of the 25 spots investigated for PI428270 in the U10 genotype, three (spots 1, 2 and 3) were LMW-GSs, two were globulin, 13 were gliadins, and the remaining spots were other storage proteins (avenin, hordein and aveninlike precursor) (Figure 2a; Additional file 1: Table S3, S4 and S5). Spots 1, 2 and 3 were in turn matched to protein TuA3-538b, TuA3-579a and TuA3-502a in U10, and corresponded to the middle, upper and lower bands in SDS-PAGE due to their same mobility, respectively (Figure 2a; Additional file 1: Table S3). In the PI428335 accession with the U8 genotype, five spots were identified as LMW-GSs. Spots 1, 2 and 3 in turn matched deduced amino acid sequences of TuA3-538b, TuA3-579b and TuA3-502a, whereas both spots 4 and 5 matched TuA3-397b. All of these spots had corresponding bands with the same mobility in SDS-PAGE (Figure 2b; Additional file 1: Table S3). Interestingly, TuA3-400, which was only identified in the U15 genotype, shared the same 2-DE protein spot and SDS-PAGE band with TuA3-397b due to their similar molecular mass and isoelectric point (pI) value in our previous MS/MS identification (Data not shown) (Table 2; Figure 2e). In PI428202, spots 1 and 2 were proteins of the only active variant, TuA3-538c in U2; six (spot 1) and eight (spot 2) high-quality peptide sequences obtained by MS/MS analysis matched hypothetical polypeptides of TuA3-538c, respectively (Figure 2c; Additional file 1: Table S3). Moreover, these two spots also corresponded to the only band (TuA3-538c) detected with SDS-PAGE (Figure 2c). With regard to PI428255 of the U9 genotype, spot 1 was a protein product of the TuA3-538e variant in U9 and corresponded to the lower band (TuA3-538e) in SDS-PAGE, and spot 2 was that of the TuA3-576b variant and matched the upper band (TuA3-576b) in SDS-PAGE (Figure 2d; Additional file 1: Table  S3). The SDS-PAGE data for all the genotypes confirmed that four main types of bands corresponded to intact ORFs of TuA3-397/TuA3-400, TuA3-502, TuA3-538 and TuA3-576 in this T. urartu population by comparing their electrophoretic mobility with deduced protein molecular weights (Figure 2e). Collectively, the expression patterns of LMW-GS genes in T. urartu were consistent with the active genes determined using the LMW-GS marker system and full-length gene cloning method.
Generally, i-type genes were the main active genes in T. urartu, and one to three of them were expressed in individual accessions (Table 2, Figure 2e). One i-type variant, TuA3-538c, was expressed in 35 accessions of the U2 genotype. And all three i-type genes, TuA3-502, TuA3-538 and TuA3-576 were characterized as expressed genes in the U3, U5, U6, U10, U8, U11 and U14 genotypes, which together contained 61.78% of the total accessions ( Table 2). All the m-type genes were pseudo-genes except for the TuA3-397b and TuA3-400 allelic variants, which were only detected in the U8 (TuA3-397b) and U15 (TuA3-397b and TuA3-400) genotypes (Table 2, Figure 2e). None of the variants of the s-type gene, TuA3-460, were active as no protein bands were detected on 2-DE and SDS-PAGE, which was consistent with the stop codons in their CDS regions (Table 2, Figure 2e).

Characteristics of LMW-GS genes in T. urartu
Based on the first amino acid of their mature protein sequences, the eight genes in T. urartu were classified into three types (m-, s-and i-). TuA3-385, TuA3-391, TuA3-397 and TuA3-400 were m-type, TuA3-460 was s-type and TuA3-502, TuA3-538 and TuA3-576 were i-type genes. Their deduced mature proteins contained three conserved domains (N-terminal domain, repetitive domain and C-terminal domain), except for the i-type subunit which lacked the N-terminal domain (Additional file 2: Figure S9). Cysteine residues could form inter-and intra-chain disulphide bonds which are of great importance for the formation of glutenin polymers [3]. All the subunits identified in the T. urartu population contained eight cysteine residues. The location of these cysteine residues, in the m-and i-type genes, were conserved with six of the residues at the C-terminal I domain and one at each of the C-terminal II and III domains, except for the first and the third cysteine residues in the m-type genes, TuA3-397b and TuA3-400 (Additional file 2: Figure S9). The m-type LMW-GSs were also different from the i-type genes in molecular weight. The estimated molecular weight of TuA3-397b and TuA3-400 were 31.77 kDa and 31.90 kDa, respectively, substantially lower than the average molecular weight of all the i-type genes (TuA3-502, 36.98 kDa; TuA3-538, 38.55 kDa; TuA3-576, 39.42 kDa) because of longer repetitive regions in the itype subunits (Additional file 2: Figure S9).
All variants of the eight LMW-GS genes in T. urartu were subjected to phylogenetic analysis using ClustalW2 and MEGA 5. Two main clades were obtained in the phylogenetic tree, one containing all the m-and s-type genes, and the other including all the i-type genes ( Figure 3). In the m-and s-type gene clade, four subclades were further divided, each containing variants of a single gene, except for the sub-clade of TuA3-397, where TuA3-400 was also involved (Figure 3). In the clade of i-type genes, three sub-clades were further divided, which corresponded to the TuA3-502, TuA3-538 and TuA3-576 genes, accordingly (Figure 3).

Geographic distribution of LMW-GS genes and genotypes in T. urartu
The 157 analyzed T. urartu accessions were collected in the Fertile Crescent region, including northeastern Lebanon, southeastern Turkey, Armenia, Syria, Iraq and Iran, where many temperate-zone agricultural crops originated and were domesticated [33]. For the purpose of better exploitation and in situ genetic conservation of T. urartu germplasm, the geographic distribution of their LMW-GS genes/variants and genotypes was analyzed.
Southeastern Turkey was the region of the greatest diversity where all eight genes and 34 of their total 39 variants were detected, as well as ten unique variants were found (Table 2, Figure 4a). In northeastern Lebanon, 26 variants of seven genes (all except TuA3-400) were detected; all were shared by southeastern Turkey except TuA3-397b and TuA3-579b (Table 2, Figure 4a). With regard to the genotypes of LMW-GS genes, southeastern Turkey was also the region of the highest/(most abundant) diversity, as the majority of genotypes were detected there (Table 2, Figure 4b). Moreover, seven genotypes (U1, U3, U4, U5, U11, U12 and U14) were unique to southeastern Turkey (Table 2, Figure 4b). In northeastern Lebanon and Syria, seven and four genotypes were detected, respectively. All genotypes were also present in southeastern Turkey except for the U8 genotype (Table 2, Figure 4b).
Despite containing the unique genotype, U15, Armenia shared the U10 genotype with southeastern Turkey and northeastern Lebanon (Table 2, Figure 4b). In Iraq and Iran, the only genotype, U2, was also detected in southeastern Turkey, northeastern Lebanon and Syria (Table 2, Figure 4b). In summary, southeastern Turkey should be the center of origin for T. urartu because the greatest diversity of LMW-GS genes/variants and genotypes were detected. And in the five remaining collection areas, almost all the genes/variants and genotypes were observed in southeastern Turkey.

LMW-GS genes in T. urartu
Eight LMW-GS genes, i.e., four m-type, three i-type and one s-type genes, were characterized in the T. urartu population. In each accession, six to seven genes were detected, the highest number of LMW-GS genes reported at Glu-A3 locus to our best knowledge. To investigate the evolutionary relationships of LMW-GS genes between T. urartu and other diploid or polyploid wheats, all the gene sequences in T. urartu were queried with the nucleotide BLAST program in NCBI. Gene sequences sharing high identity (>90%, even 99%) with the LMW-GS genes in T. urartu were found (Additional file 1: Table S6).
For TuA3-385, no homolog was found at Glu-A3 locus of other wheat species, but interestingly this gene shared >98% identity with D3-385 at the Glu-D3 locus (e.g., JX878094) in hexaploid wheat [17] and the GluDt-64 allele (EF437430) in Ae. tauschii (Additional file 1: Table S6). And phylogenetic analysis showed the TuA3-385 and D3-385 genes clustered together (Figure 5a).  Thus, the TuA3-385 and D3-385 genes were homologs between T. urartu and common wheat. However, TuA3-385 was a pseudo-gene and D3-385 was an active gene in both common wheat and Ae. tauschii. This m-type gene should be an ancient gene that emerged before the divergence of the A and D genomes, but this gene was lost at the Glu-A3 locus during wheat polyploidization (Additional file 1: Table S6; Additional file 2: Figure  S1). During evolution, this gene in T. urartu was mutated and became a pseudo-gene, but the gene in Ae. tauschii and common wheat maintained an intact ORF.
TuA3-460 was the first s-type LMW-GS gene detected at the Glu-A3 locus in Triticum. Interestingly, all of its BLAST hits (≤90% identities) in BLAST databases were not s-type but m-type genes (Additional file 1: Table S6). And the phylogenetic analysis indicated that the variants of TuA3-460 were in the same clade with m-type genes at the Glu-B3 and Glu-D3 loci in Triticum aestivum (B3-530, B3-578, B3-570, D3-575 and D3-586) and Ae. speltoides (FJ824794) but not s-type genes (Figure 5a). Protein sequence alignments were performed to understand the variations among TuA3-460, the known s-and m-type LMW-GSs (Additional file 2: Figure S7). Besides the conserved s-type N-terminal domain (MENSHIP-GLEKPS) and the s-type specific peptide (TLSH) at the repetitive domain, TuA3-460 had 11 unique amino acids throughout the protein sequences, including one unique Cysteine residue at the C-terminal II domain. Moreover, compared with the known m-and s-type LMW-GSs, TuA3-460 contained three deletions (PPFSQQ, PVLPQQ and PPFSQQQQ) at the repetitive domain (Additional file 2: Figure S7). Thus, TuA3-460 is a new s-type gene at the A genome, which is homologous with the s-type genes at the B and D genomes, rather than a chimeric gene containing s-and m-type gene sequences. The s-type LMW-GS genes were closer to the m-type genes than the i-type ones (Figure 3). Our deduced protein sequence alignments also revealed that the s-type LMW-GSs had higher similarities with the m-type LMW-GSs (Additional file 2: Figure S9), and the m-type LMW-GSs shared the variations with all s-type proteins from A, B and D genomes. Thus, m-type gene should be the oldest type of LMW-GS gene, and s-type genes probably originated from m-type LMW-GS genes due to the mutation of MET to MEN in the N-terminal region, which was consistent with the previous observations [3,17,34]. Even though containing several unique features, TuA3-460 had a pretty high similarity with the m-type genes, especially possessed one insertion (KQLGQCSFQQPQQQ) at the C-terminal domain and four amino acids (Additional file 2: Figure S9), which were exclusively contained in the m-type LMW-GSs. All these data indicate that this new s-type TuA3-460 gene also originated from the mtype LMW-GS genes. However, most features (specific amino acids and InDels) of the previously characterized s-type LMW-GSs could not be detected in TuA3-460, implying that TuA3-460 might not share the same evolutionary process with other s-type LMW-GS genes from the primitive m-type LMW-GS gene, or they could originate from different m-type genes.
However, the i-type genes preserved high polymorphisms at the Glu-A3 locus of Triticum [17]. In T. urartu, all accessions possessed three i-type genes, compared with 2-4 genes in common wheat, implying that the Glu-A3 locus might be derived from more than one origin of T. urartu and suffered rapid genome divergence. Moreover, all the three i-type genes in many T. urartu accessions were active genes, while only one or two genes were expressed in common wheat. This indicated that T. urartu is valuable in quality improvement in common wheat since a high number of active genes might contribute to superior bread-making quality [12]. Moreover, the i-type genes (A3-502/A3-573/A3-640, Glu-A3f ) had positive effects on dough quality, e.g. percentage of SDS-unextractable fraction in total polymeric protein, dough resistance and extensibility [16]. The corresponding homologs of A3-573 and A3-640 were also detected in T. urartu.

Center of origin and diversity of T. urartu
Turkey was established as the center of origin and diversity with abundant plant species and endemism based on its variety in geomorphology, topography and climate [35]. Furthermore, southeastern Turkey exhibits great genetic diversity of plants in the Triticeae family, and is supposed to be the origin of domestication for wheat and einkorn (T. monococcum) [36]. Among the six collection areas in this work, southeastern Turkey showed the highest genetic diversity of LMW-GS genes (35 of the total 39 variants) and genotypes (13 of the total 15 genotypes) in T. urartu (Figures 3b, 4a; Table 2). Almost all of the genes/ variants and genotypes detected in the remaining areas were also detected in southeastern Turkey. Moreover, many variants (e.g., TuA3-463, TuA3-498, TuA3-535, TuA3-576c) and genotypes (U1, U3, U4, U5, U11, U12 and U14) were unique to southeastern Turkey (Figures 3b, 4a; Table 2). Even though the U8 genotype was exclusively detected in northeastern Lebanon and Syria and the U15 genotype was specifically identified in Armenia, similar genotypes were widely present in southeastern Turkey (U8 with U10 and U15 with U2 and U7) ( Table 2). Considering the largest genetic diversity and typical LMW-GS genes/variants and genotypes, southeastern Turkey might be the center of origin and diversity of T. urartu. This conclusion was confirmed by the analysis of the loci coding storage proteins [29,37] and the assessment of AFLP markers [25].
Lebanon was supposed as a center of specific adaptation for diploid and tetraploid wheats given that some morphological characters were exclusively detected there [38]. However, fewer unique LMW-GS genes/variants and genotypes were detected in T. urartu accessions from northeastern Lebanon than those from southeastern Turkey (Figures 3b, 4a; Table 2). Northwestern Syria was regarded as one of the regions of richest genetic diversity of T. urartu based on the assessment by AFLP markers [25]. Iran is one of the primary centers of diversity for wheat and its relatives; wild wheats, in particular diploid species, are extensively distributed in its various parts [39]. However, low genetic diversity of T. urartu in Syria and Iran was detected due to the lack of accessions collected, and the LMW-GS genes/variants and genotypes identified in these two areas were shared by southeastern Turkey and/or northeastern Lebanon (Figures 3b, 4a; Table 2). Larger collections of T. urartu are needed for further analyses to draw more precise conclusions about the diversity of LMW-GS genes/variants and genotypes in Syria and Iran.

Direct A genome donors of T. aestivum
Common wheat (AABBDD) is believed to be the result of spontaneous crosses between T. dicoccoides (A u A u BB) and Ae. tauschii (DD); T. dicoccoides (A u A u BB) was produced by the hybridization between T. urartu (A u A u ) and the B genome ancestor which was speculated as Ae. speltoides (SS) [2]. Considering its wide adaptability and variation, common wheat is believed to have arisen more than once from crosses of different genotypes of its progenitor species [40,41]. The determination of the specific donors of the A genome of bread wheat would benefit not only the genetic diversity conservation of T. urartu but expand the genetic basis for bread wheat breeding. The dissection of the LMW-GS gene family certainly would provide some evidence about the direct donors of the A genome of common wheat.
T. urartu and common wheat shared two genes, A3-391 and A3-400 (Figure 5a). The allelic variants for each gene showed high identity (>97%), thus it was difficult to match the allelic variants between T. urartu and common wheat. The other two genes, TuA3-385 and TuA3-460 were unique to T. urartu (Figure 5a). The itype genes were present as haplotypes and showed high diversity in common wheat and T. urartu. Except A3-502 shared by T. urartu and common wheat (Figure 5b), the other i-type genes in common wheat were divided into five groups, from i A -1 to i A -5 [17], of which i A -3 (A3-573/A3-640) and i A -4 (A3-649-1/A3-649-2) contained the same number of i-type genes with T. urartu (Figure 5b). The TuA3-538 genes showed close relationship with A3-640 (i A -3) and A3-649-2 (i A -4), and the TuA3-576 genes showed higher identity with A3-573 (i A -3) and A3-649-1 (i A -4) than the other genes ( Figure 5b). Thus, the characterized T. urartu might be the direct donor of the Glu-A3 locus of common wheat varieties possessing i-type genes i A -3 and i A -4. Moreover, the i-type genes i A -3 and i A -4 should be the ancient genotypes because they had the same number of i-type genes with T. urartu and their genes closely matched those in T. urartu with high identity (Figure 5b) [17]. Interestingly, group i A -4 were only detected from landraces in the micro-collection of Chinese wheat germplasm, which also suggested that i A -4 might be an ancient genotype [17]. The i A -1 and i A -2 groups might also be derived from the characterized T. urartu because their i-type genes shared the same branch with TuA3-576. But i A -1 and i A -2 groups only contained one i-type genes, which shared higher identity with TuA3-576 than TuA3-538 ( Figure 5b). Thus, in these genotypes, TuA3-538 was lost due to a deletion and many SNP mutations of TuA3-576 were introduced during polyploidization. The i A -5 was a special group of i-type genes because all three genes were substantially different from the i-type genes in T. urartu (Figure 5b). This group of i-type genes in common wheat might be derived from some other LMW-GS genotypes not detected in the present study, or they might have undergone many deletion and duplication processes during their evolution.

Conclusions
In summary, this work has promoted our understanding of the composition, variation, expression and evolution of LMW-GS genes in T. urartu. Analysis of the geographic distribution of LMW-GS genes/variants and genotypes would facilitate the in situ conservation of the genetic diversity of T. urartu. These new LMW-GS genes/variants would broaden the genetic resources in wheat quality breeding and accelerate their application to improve bread-making quality in common wheat.

T. urartu accessions
The T. urartu accessions were obtained from the State Key Laboratory of Plant Cell and Chromosome Engineering, the Institute of Genetics and Developmental Biology, and the Chinese Academy of Sciences. This collection consisted of 157 accessions including 82 from northeastern Lebanon (Iaat, Kfardane, Talia and Baalbek), 63 from southeastern Turkey (Mardin and Urfa), five from Armenia, five from Syria (Damascus and Haseke), one from Iraq (Arbil) and one from Iran (Bakhtaran) (Additional file 1: Table S1).

DNA isolation and polymerase chain reaction (PCR) amplification
Genomic DNA of 157 T. urartu accessions were extracted from young leaves of 14-day-old seedlings grown in a glasshouse using the cetyltrimethyl ammonium bromide (CTAB) method [42]. PCR was conducted using 20-μl reaction volumes consisting 1.0 U LA Taq DNA polymerase (Takara Bio, Otsu, Japan), 1 × GC buffer I (Mg 2+ , plus), 8 nM of each dNTP, 100 ng of genomic DNA and 6 pmol of each specific primer. The PCR reactions were performed using a Veriti Thermal Cycler (Applied Biosystems, Foster City, CA, USA) according to Zhang et al. [13].

Detection of LMW-GS genes
LMW-GS genes were detected using the LMW-GS gene molecular marker system [13]. PCR products were purified with 3.0 M sodium acetate and 70% ethanol before adding HiDi-formamide and GeneScan 1200 LIZ size standard (Applied Biosystems, Foster City, CA, USA). DNA fragments of LMW-GS genes were separated by capillary electrophoresis using a 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA, USA) with the default genotyping module and the G5 dye set. LMW-GS genes and allelic variants were designated in accordance with the size of their corresponding fragment lengths in the GeneMapper Software v3.7 (e.g., 385 and 397) [13].

Cloning of LMW-GS genes
Fifty accessions were chosen to clone LMW-GS genes whose DNA fragment lengths were detected by the marker system [15]. Genes were cloned using the full length gene method and were further nominated as per the above cloning method [15]. Briefly, those sequences with high identity but a different length of repetitive domains were assigned to a single gene. Conversely, in a single gene, those sequences of conserved SNPs or different fragment lengths were considered allelic variants of the gene. Each gene was represented by the variant detected in the majority of accessions and designated as 'representative variant DNA fragment length + gene'. Similarly, allelic variants were named, 'DNA fragment length + allele', and letters in alphabetical order were added to distinguish these variants with the same fragment lengths but different SNPs according to their frequencies in the T. urartu population. For example, considering their high identity (>99%), TuA3-460, TuA3-463 and TuA3-474 were regarded as allelic variants of the gene TuA3-460; for TuA3-460 was detected in the majority of accessions (21), whereas TuA3-463 and TuA3-474 were only in three and four accessions, respectively (Additional file 2: Figure S8).

Separation and characterization of LMW-GSs
To elucidate the expression pattern of LMW-GS genes in T. urartu, four accessions, which in turn contained one (PI428202), two (PI428255), three (PI428270) and four (PI428335) LMW-GS genes with intact ORFs, were chosen for proteomic analysis. In each accession, glutenins were extracted from three seeds with their embryos removed [44]. Then, the prepared glutenin samples were separated by 2-DE [12], and all the spots on 2-DE gels of PI428202, PI428255 and PI428270, were digested by chymotrypsin (Sigma-Aldrich, MO, USA) and identified by LC-MS/MS [12,45]. The LC-MS/MS spectra were analyzed with Bioworks 3.1 software, using a database including protein sequences of Triticeae available in NCBI (before 2013-7), deduced amino acid sequences from the T. urartu genomic data (http://gigadb.org/dataset/100050) and protein sequences of LMW-GS genes cloned in this work. The unidentified spots were further analyzed using the MALDI-TOF/TOF mass spectrometry (AB SCIEX 5800). MS and MS/MS data were analyzed using MASCOT 2.0 search engine (Matrix Science, London, U.K.) to search against the same database of the former LC-MS/MS, with the peptide mass tolerance and the MS/MS ion tolerance of 0.2 Da and 0.5 Da, respectively. The protein scores greater than 58 were significant (p < 0.05). Considering the identical electrophoretic mobility of the above three accessions, only the spots of LMW-GSs in PI428335 were selected for mass spectra analysis. After verifying its consistency with 2-DE, SDS-PAGE was exploited to separate the LMW-GSs of every T. urartu accession for its high efficiency.