Introduction

Positional cloning of genes in wheat is hampered by its large genome size with abundant repetitive elements and its polyploid nature. Common wheat Triticum aestivum L. has a 16.4-Gb genome (Arumuganathan and Earle 1991) with a large proportion of repetitive elements (approximately 80%, Moore 1995) and three different genomes (2n = 6x = 42, AABBDD genomes). These three genomes originated from the hybridization of diploid Aegilops tauschii Coss. (2n = 2x = 14, DD genome) and tetraploid Triticum turgidum L. (2n = 4x = 28, AABB genomes). The AA genome from tetraploid wheat was contributed by Triticum urartu Tumanian ex Gandilyan (2n = 2x = 14, AA genome), which is closely related to cultivated diploid wheat Triticum monococcum L. (2n = 2x = 14, AmAm genome, Dvorak et al. 1993; Johnson and Dhaliwal 1976). The BB genome was contributed by a species from the Sitopsis group, likely related to Aegilops speltoides (Tausch) Löve (Dvorak and Zhang 1990; Huang et al. 2002; Liu et al. 2003).

The use of wheat diploid ancestors (or other diploid Triticeae species) provides a viable alternative to overcome the complications imposed by polyploidy to positional cloning projects. The development of molecular markers and the construction of high-density genetic maps are considerably simpler in diploid genomes. Examples of the use of this strategy are provided by the cloning of the powdery mildew resistance locus Pm3 of hexaploid wheat using both tetraploid and diploid wheat (Yahiaoui et al. 2004), and by the use of T. monococcum to clone the vernalization genes Vrn1 and Vrn2 (Yan et al. 2003, 2004) and the leaf rust resistance locus Lr10 (Stein et al. 2000).

The large size of the Triticeae genomes has delayed the sequencing of these species and limited the number of successful positional cloning projects in this economically important group of plants. Fortunately, there is good colinearity among species of the Poaceae family (Devos 2005; Devos and Gale 2000; Feuillet and Keller 2002; Paterson et al. 2000; Van Deynze et al. 1995). The available genomic sequences of rice (Goff et al. 2002; Yu et al. 2002) and Brachypodium (http://www.brachypodium.org) provide parallel road maps, which are useful to generate markers in the Triticeae-targeted regions, and in some cases, to identify potential candidate genes (Yan et al. 2003, 2006). This comparative genomics approach has been used in most of the positional cloning projects in wheat (Bossolini et al. 2006; Feuillet et al. 2003; Fu et al. 2009; Uauy et al. 2006; Yan et al. 2003, 2004, 2006).

Whereas the initial low-resolution restriction fragment length polymorphism (RFLP) maps were sufficient to reveal the existence of large colinear chromosome blocks between rice and wheat, they were unable to detect small rearrangements affecting this colinear frame (Devos 2005; Dubcovsky et al. 1996; Guyot and Keller 2004; Linkiewicz et al. 2004; Peng et al. 2004; Sorrells et al. 2003). The development of high-density maps and the first comparative sequencing of targeted orthologous regions revealed numerous exceptions to the broad RFLP colinearity previously observed along large blocks of rice and wheat chromosomes. These alterations were generated by gene duplications, deletions, and inversions as well as gene insertions associated with transposition of linked retroelements (Bennetzen and Ma 2003; Bennetzen and Ramakrishna 2002; Brunner et al. 2003; Dubcovsky et al. 2001; Guyot et al. 2004).

The whole rice genomic sequence provided the framework for additional comparative studies with orthologous regions in the Triticeae genomes. These studies showed that the intergenic regions in the large Triticeae genomes are mainly composed of repetitive elements that evolve rapidly and are not conserved with rice (Bennetzen and Ma 2003; Dubcovsky et al. 2001; SanMiguel et al. 2002). A low conservation of the intergenic regions was also reported among different Triticeae genomes (Dubcovsky and Dvorak 2007; Gu et al. 2004; Kong et al. 2004; Wicker et al. 2003). In maize, some intergenic regions were shown to be highly variable even between inbred lines of the same species as a result of independent retrotransposon invasions in different maize progenitor plants (Fu et al. 2002; Song and Messing 2003).

An additional source of exceptions to the broad gene colinearity observed between rice and the Triticeae genomes is the existence of an ancestral polyploidization event that predated the divergence of the rice and wheat lineages (Paterson et al. 2004). Deletions in the duplicated regions occurred independently after the divergence of the two lineages, generating numerous alterations to the initial colinear regions. A well-studied example of this phenomenon is the chromosome region including the thermo-sensitive earliness per se gene Eps-A m 1 located on the distal region of T. monococcum chromosome 1AmL. Searches of the most similar rice genes using wheat genes from this region returned orthologues from both rice chromosomes 1 and 5, which originally belong to a single region duplicated in the ancestral polyploidization event (Valárik et al. 2006).

Interest in cloning the Eps-A m 1 gene has recently increased with the discovery that, in addition to its effect on heading time, this gene affects the number of spikelets and grains per spike in diploid wheat (Lewis et al. 2008). The effect on heading time was explained by differences in the timing of the transition between the vegetative and reproductive stages and the duration of spike development (no significant effects were found in the stem elongation period; Lewis et al. 2008). The Eps-A m 1 gene was initially mapped distal to marker wg241 on the telomeric region of diploid wheat chromosome 1AmL (Bullrich et al. 2002) and later mapped more precisely to a 0.8-cM interval between genes VatpC and Adk1 (Valárik et al. 2006). Colinearity between the ends of both rice chromosome 5 and wheat chromosome 1AmL was interrupted by a small inversion, several non-colinear genes, and a non-colinear region between markers VatpC and Adk1 (Valárik et al. 2006). Even after using all the available genes from the rice colinear region, the interval between the markers flanking the Eps-A m 1 locus was not small enough to start a chromosome walk to ultimately clone the gene.

In this study, the physical map of A. tauschii (http://wheat.pw.usda.gov/PhysicalMapping/) and the diploid Brachypodium distachyon (L.) Beauv sequence were utilized to generate additional markers in the Eps-A m 1 region and identify potential candidate genes. Brachypodium has recently emerged as an alternative model species for Triticeae because it is phylogenetically closer to these species than rice is (Draper et al. 2001). The estimated 355-Mb genome of Brachypodium places it in an intermediate position between Arabidopsis and rice in terms of genome size (Bennett et al. 2000). Additional genomic resources for this species are already available, such as 20,449 expressed sequence tags (ESTs) from five B. distachyon cDNA libraries (Vogel et al. 2006).

Materials and methods

Mapping populations

Mapping populations used in the construction of the T. monococcum high-density genetic map are all derived from the cross between cultivated T. monococcum ssp. monococcum accession DV92 (spring growth habit) carrying the Eps-A m 1 allele for late heading, and wild T. monococcum ssp. aegilopoides accession G3116 (winter growth habit) carrying the Eps-A m 1 allele for early heading. These mapping populations included 74 F2 lines, 343 F3:2 additional families, 96 F5 single-seed descent (SSD) lines, and 3,298 BC5F2 near isogenic lines (NILs), all of them described before (Lewis et al. 2008; Valárik et al. 2006). Plants were screened with polymerase chain reaction (PCR) markers flanking the Eps-A m 1 region (Table 1), and only those with recombination events within the targeted region were retained. Progeny tests for heading time were performed for plants carrying critical recombination events within the Eps-A m 1 region (Lewis et al. 2008; Valárik et al. 2006).

Table 1 Polymerase chain reaction markers developed in the Eps-A m 1 region of Triticum monococcum

BAC selection, sequencing, and annotation

BAC libraries from A. tauschii accession AS75 (181,248 clones, 4.1-fold coverage, Akhunov et al. 2005), B. distachyon accession Bd21 (36,864 clones, 11.6-fold coverage, Huo et al. 2006), and T. monococcum ssp. monococcum accession DV92 (276,000 clones, 5.6-fold coverage, Lijavetzky et al. 1999) were screened by hybridization and/or PCR with probes derived from markers in the Eps-A m 1 region (Table 1). Positive BAC clones were HindIII-fingerprinted, and contigs were manually assembled. BAC-end sequencing was performed using primers from the T7 and SP6 ends of the BAC vector to confirm contig assembly. Selected BAC clones from A. tauschii and B. distachyon were sequenced and annotated using a combination of tools, including comparative genomics analyses, BLAST searches, and gene-finding programs (Dubcovsky et al. 2001). Sequences were assembled using the Phred/Phrap/Consed software (Ewing and Green 1998; Gordon et al. 1998). Gaps in the BAC sequences were filled by primer walking.

Sequence analysis

Sequences of the A. tauschii BAC clones RI101M3 and HI6P23, rice BAC clone AC130728, and B. distachyon BAC clone DH027O19 were compared using the online Artemis Comparison Tool (Abbott et al. 2005). Additionally, the exon, intron, and protein sequences of the wheat, rice, and Brachypodium orthologous genes wg241, CA608558, and VatpC were compared using BLASTN or BLASTP, as appropriate.

Genomic DNA extraction, PCR, hybridization, and Southern Blot procedures were performed as described before (Dubcovsky et al. 1994).

Semi-quantitative and real-time quantitative PCR

RNA samples were obtained from four different BC5F2 NILs derived from the DV92 × G3116 mapping population containing the closest recombination events flanking the Eps-A m 1 locus (two at each side). NILs 268-5 and 268-3 have recombination events distal to Eps-A m 1, and they carry Eps-A m 1 alleles for early (G3116) and late (DV92) heading, respectively, whereas NILs 529-1 and 529-3 have recombination events proximal to Eps-A m 1, and they both contain Eps-A m 1 alleles for late heading (Lewis et al. 2008). Plants were grown in a growth chamber at a constant temperature (16°C) and a long-day photoperiod (16 h light, 602 µE m−2 s−1). Shoot apical regions were dissected using magnifying glasses, and multiple apices from the same genotype and at the same developmental stage were pooled for RNA extraction. Apices were collected at the double-ridge stage and at the terminal spikelet stage. Total RNA was extracted using the RNeasy Plant Mini Kit (Qiagen, USA), and cDNA was then synthesized using the QuantiTect® Reverse Transcription Kit (Qiagen, USA).

Semi-quantitative and real-time quantitative PCR using SYBR® Green JumpStart™ Taq ReadyMix™ (Sigma-Aldrich®, USA) were performed using primers 5′-TTTTTGCTCAACATAAGGCTTTC-3′ and 5′-TCTGTCCATTGCCTGGAGAT-3′ for T. monococcum gene Mot1, and primers 5′-CCCGCTGCTTAAGACATTGCAGA-3′ and 5′-CCTCTTCATGCAATCCAAGGCCTTTAC-3′ for T. monococcum gene FtsH4. Transcript levels of the eukaryotic initiation factor eIf4a and Actin were used as endogenous controls in the semi-quantitative and real-time quantitative PCR, respectively. The eIf4a endogenous control for the semi-quantitative PCR was amplified using primers forward 5′-TGCTGTTCGACATCCAGAAG-3′ and reverse 5′-CCCAGACCTTACCACTCCAA-3′, and Actin was amplified using primers forward 5′-ACCTTCAGTTGCCCAGCAAT-3′ and reverse 5′-CAGAGTCGAGCACAATACCAGTTG-3′ (Uauy et al. 2006). The \( 2^{-\Delta\Delta C_{\text{T}}} \) method (Livak and Schmittgen 2001) corrected for primer amplification efficiency was used to normalize and calibrate the Mot1 and FtsH4 \( C_{\text{T}} \) values relative to the Actin endogenous controls.
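The efficiency-corrected normalization described above can be sketched as follows. This is a minimal illustration rather than the analysis script used in the study: the function name and the C_T values are hypothetical, and it assumes a single amplification efficiency shared by the target and reference primers, following the form \( (1 + \text{Efficiency})^{-\Delta\Delta C_{\text{T}}} \).

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_cal, ct_reference_cal,
                        efficiency=1.0):
    """Efficiency-corrected relative transcript level.

    ct_target / ct_reference: C_T of the target gene (e.g. Mot1) and of the
    endogenous control (e.g. Actin) in the sample of interest.
    ct_target_cal / ct_reference_cal: the same two values in the calibrator.
    efficiency: amplification efficiency as a fraction (1.0 = perfect
    doubling per cycle). Assumed identical for both primer pairs here.
    """
    delta_sample = ct_target - ct_reference              # normalize the sample
    delta_calibrator = ct_target_cal - ct_reference_cal  # normalize the calibrator
    delta_delta = delta_sample - delta_calibrator        # calibrate
    return (1.0 + efficiency) ** (-delta_delta)


# Hypothetical C_T values, for illustration only.
fold_perfect = relative_expression(24.0, 18.0, 25.0, 18.0)        # efficiency 1.0
fold_corrected = relative_expression(24.0, 18.0, 25.0, 18.0, 0.9)  # efficiency 0.9
```

With perfect efficiency the expression reduces to the standard \( 2^{-\Delta\Delta C_{\text{T}}} \) form; a lower efficiency shrinks the estimated fold difference for the same C_T shift.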

The real-time quantitative PCR data were analyzed as a 2 × 2 factorial analysis of variance (ANOVA) with genotype (Eps-A m 1-early (G3116) and Eps-A m 1-late (DV92) alleles, four different recombinant lines) and developmental stage (double-ridge and terminal spikelet) as factors. Replications including all four treatments were performed in two different growth chambers under the same conditions described above, which were included in the analysis as blocks (Randomized Complete Block Design).
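For readers unfamiliar with this design, the partition of variation in a 2 × 2 factorial arranged in complete blocks can be sketched as below. This is an illustrative sketch, not the software used in the study: it assumes a balanced layout with exactly one value per block × genotype × stage cell, the function name and the numbers are invented, and it stops at the F ratios because the F distribution needed for P values is not in the Python standard library.

```python
from itertools import product


def rcbd_factorial_anova(data):
    """Sums of squares, degrees of freedom, and F ratios for a balanced
    two-factor factorial in a randomized complete block design.

    data maps (block, genotype, stage) -> one observation.
    """
    blocks = sorted({k[0] for k in data})
    genos = sorted({k[1] for k in data})
    stages = sorted({k[2] for k in data})
    if set(data) != set(product(blocks, genos, stages)):
        raise ValueError("design must be complete and balanced")
    r, a, b = len(blocks), len(genos), len(stages)
    grand = sum(data.values()) / (r * a * b)

    def mean(keys):
        keys = list(keys)
        return sum(data[k] for k in keys) / len(keys)

    block_means = [mean(k for k in data if k[0] == x) for x in blocks]
    geno_means = [mean(k for k in data if k[1] == x) for x in genos]
    stage_means = [mean(k for k in data if k[2] == x) for x in stages]
    cell_means = [mean(k for k in data if k[1:] == (g, s))
                  for g in genos for s in stages]

    def ss_of(means, weight):
        return weight * sum((m - grand) ** 2 for m in means)

    ss = {
        "block": ss_of(block_means, a * b),
        "genotype": ss_of(geno_means, r * b),
        "stage": ss_of(stage_means, r * a),
    }
    ss_cells = ss_of(cell_means, r)
    ss_total = sum((y - grand) ** 2 for y in data.values())
    ss["interaction"] = ss_cells - ss["genotype"] - ss["stage"]
    ss["error"] = ss_total - ss["block"] - ss_cells
    df = {"block": r - 1, "genotype": a - 1, "stage": b - 1,
          "interaction": (a - 1) * (b - 1), "error": (r - 1) * (a * b - 1)}
    ms_error = ss["error"] / df["error"]
    f = {t: (ss[t] / df[t]) / ms_error
         for t in ("genotype", "stage", "interaction")}
    return ss, df, f


# Invented values: two blocks (growth chambers), two alleles, two stages.
example = {
    (1, "early", "DR"): 10, (1, "early", "TS"): 14,
    (1, "late", "DR"): 12, (1, "late", "TS"): 16,
    (2, "early", "DR"): 12, (2, "early", "TS"): 14,
    (2, "late", "DR"): 12, (2, "late", "TS"): 18,
}
ss, df, f = rcbd_factorial_anova(example)
```

When the interaction F ratio is small, as in the analysis reported here, the main effect of genotype can reasonably be interpreted across developmental stages.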

Results

A. tauschii genes and physical map

A chromosome walk was initiated in A. tauschii to take advantage of the available physical map (http://wheatdb.ucdavis.edu:8080/wheatdb/index.jsp). The A. tauschii BAC library was screened with probes derived from both proximal and distal markers flanking the Eps-A m 1 locus, and the new genes identified in each region are described below.

Eps-A m 1 proximal side

The screening of the A. tauschii BAC library with a probe derived from the proximal single-copy gene wg241 (54% identical to rice hypothetical protein EEC79795.1) yielded four positive BAC clones (RI1C16, HI92F1, BB19D17, and RI101M3). These clones are part of the A. tauschii contig 2863 (D genome assembly 1.1), which includes 133 BAC clones, and covers approximately 2.3 Mb. All four BAC clones selected with wg241 contained gene CA608558, but only BB019D17 and RI101M3 included also the VatpC gene. The BAC clone RI101M3 (approximately 125 kb) was sequenced, annotated, and deposited in GenBank (EU358770). In addition to the three genes previously detected in rice and mapped in T. monococcum (wg241, CA608558, and VatpC, Valárik et al. 2006), six new genes were identified in this BAC clone (Fig. 1d).

Fig. 1

a Schematic representation of the rice genomic sequence. b Brachypodium distachyon physical map, BAC clone DH027O19 covering the complete Eps-A m 1 region. c Triticum monococcum high-density genetic map. d Aegilops tauschii physical map. e T. monococcum physical map. Genes are indicated as colored circles, other markers as lines, sequenced BAC clones as continuous red bars, and not sequenced ones as dashed lines. The telomere is located to the right of the genetic and physical maps

Starting from the proximal end of the BAC clone RI101M3, the first two genes are the previously mapped wg241 and CA608558 (Valárik et al. 2006), followed by CD892187_1 and CD892187_2, which are not present in the rice colinear region. The predicted proteins encoded by these two genes (ACT34065.1 and ACT34066.1, respectively) are 61% identical (70% similar) to each other. The temporary name of this locus (CD892187) is based on the 98% identity of the predicted coding region of gene CD892187_1 to T. aestivum EST CD892187 (85% identity with gene CD892187_2). The two predicted proteins are 41% (ACT34065.1) and 44% (ACT34066.1) identical to rice hypothetical protein EEE62901. The function of this protein is currently unknown.

The duplicated CD892187 genes are followed by a gene designated Poz, which encodes a protein 58% identical to rice protein BAD16239.1 (located in a non-colinear rice chromosome region, Fig. 1a). This protein is a member of the speckle-type POZ protein family and contains an N-terminal MATH domain (cd00121) and a C-terminal POZ domain (pfam00651). POZ domains from several zinc finger proteins have been shown to mediate transcriptional repression and to interact with components of histone deacetylase co-repressor complexes.

Distal to the Poz gene, an additional non-colinear gene was detected based on its 86% identity to barley EST BG301142.1. This gene, designated Ap2-like (Ap2-L), encodes a protein that includes a single AP2 DNA-binding domain (cd00018), a domain found in many transcriptional regulators in plants (e.g., APETALA2 and EREBP). APETALA2-like proteins that play a role in plant development usually contain two copies of this domain instead of the single one observed in this protein.

The most distal genes present in the BAC clone RI101M3 (distal to the VatpC gene) are also absent in the rice colinear region. These genes, designated Cf2.1 and Cf2.2, encode proteins that are 55% and 54% identical to sorghum Cf2/Cf5-like disease resistance protein ACE86403.1 from the LRR_RI (leucine-rich repeat, ribonuclease inhibitor)-like subfamily (cl02423), respectively.

Eps-A m 1 distal side

The hybridization of the A. tauschii BAC library with a probe developed from the single-copy gene Adk1 (91% identical to rice putative protein kinase ADK1, AAT44307.1) yielded BAC clones RI3H10, HD66L22, and HI6P23. The clone RI3H10 is part of the A. tauschii contig 9849 (D genome assembly 1.1), which includes nine BAC clones and covers a region of approximately 245 kb. The other two BAC clones were assigned to the same contig by HindIII-fingerprinting. Among these three BAC clones, HI6P23 (approximately 145 kb) was the one that extended further into the proximal region, and it was selected for sequencing.

The sequence of the BAC clone HI6P23, which was annotated and deposited in GenBank (EU358773), revealed two genes that were not colinear with rice in addition to the colinear Adk1 gene (Fig. 1a, d). The first one (from the proximal to the distal end of the BAC clone) was designated Mot1 based on a phylogenetic analysis of the conserved domains of the predicted protein (Fig. S1a, supplemental online material). Mot1 encodes a protein 84% identical to rice SNF2 domain-containing protein BAD25208.1, and 36% identical to Saccharomyces cerevisiae protein MOT1. This last protein is a TATA-binding protein-associated factor involved in transcriptional regulation. The second gene, designated FtsH4 according to a phylogenetic tree constructed from the conserved domains of the predicted protein (Fig. S1b), encodes a protein that is 80% and 81% identical to rice proteins NP_001043384.1 (FtsH4) and NP_001043385.1 (FtsH5). These two proteins are encoded by genes recently duplicated in rice, which are both homologous to Arabidopsis gene FtsH4 (Fig. S1b).

In an attempt to extend the A. tauschii physical map from the distal side, the probe P23_4 was designed from a non-repetitive sequence located 29 kb from the proximal end of the BAC clone HI6P23 (Table 1, Fig. 1d). The re-screening of the BAC library with this probe failed to identify any new A. tauschii BAC clone, which interrupted the chromosome walk from this direction in this library.

B. distachyon genes and physical map

As an alternative strategy to bridge the gap identified in the A. tauschii BAC library, a second chromosome walk was initiated in B. distachyon. The B. distachyon BAC library was first hybridized with a probe developed from the single-copy gene Mot1. A total of 17 positive BAC clones was detected, and the ends of six of them (DH022N11, DH025B20, DH026A02, DH027O19, DH035A19, and DH042C06) were sequenced. The T7-end of the clone DH027O19 was 89% identical to gene cdo393, and the SP6-end was 87% identical to gene FtsH4, at the DNA level, which suggested that this BAC clone was spanning the complete Eps1 orthologous region (Fig. 1b).

The BAC clone DH027O19 (approximately 118 kb) was sequenced and deposited in GenBank (EU358772). The annotation of this clone revealed ten genes and one pseudogene (NADH). The predicted protein from the pseudogene NADH shows 98% identity with B. distachyon NADH dehydrogenase 27 kDa subunit (ACF08644.1), but it includes a premature stop codon and a truncation of the last approximately 23 amino acids. The ten genes present in this BAC clone include nine previously found in either rice or A. tauschii (cdo393, Pp2C, Unp30, VatpC, CD892187, CA608558, wg241, Mot1, and FtsH4), and a non-colinear gene designated Rbp1 (Fig. 1b). The protein sequence encoded by gene Rbp1 shows 74% identity with a rice putative RNA-binding protein (BAC79567) from the Mak16 multi-copy gene superfamily (pfam04874).

T. monococcum high-density genetic and physical maps

The new genes identified in the A. tauschii and B. distachyon sequences were used to generate molecular markers (Table 1), which were incorporated into the high-density genetic map of the Eps-A m 1 region in T. monococcum (Lewis et al. 2008; Valárik et al. 2006).

On the distal side, genes Mot1 and FtsH4 were mapped 0.03 cM proximal to Adk1 and completely linked to each other and to the Eps-A m 1 locus (Table 1, Fig. 1c). This result indicates that the A. tauschii BAC clone HI6P23 covers the distal side of the physical map of the Eps-A m 1 region. Since Mot1 and FtsH4 are completely linked to the earliness per se phenotype, they are both valid candidate genes for Eps-A m 1.

On the proximal side, gene Cf2.1 was mapped 0.05 cM distal to VatpC, but still 0.75 cM proximal to the Eps-A m 1 locus (Table 1, Fig. 1c). This distance was too long to continue the chromosome walk from Cf2.1, and therefore, additional efforts to develop a marker closer to Eps-A m 1 were focused on the distal contig, specifically on the A. tauschii BAC clone HI6P23. Marker P23_4 (Table 1, Fig. 1d), developed from the proximal end of the clone HI6P23, was incorporated into the high-density map, but unfortunately, it was still completely linked to genes Mot1 and FtsH4 and to the Eps-A m 1 phenotype (Fig. 1c).

Since it was not possible to continue the chromosome walk from the A. tauschii distal BAC clone, an additional marker was developed from B. distachyon gene Rbp1 (Table 1). This gene was located in the B. distachyon BAC clone DH027O19 between Mot1 and the genes mapped on the other side of the proximal gap of the Eps-A m 1 region (Fig. 1b) and, therefore, was an interesting candidate to generate a marker within the proximal 0.75-cM gap. Unfortunately, marker Rbp1 was mapped in a non-colinear region on T. monococcum chromosome 7Am, 5 cM from the RFLP marker wg420 (Dubcovsky et al. 1996).

The use of both A. tauschii and B. distachyon was a useful strategy to develop markers completely linked to the Eps-A m 1 locus and complete the distal side of the T. monococcum physical map, but it was insufficient to identify a marker tightly linked to the Eps-A m 1 locus on the proximal side (Fig. 1). Therefore, a chromosome walk was initiated in T. monococcum from the closest markers to Eps-A m 1 on the distal side. The screening of the BAC library with probes Mot1 and P23_4 yielded positive BAC clones 707L18, 273G2, 653H20, 67B4, and 43J16. Using BAC-end sequencing, HindIII-fingerprinting, hybridization (for gene Mot1), and PCR markers (for both genes FtsH4 and Adk1), these five BAC clones were organized into a single contig spanning approximately 400 kb (Fig. 1e).

An additional marker was then developed from the proximal SP6-end of the BAC clone 43J16. One of the primers for this marker included the junction between retrotransposons Angela (LTR, Gypsy) and Caspar (TIR, CACTA; Table 1, Fig. 1e), and was designated A-C. This marker failed to amplify a PCR product from the other four BAC clones from the T. monococcum contig, confirming its proximal position. Unfortunately, this marker was also completely linked to the Eps-A m 1 locus in the T. monococcum high-density genetic map (Fig. 1c), indicating that additional chromosome walk steps are still necessary to complete the proximal side of the Eps-A m 1 physical map in T. monococcum.

Comparison of the wheat, rice, and Brachypodium Eps-A m 1 colinear regions

The regions between genes cdo393 and Adk1 in rice (approximately 58 kb) and cdo393 and FtsH4 in B. distachyon (approximately 117 kb) were found to be much smaller than the orthologous regions in either T. monococcum or A. tauschii (Fig. 1). Although the physical length of this region is unknown in the Triticeae species, it can be roughly estimated. The sequenced A. tauschii BAC clones RI101M3 and HI6P23 (approximately 265 kb) plus the estimated length of the non-overlapping region of the T. monococcum BAC clone 43J16 (based on the HindIII-fingerprint) covered a region of approximately 350 kb, which corresponds to a genetic distance of 0.14 cM (Fig. 1c–e). Assuming a constant ratio between genetic and physical lengths across this region, the 2.2-cM region between genes cdo393 and Adk1 in T. monococcum can be estimated to be approximately 5,600 kb.
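The extrapolation above amounts to a simple proportion, which can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the inputs are the rounded values quoted in the text. With those rounded inputs the result is about 5,500 kb, in line with the approximately 5,600 kb reported; the small difference presumably reflects rounding of the published figures.

```python
def kb_per_cm(physical_kb, genetic_cm):
    """Ratio of physical to genetic distance in a reference interval."""
    if genetic_cm <= 0:
        raise ValueError("genetic distance must be positive")
    return physical_kb / genetic_cm


def estimate_physical_kb(target_cm, reference_kb, reference_cm):
    """Extrapolate a physical length, assuming the kb/cM ratio is constant
    across the whole target region (the assumption stated in the text)."""
    return target_cm * kb_per_cm(reference_kb, reference_cm)


# Rounded values from the text: about 350 kb spanning 0.14 cM.
ratio = kb_per_cm(350, 0.14)                    # about 2,500 kb per cM
estimate = estimate_physical_kb(2.2, 350, 0.14)  # about 5,500 kb
```

Because recombination rates vary along wheat chromosomes, an estimate of this kind is best read as an order-of-magnitude figure rather than a measurement.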

The difference in the size of the Eps1 orthologous regions from wheat, rice, and Brachypodium was determined mainly by the expansion of some intergenic regions in the large Triticeae genomes due to the insertion of repetitive elements. In spite of this large size difference, the Eps1 region revealed a relatively good level of conservation of gene content and order across species, altered by an inversion of the region flanked by genes wg241 and VatpC in the wheat lineage, and by several gene insertions/deletions (indels) and duplications (Figs. 1 and 2).

Fig. 2

Comparison of the sequences of Aegilops tauschii, rice, and Brachypodium distachyon Eps-A m 1 colinear regions using the on-line Artemis Comparison Tool. Genes are indicated as colored arrows. a, d A. tauschii BAC clones RI101M3 and HI6P23. b Rice BAC clone AC130728. c B. distachyon BAC clone DH027O19

Both A. tauschii and B. distachyon shared three genes that were absent in the orthologous region on rice chromosome 5 (CD892187, Mot1, and FtsH4, Figs. 1 and 2). The most similar rice gene to T. monococcum gene FtsH4 was found to be present on chromosome 1 between the paralogous copies of genes Pp2C and Adk1, a region previously identified as the result of the ancestral polyploidization event that occurred before the divergence of the Poaceae genomes (Valárik et al. 2006). This suggested that the orthologous copy of T. monococcum gene FtsH4 on chromosome 5 was deleted in the rice lineage, but that the copies on wheat chromosome 1 and in Brachypodium were not. The other two genes (CD892187 and Mot1) were absent in the rice orthologous or paralogous regions and may represent insertions in the wheat/Brachypodium lineage.

B. distachyon contained one unique gene (Rbp1) and a unique pseudogene (NADH), as well as a missing gene (If2), which was present in the other two species (Figs. 1 and 2). The Triticeae genomes showed four genes absent in both rice and B. distachyon (Poz, Ap2-L, Cf2.1, and Cf2.2), and a duplication of a gene present also in B. distachyon (CD892187). In contrast, all the rice genes were present in either B. distachyon or the Triticeae genomes (Figs. 1 and 2). Genes Pp2C and Unp30 were identified in both rice and B. distachyon, but their presence in A. tauschii is currently unknown since they might be outside of the sequenced region (Figs. 1 and 2). Gene Unp30, however, was mapped in the T. monococcum high-density genetic map, 0.03 cM between genes VatpC and Cf2.1 (Fig. 1c).

Sequence divergence among wheat, rice, and Brachypodium orthologous genes

Sequence identity across the wheat, rice, and Brachypodium genomes was restricted to gene regions (Fig. 2). Analyses of three orthologous genes present in the three species (wg241, CA608558, and VatpC) revealed an identical number of exons and conserved exon lengths across genomes. Intron size, however, was more variable across species. Thirty-four out of the 35 exon–intron boundaries found for the three orthologous genes in the three species contained the canonical GT and AG motifs, whereas the 15th intron of gene wg241 contained the alternative GC splicing site at the 5′ end in both wheat and rice, but not in Brachypodium.
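The check of intron boundaries described above can be illustrated with a short sketch. It is not the annotation pipeline used in the study: the function name and the toy sequences are hypothetical, and it simply inspects the first and last two nucleotides of an intron given 5′ to 3′ on the coding strand.

```python
def classify_intron(intron_seq):
    """Classify an intron by its terminal dinucleotides.

    Returns "GT-AG" for the canonical boundaries, "GC-AG" for the
    alternative 5' donor site, and "non-canonical" for anything else.
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to hold both splice sites")
    donor, acceptor = seq[:2], seq[-2:]
    if acceptor != "AG":
        return "non-canonical"
    if donor == "GT":
        return "GT-AG"
    if donor == "GC":
        return "GC-AG"
    return "non-canonical"


# Toy sequences, for illustration only.
labels = [classify_intron(s) for s in ("GTAAGTTTTCAG", "gcaagtttttag", "ATAAAC")]
```

Applied to every annotated intron of an orthologous gene set, a tally of these labels gives the kind of count reported in the text.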

Analyses of predicted proteins for these three orthologous genes revealed higher amino acid identity between wheat and Brachypodium than between these two species and rice (Fig. 3a–c). The combined exon regions for the three genes were more similar between wheat and Brachypodium (83% identity) than between wheat and rice (75% identity) or Brachypodium and rice (80% identity) along the 6.5 to 7.0-kb high-scoring segment pairs (HSP) aligned by BLASTN (Fig. 3d). These differences were even larger for the combined intron regions. Wheat and Brachypodium introns showed a significant BLASTN alignment along a 1,909-bp HSP (68% identical), which was much smaller in the comparisons between wheat and rice (280-bp HSP, 67% identical) or Brachypodium and rice (140-bp HSP, 74% identical; Fig. 3e).
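The identity percentages quoted for these comparisons are of the kind BLAST reports for a high-scoring segment pair: identical positions divided by the alignment length, with gap columns counted in the length. A minimal sketch of that calculation follows; the function name and the toy alignment are hypothetical, and the two input strings are assumed to be already aligned and of equal length.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over an existing pairwise alignment.

    Identical residues are divided by the number of alignment columns;
    columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy alignment: four identical columns out of six.
identity = percent_identity("ATGC-A", "ATGGTA")
```

Because the denominator is the aligned segment only, identity values should be read together with the HSP length, as in Fig. 3d and e: a short, well-conserved intron fragment can show a higher percentage than a much longer alignment.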

Fig. 3

Comparisons among the wheat, rice, and Brachypodium orthologous proteins and genes at the Eps-A m 1 region. a–c Protein identity (%). d Combined exon identity (%) and length of the high-scoring segment pair (bp). e Combined intron identity (%) and length of the high-scoring segment pair (bp)

Characterization of Eps-A m 1 candidate genes

Both candidate genes Mot1 and FtsH4 were sequenced from the two parental lines of the T. monococcum mapping population to test if the DV92 and G3116 Eps-A m 1 alleles were associated with changes in the coding regions of these genes.

Alignment of the predicted MOT1 proteins showed two amino acid polymorphisms between the two parental lines: isoleucine (I) vs threonine (T) at position 362 (BLOSUM62 score −2), and aspartic acid (D) vs glutamic acid (E) at position 460 (BLOSUM62 score 2). Alignment of wheat, rice, and Arabidopsis MOT1 proteins indicated that the amino acid changes occurred at non-conserved positions in the protein (Fig. S2).

Alignment of the predicted FtsH4 proteins from DV92 and G3116 showed no amino acid polymorphism.

To test if the phenotypic differences between the DV92 and G3116 Eps-A m 1 alleles were associated with changes in the expression of the candidate genes, transcript levels of both candidates Mot1 and FtsH4 were compared between NILs carrying either DV92 or G3116 alleles by semi-quantitative (Fig. 4a) and real-time quantitative PCR (Fig. 4b). Transcripts of both genes Mot1 and FtsH4 were observed in apices, crowns, leaves, and spikes, but were not detected in roots (Fig. 4a). A comparison of the Mot1 and FtsH4 transcript levels in shoot apical regions by semi-quantitative PCR showed no apparent differences between NILs carrying either the DV92 or the G3116 Eps-A m 1 alleles (Fig. 4a). Similar results were obtained at the initiation of the transition between the vegetative and reproductive stages (double-ridge stage) and at the terminal spikelet stage (Fig. 4a). These results were confirmed by real-time quantitative PCR (Fig. 4b). The 2 × 2 ANOVAs showed non-significant interactions between genotypes (Eps-A m 1 early and late alleles) and developmental stages (double-ridge and terminal spikelet; P = 0.86 for Mot1 and P = 0.94 for FtsH4) and, therefore, the effects of the Eps-A m 1 alleles were analyzed across developmental stages. Non-significant differences in the Mot1 and FtsH4 transcript levels (P = 0.64 and P = 0.74, respectively) were found between NILs carrying the DV92 or G3116 Eps-A m 1 alleles.

Fig. 4

a Semi-quantitative polymerase chain reaction (PCR) using primers designed from genes (1, 2) Mot1, (3, 4) FtsH4, and (5, 6) eIf4a. cDNA samples obtained from (1, 3, 5) different Triticum monococcum plant organs; and (2, 4, 6) shoot apical meristem of BC5F2 near isogenic line (NIL) 268-5 carrying the Eps-A m 1 allele for early heading (G3116, bold), and NILs 268-3, 529-1, and 529-3 carrying the Eps-A m 1 allele for late heading (DV92). The first four samples were extracted from apices at the double-ridge stage, and the last four at the terminal spikelet stage. b Real-time quantitative PCR of (1) Mot1 and (2) FtsH4, comparing transcript levels between NILs carrying the Eps-A m 1 allele for early (G3116) and late (DV92) heading. Values in the y-axes are normalized and calibrated values using the formula \( (1 + \text{Efficiency})^{-\Delta\Delta C_{\text{T}}} \). Different calibrators were used for Mot1 and FtsH4, and therefore the y-axis scales for the two genes are not comparable. Error bars represent one standard error of the mean

Discussion

B. distachyon as a closer model species than rice for the large Triticeae genomes

Since the completion of the rice genomic sequence (Goff et al. 2002; Yu et al. 2002) several studies demonstrated a high level of colinearity between rice and the Triticeae genomes, which turned rice into the most commonly used model species for positional cloning projects in wheat (Bossolini et al. 2006, 2007; Dubcovsky et al. 2001; Faris et al. 2008; Yan et al. 2003, 2004, 2006). Even in those projects where the targeted gene was not present in the rice colinear region, the rice genome was still useful as a stepping stone to develop molecular markers in the wheat region (Fu et al. 2009; Uauy et al. 2006; Yan et al. 2004). However, the overall colinear gene framework between rice and the Triticeae genomes has been shown to be frequently interrupted by inversions, deletions, and insertions of genes, which sometimes complicates the use of rice as a model for the larger and more complex Triticeae genomes (Griffiths et al. 2006).

In 2001, Draper et al. questioned the value of rice as a model species for the temperate cereals and forage grasses given its relatively distant phylogenetic relationship to them, its subtropical nature, and its specialized growth habit. They proposed B. distachyon as a better model than rice for both wheat and barley. Brachypodium is a temperate grass, and its small genome contains only approximately 15% of highly repetitive elements (Catalán et al. 1995).

In a phylogenetic tree of the Poaceae family based on combined data from chloroplast restriction sites and morphology, Kellogg (2001) showed that Brachypodium is more closely related to oat, Bromus, and wheat (subfamily Pooideae) than to rice (subfamily Ehrhartoideae). This observation was also supported by sequence comparisons of ESTs (Vogel et al. 2006) and genes from both the leaf rust resistance locus Lr34 (Bossolini et al. 2007) and the domestication locus Q (Faris et al. 2008) regions. Bossolini et al. (2007) estimated that the divergence of Brachypodium from the Triticeae lineage occurred approximately 35 to 40 million years ago (Mya), significantly later than the divergence of these two groups of species from the rice lineage, which took place approximately 50 Mya (Paterson et al. 2004).

The present comparison of exon and intron sequences of the orthologous genes wg241, CA608558, and VatpC from the wheat, rice, and Brachypodium Eps1 regions supported the previous conclusions. In all three cases, the predicted proteins were more similar between wheat and Brachypodium than between these two species and rice (Fig. 3). In addition, the comparison of the intron regions showed longer significantly similar segments between wheat and Brachypodium than between these two species and rice. This better conservation of the intron regions is important because it improves the ability of probes developed from Brachypodium genes to hybridize to wheat genomic DNAs or BAC clones.

In spite of the closer phylogenetic relationship between Brachypodium and the Triticeae genomes relative to rice, the larger Triticeae genomes seem to have a higher proportion of colinearity exceptions when compared with the other two species. One inversion, one gene duplication, two gene insertions, and one gene insertion and duplication that were unique to the Triticeae species were found in this study (in addition to the non-colinear genes that might be present in the 0.75-cM proximal gap of the T. monococcum physical map), whereas only one gene insertion (Rbp1) was unique to B. distachyon, and none of the rice genes was simultaneously absent in both Triticeae species and Brachypodium. Similarly, the comparative study of the Q locus region among wheat, rice, and Brachypodium showed multiple duplications of the T. monococcum gene FB (from only one copy present in Brachypodium sylvaticum and none in rice to four copies present in T. monococcum), and a small inversion including one of the FB copies (Faris et al. 2008). The comparison of the Lr34 locus orthologous regions among the same species revealed a large inversion in rice. However, excluding this main structural change, the gene content was more similar between rice and Brachypodium than between these two species and wheat. Thirty-nine out of the 43 genes present in Brachypodium were present in rice (91%), whereas only ten out of the 19 genes found in wheat (53%) showed orthologues in either rice or Brachypodium. Interestingly, four of the wheat non-colinear genes showed evidence of movements associated with linked retroelements (Bossolini et al. 2007).

These last results suggest that the higher numbers of exceptions to gene colinearity observed in the Triticeae genomes might be associated with their larger genome size and their higher proportion of repetitive elements, many of which are actively transcribed (Echenique et al. 2002). Dubcovsky et al. (1996) pointed out that the RFLP maps of species with large genomes exhibited a higher proportion of duplicated loci (28% in T. monococcum, 30% in barley, Kleinhofs et al. 1993) than the RFLP maps of species with smaller genomes, such as rice (6%, Saito et al. 1991) or common bean (9%, Nodari et al. 1993). Dubcovsky and Dvorak (2007) indicated that the high rate of turnover of repetitive elements in the intergenic regions of wheat was associated with faster changes in the gene regions. A recent study comparing a high-density SNP map of A. tauschii with the rice and sorghum genomes confirmed that 80% of the structural changes observed among these species occurred in the large genome of A. tauschii, whereas only 20% occurred in the smaller rice and sorghum genomes (Luo et al. 2009).

The higher rate of change associated with the large Triticeae genomes explains why the Brachypodium genome offers only limited advantages over rice as a model for the complex Triticeae genomes. Even though some changes are shared by Brachypodium and wheat relative to rice (e.g., the presence of genes CD892187, Mot1, and FtsH4 in this comparative study), most of the changes are unique to the Triticeae species. A corollary of this higher rate of structural changes in the large genomes (such as barley or A. tauschii) is that the sequencing of some of these large genomes is still necessary to better support positional cloning projects in the Triticeae species.

Eps-Am1 candidate genes

The use of both A. tauschii and B. distachyon as bridge species for the positional cloning of the Eps-Am1 gene in T. monococcum yielded two valuable results. First, the markers identified proximal to Adk1 and completely linked to the Eps-Am1 locus were sufficient to reveal T. monococcum BAC clones covering this region and complete the distal side of the Eps-Am1 physical map. Second, and even more importantly, the mapping of genes Mot1 and FtsH4 completely linked to both the earliness per se and spikelet number phenotypes indicated that these two genes are valid candidates for Eps-Am1.

Mot1

The Mot1 gene has SNF2_N (pfam00176) and HELICc (cd00079) domains characteristic of the SNF2 family of transcriptional regulators. A phylogenetic analysis of these conserved domains from different protein members of this family in Arabidopsis, rice, and S. cerevisiae indicated that wheat and Brachypodium MOT1 proteins are more closely related to rice protein OsXP464189 and S. cerevisiae protein MOT1 than to other members of this family. This cluster was supported by 100% bootstrap values (Fig. S1a).

In yeast, MOT1 has been found to both activate and repress transcription by ATPase-dependent mechanisms (Dasgupta et al. 2002; Muldrow et al. 1999; Sprouse et al. 2006; Sudarsanam et al. 2000). Another member of the SNF2 family, the PHOTOPERIOD-INDEPENDENT EARLY FLOWERING gene 1 (PIE1; Fig. S1a), was shown to be required for the activation of both FLOWERING LOCUS C (FLC) and FRIGIDA (FRI) from the vernalization pathway, and also by the autonomous pathway, both of which regulate flowering time in Arabidopsis (Noh and Amasino 2003). PIE1 was found to be expressed preferentially in the shoot apical meristem and to increase FLC expression, thereby repressing the transition of the shoot apex from the vegetative to the reproductive stage and delaying flowering time (Noh and Amasino 2003). Gene BRM, another Arabidopsis member of the same SNF2 family (Fig. S1a), was also shown to be expressed in meristems and to be required for normal vegetative and reproductive development (Farrona et al. 2004). BRM-silenced plants exhibited a smaller overall size with reduced stems and leaves, defects in floral organ size, number, and identity, and significantly earlier flowering (Farrona et al. 2004). It was demonstrated that BRM controlled the transition of the shoot apical meristem from the vegetative to the reproductive stage by affecting CONSTANS (CO), a transcription factor from the photoperiod pathway that promotes flowering (Farrona et al. 2004).

The involvement of both genes PIE1 and BRM in the regulation of flowering time and in the normal vegetative and reproductive development in Arabidopsis suggests that gene Mot1 is a potentially interesting candidate for Eps-Am1. Gene Mot1 was found to be expressed both in the vegetative shoot apical meristem and in the developing spike in T. monococcum, which are the tissues and developmental stages where Eps-Am1 is expected to act. However, no significant differences were found by real-time quantitative PCR in the transcript levels of this gene between BC5F2 NILs carrying either the Eps-Am1 allele for late heading from DV92 or the Eps-Am1 allele for early heading from G3116 (Fig. 4).

Sequence comparison of the predicted MOT1 proteins from the DV92 and G3116 alleles revealed only two amino acid polymorphisms, both of them upstream of the SNF2_N and HELICc conserved domains at non-conserved amino acid positions. Although the substitution of T in G3116 at position 362 (also conserved in A. tauschii) by I in DV92 has a negative BLOSUM62 score (−1), indicative of different biochemical properties of the substituted amino acid, the occurrence of this change at a non-conserved position complicates the prediction of the effect of this amino acid change on the function of the protein.

FtsH4

A phylogenetic analysis of the peptidase_M41 (pfam01434) and the AAA (cd00009) domains from several members of the FtsH family of proteases, including all the Arabidopsis and rice homologues, revealed that wheat and Brachypodium FtsH4 proteins are more closely related to Arabidopsis protein FtsH4 and rice proteins FtsH4 and FtsH5 than to the other members of this family. This clade was supported by 100% bootstrap values (Fig. S1b). Yu et al. (2005) indicated that both rice genes FtsH4 and FtsH5 are arranged in tandem on rice chromosome 1, likely being the result of a duplication of a gene homologous to Arabidopsis gene FtsH4.

T. monococcum gene FtsH4, the other Eps-Am1 candidate gene, encodes an ATP-dependent metalloprotease from the M41 family of peptidases, which belongs to a larger family of proteins called AAA (ATPases associated with diverse cellular activities; van der Hoorn 2008). The Arabidopsis nuclear genome was found to contain 12 FtsH genes, the products of three of which are targeted to the mitochondria and the remaining nine to the chloroplast (Sakamoto et al. 2003). In rice, at least nine FtsH genes have been found, and each of them corresponds to an Arabidopsis FtsH gene or gene pair, except for Arabidopsis gene FtsH12. Rice genes FtsH4 and FtsH5 correspond to Arabidopsis FtsH4 and are targeted to the mitochondria (Yu et al. 2005; Fig. S1b).

Arabidopsis protein FtsH4 (Fig. S1b) has been shown to be part of the i-AAA protease complex that functions in the mitochondrial intermembrane space (Urantowka et al. 2005). Mutants for this gene showed morphological, anatomical, and developmental alterations when grown under a short-day photoperiod. The FtsH4 gene was found to be highly expressed in seeds, and Arabidopsis ftsh4 mutants showed delayed germination. A developmental delay was maintained throughout the developmental cycle and resulted in plants with reduced leaf number, later flowering time, and lower fertility (Gibala et al. 2009).

Based on the role of FtsH4 in Arabidopsis, this gene cannot be ruled out as a candidate for Eps-Am1. However, comparison of the predicted FtsH4 protein sequences from the DV92 and G3116 alleles revealed no amino acid substitutions. In addition, semi-quantitative PCR analyses revealed that gene FtsH4 is expressed in both the vegetative apical meristem and the developing spike in T. monococcum, with no major differences between BC5F2 NILs carrying the DV92 or G3116 Eps-Am1 alleles (Fig. 4a). These last results were also confirmed by real-time quantitative PCR (Fig. 4b).

Although it is currently not possible to rule out either of the two candidate genes, the negative BLOSUM62 score of one of the amino acid changes in the MOT1 protein and the known role of related SNF2-domain-containing proteins in flowering time and development suggest that Mot1 might be a better candidate for Eps-Am1 than FtsH4.

Future work and conclusions

To determine whether genes Mot1 or FtsH4 have the expected effects of Eps-Am1 on heading time, spike development, and spikelet number, mutants for the wheat orthologues of these two genes are being generated using an available tetraploid Targeting Induced Local Lesions IN Genomes population (TILLING; Uauy et al. 2009). Mutants for both the A and B genome copies of each gene will then be combined to produce double mutants and test their effects on heading time and spikelet number.

Currently, the existence of additional candidate genes completely linked to Eps-Am1 in the proximal 0.75-cM gap of the T. monococcum physical map cannot be ruled out. An approximate calculation of the number of additional candidate genes that might be expected in this region can be obtained by assuming a direct proportionality between number of genes and genetic distances. Since 11 genes were found to be present within the 0.14-cM region sequenced in A. tauschii, the additional 0.0125 cM required to find a marker proximal to Eps-Am1 (one recombination event from Mot1/FtsH4) is expected to yield only one additional linked gene. If a constant ratio between genetic and physical distances is also assumed, this 0.0125-cM region is expected to represent approximately 300 kb, which can probably be covered in a limited number of additional chromosome walk steps in T. monococcum. However, ratios between recombination and gene number and between genetic and physical lengths are known to vary along different chromosome regions, and therefore these predictions should be taken with caution.
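
The proportionality estimate above can be reproduced with a short sketch. It uses only the figures quoted in the text (11 genes in 0.14 cM, and an additional 0.0125 cM) and assumes a uniform gene density per unit of genetic distance, which, as noted, may not hold in this region. The function name is hypothetical.

```python
def expected_genes(genes_observed, interval_cm, target_cm):
    """Extrapolate a gene count to another interval, assuming that the
    number of genes is directly proportional to genetic distance."""
    genes_per_cm = genes_observed / interval_cm
    return genes_per_cm * target_cm


# 11 genes in the 0.14-cM region sequenced in A. tauschii, extrapolated
# to the additional 0.0125 cM (one recombination event).
additional_genes = expected_genes(11, 0.14, 0.0125)
rounded_estimate = round(additional_genes)
```

Under this assumption the estimate is slightly below one gene, consistent with the expectation of a single additional linked gene.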

The effect of the Eps-Am1 locus on heading time is known to be modulated by temperature (Bullrich et al. 2002). Therefore, the cloning of this gene has the potential to increase the understanding of complex interactions between temperature and plant development. In addition to this basic knowledge, the identification of the Eps-Am1 gene has highly relevant practical implications, since this gene affects both the duration of spike development and the number of spikelets per spike, which is an important component of potential grain yield in wheat. It is interesting to point out that the A genome of polyploid wheat was contributed by T. urartu, a species related to but distinct from T. monococcum (Johnson and Dhaliwal 1976). Therefore, 10,000 years of independent domestication may have resulted in the origin of new and valuable alleles in the Am genome of T. monococcum that are not present in the A genome of T. urartu or polyploid wheat species.