Bs1, a new chimeric gene formed by retrotransposon-mediated exon shuffling in maize.

Transposons are major components of all eukaryotic genomes. Although traditionally regarded as causes of detrimental mutations, recent evidence suggests that transposons may play a role in host gene diversification and evolution. For example, host gene transduction by retroelements has been suggested to be both common and to have the potential to create new chimeric genes by the shuffling of existing sequences. We have previously shown that the maize (Zea mays subsp. mays) retrotransposon Bs1 has transduced sequences from three different host genes. Here, we provide evidence that these transduction events led to the generation of a chimeric new gene that is both transcribed and translated. Expression of Bs1 is tightly controlled and occurs during a narrow developmental window in early ear development. Although all Bs1-associated transduction events took place before Zea speciation, a full uninterrupted open reading frame encoding the BS1 protein may have arisen in domesticated maize or in the diverse populations of its progenitor Z. mays subsp. parviglumis. We discuss potential functions based on domain conservation and evidence for functional constraints between the transduced sequences and their host gene counterparts.

Transposons are ubiquitous and fundamental components of prokaryotic and eukaryotic genomes. For example, they make up to 85%, 45%, 40%, and 12% of the maize (Zea mays), human, rice (Oryza sativa), and Arabidopsis (Arabidopsis thaliana) genomes, respectively (Arabidopsis Genome Initiative, 2000; International Human Genome Sequencing Consortium, 2001; Goff et al., 2002; Yu et al., 2002; Schnable et al., 2009). Some recent studies suggest that over evolutionary time scales, transposons have contributed to the evolution of genes and genomes by providing means for gene and genome diversification (Kazazian, 2004). For example, an analysis of the human genome sequence reveals more than 1,000 predicted and known proteins to contain sequences derived from transposons, especially long interspersed nuclear elements of the L1 family and short interspersed nuclear elements such as Alu elements (Li et al., 2001). In addition, some mobility-related proteins may have evolved to contribute a cellular function. For example, Drosophila telomeres are composed of the telomere-specific retroposons HeT-A and TART (for review, see Pardue et al., 1996); the RAG1 and RAG2 proteins required for V(D)J recombination and the maturation of B- and T-cells are derived from an ancient DNA transposon (Hiom et al., 1998; for review, see Roth and Craig, 1998); and Syncytin, a protein involved in placental morphogenesis, is encoded by the envelope gene of the human endogenous retrovirus HERV-W (Mi et al., 2000). In plants, the transposase genes of Mutator-like elements (MULEs) may have provided ancestral sequences for two genes involved in red light signaling (FAR1 and FHY1; Hudson et al., 2003; Lin et al., 2007) as well as the MUG1 gene family (Cowan et al., 2005).
Other mechanisms, however, all based on transposon activity, have also been implicated. For example, "gene acquisition" by DNA transposons and "gene transduction" by retroelements have the potential to mediate gene diversification and the emergence of novel cellular functions (Goodier et al., 2000). Whereas gene acquisition by DNA transposons is likely to be mediated by recombination, gene transduction occurs by readthrough transcription of retroelements (Goodier et al., 2000; Bennetzen, 2005). Gene acquisition by MULEs in rice (Jiang et al., 2004; Juretic et al., 2005; Hanada et al., 2009) and Arabidopsis (Hoen et al., 2006) reveals that MULEs acquired gene fragments and duplicated or amplified them into gene families. A recent study suggests that 22% of MULE-generated duplications are transcribed in rice (Hanada et al., 2009), whereas the only case of an Arabidopsis duplicate that is transcribed (KI) behaves like the associated transposase gene, suggesting that the function of KI may be selfish rather than cellular (Hoen et al., 2006).
Like MULE-mediated gene acquisition in plants, cellular gene transduction by retroelements seems to be a common feature during L1 retrotransposition in humans (Goodier et al., 2000). However, only transduction by infectious retroviruses has been shown to generate hybrid open reading frames (ORFs) that could modulate cell function, although in this case expression of the hybrid ORF leads to neoplastic transformation (Cooper, 1995). Thus, the significance of cellular gene transduction for gene diversification in the absence of a disease phenotype remains to be determined. A recent study suggests that the rate of chimeric gene formation by retroposition is 50-fold higher among grass genomes than in primates and that retroposition has kept grass genomes in constant flux of new chimeric retrogenes (Wang et al., 2006). In addition to transposon activity, several other mechanisms contribute to the origination of new genes, including exon shuffling caused by illegitimate recombination and retroposition, gene duplication, retroposition of a gene transcript, lateral gene transfer, gene fusion, and de novo formation of new genes from previously noncoding sequences (for review, see Long et al., 2003). In tomato (Solanum lycopersicum), a recent mutation fused exons from the gene encoding the β-subunit of inorganic pyrophosphate-dependent phosphofructokinase to those of the homeobox gene LeT6, leading to elevated levels and an altered pattern of expression of the latter (Chen et al., 1997). This mutation (called mouse ear) arose spontaneously in an isogenic tomato cultivar and leads to excessively proliferated leaves, consistent with the altered expression pattern of LeT6 (Rick and Harrison, 1959; Chen et al., 1997) and suggesting that the emergence of new gene variants may lead to phenotypic differences. Another tomato gene, SUN, was duplicated together with 24.7 kb of flanking sequences through gene transduction by the retrotransposon Rider (Xiao et al., 2008). This duplication event brought SUN into a new genomic context that increased its expression, leading to an elongated fruit shape. The identification of young genes such as these provides tools to study the origin and evolution of new genes, since many details on the origin of a gene are lost over long periods of time (Long et al., 2003).
We and others have previously reported the first case, outside of oncogenic retroviruses, of a retrotransposon that has transduced host cellular sequences (Bureau et al., 1994; Jin and Bennetzen, 1994; Palmgren, 1994; Elrouby and Bureau, 2001). The maize long terminal repeat (LTR) retrotransposon Bs1 has transduced sequences from three different maize cellular genes, namely, proton-dependent membrane ATPase (c-pma), xylan endohydrolase (c-xe), and β-1,3-glucanase (c-bg; where c corresponds to the cellular genes, whereas their retroelement-associated counterparts are designated r-pma, r-xe, and r-bg; Elrouby and Bureau, 2001). The transduction events generated a hybrid ORF (ORF1, 740 amino acids) that contains sequences corresponding to the Bs1 gag domain fused to the transduced sequences. Here, we report that the Bs1-associated transduction events and subsequent mutations led to the emergence of a novel gene by the shuffling of existing sequences. We show that Bs1 is both transcribed and translated in reproductive tissues, and specifically in ears. The BS1 protein is not detected in extracts obtained from sterile ears, suggesting that Bs1 expression may be associated with normal reproductive development in maize. Characterization of Bs1 from several maize landraces and inbred lines as well as from the wild relatives of maize, the teosintes, reveals that different large and small deletions/insertions mediated the formation of one uninterrupted ORF (ORF1) following the initial transduction events. A sequence highly related to maize ORF1 is first seen in Z. mays subsp. parviglumis, which has been shown by independent lines of evidence to be the progenitor of domesticated maize (Z. mays subsp. mays; Doebley, 2004). Collectively, the Bs1-associated transduction events generated a novel chimeric gene whose function, if any at this point of its evolution, may be involved in reproductive development.

Bs1 ORF1 Is Expressed in Reproductive Organs
We have previously identified an EST that shares identity with the Bs1 3′ LTR and a part of the internal sequence (Elrouby and Bureau, 2001). This EST was isolated from mixed stages of anther and pollen, suggesting that Bs1 may be expressed in the germ line. To properly assess the Bs1 expression pattern, we performed reverse transcription (RT)-PCR with RNA extracted from different tissue types (Fig. 1) in two maize inbred lines, W22 and Oh43. As seen in Figure 1, Bs1 is specifically expressed in stage R1 or early stage R2 ears (when silk starts to be visible outside the husks; Hanway and Ritchie, 1984) and tassels of both inbred lines (Fig. 1A, lanes 2-5, 7, and 8) but not in any of the vegetative tissues tested (husk, root, 1-week-old seedling, 2-week-old seedling leaf, mature leaf; Fig. 1A, lanes 6 and 9-12).
Since the Bs1 chimeric sequence was generated by retroelement-mediated gene transduction, it is devoid of any introns (Elrouby and Bureau, 2001). It is thus important to confirm that the RT-PCR products obtained were truly amplified from cDNA rather than contaminating genomic DNA. First, when reverse transcriptase was omitted from the reaction, no amplification products were obtained at all (Fig. 1A, lanes 13-15). Second, we used the same cDNA to amplify transcripts of an intron-containing gene. For this purpose, we used the gene coding for Abp1 (for auxin-binding protein 1; Elrouby and Bureau, 2000). Abp1 primers anchored in exons 3 and 5 yield an RT-PCR product consistent with amplification from cDNA only (Fig. 1B, lanes 2-8). The same primer pair amplified a genomic fragment containing the intervening introns when genomic DNA was used instead as a template (Fig. 1B, lane 9). Third, the fact that no Bs1 amplification products were obtained in vegetative tissues (Fig. 1A, lanes 6 and 9-12) whereas the same tissues supported amplification from Abp1 transcripts (Fig. 1B, lanes 4-8) suggests that the Bs1 amplification products obtained in reproductive tissues were derived from reverse-transcribed Bs1 mRNA expressed differentially in these tissues.
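The logic of the intron-spanning Abp1 control can be illustrated with a minimal sketch. The segment lengths below are hypothetical placeholders, not the real Abp1 exon and intron sizes, and the function name is ours. The sketch only shows why a product amplified from spliced cDNA is shorter than one amplified from genomic DNA with the same primer pair.

```python
def amplicon_length(exon_segments_bp, intron_lengths_bp, template):
    """Expected PCR product length for primers anchored in the first and last exon segment.

    exon_segments_bp: lengths of the exonic stretches between the primers (hypothetical).
    intron_lengths_bp: lengths of the introns lying between those exons (hypothetical).
    template: "cdna" (introns spliced out) or "genomic" (introns retained).
    """
    if template == "cdna":
        return sum(exon_segments_bp)
    if template == "genomic":
        return sum(exon_segments_bp) + sum(intron_lengths_bp)
    raise ValueError("template must be 'cdna' or 'genomic'")


# Hypothetical layout: primers in exons 3 and 5, spanning introns 3 and 4.
exons = [60, 120, 80]    # amplified parts of exon 3, all of exon 4, part of exon 5
introns = [300, 450]     # intron 3 and intron 4

cdna_product = amplicon_length(exons, introns, "cdna")
genomic_product = amplicon_length(exons, introns, "genomic")
print(cdna_product, genomic_product)
```

An intronless sequence such as Bs1 gives the same product length from both templates, which is why the omission of reverse transcriptase and the Abp1 comparison are needed as separate controls.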

Bs1 ORF1 Is Translated
To determine whether the Bs1 transcript is translated, we raised polyclonal antibodies against ORF1.
In ORF1, both r-xe and r-pma but not r-bg maintained the reading frame of their cellular gene counterparts (Elrouby and Bureau, 2001). To avoid cross-reactivity against cellular proteins encoded by c-xe and c-pma, only the sequence encoding the N-terminal 301 amino acids of ORF1 was used to raise the antibodies (anti-BS301). This sequence spans the Bs1 gag and r-bg domains (Elrouby and Bureau, 2001). Another antibody was raised against a synthetic peptide spanning residues 196 to 215 (anti-BS196). Both antisera recognize the BS1 protein expressed in Escherichia coli (Fig. 2A, shown for anti-BS196 antiserum), and a polypeptide of approximately 100 kD in extracts from maize young ears but not from sterile ears (see below) or leaves (Fig. 2B). The size of this polypeptide is consistent with a translational product encoded by Bs1 ORF1. Additionally, the approximately 100-kD protein is not recognized by preimmune sera (Fig. 2C) and is competed out when the antiserum is preincubated with the BS1 synthetic peptide (Fig. 2D). We conclude that the approximately 100-kD polypeptide we see in maize ear extracts is most likely encoded by Bs1 ORF1.
We performed immunoblot analyses using the anti-BS1 antibodies to test Bs1 expression in different maize tissues. Bs1 is translated primarily in young ears and to a much lesser extent in young tassels and mature embryos (Fig. 2E, lanes 5, 6, and 9). In tassels, two bands of approximately 97 and 100 kD are sometimes seen (Fig. 2E, lane 6). The nature of the smaller band is unclear. In vitro translation experiments have previously indicated that, in addition to ORF1, a longer polypeptide predicted for the frameshift fusion of ORF1 and ORF2 can be generated (Jin and Bennetzen, 1989). We do not see any evidence of such a fusion in plant extracts.

Bs1 Expression Is Associated with Normal Reproductive Development
As a consequence of altered light and temperature conditions due to growth at elevated latitude, maize plants occasionally undergo normal vegetative development but produce sterile ears (Fig. 3A). These ears have a vegetative appearance, are stunted, light green in color, with aborted kernels that look like elevated swellings from the main axis of the cob, and arrested silk. When proteins prepared from these ears were tested by immunoblot analysis, we could not detect the Bs1 ORF1 polypeptide (Fig. 3B, lane 7). Furthermore, we could not detect the ORF1 polypeptide in proteins extracted from post-pollen tassel (Fig. 3B, lane 4). Instead, the anti-BS1 antibodies detected several bands, all of which are smaller than expected of the ORF1 polypeptide. Since the sum of the intensities of these bands is more than that seen in young tassel extracts (Fig. 3B, lane 3), they are unlikely to be degradation products. Thus, the nature of these bands remains to be determined. However, it is possible that they may correspond to shorter Bs1 translational products resulting from in-frame initiations at several internal ATG codons (ORF1 contains 20 internal Mets; data not shown). Immunoblots of protein extracts from both sterile ears and post-pollen tassels and probed with anti-ubiquitin antisera (Fig. 3C) suggest that both tissues contain a normal suite of proteins and that general protein degradation is not the case, as indicated by the presence of high molecular mass ubiquitinated proteins and free ubiquitin (Fig. 3C).

Figure 1. Bs1 is expressed in reproductive tissues. A, RT-PCR analysis of Bs1 mRNA extracted from Z. mays subsp. mays. Lanes 1 and 16 contain a M_r marker. Lanes 2 to 8 contain young ear (stage R1 or early stage R2, when silk starts to be visible outside the husks [Hanway and Ritchie, 1984]; W22), young ear (Oh43), young tassel (W22), young tassel (Oh43), husk (Oh43), silk-free ear (Oh43), and silk (Oh43), respectively. Lanes 9 to 12 contain Oh43 root, 1-week-old seedling, 2-week-old seedling leaf, and mature leaf, respectively. Lanes 13 to 15 contain controls in which reverse transcriptase was omitted during first-strand cDNA synthesis; lane 13 contains young ear (Oh43), lane 14 contains young ear (W22), and lane 15 contains young tassel (Oh43). B, RT-PCR analysis of Abp1 mRNA extracted from Z. mays subsp. mays (Oh43) using the primer pair shown in the schematic in C (represented by arrowheads). Lane 1 contains a M_r marker and lanes 2 to 8 contain young ear, young tassel, root, seedling leaf, 8-week-old leaf, silk, and husk, respectively. Lane 9 contains a PCR amplification product using the same primer pair used for RT-PCR but with genomic DNA as a template. C, Schematic depiction of the Abp1 gene showing primers (arrowheads) used in B. The primers used to amplify the Bs1 transcript are indicated by arrows above the Bs1 structure depicted in Figure 4.

Structure and Evolution of Bs1 in Maize and the Teosintes
In order to study the structure and sequence evolution of Bs1 ORF1, we cloned and characterized Bs1 from domesticated maize and the teosintes (Zea luxurians, Zea diploperennis, Z. mays subsp. mexicana, Z. mays subsp. huehuetenangensis, Z. mays subsp. parviglumis). In these taxa, Bs1 copy number ranges from one to five (Johns et al., 1985;Elrouby and Bureau, 2001). To amplify all potential copies, the sense and antisense primers were anchored in the retroelement primer-binding site and polypurine tract, respectively, both of which are expected to be conserved. The number of copies isolated from the different taxa reflects expected copy numbers. We isolated five maize (inbred line W22) Bs1 copies (My1 to -5), two from Z. luxurians (L14, L15), three from Z. diploperennis (D12, D13, D18), four from Z. mays subsp. mexicana (Mx6, Mx16, Mx23, Mx24), three from Z. mays subsp. huehuetenangensis (H8, H10, H25), and one from Z. mays subsp. parviglumis (P22; Fig. 4; Table I). Although an intact ORF1 is only observed in two maize Bs1 copies (see below), all Bs1 copies contain all three transduced genes, suggesting that the transduction events took place before the speciation of the genus Zea.
Except for the Z. mays subsp. parviglumis copy and four maize copies, all other Bs1 copies contain in-frame premature stop codons when compared with maize ORF1 (Jin and Bennetzen, 1989; Table I). Likewise, all teosinte and three of the maize Bs1 copies sustain insertions/deletions (indels) that disrupt the Bs1 coding potential (Fig. 4; Table I). The teosinte Bs1 copies contain, in addition to 1- to 3-bp indels, larger indels such as those starting at codon 138 (21 bp, in D12, D18), codon 171 (8 bp, in D13), codon 194 (69 bp, in L14, L15, D13), codon 473 (91 bp, in Mx16), codon 490 to 510 (60 bp, in D12, D18), codon 561 (183/187 bp, in L14, L15, D13), and codon 679 to 689 (30 bp, in L14, L15). It is interesting that the 183/187-bp insertion seen in both Z. luxurians copies and in D13 corresponds to the 183 bp of c-pma that is later deleted in maize r-pma to form mature ORF1 (Bureau et al., 1994; Jin and Bennetzen, 1994; Palmgren, 1994; Elrouby and Bureau, 2001). Also, the 1-bp indel at codon 88 is conserved in all but three teosinte Bs1 copies, and the 2-bp indel at codon 736 is conserved in all teosinte copies as well as one copy from maize (My3). Another 1-bp indel that disrupts codon 185 is conserved in all perennial teosinte Bs1 copies, at least one copy from each annual teosinte (Mx16, H10, P22), and one copy from maize (My3). With the exception of one Z. mays subsp. huehuetenangensis copy (H25), Bs1 copies isolated from the annual teosintes contain only simple (1-3 bp) indels (Fig. 4; Table I). In H25, a large 696-bp deletion eliminates approximately two-thirds of the r-pma and all of the env regions and is likely to have occurred later after all transductions took place. In Z. mays subsp. parviglumis, P22 contains a 1-bp deletion at codon 185 as well as a 2-bp deletion at codon 736.
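Whether an indel shifts the reading frame depends only on whether its length is a multiple of 3. The minimal sketch below applies that rule to the indel lengths quoted above; the dictionary labels and the function name are ours. Preserving the frame does not by itself mean the coding potential is intact, since an in-frame indel can still remove essential residues or sit alongside premature stop codons.

```python
def preserves_frame(indel_length_bp):
    """Return True if an indel of this length leaves the downstream reading frame unchanged."""
    return indel_length_bp % 3 == 0


# Indel lengths (bp) as quoted in the text, keyed by the codon position given there.
indels_bp = {
    "codon 88": 1,
    "codon 138": 21,
    "codon 171": 8,
    "codon 185": 1,
    "codon 194": 69,
    "codon 473": 91,
    "codons 490-510": 60,
    "codon 561 (183-bp form)": 183,
    "codon 561 (187-bp form)": 187,
    "codons 679-689": 30,
    "codon 736": 2,
    "H25 large deletion": 696,
}

for position, length in indels_bp.items():
    effect = "frame preserved" if preserves_frame(length) else "frameshift"
    print(f"{position}: {length} bp, {effect}")
```

By this rule, the 1-bp indels at codons 88 and 185 and the 2-bp indel at codon 736 are all frameshifts, whereas the 183-bp segment of c-pma that is absent from maize r-pma is a whole number of codons (61).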
In maize (inbred line W22), five Bs1 copies were isolated (Fig. 4; Table I), but only two of them potentially encode an intact ORF1. The remaining three (My1, My3, My5) contain the same 1-bp deletion located at codon 345 (also present in D18, Mx23, H10). In addition, My3 contains the same 1-bp deletion present in all perennial Bs1 copies as well as in Mx16, H10, and P22 (disrupting codon 185) and the 2-bp deletion disrupting codon 736 in all perennial and annual teosinte copies. We also isolated Bs1 sequences from a number of maize exotic landraces and inbred lines and identified copies similar to maize (W22) and Z. mays subsp. parviglumis Bs1 (data not shown). The two maize Bs1 copies that potentially encode intact ORF1 (My2, My4) differ only by nucleotide substitutions (98% identical at the nucleotide level and 96% identical at the amino acid level). My4 is more similar to the published Bs1 sequence (isolated from maize inbred line 1s2p; Jin and Bennetzen, 1989). We also isolated a full-length Bs1 cDNA from another maize inbred line (Oh43). As in the case of My2 and My4, the Oh43 cDNA potentially encodes an intact ORF1 (Fig. 4).

Figure 2. Bs1 is translated. Immunoblot analysis using total bacterial (A) or plant (B-E) extracts. In A, extracts from a bacterial strain containing a plasmid expressing the N-terminal 301 amino acids (lane 1) or the empty plasmid (lane 2) were probed with anti-BS1 antiserum. In B to D, extracts from young ears, sterile ears, or leaves (lanes 1-3, respectively) were probed with the anti-BS1 antiserum (B), preimmune serum (C), or anti-BS1 antiserum that had been incubated with a BS1 synthetic peptide (D). In E, proteins extracted from mature leaf, 2-week-old seedling leaf, 1-week-old seedling, silk, young ear, young tassel, endosperm, pericarp, and embryo (lanes 1-9, respectively) were probed with anti-BS1 antibody. In all panels, the leftmost lane contains a M_r marker.

Patterns of Nucleotide Sequence Evolution
The determination of the ratio of the nonsynonymous (leading to amino acid replacements) substitution rate per nonsynonymous sites and synonymous (leading to silent changes) substitution rate per synonymous sites (dN/dS) has been used extensively to infer the nature of selection operating on genes of interest (Yang, 2002). An excess of nonsynonymous over synonymous substitutions is an indication of positive selection, whereas a low dN/dS ratio is indicative of purifying selection (Yang, 2002). We calculated the dN/dS ratio for Bs1 ORF1 using the CODEML program of PAML (see "Materials and Methods"). Except for H25, all sequences described above were used in this analysis. H25 was eliminated because it contains a large (696 bp) deletion that is likely to skew the analysis. We obtained a dN/dS ratio not significantly different from 1 (0.89), possibly suggesting that, within the genus Zea, ORF1 is under neutral genetic drift. Since maize Bs1 sequences containing an intact ORF1 clustered in one clade (data not shown), we used CODEML to calculate a different dN/dS value for this cluster and assessed whether it is significantly different from that of the rest of the sequences. Likelihood ratio estimates suggest no significant difference (P = 0.59, 1 degree of freedom). Pairwise comparisons of the maize and Z. mays subsp. parviglumis Bs1 sequences, however, reveal dN/dS ratios of 0.60, 0.59, and 0.56 for Bs1(1s2p)/P22, My4/P22, and My2/P22, respectively (Table II). When the Bs1 copies that encode intact ORF1 were compared, we obtained ratios slightly higher than 1 when Bs1(1s2p) or My4 was compared with My2 (1.2 or 1.17, respectively) but lower than 1 (0.50) when Bs1(1s2p) and My4 were compared with each other (Table II). This confirms our earlier findings that My4 is more similar to the published Bs1 sequence (isolated from 1s2p) than it is to My2 and suggests that there may be two slightly different copies of Bs1 (1s2p/My4 type and My2 type) that are evolving differently.
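The likelihood ratio comparison with 1 degree of freedom can be sketched as follows. The log-likelihood values are hypothetical and chosen only so that the result lands near the reported P = 0.59; the actual CODEML log-likelihoods are not given in the text. The test statistic is twice the difference in log-likelihood between the model allowing a separate dN/dS for the intact-ORF1 clade and the single-ratio model, compared against a chi-square distribution with 1 degree of freedom.

```python
import math


def lrt_pvalue_1df(lnl_null, lnl_alt):
    """P value of a likelihood ratio test between nested models differing by one parameter.

    For a chi-square variable with 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical log-likelihoods (not from the paper): one dN/dS ratio for all
# branches (null) versus a separate ratio for the intact-ORF1 clade (alternative).
lnl_one_ratio = -1000.000
lnl_two_ratio = -999.855

p = lrt_pvalue_1df(lnl_one_ratio, lnl_two_ratio)
print(f"2*delta(lnL) = {2 * (lnl_two_ratio - lnl_one_ratio):.2f}, P = {p:.2f}")
```

Under these assumptions, a P value of about 0.59 corresponds to a statistic of roughly 0.29, far below the 3.84 needed for significance at the 5% level with 1 degree of freedom.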
We also used dN/dS ratio estimates to study the substitution patterns between the transduced genes and their parental host genes. We have previously reported that integration of both c-pma and c-xe but not c-bg occurred in a manner that preserved their open reading frames in Bs1 ORF1 (Elrouby and Bureau, 2001). We tested whether substitution patterns were also constrained to keep the cellular gene's amino acid composition (Table III). The dN/dS ratios for c-pma/r-pma, c-xe/r-xe, and c-bg/r-bg were 0.29, 0.18, and 0.91, respectively, suggesting that r-pma and r-xe are likely under pressure to maintain their cellular gene amino acid sequence. A high value for c-bg/r-bg is consistent with the random integration of r-bg and very similar to the value we determined for ORF1 comparisons (0.89), suggesting that r-bg and c-bg diverged at the same rate at which ORF1 diverged within the genus Zea.
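The distinction between synonymous and nonsynonymous changes that underlies these ratios can be illustrated with a minimal sketch. It only tallies codon differences between two aligned, gap-free coding sequences using the standard genetic code; it is not the CODEML method, which additionally normalizes by the number of synonymous and nonsynonymous sites and corrects for multiple substitutions. The example sequences and the function name are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(seq_a, seq_b):
    """Count differing codons that keep (synonymous) or change (nonsynonymous) the amino acid."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, gap-free, and a multiple of 3 in length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical toy alignment: CTT->CTC keeps Leu; GAA->GAT changes Glu to Asp.
cellular = "ATGCTTGAAAAA"
transduced = "ATGCTCGATAAA"
print(classify_codon_differences(cellular, transduced))  # (1, 1)
```

A pair such as c-xe/r-xe, with a reported ratio of 0.18, would show relatively few amino acid-changing differences per nonsynonymous site compared with silent differences per synonymous site once the raw counts are normalized.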

DISCUSSION
Gene transduction by retroelements occurs frequently during the retrotransposition of L1 elements in human (Moran et al., 1999;Goodier et al., 2000). Approximately 23% of human L1 elements seem to have transduced host sequences. With approximately 400,000 L1 elements, transduction events like these have enlarged the diploid human genome by as much as 19 Mb, or 0.6% (Goodier et al., 2000). L1-mediated gene transduction has also been shown to occur in an experimental system involving a human cell culture and proposed to have the potential as a mechanism for the evolution of new genes by shuffling of already existing sequences (Moran et al., 1999;Goodier et al., 2000). In addition, the human PMCHL1 gene was created by retrotransposition of an antisense MCH mRNA coupled with the de novo creation of splice sites (Courseaux and Nahon, 2001). A computational survey of the rice genome identified 1,235 retrogenes, and 27 of these are located within LTR retrotransposons (Wang et al., 2006). Additionally, 380 of these retrogenes contain chimeric protein-coding sequences. Combined with an exceptionally high rate of chimeric gene formation by retroposition in grass genomes, gene transduction by retroelements is likely to contribute to the phenotypic diversity of grasses. In maize, Bs1 remains the best studied LTR retrotransposon with clear evidence for multiple gene transduction events (Elrouby and Bureau, 2001). In this study, we provide evidence that these transduction events may have resulted in the formation of a chimeric new gene. Whereas all transduction events took place before the speciation of the genus Zea, the formation of ORF1 may have happened in domesticated maize or in the diverse populations of its progenitor Z. mays subsp. parviglumis, suggesting a possible recent emergence of this new gene.

The Birth of a New Gene
Several findings suggest that Bs1 has evolved as a new gene. First, it is both transcribed and translated. This is remarkable, since gene transduction and integration of the transduced sequences into the retroelement genome is theoretically a random process. In Bs1, none of the transduced sequences were full-length ORFs (i.e. only fragments of the three genes were incorporated into the Bs1 genome; Elrouby and Bureau, 2001). Moreover, in the case of r-xe, a part of the transduced sequence is noncoding in c-xe (5′ untranslated region), and r-bg was integrated into Bs1 without maintaining the c-bg translational frame. Bs1 transductions were also associated with numerous mutations, including two major deletions. The first deletion removed 385 bp in r-xe compared with its cellular gene counterpart (c-xe). This deletion eliminated 44 bp from the first exon, all of the intervening intron, and 82 bp of the second exon (Elrouby and Bureau, 2001). The second deletion removed 183 bp in r-pma when compared with c-pma and hence eliminated most of exon 6 (Bureau et al., 1994; Jin and Bennetzen, 1994). Additionally, single point mutations account for a large number of amino acid changes, and the transduced sequences considerably diverged from their cellular gene counterparts (sequence identities are 81%, 86%, and 88% for r-bg/c-bg, r-xe/c-xe, and r-pma/c-pma, respectively; Elrouby and Bureau, 2001). Despite all these mutations, one long uninterrupted ORF (ORF1) formed in maize and is both transcribed and translated. The size of the protein product observed in our immunoblot analyses is consistent with a translational product of ORF1.
Second, analysis of the Bs1 sequence in maize and its wild relatives, the teosintes, suggests a process that guided the emergence of this new gene. Bs1 is likely to be a Zea-specific element. DNA hybridization experiments using DNA from a variety of monocotyledonous and dicotyledonous plants reveal its presence only in Zea species (Johns et al., 1985). Similarly, no Bs1 sequences were found in closely related grasses (rice, sorghum [Sorghum bicolor], barley [Hordeum vulgare]), since BLAST searches only identified short regions of similarity with cellular gene orthologs of the transduced genes (data not shown). In Bs1 copies from the different Zea species, several deletions/insertions were identified, mainly in the teosintes, and obviously led to the formation of one long uninterrupted ORF (ORF1) in maize. For example, we have previously noticed that, compared with c-pma, r-pma in maize Bs1 contained a deletion of 183 bp (Bureau et al., 1994;Elrouby and Bureau, 2001). In this study, we identified teosinte Bs1 copies (in Z. luxurians and Z. diploperennis) that still contain this 183-bp sequence. This further supports our hypothesis that c-pma was the last of the three genes to be transduced (Elrouby and Bureau, 2001). More importantly, it indicates that all three Bs1 transductions took place before the speciation of the genus Zea and that the formation of mature ORF1 most probably took place in the diverse populations of Z. mays subsp. parviglumis (the progenitor of maize) or during the domestication of maize. In addition, several deletions/insertions are conserved (identified in the same position) among the different taxa studied. In particular, the 1-bp deletion at codon 88 is seen in almost all teosinte copies but not in maize inbred lines or Z. mays subsp. parviglumis. 
Likewise, the 2-bp deletion at codon 736 is found in all teosinte sequences as well as in one maize sequence, and the 1-bp deletion at codon 185 is identified in one maize and eight teosinte copies (Fig. 4; Table I). It is likely that simple insertions at these positions were instrumental to the formation of ORF1.
Third, when the dN/dS ratio, a strict measure of selection, was calculated for teosinte Bs1 copies, a value not significantly different from 1 was obtained throughout the length of a reconstructed ORF1. This rules out purifying selection and suggests that in the genus Zea, ORF1 is under neutral drift, a result to be expected for pseudogenes (all teosinte Bs1 copies). However, when potentially "functional" copies of Bs1 obtained from maize inbred lines were compared with their presumed parental copy (from Z. mays subsp. parviglumis), we obtained dN/dS ratios significantly lower than 1, suggesting potential functional constraints on Bs1 ORF1 in domesticated maize. This is also confirmed by the finding that the dN/dS ratio of Bs1(1s2p) and My4, two copies isolated from two different inbred lines, is also lower than 1. Interestingly, the two potentially functional copies identified in maize seem to diverge slightly in sequence, and this is reflected in a slightly increased number of nonsynonymous over synonymous changes. This was evident from the fact that the two copies diverge more at the amino acid level than at the nucleotide level and also from a dN/dS ratio that is higher than 1.
Fourth, Bs1 is expressed only during a specific developmental window. This is evident again both at the transcript and the protein levels and both spatially and temporally. The Bs1 transcript is detected primarily in young ears and to a lesser extent in young tassels but not in any of the vegetative tissues tested. The BS1 protein mirrors its transcript localization; however, whereas it was detected in abundant levels in young ears, it was barely detectable in young tassels. Bs1 expression is also temporally regulated. Immunoblot analysis with protein extracts from young ears (stage R1 or early stage R2, when silk starts to be visible outside the husks; Hanway and Ritchie, 1984), whole mature kernels, mature embryos, endosperm, and pericarp reveals that the BS1 protein is detected only in young ears (Fig. 2). Embryos show very low expression levels, suggesting that, in young ears, Bs1 is probably expressed in ovules and young developing embryos and that expression is either down-regulated or shut down during later stages. This tight expression pattern suggests that Bs1 may have evolved as a new gene and may be involved in early aspects of maize reproduction and/or kernel development.
Fifth, Bs1 expression seems to correlate with normal reproductive development. The BS1 protein was undetectable in sterile ears collected from plants that developed abnormally, probably due to unfavorable growth conditions. These plants grew normal vegetative structures but produced very little pollen and sterile ears. Specifically, the ears were vegetative in appearance (i.e. light green in color with kernels replaced with structures that look like elevated swellings from the main axis of the cob). The cob axis itself was enlarged (in diameter) and constituted most of the mass of the ear. The average length of the cob was 30% to 50% of the length of an R1/R2-stage ear. Silk was arrested early and appeared only when husks were manually removed. These ears arrested at this stage and did not develop further. Whereas the BS1 protein was not detected in protein extracts prepared from these ears, high molecular mass ubiquitinated proteins as well as free ubiquitin (Fig. 3C) were unaffected, suggesting that general protein degradation had not occurred and that Bs1 is expressed only during normal ear development. It remains to be determined whether normal reproductive development requires Bs1 expression or is instead required for it.

Recent Birth
Our results suggest that the Bs1 transduction events occurred before the speciation of the genus Zea. We identified the transduced sequences in all Bs1 copies isolated from all five teosinte species tested (Fig. 4). However, all the teosinte copies of Bs1 contain deletions, insertions, or premature stop codons that disrupt ORF1. Although this is very clear for perennial (Z. luxurians and Z. diploperennis) and two of the annual (Z. mays subsp. mexicana and Z. mays subsp. huehuetenangensis) teosintes, Bs1 in Z. mays subsp. parviglumis has a structure more similar to that of domesticated maize (Z. mays subsp. mays). The Bs1 sequence in Z. mays subsp. parviglumis differs from maize Bs1 by only several nucleotide substitutions and three simple deletions. The first deletion is located at codon 185 (relative to ORF1) and is 1 bp long. The second deletion is located at position 736 and is 2 bp in length. The third deletion is also 2 bp in length but is located downstream of ORF1 (in the hypothetical ORF2). A single nucleotide insertion at codon 185 would restore ORF1 and produce an ORF (94% identical to maize ORF1, 904 amino acids in length) that terminates 7 bp upstream of the hypothetical ORF2 stop codon (Fig. 4). Although we cannot rule out the presence of such an ORF in Z. mays subsp. parviglumis, we sequenced two more independent PCR products and confirmed the presence of the deletion at position 185. Additionally, this 1-bp deletion was also identified in eight other Bs1 sequences isolated from perennial and annual teosintes as well as domesticated maize (see "Results"), ruling out an amplification error and suggesting that it descends from an ancestral Bs1 copy. Although it is possible that one or more Bs1 copies may have escaped PCR amplification in the taxa used in this study, it is unlikely that this copy encodes an intact ORF1. 
Given the high degree of sequence similarity between Bs1 copies that contain an intact ORF1 or an ORF1 that contains only simple (1-or 2-bp) indels, amplification would probably miss copies of considerable structural differences rather than copies that are more similar to maize Bs1. This is confirmed by the facts that all five maize Bs1 copies were cloned, that the number of copies cloned from the teosintes corresponds to copy number estimates reported previously (Johns et al., 1985;Elrouby and Bureau, 2001), and that some internal primers (anchored in maize ORF1) failed to amplify any Bs1 sequences from the teosintes.
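The consequence of a simple 1-bp deletion for an ORF, and its reversal by a single-nucleotide insertion at the same site, can be illustrated with a minimal Python sketch. The sequence below is a hypothetical toy ORF, not Bs1 sequence, and the function name is an assumption introduced only for this illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_until_stop(dna):
    """Translate in frame from position 0 until the first stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy ORF (not Bs1): ATG GCT GAA ACC GGT CTG AAA TAA
intact = "ATGGCTGAAACCGGTCTGAAATAA"

# A 1-bp deletion in the third codon shifts the downstream reading frame
# and exposes a premature stop codon.
deleted = intact[:6] + intact[7:]

# Reinserting a single nucleotide at the same site restores the frame.
restored = deleted[:6] + "G" + deleted[6:]

print(translate_until_stop(intact))    # full-length toy peptide
print(translate_until_stop(deleted))   # truncated by the frameshift
print(translate_until_stop(restored))  # frame restored
```

In this toy case the restored sequence is identical to the intact one; in a real sequence the reinserted base may differ from the ancestral one and change a single codon while still restoring the frame.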
The high degree of sequence and structural similarity between the Z. mays subsp. parviglumis and maize Bs1 copies makes it difficult to infer the time of emergence of ORF1. It is tempting to suggest that ORF1 formed at or immediately after the domestication of maize, since copies of Bs1 with intact ORF1 were only identified in domesticated maize. However, populations of Z. mays subsp. parviglumis exhibit a very high degree of diversity, and maize has maintained a substantial proportion (60%-70%) of this diversity (Tenaillon et al., 2001; for review, see Tian et al., 2009). Given our small sample size, we are not able to discern whether ORF1 formed in maize or in its progenitor populations. This may require a more detailed examination of Bs1 sequence in a large number of Z. mays subsp. parviglumis and maize populations.
The notion that ORF1 may have acquired, or is in the process of acquiring, a function is extraordinary, given how recently this must have taken place. Based on recent archaeological and molecular data, maize (Z. mays subsp. mays) was domesticated from its progenitor teosinte (Z. mays subsp. parviglumis) approximately 6,250 years ago (Piperno and Flannery, 2001;Matsuoka et al., 2002), although evidence derived from microsatellite analyses estimates the upper limit of the time of divergence of the two subspecies at 9,188 years ago. Although it has been suggested that most domesticated crops are the products of multiple independent domestications, analysis of 99 microsatellite loci in a large maize and teosinte population suggests a single domestication for maize that is likely to have occurred in the central Balsas River Valley in Mexico. In archaeological maize samples, analysis of three genes involved in the control of plant architecture, storage protein synthesis, and starch production (and hence major players during domestication) revealed that alleles of the three genes typical of contemporary maize were present in Mexican maize by 4,400 years ago, yet allelic selection at one of these genes may not have been completed until as recently as 2,000 years ago (Jaenicke-Despres et al., 2003). As mentioned earlier, maize Bs1 is most similar (based on structure and the degree of sequence identity) to Z. mays subsp. parviglumis Bs1. It is conceivable that the Z. mays subsp. parviglumis allele passed on to maize during domestication approximately 9,000 years ago and that this was followed by two simple insertions that created ORF1. The finding of intermediates containing intact codon 185 or intact codon 736 among maize exotic landraces supports this idea. Alternatively, ORF1 or an ORF1-related fusion of ORF1 and the hypothetical ORF2 may have arisen in Z. mays subsp. parviglumis, followed by selection during domestication to maintain ORF1 in postdomestication maize. The domestication of maize is thought to have involved strong selective sweeps that are likely to reduce genetic diversity in genes and genomic regions important during domestication. The absence of teosinte-type Bs1 copies (with large indels and premature stop codons) among maize inbred lines is consistent with this idea.
Strong selection for traits that improved agronomic performance, palatability, or nutritional value was instrumental for the domestication of crop plants. In maize, the ear received much of the attention. The morphological differences between the maize and teosinte ears suggest that traits unique to maize confer a selective disadvantage in the wild but make the plant more suitable as a cultivated crop. For example, maize has a rigid enlarged polystichous (multi-ranked) rachis with tenaciously attached grains that require human intervention for dispersal and propagation, whereas the teosinte ear is distichous (two-ranked) with a thin rachis that naturally disarticulates, aiding in seed dispersal. Teosinte grains are also protected inside fruit cases formed by an invaginated rachis and lower glume. As discussed before, we suggest that Bs1 ORF1 is expressed in the ear (especially young developing ears) and that its expression correlates with normal reproductive development. It is possible that the Bs1 ORF1 contributes some of the traits that farmers favored during domestication (for a discussion of potential functions, see below). Alternatively, the Bs1 ORF1 may have hitchhiked with some of the genes contributing these traits.

Potential Function
It is likely that both r-xe and r-pma contribute properties to the BS1 protein similar to those of their parental cellular proteins. This is based on several observations. First, both r-xe and r-pma integrated in ORF1 in a nonrandom manner that maintained the same reading frame of their cellular gene counterparts (Jin and Bennetzen, 1994;Elrouby and Bureau, 2001). Second, dN/dS ratio estimates reveal potential functional constraints between r-xe and r-pma and their cellular gene counterparts (see "Results"). Third, conserved domain analysis revealed that r-xe corresponds to the sequence encoding the xylan endohydrolase signal peptide (Banik et al., 1996;Elrouby and Bureau, 2001; data not shown) that potentially targets the enzyme to cell wall xylans, and r-pma encodes a slightly truncated ATP-binding and hydrolysis domain characteristic of membrane and vacuolar proton-dependent ATPases (Jin and Bennetzen, 1994;Michelet and Boutry, 1995;Elrouby and Bureau, 2001; data not shown).
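The logic behind a dN/dS comparison can be sketched in Python as follows. This is a deliberately simplified, uncorrected count in the spirit of Nei-Gojobori site counting and does not reproduce the analysis referred to in "Results": codon pairs that differ at more than one position are skipped, no multiple-hit correction is applied, and the two sequences are hypothetical toy alignments rather than r-xe, r-pma, or their cellular counterparts. All function names are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_site_count(codon):
    """Number of synonymous sites in a codon: each of the 9 possible
    single-base changes that keeps the amino acid contributes 1/3."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    total += 1.0 / 3.0
    return total


def pn_ps(seq_a, seq_b):
    """Return (pN, pS) for two codon-aligned, gap-free sequences.

    Simplification: codon pairs differing at more than one position
    are skipped, and no correction for multiple substitutions is made.
    """
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, len(seq_a) - 2, 3):
        ca, cb = seq_a[i:i + 3], seq_b[i:i + 3]
        ndiff = sum(x != y for x, y in zip(ca, cb))
        if ndiff > 1:
            continue
        s = (syn_site_count(ca) + syn_site_count(cb)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        if ndiff == 1:
            if CODON_TABLE[ca] == CODON_TABLE[cb]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    return nonsyn_diffs / nonsyn_sites, syn_diffs / syn_sites


# Hypothetical toy alignment: one synonymous change (GGT -> GGC)
# and one nonsynonymous change (AAA -> AGA).
toy_a = "GGTGCTCTGAAA"
toy_b = "GGCGCTCTGAGA"
pn, ps = pn_ps(toy_a, toy_b)
print(pn / ps)  # a ratio below 1 is the pattern expected under constraint
```

A ratio well below 1 indicates that amino acid-changing substitutions are underrepresented relative to silent ones, which is the pattern interpreted as functional constraint.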
Several scenarios are possible for a potential function for Bs1 ORF1. For example, ORF1 may contribute a novel function that may or may not require the r-xe and/or r-pma domains, although amino acid and domain conservation would argue for a function that utilizes one or both domains. Alternatively, ORF1 may alter already existing functions (e.g. those encoded by c-xe and/or c-pma). In this case, Bs1 may down-regulate c-xe and/or c-pma functions in the ear. This is conceivable at both the transcript and the protein levels. Bs1 transcripts share a high degree of sequence identity with those of c-xe and c-pma and may thus mediate their down-regulation by posttranscriptional gene silencing. The ORF1 protein may also compete with proteins encoded by c-xe and/or c-pma for xylan binding (for c-xe) and/or localization to the membrane or the cell wall (for c-pma and c-xe, respectively). One of the main differences between the maize and the teosinte ear is cob size (both in length and diameter). The maize cob is much larger and is multiranked (polystichous) with tenaciously attached grains that require human intervention for dispersal and propagation (or harvest), traits that are suitable for a field crop. Xylans constitute more than 60% of cell wall polysaccharides in the maize cob (Ebringerova et al., 1997). In the teosinte ear, on the other hand, grains attach to a thin rachis that is mostly cellulose in nature. Reduction of xylan endohydrolase activity in the maize ear could result in a larger xylan content and cob size, traits that might have appealed to farmers during domestication. Elucidating the exact function of Bs1 will require the generation of Bs1 null alleles or knockdown lines as well as detailed molecular and biochemical characterization.
Retroelement-mediated gene transduction has been previously proposed as a general mechanism to generate new genes with the potential to modulate host cellular functions. In this report, we show that gene transduction events mediated by the maize LTR retrotransposon Bs1 led to the formation of a chimeric new gene that may function during reproductive development. Given the high frequency with which gene transduction takes place (Moran et al., 1999;Wang et al., 2006) and the frequent occurrence of transduced sequences in genomes (Goodier et al., 2000;Wang et al., 2006), other cases similar to Bs1 are likely to be identified, and the full extent to which chimeric genes produced by retroposition contribute to genetic and phenotypic diversity may be elucidated.