Alu RNA Secondary Structure Consists of Two Independent 7 SL RNA-like Folding Units*

The amplification of genomic Alu elements by retroposition, i.e. by reintegration of reverse-transcribed RNA, suggests that Alu RNA plays an important role in this process. We report enzymatic studies of the secondary structure of Alu RNAs transcribed in vitro from two recently retroposed Alu elements. These experiments show that the dimeric organization of an Alu sequence is reflected in its RNA folding. Alu subunits fold independently, conserving secondary structure motifs of their progenitor 7 SL RNA molecule. Energy minimization analysis indicates that this folding pattern is also characteristic of different Alu and Alu-like sequences and has been conserved since primate divergence. By analogy to 7 SL RNA, the Alu RNA folding may be important for specific interactions with proteins. This could indicate a physiological function for Alu transcripts. However, it can also be seen as a structural adaptation leading to efficient retroposition of these sequence elements.

progenitor 7 SL RNA molecule and (ii) whether the left and right subunits fold independently or, due to regions of extensive complementarity, interact to form a common secondary structure. Because no abundant source of specific in vivo Alu transcripts is known, it was difficult to address these questions experimentally.
To circumvent this problem, we analyzed the structure of Alu RNAs transcribed in vitro from DNA templates of two members of the youngest Alu family (class IV (11) or Sb (12)). Since these Alu elements, HAFP-Alu from the human α-fetoprotein locus (13) and GBG-Alu from the gorilla β-globin gene cluster (14), retroposed recently (i.e. after the divergence of human and great ape lineages), they are expected to closely represent the Alu sequence that is active in retroposition (11, 12, 15-18).

MATERIALS AND METHODS
DNA templates for in vitro Alu RNA transcription by T7 RNA polymerase (19) were prepared by polymerase chain reaction (cf. Ref. 20 for amplification conditions) from the genomic clone comprising 1.15 kilobases of intergenic sequence between the δ- and β-globin genes of gorilla (GBG-Alu template) (14) and from the plasmid pHAFP5.5 containing a 5.5-kilobase insert from the human α-fetoprotein gene (13) (HAFP-Alu template). The gorilla clone, originally from Trabuchet et al. (14), was provided by R. J. Britten (California Institute of Technology), whereas the human clone (13) was a gift from A. Dugaiczyk (University of California, Riverside). The 5' polymerase chain reaction primer, TAATACGACTCACTATAGGCCGGGCGCGGTGGCTCA, included the T7 RNA polymerase promoter (underlined sequence), whereas the 3' primer, TTTTTTGAGACGGAGTCTCGCTC, limited the oligo-A tail of the Alu sequence to 6 A residues (both primers were synthesized using Gene Assembler, Pharmacia LKB Biotechnology Inc.). RNA transcripts (19) were labeled with [32P]phosphate at either the 5' or 3' end as described (21) and purified by polyacrylamide gel electrophoresis. By DNA sequencing, we found two positions different from the published GBG-Alu sequence (14); there is no deletion of C in position 158, and position 218 is G instead of C.
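The primer design described above can be checked with a short script. This is only an illustrative sketch, not part of the original protocol. It assumes that the 5' primer is the single 36-nucleotide sequence obtained by joining the two fragments printed in the text, and that the T7 promoter portion is the 17 nucleotides TAATACGACTCACTATA immediately preceding the transcription start; the function names are hypothetical.

```python
# Sketch: sanity-check the two PCR primers quoted in the Methods.
# Assumptions (not stated verbatim in the source): the 5' primer is the
# joined 36-nt sequence below, and the T7 promoter ends right before the
# first transcribed G.

T7_PROMOTER = "TAATACGACTCACTATA"
FWD_PRIMER = "TAATACGACTCACTATAGGCCGGGCGCGGTGGCTCA"
REV_PRIMER = "TTTTTTGAGACGGAGTCTCGCTC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def transcript_start(fwd_primer: str, promoter: str = T7_PROMOTER) -> str:
    """Return the RNA encoded by the primer downstream of the promoter."""
    if not fwd_primer.startswith(promoter):
        raise ValueError("primer does not begin with the promoter sequence")
    return fwd_primer[len(promoter):].replace("T", "U")


def templated_a_tail(rev_primer: str) -> int:
    """Count the 3'-terminal A residues that the 3' primer templates."""
    sense = reverse_complement(rev_primer)
    return len(sense) - len(sense.rstrip("A"))


if __name__ == "__main__":
    print("transcript begins:", transcript_start(FWD_PRIMER))
    print("sense-strand 3' end:", reverse_complement(REV_PRIMER))
    print("templated A residues:", templated_a_tail(REV_PRIMER))
```

Under these assumptions the transcript begins with a G residue and the 3' primer templates a tail of six A residues, in agreement with the description of the primers.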
To probe Alu RNA secondary structure, we used S1 nuclease and RNases U2 (A-residue specific) and T1 (G-residue specific), which cleave in nonpaired regions, and V1 nuclease, which cleaves in double-stranded RNA regions (all enzymes were from Pharmacia). The reactions were carried out as described (21) using four 10-fold enzyme dilutions, each reaction typically containing 20,000-40,000 cpm of the end-labeled gel-purified Alu RNA in a final volume of 5 µl. The results were analyzed on 12% polyacrylamide sequencing gels (if not stated otherwise). Minimal energy RNA folding was analyzed using version 4.0 (for IBM PC) of the FOLD program (22), kindly provided by M. Zuker (National Research Council of Canada, Ottawa).
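The logic of comparing enzymatic cleavage data with a proposed secondary structure can be sketched as follows. This is a hypothetical illustration only: the dot-bracket structure and cleavage positions are invented toy data, not results from this study, and the function names are assumptions. The sketch merely encodes the rule stated above, namely that S1, U2, and T1 cuts are expected at nonpaired positions and V1 cuts at paired positions.

```python
# Sketch: flag cleavage sites that disagree with a candidate structure.
# The structure and the cleavage list in the demo are toy data.

UNPAIRED_PROBES = {"S1", "U2", "T1"}  # cleave in nonpaired regions
PAIRED_PROBES = {"V1"}                # cleaves in double-stranded regions


def pairing_state(structure: str) -> list:
    """Return True for each paired position of a dot-bracket string."""
    return [symbol in "()" for symbol in structure]


def inconsistent_cleavages(structure: str, cleavages: list) -> list:
    """Return the (position, enzyme) pairs that contradict the structure.

    Positions are 1-based, as in the nucleotide numbering of the text.
    """
    paired = pairing_state(structure)
    conflicts = []
    for position, enzyme in cleavages:
        is_paired = paired[position - 1]
        if enzyme in UNPAIRED_PROBES:
            if is_paired:
                conflicts.append((position, enzyme))
        elif enzyme in PAIRED_PROBES:
            if not is_paired:
                conflicts.append((position, enzyme))
        else:
            raise ValueError(f"unknown probe: {enzyme}")
    return conflicts


if __name__ == "__main__":
    toy_structure = "((((....))))"
    toy_cleavages = [(5, "S1"), (2, "V1"), (3, "T1")]
    print(inconsistent_cleavages(toy_structure, toy_cleavages))
```

In the toy example only the T1 cut at position 3 is reported, because that position is paired in the candidate structure. A real analysis would also weigh cleavage intensity across the enzyme dilutions, which this sketch ignores.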

RESULTS
The templates for in vitro synthesis of both Alu RNAs by T7 RNA polymerase (19) were prepared from the corresponding genomic clones by polymerase chain reaction, placing the RNA polymerase promoter such that transcription started with the first G nucleotide of the Alu sequence. The resulting Alu RNAs were labeled with radioactive phosphate at either the 5' or the 3' end and subjected to enzymatic digestions using single strand-specific or double strand-specific nucleases (Fig. 1). The structure of an Alu RNA fragment that comprised only the sequence of the right subunit of either HAFP-Alu or GBG-Alu RNA was also investigated. This so-called "R fragment" was a by-product of RNA synthesis, resulting from a cleavage within the linker separating the subunits (between the C and A residues at positions 128 and 129). Its digestion pattern, being virtually identical with that in the corresponding region of the full-length RNA (Fig. 1b, data not shown), pointed to an independent folding of the Alu subunits. In this case, a partial digestion of Alu RNA with S1 nuclease, shown to cut within the linker (Fig. 1), would lead to a separation of subunits during polyacrylamide gel electrophoresis in nondenaturing conditions. As shown in Fig. 2a, the incubation of the 3'-labeled full-length Alu RNA with increasing amounts of S1 nuclease leads to a single major band that corresponds to the 3'-labeled Alu right subunit migrating with the same electrophoretic mobility as the R fragment. In a control experiment, the aliquots of the reactions analyzed in Fig. 2a were subjected to electrophoresis under denaturing conditions. The results shown in Fig. 2c demonstrate that S1 nuclease also cuts susceptible sites of Alu RNA in regions other than the linker (cf. also Figs. 1 and 3). However, these fragments, being kept together by base pairing within the secondary structure, are not revealed by the nondenaturing electrophoresis shown in Fig. 2a. Remaining associated, they migrate as an intact Alu RNA or its subunits, unless the S1-incubated samples are heat-denatured before loading the native gel (Fig. 2b). In conclusion, there is no evidence for secondary interactions between Alu-left and Alu-right that are strong enough to keep the structure together once the covalent bonds within the linker have been cleaved. This indicates that Alu RNA subunits fold independently, which is consistent with the more detailed analysis of the enzymatic digestion data presented below.
FIG. 1. Enzymatic digestions of 5' end-labeled HAFP-Alu RNA (a) and GBG-Alu RNA and its R fragment labeled at the 3' ends (b). S1, V1, U2, and T1, corresponding digestions carried out in native conditions with serial 10-fold enzyme dilutions 1-4; T1S, sequencing reaction using T1 nuclease; L, partial formamide hydrolysis of full-length RNA in b or its truncated fragment in a; n and R, undigested full-length RNA and the R fragment, respectively. RNA fragments resulting from S1 and V1 cleavage (indicated on the left side) that terminate in 3'-OH do not migrate exactly as those ending with 3'-phosphate (indicated on the right). The numbers refer to nucleotide positions in the Alu general consensus sequence (18). Note that doubling of bands in the 3'-labeled RNA is due to the heterogeneity in the number of A residues at the 3' end of the in vitro Alu transcript.
Fig. 3, a and b, summarizes the results of enzymatic digestions obtained in a series of experiments such as illustrated in Fig. 1. In the case of the HAFP-Alu RNA (Fig. 3a), the cleavage pattern is consistent with a model where each subunit folds like the Alu fragment of 7 SL RNA shown in Fig.
3c (representing a predicted minimal energy structure of 7 SL RNA that is virtually identical with that derived from enzymatic digestions and evolutionary comparisons (9)). Both Alu-left and Alu-right RNA conserve hairpins I and II as well as the domain III. Nearly the same structure is found for GBG-Alu RNA, except that hairpins I and II in the right subunit form a mixed helix I/II (Fig. 3b), thus departing from the 7 SL RNA scheme. This seems to involve a U → G transversion at position 174 in GBG-Alu RNA that favors the helix I/II by allowing a wobble pair between G-174 and U-144. This substitution is absent in the HAFP sequence, which conforms at this position with the consensus sequences of human Alu families (cf. Ref. 18 for the weighted consensus sequences of human Alu families used in this work, according to the original classification (12)). Although there is no evidence for a mixed helix I/II in the right subunit of HAFP-Alu RNA, a weak T1 RNase susceptibility of G at positions 176-178 (cf. Fig. 3a) might indicate the lability of hairpins I and II in the right subunit and a possibility of conformational flexibility in this region. The 7 SL-like folding pattern shown in Fig. 3a appears to be common among different primate Alu and Alu-like RNAs that were analyzed by Zuker's minimal energy approach (22), using the consensus sequences of the corresponding genomic elements as the RNA sequences (data not shown). This is the case for Galago AluI RNA (23) and the right 7 SL-like subunit of Galago AluII (24). This also turns out to be the case for RNAs of different human Alu families when energy minimization is carried out separately for the right and the left subunit. Although the minimal energy structure predicted using the whole sequence involves interaction between the subunits through regions of complementarity corresponding to hairpins I in Alu-left and Alu-right (cf. Ref. 18), its energy minimum is only slightly lower (2.3% in the Alu Sb and 3.3% in the Alu J family) than that of the separate arrangement of subunits. The pattern of nucleotide substitutions among consensus sequences of human Alu families analyzed earlier (18) is in agreement with the model in Fig. 3a (most of these substitutions are either compensatory or conservative in character as far as the secondary structure model is concerned). Interestingly, the "consensus" RNAs of the Alu-like rodent B1 elements also fold according to the 7 SL RNA scheme (31), and this appears to be the case for BC200 RNA from the brain of cynomolgus monkey (25), which is similar to the left subunit of the human Alu family J.

DISCUSSION
As shown by recent analyses of genomic Alu elements, different Alu families were generated via retroposition from a limited number of conserved master sequences (11, 12, 15-18). Experimental evidence presented in this paper provides arguments for the notion that evolutionary conservation of Alu master sequences is due to selection for invariance of the folding characteristics derived from 7 SL RNA (18). This also appears to be true for Alu-like retroposons, as judged from their folding pattern predicted using Zuker's approach. Dimeric arrangement of 7 SL-like folding units is presumably associated with a certain functional advantage for Alu RNA and/or its proliferation. Examples of monomeric (rodent B1 elements, BC200 RNA) or heterodimeric arrangements (Galago AluII) of different 7 SL-like molecules agree well with our observation that Alu subunits fold independently and suggest that the dimeric structure represents an advanced form in Alu evolution.
Although sequences of Alu RNAs used in our experiments differ in some positions from the exact consensus of their family, these differences occur in sequence positions that do not affect the overall 7 SL-like folding pattern of these molecules. Enzymatic digestions confirm the minimal energy folding of HAFP-Alu RNA and, except for the hairpin I/II, that of GBG-Alu RNA, both of which are consistent with the folding of Alu RNA predicted separately for right and left subunits derived from the four human Alu family consensus sequences. Among the nine sequence positions at which HAFP-Alu RNA differs from the consensus sequence of the Alu family Sb (disregarding a different number of A residues in the linker), only one (CZiH → G) affects base pairing in the model in Fig. 3a. A similar situation is seen with the GBG sequence. The question may be asked whether differences between these sequences reflect different evolution of the parental templates or are simply due to substitutions introduced after their insertion in the genome (26). Compared to the consensus, four out of the nine substitutions in HAFP-Alu and three out of the six in GBG-Alu are due to transitions in CpG dinucleotides, as expected from the number of CpG dinucleotides in the consensus and their higher substitution rate (18, 26). Assuming a substitution rate of about 0.15%/nucleotide/million years (27), one would expect an accumulation of 0.42 substitution/Alu sequence/million years since its retroposition, or about twice as much when the effect of CpG is taken into account. For HAFP-Alu and GBG-Alu, which could have been inserted 1-5 million years ago (13, 14), the number of differences with respect to the consensus therefore appears to be higher than the estimated figure of 0.8-4 substitutions/sequence. This indicates that their parental templates could have already evolved from the precise consensus and that the sequences of their RNAs may represent even more closely the Alu master sequences active in retroposition.
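The substitution estimate in the preceding paragraph can be reproduced with a few lines of arithmetic. The sketch below is illustrative only. The rate of 0.15%/nucleotide/million years, the roughly twofold CpG correction, and the 1-5 million year age range are taken from the text; the Alu length of 280 nucleotides is an assumption inferred from the quoted figure of 0.42 substitution/sequence/million years (0.42 / 0.0015 = 280), and the function name is hypothetical.

```python
# Sketch: expected substitutions accumulated by an Alu element since insertion.

RATE_PER_NT_PER_MYR = 0.0015  # 0.15 %/nucleotide/million years (from the text)
ALU_LENGTH_NT = 280           # assumption: implied by 0.42 / 0.0015
CPG_FACTOR = 2.0              # "about twice as much" with the CpG effect


def expected_substitutions(age_myr: float,
                           length_nt: int = ALU_LENGTH_NT,
                           rate: float = RATE_PER_NT_PER_MYR,
                           cpg_factor: float = CPG_FACTOR) -> float:
    """Return the expected substitutions per sequence after age_myr million years."""
    return rate * length_nt * cpg_factor * age_myr


if __name__ == "__main__":
    per_myr = expected_substitutions(1.0, cpg_factor=1.0)
    print(f"without CpG correction: {per_myr:.2f} substitutions/sequence/Myr")
    for age in (1.0, 5.0):
        print(f"{age:.0f} Myr: {expected_substitutions(age):.1f} substitutions")
```

Under these assumptions the expected range is roughly 0.8 to 4.2 substitutions per sequence, which matches the 0.8-4 figure quoted above and lies below the nine (HAFP-Alu) and six (GBG-Alu) differences from the consensus reported in the text.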
In the signal recognition particle, 7 SL RNA interacts with protein components to form a specific ribonucleoprotein complex (10, 28). This suggests that 7 SL-like folding in Alu RNA maintains specific interactions with proteins (27). The resulting ribonucleoprotein could mediate retroposition of Alu RNA and thus explain the specificity of this process, which has led to proliferation of Alu repeats to an extent far exceeding that of other sequences that only occasionally generate retropseudogenes. The requirement for appropriate folding could also explain why Alu retroposition only occurs from the conserved master sequences (11, 12, 14-18). The relatively high number of 7 SL RNA pseudogenes (29) indicates that 7 SL RNA, itself prone to retroposition, was a good starting point for an efficient retroposon element to evolve, presumably in the context of a more complex retroposition system (6, 7, 30). Further experiments are required to establish the nature of the putative Alu ribonucleoprotein and its relation to retroposition and to the host cell. An in vitro transcription system of Alu RNAs, such as that used in the present study, could provide a useful tool for these investigations.