A Method for Mapping RNA Initiation, Termination, Splice, and Protein Binding Sites
RIBOSOME BINDING SITES ON β-GLOBIN MESSENGER RNA*

A new method for mapping RNA initiation, termination, and splice sites was developed. The method involves: 1) hybridization of RNA to end-labeled single-stranded DNA; 2) mild digestion of the hybrid with a single strand specific nuclease; 3) high resolution gel electrophoresis and autoradiography. The regions of the labeled probe which are resistant to nuclease digestion are mapped by measuring the distance from the labeled end. The sequence can be determined by running, on the same gel, the end-labeled probe cleaved according to the sequencing protocol of Maxam and Gilbert (Maxam, A. M., and Gilbert, W. (1980) Methods Enzymol. 65, 499-560). The feasibility of this method was tested with adult chicken β-globin mRNA, and we found that the transcription initiation, termination, and RNA splice sites can be determined to within a few bases of the known sites. Using this method we also found that ribosomes bind to β-globin mRNA from 22 bases upstream from the translation start codon (AUG) to 15 bases past the stop codon (UAA).
Several methods are currently available to map RNA initiation, termination, and splice sites. The transcription initiation site can be mapped by the S1 nuclease mapping method of Berk and Sharp (1, 2). In this method a DNA restriction fragment spans the reputed initiation site and has one 5' end-labeled end in the first coding block. After hybridization with RNA and subsequent digestion with S1 nuclease the initiation site is determined from the length of protected DNA. A primer extension method can also be used to map the initiation site (3).
Uniformly labeled genomic DNA can be hybridized to mRNA and the hybrid digested with S1. From the size of the labeled DNA fragments generated, one can determine if the transcript from a gene inserted into an expression vector or a gene transferred into cells is the same as the naturally occurring RNA transcript (4). However, this method is not suitable for mapping transcription initiation, termination, and splice sites of unknown RNA.
Protein binding sites on DNA are usually mapped by the DNase I footprinting method of Galas and Schmitz (5). In this method, purified proteins are bound to an end-labeled restriction fragment of DNA and mildly digested with DNase I. The region protected from DNase I digestion appears as a blank area on the autoradiograph of a denaturing gel. A similar approach can be used for mapping protein binding sites on RNA, but unless the RNA being studied is small, it is difficult to generate unique lengths of end-labeled RNA which can be resolved by gel electrophoresis. Sequencing the RNA fragments protected from nuclease digestion by bound proteins is another approach to mapping protein binding sites on RNA (6). In this case one has to purify a homogeneous piece of RNA in order to sequence the fragment.
In this report we describe a method which overcomes these difficulties in mapping RNA initiation sites, termination sites, and splice sites. In addition, the method can also be applied to mapping the nuclease-resistant RNA fragments generated from RNA-protein complexes. To test the feasibility of this method we mapped the transcription initiation and termination sites and splice sites of chicken β-globin RNA. Also the ribosome binding sites on the β-globin mRNA in chicken reticulocyte polyribosomes were determined.

MATERIALS AND METHODS
Isolation of Polysomes-Erythroid cells were obtained from white leghorn chickens after phenylhydrazine treatment and polysomes were isolated as described (7).
Preparation of Polysomal and Protected Polysomal RNA-To prepare undegraded total polysomal RNA the polysomal pellet was suspended in 1% SDS,¹ 10 mM Tris, pH 7.5, 1 mM EDTA, 0.15 M NaCl. Vanadyl-adenosine complex (VSA) was added to the solution as a ribonuclease inhibitor (8). The mixture was treated with proteinase K (50 µg/ml) for 2 h at 37 °C and extracted 3 times with phenol (saturated with 10 mM Tris, pH 7.5, 1 mM EDTA) and once with CHCl3. The RNA was precipitated with 2.5 volumes of ethanol after making the solution 0.3 M sodium acetate.
To prepare micrococcal nuclease-protected polysomal RNA the polysomal pellet was taken up in 5% glycerol, 10 mM Tris, pH 7.5, 0.2 mM MgCl2, 1 mM CaCl2, and insoluble material was removed by a short centrifugation (15 min, 12,000 × g, 4 °C). The supernatant was adjusted to a nucleic acid concentration of 1 mg/ml and digested with micrococcal nuclease (1.5 µg/ml) for 2 h at 22 °C. After digestion the solution was made 1 mM EDTA, 1% SDS and treated with proteinase K (50 µg/ml) in the presence of vanadyl-adenosine complex for 2 h at 37 °C. The solution was extracted with phenol and the RNA precipitated with ethanol.
Cloning-The chicken β-globin gene fragments that will be described under "Results" were inserted into M13mp7RF by established methods and with observation of current NIH guidelines for recombinant DNA research. The sources of the β-globin gene fragments were a λ phage containing adult β-globin genomic DNA (λCβG1) supplied by Dodgson et al. (9) and an adult β-globin cDNA plasmid (pHB1001) supplied by Salser et al. (10).
Preparation of Single-stranded End-labeled DNA Fragments-The various single-stranded globin gene fragments used in this investigation were isolated from the single-stranded recombinant phage DNA of M13mp7 by digestion with BamHI as described previously (12). The isolated fragments were treated with bacterial alkaline phosphatase (Worthington) and labeled with 32P at the 5' end with T4 polynucleotide kinase (Bethesda Research Laboratories) according to the method of Maxam and Gilbert (11).
¹ The abbreviations used are: SDS, sodium dodecyl sulfate; P-RNA, nuclease-protected polysomal RNA; M13mp7RF, replicative form of M13mp7 DNA.
Hybridization-The hybridization of RNA to the end-labeled probes was carried out in 0.4 M NaCl, 10 mM Tris, pH 7.5, 1 mM EDTA at 68 °C for >16 h. Usually 50,000 dpm of end-labeled probe (5-20 ng of β-globin gene) and 50 µg of RNA were mixed in a final volume of 25 µl.
Digestion of Hybrids with Mung Bean Nuclease-The hybridization reactions were taken up in 1 ml of 30 mM sodium acetate, pH 4.6, 50 mM NaCl, 1 mM ZnCl2, 5% glycerol and equilibrated at 37 °C. Mung bean nuclease (6.5 units, P-L Biochemicals) was added, and the solution was incubated for 30 s. The reaction was stopped with EDTA (final concentration 16 mM), and the solutions were made 0.3 M sodium acetate and precipitated with 2.5 volumes of ethanol at -20 °C for >16 h. After centrifugation (12,000 × g, 20 min, 4 °C) the pellets were treated with 0.3 N NaOH at 68 °C for 1 h. The reaction was neutralized with HCl and Tris, pH 7.5, yeast tRNA was added as carrier (3 µg), and the solution was precipitated with 2.5 volumes of ethanol at -20 °C for >16 h. After centrifugation (12,000 × g, 15 min, 4 °C) the pellet was dried under vacuum and taken up in 5 µl of tracking dye (80% formamide, 1 mM EDTA, 10 mM NaOH, 0.1% xylene cyanol, 0.1% bromphenol blue). Equal amounts of cold acid-precipitable radioactivity were loaded on the gel.
Sequencing-The procedure of Maxam and Gilbert (11) was used to sequence the 5' end-labeled fragments.
Acrylamide Gel Electrophoresis-Sequencing gels (0.3 × 330 × 400 mm) were prepared as described by Maxam and Gilbert (11). The gel was exposed to Kodak XAR-5 x-ray film with a DuPont Cronex intensifying screen at -70 °C.

RESULTS AND DISCUSSION
The basic strategy used in mapping RNA is similar to that of the DNase I footprinting method of Galas and Schmitz (5).
The RNA to be mapped is hybridized to an end-labeled single-stranded DNA fragment and the hybrid is mildly digested with a single strand specific nuclease, such as S1 or mung bean nuclease, in such a way that the enzyme cuts the end-labeled DNA approximately once per molecule. High resolution gel electrophoresis and autoradiography will show the DNA bands containing the labeled end. The regions of the probe that form an RNA-DNA hybrid will not be cut by the enzyme and should appear as blanks, and the regions that do not form a hybrid will show ladder patterns. The region protected by RNA can be mapped by measuring the distance from the labeled end. Also, by running the end-labeled DNA cleaved by the base-specific reagents of Maxam and Gilbert (11) on the same gel, one can deduce the sequence of the region protected by RNA. In this report we mapped the transcription initiation and termination sites and the RNA splice sites in chicken β-globin RNA. Chicken reticulocyte polysomal RNA was used as the source of β-globin mRNA. Reticulocyte polysomes were digested with micrococcal nuclease and the nuclease-resistant RNA was isolated for mapping ribosome binding sites on β-globin mRNA.
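The readout logic described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in this work: the function name `protected_spans` and the band positions in the example are hypothetical, and each lane is idealized as a list of band positions (distance in bases from the labeled end).

```python
def protected_spans(control_bands, sample_bands):
    """Return (first, last) positions of each run of control bands
    that is missing from the sample lane.

    Positions are distances in bases from the labeled end. Only
    positions where the control lane shows a band are considered,
    because the nuclease does not cut at every base.
    """
    sample = set(sample_bands)
    spans = []
    current = None
    for pos in sorted(set(control_bands)):
        if pos not in sample:
            # Band present in the control but absent with RNA: protected.
            if current is None:
                current = [pos, pos]
            else:
                current[1] = pos
        elif current is not None:
            # A band reappears, so the protected run has ended.
            spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))
    return spans


# Hypothetical band positions, for illustration only.
control = [10, 12, 15, 20, 24, 30, 33, 40, 45, 50]
with_rna = [10, 12, 15, 45, 50]
print(protected_spans(control, with_rna))
```

In this idealized picture the true boundary of the hybrid lies somewhere between the last band still visible in the sample lane and the first band that disappears, which is why the spacing of bands in the control lane limits the resolution.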
Single-stranded probes covering the β-globin gene regions were isolated from the recombinant phage DNA of M13mp7. The areas cloned into M13mp7 are outlined in Fig. 1. The 406-base pair AluI-PvuII genomic fragment containing the first coding block was inserted into the SalI site of M13mp7RF DNA after adding SalI linkers (mβG6). The 600-base pair SstI genomic fragment containing the third coding block was also inserted into the SalI site by blunt-end ligation after trimming the ends of the DNAs (mβG7). mβG2 is a recombinant containing the 466-base pair HpaII fragment of adult β-globin cDNA plasmid pHB1001 (10).
[Fig. 1 legend: bp, base pairs.]
The single-stranded fragments were isolated by digestion with BamHI restriction endonuclease as described before (12). The fragments were labeled at the 5' end with T4 polynucleotide kinase and [γ-32P]ATP (11). The labeled probe and RNA were hybridized as described under "Materials and Methods," with the concentrations of globin RNA in at least 100-fold excess over the probe. The substrate concentration (RNA and probe) for mung bean nuclease was the same in all the samples to obtain comparable digestion of the probe in all the samples.
The control sample contained an equal amount of tRNA. We found that mung bean nuclease produces more bands than S1 nuclease with less double-stranded endonuclease activity. Mung bean nuclease is very susceptible to SDS, however, and SDS was not included in the hybridization mixture. The concentration of mung bean nuclease and time of digestion were adjusted in such a way that a significant amount of uncut probe remains after digestion.
The RNA initiation site and the first splice donor site were mapped with the 430-base fragment (mβG6). This DNA contains the 406-base region which includes 207 bases of the 5' flanking region, the first coding block (173 bases), and 26 bases of the first intron. The extra DNA (24 bases) was derived from the linkers used and M13mp7. In Fig. 2 the first splice junction was determined by sequencing the fragment. Having just the cytosine and thymine sequence ladder was found to be sufficient to determine the sequence of the splice donor since the sequence of β-globin mRNA is known (13). The DNA sequences of the chicken genomic β-globin gene and the RNA splice site are not known. By comparing the mRNA sequence (13) and the sequence of the 430-base probe (figures not shown here) we found that the RNA splice site is at the same position as the site seen with mammalian β-globin mRNA (9, 14). We determined that the first splice is 173 bases from the CAP site. The figure shows that polysomal RNA (lane 1) protects a stretch of 173 bases of DNA, and this region is bounded by ladder patterns. The junctions at the protected and unprotected regions correspond to the RNA initiation site (CAP) and the first splice donor site (IVS). The splice donor site can be mapped to within 3 bases of the actual site. However, the last base of the first exon and two bases in the intron, as determined by sequencing, are protected from nuclease attack by polysomal RNA (lane 1). To improve resolution in the region containing the initiation site the gel was run for an extended period, along with the probe treated with the base-specific reagents (Fig. 3). The position of the initiation site was found to be within 2 bases of the known initiation site (14). The only ambiguity in determining the boundary between the protected and unprotected regions in mapping the initiation and the splice sites is the uneven digestion of the end-labeled probe by the single strand specific nuclease.
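As a bookkeeping aid, the segment lengths quoted for the 430-base probe can be checked with a short Python sketch. The variable names are hypothetical, and the offsets assume the genomic insert is numbered 1 to 406 in the mRNA-sense direction. Because the text does not state how the 24 linker and vector bases are divided between the two ends of the probe, the sketch does not attempt to convert these offsets into distances from the labeled end.

```python
# Segment lengths in bases, as stated in the text for the mβG6 probe.
FLANK_5P = 207            # 5' flanking region
EXON_1 = 173              # first coding block
INTRON_1 = 26             # part of the first intron
VECTOR_AND_LINKER = 24    # extra DNA from linkers and M13mp7

insert_length = FLANK_5P + EXON_1 + INTRON_1
probe_length = insert_length + VECTOR_AND_LINKER

# Offsets within the genomic insert (assumption: numbered 1..406,
# mRNA-sense orientation).
cap_offset = FLANK_5P + 1          # first transcribed base
donor_offset = FLANK_5P + EXON_1   # last base of the first exon

print(insert_length, probe_length)
print(cap_offset, donor_offset, donor_offset - cap_offset + 1)
```

The last printed value is the length of the stretch expected to be protected by spliced mRNA, which should equal the length of the first coding block.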
As can be seen in the control lane (see lane 2 in Fig. 2) mung bean nuclease does not necessarily cut at every base, and there are regions where the enzyme cuts at a much slower rate. Therefore, depending on the sequence of the probe there will be a certain amount of error in determining the initiation and splice sites. However, the maximum error does not exceed 2-3 bases in the case of β-globin RNA. The S1 mapping method of Berk and Sharp (1) also has a certain amount of error due to the tendency of S1 to digest the end of duplex molecules.
The ribosome binding sites in the first coding region were also mapped with P-RNA. As seen in Figs [...] ribosomes which have progressed to different parts of the mRNA during translation. Thus, P-RNA may represent a complex mixture of globin mRNA fragments which are derived from the entire region of the globin RNA between the start and end of the ribosome binding sites. P-RNA should produce the same DNA ladder pattern as the control but reduced in intensity, compared to the control, between the start and end of the ribosome binding sites. P-RNA protects the probe 4 bases short of the splice donor site, for some unknown reason. Fig. 4 shows the result obtained with the 490-base probe containing the 466-base globin cDNA fragment (mβG2). As expected polysomal RNA shows a clear blank except at the extreme ends which contain M13mp7 sequences. When P-RNA was included in the hybridization, the probe was pro- [...] bases and most likely forms a folded structure. The same region is shown with another probe (see Fig. 5).
The transcription termination site and the last splice acceptor site were mapped with the fragment containing the third coding block. The 600-base fragment (mβG7) contains approximately 300 bases of intervening sequences, the third coding sequence, and 58 bases of the 3' untranscribed region. The results obtained with this fragment are shown in Fig. 5. Polysomal RNA (lane 1) shows a region which is a clear blank and this corresponds to the length of the third coding region. The boundary between the digested and undigested regions in lane 1 corresponds to the splice acceptor site (IVS) and the poly(A) addition site (POLY A). The splice acceptor site was found to be within 2 bases of the putative splice site (at 299 instead of 297 bases). The splice acceptor site was determined from the sequence of mRNA (13) and that of the 600-base probe. Again we find that the second splice site occurs at the same position as the site in mammalian β-globin mRNA (9, 14). The poly(A) addition site agrees exactly with the published site for β-globin RNA (13).
P-RNA (lane 2) shows an area of reduced digestion by mung bean nuclease and the boundaries between the reduced and unreduced areas are the splice junction (IVS) and the junction indicated by an arrowhead. This site is again 15 bases past the translation stop codon, UAA, and is in good agreement with the result obtained with the cDNA probe (see Fig. 4). The bands indicated by asterisks are the same sites seen in Fig. 4 which are not well protected by ribosomes against micrococcal nuclease digestion. The region indicated by a bracket is the region in the single-stranded probe which is relatively resistant to cleavage by mung bean nuclease. Overexposure of the same region in the cDNA probe is shown in Fig. 4, lane 3. There are other regions in the probe which are relatively resistant to mung bean nuclease presumably due to formation of a folded structure.
As demonstrated in this report the method used can map transcription initiation, termination, and RNA splice sites within a few bases of the reported sites. Also the method can be used to determine the protein binding sites on a known RNA (abundant or less abundant) as long as appropriate gene fragments are available. The same method may also be used for mapping the secondary structure of a known RNA, using S1 resistant RNA fragments. The main advantage of this method over the S1 mapping method of Berk and Sharp is that the position of the labeled end does not have to be within the RNA region. Also both sides of a splice site and the termination site can be determined easily. The major drawbacks of the method described here are: 1) that hybridizable RNA should be in excess over the probe, 2) that single strand specific nucleases do not produce bands at every base, and 3) that there are regions in the single-stranded probe that are relatively resistant to cleavage by mung bean nuclease, probably due to secondary structure. Therefore, it is important to include a control sample which contains no RNA that hybridizes to the probe to ensure that the boundary between nuclease-sensitive and nonsensitive regions does not fall into a nuclease-resistant region of the probe. However, the experiments with chicken β-globin mRNA show that the boundaries can be mapped within a few bases from the known initiation, termination, and RNA splice sites. The hybridization of RNA to the probe may disrupt the secondary structure of the single-stranded DNA and the nuclease may cut the end of duplex regions. If this is true there will be no difficulty in finding the boundary with reasonable accuracy. The results using RNA derived from the polysomes digested with micrococcal nuclease show the ribosomes bind to the chicken β-globin mRNA starting at 22 bases upstream from the translation start codon, AUG. Also there is no ribosomal binding after 15 bases past the stop codon, UAA.
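The role of the control lane can be illustrated with a small Python sketch. It is a hypothetical check, not part of the procedure reported here: the function names, the band positions, and the threshold `max_gap=3` (chosen to echo the 2-3 base error quoted above) are all assumptions. The idea is that a boundary is only as well defined as the spacing of the control bands that flank it.

```python
from bisect import bisect_left


def flanking_control_bands(position, control_bands):
    """Return the nearest control band below `position` and the nearest
    control band at or above it (None where no such band exists)."""
    bands = sorted(set(control_bands))
    i = bisect_left(bands, position)
    below = bands[i - 1] if i > 0 else None
    above = bands[i] if i < len(bands) else None
    return below, above


def boundary_is_resolved(position, control_bands, max_gap=3):
    """True if the control lane has bands on both sides of the boundary
    no more than `max_gap` bases apart, i.e. the boundary does not fall
    inside a nuclease-resistant stretch of the probe."""
    below, above = flanking_control_bands(position, control_bands)
    if below is None or above is None:
        return False
    return (above - below) <= max_gap


# Hypothetical control lane with a nuclease-resistant stretch at 15-29.
control = [10, 11, 13, 14, 30, 31]
print(boundary_is_resolved(12, control))  # flanked by bands at 11 and 13
print(boundary_is_resolved(20, control))  # falls inside the resistant stretch
```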
The start site we determined is 11 bases away from the sequence which shows strong homology with the putative ribosome binding site, the Shine-Dalgarno sequence (15) (see Scheme 1). Day et al. (14) proposed another possible ribosome binding site closer to the start codon (indicated by dots) which shows weaker sequence homology with the putative ribosome binding site, the 3' end of 18 S ribosomal RNA. Our result suggests that the site closer to the start codon may in fact be the ribosome binding site (Scheme 1). It has been shown that ribosomes protect about 35 bases of mRNA from nucleases (16,17). The protection of 15 bases past the stop codon may be due to physical hindrance to nuclease attack by the ribosome bound to the UAA codon.
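The numbers quoted above can be combined in a simple piece of arithmetic, sketched here in Python. The sketch rests on assumptions that are not stated in the text: that the 22 and 15 bases are counted outside the 3-base codon itself, and that a single ribosome with a footprint of about 35 bases (16, 17) accounts for each boundary. The variable names are hypothetical.

```python
FOOTPRINT = 35        # approximate bases protected by one ribosome (16, 17)
CODON = 3
UPSTREAM_OF_AUG = 22  # protection starts this many bases before AUG
PAST_UAA = 15         # protection ends this many bases after UAA

# Bases of the footprint left over on the far side of each codon,
# under the assumptions stated above.
downstream_of_aug = FOOTPRINT - UPSTREAM_OF_AUG - CODON
upstream_of_uaa = FOOTPRINT - PAST_UAA - CODON

print(downstream_of_aug, upstream_of_uaa)
```

Under these assumptions an initiating ribosome would cover roughly 10 bases beyond AUG, and a ribosome at the stop codon roughly 17 bases before UAA. The values shift with the footprint size assumed.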