Rabbit globin mRNA: analysis of T1 RNAse digestion fragments.

Rabbit globin complementary DNA made with RNA-dependent DNA polymerase (reverse transcriptase) was used as a template for in vitro synthesis of 32P-labeled RNA and deoxysubstituted RNA. The sequences of the nucleotides in most of the fragments resulting from combined ribonuclease T1 and alkaline phosphatase digestion have been determined. In addition, the 3' nearest neighbor was determined for several fragments resulting from digestion with T1 ribonuclease. The utility of the deoxysubstitution technique was demonstrated by the ease with which the sequences of pyrimidine-rich fragments could be determined. Many sequences thus determined were long enough to fit uniquely with the alpha- or beta-globin amino acid sequences. The positions of these fits were found to be clustered, leading us to believe that only certain regions of the complementary DNA are transcribed by Escherichia coli RNA polymerase. Other unique characteristics of RNA synthesis from a complementary DNA template include a high yield of free poly(A) and the fact that one must use low rather than high salt buffers to obtain transcripts of high molecular weight.


(2). Not only do the globin mRNA sequences yield a large number of T1 fragments, but many of these fragments proved difficult to sequence with the more commonly used techniques. One of our major aids in solving such problems has been the use of deoxysubstitution techniques (3,4

Chromatography and Electrophoresis
Sepharose chromatography was used in some cases to obtain rapid approximate estimates of the lengths of synthesized RNAs (Sepharose 4B, Pharmacia).
The column (0.6 x 12 cm) was made and chromatography performed by the same procedures as the Bio-Gel P-60 columns used for desalting the RNA product as discussed in a later section.
Oligo(dT)-cellulose chromatography was carried out using grade T2 oligo(dT)-cellulose obtained from Collaborative Research. The chromatography procedure was the same as that described by Aviv and Leder (36)  Fractions containing the peak were pooled (usually three to four tubes). In later experiments, the reaction was terminated with 20 µl of phenol and 10 µl of 1 M EDTA prior to desalting.

Sequence Determination Procedures
32P-labeled RNA was prepared for digestion with T1 ribonuclease or for combined digestion with T1 ribonuclease and bacterial alkaline phosphatase.
Because of its freedom from DNase contamination, U1 ribonuclease was often used in place of T1 ribonuclease, especially when analysis of deoxysubstituted RNA was carried out. Carrier RNA (E. coli RNA) (20 µg) was added to the pooled fractions, usually 300 to 400 µl in volume.
The material was evaporated to a smaller volume (10 to 30 µl) in a silanized tube by a combination of vacuum, Vortex mixing, and heat. The sample was then transferred to a silanized polyethylene sheet and dried in a vacuum desiccator. Enzymatic digestions, electrophoresis, and other subsequent analysis and sequence determination procedures were carried out as described by Sanger and colleagues (11-13) with some additions and modifications.
Analysis of T1 or T1 phosphatase digestion fragments with carboxymethyllysine-41-pancreatic ribonuclease was carried out as described by Paddock and Abelson (41) were added and the mixture spotted on PEI-cellulose plates for analysis as above.

RESULTS
In Vitro RNA Synthesis Characteristics
The in vitro RNA synthesis reaction using globin cDNA as template has some unique features: (a) poly(A) synthesis exceeds the synthesis from the heteropolymeric region of the template; (b) the size of the RNA synthesized is relatively small (roughly 70 nucleotides) when the synthesis is performed in high salt, and larger (approaching template size) when synthesis is carried out in low ionic strength buffer.

Poly(A) Synthesis-The single-stranded cDNA template with its stretch of deoxythymidylic acid at the 5' end poses unique problems for in vitro RNA synthesis. The incorporation of AMP is 5- to 10-fold more than that observed for other nucleotides (Fig. 1a), suggesting that considerable poly(A) is synthesized. This is substantiated by the presence of large pancreatic ribonuclease-resistant radioactive fragments (average size about 90 nucleotides) in digests of material synthesized with [α-32P]ATP as the radioactive precursor (Fig. 2).

Synthesis of poly(A) can be reduced 2-fold by preincubation with GTP for a short period before adding the other nucleoside triphosphates (Fig. 1b) confirmed by experiments involving digestion with T1 RNase after the RNA synthesized in vitro was purified by oligo(dT)-cellulose chromatography (data not shown). These results indicate that even when preincubation with rGTP is used, 90% of the heteropolymeric product is not covalently attached to the poly(A) sequences.
In an attempt to decrease the poly(A) synthesis, the antibiotic rifampicin was added to the reaction mixture after preincubation with GTP but before the addition of the other nucleoside triphosphates. Since rifampicin is known to inhibit RNA chain initiations but not chain elongations with double-stranded DNA as template (42), it was thought that the addition of the antibiotic would limit initiations to those regions primed with GTP. But in the case of the in vitro RNA synthesis with globin cDNA template, rifampicin stopped RNA synthesis immediately, apparently inhibiting RNA chain elongation as well as chain initiation (Fig. 3). The discrepancy between this and previous results may lie with the difference in templates: globin cDNA is a single-stranded DNA, whereas previous work with rifampicin was done with double-stranded templates. A similar result has been observed with a different inhibitor by Walter et al. (43), who showed that heparin inhibited RNA chain initiations when double-stranded DNA was used as a template and all RNA synthesis when single-stranded DNA was used.
Ohasa and Tsugita (44)  synthesis is detected without template or when bacteriophage M13 DNA is used as a template. The poly(A) synthesis phenomenon we observe resembles that observed by Chamberlin and Berg (45), who reported a template-dependent synthesis of poly(A) from heat-denatured λ, T2, or calf thymus DNA when ATP is the only nucleoside triphosphate present. Unlike our results, they observed that the synthesis of poly(A) was strongly suppressed when other nucleoside triphosphates were added to the reaction mixture. They postulated that oligo(dT) tracts within the single-stranded DNAs were priming the synthesis of poly(A) longer than the oligo(dT) tracts due to slippage, which would be limited when other nucleoside triphosphates were present. In our case, since the oligo(dT) tracts were located at the 5' ends rather than the interior of the template molecules, the addition of other nucleoside triphosphates would not be expected to inhibit the synthesis of poly(A) according to the model of Chamberlin and Berg.
The presence of large amounts of poly(A) does not interfere with the fingerprint analysis unless [α-32P]ATP is the radioactive precursor. Fig. 4 shows fingerprints of combined ribonuclease T1 and bacterial alkaline phosphatase digests of in vitro synthesized globin RNA in which the radioactive precursors are [α-32P]GTP (Fig. 4a), [α-32P]UTP (Fig. 4b), and [α-32P]CTP (Fig. 4c). The fingerprints contain the graticule patterns expected of a combined ribonuclease T1 and alkaline phosphatase digest, and the unlabeled poly(A) does not pose any problems. When [α-32P]ATP is the precursor, we have found that further purification steps are necessary to remove poly(A) sequences, since commercial bacterial alkaline phosphatase and bovine pancreatic ribonuclease preparations contain nuclease activities which cleave poly(A) sequences. Fig. 5 (Fig. 5c) has faint spots resembling a regular RNase T1 graticule leaving the undigested [32P]poly(A) at the origin, indicating that this enzyme is highly pure and specific. By pretreating alkaline phosphatase with diethylpyrocarbonate (46) and by using pancreatic ribonuclease under high salt conditions (47), one can eliminate such degradations of poly(A) tracts. Alternatively, we have found it convenient to remove the poly(A) sequences by passing the enzymatic digests through an oligo(dT)-cellulose column (36). Fig. 6 shows a fingerprint of a combined RNase T1 and alkaline phosphatase digestion of in vitro synthesized RNA labeled with [α-32P]ATP and purified on an oligo(dT)-cellulose column. This step, of course, is unnecessary if the labeled precursor is any nucleoside triphosphate other than ATP.
Influence of Salt-The influence of KCl on the in vitro synthesis of RNA with E. coli RNA polymerase has been extensively studied. Most of the templates used in these studies are double-stranded, such as bacteriophage T4 DNA and E. coli DNA, and the results of these studies indicate that KCl allows more extensive synthesis and reinitiation of synthesis (48). However, in the in vitro RNA synthesis with single-stranded globin cDNA as template, we have found that smaller transcripts are produced at higher salt concentrations.
For example, in the presence of 0.15 M KCl ("high salt"), the globin RNA synthesized in vitro has an average length of 70 nucleotides (Fig. 7a), whereas when KCl is omitted a much larger product is obtained (Fig. 7b). Electrophoresis of in vitro synthesized globin RNA with KCl omitted in a 10% acrylamide gel (data not shown) indicates that most of the RNA produced is in the size range of 70 to 300 nucleotides. Maitra et al. (49) observed a similar salt effect on the product length of in vitro synthesized RNA using denatured T4 DNA as a template. It was hoped that the smaller transcripts synthesized in high salt might originate from a specific region of the cDNA template; however, fingerprints obtained from material synthesized in high and low ionic strength buffers were similar (data not shown).

Two-Dimensional Fingerprints
Figs. 4 and 6 show fingerprints of RNA synthesized in vitro from globin cDNA with each of the four labeled precursors. The fingerprints are similar, with expected differences due to the labeling (for example, due to the phosphatase treatment, the graticule containing no uridine residues should not be visible when [α-32P]UTP is the radioactive precursor). Subsequent analysis of the fragments by various enzymatic cleavages shows that the fingerprint patterns are highly reproducible. This consistency extends over different syntheses from the same batch of cDNA and from cDNA preparations made at different times. This is a necessary prerequisite for any subsequent sequence determinations, since procedures involving labeling with one radioactive precursor at a time require that the experimenter be able to recognize corresponding fragments from different fingerprints with a high degree of confidence.
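The expected labeling differences can be illustrated with a minimal sketch. It assumes the usual nearest-neighbor transfer rule (the α-phosphate of an incorporated nucleotide ends up on its 5' side) and that phosphatase removes the 3'-terminal phosphate of every T1 fragment, so that a fragment keeps label only when the labeled nucleotide occupies a position other than the first. The function name and the uridine-free fragment AACG are hypothetical; C-C-U-G and U-C-U-G are Fragments 4 and 17.

```python
def visible_after_phosphatase(fragment, labeled_base):
    """True if a T1 + phosphatase fragment retains 32P from the labeled precursor."""
    # Internal phosphates are donated by nucleotides 2..n of the fragment.
    # The phosphate donated by nucleotide 1 leaves with the preceding Gp
    # and is removed by phosphatase, so position 1 never contributes label.
    return labeled_base in fragment[1:]


fragments = ["CCUG", "UCUG", "AACG"]  # AACG is a made-up uridine-free example
for label in "GUC":
    seen = [f for f in fragments if visible_after_phosphatase(f, label)]
    print(f"[alpha-32P]{label}TP:", seen)
```

Under these assumptions a fragment lacking uridine drops out of the UTP-labeled fingerprint, while every fragment of two or more nucleotides remains visible with GTP label because G occupies the 3' end.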
To ensure proper correlation of data between different experiments, we have also carried out experiments in which three of the four nucleoside triphosphates (rGTP, rUTP, and rCTP) were labeled. Fig. 8

Sequence Determinations of Fragments
In the primary digestions of the globin RNA synthesized in vitro for sequence analysis, bacterial alkaline phosphatase was used in combination with ribonuclease T1 to remove terminal phosphates, thereby reducing the negative charge on each fragment. This provides better separation of the large fragments rich in uridine by increasing their electrophoretic mobilities in the DEAE-cellulose dimension. The basic methods for determining the sequences of fragments in a ribonuclease T1 fingerprint have been well described (11-13). Generally, digestions with pancreatic ribonuclease, ribonuclease U2 and/or alkaline hydrolysis, and subsequent analysis sufficed to elucidate their primary structures. However, when runs of pyrimidines occurred, partial digestions with spleen phosphodiesterase were frequently used. Partial digestion with carboxymethyllysine-41-pancreatic RNase (41) and deoxysubstitution techniques described in Paddock et al. (3) were used to determine the more difficult sequences. Using these techniques, nearly all of the fragments resulting from combined ribonuclease T1 and alkaline phosphatase digestions have been sequenced. Fig. 9 is a line drawing of the fingerprint (Fig. 8), with numbers assigned to each fragment spot. Several fragments overlap on the fingerprint and were repurified by one-dimensional homochromatography on DEAE-cellulose thin layer plates (40) before secondary digestions were performed. With this procedure, Fragments 26 and 29 can be resolved, for example, and Spot 58 separates into three distinct fragments. Table II gives the data obtained by the techniques described. Although most fragment sequences have been determined, the sequence data for Fragment 58b remain ambiguous, and a few other fragments are of such low yield that their sequences probably will not be determined (see Table I). For many of the longer sequences, especially those obtained with partial spleen digestions, confirmation has been obtained from deoxysubstitution data.
Because of many sequence similarities among the fragments, we have been able in many instances (as noted in Table II) to use small fragments as markers for identifying the products of digestion of larger fragments. For example, Fragment 4, C-C-U-G, was used as an electrophoretic mobility marker to allow an unambiguous interpretation of the results of a U2 digest of Fragment 14, C-C-A-C-C-U-G; Fragment 17, U-C-U-G, was used as a marker for a partial spleen digest of Fragment 42, U-C-C-U-C-U-G, and pancreatic digests of deoxy-C-substituted Fragments 4, C-C-U-G, and 7, C-C-C-U-G, were used as markers for a pancreatic digest of deoxy-C-substituted Fragment 26, C-C-U-C-C-C-U-G. For most of the fragments, we also electrophoresed the U2 digests and partial spleen digest of fragments from the different labeling experiments side by side in at least one experiment.
Most of the sequence determinations using the data in Table II were quite straightforward. However, the derivation of the sequences for the oligonucleotides contained in Spot 34 was an exception. The derivations are displayed in Fig. 10, which shows the sequence overlaps that can be obtained from the larger digestion products for Spot 34. From its position in the fingerprint we can deduce that all of the oligonucleotides of Spot 34 should have the base composition 3Ap, 2Up, 2Cp, and 1 G-OH. Therefore, modified pancreatic RNase product 19, A-A-C-U-U-Cp(Ap), cannot be derived from the same oligonucleotide as either product 14, (Py-)A-C-C-U-G, or product 18, (Py-)A-U-C-A-U-G; combining 19 with either 14 or 18 would yield a fragment with 3 or more uridine residues which could not run in this position on the fingerprint. In the same way we may conclude that product 15, A-A-U-Cp(Ap), and products 18 and 19 are derived from different oligonucleotides. Therefore, we may conclude that the number of oligonucleotides present in Spot 34 is greater than three unless one such oligonucleotide has the sequence A-A-U-C-A-C-C-U-G. This latter sequence may be eliminated because it has the wrong number of cytidine residues and, more importantly, because it is the wrong size: all of the components of Spot 34 are octanucleotides as shown by their co-migration upon homochromatography on a DEAE-cellulose thin layer plate, and the overlap data summarized in Fig. 10 and discussed below clearly establish the sequences.
FIG. 8 (left). Two-dimensional electrophoresis of a U1 RNase plus bacterial alkaline phosphatase digest of globin RNA labeled with [α-32P]GTP, [α-32P]UTP, and [α-32P]CTP as described under "Experimental Procedures."
FIG. 9 (right). Line drawing of the fingerprint in Fig. 8. Numbers identify the fragments, most of which have been sequenced and are shown in Table I. Nonphosphatased T1 digestion fragments appear in our fingerprints in low yield and are not included in the diagram. Missing numbers in the series in all cases are oligonucleotides that occur in very low yields or do not consistently appear (e.g. number 18). Number 56 is actually three oligonucleotides which separate on DEAE-homochromatography. They occur in low yield and little sequence information has been derived. Number 58 separates into three spots with DEAE-homochromatography. 58b may actually be a mixture of two sequence isomers. The (t) following a sequence indicates that it is tentative. There are also some data for isomers of C2AG, A&G, C2A2G, and CA2G. 3' nearest neighbors for some oligonucleotides were determined from fragments obtained from a T1 digest separated by DEAE-homochromatography in the second dimension and are shown in parentheses.
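The elimination of the candidate A-A-U-C-A-C-C-U-G can be checked by simple counting. The sketch below is illustrative only; the helper name is hypothetical, and the expected composition (3 A, 2 U, 2 C, 1 G, hence eight nucleotides) is the one stated in the text for Spot 34.

```python
from collections import Counter

# Base composition deduced for every oligonucleotide in Spot 34.
SPOT_34_COMPOSITION = {"A": 3, "U": 2, "C": 2, "G": 1}


def composition_mismatches(sequence, expected=SPOT_34_COMPOSITION):
    """Return {base: (observed, expected)} for every base whose count differs."""
    counts = Counter(sequence)
    return {
        base: (counts.get(base, 0), wanted)
        for base, wanted in expected.items()
        if counts.get(base, 0) != wanted
    }


candidate = "AAUCACCUG"  # the sequence considered and rejected in the text
expected_length = sum(SPOT_34_COMPOSITION.values())

print("length:", len(candidate), "expected:", expected_length)
print("mismatches:", composition_mismatches(candidate))
```

The candidate has nine nucleotides rather than eight and carries three cytidines rather than two, which are the two grounds for rejection given above.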
The most significant overlaps for Spot 34 are as follows. The four and one-half nucleotide overlap between products 17 and 18 establishes the sequence for oligonucleotide 34a. The three nucleotide overlap between products 6 and 19 establishes the sequence for oligonucleotide 34c. Oligonucleotide 34d is defined by the three and one-half nucleotide overlap between products 13 and 14. This leaves only the sequence determination for Fragment 34b, which is defined by the three nucleotide overlap between products 15 and 7 and the two base overlap between products 15 and 2. Admittedly we are relying on the nucleotide composition constraints to place product 2 in Fragment 34b instead of into Fragment 34d. That the nucleotide composition we had deduced by fingerprint position is correct, however, is shown by products 9 and 12, which demand that Gp is the 5' neighbor of the sequence we have deduced for Fragment 34c. The remaining data displayed in Fig. 10 and shown in Table II are also supportive of the four sequences that were determined.
An excellent example of the power of the deoxysubstitution technique is provided by the ease with which the method solves the sequence of Fragment 26, C-C-U-C-C-C-U-G. It proved to be very difficult to obtain the correct partial digestion conditions for analysis with an exonuclease such as spleen phosphodiesterase due to the high number of cytidine residues. However, with deoxysubstitution, the sequence could be determined in a single experiment.
When the 32P label was introduced in [α-32P]dCTP, pancreatic RNase digestion of this fragment (digestion only after Up is now possible) gives products C-C-Up and C-C-C-Up. Subsequent analysis with spleen phosphodiesterase shows that only C-C-Up has label in Up. Thus C-C-C-Up must be the 3' neighbor of C-C-Up. Confirmation of the sequence is provided in a dC-substitution experiment with 32P label introduced in [α-32P]rGTP. Pancreatic RNase digestion results in only C-C-C-Up appearing with radioactive label.
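The reasoning for Fragment 26 can be followed in a small simulation. This is a sketch under stated assumptions, not the procedure used in the experiments: it assumes that in dC-substituted RNA pancreatic RNase cleaves only 3' of U, that each phosphate is donated by the nucleotide on its 3' side, and that the terminal G carries no phosphate after phosphatase treatment. The function and key names are hypothetical.

```python
def digest_dc_substituted(fragment, labeled_base):
    """Cut 3' of every U and report which products carry 32P.

    labeled_base is the nucleotide whose alpha-phosphate was labeled
    ("C" for [alpha-32P]dCTP, "G" for [alpha-32P]rGTP).
    """
    products = []
    start = 0
    for i, base in enumerate(fragment):
        is_last = i == len(fragment) - 1
        if base == "U" or is_last:
            # Internal phosphate between j and j+1 is donated by nucleotide j+1.
            internal = [fragment[j + 1] == labeled_base for j in range(start, i)]
            # After cleavage the 3' phosphate stays on U; it was donated by
            # the 3' neighbor. The final nucleotide has no 3' phosphate here.
            terminal = (not is_last) and fragment[i + 1] == labeled_base
            products.append({
                "seq": fragment[start:i + 1],
                "label_in_terminal_Up": terminal,
                "labeled": terminal or any(internal),
            })
            start = i + 1
    return products


for label in ("C", "G"):
    print(label, digest_dc_substituted("CCUCCCUG", label))
```

With label from dCTP, only C-C-Up carries label in its terminal Up, so its 3' neighbor begins with C; with label from rGTP, only C-C-C-Up is labeled, which places it next to the terminal G.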
Several T1 RNase fragments separated in the second dimension on DEAE-cellulose thin layer plates by homochromatography have also been analyzed. The fingerprint of T1-digested globin RNA separated in this manner is shown in Fig. 11 and the analysis is presented in Table III. The analyses of these fragments are consistent with the data for phosphatase-treated fragments already discussed and, in addition, have allowed us to determine the nucleotides adjacent to the 3' ends for some of the fragments with "nearest neighbor" analysis. These nearest neighbor assignments are shown in Table I.
FIG. 11. T1 RNase digestion of globin RNA labeled with [α-32P]UTP as described under "Experimental Procedures." The first dimension is on cellulose acetate at pH 3.5; the second dimension is homochromatography on a DEAE-cellulose thin layer plate.

Amino Acid Correspondence of Nucleotide Fragments
We have devised a computer program which converts a given nucleotide sequence into all possible amino acid sequences and searches for possible fits within the α- and β-globin chains (each of the three possible reading frames is tried, and partially specified codons at each end of the fragment are filled in with all possible sequences so that all possible fits will be detected). As expected, nucleotide sequences up to six nucleotides in length usually have possible fits at more than one position, while sequences eight nucleotides or longer usually fit the amino acid sequences at a unique location or not at all (10). When any fragment is found to have a unique fit, it can be stated that either it comes from the corresponding position in the mRNA or, if the single observed fit is fortuitous, the fragment must be from the untranslated regions of the mRNAs. As the fragment sequences become longer, the likelihood of a "chance" fit becomes less. As has been shown previously (10), most of the sequences of eight nucleotides or longer fit into single locations in the globin amino acid chains or can be assigned to the untranslated regions. Among them are the sequences for Spots 13, 26, 31, 47, 48, and 54, which fit into the α chain, and Spots 30b, 33, 34c, 35, 45, 52, 53, 57, and 58c, which fit into the β chain (Table IV). We were able to assign unambiguously eight fragments of eight nucleotides or longer to the untranslated regions on this basis (Table V).
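A search of this kind might be sketched as follows. This is not the original program, whose details are not given here; it is a minimal reconstruction of the stated idea, using the standard genetic code, padding partial end codons with unknown bases (N), and testing all three reading frames. All names are hypothetical, and the protein string MVNFKL is a toy sequence chosen only because it contains Val-Asn-Phe-Lys; it is not a globin sequence.

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, indexed by first, second, third base in UCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def residues_for(pattern):
    """Amino acids encoded by any codon matching pattern ('N' = any base)."""
    choices = [BASES if base == "N" else base for base in pattern]
    return {CODON_TABLE["".join(codon)] for codon in product(*choices)}


def codon_sets(fragment, frame):
    """frame = number of leading nucleotides that finish a preceding codon."""
    padded = "N" * ((3 - frame) % 3) + fragment
    padded += "N" * (-len(padded) % 3)
    return [residues_for(padded[i:i + 3]) for i in range(0, len(padded), 3)]


def find_fits(fragment, protein):
    """Return (frame, start index in protein) for every possible fit."""
    fits = []
    for frame in range(3):
        sets = codon_sets(fragment, frame)
        for start in range(len(protein) - len(sets) + 1):
            if all(protein[start + i] in s for i, s in enumerate(sets)):
                fits.append((frame, start))
    return fits


toy_protein = "MVNFKL"  # hypothetical; contains Val-Asn-Phe-Lys
print(find_fits("AAUUUCAAG", toy_protein))
print(find_fits("GAAUUUCAAG", toy_protein))  # with the 5' G left by T1 cleavage
```

Prepending the G that must precede any T1 fragment extends the match by one residue: the partial codon NNG is compatible with valine (GUG), so the second call matches Val-Asn-Phe-Lys rather than only Asn-Phe-Lys.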
Among the assignments thus made, that for Spot 48 (A-A-U-U-U-C-A-A-G) deserves special comment. It fits the amino acid sequence valine-asparagine-phenylalanine-lysine beginning at position 96 of the α chain, but it must be noted that experiments using isoacceptor lysyl-tRNAs have led to some controversy as to whether the lysine (position 99) is coded by AAA or AAG (50, 51). If the lysine is indeed coded by AAA as suggested by Woodward and Herbert (50), then this fragment may actually have come from the untranslated region.
On the other hand, the discrepancy could also have arisen from possible differences in strains of rabbits used in the previous experiments.
As can be seen from
GCCCCUUG, GACAUCAUG, GCAAUCAUG, GAAUACCUG, GCUAAUAAAG, GCAAAAAUUAUG, GAAAUUUAUUUUCAUUG(C), AUC(7-8Py)CUCUG
fits), we must assume that our in vitro transcription yields in the no-fit region are too low to be seen in a fingerprint of a T1 digest. Fig. 7b provides support for the hypothesis that we are not transcribing the entire cDNA in yields sufficient to sequence.
While some of the RNA synthesized under low ionic strength conditions appears as long as the cDNA, a majority of the RNA fragments appear much shorter. As mentioned earlier, we have also electrophoresed globin RNA synthesized in vitro on acrylamide gels and found a size range of 70 to 300 nucleotides.
That most fragments are not of full length suggests that the "no-fit" regions may be under-transcribed in our system. We have also searched the possible nucleotide sequences for these "no-fit" regions using codons for the known amino acids. Several large T1 fragments are obligatory, yet their sequences do not exist in our catalogue of fragments in Table I. This is in contrast to the data communicated to us by Weissman for the human globin mRNAs, for which sequences appearing to represent the entire translated portions of the α chain have been detected. At the present time, we have no good explanation for the difference in results; a difference in the conformation of the cDNAs synthesized from the human and rabbit α-mRNAs could account for the varying results. Transcription from human β-cDNA appears to be more similar to that of rabbit, with most RNA synthesis commencing near amino acid position 40. However, weak transcription is also observed from human cDNA for regions corresponding to earlier amino acid positions. As shown in Table V, we have also assigned eight fragments to the untranslated regions of the mRNAs. In consideration of the clustering evidence displayed in Table IV, these have been assigned to the untranslated regions at the 3' ends of the 2 molecules.
Comparison of data with that obtained by Proudfoot for the 3' ends of the mRNAs (52, 53) shows that Fragments 29, 34A, 36, 55, and 58A fit into the β chain 3' untranslated region and none of the fragments in Table V