Structure of Cloned DNA Complementary to Rat Prolactin Messenger RNA*

Prolactin (Prl), growth hormone (GH), and chorionic somatomammotropin (CS) form a set (the "Prl set") of mammalian hormones which is thought to have evolved from a common ancestral gene. Until now this assumption was based on similarities in their amino acid sequences, on their often overlapping biological and immunological properties, and on nucleic acid sequence data for two members of the set (GH and CS). In the current study, we report the amplification in bacteria and the sequence analysis of DNA complementary to rat Prl mRNA. Two independent clones were obtained, one of which contained 823 bases, including the entire sequence coding for rat prolactin and its signal peptide and parts of the 5'- and 3'-untranslated regions of the mRNA. The nucleotide sequences of the two cloned DNAs differ by three nucleotides. In addition, both predicted amino acid sequences differ by about 14% from the published amino acid sequence of the protein. Although technical problems cannot be excluded for each of these discrepancies, the data suggest a revised structure for the protein and the possible existence of distinct nonallelic Prl genes. The results of this study have allowed us to compare the nucleotide sequences of the structural genes coding for each of the hormones of the Prl set. The codons of rPrl mRNA do not show the marked preference for GMP and CMP in the third position observed in rGH, hGH, and hCS mRNAs. rGH and rPrl mRNAs show 39% nucleic acid homology in the coding portion, but little homology in the untranslated regions. This differs from the comparison between hGH and hCS, where the same degree of homology (about 93%) was found in both the coding and noncoding regions. Ten per cent of the codons of rPrl and rGH are identical, and 26% differ by only one base. Of the latter, only 58% result in amino acid changes, which most often are conservative. These results strongly support the hypothesis that the Prl and GH genes derive from a common precursor.
The data allow us to suggest a date for this gene duplication event at around 380 million years ago.
The polypeptide hormones prolactin (Prl), growth hormone (GH), and chorionic somatomammotropin (CS) are closely related (Catt et al., 1967; Li et al., 1969; Sherwood, 1967; Niall et al., 1971). Their amino acid sequence homologies range from 20 to 85%, depending on the species compared, which may explain the observed overlap in some of their biological and immunological properties (Niall et al., 1973). DNAs complementary to rat GH,¹ human GH, and human CS mRNAs have been cloned and sequenced recently. More homology was found between the nucleic acid sequences of hGH and hCS than between those of hGH and rGH. These studies supported the hypothesis that the genes coding for these hormones originated from a common precursor by gene duplication and that the CS-GH gene duplication occurred at some time after the separation of the human and rat species (Bewley et al., 1972; Niall et al., 1971). Although a fragment of DNA complementary to a portion of rPrl mRNA has been cloned and sequenced (Gubbins et al., 1979), the complete nucleotide sequence of a Prl structural gene has been missing from these comparisons.
A cloned Prl cDNA would be useful along with the GH and CS cDNAs for studying the differential expression of these related genes. Both Prl and GH genes are expressed predominantly in the pituitary, whereas the CS gene is expressed predominantly in the placenta. The Prl and GH genes are also expressed in several lines of cultured rat pituitary tumor cells (Dannies and Tashjian, 1973) which can be manipulated by a variety of agents to favor their differential expression (Martin and Tashjian, 1977;Vale et al., 1973). Thus, cloned cDNA to Prl mRNA would be a useful probe for studying the molecular mechanism of this regulation. In addition, this cDNA constitutes the probe needed to isolate the genomic Prl sequences from chromosomal DNA.
We report here the cloning in bacteria and the complete nucleotide sequence of the structural gene coding for rPrl. These data enable us to propose a revised amino acid sequence for rPrl, including its signal peptide. While the data demonstrate some homology between rPrl and rGH supporting the evolutionary relatedness of the two genes, they also document major differences in the structures and codon choices. The data are consistent with the possibility that Prl is the most primitive molecule in this "Prl set" of related polypeptide hormones.

EXPERIMENTAL PROCEDURES

Animals-Female Sprague-Dawley (Wistar) rats were anesthetized with Equithesan (pentobarbital 1%, ethanol 11.5%, chloral hydrate 4%, MgSO4 2%, propylene glycol 44% in sterile water), and the medial basal hypothalami were destroyed using a modified Halasz knife as previously described (Cheung and Weiner, 1976). At the same time, 40-mm silastic capsules filled with 17β-estradiol were implanted subcutaneously. Twenty-three days later the animals were killed by decapitation, the pituitaries were removed, and trunk blood was collected for radioimmunoassay of Prl. Prl concentrations were measured using rabbit antiserum to rPrl provided by Dr. Albert F. Parlow through the National Institute of Arthritis, Metabolism and Digestive Diseases rat pituitary hormone distribution program.

¹ The abbreviations used are: GH, growth hormone; CS, chorionic somatomammotropin; Prl, prolactin.
Cultured Pituitary Cells-GH3 and GC cells were grown at 37°C in monolayer as previously described, in Ham F-10 and MEM Joklik medium, respectively. Both media were supplemented with 15% horse serum and 2.5% fetal calf serum. For hormonal stimulation, GH3 cells received fresh medium containing thyrotropin-releasing hormone (TRH, 30 nM, Sigma) and estradiol (100 nM, Sigma) 48 h before harvest. Cells were harvested with 0.05% trypsin and 0.02% EDTA.
mRNA Preparation and Characterization-The frozen pituitaries or cell pellets were added to a sterile solution containing 5 M guanidine thiocyanate, 50 mM Tris-HCl, pH 7.5, 2 mM EDTA, and 5% β-mercaptoethanol (Chirgwin et al., 1979). After homogenization of the tissue (Polytron), N-lauryl sarcosine was added to 2% and CsCl to 0.35 g/ml, and the mixture was heated for 2 min at 65°C. This solution was layered over a cushion of 5.7 M CsCl, 1 mM EDTA and centrifuged in an SW 40 rotor at 32,000 rpm for 24 h at 20°C. The resulting pellets were dissolved in 10 mM Tris-HCl, pH 7.5, 1 mM EDTA, and 1% N-lauryl sarcosine, extracted with phenol and chloroform, and precipitated with ethanol in the presence of 0.3 M sodium acetate.
Polyadenylated RNA was isolated by affinity chromatography on oligo(dT)-cellulose (P-L Biochemicals, type 7) and translated in a wheat germ cell-free system in the presence of [35S]methionine (Amersham, 600 to 1300 Ci/mmol) as previously reported. Preprolactin synthesized in the cell-free system was immunoprecipitated with rabbit antiserum to rPrl and the Cowan strain of Staphylococcus aureus as described earlier. 35S-labeled proteins were electrophoresed on 12.5% sodium dodecyl sulfate-polyacrylamide slab gels (Laemmli, 1970) at 20 mA/gel. Gels were fixed in 50% trichloroacetic acid for 30 min, washed twice for 1 h in 7% acetic acid, dried on Whatman No. 3MM paper, and exposed to X-ray film (Kodak NS2T).
cDNA Synthesis and Construction of Recombinant Plasmids-The cDNA synthesis, HindIII linker ligation, and recombinant plasmid construction leading to the cloning of the 486-base pair fragment were as described earlier, except that first strand cDNA synthesis was carried out for 75 min and endonuclease HindIII (New England Biolabs) was used rather than Hsu I.
For cloning the full length rPrl cDNA, 21 μg of rPrl-enriched polyadenylated RNA were reverse transcribed as above. After base hydrolysis of the RNA, phenol and chloroform extraction, fractionation over Sephadex G-50, and ethanol precipitation, the 3H-labeled single-stranded cDNA (about 3 μg) was 3'-dCMP-tailed (Roychoudhury et al., 1976) in a 130-μl reaction containing 140 mM potassium cacodylate, 30 mM Tris base (final pH 6.9), 100 μM dithiothreitol, 1 mM CoCl2, 50 μM dCTP, 6.2 μM [α-32P]dCTP (Amersham, 300 Ci/mmol), and 28 units of terminal deoxynucleotidyl transferase (Enzo Biochemicals). The reaction was incubated at 37°C for 24 min and was monitored by incorporation of radioactivity precipitated on DE81 paper. The reaction was stopped by phenol and chloroform extraction after an estimated 80 dCMPs had been added, and the product was fractionated on Sephadex G-50 and ethanol-precipitated. Oligo(dG)12-18 (P-L Biochemicals) was used to prime synthesis of the second strand under conditions identical with those of the first strand synthesis, except that only unlabeled nucleoside triphosphates were present. The reaction was carried out in 150 μl with 40 units of reverse transcriptase from avian myeloblastosis virus (Dr. J. W. Beard, Life Sciences, Inc.) for 90 min at 42°C. After phenol and chloroform extraction, Sephadex G-50 fractionation, and ethanol precipitation, the 1.9 μg of double-stranded cDNA were 3'-dCMP-tailed in 50 μl of the buffer described above with 500 pmol of dCTP/3'-end. The reaction was terminated by phenol and chloroform extraction after 15 dCMP residues had been added to each end, as estimated by incorporation of label. The product was fractionated on Sephadex G-50 and ethanol-precipitated with 1 μg of Torula RNA as carrier.
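Tail lengths such as "an estimated 80 dCMPs" come from a mass balance: residues added per end equal the picomoles of dCMP incorporated (from the fraction of [α-32P]dCTP label recovered on DE81 paper) divided by the picomoles of 3'-ends. The sketch below works this through for the single-strand tailing reaction. It is illustrative only: the helper names are hypothetical, 330 g/mol per nucleotide is a standard approximation, and the 800-nucleotide mean cDNA length is an assumption, not a value from the text.

```python
# Hypothetical helpers illustrating how dC-tail length is estimated from label incorporation.
AVG_NT_MASS = 330.0  # approximate g/mol per nucleotide of single-stranded DNA


def pmol_of_ends(mass_ug: float, mean_length_nt: float, ends_per_molecule: int = 1) -> float:
    """Picomoles of 3'-ends in a DNA sample of known mass and mean length."""
    mol_molecules = (mass_ug * 1e-6) / (mean_length_nt * AVG_NT_MASS)
    return mol_molecules * ends_per_molecule * 1e12


def residues_per_end(incorporated_cpm: float, total_cpm: float,
                     total_dctp_pmol: float, ends_pmol: float) -> float:
    """dCMP residues added per 3'-end.

    Labeled and unlabeled dCTP are mixed, so the fraction of label incorporated
    equals the fraction of the whole dCTP pool incorporated.
    """
    incorporated_pmol = incorporated_cpm / total_cpm * total_dctp_pmol
    return incorporated_pmol / ends_pmol


# From the text: about 3 ug of single-stranded cDNA (one 3'-end per molecule);
# 130-ul reaction with 50 uM dCTP plus 6.2 uM [alpha-32P]dCTP.
ends = pmol_of_ends(3.0, 800)          # 800-nt mean length is an ASSUMPTION
dctp_pool = (50.0 + 6.2) * 130.0       # uM x ul = pmol of total dCTP
# Fraction of label that must become bound for 80 residues per end:
target_fraction = 80 * ends / dctp_pool
print(f"{ends:.1f} pmol of ends; stop near {target_fraction:.1%} incorporation")
```

For the later tailing of double-stranded cDNA, `ends_per_molecule=2` would apply, since both 3'-ends are substrates.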
Fifty nanograms of dCMP-tailed, double-stranded cDNA were annealed to 50 ng of pBR322 that had been cleaved with Pst I (New England Biolabs) and tailed with dGMPs (16 dGMPs/end), in 100 μl containing 10 mM Tris-HCl, pH 7.4, 1 mM EDTA, and 100 mM NaCl. The mixture was heated to 80°C for a few minutes, allowed to cool gradually to 42°C, incubated overnight at 42°C, and then cooled gradually to 4°C.
Transformation and Colony Screening-Transformation of Escherichia coli χ1776 with each of the plasmid preparations was carried out as previously described, in accordance with National Institutes of Health guidelines. The yield ranged from 0.15 to 1 × 10' colonies/μg of supercoiled plasmid in the various transformations. In the first experiment, in which the DNA was ligated into the HindIII site of pBR322, the clones were screened for the ampR tetS phenotype.
Plasmid DNA was isolated from the positive clones, digested with HindIII (New England Biolabs), and analyzed by electrophoresis on 1% agarose gels and by hybridization (Southern, 1975) using a [32P]cDNA to the rPrl-enriched mRNA as probe (see below). In the second experiment, in which the cDNA was inserted into the Pst I site of pBR322, ampR tetR colonies were replica-plated onto Whatman 540 paper (Grunstein and Hogness, 1975) and screened with the cDNA probe described above. Plasmid was isolated from the positive clones, cleaved with Pst I, and analyzed by agarose gel electrophoresis and Southern hybridization, using as probe the nick-translated plasmid containing the 486-base pair rPrl insert.

Hybridization Probes-cDNA probes were prepared in 100-μl reactions containing 60 mM NaCl, 10 mM dithiothreitol, 50 mM Tris-HCl, pH 8.3, 6 mM MgCl2, 1.5 μg of oligo(dT)12-18, 1.5 μg of mRNA, 40 units of reverse transcriptase, 400 mM each of dATP, dTTP, and dGTP, and 11 μM [α-32P]dCTP dried under N2. Incubation was at 37°C for 60 min. Reactions were stopped by addition of EDTA to 10 mM, followed by phenol and chloroform extraction and fractionation over Sephadex G-50 as described above. The ethanol-precipitated void volume peak was taken up in 0.4 N NaOH, 5 mM EDTA, heated to 37°C for 60 min, and then neutralized with HCl. Incorporation of 10" cpm/μg of DNA was obtained routinely.
DNA Gel Electrophoresis-Electrophoresis on 4.5 to 10% polyacrylamide slab gels was performed in Tris/borate/EDTA buffer (Peacock and Dingman, 1967) at 30 mA/gel. Electrophoresis on 0.8 to 2% horizontal agarose gels was performed in Tris/acetate/NaCl/EDTA buffer (Peacock and Dingman, 1968) at 50 V/gel.

DNA Sequence Analysis-5'-Ends were labeled with polynucleotide kinase (Boehringer-Mannheim) and [γ-32P]ATP (Amersham, 4200 Ci/mmol) after dephosphorylation of the DNA with alkaline phosphatase (Enzo Biochemicals). 3'-Ends were labeled with the Klenow fragment of E. coli DNA polymerase I (Boehringer-Mannheim) and the appropriate [α-32P]deoxynucleoside triphosphate. The chemical cleavage technique (Maxam and Gilbert, 1977), with one modification,² was used on DNA restriction fragments. For chemical modification at the adenine residues, 25 μl of formic acid (Mallinckrodt) were added to 10 μl of [32P]DNA and allowed to react for 10 min at 20°C. The reaction was stopped by addition of 200 μl of 0.3 M sodium acetate, 0.1 mM EDTA, and 25 μg/ml of tRNA. The thin gel system was used (Sanger et al., 1977).
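In the chemical cleavage method, each base-specific reaction yields a ladder of end-labeled fragments, and a base is read from which lanes show a band at a given fragment length. The sketch below assumes the four-reaction scheme commonly used with the Maxam-Gilbert method (G, A+G, C+T, C); the text states only that formic acid (which depurinates both A and G) was used for modification at adenine, so the lane set, the function names, and the example ladder are illustrative assumptions.

```python
# Illustrative base calling from a four-lane Maxam-Gilbert ladder (hypothetical data).
def call_base(g: bool, a_plus_g: bool, c_plus_t: bool, c: bool) -> str:
    """Return the base implied by which cleavage lanes show a band at one position."""
    if g and a_plus_g:
        return "G"
    if a_plus_g:
        return "A"   # purine band with no band in the G lane
    if c_plus_t and c:
        return "C"
    if c_plus_t:
        return "T"   # pyrimidine band with no band in the C lane
    return "N"       # no interpretable band


def read_ladder(bands):
    """bands: (G, A+G, C+T, C) tuples ordered from the shortest fragment upward.

    For a 5'-end-labeled fragment this reads the sequence 5'->3';
    for a 3'-end-labeled fragment it reads 3'->5'.
    """
    return "".join(call_base(*position) for position in bands)


# Hypothetical ladder for a 5'-labeled fragment beginning AAGCTT.
example = [
    (False, True, False, False),   # A
    (False, True, False, False),   # A
    (True, True, False, False),    # G
    (False, False, True, True),    # C
    (False, False, True, False),   # T
    (False, False, True, False),   # T
]
print(read_ladder(example))
```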

RESULTS

Stimulation of rPrl Secretion and Characterization of the mRNA-Polyadenylated RNA was prepared as described under "Experimental Procedures" from rat pituitaries and from various cultured GH cells (Dannies and Tashjian, 1973). The relative abundance of rPrl mRNA in the individual preparations was analyzed by cell-free translation and, in some cases, immunoprecipitation, followed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (Fig. 1). Preprolactin and pregrowth hormone co-migrated in our gel electrophoresis system. Pregrowth hormone is the major translation product of the mRNA extracted from GC cells (Lane c), whereas immunoprecipitated preprolactin is shown in Lanes g and i. In the cultured GH3 cells, treatment for 48 h with estradiol and thyrotropin-releasing hormone (TRH) resulted in a severalfold enrichment in rPrl mRNA (Fig. 1, Lanes b and a, control and stimulated, respectively). However, the RNA extracted from normal rat pituitaries (Fig. 1, Lanes e and f, female and male, respectively) contained a greater amount of Prl or GH mRNA, or both, than the cultured cells. The normal female rat pituitary is more enriched in Prl mRNA than the normal male rat pituitary, as shown by immunoprecipitation of the translation products with antiserum to rPrl (Fig. 1, Lanes g and h). The greatest amount of Prl mRNA was found in the pituitaries of female rats treated with chronic estrogen stimulation and ablation of the medial basal hypothalamus (Fig. 1, Lanes d and i; total cell-free translation and immunoprecipitation, respectively). The serum concentrations of Prl in these animals were found by radioimmunoassay to be elevated to 1200 ng/ml, a 40-fold increase over normal. Based on the cell-free translation data, Prl mRNA represents about 80% of the total pituitary mRNA after these treatments.

Double-stranded cDNA Synthesis and Cloning-Rat Prl cDNA was synthesized and cloned in two separate experiments. In the first experiment, double-stranded cDNA was prepared from the Prl-enriched pituitary polyadenylated RNA template, using reverse transcriptase from avian myeloblastosis virus (AMV) to synthesize both strands, followed by S1 nuclease digestion of the resulting hairpin loop. Aliquots of this material digested with several restriction endonucleases generated a series of discrete DNA fragments separable by polyacrylamide gel electrophoresis (data not shown). The presence of prominent bands suggested that this cDNA was highly enriched in one or a few specific DNA species. On the basis of the cell-free translation data, the DNA complementary to rPrl mRNA was expected to be among these species.
Chemically synthesized 32P-labeled double-stranded DNA decamers containing the restriction site for HindIII (5'-C-C-A-A-G-C-T-T-G-G-3') were then ligated to each end of the double-stranded cDNA to rat pituitary mRNA. After digestion of this product with HindIII to remove the extra HindIII concatemers, the double-stranded cDNA with a single HindIII cohesive terminus at each end was purified on a preparative polyacrylamide gel. Double-stranded cDNA with a length of about 450 to 1000 base pairs was electroeluted from the gel and ligated into the HindIII site of pBR322. This plasmid was then used to transform E. coli strain χ1776 as described under "Experimental Procedures." Of 25 ampR tetS colonies, eight contained a plasmid with an insert. Only one of these eight plasmids hybridized (Southern, 1975) with a [32P]cDNA probe complementary to the initial mRNA template (Fig. 2). This plasmid was amplified in bacteria and purified. It contained a 486-base pair DNA insert, the nucleotide sequence of which was consistent with that predicted from the known amino acid sequence of rPrl (Shome and Parlow, 1977).

In the second cloning experiment, single-stranded cDNA was prepared from the same pituitary RNA preparation. Next, an oligopolymeric tract (tail) of dCMPs was added to its 3'-end as described under "Experimental Procedures." An estimated 80 dCMP residues were added per molecule of cDNA. The synthesis of the second strand could therefore be primed on this poly(dC) tail with an oligo(dG) primer and reverse transcriptase. The double-stranded product was then 3'-tailed with about 15 dCMP residues/end as before. This product was annealed to pBR322 that had previously been digested with Pst I and 3'-tailed with dGMP, and the recombinant plasmid was transformed into E. coli strain χ1776. Because the Pst I site of pBR322 interrupts the β-lactamase gene near its 3'-end, insertion of foreign DNA at this site may not abolish ampicillin resistance. In all, 210 tetR ampR colonies were obtained with 50 ng of cDNA. Twenty-five of these colonies were detected by colony hybridization to a cDNA probe prepared from the pituitary mRNA. An autoradiograph of a representative filter is displayed in Fig. 3A. Of these 25 recombinant plasmids, only one contained a cDNA insert longer than 800 base pairs (Fig. 3B; the size anticipated for full length rPrl cDNA) that also hybridized with the 486-base pair rPrl cDNA cloned in the first experiment (Fig. 3C). This insert was 823 base pairs long and had a DNA sequence compatible with that expected for rPrl cDNA.
Nucleotide Sequence of the Two Cloned rPrl cDNAs-The complete nucleotide sequences of both the 486- and 823-base pair cloned rPrl cDNAs were determined by the chemical degradation method (Maxam and Gilbert, 1977). Each cloned DNA was sequenced two or more times, and over 75% of the full length molecule both the message and antimessage strands were sequenced. The 486-base pair cloned cDNA corresponds to an internal fragment of the 823-base pair cloned cDNA (from nucleotide 230 to 716). The double-stranded DNA sequence of the larger molecule and the locations of restriction endonuclease sites are shown in Fig. 4. The rPrl mRNA sequence and the derived amino acid sequence are shown in Fig. 5. If this assignment of amino acids is correct, and if translation begins with the methionine codon at position -28 as indicated, then the primary translation product of rPrl mRNA is a protein of 25,655.65 daltons. Preliminary amino acid sequences for rPrl (Shome and Parlow, 1977) and its NH2-terminal signal peptide (McKean and Maurer, 1978) have been reported. Our predicted amino acid sequence agrees with the published sequences in 172 of the 197 amino acids of rPrl and in 27 of the 28 amino acids of the signal peptide. Our sequence does not contain the alanine residue reported at position -20 of the signal peptide, suggesting that the actual signal peptide contains 28 rather than 29 amino acids. Similarly, our data indicate that the mature hormone contains 197 rather than 198 amino acids as previously reported. Of the remaining discrepancies, 15 are omissions or additions, 9 are differences in the reported amino acid, and 4 result from two inversions of two amino acids each. Three of our changes convert the amino acid to the one found at the corresponding position in hPrl (Shome and Parlow, 1977).
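One reading that reproduces the "about 14%" difference quoted in the abstract counts each of the discrepancies listed above once against the 197 residues of the mature hormone; the text does not state the denominator explicitly, so this is an interpretation rather than the authors' stated calculation:

```latex
\frac{15 + 9 + 4}{197} \;=\; \frac{28}{197} \;\approx\; 0.142 \quad (\approx 14\%)
```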
A cluster of 18 discrepancies occurs between amino acids 70 and 104, a region whose amino acid sequence was apparently determined from chymotryptic digests (our conclusion, by analogy with the strategy used for hPrl; Shome and Parlow, 1977). Since proper alignment of peptides from such digests is often difficult, the nucleic acid sequence may be more reliable in this case. Also shown in Fig. 5 are the nucleic acid and predicted amino acid sequences for rGH. The amino acid sequences of rGH and rPrl were aligned by introducing arbitrary gaps to maximize their identity. Table I summarizes conclusions based on this comparison of the two molecules. Codon selection in rPrl mRNA is nonrandom (Fig. 6), as in rGH, hGH, and hCS. However, in contrast to these other members of the Prl set, rPrl shows no preference for GMP and CMP over AMP and UMP in the third position of the codons (Table II).

DISCUSSION
In the studies reported here, we present the analysis of two different cloned DNA fragments containing structural rPrl gene sequences. One of the fragments, containing 486 base pairs, was inserted into a plasmid after addition of HindIII linker DNA to blunt-ended double-stranded cDNA. The other fragment, which contains almost all of the bases complementary to rPrl mRNA, was inserted into a plasmid by dGMP-dCMP tailing. The nucleotide sequences of these DNAs, although generally agreeing with those expected from the amino acid sequence data (Shome and Parlow, 1977), have enabled us to propose a substantial revision of the preprolactin sequence. The nucleic acid sequence was also compared with that previously determined for rGH, revealing certain interesting differences in base composition and codon selection.
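The HindIII linker used for the 486-base pair clone (5'-CCAAGCTTGG-3') is self-complementary, so a single synthetic strand pairs with an identical copy of itself to form the double-stranded decamer, and it carries one HindIII site (A^AGCTT). The sketch below, with hypothetical helper names and looking only at the top strand, checks both properties and shows why HindIII digestion after linker ligation trims ligated linker concatemers: every copy is cut at the same position, leaving identical AGCT-overhang ends.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
LINKER = "CCAAGCTTGG"   # HindIII decamer from the text
HINDIII = "AAGCTT"      # HindIII recognition site; cut is A^AGCTT


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of site in seq."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def digest_top_strand(seq: str, site: str = HINDIII, cut_offset: int = 1) -> list[str]:
    """Cut the top strand cut_offset bases into each site (after the first A for HindIII)."""
    cuts = [i + cut_offset for i in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


print(reverse_complement(LINKER) == LINKER)   # self-complementary decamer
print(find_sites(LINKER, HINDIII))            # one site per linker
print(digest_top_strand(LINKER * 3))          # a linker trimer cut back to unit ends
```

Every internal fragment of the digested trimer is the same `AGCTTGGCCA` unit, which is why a single digestion leaves each cDNA end with just one linker-derived cohesive terminus.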
RNA Isolation and cDNA Cloning-Three unusual approaches were applied to the isolation of rPrl mRNA, its conversion to double-stranded cDNA, and its subsequent cloning. First, the production of mRNA highly enriched in Prl sequences was stimulated by subcutaneous 17β-estradiol implantation and by destruction of the medial basal hypothalamus in a group of female rats. Prl secretion from the mammalian anterior pituitary is mainly under inhibitory hypothalamic control (Takahara et al., 1974; MacLeod, 1976; Ben-Jonathan et al., 1977). Destruction of the medial basal hypothalamus increases circulating Prl concentrations while inhibiting the secretion of other anterior pituitary hormones (Bishop et al., 1972). In addition, the number of mammotrophs increases 3-fold in animals with lesions.³ Estrogen treatment also stimulates Prl synthesis (Lieberman et al., 1978; Seo et al., 1979) and causes hypertrophy of mammotrophs (Gersten and Baker, 1970). In the present study, combining these two treatments caused a more than 2-fold increase in the weight of the anterior pituitary gland (16 to 38 mg) and a 40-fold increase in serum Prl (30 to 1200 ng/ml). Analysis by cell-free translation of the RNA isolated from the pituitaries of these treated rats showed that the level of Prl mRNA was substantially greater than that seen in normal rats or in cultured pituitary tumor cells (GH3 cell line) stimulated with estrogens and thyrotropin-releasing hormone. This obviated the need for further mRNA purification and thus enabled us to synthesize cDNA highly enriched in Prl gene sequences directly. Second, for synthesis of the 823-base pair fragment, we devised a technique to prevent the loss of DNA sequences complementary to the 5'-end of the mRNA. This loss inevitably occurs when S1 nuclease is used to digest the hairpin loop joining the two strands of DNA that results from self-priming of the second strand on the first.
In this technique, an oligopolymeric tract of dCMPs was added to the 3'-end of the single-stranded cDNA. This tract was used as a template to prime synthesis of the second cDNA strand with a commercial oligo(dG) primer. Sequencing of the final double-stranded cDNA molecule, which had been dCMP-tailed again at both 3'-ends, revealed tracts of 14 and 16 dCMPs at the two ends, not the predominance of dCMPs at the 5'-end of the message strand expected from the initial tailing of the single-stranded DNA. We subsequently found that our terminal deoxynucleotidyl transferase was contaminated with exonuclease, and this may have caused loss of the 3'-poly(A) tract and adjacent sequences in our cloned molecule (including the AAUAAA found in most other eukaryotic mRNAs). The loss was probably minimal, based on estimates of the length of full length rPrl mRNA as 840 ± 40 bases (Evans and Rosenfeld, 1979). Despite this problem, the technique yielded a longer 5'-untranslated region than had previously been possible in our hands. Third, the time of first and second strand synthesis with reverse transcriptase was extended from the usual 15 min (Shine et al., 1977) to 75 to 90 min. This was done after control experiments with our enzyme indicated that [32P]dCMP continued to accumulate in specific restriction fragments of the product for up to 90 min, in agreement with the results of Buell et al. (1978).
Comparison of the Cloned rPrl cDNAs-The 486-base pair fragment, coding for amino acids 33 to 194 of the protein, the 823-base pair fragment, coding for the full length protein and untranslated regions, and the previously reported 387-base pair fragment coding for amino acids -16 to 145 (Gubbins et al., 1979) have nearly identical nucleotide sequences. The 486-base pair fragment differs from the 823-base pair fragment by only three nucleotides: adenine instead of cytosine in the second base of the codon for amino acid 98, and thymine instead of cytosine in the third base of the codons for amino acids 125 and 128. The latter two differences are silent, but the first converts asparagine in the 486-base pair clone to threonine in the 823-base pair clone; both amino acids are members of the same chemical subfamily. Such substitutions, which commonly occur in related proteins, may have little effect on the secondary or tertiary structure or on the function of the hormone (Dayhoff, 1978).
Likewise, comparison of the full length clone with the 387-base pair sequence of Gubbins et al. (1979) reveals differences in three nucleotides, at amino acids 125, 128, and 133: their sequence has CTT (leucine), ATT (isoleucine), and AGC (serine), where ours has CTC (leucine), ATC (isoleucine), and GGC (glycine), respectively. Of these differences, each of which is a single-nucleotide change within a codon, two are silent. The third leads to a serine-to-glycine change, a commonly observed substitution that may not affect biological function significantly (Dayhoff, 1978). The change in our sequence at amino acid 133 creates a second Hae III site five bases in the 5' direction from the single Hae III site previously reported. This change also creates an Eco RII site, which was confirmed by restriction enzyme analysis. Our rPrl mRNA came from Sprague-Dawley rats, whereas the RNA of Gubbins et al. was from Fischer rats. Hence, these differences may represent allelic polymorphism in the Prl molecule, as has been observed for bovine GH (Fellows and Rogol, 1969), or the cloned molecules may represent transcripts of nonallelic genes, as has been shown for rat preproinsulin (Ullrich et al., 1977). Errors in reverse transcription (Gopinathan et al., 1979) or in sequencing cannot be excluded. Of interest in this regard, two-dimensional gels of rPrl mRNA translation products have revealed two molecular forms of rPrl (Evans et al., 1978), and hybridization of rPrl cDNA to the genomic DNA of GH3 cells has given a calculated gene copy number of 2.4 per haploid genome (Evans and Rosenfeld, 1979). Whether the various rPrl cDNA molecules will be assigned to one or more chromosomal loci will depend upon the isolation, cloning, and characterization of all genomic Prl sequences from the normal rat.
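Whether a single-base codon difference is silent can be checked mechanically against the standard genetic code. The sketch below (helper names are illustrative) classifies the codon pairs listed above for amino acids 125, 128, and 133 without assuming which sequence is the reference. For amino acid 98 only the second-base change is reported, so the third base (C) is an assumption made for illustration; either pyrimidine gives the same asparagine/threonine result.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order,
# first base varying slowest, so zip() pairs each codon with its amino acid.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def compare_codons(codon_a: str, codon_b: str) -> tuple[int, str]:
    """Return (number of differing bases, 'silent' or 'X/Y' one-letter amino acids)."""
    n_diff = sum(a != b for a, b in zip(codon_a, codon_b))
    aa_a, aa_b = CODON_TABLE[codon_a], CODON_TABLE[codon_b]
    return n_diff, ("silent" if aa_a == aa_b else f"{aa_a}/{aa_b}")


pairs = {
    98: ("AAC", "ACC"),    # 486- vs 823-bp clone; third base C is ASSUMED
    125: ("CTT", "CTC"),   # codon pairs as listed for the 387-bp comparison
    128: ("ATT", "ATC"),
    133: ("AGC", "GGC"),
}
for position, (a, b) in pairs.items():
    print(position, a, b, compare_codons(a, b))
```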
Comparison of rPrl and rGH-When the amino acid sequences of rPrl and rGH are aligned with arbitrary gaps to maximize their identity,⁴ 51 of 209 amino acid residues are identical (Table I), and 118 more represent common substitutions of related residues (Dayhoff, 1978). This degree of identity and the prevalence of conservative differences suggest that the genes coding for the two hormones arose by duplication of a common ancestor (Bewley et al., 1972; Niall et al., 1971). Since the pituitaries of birds, reptiles, and amphibians all contain distinct Prl-like and GH-like molecules, the Prl-GH gene duplication has been estimated to have occurred before the divergence of fish and tetrapods, 300 to 400 million years ago (Acher, 1976). Another estimate of this time of divergence can be made by applying the "evolutionary clock" hypothesis, which assumes that neutral changes accumulate in a given protein at a constant rate, expressed as the unit evolutionary period. One unit evolutionary period is the average time, in millions of years, for a 1% difference in amino acid sequence to arise between lineages; for Prl, the unit evolutionary period is 5 (Wilson et al., 1977a).
From the 76% difference between rPrl and rGH (Fig. 5), the Prl-GH divergence is estimated to have occurred about 380 million years ago (76 × 5), in good agreement with the taxonomic estimate. This significantly predates the divergence time that we calculate for hGH and hCS, 56 million years ago, using the GH unit evolutionary period of 4 (Wilson et al., 1977a) and homology data from Martial et al. (1979).
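The clock estimate is a single multiplication: percent amino acid difference times the unit evolutionary period (UEP). The sketch below uses the uncorrected, linear form that matches the arithmetic in the text and derives the 76% figure from the 51 identical residues out of 209 aligned positions given in Table I. For hGH and hCS the text reports only the resulting 56 million years, so the 14% difference shown is back-calculated, not a measured value.

```python
def divergence_time_myr(percent_difference: float, unit_evolutionary_period: float) -> float:
    """Linear evolutionary clock: time (Myr) = % amino acid difference x UEP (Myr per 1%)."""
    return percent_difference * unit_evolutionary_period


# rPrl vs rGH: 51 of 209 aligned residues identical; UEP for Prl = 5.
percent_diff_prl_gh = 100 * (1 - 51 / 209)               # about 75.6%
t_prl_gh = divergence_time_myr(percent_diff_prl_gh, 5)   # about 378 Myr, i.e. ~380

# hGH vs hCS: only the result (56 Myr) and the GH UEP (4) are given in the text,
# so the implied difference is back-calculated here.
implied_percent_diff_hgh_hcs = 56 / 4                   # 14%

print(round(percent_diff_prl_gh, 1), round(t_prl_gh), implied_percent_diff_hgh_hcs)
```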
Using the same alignment (Fig. 5), direct comparison of the nucleic acid sequences indicates a 39% incidence of identical nucleotides. As previously noted when comparing hGH with rGH and hCS, the nucleotide homology is greater than the amino acid homology. The drift of silent and presumably neutral mutations in the comparison of the two sequences reveals that 24 of 52 single nucleotide changes within codons (Table I) are not reflected as amino acid changes. This silent mutation frequency is 46%, whereas on a random basis one would expect a rate of only 25%. Similarly high rates of silent mutations have been reported between homologous nucleotide sequences from different organisms (Jukes and King, 1979). Such high rates of silent mutations may reflect evolutionary pressure to conserve certain important structural elements, and they further support the common precursor hypothesis previously derived from amino acid sequence data.

⁴ H. Niall, personal communication.
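One way to obtain a random expectation of roughly 25% is to enumerate every single-base substitution from each of the 61 sense codons of the standard genetic code and count how many leave the amino acid unchanged, weighting all codons and substitutions equally. This weighting is an assumption; the text does not say how its 25% figure was derived. The function name is illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_base_change_spectrum() -> tuple[int, int]:
    """Count synonymous and total single-base substitutions starting from sense codons."""
    synonymous = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # start only from the 61 sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if CODON_TABLE[mutant] == aa:  # changes to a stop codon count as non-silent
                    synonymous += 1
    return synonymous, total


syn, total = single_base_change_spectrum()
observed = 24 / 52  # silent fraction of single-base codon differences, rPrl vs rGH
print(f"random expectation {syn}/{total} = {syn / total:.1%}; observed {observed:.1%}")
```

Under these assumptions the expected silent fraction is about 24%, close to the 25% quoted, against the observed 46%.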
Our cloned DNA contains 52 base pairs corresponding to the 5'-untranslated region of rPrl mRNA. The number of nucleotides still missing from the complete structure is unknown. There is no striking homology with rGH mRNA in this region, and only limited complementarity with the 3'-end of eukaryotic ribosomal RNA (Hagenbuchle et al., 1978). In the 5'-untranslated region of rPrl, the sequence GUGGU is repeated three times, beginning 47, 18, and 9 nucleotides from the AUG initiation codon. GUGG is repeated twice in rGH mRNA, 19 and 2 nucleotides from AUG; once in hGH mRNA, 19 nucleotides from AUG; and once in hCS mRNA, 5 nucleotides from AUG.⁵ GUGG is absent from the 5'-untranslated regions of the mRNAs for ovalbumin (McReynolds, 1978), human α-chorionic gonadotropin (Fiddes and Goodman, 1979), bovine corticotropin-β-lipotropin (bACTH-β-LPH; Nakanishi et al., 1979), bovine preproparathyroid hormone (Kronenberg et al., 1979), and rat preproinsulins I and II (Lomedico et al., 1979). Fig. 7 shows a possible secondary structure for this region of rPrl mRNA that leaves the GUGGU unpaired in a hairpin loop. Whether the preservation of this structure at a fixed distance from the initiation codon in these three related polypeptide hormones is significant is not known. The short palindrome (5'-UUCUCUU-3') in the 5'-untranslated region of rPrl mRNA is not present in rGH mRNA, but less of the 5'-untranslated sequence of the latter is known.
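"Palindrome" is used for two different symmetries in nucleic acid sequences: a mirror repeat, which reads the same 5'→3' and 3'→5' along one strand, and a self-complementary (reverse-complement) sequence such as a restriction site. The short sketch below, with illustrative function names, shows that the motif 5'-UUCUCUU-3' cited above is of the first kind, with the HindIII site written as RNA used as a contrasting example.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def is_mirror_repeat(seq: str) -> bool:
    """True if the sequence reads the same in both directions along one strand."""
    return seq == seq[::-1]


def is_reverse_complement_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement (restriction-site symmetry)."""
    return seq == seq.translate(RNA_COMPLEMENT)[::-1]


for motif in ("UUCUCUU", "AAGCUU"):   # rPrl 5'-UTR motif; HindIII site as RNA
    print(motif, is_mirror_repeat(motif), is_reverse_complement_palindrome(motif))
```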
The 3'-untranslated regions of rPrl and rGH mRNAs show no significant homology. In this region, rPrl lacks the guanine- and cytosine-rich segments present in rGH, hGH, and hCS mRNAs, and its guanine plus cytosine content of 36% is close to the 42% of total rat DNA (Sueoka, 1961). The sequence 5'-UUCUUAAAAGUCUAUUUCUU-3' located in the 3'-untranslated region of rPrl may correspond to the palindromic sequences found at a similar position in rGH, hGH, and hCS mRNAs. Symmetrical sequences of this type have been identified in other eukaryotic mRNAs (Wilson et al., 1977b; Proudfoot, 1977), but their function, if any, remains unknown.

P. Seeburg, personal communication; only 15 nucleotides of the 5'-untranslated region of hCS mRNA have been sequenced.
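The guanine plus cytosine figures quoted here (36% for the rPrl 3'-untranslated region, 42% for total rat DNA) are simple base-composition fractions. A minimal sketch follows; the function name is hypothetical, and excluding gaps and ambiguity codes from the denominator is an assumption of this sketch.

```python
def gc_fraction(seq):
    """Fraction of G + C among unambiguous bases (A, C, G, U or T)."""
    s = seq.upper().replace("T", "U")
    bases = [b for b in s if b in "ACGU"]  # gaps and ambiguity codes are ignored
    if not bases:
        raise ValueError("no unambiguous bases in sequence")
    return sum(b in "GC" for b in bases) / len(bases)


# The 20-nt element quoted above (not the whole 3'-untranslated region)
print(f"{gc_fraction('UUCUUAAAAGUCUAUUUCUU'):.0%}")  # -> 20%
```

Applied to the full 3'-untranslated sequence, the same function would yield the region-wide value; the short element alone is considerably more A+U-rich.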
The limited homology between the 5'- and 3'-untranslated regions of rPrl and rGH indicates that these regions have diverged faster during evolution than the coding regions. Human CS and hGH, which, as discussed above, may have diverged 56 million years ago, show 93% homology in both coding and noncoding regions; hGH and rGH, which diverged about 132 million years ago, show 75% homology in the coding region, 73% in the 5'-noncoding region, and 38% in the 3'-noncoding region; and rPrl and rGH, which diverged 380 million years ago, show 39% homology in the coding regions and about 30% in the noncoding regions. These data are consistent with the hypothesis of Nishioka and Leder (1979) that two modes of nucleotide sequence evolution can be distinguished: a slow mode that preserves sequence homology, and a faster mode in regions free of RNA or protein sequence constraints.
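The region-by-region comparisons above (coding versus 5'- and 3'-noncoding) amount to computing percent identity separately over each segment of a pairwise alignment. The sketch below assumes the two sequences are already aligned (gaps written as "-", uppercase bases) and that region boundaries are given as alignment coordinates. Identity is computed here over columns where neither sequence has a gap; that is one of several conventions and not necessarily the one used for the published percentages. All names and sequences are hypothetical.

```python
def percent_identity(aln_a, aln_b, start=0, end=None):
    """Percent identical bases over gap-free columns of aln_a[start:end] vs aln_b[start:end]."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(aln_a[start:end], aln_b[start:end])
            if x != "-" and y != "-"]  # columns with a gap in either sequence are skipped
    if not cols:
        raise ValueError("region contains no gap-free columns")
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def region_identities(aln_a, aln_b, regions):
    """Apply percent_identity to each named (start, end) region of the alignment."""
    return {name: percent_identity(aln_a, aln_b, s, e) for name, (s, e) in regions.items()}


a = "ACGU" + "AUGCCC" + "UUAA"  # hypothetical aligned toy sequences
b = "AGGA" + "AUGCCA" + "U-CC"
regions = {"5'-noncoding": (0, 4), "coding": (4, 10), "3'-noncoding": (10, 14)}
print({k: round(v, 1) for k, v in region_identities(a, b, regions).items()})
```

Reporting each region separately, rather than one overall figure, is what exposes the contrast between conserved coding regions and faster-diverging untranslated regions.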
Codon usage in rPrl is nonrandom, as has been found in other eukaryotic mRNAs. No consistent pattern has emerged, although one striking difference has been observed within the Prl set. In total rat DNA, as in vertebrate DNA generally, guanine and cytosine constitute 42% of the residues (Sueoka, 1961). However, in the third position of the codons for rGH, hGH, and hCS there is a strong preference for guanine and cytosine (74%, 76%, and 80%, respectively; Table II), whereas in rPrl the third-position choice is random (50%; Table II). Table II summarizes the usage of guanine and cytosine in the third position of the codons for several mRNAs of eukaryotic hormones. Only bovine preproparathyroid hormone mRNA has a lower guanine plus cytosine percentage than rPrl mRNA (Kronenberg et al., 1979). The guanine plus cytosine preference is more marked in those hormones of the Prl set thought to be of more recent origin; however, the extent to which this trend is general and significant is unknown.
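The third-position percentages in Table II are obtained by reading the coding sequence in frame and counting how many codons end in G or C. A minimal sketch, assuming the input starts at the A of the initiation codon and contains only the coding region; whether the stop codon was included in the published figures is not stated here, so this sketch simply counts every complete codon it is given. The function name and toy sequence are hypothetical.

```python
def gc3_fraction(cds):
    """Fraction of complete codons whose third (wobble) base is G or C."""
    s = cds.upper().replace("T", "U")
    n_codons = len(s) // 3  # a trailing partial codon is ignored
    thirds = [s[3 * k + 2] for k in range(n_codons)]
    if not thirds:
        raise ValueError("no complete codons")
    return sum(b in "GC" for b in thirds) / len(thirds)


toy_cds = "AUGCUGGCAAAGUUU"  # hypothetical: AUG CUG GCA AAG UUU
print(f"GC3 = {gc3_fraction(toy_cds):.0%}")  # -> GC3 = 60%
```

A value near 50%, as reported for rPrl, is what an unbiased choice among the four bases at the third position would give; values of 74 to 80% indicate the marked preference seen in rGH, hGH, and hCS.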
With the results of the current study, coding sequence data for at least one gene from each member of the Prl set are now available, and there is clear evidence for the evolutionary relatedness of Prl, GH, and CS. Although nucleic acid sequence data for all three hormone genes from a single species are not yet available, the rPrl and rGH genes are clearly much more different from each other than are the hGH and hCS genes. The biological activities of these hormones show considerable interspecies variation. It will therefore be of interest to apply the same analysis to this set of hormones in a variety of species to determine the extent of intra- and interspecies similarities in these genes.