Nucleotide Sequence of Bovine Prolactin Messenger RNA: Evidence for Sequence Polymorphism

Hybrid molecules containing DNA sequences complementary to bovine pituitary mRNA were constructed in the Pst I site of pBR322 by the dC·dG tailing technique. Recombinant plasmids containing bovine prolactin (bPRL) sequences were amplified in bacteria and identified by hybridization to purified [32P]bPRL cDNA sequences. Nucleotide sequence analysis was performed on the inserts from two of the positive clones. One clone, pBPRL72, contained a 982-base pair insert that included 67 nucleotides of the 5'-untranslated region, the complete coding region of the preprolactin protein (690 nucleotides), and the entire 3'-untranslated region (150 nucleotides) of bPRL mRNA. The nucleotide sequence analysis of clone pBPRL72 predicted the sequence of a 30-amino acid signal peptide and confirmed the published amino acid sequence of the protein with one exception. A comparison of the pBPRL72 cDNA sequence with a second bPRL clone, pBPRL4, revealed four silent nucleotide differences. Three of the base changes occurred in the third position of amino acid codons, and one occurred in the 3'-noncoding region. The sequence polymorphism suggests the existence of alleles or multiple loci for bPRL that do not alter the protein structure.

clones shows that the nucleotide sequence of bPRL mRNA contains heterogeneities in the coding and noncoding portions of the message. The sequence polymorphisms are silent and therefore do not affect the amino acid sequence of the bPRL protein.

The sequencing reactions were stopped with 10 µl of 90% formamide and denatured at 100 °C (8). Aliquots of each reaction were electrophoresed as described by Sanger and Coulson.

RESULTS AND DISCUSSION
Detection of Clones with Large Prolactin Inserts-We had earlier reported the cloning and sequence analysis of a 225-base pair insert coding for amino acids 119-192 of the bPRL protein (199 amino acids total) (3). In the present study, mild S1 nuclease treatment and subsequent sizing of ds-cDNA synthesized from bovine pituitary poly(A)-containing RNA were employed in order to obtain full-length PRL sequences. A total of 165 PRL-positive colonies were identified by hybridization to purified [32P]bPRL cDNA from approximately 600 transformants. Recombinant plasmids from selected colonies were prepared by the miniscreen technique of Birnboim and Doly (7), digested with Pst I, and electrophoresed in agarose gels to determine the size of the PRL inserts. The majority of the clones contained inserts of approximately 600 bp (data not shown). Two recombinant plasmids containing large inserts of approximately 750 and 950 bp were selected for sequence analysis. These plasmids were designated pBPRL4 and pBPRL72 (plasmid bovine prolactin, 4th and 72nd colonies), respectively.
Sequence Analysis of Prolactin Inserts-The insert from pBPRL72 was excised with Pst I and separated on a preparative polyacrylamide gel. Pst I digestion released a single insert fragment, indicating the absence of a Pst I site in the bPRL sequence. The purified insert was analyzed with a series of restriction enzymes. Digestion with Alu I, Dde I, Hae III, Hinf I, and Hpa II produced fragments (data not shown) of appropriate size (20-150 bp) for use as primers in the dideoxy DNA sequencing method (8, 9). Several of these fragments were used for DNA sequence analysis of the inserts from pBPRL4 and pBPRL72 according to Smith (9). The chemical method for DNA sequencing (12) was used to determine the length of the homopolymer tails resulting from the dG·dC tailing technique. Fig. 1 shows a summary of the sequencing strategy. The 982-bp insert sequence of the larger clone, pBPRL72, was determined using both sequencing methods (Fig. 2). Most areas of the insert were sequenced twice, and for 70% of the insert length both complementary strands were analyzed (Fig. 1). The cloned insert contained a total of 907 nucleotides corresponding to the bPRL mRNA sequence as well as 21 A residues from the poly(A) portion of the message. The homopolymer tails consisted of 26 and 28 bases at the 5' and 3' ends of the insert, respectively. The amino acid sequence predicted from one reading frame (Thr, +1, Fig. 2) agrees with that of the bovine prolactin protein (13) with one minor exception: the DNA sequence predicts an aspartic acid instead of an asparagine at amino acid position 31. Our assignment was confirmed on both DNA strands in this region, which suggests an alternative bPRL protein sequence at this position.
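The length bookkeeping in the abstract and in this paragraph can be cross-checked with a few lines of arithmetic. The Python sketch below uses only figures stated in the text; the variable names are illustrative.

```python
# Region lengths (nucleotides) reported for the pBPRL72 insert.
UTR5, CDS, UTR3 = 67, 690, 150
# Preprolactin: 30-residue signal peptide plus 199-residue mature hormone.
SIGNAL_AA, MATURE_AA = 30, 199
# Non-mRNA-body portions of the insert: poly(A) copy and the two homopolymer tails.
POLY_A, TAIL_5, TAIL_3 = 21, 26, 28

# 229 sense codons plus one stop codon (UAA) fill the coding region.
assert CDS == 3 * (SIGNAL_AA + MATURE_AA + 1)

mrna_sequence = UTR5 + CDS + UTR3                          # bPRL mRNA sequence in the clone
insert_length = mrna_sequence + POLY_A + TAIL_5 + TAIL_3   # whole Pst I insert

print(mrna_sequence, insert_length)  # 907 982
```

Both totals agree with the 907 nucleotides of mRNA sequence and the 982-bp insert reported for pBPRL72.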
Assignments from the DNA sequence at the indeterminate positions (Glx and Asx) of the amino acid sequence show glutamic acid for amino acids 70, 78, 118, 120, 121, 122, 143, and 145, glutamine for 71, 73, and 74, asparagine for 10 and 92, and aspartic acid for 93.
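Asx and Glx mark positions where protein analysis did not distinguish Asp from Asn or Glu from Gln; because the standard genetic code gives each of these residues its own codons, the DNA sequence settles the question. A minimal Python sketch with a hypothetical helper `resolve`; the example codons are illustrative and are not read from Fig. 2.

```python
# Standard-code codons for the four residues hidden behind Asx and Glx.
RESOLVE = {
    "GAU": "Asp", "GAC": "Asp",
    "AAU": "Asn", "AAC": "Asn",
    "GAA": "Glu", "GAG": "Glu",
    "CAA": "Gln", "CAG": "Gln",
}
ALLOWED = {"Asx": {"Asp", "Asn"}, "Glx": {"Glu", "Gln"}}

def resolve(ambiguous, codon):
    """Return the residue a codon assigns to an Asx or Glx position."""
    codon = codon.upper().replace("T", "U")   # accept DNA or RNA spelling
    residue = RESOLVE.get(codon)
    if residue not in ALLOWED[ambiguous]:
        raise ValueError(f"codon {codon} is inconsistent with {ambiguous}")
    return residue

print(resolve("Glx", "GAG"), resolve("Asx", "AAC"), resolve("Asx", "GAT"))  # Glu Asn Asp
```

The consistency check matters: a codon outside the allowed pair would signal either a sequencing error or a disagreement with the protein data, as at position 31.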
The clone pBPRL72 also contained the entire sequence corresponding to the signal peptide of the bPRL protein. Lingappa et al. (14) reported the length of this signal sequence to be 30 amino acids, but the complete amino acid sequence was not determined. The nucleotide sequence of pBPRL72 also predicts a signal peptide of this length. A single AUG codon (Met, -30, Fig. 2) is found in the sequence preceding the codon for the NH2-terminal threonine of the authentic bovine prolactin protein. In addition, the locations of the eight leucines in the hydrophobic center of the signal sequence reported by Jackson and Blobel (15) are predicted correctly from the DNA sequence of the pBPRL72 insert. The amino acid sequence deduced from the reading frame following the AUG codon represents the first complete protein sequence for the signal peptide of bPRL. This signal peptide is similar to other known signal peptides and is the longest known signal sequence of a secreted protein (15).
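The start-codon reasoning above (find the AUG upstream of Thr +1, then read codons to the stop) can be sketched in Python. This assumes the simple rule that translation starts at the first AUG in the message; the function name and the toy mRNA are hypothetical, and the toy encodes a short peptide with arbitrary codons rather than the bPRL sequence.

```python
from itertools import product

# Standard genetic code; the 64 codons are enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_from_first_aug(mrna):
    """Return (5'-UTR length, one-letter peptide) read from the first AUG."""
    start = mrna.find("AUG")
    if start < 0:
        return None, ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":          # stop codon ends the open reading frame
            break
        peptide.append(aa)
    return start, "".join(peptide)

# Hypothetical toy mRNA: short 5'-UTR, six sense codons, UAA (ochre) stop.
toy = "GCUCAGC" + "AUGGACAGCAAAGGUUCU" + "UAA" + "GCAAA"
print(translate_from_first_aug(toy))  # (7, 'MDSKGS')
```

The index of the first AUG is the 5'-untranslated length; applied to the pBPRL72 mRNA sequence, the same logic would correspond to the 67 nucleotides preceding Met (-30).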
The start of the signal peptide sequence in pBPRL72 is preceded by 67 nucleotides corresponding to the 5'-untranslated portion of bPRL mRNA. By electrophoresis in denaturing gels, we have previously estimated the size of bPRL cytoplasmic mRNA to be approximately 1000 nucleotides, including the 3'-poly(A) segment (1). If the poly(A) region of bPRL mRNA is 100-150 nucleotides long, the remainder of the message is roughly 850-900 nucleotides, so the 907-bp sequence of pBPRL72 may represent a nearly complete copy of the mRNA. Furthermore, electron microscopic analysis of cytoplasmic bPRL mRNA and pBPRL72 duplexes does not show an unhybridized 5'-terminal extension of the mRNA. This observation indicates that any additional sequence for this region of the mRNA is likely to be less than 50 nucleotides. Together these data indicate that the 5'-noncoding region of bPRL mRNA represented in the cloned pBPRL72 insert is nearly complete.
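The size argument in this paragraph reduces to simple arithmetic. A Python sketch using the text's figures; the 100-150 nucleotide poly(A) length is the text's stated assumption, and the 50-nucleotide ceiling comes from the electron microscopy. Variable names are illustrative.

```python
# Figures from the text (nucleotides).
MRNA_TOTAL = 1000              # gel estimate, including the 3'-poly(A) segment
POLY_A_MIN, POLY_A_MAX = 100, 150
CLONED = 907                   # bPRL mRNA sequence present in pBPRL72
MAX_MISSING_5PRIME = 50        # upper bound on any uncloned 5' extension

# Non-poly(A) length implied by the gel estimate.
implied = (MRNA_TOTAL - POLY_A_MAX, MRNA_TOTAL - POLY_A_MIN)

# The clone ends in poly(A), so only the 5' end can be incomplete.
min_coverage = CLONED / (CLONED + MAX_MISSING_5PRIME)

print(implied, round(min_coverage, 3))  # (850, 900) 0.948
```

Under these assumptions the clone covers at least about 95% of the non-poly(A) message; that the clone slightly exceeds the 850-900 estimate simply reflects the approximate nature of gel sizing.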
The complete 3'-noncoding portion of bPRL mRNA is also included in clone pBPRL72. The presence of a poly(A) sequence in this clone establishes the exact length of the 3'-untranslated region as 150 nucleotides. The AAUAAA sequence common to the 3'-noncoding regions of eucaryotic mRNAs is located 28 bases upstream of the poly(A) addition site. This sequence has been postulated to play a role in the addition of poly(A) segments to mRNAs and generally occurs approximately 20 bases before the poly(A) junction (16).
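Locating the polyadenylation signal relative to the poly(A) junction can be sketched as below. The text does not say whether its 28-base distance is counted from the first or the last base of the hexamer, so the hypothetical helper reports both; the toy 3'-end sequence is invented.

```python
def polya_signal_distance(mrna_3prime, signal="AAUAAA"):
    """Locate the last polyadenylation signal before the poly(A) tail.

    Returns (site, from_hexamer_start, spacer): the 0-based index where the
    poly(A) tail begins, the distance from the hexamer's first base to that
    index, and the number of bases between the hexamer's end and that index.
    """
    # Simplification: A residues just before the tail are counted as poly(A).
    site = len(mrna_3prime.rstrip("A"))
    start = mrna_3prime.rfind(signal, 0, site)   # hexamer wholly upstream of the tail
    if start < 0:
        return site, None, None
    return site, site - start, site - (start + len(signal))

# Hypothetical 3' end: 6 nt, the hexamer, a 13-nt spacer, then poly(A).
toy_3prime = "GCUAGC" + "AAUAAA" + "GUCACUGUUCCUG" + "A" * 10
print(polya_signal_distance(toy_3prime))  # (25, 19, 13)
```

Because `rstrip` cannot tell encoded A residues from the added tail, a real analysis would rely on the cloned poly(A) junction rather than on this rule.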
Comparison of Cloned Prolactin Sequences-Previous analysis of two partial bPRL cDNA clones from our laboratory (3) and another laboratory (17) has revealed several nucleotide differences between the cloned sequences. Sequence data from two laboratories on the rat PRL mRNA (18, 19) also suggest that nucleotide sequence heterogeneities occur in cDNA clones obtained from different inbred strains of rats. We therefore examined another PRL cDNA clone for potential sequence polymorphism. The present analysis further documents the extent of such nucleotide heterogeneities in bPRL mRNA. Clone pBPRL4, which contains an insert of approximately 750 bp, was analyzed simultaneously with pBPRL72. Approximately 50% of the pBPRL4 insert has been sequenced; it contains the nucleotide sequence corresponding to amino acid 26 through the poly(A) portion of the mRNA (Fig. 2).

Fig. 2 (fragment): 5' ... Met Asp Ser Lys Gly Ser Ser Gln Lys Gly Ser Arg Leu Leu Leu Leu Leu ...
Comparison of the insert DNA sequences from pBPRL72 and pBPRL4 indicates four positions where the identity of the base differs (Fig. 2). These assignments were confirmed on both complementary strands. All the nucleotide substitutions are silent with respect to the amino acid sequence of bPRL. Three of the changes occur in the third position of codons, while the fourth occurs in the 3'-noncoding region, 10 nucleotides from the poly(A) junction of the mRNA. Including the earlier differences, we have now identified a total of seven amino acids that display alternate bases in the third position of their codons. Both transitions and transversions have been observed. These differences are summarized in Table I. In each instance, the sequence polymorphism does not alter the protein sequence. Moreover, the differences all occur in the codons of hydrophobic amino acids; the significance of this trend is unknown at this time. Although such nucleotide differences could in principle arise from errors in reading the sequencing gels, the data were obtained from unambiguous portions of the gels and confirmed by agreement of both DNA strands.
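The classification used in this comparison (codon position, transition versus transversion, silent or not) can be applied mechanically to two aligned clones. A Python sketch, assuming in-frame, gap-free coding sequences and the standard genetic code; the function name and the toy codons are hypothetical and are not the Table I data.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}

def compare_coding(ref, alt):
    """List differences between two aligned, in-frame coding sequences.

    Each entry: (1-based position, codon position 1-3,
    "transition" or "transversion", True if the amino acid is unchanged).
    """
    if len(ref) != len(alt) or len(ref) % 3:
        raise ValueError("sequences must be aligned, equal-length and in frame")
    diffs = []
    for i, (x, y) in enumerate(zip(ref, alt)):
        if x == y:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        kind = "transition" if (x in PURINES) == (y in PURINES) else "transversion"
        c = i - i % 3                        # start of the enclosing codon
        silent = CODON_TABLE[ref[c:c + 3]] == CODON_TABLE[alt[c:c + 3]]
        diffs.append((i + 1, i % 3 + 1, kind, silent))
    return diffs

# Hypothetical aligned codons for Leu-Ala-Ile-Glu.
for d in compare_coding("CUGGCCAUUGAA", "CUAGCAAUCGAA"):
    print(d)
```

The toy pair yields three silent third-position changes (two transitions, one transversion), the same kinds of difference described for pBPRL72 and pBPRL4.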
One other nonexpressed difference is evident between the 3'-noncoding sequence of bPRL mRNA reported by Miller et al. (17) and our data. We find the nucleotides CA at bases 768 and 769 (Fig. 2), where they have reported a single G residue. The effect of this deletion/substitution on bPRL mRNA structure is unknown, but it is nevertheless silent with respect to the protein sequence. Assuming both sequences are correct, this difference represents yet another type of polymorphism in the bPRL gene.

Fig. 2 (legend, partial): ... Met (-30) and terminating with the ochre UAA codon, labeled OC. The beginning of the processed PRL protein is labeled Thr (+1). The bases marked by * are nucleotides that differ in the insert sequence of clone pBPRL4. The changes are summarized in Table I.

Table I. The location and identity of the amino acids from bPRL with differences in the third codon position
The sequence heterogeneities in these bPRL cDNA clones could arise from several sources. Although errors in the initial copying of the mRNA template by reverse transcriptase have not proven to be a significant problem in cDNA cloning, random base changes could be introduced at this step. Furthermore, growth and amplification of the recombinant plasmids in Escherichia coli may introduce mutations in the cloned sequences. However, since every heterogeneity we detect within the coding region lies in the third position of a codon, it seems unlikely that the differences result from random errors by reverse transcriptase or random mutations of the hybrid plasmids in E. coli. A third possibility is that the sequence heterogeneities result from the existence of multiple bPRL mRNA sequences. Multiple sequences may indicate a number of alleles for PRL in the gene pool of cattle or multiple loci within each animal. The poly(A)-containing RNA used to generate the cDNA clones described in this study was obtained from several animals. We have therefore initiated sequencing studies of bPRL mRNA from single animals by extension of DNA restriction fragment primers hybridized to the mRNA template. Preliminary results suggest the presence of two nucleotides at the same position on the sequencing gel, corresponding to the third nucleotide of some amino acid codons (data not shown). Animals with this type of sequence polymorphism in the cytoplasmic PRL mRNA would presumably be heterozygous at the PRL gene locus or have duplicated genes. Confirmation of these results will require cDNA cloning from a single anterior pituitary gland.

The existence of multiple alleles for bPRL may be further examined at the genomic level. Jeffreys (20) has recently described sequence polymorphism in the human globin genes of normal individuals, detected by Southern blot hybridization. All three variant restriction enzyme cleavage sites occurred in the intervening sequences of the genes, similar to the results of Lai et al. (21) with the chicken ovalbumin gene. No variants were detected in the coding regions of the globin genes examined in that study. However, restriction enzyme analysis of this type is limited to sequence changes that occur within specific enzyme recognition sites. Based on Jeffreys' analysis, an average of at least 1 in 100 bp may be expected to vary polymorphically throughout the human genome. Comparison of cloned DNA sequences examines every base rather than only those within recognition sites, and has allowed us to detect polymorphism in the coding region of bPRL.
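The argument that random reverse transcriptase or E. coli errors are unlikely can be given a rough number under a deliberately simple model, which is an assumption of this sketch and not an analysis from the paper: if each error were equally likely at any coding position, it would land in a third codon position with probability 1/3, so k independent errors would all do so with probability (1/3)^k. The function name is hypothetical.

```python
from fractions import Fraction

def p_all_third_position(k):
    """Chance that k independent random substitutions all hit third codon positions.

    Simplifying assumption: each error is equally likely at any coding
    position, so each lands in a third position with probability 1/3.
    """
    return Fraction(1, 3) ** k

for k in (3, 7):   # 3 coding differences in this comparison; 7 documented in total
    p = p_all_third_position(k)
    print(k, p, float(p))
```

This gives 1/27 for the three coding differences between pBPRL72 and pBPRL4 and 1/2187 for all seven third-position differences. Real error processes are not uniform across codon positions, so these figures indicate only an order of magnitude.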
We may be able to investigate polymorphism in the coding sequence of bPRL at the genomic level because some of the variant sequences described here occur within restriction enzyme recognition sites; polymorphism of the intervening sequences may also be informative. Such information may be useful in determining the number of PRL sequences per genome.
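Whether a variant base falls within a restriction enzyme recognition site, and so could be scored by Southern blot analysis, can be checked by scanning the sequence before and after the change. A Python sketch using the recognition sequences of enzymes named in this paper (Alu I AGCT, Hae III GGCC, Hpa II CCGG, Dde I CTNAG, Hinf I GANTC); the helper names and toy fragments are hypothetical.

```python
import re

# Recognition sequences (N = any base). All five are palindromic,
# so scanning one strand also finds sites on the complementary strand.
SITES = {"Alu I": "AGCT", "Hae III": "GGCC", "Hpa II": "CCGG",
         "Dde I": "CTNAG", "Hinf I": "GANTC"}

def _covering_sites(seq, site, pos):
    """Start indices of matches to `site` that include position `pos`."""
    # A zero-width lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=" + site.replace("N", "[ACGT]") + ")")
    return {m.start() for m in pattern.finditer(seq)
            if m.start() <= pos < m.start() + len(site)}

def site_changes(ref, pos, alt_base):
    """Report recognition sites gained or lost by a base change at 0-based `pos`."""
    alt = ref[:pos] + alt_base + ref[pos + 1:]
    changes = {}
    for name, site in SITES.items():
        before = _covering_sites(ref, site, pos)
        after = _covering_sites(alt, site, pos)
        if before and not after:
            changes[name] = "lost"
        elif after and not before:
            changes[name] = "gained"
    return changes

# Hypothetical genomic fragments (DNA alphabet), not bPRL sequence.
print(site_changes("TTAGGCCTA", 5, "T"))   # {'Hae III': 'lost'}
print(site_changes("TTAGCATT", 5, "T"))    # {'Alu I': 'gained'}
```

A variant that creates or destroys a site would appear as a restriction fragment length difference between animals; one that does not would be invisible to this approach, which is the limitation noted for the globin survey.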
Sequence Homology of Prolactin mRNAs-The primary structure of bPRL mRNA is also of interest for comparison with PRL mRNA sequences from other species. Substantial stretches of homology are apparent between the rat (18, 19), human (22), and bovine PRL sequences at both the amino acid and nucleotide level (data not shown). The overall homology for the protein sequences of bPRL and rPRL, including the signal peptide, is 57%, while the total base homology, including the untranslated portions, is 68%. Similarly, the protein sequences of bPRL and hPRL contain 73% identical amino acids, while the nucleotide sequences are 80% homologous. The bPRL mRNA sequence reported here contains more extensive sequence information for the 5'- and 3'-noncoding regions of the mRNA than the rPRL or hPRL sequences. When gaps are introduced to align them, significant homologous stretches are apparent in the untranslated regions of the three sequences. This sequence homology among the mRNAs may provide information on specific mRNA secondary structures that are functionally conserved in evolution.
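Percent homology figures depend on how gaps are counted, and the text does not state its convention. A Python sketch of percent identity over a pre-aligned pair, assuming that gap-gap columns are ignored and base-gap columns count as mismatches; the function name and the alignment are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identical positions in two pre-aligned sequences.

    Assumed convention: columns where both sequences have a gap are ignored;
    a base aligned to a gap counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    matches = sum(1 for x, y in columns if x == y)
    return 100 * matches / len(columns)

# Hypothetical alignment with one gap: 8 of 10 columns match.
print(percent_identity("AUGGAC-AGC", "AUGGAUUAGC"))  # 80.0
```

Dividing instead by the length of the shorter ungapped sequence, another common choice, would give a different figure for the same alignment, so homology percentages are comparable only under one convention.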
PRL is structurally related to growth hormone (GH), which is also produced in the pituitary (23). In comparison with bGH mRNA, which has a 59% G + C content, bPRL mRNA contains only 49% G + C. This difference is also reflected in the nonrandom codon usage in the translated portions of these two mRNAs: the bPRL message uses G or C in 61% of its codons, whereas the bGH message favors G or C in 82% of its codons. A similar difference in G + C content and codon usage between PRL and GH mRNAs is present in the rat (19) and human (22). The percent homology between the two bovine sequences also parallels that between the rat PRL and GH sequences: both sets of nucleotide sequences are approximately 38% homologous, while the proteins are approximately 23% homologous. The areas of homology between the bovine sequences are short; the longest exact amino acid match is three residues, and the longest nucleotide overlap is nine bases. Comparisons of the untranslated regions of these mRNAs are difficult because the sequences have apparently diverged extensively.
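Overall G + C content and codon-level G/C usage can both be computed directly from a coding sequence. The text does not state which codon position its 61% and 82% figures refer to; the sketch assumes the common third-position measure (GC3). Function names and the toy sequence are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)

def gc3_fraction(cds):
    """Fraction of codons whose third base is G or C (in-frame coding sequence)."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return gc_fraction(cds[2::3])   # every third base, starting at codon position 3

toy_cds = "CUGGCCAUCGAG"   # hypothetical: Leu-Ala-Ile-Glu
print(round(gc_fraction(toy_cds), 3), gc3_fraction(toy_cds))  # 0.667 1.0
```

Because the third position is often free to vary without changing the amino acid, GC3 typically diverges more between messages than overall G + C content, consistent with the larger gap in codon usage than in base composition reported for bPRL and bGH.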
The availability of the full-length bPRL cDNA clone described here will allow us to examine the expression of PRL in the bovine system. Such studies will be of interest for both comparative and mechanistic purposes.