The Gene Encoding the β-Subunit of Rat Luteinizing Hormone: ANALYSIS OF GENE STRUCTURE AND EVOLUTION OF NUCLEOTIDE SEQUENCE

The nucleotide sequence of the gene encoding the β-subunit of rat luteinizing hormone (LHβ) has been determined from a genomic DNA fragment cloned in λ phage Charon 4A. Blot hybridization of restriction enzyme digests of rat genomic DNA indicates that the gene is present in a single copy. The transcriptional unit is 0.98 kilobase in size and contains three exons interrupted by two introns of 245 and 225 base pairs (bp). The locations of the exon/intron junctions at amino acid codons -16/-15 and +41/+42 have been conserved between the rat LHβ gene and the related genes, human LHβ and human chorionic gonadotropin β. Using S1 nuclease mapping and oligonucleotide-primed reverse transcription of ovariectomized rat pituitary mRNA, the start of transcription was determined to be 7 bp upstream from the start of translation. Characteristic promoter elements are present in the 5'-flanking region of the gene, including the Goldberg-Hogness sequence, TATAAA, and the consensus CAAT box sequence, located 31 bp and 167 bp, respectively, upstream from the start of transcription. Within the proximal 200 bp flanking the 5'-end of the transcriptional unit, there is strong homology between the rat and human LHβ genes, suggesting that these regions include sequences which may be important for regulation of gene expression. Isolation and characterization of the rat LHβ gene further defines the evolution of glycoprotein hormone genes and will facilitate the study

require the coordinate production of the common α-subunit and the unique β-subunits. In the case of luteinizing hormone, synthesis and secretion by the gonadotropes of the anterior pituitary gland are stimulated by the hypothalamic peptide luteinizing hormone-releasing hormone and are subject to complex feedback regulation by the gonadal sex steroids (9).
We are interested in studying the regulation of LH at the level of gene expression. Recently, we cloned and characterized the cDNAs encoding the α- and β-subunits of rat LH (10, 11). In this paper, the cDNA encoding the β-subunit of rat LH has been used as a hybridization probe to isolate the genomic DNA fragment encoding rat LHβ. Structural and nucleotide sequence analyses indicate that the isolated genomic fragment contains the entire transcriptional unit of the gene in addition to several kilobases of 5'- and 3'-flanking regions. Introduction of this genomic fragment into eukaryotic cell lines and transgenic animals will facilitate analysis of the cellular and molecular mechanisms which regulate expression of the LHβ gene.

EXPERIMENTAL PROCEDURES
Screening of Rat Genomic Library. A library of Sprague-Dawley rat liver DNA partially digested with EcoRI and cloned in the bacteriophage λ derivative Charon 4A (12) was screened by filter hybridization (13). Using the nick-translated (14) cDNA probe rLHβ-2 (1-2 × 10⁸ cpm/μg) (11), which corresponds to amino acids +4 to +108 (75% of the coding portion of rLHβ mRNA), six recombinant phage clones were selected from 6 × 10⁵ plaques, representing approximately two genomic equivalents. Hybridizing recombinant phage clones were isolated by repeated plaque purification, and DNA was prepared from phage grown in liquid culture (15).
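The "two genomic equivalents" figure follows from multiplying the number of plaques screened by the mean insert size and dividing by the haploid genome size. The sketch below makes that arithmetic explicit and adds the Clarke-Carbon estimate of the chance that a given single-copy sequence is present in the library. Only the plaque count comes from the text; the mean insert size (10 kb) and genome size (3 × 10⁹ bp) are illustrative assumptions.

```python
import math


def genome_equivalents(n_clones, mean_insert_bp, genome_bp):
    """Haploid genomes represented by a library: clones x insert size / genome size."""
    return n_clones * mean_insert_bp / genome_bp


def clarke_carbon_probability(n_clones, mean_insert_bp, genome_bp):
    """Probability that a given single-copy sequence is present at least once.

    Clarke-Carbon estimate P = 1 - (1 - f)^N with f = insert size / genome size.
    Assumes clones are a random sample of the genome, which an EcoRI
    partial digest only approximates.
    """
    f = mean_insert_bp / genome_bp
    # log1p keeps (1 - f)^N accurate when f is very small
    return 1.0 - math.exp(n_clones * math.log1p(-f))


N_PLAQUES = 600_000           # 6 x 10^5 plaques screened (from the text)
MEAN_INSERT_BP = 10_000       # assumption: illustrative mean insert size
GENOME_BP = 3_000_000_000     # assumption: approximate haploid rat genome size

coverage = genome_equivalents(N_PLAQUES, MEAN_INSERT_BP, GENOME_BP)
p_present = clarke_carbon_probability(N_PLAQUES, MEAN_INSERT_BP, GENOME_BP)
print(f"{coverage:.1f} genome equivalents; P(present) = {p_present:.2f}")
```

Under these assumed values the library covers about two genomes, matching the text; both numbers scale directly with the assumed insert size.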
Characterization of Genomic and Recombinant Cloned DNA. Restriction enzymes (New England Biolabs or Bethesda Research Laboratories) were used under standard conditions (15) to digest recombinant phage DNA (0.1 μg) or rat (Sprague-Dawley, Charles River Breeding Laboratories) genomic DNA (10 μg). Restriction enzyme fragments hybridizing to rLHβ-2 cDNA were identified by the blot hybridization technique of Southern (16).
Subcloning of Phage Genomic DNA into pBR322. Restriction enzyme mapping in conjunction with blot hybridization analysis suggested that a common genomic fragment containing the rLHβ gene was represented in all six recombinant phage isolates (see "Results"). One of those containing a 12-kb EcoRI insert was selected for further analysis. Restriction enzyme mapping and sequence analysis indicated that the recombinant phage genomic DNA contained a 5' 1.4-kb BamHI/EcoRI nonhybridizing fragment adjacent to a 3' 2.4-kb EcoRI/BamHI fragment which hybridized to rLHβ-2 cDNA. To facilitate DNA sequencing, these fragments were subcloned into pBR322, and the plasmid DNA was purified using standard procedures.

DNA Sequencing. All DNA sequencing was performed by the chemical method of Maxam and Gilbert (17). Restriction fragments were labeled at the 5'-end using polynucleotide kinase (Bethesda Research Laboratories) and at the 3'-end using the Klenow fragment of Escherichia coli DNA polymerase I (Bethesda Research Laboratories) (17).

Identification of the Site of Initiation of Transcription. Specific primed cDNA synthesis was used to determine the sequence of the 5'-untranslated region of rLHβ mRNA. The phosphotriester method (19) was used to synthesize a 17-base deoxyribonucleotide primer complementary to the region of rLHβ mRNA corresponding to codons -5 to -10 within the signal sequence of the precursor polypeptide (Fig. 4). The primer (50 pmol) was labeled with ³²P at the 5'-end (specific activity = 6 × 10⁶ cpm/pmol), hybridized to 20 μg of total mRNA isolated from ovariectomized rat pituitary (20), and extended using reverse transcriptase (Life Sciences) (11). The labeled cDNA was isolated from an 8% polyacrylamide/urea gel, and its sequence was compared with the gene sequence in the region immediately upstream from the start of the coding portion of rLHβ in order to determine the position of the start of mRNA transcription.
To substantiate the transcriptional start position further, a second oligonucleotide (26 bases, Fig. 4), which overlapped the putative transcriptional start position in the gene sequence, was synthesized. The oligonucleotide was 5'-end labeled, hybridized to mRNA (1 μg) from ovariectomized rat pituitary, and digested with S1 nuclease (Sigma) (21), and the lengths of the protected oligonucleotide fragments were compared with those of fragments of known size on a 20% polyacrylamide/urea gel.
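Both the 17-base primer and the 26-base S1 probe are antisense to the mRNA, so each is the reverse complement of its target region. A minimal sketch of that design step follows; the example mRNA fragment is an arbitrary placeholder, not the rLHβ sequence, and the helper names are hypothetical.

```python
# Complement table accepting DNA (T) or RNA (U) input; output is DNA.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq):
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_oligo(mrna, start, length):
    """Antisense oligonucleotide (5'->3') complementary to mrna[start:start + length].

    start is a 0-based index into the mRNA written 5'->3'.
    """
    target = mrna[start:start + length]
    if len(target) != length:
        raise ValueError("window runs past the end of the sequence")
    return reverse_complement(target)


# Hypothetical 20-nt mRNA fragment (placeholder, not the rLHβ sequence).
example_mrna = "GCAUGGAGAUGCUCCAGGGA"
oligo = antisense_oligo(example_mrna, 2, 17)
print(oligo)
```

Because the oligonucleotide is antiparallel to the mRNA, its 5' (labeled) end pairs with the most downstream base of the target window.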

RESULTS

Isolation of the Genomic Fragment Encoding Rat LHβ
We have recently cloned and characterized a rat LHβ cDNA which encodes most of the apoprotein (codons +4 to +108) (11). To isolate the rat LHβ gene, this cDNA was used as a probe to screen a rat genomic DNA library constructed from a partial EcoRI digest of liver DNA ligated into bacteriophage λ Charon 4A (12). Approximately 6 × 10⁵ recombinant phage were screened, and six phage isolates hybridized strongly to the rLHβ cDNA probe. Restriction enzyme mapping in conjunction with hybridization analysis was used to characterize the phage isolates and to determine whether the six isolates contain different genomic fragments or a common genomic fragment represented in multiple recombinant phage. Five of the isolates contain a 12-kb EcoRI insert with indistinguishable BamHI restriction enzyme maps. Furthermore, using LHβ cDNA as a probe, each of these isolates was found to have an internal EcoRI site which divides the 12-kb insert into 3.5-kb nonhybridizing and 8.5-kb hybridizing fragments (Fig. 1C). The sixth isolate contains only an 8.5-kb EcoRI insert. However, this insert is indistinguishable in multiple restriction enzyme digests from the 8.5-kb hybridizing fragment found in the larger insert of the other five isolates, indicating that it is the product of a more complete EcoRI digestion of DNA in the genomic library. Thus, in all six recombinant phage isolates, the same 8.5-kb EcoRI genomic fragment was detected by the rLHβ cDNA probe.
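The interpretation of the sixth isolate rests on how partial digestion behaves: any two EcoRI sites can bound a cloned fragment if the sites between them escape cleavage, whereas a complete digest releases only fragments between adjacent sites. The sketch below enumerates both sets for a linear segment; the site coordinates are hypothetical (the two ends of the 12-kb insert plus one internal site placed 3.5 kb from one end).

```python
from itertools import combinations


def complete_digest_fragments(sites_kb):
    """Fragment sizes when every site is cut: only adjacent sites bound a fragment."""
    s = sorted(sites_kb)
    return [round(b - a, 3) for a, b in zip(s, s[1:])]


def partial_digest_fragments(sites_kb):
    """All fragment sizes a partial digest can yield: any two sites can bound
    a fragment provided the sites between them were left uncut."""
    return sorted({round(b - a, 3) for a, b in combinations(sorted(sites_kb), 2)})


# Hypothetical coordinates (kb): ends of the 12-kb EcoRI insert plus one
# internal EcoRI site placed 3.5 kb from the left end.
ecori_sites = [0.0, 3.5, 12.0]
complete = complete_digest_fragments(ecori_sites)
partial = partial_digest_fragments(ecori_sites)
print(complete)
print(partial)
```

Both the 12-kb insert and the 8.5-kb insert of the sixth isolate appear among the partial-digest products, which is the relationship used above to conclude that the two inserts share the same 8.5-kb fragment.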
Southern Blot Analysis of Rat Genomic DNA. To determine the number of rLHβ genes, Southern blots of rat genomic DNA digested with EcoRI, BamHI, or HindIII were hybridized to the rLHβ cDNA probe. A single hybridizing fragment of genomic DNA is found in each restriction enzyme digest (Fig. 2A). Furthermore, when recombinant phage DNA was digested with BamHI in parallel with total rat genomic DNA, blotted, and hybridized to the rLHβ cDNA probe, a single hybridizing fragment (3.8 kb) is detected in each digest, indicating that the single hybridizing genomic fragment is also represented in the recombinant phage (Fig. 2B). Nucleotide sequence and additional hybridization studies (below) demonstrate that the 3.8-kb BamHI hybridizing fragment contains only one copy of the LHβ gene. These results suggest that there is only a single gene encoding the LHβ subunit in the rat.
Human genomic DNA contains at least seven CGβ-like genes in tandem and inverted clusters (6), in addition to a single gene encoding hLHβ. In an effort to detect linked LHβ- or CGβ-related genes in the rat, Southern blots of recombinant phage DNA containing the 12-kb genomic fragment were hybridized and washed at low stringency (32 °C) using both rLHβ and hCGβ cDNA (22) as probes. Using either probe under low-stringency hybridization conditions and with large amounts of recombinant phage DNA (100 ng), the hybridization pattern is indistinguishable from that at high stringency (65 °C) (data not shown). Thus, while hCGβ cDNA, which has 73% nucleotide homology with rLHβ cDNA (11), readily detects the genomic fragment encoding the rLHβ subunit, no additional copies of LHβ-related or CGβ-related genes are detected in the 12-kb genomic isolate containing the rLHβ gene. Similarly, no additional LHβ-related genes are detected in blots of total rat genomic DNA hybridized to rLHβ cDNA probe at low stringency.

Absence of Repetitive DNA in the rLHβ Gene and Its Flanking Regions. Repetitive DNA sequences have been associated with a number of eukaryotic genes and can be detected by probing blots of DNA restriction digests with nick-translated genomic DNA of low specific activity (10⁷ cpm/μg) (23). In this way, only those sequences which are repeated many times in the total genome will be detected. Using total rat liver genomic DNA as a probe, no repetitive sequences are found in the 3.8-kb BamHI LHβ genomic fragment. However, on the same blot, repetitive sequences are easily detected in a similar digest of the 3'-end of the rat glucagon gene, which is known to contain repetitive DNA.
Restriction Mapping and Subcloning of the Rat LHβ Gene. Single and double restriction enzyme digests with EcoRI, BamHI, HindIII, and ApaI were used in conjunction with hybridization to rLHβ cDNA to generate a more detailed restriction map of the phage isolate (Fig. 1). ApaI, which is known to cut once within the sequence of the rLHβ cDNA probe, cleaves the 3.8-kb BamHI fragment into pieces of 2.0, 1.0, 0.5, and 0.3 kb, of which only the 0.5- and 1.0-kb fragments hybridize. This result, combined with the sequence data (below), demonstrates that the 3.8-kb BamHI fragment contains only a single copy of the LHβ gene. EcoRI cuts the 3.8-kb

F. Carr, W. Chin

LHβ Gene Sequence. The strategy used to sequence the entire transcriptional unit of the rLHβ gene (981 bp) as well as 794 bp of 5'-flanking sequence is illustrated in Fig. 1. The gene consists of three exons and two introns. The exon-intron junctions were defined by comparison of the gene sequence with the known rLHβ cDNA sequence (11). The sequences of the junction splice sites are consistent with those previously reported for consensus sequences at exon-intron boundaries (24). Intron 1 is 245 nucleotides long and is inserted between amino acid codons -16 and -15 in the apoprotein signal sequence. Intron 2 is 225 nucleotides long and is inserted between codons +41 and +42. Although these introns are slightly shorter than those found in the hLHβ and hCGβ genes (8), their relative locations within the transcriptional unit are identical, indicating strong conservation of gene structure within the glycoprotein hormone family as well as between mammalian species. The sequences of the three exons are identical to those previously reported for rLHβ cDNA (11), except that the 5'-terminal adenine reported in the untranslated tract of the rLHβ cDNA has now been shown to be guanine in the rLHβ gene sequence (Fig. 3) as well as in the primer extensions of rat pituitary mRNA (Fig. 4B).
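The single-copy argument from the ApaI digest described in the restriction-mapping paragraph can be expressed as interval bookkeeping: if ApaI cuts once inside the region covered by the probe, one gene copy should yield exactly two hybridizing fragments. In the sketch below, the order of the ApaI sites and the probe interval are hypothetical, chosen only to reproduce the reported 2.0-, 1.0-, 0.5-, and 0.3-kb pieces; the actual map is in Fig. 1.

```python
def digest(length_kb, cuts_kb):
    """(start, end) intervals produced by a complete digest of a linear fragment."""
    bounds = [0.0] + sorted(cuts_kb) + [length_kb]
    return list(zip(bounds, bounds[1:]))


def overlaps_probe(fragments, probe_start_kb, probe_end_kb):
    """Fragments that share any sequence with the probe interval."""
    return [(s, e) for s, e in fragments if s < probe_end_kb and e > probe_start_kb]


def sizes(fragments):
    """Fragment lengths in kb, rounded to hide floating-point noise."""
    return [round(e - s, 2) for s, e in fragments]


# Hypothetical layout of the 3.8-kb BamHI fragment (positions in kb).
apai_pieces = digest(3.8, [2.0, 3.0, 3.5])
probe = (2.6, 3.3)  # hypothetical probe-covered interval spanning the ApaI site at 3.0 kb
hybridizing = overlaps_probe(apai_pieces, *probe)
print(sizes(apai_pieces))
print(sizes(hybridizing))
```

A second gene copy elsewhere in the fragment would add hybridizing pieces beyond the two that flank the single internal ApaI site.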
To determine the site of transcriptional initiation in the rLHβ gene, a 17-base oligonucleotide complementary to rLHβ mRNA corresponding to amino acid codons -5 to -10 of the leader sequence was used to prime reverse transcription of ovariectomized rat pituitary mRNA. The primer extension reaction yielded two cDNAs of nearly identical lengths (Fig. 4A). These cDNAs have identical nucleotide sequences except that the longer cDNA extends 1 nucleotide further into the 5'-untranslated tract (Fig. 4B). The nucleotide sequence of the primed cDNA indicates that mRNA transcription begins 7 bases upstream from the ATG codon which signifies the start of mRNA translation. To substantiate further that the end of the primer extension product represents the actual start site for mRNA transcription, a 26-base oligonucleotide which spans the putative mRNA cap site was labeled with ³²P at the 5'-end and hybridized to rat pituitary mRNA. The heteroduplex was digested with S1 nuclease, and the lengths of the protected fragments were determined by comparison with labeled oligonucleotides of known lengths on a denaturing 20% polyacrylamide gel. As shown in Fig. 4C, following hybridization to rat pituitary RNA, fragments of 19 and 18 bases are protected from S1 nuclease digestion. The lanes showing purified oligonucleotide (lane 1) and unhybridized oligonucleotide digested with S1 nuclease (lane 2) demonstrate that these bands are not the result of heterogeneity or degradation of the oligonucleotide. The protected fragment of 19 bases corresponds to the position of transcriptional start predicted by primer extension (Fig. 4B). The single-base heterogeneity observed in both the primer extension and S1 nuclease determinations of the length of the 5'-untranslated tract may reflect true heterogeneity in the cap site of the mRNA or steric effects of the cap structure on the actions of reverse transcriptase and S1 nuclease (25).
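Primer extension and S1 mapping reduce to the same coordinate bookkeeping: a product running from the labeled 5' end of the antisense oligonucleotide to the 5' end of the mRNA has a length equal to the 5'-untranslated tract plus the number of coding nucleotides up to the base paired with the labeled end. The sketch assumes full-length products and complete pairing at the labeled end. The anchor value of 12 is derived here from the text's statement that the 19-base fragment corresponds to a 7-base tract (19 - 7); it is not read from the figure.

```python
def utr_length(product_len, anchor_cds_pos):
    """Length of the 5'-untranslated tract implied by a mapping product.

    product_len: length (nt) of the S1-protected fragment or full-length
        primer-extension product, measured from the labeled 5' end.
    anchor_cds_pos: 1-based position, counted from the A of the ATG, of the
        mRNA base paired with the oligonucleotide's labeled 5' end.
    """
    if product_len < anchor_cds_pos:
        raise ValueError("product does not reach the start codon")
    return product_len - anchor_cds_pos


# Assumption: anchor derived from the text (19-base product <-> 7-base tract).
ANCHOR = 19 - 7
for length in (19, 18):
    print(length, "nt protected ->", utr_length(length, ANCHOR), "nt 5'-UTR")
```

Under this bookkeeping the 18-base fragment corresponds to a 6-base tract, which is the single-base heterogeneity discussed above.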
Although the precise site of transcription initiation cannot be defined, the results of both the primer extension and S1 nuclease experiments are consistent with a 5'-untranslated tract which is 7 bases in length, placing the mRNA cap site at the purine base adenine. Thus, the rLHβ mRNA contains a uniquely short 5'-untranslated tract when compared with those described in a recent large survey (26).
The consensus sequence TATAAA has been proposed to define a site approximately 21-33 bp downstream at which RNA polymerase II initiates transcription (24). In the rLHβ gene, such a sequence (TATAAA) is found 31 bp upstream from the transcription initiation site determined above (Fig. 4). In addition, a consensus "CAAT box" sequence (24), which has been proposed to increase the efficiency of transcription (27), is located 167 bp upstream from the transcriptional start site (Fig. 3). The "CAAT box" is flanked by 5-bp inverted repeat sequences capable of forming a secondary structure, which may be relevant for sequence recognition.
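Locating promoter elements relative to the mapped start site amounts to scanning the sense strand for a motif and measuring its distance upstream of the transcription start. The sketch below assumes distance is measured from the first base of the motif (the text does not state its convention), and the promoter string is a hypothetical construct, not rLHβ sequence.

```python
import re


def upstream_offsets(seq, tss_index, motif):
    """Distances (bp) from each motif occurrence to the transcription start site.

    seq: sense-strand sequence written 5'->3'; tss_index: 0-based index of the
    first transcribed nucleotide. Distance is measured from the first base of
    the motif (an assumed convention). Only upstream occurrences are returned.
    """
    # Zero-width lookahead so that overlapping occurrences are all reported.
    pattern = re.compile(f"(?={re.escape(motif)})")
    starts = [m.start() for m in pattern.finditer(seq)]
    return [tss_index - s for s in starts if s < tss_index]


# Hypothetical promoter: TATAAA placed so that it begins 31 bp before the TSS.
promoter = "GG" + "TATAAA" + "C" * 25 + "A" + "CCGC"
TSS = 33  # index of the 'A' that follows the run of C's
print(upstream_offsets(promoter, TSS, "TATAAA"))
```

The same call with "CAAT" would locate a CAAT box; on real sequence, both strands and degenerate consensus variants would need to be considered.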
As noted previously (11), the 3'-untranslated tract of rLHβ cDNA contains the putative signal for polyadenylation, AATAAA (28) (Fig. 3). This signal is followed by the sequence TTTACAACTGC, which is strongly homologous (9 of 11 bp) to the consensus sequence which precedes the poly(A) tract (24). Since this sequence is followed by a poly(A) tract in the rLHβ cDNA (11), it likely represents the 3'-end of the transcript from the rLHβ gene.
Comparison of Rat LHβ and Human LHβ/CGβ Gene Sequences. As noted above, the sites of intron/exon junctions are the same in the rat LHβ, human LHβ, and human CGβ genes, consistent with the view that these genes arose from a common evolutionary precursor (1). In addition, there is 80% nucleic acid sequence homology between the rat and human LHβ coding sequences. The high degree of homology between the rat and human LHβ genes is maintained in the initial 200 bp of the 5'-flanking regions (~77%) (Fig. 5) but is less pronounced in the introns (~61%). A TATAAA box sequence is found in the same position in the rat and human LHβ genes as well as in one of the human CGβ genes (hCGβ5, Ref. 8), which is thought to be expressed (8). The conservation of additional sequences in the regions adjacent to the putative promoter raises the possibility that some of these sequences are important for the regulation of gene expression. No significant homology was detected beyond the proximal 200 bp of 5'-flanking sequence, and the consensus CAAT box sequence found in the rLHβ gene (167 bp upstream from the transcription start site) is not present in the hLHβ or hCGβ genes.
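Homology percentages such as those above are computed over an alignment. The text does not say how gaps were scored; the sketch below assumes that gapped columns are excluded (counting gaps as mismatches would give lower values), and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical bases over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Toy alignment (hypothetical sequences, not the LHβ genes).
print(percent_identity("ACGT-A", "ACCTTA"))  # 4 of 5 ungapped columns match
```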
Evolutionary Divergence of the LHβ and CGβ Genes. Comparisons of amino acid or nucleic acid sequence homology have been used as an evolutionary clock to estimate dates of divergence of related proteins (29). Nucleic acid sequence comparisons allow determination not only of the number of base changes which result in amino acid changes (replacement changes) but also of the number of base changes which have no effect on amino acid sequence (silent changes). Determination of both silent and replacement substitutions thereby provides an index of the rates of neutral genetic drift at silent or intron sites as well as the rates of divergence at replacement sites. For a given set of related genes, the rate of sequence divergence at replacement sites is nearly linear when analyzed over recent evolution (since the mammalian radiation) (30).
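The distinction between silent and replacement changes can be illustrated by translating aligned codons with the standard genetic code. This is a simplified sketch, not the per-site method of Ref. 30: codons differing at two or three positions are tallied separately rather than resolved along evolutionary pathways, and counts are not normalized by the number of potential silent and replacement sites.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def classify_codon_differences(cds_a, cds_b):
    """Count codons that differ at one position as silent or replacement."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be aligned, equal-length codon strings")
    counts = {"silent": 0, "replacement": 0, "multiple": 0}
    for i in range(0, len(cds_a), 3):
        ca, cb = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        diffs = sum(x != y for x, y in zip(ca, cb))
        if diffs == 0:
            continue
        if diffs > 1:
            counts["multiple"] += 1          # left unresolved in this sketch
        elif CODON_TABLE[ca] == CODON_TABLE[cb]:
            counts["silent"] += 1            # same amino acid
        else:
            counts["replacement"] += 1       # amino acid changes
    return counts


# Hypothetical codon strings: CTG->CTC (Leu->Leu), AAA->AGA (Lys->Arg).
print(classify_codon_differences("CTGAAAGGG", "CTCAGAGGG"))
```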
The rates of nucleotide substitution at replacement, silent, and intron sites are shown in Table I for a number of gene pairs, including human LHβ/CGβ (8), rat LHβ/human LHβ, and several other rodent/human gene pairs (30, 31). In contrast to the relatively low rates of replacement (4%) and neutral (silent = 7%, intron = 4%) substitutions in the human LHβ/CGβ gene pair (8), the rat LHβ/human LHβ gene pair demonstrates much greater divergence (replacement = 14%, silent = 36%, and intron = 39%). This suggests that the precursors of present-day rat and human LHβ genes diverged before human CGβ diverged from LHβ. The evolutionary relationship of these three hormones is illustrated in Fig. 6. By this analysis, the divergence of the hCGβ gene from the hLHβ gene has been a relatively recent evolutionary event.

5'-end with ³²P, hybridized to ovariectomized rat pituitary mRNA, digested with S1 nuclease, and separated by electrophoresis on a 20% acrylamide/urea gel. Lane 1, 5'-labeled oligonucleotide prior to S1 nuclease digestion; Lane 2, 5'-labeled oligonucleotide digested with S1 nuclease in the absence of mRNA; Lane 3, 5'-labeled oligonucleotide digested with S1 nuclease following hybridization to ovariectomized rat pituitary mRNA. Oligonucleotides of known lengths were used as markers to determine the lengths of the labeled fragments protected from S1 nuclease by mRNA (dashes at the right of the figure).

DISCUSSION
Recent evidence demonstrates that in the human there are at least seven CGβ-like genes in addition to a single LHβ gene (6). Moreover, the genes are clustered in tandem and inverted pairs, and it has been suggested that the CGβ genes have arisen by duplication of an LHβ or LHβ-like ancestral gene (8). In contrast, in the rat several lines of evidence indicate that there is only a single gene encoding an LHβ- or CGβ-like hormone. 1) Although six recombinant phage isolates were selected using an rLHβ cDNA probe, the same genomic fragment is represented in each recombinant phage. 2) Southern blots with multiple restriction enzyme digests of total rat genomic DNA demonstrate that only a single fragment from each digest hybridizes to rLHβ cDNA (Fig. 2A).
In the BamHI digest, the single hybridizing fragment is only 3.8 kb, and this fragment is shown to contain a single copy of the rLHβ gene. 3) Low-stringency Southern blots probed with either rLHβ cDNA or hCGβ cDNA do not detect additional LHβ- or CG-like sequences in either total genomic DNA or recombinant phage DNA which contains several kilobases of flanking sequence on either side of the rLHβ gene. Although it is possible that the nucleotide sequence of CGβ in the rat has diverged enough that the rLHβ and hCGβ cDNA probes cannot detect it, this seems unlikely given the high degree of homology between human LHβ and CGβ and the fact that hCGβ cDNA readily detects rLHβ.
The observation that CGβ-like genes are apparently absent in the rat is consistent with results recently obtained in several laboratories. For example, Tepper and Roberts⁴ were unable to detect either placental rat CGβ mRNA or CGβ genes using an rLHβ cDNA fragment as a probe. Similarly, we have been unable to detect mRNA encoding either the α- or β-subunit of CG in rat placentae at various stages of gestation. Wurzel et al. (33) have also been unable to detect α-subunit in rat placenta using either cDNA probes for mRNA or radioimmunoassay for the protein. These observations indicate that the rat lacks chorionic gonadotropin and suggest either that this gene has been lost in the rat or that the divergence and duplications of the CGβ-like genes occurred after the divergence of rats and higher mammalian species.
Talmadge et al. (8) recently observed that the human LHβ/CGβ gene pair has a low rate of neutral nucleotide substitutions relative to replacement site substitutions, suggesting that there has been little genetic drift between these two genes. The availability of the rat LHβ sequence now provides a third gene sequence for comparison and allows determination of the evolutionary relationship between LHβ and CGβ. In contrast to the human LHβ/CGβ gene pair, the rat/human LHβ gene pair is shown to have a relatively high rate of genetic drift at the silent and intron nucleotide positions relative to the replacement sites. Furthermore, the rates of divergence found for the rat/human LHβ gene pair are comparable to those observed in several other rodent/human gene pairs (Table I), indicating that the LHβ gene does not have an unusually high rate of divergence. Thus, there has been much greater genetic drift in the rat/human LHβ gene pair than in the human LHβ/CGβ gene pair. This suggests that the CGβ genes diverged from an LHβ ancestor after the divergence of rodents from higher mammals and is consistent with the absence of chorionic gonadotropin in rats.

⁴ M. Tepper and J. Roberts, manuscript in preparation.

Table I. The divergence of rodent and human ancestors has been estimated to have taken place approximately 85 million years ago (32).
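If replacement-site divergence accumulates roughly linearly with time, as the text assumes for recent mammalian evolution, the Table I percentages can be turned into a crude relative date by calibrating against the rodent/human split (~85 million years, Ref. 32). The sketch below performs only that proportional scaling. It illustrates the reasoning and is not a divergence date reported by the authors; with replacement differences this small, the result is very sensitive to a few substitutions.

```python
def linear_clock_myr(divergence_pct, calibration_pct, calibration_myr):
    """Proportional (linear-clock) time estimate from a calibrated gene pair."""
    if calibration_pct <= 0:
        raise ValueError("calibration divergence must be positive")
    return calibration_myr * divergence_pct / calibration_pct


# Replacement-site divergence quoted in the text (Table I).
HUMAN_LHB_VS_CGB = 4.0    # %, human LHbeta/CGbeta
RAT_VS_HUMAN_LHB = 14.0   # %, rat LHbeta/human LHbeta (calibration pair)
RODENT_HUMAN_MYR = 85.0   # estimated rodent/human divergence (Ref. 32)

estimate = linear_clock_myr(HUMAN_LHB_VS_CGB, RAT_VS_HUMAN_LHB, RODENT_HUMAN_MYR)
print(f"LHbeta/CGbeta split, crude linear-clock estimate: {estimate:.0f} Myr")
```

What the comparison supports is the ordering (the LHβ/CGβ split postdates the rodent/primate split), not any precise number; silent and intron sites are less suitable for this scaling because they saturate.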
Another placental hormone, chorionic somatomammotropin, appears to have recently diverged from growth hormone (31) (Table I). Both the LHβ/CGβ and growth hormone/chorionic somatomammotropin gene families contain multiple copies of closely linked genes (8, 31). While unequal crossover events between related genes can reduce the apparent rate of evolution between gene pairs in a given species (concerted evolution) (34), this process is unlikely to account for the very low rates of nucleotide substitution observed at the neutral sites of multiple copies of CGβ genes (8).
Analysis of the rat LHβ gene structure is an important first step toward analyzing the mechanisms which regulate its expression. Consensus sequences for the CAAT box and TATAAA box are located 167 and 31 bp, respectively, upstream from the transcriptional start site and likely represent promoter elements of the gene. Comparison of the 5'-flanking regions of the rat LHβ, human LHβ, and human CGβ genes reveals that all three genes have a TATAAA box sequence in comparable positions upstream from the coding sequence (Fig. 5). Nevertheless, the rat gene transcribes an mRNA with a 5'-untranslated tract of 7 bases, while hCGβ mRNA has a much longer 5'-untranslated tract of ~360 bases (2, 8). This indicates that, despite the strong homology with the promoter region used by the rLHβ gene, the hCGβ gene uses a different promoter located further upstream. Since LH and CG have virtually indistinguishable biological effects (35), it is possible that new features within the regulatory region of the CGβ gene have been important determinants in its divergence from the LHβ gene. For example, selection of a regulatory region in the CGβ gene which is not inhibited by the high levels of estrogen encountered during pregnancy would permit the continued production of gonadotropin necessary to maintain the corpus luteum. Studying the effects of selective alteration of the 5'-flanking sequences of the rat LHβ gene will further define their role in the regulation of LHβ gene expression.