The beta subunit of human chorionic gonadotropin is encoded by multiple genes.

Two recombinant phage clones bearing sequences corresponding to the beta subunit of human chorionic gonadotropin (hCG beta) were isolated from a human genomic library. The beta sequences were mapped by blot hybridization of restriction digests of these phage DNAs and the nonoverlapping inserts were subcloned in pBR322 and sequenced. The nucleotide-sequencing data show that the hCG beta subunit is encoded by at least three nonallelic genes. Moreover, based on restriction analyses of human placental DNA, these genes may be linked in a single cluster with four other hCG beta-like genes. The sequenced genes all differ in their 5' flanking regions, and none of them is completely homologous in sequence to either of two hCG beta cDNA clones used here. In the translated region of one of these genes, three base substitutions result in two changes from the reported amino acid sequence. In the family of beta-containing glycoprotein hormones, the hCG beta subunit is unique in that it contains an extension of 29 amino acids at its COOH end. The DNA sequence corresponding to this region in the sequenced genes is part of a larger exon. These data show that the COOH-terminal extension does not result from splicing of the primary RNA transcript.

the hCGβ and LHβ subunits is about 80% (3). The 12 cysteine residues that form intrachain disulfide bonds occupy highly conserved positions in the β chain family, presumably reflecting their importance to the secondary structure required for association with the α subunit (3). The hCGβ subunit is unique in that it contains an extension of 29 amino acids at its COOH end (4). The nucleotide sequence of a cDNA clone to hCGβ mRNA suggests that this extension derives from loss of an upstream stop codon, since the triplet encoding glutamine, the terminal amino acid of the extension, forms part of the polyadenylation signal (5). This signal is normally present in the 3' nontranslated region of eukaryotic mRNAs (6).
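The overlap between the end of the coding region and the polyadenylation signal can be made concrete with a minimal sketch. It scans a coding-strand DNA sequence for the canonical AATAAA hexamer and reports any occurrence that overlaps the last sense codon or the stop codon. The example sequence is hypothetical: it assumes only that the final sense codon is CAA (Gln), that the stop codon is TAA, and that the next base is A, which is what such an overlap requires. It is not the published hCGβ cDNA sequence, and the function names are illustrative.

```python
POLYA_SIGNAL = "AATAAA"  # canonical polyadenylation hexamer, coding-strand DNA


def find_motif(seq: str, motif: str = POLYA_SIGNAL) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) motif match."""
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]


def signal_overlaps_end_codons(seq: str, stop_start: int) -> list[int]:
    """Return signal starts that overlap the last sense codon or the stop codon.

    stop_start is the 0-based index of the stop codon's first base, so the
    last sense codon occupies stop_start-3 .. stop_start-1.
    """
    codon_region = set(range(stop_start - 3, stop_start + 3))
    return [
        start
        for start in find_motif(seq)
        if codon_region & set(range(start, start + len(POLYA_SIGNAL)))
    ]


# Hypothetical coding-strand 3' end: ...GCT CAA(Gln) TAA(stop) A GGC...
example = "GGGCTCAATAAAGGC"
stop = example.index("TAA")  # first base of the stop codon
print(find_motif(example), signal_overlaps_end_codons(example, stop))
```

In this toy sequence the single AATAAA match begins inside the Gln codon and runs through the stop codon, which is the arrangement described for the hCGβ cDNA.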
Since the endocrine function of the various cell types in which the α subunit is expressed depends on coordinate expression with a specific β subunit, these glycoprotein hormones offer an opportunity to study a group of closely related genes whose expression may require the presence of a previously activated α gene. To examine the expression, structure, and organization of the β subunit genes, we isolated and sequenced clones bearing hCGβ genes from a human genomic library and probed the hCGβ-coding regions in human cellular DNA. Our nucleotide sequence data indicate that the hCGβ subunit is encoded by at least three nonallelic genes, which may be linked in a single cluster with four other hCGβ-like genes. These findings are supported by the results of Boorstein et al. (7), who identified eight hCGβ-like genes by restriction enzyme analysis of several genomic DNA isolates. The three genes we report on here differ in their 5' flanking sequences, and none of them is completely homologous in sequence to either of two hCGβ cDNA clones we have isolated. One of the sequenced genes has 3 base pair differences in the translated region, resulting in changes of 2 amino acids, and the two complete gene copies show that the COOH-terminal extension of hCGβ does not result from splicing of the primary transcript.
Multiple Human Chorionic Gonadotropin β Subunit Genes 11493
containing 165 μg/ml of EtBr for 16 h at 45,000 rpm in a Beckman V50Ti rotor.
Human Genomic Library Screening. All work involving E. coli strains carrying pBR322-derived vectors or amplifying recombinant phage was carried out under P1-EK1 conditions as specified by the National Institutes of Health guidelines for recombinant DNA research.
A human genomic DNA library was provided by Dr. Tom Maniatis, Harvard University. It was prepared from 15-20-kb fragments of a nonlimit HaeIII-AluI digest of human fetal liver DNA ligated to bacteriophage Charon 4A "arms" via EcoRI linkers (16). E. coli strain LE392 (17) was grown in Luria-Bertani (LB) broth and used as host in plating phage at a density of 1-2 × 10⁴ plaques/90-mm-diameter dish. Plaques were screened (18) using nick-translated PUNY (a cDNA clone to hCGβ) labeled with [α-³²P]dATP to a specific activity of 1-2 × 10⁸ cpm/μg. Filter replicas of the plaques were prehybridized for 6 h at 65 °C in 3× SSC (SSC = 150 mM NaCl, 15 mM sodium citrate), 4× Denhardt's solution (19), and hybridized to probe for 40-60 h at 68 °C in 6× SSC, 4× Denhardt's solution, 10 μg/ml of sheared, denatured salmon sperm DNA, 25 μg/ml each of polyadenylic acid and polycytidylic acid, 50 μg/ml of E. coli tRNA, and 0.1% sodium dodecyl sulfate. Filters were washed at 50 °C in 0.5× SSC, 0.1% sodium dodecyl sulfate, rinsed in 0.5× SSC at 50 °C, dried, and autoradiographed at -70 °C using DuPont Cronex Lightning-Plus intensifier screens. Plaques identified by the cDNA probes were picked, and the phage were replated on LE392 and rescreened as above until free of contaminating phage. Phage were isolated from cleared lysates (20) and banded in 4.0 M CsCl in a Beckman SW60 Ti rotor at 35,000 rpm for 24 h.
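As a rough check on scale: screening the 3 × 10⁵ phage mentioned in the Results at 1-2 × 10⁴ plaques per 90-mm dish implies roughly 15 to 30 primary dishes. The paper does not state how many dishes were actually used, so the sketch below (with a hypothetical helper name) is only the arithmetic.

```python
def dishes_needed(total_plaques: int, plaques_per_dish: int) -> int:
    """Smallest whole number of dishes that holds total_plaques at the given density."""
    if plaques_per_dish <= 0:
        raise ValueError("plaques_per_dish must be positive")
    return -(-total_plaques // plaques_per_dish)  # integer ceiling division


SCREENED = 300_000  # 3 x 10^5 phage screened (Results)
for density in (10_000, 20_000):  # 1-2 x 10^4 plaques per 90-mm dish (Methods)
    print(f"{density} plaques/dish -> {dishes_needed(SCREENED, density)} dishes")
```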
Analysis of Phage Inserts and Cellular DNA. Charon 4A recombinant phage DNA (200-500 ng) was incubated in a final volume of 20 μl with restriction enzymes purchased from various sources, under conditions specified by the manufacturers. Digestions were adjusted to 20 mM EDTA, heated at 68 °C for 5 min, and electrophoresed in gels of 0.9-1.2% agarose containing TBE (89 mM Tris-borate (pH 8.3), 2 mM EDTA) at a field of less than 1 V/cm. EtBr-stained gels were photographed under short-wave UV, and DNA was transferred to nitrocellulose filters in 20× SSC as described (21). Filters were hybridized with restriction fragments of various plasmids, nick-translated with [α-³²P]dATP or dCTP to a specific activity of 1-5 × 10⁷ dpm/μg. Prehybridization at 65 °C, hybridization at 65 °C, and washing of filters at 56 °C were performed in the solutions described above. Filters were autoradiographed at -70 °C with intensifier screens. Cellular DNA (10 μg in a 20-μl volume) isolated from placentae (1) was digested with 15-20 units of restriction enzyme for 8-12 h. Digests were stopped as above and electrophoresed in gels of 0.9-1.0% agarose in 0.5× TBE at a field of less than 2.5 V/cm. DNA was transferred to nitrocellulose filters as described above, except for a preliminary soak in 0.25 M HCl for 5 min prior to alkaline denaturation. Prehybridization in 50% formamide (22) and hybridization in 50% formamide, 10% dextran sulfate (23) were performed at 42 °C in solutions otherwise identical with those used for phage DNA blots. Filters were washed at 42 °C and autoradiographed at -70 °C with intensifier screens.
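The SSC strengths used in these protocols (20×, 6×, 3×, and 0.5×) all follow from the 1× definition given in the screening protocol (150 mM NaCl, 15 mM sodium citrate). A small hypothetical helper converts an n× strength into concentrations and computes how much concentrated stock to dilute with C1·V1 = C2·V2. Treating 20× SSC as the stock is an assumption; the paper names 20× SSC only as the transfer buffer.

```python
# 1x SSC as defined in the Methods: 150 mM NaCl, 15 mM sodium citrate.
NACL_1X_MM = 150.0
CITRATE_1X_MM = 15.0


def ssc_composition(strength: float) -> tuple[float, float]:
    """(NaCl mM, sodium citrate mM) for an SSC solution of the given strength (x)."""
    return strength * NACL_1X_MM, strength * CITRATE_1X_MM


def stock_volume_ml(final_strength: float, final_volume_ml: float,
                    stock_strength: float = 20.0) -> float:
    """Volume of concentrated SSC stock needed, from C1 * V1 = C2 * V2 (stock assumed 20x)."""
    return final_strength * final_volume_ml / stock_strength


for x in (20, 6, 3, 0.5):  # strengths used for transfer, hybridization, and washes
    nacl, citrate = ssc_composition(x)
    print(f"{x}x SSC: {nacl:g} mM NaCl, {citrate:g} mM citrate")

print(stock_volume_ml(0.5, 1000))  # ml of 20x stock per litre of 0.5x wash
```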
Subcloning Phage Inserts and Sequence Analysis. Plasmid pBR322, cut with EcoRI, treated with 2 units/100 ml of calf alkaline phosphatase (Boehringer-Mannheim) in 100 mM Tris (pH 8) at 45 °C for 1 h, and phenol-extracted, was ligated to EcoRI-digested recombinant phage fragments overnight at 14 °C using T4 DNA ligase (Boehringer-Mannheim). Transformation of HB101 was performed as for cDNA clones; colonies growing on LB-ampicillin (20 μg/ml) were screened for phage fragment inserts by hybridization of filter replicas (12) to nick-translated PUNY and phage clones. Positive colonies were further screened by small-scale plasmid purifications (15) followed by EcoRI digestion, 5% polyacrylamide-TBE gel electrophoresis, and EtBr staining. PstI digests of selected EcoRI phage insert subclones were electrophoresed in the same gel system. Specific bands were cut out, and the DNA was purified as described (1). Ligations into PstI-digested, alkaline phosphatase-treated pBR322 were performed as described above. Tetracycline-resistant, ampicillin-sensitive colonies were screened by restriction endonuclease digestion of small-scale plasmid preparations followed by polyacrylamide-TBE gel electrophoresis and EtBr staining.
Large scale preparations of plasmid-subclone DNAs were obtained as described above for cDNA clones. Phage insert subclones were sequenced according to Maxam and Gilbert (24), using both 5' and 3' end labeling methods as described.
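The Maxam-Gilbert method reads sequence from four base-specific chemical cleavage reactions (G, G+A, C+T, and C) run side by side. The simplified sketch below captures only the band-reading logic: a band in G+A marks a purine (G if the G lane also has it, otherwise A), and a band in C+T marks a pyrimidine (C if the C lane also has it, otherwise T). It ignores band intensities, compressions, and the overlap of 5'- and 3'-labeled reads; the ladder and the function name are hypothetical.

```python
def call_bases(lanes: dict[str, set[int]]) -> str:
    """Read a Maxam-Gilbert ladder from band positions in four reactions.

    lanes maps "G", "G+A", "C+T", and "C" to the set of fragment lengths
    (nucleotides from the labeled end) that show a band. Reading from the
    shortest fragment upward gives the sequence outward from the labeled end.
    """
    all_lengths = set().union(*lanes.values())
    seq = []
    for n in range(min(all_lengths), max(all_lengths) + 1):
        purine = n in lanes["G+A"]
        pyrimidine = n in lanes["C+T"]
        if purine and not pyrimidine:
            seq.append("G" if n in lanes["G"] else "A")
        elif pyrimidine and not purine:
            seq.append("C" if n in lanes["C"] else "T")
        else:
            seq.append("N")  # uninterpretable: no band, or bands in both G+A and C+T
    return "".join(seq)


# Hypothetical ladder for the short sequence GATCCA (fragment lengths 1-6).
ladder = {
    "G": {1},
    "G+A": {1, 2, 6},
    "C+T": {3, 4, 5},
    "C": {4, 5},
}
print(call_bases(ladder))
```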

RESULTS
Characterization of cDNA Probes and hCGβ mRNA. To identify genomic sequences homologous to hCGβ mRNA, we used three different hCGβ cDNA clones in the plasmid vector pBR322 (Fig. 1). The first of these, PUNY, contains an insert of 110 nucleotides (8) encoding amino acids 13 to 48 of mature hCGβ. Two larger cDNAs were later isolated from the same pool of first trimester placental mRNA and sequenced. The first, pCGβ507, contains 192 nucleotides of the 5' untranslated region and 241 nucleotides of translated sequence. The second, pCGβ474, encodes 151 nucleotides of the 5' untranslated region and 429 nucleotides of translated sequence. These clones therefore contain information on the 5' untranslated region not present in a previously analyzed cDNA, which extended 25 nucleotides upstream from the initiator methionine codon and contained the complete 498-nucleotide translated region (5). That cDNA showed that the final Gln codon CAA and the terminator codon UAA form part of the polyadenylation signal in the mRNA, which has a 3' untranslated region of only 16 nucleotides before the polyadenylation site. Since synthesis of cDNA can terminate prematurely on the mRNA template, there is no assurance that pCGβ507 or pCGβ474 represents the complete 5' untranslated sequence of hCGβ mRNA. In a primer extension experiment in which a 194-nucleotide DdeI fragment of the pCGβ507 insert was annealed to first trimester mRNA, reverse transcriptase gave a product that extended ~165 nucleotides beyond the 5' end of the cDNA insert.² The 5' untranslated portion of this hCGβ mRNA species therefore appears to be ~360 nucleotides long (Fig. 1).
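The ~360-nucleotide estimate is simple addition: the 192 nucleotides of 5' untranslated sequence already present in pCGβ507 plus the ~165 nucleotides by which the primer-extension product ran past the cDNA's 5' end. The sketch below makes that explicit (357 nt, which the text rounds to ~360; since the overhang is approximate, so is the total) and also checks that the 498-nucleotide translated region is a whole number of triplets. Constant and function names are illustrative.

```python
UTR5_IN_PCGB507 = 192            # 5' untranslated nt present in the pCGβ507 insert
PRIMER_EXTENSION_OVERHANG = 165  # ~nt the extension product ran past the cDNA 5' end
TRANSLATED_REGION_NT = 498       # complete translated region of the earlier cDNA (ref. 5)


def estimated_utr5(utr_in_cdna: int, overhang: int) -> int:
    """Full 5' UTR estimate = UTR already cloned + primer-extension overhang."""
    return utr_in_cdna + overhang


utr5 = estimated_utr5(UTR5_IN_PCGB507, PRIMER_EXTENSION_OVERHANG)
triplets, remainder = divmod(TRANSLATED_REGION_NT, 3)
print(utr5, triplets, remainder)  # 357 166 0
```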
We expected identical 5' untranslated sequences in the pCGβ474 and pCGβ507 clones if they were copies of a single mRNA species. In fact, the first 75 nucleotides upstream from the initiator methionine codon show one difference between the two cDNAs, at nucleotide -9 (see Fig. 5), and at nucleotide -76 the sequences diverge. In pCGβ474, nucleotides -73 to -110 appear to be an inverted repeat of hCGβ coding sequence 17-54 (34 of 38 nucleotides match), and nucleotides -111 to -151 share no homology with pCGβ507 or with any portion of the hCGβ genes sequenced below. One strand of the inverted repeat in pCGβ474 contains the repeated leucine codons of the hCGβ signal sequence (21 bases). While we have not excluded the possibility of nonlinear transcription during cloning, the pCGβ507 and pCGβ474 cDNAs may represent copies of two distinct mRNA species.
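An inverted repeat of this kind is detected by comparing one segment with the reverse complement of the other, position by position. A minimal ungapped sketch follows; the sequences are hypothetical (a run of CTG leucine codons and an imperfect copy of its reverse complement), not the pCGβ474 or hCGβ sequences, and the function names are illustrative. The last line simply expresses the reported 34-of-38 match as a percentage.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_matches(segment: str, target: str) -> tuple[int, int]:
    """Compare segment with the reverse complement of target, position by position.

    Returns (matching positions, length). Both sequences must be the same
    length; no gaps are allowed.
    """
    rc = reverse_complement(target)
    if len(segment) != len(rc):
        raise ValueError("segment and target must have equal length")
    return sum(a == b for a, b in zip(segment, rc)), len(rc)


# Hypothetical example: a run of CTG (Leu) codons and an imperfect copy of its
# reverse complement carrying one mismatch (T at 0-based position 7).
target = "CTGCTGCTGCTG"
segment = "CAGCAGCTGCAG"
matches, length = inverted_repeat_matches(segment, target)
print(reverse_complement(target), matches, length, f"{matches / length:.0%}")
print(f"{34 / 38:.0%}")  # identity implied by 34 of 38 matches in pCGβ474
```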
Isolation and Southern Blot Analyses of hCGβ Genes. To identify and amplify genomic sequences encoding the hCGβ subunit, a Charon 4A human genomic recombinant phage library (kindly provided by Dr. Maniatis) was probed with nick-translated PUNY cDNA. Two isolates, CGβa and CGβe, were identified and purified from the 3 × 10⁵ phage that were screened.
To determine the regions of CGβa and CGβe that hybridized with PUNY, the phage DNAs were digested with several restriction enzymes in single or double digests, electrophoresed in agarose gels, and blotted for hybridization with the same probe (Fig. 2). For CGβa, PUNY hybridized to a single fragment in each digest except the EcoRI, EcoRI-BamHI, and XbaI digests. A prominent 13.6-kb fragment and a minor band of 11.3 kb are observed in EcoRI digests of CGβa. A faint XhoI fragment of 6.8 kb (below the 9.6-kb band), which cannot be explained by incomplete digestion, is also seen in the stained gel. We suspect that phage which banded at a lower density in CsCl gradients of CGβa are responsible for these fragments, since such phage are likely to contain a shorter DNA insert. The presence of a minor HindIII fragment of 4.5 kb is consistent with a deletion of ~2.9 kb in this variant, within the region marked in Fig. 3. In the EcoRI-BamHI and XbaI digests, the extra PUNY-hybridizing bands are the result of incomplete digestion. The central XbaI site (brackets) is particularly resistant to cleavage, for unknown reasons. The PUNY probe identifies a region of 0.6 kb between the rightward HindIII site and a PvuII site that contains all the sequence homologous to the 110-nucleotide hCGβ cDNA (Fig. 3).

² H. Fukuoka and I. Boime, unpublished data.
In contrast to this single region of homology, the CGβe insert contains two regions separated by 8 kb that hybridize to the PUNY probe. This is shown most directly by digests with EcoRI, HindIII, and PvuII (Fig. 3). The insert has no internal EcoRI sites and contains two HindIII sites that yield an 8-kb fragment not recognized by PUNY. The flanking HindIII fragments of 20.5 and 12 kb do hybridize, and each is cut by EcoRI to give hybridizing fragments of 0.75 kb to the left and 7.1 kb to the right of the nonhybridizing HindIII fragment. When digested with PvuII, CGβe yields PUNY-hybridizing fragments of 2.5 and 2.1 kb. Further digestion with HindIII shows that the two regions of homology to the probe are of the same size (0.6 kb) as that found in HindIII-PvuII-digested CGβa. Restriction digests of both CGβa and CGβe with BglI also give a common PUNY-hybridizing fragment of 0.5 kb. These homologies among the three sites suggested that each was a copy of the hCGβ gene.
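The CGβe fragment sizes just described can be turned into a simple linear map. Because the insert has no internal EcoRI sites, the 0.75-kb and 7.1-kb EcoRI-HindIII pieces must lie at its two ends, flanking the 8-kb HindIII fragment; together they account for about 15.9 kb, close to the 16-kb EcoRI insert fragment of CGβe described later in this section. The hypothetical helper below recomputes fragment sizes from such a map; the site coordinates are inferred from the quoted sizes, not taken from a published map.

```python
def fragment_sizes(length_kb: float, cut_sites_kb: list[float]) -> list[float]:
    """Sizes (kb, left to right) of the pieces produced by cutting a linear molecule."""
    edges = [0.0] + sorted(cut_sites_kb) + [length_kb]
    return [round(right - left, 2) for left, right in zip(edges, edges[1:])]


# CGβe insert treated as a linear molecule bounded by its two EcoRI linker sites.
# Coordinates inferred from the fragment sizes in the text: 0.75 + 8 + 7.1 kb.
INSERT_KB = 0.75 + 8.0 + 7.1
HINDIII_KB = [0.75, 0.75 + 8.0]

print(round(INSERT_KB, 2), fragment_sizes(INSERT_KB, HINDIII_KB))
```

A double digest is the same call with both enzymes' site lists concatenated.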
The region between the two hCGβ-related sequences in CGβe contains sites for PvuII, XbaI, and XhoI, at 2.3, 2.6, and 3.2 kb from the leftward HindIII site and at 1.9, 2.2, and 2.7 kb from the rightward HindIII site, respectively (Fig. 3). The XhoI positions were determined from EcoRI-XhoI double digests (Fig. 2), while those of the PvuII and XbaI sites relied on other restriction digests (data not shown) and on inferences from sequence analysis of the CGβe insert subcloned in pBR322 (see below). The array of these restriction sites indicates that the sequences in each half are similar, inversely oriented to each other, and differentiated by ~400 base pairs present in the leftward PvuII-HindIII region but absent from the rightward region. Despite restriction sites identical to those of CGβa within two regions of at least 0.6 kb (HindIII, PvuII, and BglI data), the remaining restriction map of CGβe is distinct from that of CGβa. Thus, by probing a fraction of the human genomic library, we isolated two chromosomal segments bearing three distinct copies of hCGβ-related sequences. Two of these putative genes are linked in an inverse orientation in the CGβe insert. Boorstein et al. constructed three composite maps containing eight hCGβ-like genes from a series of genomic isolates (7); the first of these maps confirms the arrangement of hCGβ-like sequences found in CGβe. Comparison of CGβa with their second and third maps, however, suggests that the latter may overlap and share an identical gene. If so, this reduces the number of hCGβ-like genes in their analysis to seven.³

Orientation of Genes in CGβa and CGβe. The putative hCGβ gene structures in the CGβa and CGβe phage clones were characterized by sequence analysis. To amplify the genomic inserts, we subcloned their EcoRI fragments in pBR322, and subclones of the phage inserts were identified using nick-translated CGβa and PUNY.
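The inverted-orientation argument for CGβe rests on the restriction sites between its two hCGβ-related regions. Read inward from each HindIII site, the PvuII, XbaI, and XhoI sites occur in the same order, as expected for two similar segments facing each other, and each left-side distance exceeds its right-side counterpart by about the same amount. A short sketch of that comparison, using the distances quoted in the text (the function names are illustrative):

```python
# Distances (kb) of the sites between the two hCGβ-related regions of CGβe,
# measured inward from the leftward and rightward HindIII sites (from the text).
FROM_LEFT = {"PvuII": 2.3, "XbaI": 2.6, "XhoI": 3.2}
FROM_RIGHT = {"PvuII": 1.9, "XbaI": 2.2, "XhoI": 2.7}


def inward_order(distances: dict[str, float]) -> list[str]:
    """Enzyme names ordered from the nearest to the farthest site."""
    return sorted(distances, key=distances.get)


def mirror_offsets(left: dict[str, float], right: dict[str, float]) -> dict[str, float]:
    """Per-enzyme extra distance (kb) on the left side relative to the right side."""
    return {name: round(left[name] - right[name], 2) for name in left}


mirrored = inward_order(FROM_LEFT) == inward_order(FROM_RIGHT)
offsets = mirror_offsets(FROM_LEFT, FROM_RIGHT)
print(mirrored, offsets, round(sum(offsets.values()) / len(offsets), 2))
```

The per-enzyme offsets of 0.4-0.5 kb correspond to the ~400 base pairs attributed to the leftward PvuII-HindIII region; with distances given only to 0.1 kb, the extra length cannot be stated more precisely from the map alone.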
The large 13.6-kb EcoRI fragment of CGβa and the single 16-kb EcoRI insert fragment of CGβe were subjected to sequence analysis from the HindIII sites adjacent to the regions hybridizing to PUNY (Figs. 3 and 4). Although this restriction site is not present in the sequence of hCGβ cDNA, it seemed likely that it lay within either the long 5' untranslated region or an intervening sequence conserved among the three genes. EcoRI sites at the left and right ends of the CGβe subclone were also used as sequencing start sites. Extensive sequence homology was observed when reading from the HindIII sites toward the PUNY-hybridizing regions. None of these sequence runs was pursued into the translated regions identified by PUNY. Sequencing into the CGβe insert from the EcoRI site at its left end revealed a portion of the hCGβ coding sequence beginning at nucleotide 320 and continuing through the 5' end of an exon that begins at nucleotide 184 (Figs. 4 and 5). This established the leftward PUNY-hybridizing region of CGβe as an hCGβ-like gene and allowed us to infer the 5'-3' orientation of the three genes. Sequencing into the CGβe insert from the EcoRI site at its right end for 140 nucleotides showed that this region contains a small exon partially homologous to the first five codons of hCGβ mRNA, and that the sequence continuing in the 5' direction shows some homology to the untranslated regions of hCGβ cDNAs (Fig. 4; see below). This exon lies 7 nucleotides to the left of a PstI site and 85 nucleotides to the left of the insert end. Besides providing further information on the probable structure of the three genes identified by the hCGβ cDNA probe, the presence of this exon increased the number of known hCGβ-like gene structures to at least four: three linked in the CGβe insert and one in the CGβa insert. Boorstein et al. (7) have reported that this fourth gene copy, which corresponds to β-HCG4 in group a of their maps, encodes the LHβ subunit.
Sequence Determination of the Cloned Genes. To obtain further sequence information, we inserted PstI fragments from the PUNY-hybridizing regions of the above EcoRI subclones into pBR322. A 700-nucleotide fragment from CGβa and CGβe was the only PstI fragment detected by PUNY hybridization (data not shown). Following ligation and transformation, two CGβe subclones and one CGβa subclone were selected for sequence analyses.

³ CGβa is homologous to the group b map of Boorstein et al. (7), except that the 2.8-kb region at the rightward end of CGβa contains an XbaI site, whereas Boorstein et al. identify another gene copy, β-HCG5, in an equivalent position. When adjacent restriction enzyme sites are considered, the CGβa gene copy seems identical not only with the β-HCG6 gene copy of group b but also with the β-HCG7 gene of group c. Group c consists of two gene copies, β-HCG7 and 8, which form a tandem pair. These could recombine to generate gene 6 of the inverted pair represented in group b, if gene 5 is originally found adjacent to, and in an inverted pair with, gene 8. We do not know how many genomic isolates reflect the organization of genes in group b, nor do we reject the possibility that CGβa itself reflects a recombinant genotype.

... E. coli phage λ and Charon 4A human genomic library clones CGβa (a) and CGβe (e) are indicated above the lanes. The dark panels on the left are the 0.9% agarose gels containing marker and clone DNA visualized under UV light with ethidium bromide. The light panels on the right are autoradiograms of nitrocellulose filter transfers of the DNA, which were hybridized to ³²P-labeled nick-translated PUNY cDNA. Autoradiograms are matched to their respective gels. The left margin shows phage λ HindIII marker fragment sizes in kilobases.
Assignment of the two PstI fragments from CGβe to the appropriate PUNY-hybridizing regions of the genomic insert was based on several base changes in the sequences reading 3'-ward from the HindIII sites of the left and center gene copies present in CGβe (see Fig. 5). Complete sequence determination of these three PstI inserts showed that they contained the second exon of the three gene copies, which extends from the sixth codon of the signal peptide to codon 41 of the mature protein, or nucleotides 16 through 183 of the mRNA. This exon was flanked in each case by appropriate splice recognition sequences (Ref. 25 and Fig. 5). At the extreme 3' end of the inserts, four codons from the next exon were detected, corresponding to the sequence next to the PstI site found in the coding region of each hCGβ cDNA. The 5' and 3' ends of the hCGβ coding information therefore lie outside the PstI subclones (Fig. 4). These regions were sequenced directly from the larger EcoRI subclones, using the restriction sites shown by arrows in Fig. 4. In the three complete or partially complete genes, the 15-nucleotide translated region of the first exon is separated from the 168-nucleotide second exon by an intron of 351-352 nucleotides (Fig. 5).

A Possible Pseudogene. Several base changes and apparent deletions and insertions occur when the first and second introns of the sequenced genes are compared, but the only alteration that may affect transcript splicing is at the donor position in the first intron of the leftward CGβe gene. The GA, rather than GT, dinucleotide at this site departs from the donor consensus sequence present in 139 tabulated splice junctions (25). The 5' untranslated regions of the genes in the two inserts are compared with those of pCGβ474 and pCGβ507 in Fig. 5. As mentioned above, these two cDNAs diverge in sequence 75 nucleotides upstream from the initiator methionine codon.
Three of the four genes share this region of homology with the cDNAs, namely the CGβa gene, the central gene in CGβe, and the rightward gene in CGβe. The leftward CGβe gene shows homology to these sequences for only nine bases upstream from the translational start site and then diverges completely from the upstream cDNA sequences. The restriction map of the CGβe insert (Fig. 3) shows that the cluster of PvuII, XbaI, and XhoI sites upstream from this gene copy lies ~400 nucleotides further from the PUNY-hybridizing region than the corresponding sites in the 5' flanking region of the central CGβe gene. Southern blot analysis of this region in CGβe, using pCGβ474 and pCGβ507 as probes, will be necessary to determine whether the leftward CGβe gene shows any further homology to these cDNAs in its 5' flanking region. The presence of an insertion at this site, together with the splice junction mutation noted above, would suggest that this copy is a pseudogene.
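Two pieces of the gene-structure reasoning in this section are mechanical enough to sketch: converting the exon boundaries (nucleotides 1-15 and 16-183, numbered from the initiator codon) into codon numbers, and checking the GT...AG dinucleotides that flank nearly all nuclear introns, which is where the GA donor of the leftward CGβe gene departs from the consensus. Subtracting the mature-protein codon number at the end of exon 2 (41) from its precursor codon number gives the signal-peptide length that the two numbering systems imply. The code assumes nucleotide 1 is the A of the initiator ATG; the intron strings are hypothetical placeholders, not the sequenced introns, and the function names are illustrative.

```python
def codon_range(first_nt: int, last_nt: int) -> tuple[int, int]:
    """1-based codon numbers spanned by coding nucleotides first_nt..last_nt.

    Assumes nucleotide 1 is the A of the initiator ATG.
    """
    return (first_nt - 1) // 3 + 1, (last_nt - 1) // 3 + 1


def splice_dinucleotides_ok(intron: str) -> dict[str, bool]:
    """Check the GT...AG rule at the intron's donor (5') and acceptor (3') ends."""
    intron = intron.upper()
    return {"donor_GT": intron.startswith("GT"), "acceptor_AG": intron.endswith("AG")}


exon1 = codon_range(1, 15)    # translated part of exon 1
exon2 = codon_range(16, 183)  # exon 2
last_codon_exon2 = exon2[1]   # precursor codon that the text equates with mature codon 41
print(exon1, exon2, "implied signal peptide length:", last_codon_exon2 - 41)

# Hypothetical intron ends: a normal GT donor and a GA donor like the leftward CGβe gene.
print(splice_dinucleotides_ok("GTAAGT" + "N" * 20 + "TTTCAG"))
print(splice_dinucleotides_ok("GAAAGT" + "N" * 20 + "TTTCAG"))
```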

3' Flanking Regions and Sequence Summary. The CGβa gene and the central CGβe gene show considerable homology in their 3' flanking regions: notably, each contains an 8-fold repeat of the doublet CA, located 51-52 nucleotides from the polyadenylation signal AATAAA and 36 nucleotides from the presumptive site of polyadenosine addition to the transcribed message. Although the 16-nucleotide 3' nontranslated region of the central CGβe gene agrees with the previously reported cDNA sequence (5), the CGβa gene shows three base changes and one deletion in this comparison. We suggest that transcripts of this gene add polyadenosine at an equivalent distance from the termination codon. These three genes are clearly copies of the hCGβ-encoding sequence, though the CGβa gene displays two amino acid coding substitutions, Pro4 → Met4 and Asp117 → Ala117, and the leftward CGβe gene displays a 5' untranslated region that bears little homology to either cDNA clone, as well as a splicing mutation. Both complete copies of the coding sequence demonstrate continuity between the third exon and the COOH-terminal-encoding region. The Asp117 → Ala117 substitution in the CGβa gene copy occurs at the start of the sequence that encodes this extension.
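A quick consistency check on the CGβa substitutions: with the standard genetic code, Pro (CCN) can become Met (ATG) only by changing at least two bases, whereas Asp (GAT/GAC) becomes Ala (GCN) with one, so the two amino acid changes require at least three base substitutions, matching the three reported. The sketch below computes these minima from a codon table restricted to the four residues involved; it does not know the actual codons present in the gene, so it gives only a lower bound.

```python
from itertools import product

# Standard genetic code, restricted to the residues discussed here.
CODONS = {
    "Pro": ["CCT", "CCC", "CCA", "CCG"],
    "Met": ["ATG"],
    "Asp": ["GAT", "GAC"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
}


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length codons."""
    return sum(x != y for x, y in zip(a, b))


def min_substitutions(aa_from: str, aa_to: str) -> int:
    """Fewest base changes that can convert some codon of aa_from into one of aa_to."""
    return min(hamming(a, b) for a, b in product(CODONS[aa_from], CODONS[aa_to]))


changes = [("Pro", "Met"), ("Asp", "Ala")]  # Pro4 -> Met4 and Asp117 -> Ala117
print([min_substitutions(a, b) for a, b in changes])
```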
Are the hCGβ Genes Clustered? The multiplicity of genes present in the above inserts prompted us to ask what the total number and arrangement of hCGβ genes is in the human genome. We digested high molecular weight human DNA with HindIII, EcoRI, BamHI, XbaI, and KpnI and probed these fragments with a H h d fragment of pCGβ474 which contains the 151-nucleotide 5' untranslated sequence contiguous with 246 nucleotides of the translated region (Fig. 1). In comparison to known hCGβ gene sequences (Fig. 5), one might expect the probe to recognize primarily nontranslated sequence to the left of the HindIII site in the first intron and coding sequence to the right of this site (second exon and part of the third exon). With the exception of HindIII, none of the enzymes used cleaves within the coding regions present in CGβa or CGβe, nor in the immediate flanking or intervening sequences (Fig. 3). In the genomic analysis (Fig. 6), EcoRI appears to give one fragment that hybridizes to the hCGβ cDNA. Although this band seems only slightly larger than the 23.1-kb marker fragment, the EtBr-stained gel indicates that the largest fragments of genomic DNA in these preparations migrate at this position. Even if the hybridizing EcoRI fragment is larger than ~23 kb, the method of electrophoretic separation prevents an accurate determination of the length of restriction fragments above this size. BamHI also yields a hybridizing band in this range, plus a fragment of 6.9 kb. Double digestion with EcoRI and BamHI gives the same pattern as BamHI alone. These data indicate that BamHI sites are limited within the EcoRI fragment(s) containing the hCGβ genes. The CGβa insert may represent one end of a hybridizing genomic EcoRI fragment, since it contains two EcoRI sites and three BamHI sites, all to the left side of its hCGβ gene (Fig. 3). The CGβe insert is internal to a genomic EcoRI fragment, because it lacks sites for this enzyme. CGβe contains two BamHI sites near the right end of the insert, however, separating this LHβ-like gene from the two others in this insert. This rightward gene may be part of the 6.9-kb BamHI genomic fragment or part of another >23-kb fragment. If the region of the genome carrying these target sequences is not polymorphic for BamHI and EcoRI sites, the CGβa and CGβe inserts may represent regions near opposite ends of a single hCGβ gene cluster.

FIG. 5. Nucleotide sequence of pCGβ507, pCGβ474, and hCGβ-related sequences in genomic clones. The first line is the complete untranslated region of the pCGβ507 insert. The second line is the equivalent portion of the pCGβ474 insert. Lines 3-6 indicate the DNA sequence in the Charon 4A human genomic library clones CGβa and CGβe. Exons are defined by the encoded hCGβ amino acid sequence underneath. Introns are defined by the shaded boxes enclosing the beginning and end of their sequences. Nucleotide numbering originates at the initiator methionine codon and is continuous in the 5' upstream direction, but is interrupted in the 3' direction by introns. Only those bases differing from the top line sequence are indicated on their respective lines. Undetermined nucleotides are denoted by hyphens; spaces (s) have been inserted to maximize sequence homologies. The CGβe left sequence ends at the leftward EcoRI site (nucleotide 320), and the CGβe right sequence ends at the rightward EcoRI site of the genomic insert (at nucleotide 80 of intron I). CGβa and CGβe center are intact copies of the coding region of hCGβ. Suggested sites of polyadenosine addition to gene transcripts are shown by arrows.
Seven hCGβ-like Genes in Genomic Digests. The question of how many genes lie in this putative cluster is further delineated by digests with KpnI. Six different KpnI fragments were identified in the genomic blot (12.5, 9.1, 7.7, 6.5, and 6.1 kb; Fig. 6). We also mapped these sites in the CGβa and CGβe inserts (data not shown; Fig. 3). A hybridizing KpnI band of 10.4 kb is expected from the CGβa insert; it probably corresponds to the 12.5-kb genomic fragment that is cleaved by BamHI in the double digest. The 2.1-kb disparity in fragment size may be related to the deletion region in CGβa discussed above. The central gene in CGβe is located in a KpnI fragment of 7.7 kb, corresponding to a genomic fragment of the same size that hybridizes more strongly than any of the other five bands. The central gene in CGβe could have greater affinity for the 5' untranslated sequence in the probe or, alternatively, the 7.7-kb genomic band could represent more than 1 gene equivalent. Secondary restriction with HindIII results in loss of the signal at 7.7 kb and the appearance of the 3.6-kb HindIII-KpnI coding region fragment from CGβe, not the 4.1-kb KpnI-HindIII fragment containing the 5' nontranslated region. The pCGβ474 nontranslated region is thus not responsible for the strong 7.7-kb KpnI signal, and more than 1 gene equivalent seems to be present in this fragment. Presuming there are two 7.7-kb fragments that contain one gene each, linkage of the seven KpnI fragments would yield a continuous length of ~57 kb for the cluster.
The number of hybridizing KpnI fragments may reflect polymorphisms in the region containing the hCGβ genes. Such polymorphism has been previously observed in the hCGα gene with EcoRI and HindIII (1). To address this point, we used a trophoblast tissue that is primarily homozygous. The genome of hydatidiform mole is of androgenetic origin (26-28); it usually develops from duplication of the genome of a single spermatid in an enucleate egg. Consistent with this origin, we have recently shown that hydatidiform mole is consistently homozygous at the hCGα locus.⁴ KpnI digestion of molar DNA and hybridization of the blot to the pCGβ474 probe gives the same pattern of six bands as normal placental DNA (data not shown). Moreover, we have seen this KpnI pattern in digests of DNAs from several individual placentae and moles. These data indicate that the genome contains at least seven nonallelic hCGβ-like genes. It should be noted that none of these genes hybridized preferentially to the 5' untranslated region of the pCGβ474 probe. The 38-nucleotide segment of this untranslated region which resembles part of the conserved translated portion of hCGβ may be hybridizing intramolecularly in the probe, or to this conserved region in each of the gene copies, rather than to the specific 5' flanking region of the gene from which it is transcribed.

DISCUSSION
There is no reason to expect that all of the structural genes encoding the β subunits of the glycoprotein hormone family should be clustered at one locus, since even the closely related hCGβ and hLHβ chains share only 80% homology. Nevertheless, it seemed possible that the PUNY cDNA probe would recognize hLHβ genes, since it encodes a portion of hCGβ that is 86% homologous to the equivalent amino acid sequence of hLHβ (31 of 37 residues). This was not the case in the Charon 4A recombinant phage clones that we isolated with this probe. Instead, we found that the PUNY-hybridizing regions represented nonallelic copies of the hCGβ gene. Although the gene copy in the CGβa insert contains three nucleotide substitutions that change two amino acids of the encoded protein, no frameshift or stop codon mutations that would clearly indicate the existence of pseudogenes were detected in any of the coding regions. However, there are examples of β-thalassemias in which changes of intron or exon sequence alter the normal transcript splicing patterns, with drastic effects on the stability of transcribed sequences (29, 30). Thus, even if the leftward gene of the CGβe insert is transcribed, it may produce little or no stable mRNA, considering the splicing mutation in its first intron.
Isolation of two hCGβ cDNAs with differing 5' nontranslated sequences (pCGβ507 and pCGβ474) from a pool of first trimester cDNAs suggests that at least two hCGβ genes are transcribed to produce stable message. Previously, we observed a form of hCGβ translated from purified fractions of first trimester mRNA that differs in methionine content from the predominant precursor form of hCGβ (31). However, the inverted repeat in the 5' region of pCGβ474 could arise from readback of the primary cDNA strand on itself. Readback does not explain the base mismatches at four sites in the 38-base pair repeat or the unique sequence of 40 nucleotides that precedes the repeat. The two AUG codons upstream from this secondary structure (Fig. 5) suggest an interesting control in the translation of the proposed mRNA encoded by pCGβ474. We are sequencing the 5' structures of the cloned hCGβ genes to determine whether one or both of the cDNAs arise from these particular hCGβ gene copies. We can already rule out the copy in CGβa as the source of either clone, since neither contains the Pro4 → Met4 mutation of this gene.

⁴ M. Hoshina, M. Boothby, and I. Boime, manuscript in preparation.
Another mutation found in the CGβa gene occurs near the start of the COOH-terminal extension of hCGβ. Asp117 in the hCGβ chain lies seven amino acids beyond the last conserved cysteine residue of the β chain family. The hTSHβ chain terminates at a position equivalent to this Asp residue, while hLHβ terminates one amino acid before this position and hFSHβ continues for 7 more residues. The origin of the relatively long peptide extension in hCGβ, continuing 29 amino acids past the terminus of hLHβ, remains obscure. The evolution of this COOH-terminal peptide extension probably predates the divergence of horses and primates, since the β subunit of pregnant mare serum gonadotropin, which is similar in biological activity to hCG, contains an equivalent COOH-terminal extension (32). Our sequence analysis of two complete gene copies demonstrates that the extension is an integral part of the third exon of each gene and is not joined to the rest of the mRNA by a post-transcriptional splicing event.
The 3' flanking regions of the two complete gene copies studied here contain the sequence (CA)8, 30 nucleotides downstream from the presumptive polyadenylation sites. A similar sequence is found in the 3' flanking regions of the human α-globin genes α1 and α2 and of the pseudogene ψα1, where it defines the endpoint of sequence homology between the three duplicated regions (33). Similarly, we find that sequence homology decreases markedly downstream from the (CA)8 sequence in our hCGβ clones (Fig. 5). A larger survey of these regions in the family of hCGβ genes will be necessary to see how prevalent this sequence is in defining gene duplication units.
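Locating a (CA)n block such as the (CA)8 element in a flanking sequence is a simple pattern search. A minimal sketch using Python's re module follows; the flanking sequence is hypothetical, the function name is illustrative, and the four-copy minimum is an arbitrary threshold chosen for the example.

```python
import re


def dinucleotide_runs(seq: str, unit: str = "CA", min_copies: int = 4) -> list[tuple[int, int]]:
    """Return (0-based start, copy number) for each tandem run of `unit`.

    Runs are found left to right and extended greedily, so a run such as
    (CA)8 is reported once with its full copy number.
    """
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), len(m.group()) // len(unit)) for m in pattern.finditer(seq.upper())]


# Hypothetical 3' flanking fragment containing a (CA)8 block.
flank = "TTG" + "CA" * 8 + "GGT"
print(dinucleotide_runs(flank))
```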
The importance of the multiplicity of hCGβ genes is obscure at present, since we do not yet know how many of the seven identifiable genes produce stable transcripts. Studies of bovine LH biosynthesis suggest (34), and cDNA quantitation of hCG mRNA (35) shows, that there is an excess of α subunit mRNA relative to the corresponding β subunit mRNAs. The β subunit family of genes may possess similar regulatory sequences that are effective in limiting the output of each of the glycoprotein hormones. In the case of hCGβ, evolutionary gene duplication may allow the transient surge in hCG that is needed in the first trimester, or possibly allow switching of genes at various stages of placental development.