Structural Studies on Equine Glycoprotein Hormones AMINO ACID SEQUENCE OF EQUINE LUTROPIN β-SUBUNIT

The amino acid sequence was determined for equine lutropin β (eLHβ). Large fragments were derived from reduced, carboxymethylated eLHβ by digestion with Staphylococcus aureus V8 protease, by cyanogen bromide cleavage, and by cleavage of acid-labile Asp-Pro bonds. The fragments were purified by gel filtration and high performance liquid chromatography (HPLC). The fragments were sequenced by automated Edman degradation to establish the primary structure of eLHβ. Some peptides were further digested with chymotrypsin and the resulting peptides purified by HPLC. In addition to sequencing by automated Edman degradation, these were also sequenced by the complementary 5-dimethylaminonaphthalene-1-sulfonyl-Edman procedure, which enabled us to directly identify glycosylated amino acids. The eLHβ subunit is a glycoprotein of 149 amino acids containing both N- and O-linked oligosaccharides. It possesses a COOH-terminal extension similar to that seen in human chorionic gonadotropin. Carboxypeptidase Y digestions suggest that the COOH terminus is blocked by glycosylation. Interestingly, the amino acid sequence of eLHβ is identical to that of equine chorionic

Although there can be up to four glycoprotein hormones, there appear to be only 3 receptors. There are specific receptors for TSH and FSH, whereas LH and CG share the same receptor. In spite of their possessing approximately 44% identity at the polypeptide level (due to the common α-subunit) there is little cross-reactivity between a given glycoprotein hormone and the other two receptors. Exceptions to this rule include eCG and eLH. The dual LH/FSH nature of eCG has been recognized for a long time (19), and more recently the dual LH/FSH nature of eLH has been demonstrated by several laboratories (20-22). There is evidence that part of the FSH activity observed with these hormones results from the lack of specificity of the FSH receptors of the rat (22), and this probably extends to those of the pig, cow, and donkey as well (20-24). Nevertheless, even the horse FSH receptor recognizes some FSH-like structure in eLH and eCG that it does not recognize in other LH molecules (22). It was to investigate the intrinsic FSH activity of eCG that our laboratory became involved in the study of the equine gonadotropins. Highly efficient methods for the isolation of equine gonadotropins have been devised (25, 26). In an earlier report the complete amino acid sequence of eCGα was proposed along with a partial sequence proposal for eCGβ (27). In the accompanying paper we report the complete amino acid sequence for eCGβ (28). Here we report the complete amino acid sequence for eLHβ.
Portions of this paper (including "Materials and Methods," "Results," Figs. 1-13, Tables I-XIII, and Footnotes 3-7) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are available from the Journal of Biological Chemistry, 9650 Rockville Pike, Bethesda, MD 20814. Request Document No. 86M-3052, cite the authors, and include a check or money order for $15.20 per set of photocopies. Full size photocopies are also included in the microfilm edition of the Journal that is available from Waverly Press.
Edman procedure. The initial cleavage procedure (CNBr) proved to be inefficient and was carried out under conditions that led to additional partial cleavages at acid-labile sites. Nevertheless, the CNBr cleavage studies made two major contributions to the structure determination: 1) by supplying an overlap of two V8 protease fragments not provided by any other fragment; and 2) by leading to the discovery of a pair of conveniently located Asp-Pro bonds. A convenient and versatile procedure was developed to cleave these acid-labile bonds. The fragments from reduced and S-carboxymethylated eLHβ were used in establishing two important parts of the structure: the determinant loop region and the COOH-terminal sequence. Incubation of intact eLHβ with dilute HCl provided two fragments: the COOH-terminal glycopeptide that could be used in the structure determination and the disulfide-bonded core that has been used to examine the biological activity of hybrids of eLHα or ovine LHα and eLHβ or eLHβ lacking the COOH terminus. Staphylococcus aureus V8 protease proved to be exceedingly useful in providing fragments which filled in most of the gaps in the sequence. The enzyme was useful almost as much for the cleavages it did not make as for those that it did make. Of the 3 Glu residues in eLHβ, V8 protease only cleaved on the COOH-terminal side of 2. 
This was most useful since 2 were separated by a single lysine. Had both Glu residues been targets of equal probability, overlapping sequences would have resulted because cleavage after one Glu would preclude cleavage after the other since the enzyme does not work near the NH2 terminus or COOH terminus of a peptide (39). When V8 protease was used to redigest the COOH-terminal fragment V8-I under conditions where cleavage after aspartic acid residues occurred (phosphate buffer) only one of four potential cleavages occurred. This facilitated purification of the fragments and permitted overlaps with the acid-labile fragments to be obtained. It should be noted that in the case of the α-subunit of eLH, the CNBr fragments proved to be the most amenable to sequencing, the V8 fragments much less so, and there were no acid-labile bonds to exploit.
We used two complementary sequencing procedures. Automated Edman degradation, which included the automatic conversion of the anilinothiazolinones to phenylthiohydantoins, was used to obtain most of the sequence information including the amide placements. The manual dansyl-Edman procedure was used to directly identify glycosylated amino acid residues and to confirm the identification of Thr and Ser residues. The latter procedure was typically applied to small peptides such as the chymotryptic peptides derived from larger acid-labile fragments as well as the tryptic peptides of intact eLHβ.
The most difficult part of the structure to establish was that of the COOH terminus. Because of inadequate supplies of eCGβ, a crucial overlap just outside the disulfide-bonded core was not obtained in the earlier sequence study (27); therefore, the framework provided for the initial 110 residues was lacking for the COOH terminus. Unlike eCGβ, where a heavily glycosylated, but homogeneous COOH-terminal peptide (CTP) resulted from Asp-Pro cleavage, eLHβ CTP was heterogeneous in terms of carbohydrate and in having COOH-terminal heterogeneity. This was reflected in the HPLC pattern (Fig. 7B) and was only partially remedied by deglycosylation with trifluoromethanesulfonic acid (data not shown). Nevertheless, the eLHβ CTP could be sequenced in a spinning cup sequenator and we were able to obtain most of the sequence by automated Edman degradation. The dansyl-Edman procedure established the sequence of residues 127-134. The last 9 residues proved to be the most difficult to sequence. Chymotrypsin acted rather nonspecifically at the Arg-Arg. We tried thermolysin and tryptic digestions of the CTP, but the combination of partial digestion coupled with the fact that all the peptides that resulted from these digestions eluted in the same area as the undegraded CTP made it impossible to isolate peptides for sequencing. We finally solved the problem by digesting intact eLHβ with trypsin. From earlier experiments we knew that the Lys at position 119 was not a very efficient tryptic cleavage point and that the double Arg at positions 139-140 was very efficiently cleaved by trypsin. We were able to isolate two COOH-terminal fragments from trypsin-digested eLHβ (Fig. 10). T-1 represented residues 141-149, whereas T-2 consisted of residues 141-146. The total yield of T-1 and T-2 was 49% of theoretical. The yield of T-2 was 12% and carboxypeptidase Y digestion quantitatively released Ile (Table XII). 
These results suggest that approximately 24% of the eLHβ has lost the COOH-terminal 3 residues Lys-Thr-Ser. Earlier carboxypeptidase Y digestions of eLHβ resulted in partial release of Ile (20-30%). Carboxypeptidase Y digestion of the highly purified T-1 subfractions b and c was not possible since sequencing expended the small amounts of available peptide. Nevertheless, we conclude that the COOH terminus of eLHβ is blocked to carboxypeptidase Y digestion, probably by glycosylation of either Thr or Ser.
The only COOH-terminal residues available to digestion are those in peptides which have been partly degraded so that they entirely lack residues 148 and 149 and partly lack Lys-147. The COOH-terminal heterogeneity probably reflects some degradation of the gonadotropin. As mentioned before (31), eCG was isolated from serum where it was initially protected from degradation by serum protease inhibitors. NH2-terminal heterogeneity has been reported for eCG preparations isolated from endometrial cup tissue (40). NH2-terminal and COOH-terminal heterogeneity of glycoprotein hormones has commonly been reported and is probably the result of tissue degradation during pituitary collection (1).
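The estimate that roughly a quarter of the eLHβ lacks Lys-Thr-Ser can be reproduced from the two tryptic peptide yields quoted above. The sketch below assumes that the figure is simply the T-2 yield divided by the combined T-1 plus T-2 yield, and that both peptides were recovered with equal efficiency; neither assumption is stated explicitly in the text, and the function name is illustrative only.

```python
def truncated_fraction(total_yield_pct, t2_yield_pct):
    """Fraction of recovered COOH-terminal peptide that is the short form (T-2).

    Assumes equal recovery of T-1 and T-2, which the text does not confirm.
    """
    if total_yield_pct <= 0 or not 0 <= t2_yield_pct <= total_yield_pct:
        raise ValueError("yields must satisfy 0 <= T-2 <= total and total > 0")
    return t2_yield_pct / total_yield_pct


# Yields quoted in the text, as percentages of theoretical.
TOTAL_YIELD = 49.0  # T-1 (residues 141-149) plus T-2 (residues 141-146)
T2_YIELD = 12.0     # T-2 alone

t1_yield = TOTAL_YIELD - T2_YIELD
fraction = truncated_fraction(TOTAL_YIELD, T2_YIELD)

print(f"T-1 yield by difference: {t1_yield:.0f}% of theoretical")
print(f"Share of recovered peptide lacking Lys-Thr-Ser: {fraction:.1%}")
```

Under these assumptions the ratio is 12/49, or about 24.5%, which is in line with the approximately 24% quoted and with the 20-30% partial release of Ile seen in the earlier carboxypeptidase Y digestions.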
The results of this study, which are summarized in Fig. 2, show eLHβ to be a 149-residue glycoprotein, the same as eCGβ. In the early stages of these studies, we observed that, with few exceptions, the only places where the sequence of eLHβ differed from that of eCGβ were those places where no definitive structure had been determined previously. When supplies of eCGβ became available that sequence determination was resumed, and all the uncertainties of the eCGβ sequence were resolved as described in the accompanying paper (28). All of the sequence differences disappeared and we were left with the surprising observation that the amino acid sequences of eLHβ and eCGβ are identical. This finding is, of course, consistent with the immunological similarity between the two hormones (41), demonstration of the dual LH/FSH activity of both hormones in certain systems (20-22), and the inactivity of the eLHβ or eCGβ hybrids resulting from recombination with all α-subunits except equine (42-44). Hereafter, we will refer to the identical eLH and eCG β-subunit sequence as eCG/eLHβ. Where the structure of either hormone alone is discussed, eLHβ and eCGβ will be employed. Fig. 11 summarizes the known amino acid sequences of glycoprotein hormone β-subunits including the sequences predicted from the DNA sequences. Half-cystine placements were used as positional points of reference. There are four regions which are remarkably conserved among all the β-subunits: Thr-Ser-Ile-Cys-Ala-Gly-Tyr-Cys (residues 31-38), Gln-Pro-Val-Cys-Thr-Tyr-Arg-Glu-Leu (residues 54-62), Leu-Pro-Gly-Cys-Pro (residues 69-73), and Ser-Phe-Pro-Val-Ala-Leu-Ser-Cys-His-Cys-Gly (residues 81-91). These conserved identities undoubtedly indicate a biologically significant role for these structures in the hormone-specific β-subunit.
On the other hand, perusal of the sequences inside the disulfide-containing core (residues 1-110) indicates that, at residue 51, eCG/LHβ and hCGβ share an alanine substitution, whereas LHβ and FSHβ contain proline and TSHβ contains lysine. There are, furthermore, some unique replacements of amino acid residues in eCG/LHβ; Ile-25, Met-45, Ile-52, and Gln-Ile-Lys (residues 94-96) are not found at the respective positions of all other β-subunits (even in hCGβ).
The percentage of amino acid identities between each pair of β-subunits is given in the Matrix Table XIII. A careful comparison of Fig. 11 and Table XIII shows that eCG/eLHβ more closely resembles the β-subunits of LH and it is much less homologous to β-subunits of FSH and TSH. At residue 30 (Fig. 11), eCG/eLHβ and all species of LHβ (except for human CG and LH) share a threonine residue and this residue is not glycosylated, whereas hCGβ and all the other glycoprotein hormone β-subunits have an asparagine residue, and an N-linked oligosaccharide is found at this residue. hLHβ also has this asparagine; however, it is not glycosylated due to an Ile substitution for Thr at position 32. Therefore, on the basis of its overall structure, eCGβ appears to be an LH-like hormone, and this is consistent with the biological characterizations of its activity. The human hormones differ from each other primarily at the COOH terminus, where hCGβ has a 24-amino acid extension, but also share only 86% identity over the first 120 residues. Most of the differences are scattered single or paired amino acid substitutions until the last half-cystine (Cys-110) is reached; thereafter, hLHβ shares only 2 of 11 amino acids in common with hCGβ. Eight of these different amino acids (at positions 114-121) are the result of a single base deletion in the hCGβ gene that causes a shift to an alternate reading frame and eventually permits read-through into the 3'-untranslated region and results in 16 of the 24 additional amino acids observed in hCGβ (47). In contrast, the amino acid sequences of eLHβ and eCGβ are identical. Comparison of the eCG/eLHβ subunit sequence with those of hLHβ and hCGβ shows a high degree of sequence homology of the disulfide-bonded core (67% compared with either hLH or hCG) in contrast to the almost complete lack of homology after the last half-cystine. Only 3 out of 35 amino acids are the same between eCG/eLHβ and hCGβ as we have the sequences aligned in Fig. 11. 
There does exist, however, a 10-residue stretch of high homology if one allows for a 3-residue frame shift in the sequences as shown in Fig. 12. We have included the gene sequences for hLHβ, and the βCG5 and βCG6 genes from the report of Talmadge et al. (47). Although it has been reported that only βCG genes 3 and 5 appeared to be expressed in the human (66, 67), it is interesting that the equine has an Ala-114 in place of the Asp-117 in this frame-shift comparison. Moreover, this same residue change is present in the unexpressed βCG6 gene (Fig. 12). The equine sequences showed an additional Ala-Asp conversion 6 residues further down the sequence which does not occur in the βCG6 sequence. Determination of whether or not this represents a highly conserved sequence will require examination of the COOH sequences of other glycoprotein hormones that have the COOH-terminal extension. Likely candidates include chimpanzee and gorilla CG, which are known to react with antibodies raised against the COOH terminus of hCG (68), and guinea pig CG, which has recently been isolated and appears to have a COOH-terminal extension (69). To test the potential of a pituitary LH to obtain a similar sequence by means of a single base deletion, as appears to be partly responsible for the COOH-terminal extension in hCG, we examined the known sequences for the three LHβ subunit genes that have been sequenced (47, 50, 53) along with that of a TSHβ gene (64). We aligned the sequences at the last half-cystine and compared the sequences. The rat and cow LH gene sequences were consistent with the hypothesis that a single base deletion may have occurred in the hCGβ gene (47). We then predicted the result of a single base deletion occurring in the same place, as it apparently has in hCGβ. 
These predicted sequences corresponding to hCGβ amino acid residues 116-125 are as follows: hLHβ, -Gln-Ala-Ser-Ser-Ser-Ser-Lys-Asp-Pro-Pro-; bovine LHβ, -Gln-Thr-Ser-Ser-Ser-Ser-Lys-Asp-Ala-Pro-; rat LHβ, -Pro-Ala-Phe-Ser-Ser-Ser-Val-Ala-His-Pro-; and mouse TSHβ, -Ser-Ile-Trp-Gly-Asp-Phe-Leu-Phe-Asn-Phe-. Interestingly, the predicted hLHβ sequence would be identical to that of eCG/eLHβ, 116-125 versus 113-122 as in Fig. 12 (although thereafter the predicted hLHβ sequence diverges), and the predicted bovine LHβ sequence would be fairly similar. The predicted sequence for the rat LHβ would have only four or five amino acids in common and that of mouse TSHβ would have none in common. There appears to be the potential to generate a sequence similar to that which is found in the COOH terminus of hCGβ and that of eCG/eLHβ by means of a single nucleotide deletion in some LHβ genes. Therefore, in the horse, the transformations leading to the COOH-terminal extension may have occurred before the genes were duplicated (if they were duplicated). The similar stretch of sequences of the COOH termini might be a conserved protein sequence. An approach toward resolving this question employing protein chemistry would be to cleave the Asp-Pro bonds found outside the disulfide-bonded core of eCG/eLHβ and hCGβ and see whether the resulting modified subunits could recombine with an α-subunit and, if so, whether the resulting recombinant would be biologically active. It would be intriguing to compare both modified recombinants because in the case of modified hCGβ, the "conserved" COOH-terminal sequence would be lost, whereas all but two amino acids would be retained in the modified eCG/eLHβ. Studies along these lines have been initiated.
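The position-by-position comparison of the four predicted decapeptides can be checked with a short script. This is only an illustrative sketch: the sequences are copied from the text above, the predicted hLHβ sequence is used as a stand-in for eCG/eLHβ residues 113-122 (the text states the two would be identical, but the equine sequence itself is shown only in Fig. 12), and the dictionary keys and function name are made up for the example.

```python
# Predicted sequences (hCG-beta numbering 116-125) as listed in the text.
PREDICTED = {
    "hLH_beta": "Gln-Ala-Ser-Ser-Ser-Ser-Lys-Asp-Pro-Pro",
    "bovine_LH_beta": "Gln-Thr-Ser-Ser-Ser-Ser-Lys-Asp-Ala-Pro",
    "rat_LH_beta": "Pro-Ala-Phe-Ser-Ser-Ser-Val-Ala-His-Pro",
    "mouse_TSH_beta": "Ser-Ile-Trp-Gly-Asp-Phe-Leu-Phe-Asn-Phe",
}


def shared_positions(seq_a, seq_b):
    """Count positions carrying the same residue in two ungapped sequences."""
    residues_a = seq_a.split("-")
    residues_b = seq_b.split("-")
    if len(residues_a) != len(residues_b):
        raise ValueError("sequences must have the same length")
    return sum(a == b for a, b in zip(residues_a, residues_b))


# Assumption: predicted hLH-beta equals eCG/eLH-beta 113-122, per the text.
reference = PREDICTED["hLH_beta"]
identities = {
    name: shared_positions(reference, seq) for name, seq in PREDICTED.items()
}

for name, count in identities.items():
    print(f"{name}: {count} of 10 positions shared with the reference")
```

With the sequences as transcribed here, the bovine sequence shares 8 of 10 positions with the reference, the rat sequence 5, and the mouse TSHβ sequence none, which matches the description of "fairly similar," "four or five," and "none in common."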
Moore et al. (70) have proposed that the peptide area on the β-subunit between the half-cystines at residues 93 and 100 is important in determining whether a hormone is an LH-, an FSH-, or a TSH-like molecule. This is called the "determinant loop" hypothesis. In this hypothesis, glycoprotein hormones having a net charge of 0 to +1 have LH-like activity, whereas the FSH-like active molecules have a net charge of -3. Therefore, it was important to determine whether glutamic acid or glutamine exists at residue 94 in eCGβ, which was not identified in our provisional sequence (27). As shown in the preceding report (28), we have directly demonstrated a glutamine residue at 94 (not glutamic acid). This was further supported by the finding that the peptide bond between residues 94 and 95 (Gln-Ile) was not hydrolyzed by S. aureus V8 protease at all. As reported herein the same is true for eLHβ. Consequently, the net charge in the determinant loop of eCG/eLHβ is 0, similar to the LH molecule. In this respect, eCG and eLH are both LH-like hormones. As shown in Fig. 11, 22 amino acids in eCG/eLHβ are different from the corresponding residues in hCGβ when compared inside the disulfide-bonded core. Examination of the nucleotide sequences encoding these 22 amino acids of hCGβ shows that 16 out of 22 amino acid pairs could result from a single base change. However, the remaining six pairs between eCG/eLHβ and hCGβ, Val/Thr (residue 42), Pro/Val (residue 55), Gln/Arg (residue 94), Ile/Arg (residue 95), Lys/Ser (residue 96), and Phe/Pro (residue 103), would imply more extensive change as compared to the nucleic acid sequences provided by Talmadge et al. (47). (Some of these could represent single base changes if evolving from an alternative progenitor sequence, however.) It is interesting that three pairs (Gln/Arg, Ile/Arg, and Lys/Ser) are located in the determinant loop. Fig. 13 represents the nucleotide sequences for the determinant loop that are presently available. 
If the same codons as those of hCGβ are utilized for identical amino acid residues which eCG/eLHβ and hCGβ share in the loop, a total of 21 nucleotide sequences are still possible as the nucleotide sequence of the loop for eCG/eLHβ. Although the nucleotide sequences of eCG/eLHβ genes have not been elucidated, it can be readily imagined that the nucleotide sequence encoding for Gln-Ile-Lys (residues 94-96) in the determinant loop of eCG/eLHβ may also contain unique features as compared with those of other β-subunits.
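The question of which of the six eCG/eLHβ versus hCGβ residue pairs could arise from a single base change can be explored with the standard genetic code. The sketch below is codon-agnostic: it reports the minimum number of base differences over all synonymous codons for each pair, and it does not use the actual hCGβ codons of Talmadge et al. (47), which are not reproduced here. It therefore gives a lower bound, not the comparison made in the text; the function names and one-letter pair list are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def min_base_changes(aa_from, aa_to):
    """Smallest number of base differences between any codons of two residues."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_for(aa_from)
        for c2 in codons_for(aa_to)
    )


# The six pairs named in the text: (eCG/eLH-beta residue, hCG-beta residue, position).
PAIRS = [
    ("V", "T", 42),
    ("P", "V", 55),
    ("Q", "R", 94),
    ("I", "R", 95),
    ("K", "S", 96),
    ("F", "P", 103),
]

for equine, human, position in PAIRS:
    changes = min_base_changes(equine, human)
    print(f"residue {position}: {equine}/{human} needs at least {changes} base change(s)")
```

On this lower-bound reckoning, Val/Thr, Pro/Val, Lys/Ser, and Phe/Pro need at least two base changes whatever codons are used, whereas Gln/Arg and Ile/Arg could in principle differ by one base with a suitable codon choice. That is compatible with the parenthetical remark that some pairs could represent single base changes from an alternative progenitor sequence.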
The eLH studies were initiated with the expectation that eLH would possess only LH activity, which would therefore permit the amino acid sequence determination to be a test of the determinant loop hypothesis. Instead, eLH was also found to possess intrinsic FSH activity, and we have demonstrated that its structure is identical to that of eCG at the protein level. There is an LH from a related species, the donkey, which exhibits only LH activity (24). Examination of the structure of the determinant loop of donkey LH might provide the test of the determinant loop hypothesis that we originally set out to perform.
Direct tests of the determinant loop have been frustrated by meager supplies of eCG. Ryan and his colleagues (37, 71) are the only other group that have reported an attempt to test the determinant loop hypothesis. They modified the 93-100 loop of hCGβ using a protein kinase that recognized a sequence, -Arg-Arg-Ser-Thr-, that is unique to hCGβ. The 93-100 loop was found to be shielded by the α-subunit in intact hCG. It was accessible in the free β-subunit. The phosphorylated hCGβ could be recombined with hCGα but there was no change in the FSH/LH ratio of the modified hCG compared with the native hCG since the LH and FSH activities of the modified hCG were both reduced to the same extent. This did, however, suggest that this region of hCGβ is recognizable by both LH and FSH receptors. Since eLH is more readily available than eCG and since the structure work is complete, direct tests of the determinant loop hypothesis are possible using this hormone. The existence of a Lys in the 93-100 loop of eLHβ will permit chemical modification experiments to address the charge distribution hypothesis for which the experiments of Ryan and co-workers (37, 71) did not directly provide support. Since the 93-100 loop may be an area of α-β interaction, cross-linking experiments could indicate what part of the α-subunit is nearby.
Since the amino acid sequences of the two hormones are identical, the apparent, but insufficiently characterized differences in carbohydrate probably account for the reports of differences between eLH and eCG in bioassays. Equine LH has consistently been found to be more active than eCG in either LH or FSH receptor binding assays. It is known that desialylation of eCG will increase its receptor binding activity (72, 73). Partial removal of terminal sugars tends to have variable results depending on the gonadotropin. For hCG a 30% increase in receptor binding activity was observed following desialylation (74). Complete deglycosylation by means of anhydrous hydrofluoric acid or trifluoromethanesulfonic acid treatment is known to increase receptor binding activity (75). For eCG, Stewart and Allen (76) found the activity of eCG in receptor binding assays employing horse testis or ovary receptors to be so low that they questioned its ability to maintain corpus luteum function in the pregnant mare. In in vivo bioassays eCG has been found to be half as active as eLH in the acute ovarian ascorbic acid depletion assay, but to be 10 times as active as eLH in the chronic Steelman-Pohley assay (20). This suggests a longer serum half-life of eCG versus eLH, which might account for the greater activity of eCG in the longer-term FSH assay, but these half-lives have not been directly compared. A serum half-life of 6 days has been determined for eCG in the gelding (77) and in the hysterectomized mare (78). Of the roles that have been proposed for the glycosylated COOH-terminal extensions of CGs, the only hypothetical role not yet eliminated is its role in prolonging time in circulation for CGs (79). If confirmed in this role, the glycosylated COOH-terminal extension of eLHβ may account for the fact that levels of eLH peak 2 days after ovulation in the mare (80). However, prolonged presence in the serum may be only part of the story.
Although we believe that the protein structure makes the primary contribution to the intrinsic FSH activity of eCG/eLH (81), there is evidence that the carbohydrate modulates expression of this activity by affecting the response of the target cell. When LH activities of eLH and eCG are compared, eLH is consistently found to be more active than eCG both in terms of receptor binding activity as well as in stimulation of steroidogenesis (20, 76). In FSH receptor binding assays eLH is always more active than eCG (20, 81, 82). However, when other FSH assay end points are measured, the results are less consistent. We have reported that in an in vitro granulosa cell assay for FSH activity, the method of priming the ovaries prior to harvesting the granulosa cells can affect the responses to eLH and eCG (81). If the ovaries are primed with estrogen, neither eLH nor eCG exhibits any more FSH activity than ovine LH, which has been found to be very inactive in this system. If the ovaries are primed with eCG, FSH activity of eLH and eCG is observed, although the slope of the response curve for eLH differs significantly from those of all the other hormones, including eCG. Moyle et al. (74) have demonstrated that cAMP accumulation was the response that was most sensitive to sequential removal of terminal sugars from hCG. The accumulation of cAMP by seminiferous tubules of immature rats has been used as an in vitro FSH assay. Equine CG is active in this assay (82), but eLH has been found to be inactive and to act as an FSH antagonist (20). Perhaps part of the secret of eCG as an FSH in the rat is in the ability of the polypeptide moiety to bind the rodent FSH receptor and in the ability of its carbohydrate to effect secondary responses to that receptor binding. It does not appear to be simply the amount of carbohydrate. Equine CG isolated from endometrial cups or from media from horse trophoblast cultures has less carbohydrate (13-30%) than eCG obtained from serum (45%). 
Although the former is in the range of the carbohydrate content of eLH, these tissue and culture medium-derived forms of eCG are consistently less active than serum-derived eCG (40, 83). It is clear that the former are less active in receptor and steroid and cAMP assays. Pituitary gonadotropins are stored in secretory granules prior to secretion; therefore, the pituitary probably contains hormones with carbohydrate in all stages of processing. It has been demonstrated for hCG, which is immediately secreted, that terminal processing of the N- and O-linked oligosaccharides takes place immediately before secretion (84). The tissue-derived eCG, therefore, probably does not represent a completely processed form of eCG with respect to carbohydrate. The reason for there being less carbohydrate in eCG isolated from trophoblastic culture medium, which is presumably a secreted form of the hormone, is less obvious. Perhaps this is an indication that proper glycosylation does not occur in vitro. Improper glycosylation would explain the low activities of these preparations in cAMP and steroidogenesis assays; however, it does not account for the reduced (150%) activities in receptor assays compared with serum-derived eCG. There were indications of proteolytic degradation in the NH2-terminal heterogeneity reported for these preparations. Perhaps some nicking of the polypeptide chains also occurred since it is known that nicks in either α- or β-subunit can result in reduced potencies of gonadotropin preparations (85, 86).
Although it is known that eCG has more (45 versus 24%) carbohydrate than eLH (25, 87), details of these oligosaccharide structures have not yet been established. Unlike hLHβ and hCGβ, where hCGβ has two N-linked oligosaccharide units to hLHβ's one, both eLHβ and eCGβ have a single N-linked oligosaccharide attached to Asn-13. We have previously demonstrated that this oligosaccharide moiety on eCGβ differs from that of eLHβ in that the former has twice as much glucosamine and 5 times as much galactose, but no fucose (31). The two hormones differ in their O-linked oligosaccharides in that there is more carbohydrate on eCGβ than eLHβ. The O-linked attachment points on both β-subunits have not yet been established. These hormones afford an opportunity to investigate the role that carbohydrate plays in differentiating the placental versus pituitary forms of a gonadotropin that are otherwise identical. Recent improvements in analytical procedures for carbohydrate should permit these studies in spite of the limited supplies of hormone.