Assignment of Intrachain Disulfide Bonds and Characterization of Potential Glycosylation Sites of the Type 1 Recombinant Human Immunodeficiency Virus Envelope Glycoprotein (gp120) Expressed in Chinese Hamster Ovary Cells*

This report describes the structural characterization of the recombinant envelope glycoprotein (rgp120) of human immunodeficiency virus type 1 produced by expression in Chinese hamster ovary cells. Enzymatic cleavage of rgp120 and reversed-phase high performance liquid chromatography were used to confirm the primary structure of the protein, to assign intrachain disulfide bonds, and to characterize potential sites for N-glycosylation. All of the tryptic peptides identified were consistent with the primary structure predicted from the cDNA sequence. Tryptic mapping studies combined with treatment of isolated peptides with Staphylococcus aureus V8 protease or with peptide:N-glycosidase F followed by endoproteinase Asp-N permitted the assignment of all nine intrachain disulfide bonds of rgp120. The 24 potential sites for

The gp120 molecule is of interest as a vaccine candidate (Arthur et al., 1987), as the mediator of viral attachment via the virus receptor CD4 (Dalgleish et al., 1984; Klatzman et al., 1984), and as an agent with immunosuppressive effects of its own (Shalaby et al., 1987; Diamond et al., 1988). It is also a potential mediator of the pathogenesis of HIV-1 in acquired immunodeficiency syndrome (Siliciano et al., 1988; Sodroski et al., 1986). The gp120 molecule is synthesized as part of a membrane-bound glycoprotein, gp160 (Allan et al., 1985). Via a host-cell-mediated process, gp160 is cleaved to form gp120 and the integral membrane protein gp41 (Robey et al., 1985). Together gp120 and gp41 form the spikes observed on the surface of newly released HIV-1 virions (Gelderblom et al., 1987). As there is no covalent attachment between gp120 and gp41, free gp120 is released from the surface of virions and infected cells (Gelderblom et al., 1985). The gp120 molecule consists of a polypeptide core of 60,000 daltons; extensive modification by N-linked glycosylation increases the apparent molecular weight of the molecule to 120,000 (Lasky et al., 1986). The amino acid sequence of gp120 contains five relatively conserved domains interspersed with five hypervariable domains (Modrow et al., 1987; Willey et al., 1986). The hypervariable domains contain extensive amino acid substitutions, insertions, and deletions. Sequence variations in these domains result in up to 25% overall sequence variability between gp120 molecules from the various viral isolates. Despite this variation, several structural and functional elements of gp120 are highly conserved. Among these are the ability of gp120 to bind to the viral receptor CD4, the ability of gp120 to interact with gp41 to induce fusion of the viral and host cell membranes, the positions of the 18 cysteine residues in the gp120 primary sequence, and the positions of 13 of the approximately 24 N-linked glycosylation sites in the gp120 sequence.
The disulfide bonding pattern within gp120 and the positions of actual oligosaccharide moieties on the molecule would be useful information for directing mutagenesis and fragmentation studies aimed at defining the functional domains of gp120 and sites for potential pharmacological interruption of its functions (e.g. type-common neutralizing epitopes). This information has been difficult to obtain due to the small amounts of gp120 available from natural sources, the complexity of the disulfide bonding and oligosaccharide structures in gp120, and uncertainty regarding the functionality or structural relevance (Moore et al., 1990) of rgp120 produced in non-mammalian systems. We have been able to produce large amounts of two different rgp120 fusion proteins in a mammalian cell system (Lasky et al., 1986). This has allowed us to elucidate all nine of the disulfide bonds, the positions of the glycosylation sites that are utilized, and the type of oligosaccharide moiety present at each site in rgp120 from the IIIB isolate of HIV-1 produced in CHO cells.

EXPERIMENTAL PROCEDURES
Materials-Recombinant gp120 proteins were produced in CHO cells and purified by immunoaffinity chromatography as previously described (Lasky et al., 1986). DTT, iodoacetic acid, and 2-acetamido-1-β-(L-aspartamido)-1,2-dideoxy-D-glucose (GlcNAc-Asn) were obtained from Sigma. HPLC/Spectro Grade trifluoroacetic acid (Pierce Chemical Co.), Acetonitrile UV (American B&J), and Milli-Q water (Millipore) were used for reversed-phase HPLC. Lasky et al. (1986) expressed gp120 in CHO cells as a fusion protein using the signal peptide of the herpes simplex gD1. Two such fusion proteins were used in this study. The recombinant glycoprotein used in most of this study (CL44) was expressed as a 498-amino acid fusion protein containing the first 27 residues of gD1 fused to residues 31-501 of gp120 (Lasky et al., 1986). This construction lacks the first cysteine residue of mature gp120. Disulfide assignments were carried out on another recombinant fusion protein (9AA) which contains the first 9 residues of gD1 fused to residues 4-501 of gp120. This restores the first cysteine residue, Cys-24. Carboxyl-terminal analysis of CL44 using carboxypeptidase digestions indicated that glutamic acid residue 479 is the carboxyl terminus of the fully processed molecule secreted by CHO cells (data not shown). The amino acid sequences of these two constructions are given in Fig. 1.

RCM CL44 Tryptic Map-Reversed-phase HPLC tryptic mapping was used to confirm the primary structure of the molecule, to assign intrachain disulfide bonds, and to characterize potential sites for N-glycosylation. In experiments not intended to give information about disulfides, the protein was RCM prior to digestion with trypsin. This treatment unfolds the protein and disrupts disulfide bonds, thereby resulting in smaller tryptic fragments than would be obtained with the native molecule.
The reversed-phase HPLC tryptic map of RCM CL44 is shown in Fig. 2. Tryptic peptides were separated by reversed-phase HPLC using an acetonitrile/water system with trifluoroacetic acid as the ionic modifier. As will be discussed below, much of the peak heterogeneity derives from the extremely high (approximately 50% of total mass) carbohydrate content of the molecule. Peaks were collected and subjected to AAA for identification (Table I). In some cases, N-terminal sequence analysis was used for confirmation (these peaks are indicated in Table I). The peaks not assigned a label in Fig. 2 were not identified.
All of the peptides identified were consistent with the primary structure predicted from the cDNA sequence. Of the 38 predicted peptides with three or more amino acids, 36 were identified in the tryptic map of RCM CL44. In addition, four predicted peptides consisting of two amino acids each were also identified (H3, H4, T23, and T35). The peptide composed of residues 139-141 (VQK) was not identified in the map and was not given a label in Fig. 1. The only other peptide not identified was T13 (CNNK). Asparagine residue 200 of peptide T13 is a potential glycosylation site and the peptide lacks hydrophobic amino acids. Therefore, this glycopeptide is likely to be extremely hydrophilic and poorly resolved from the salt fraction on the reversed-phase column. Tryptic cleavage did not occur between peptides T5 and T6 and between peptides T8 and T9. These are designated in Fig. 2 as two T-numbers separated by a comma (T5,6 and T8,9). The absence of cleavage was confirmed by N-terminal sequence analysis of the peptides. In both of these cases, the asparagine residue to the C-terminal side of the cleavage site is a potential N-glycosylation site, and it is likely that the carbohydrate moiety interferes with the action of trypsin. Incomplete tryptic cleavage was also observed between peptides H4 and T2' and between peptides T23 and T24 (H4,T2' and T23,24).
Several peptides arising from non-tryptic cleavages were observed in the tryptic map of RCM CL44. Two of the predicted tryptic peptides were further cleaved by "chymotrypsin-like" cleavages. Peptide T12 was completely cleaved after tyrosine residue 187 and phenylalanine residue 193 to yield peptides T12a, T12b, and T12c. Peptide T4 was partially hydrolyzed after leucine residue 95 to yield peptides T4a and T4b. Intact peptide T4 was also present.
One of the tryptic peptides, T22 (QAHCNISR), eluted at two different positions (32.4 and 34.1 min) in the RCM CL44 tryptic map. Deglycosylation studies (discussed below) with PNGase F and endo H indicated that the different retention times of the two forms of peptide T22 are not due to carbohydrate differences. It is possible that this retention time heterogeneity results from partial conversion of the N-terminal glutamine residue to pyroglutamic acid (Sanger and Thompson, 1953).
Disulfide Assignments in gp120-Mature gp120 contains 18 cysteine residues (enclosed in boxes in Fig. 1) and therefore could contain nine intrachain disulfide bonds. The CL44 construction lacks Cys-24, the first cysteine residue of gp120 (Lasky et al., 1986); therefore, a different construction (9AA), in which the first cysteine residue was restored, was purified to approximately the same degree as CL44. Ellman's reagent (Ellman, 1959) was used to demonstrate the absence of free sulfhydryl groups in 9AA (data not shown). Therefore, disulfide assignments were determined for the 9AA construction. Tryptic mapping studies performed without reduction and S-carboxymethylation of cysteine residues allowed partial assignment of disulfides. The tryptic map of 9AA is shown in Fig. 3. Peaks were identified by N-terminal sequence analysis (Table II). These identifications allowed unequivocal assignment of three of the nine disulfide bonds: between Cys-101 and Cys-127 (peak A, Table II), between Cys-266 and Cys-301 (peak B, Table II), and between Cys-24 and Cys-44 (peak E, Table II).
Peptides containing the remaining cysteine residues were also identified (Table II). Peptide T28 contains 3 cysteine residues and coelutes with peptide T31, which contains 1 cysteine residue (peak D, Table II). Peptide T11 contains 2 cysteine residues and coelutes with peptides T3 and T4, each of which contains a single cysteine residue (peak F, Table II). Similarly, peptide T14 contains 2 cysteine residues and coelutes with peptides T12 and T13, each of which has a single cysteine residue (peaks C and E, Table II). In each of these cases more than one disulfide bond was present in the group of tryptic peptides, thereby preventing unambiguous assignment. These tryptic peptides were further manipulated as described below to introduce selective cleavage between cysteine residues located on a single peptide.
The procedure used to cleave between the cysteine residues of peptides T11 and T14 is summarized in Scheme 1. Each of the peptides has a potential N-linked glycosylation site located between the cysteine residues. The peptides were treated with PNGase F, which removes asparagine-linked carbohydrate while converting the attachment asparagine residue to aspartic acid (Tarentino et al., 1985). The deglycosylated peptides were then cleaved with endoproteinase Asp-N at the newly formed aspartic acid residue, and the resulting peptides were separated by reversed-phase HPLC and identified by N-terminal sequence analysis. The HPLC chromatogram obtained after treatment of peptides T12, T13, and T14 (peak C, Fig. 3) with PNGase F followed by endoproteinase Asp-N is given in Fig. 4a and the sequences of relevant peptides are given in Table III. The results indicate that rgp120 has disulfide bonds between Cys-198 and Cys-209 and between Cys-188 and Cys-217 (Table III). Comparable manipulation of peak E gave similar results. Treatment of peptides T3, T4, and T11 (peak F, Fig. 3) with PNGase F followed by endoproteinase Asp-N allowed the recovery of fragments that demonstrated the presence of disulfide bonds between Cys-89 and Cys-175 and between Cys-96 and Cys-166 (Fig. 4b and Table III).

SCHEME 1. PNGase F converts the glycosylated asparagine residue of X-X-X-Cys-X-Asn-Ser/Thr-X-X-Cys-X-X to aspartic acid; endoproteinase Asp-N then cleaves at that aspartic acid residue, separating the two cysteine-containing segments.
The last two disulfide bonds were assigned by treating peptides T28 and T31 (peak D, Fig. 3) with V8 protease to cleave to the carboxy side of the glutamic acid and aspartic acid residues (Drapeau et al., 1972) located between the cysteine residues of T28. The chromatogram obtained after V8 protease digestion of T28 and T31 is given in Fig. 4c and the sequences of the relevant peptides are given in Table III. The results demonstrated the presence of disulfide bonds between Cys-348 and Cys-415 and between Cys-355 and Cys-388. Thus, the combined results of the tryptic mapping analysis and the further selective degradations permitted the assignment of all nine intrachain disulfide bonds of rgp120. Parallel experiments performed on CL44 produced similar results for the eight disulfide bonds remaining in that construction (not shown). The disulfide bond assignments of rgp120 are summarized in Fig. 6.
Glycosylation Sites of gp120-Mature gp120 contains 24 potential sites for N-glycosylation, as recognized by the sequence Asn-Xaa-Ser(Thr) (Kornfeld and Kornfeld, 1985). These sites are indicated by a dot above the corresponding asparagine residue in Fig. 1. In the present study, tryptic mapping of enzymatically deglycosylated CL44 was used in conjunction with Edman degradation and FAB-MS of individually treated peptides to determine which of the 24 potential N-glycosylation sites are glycosylated and which contain less fully processed (i.e. high mannose-type or hybrid-type) oligosaccharides.
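The sequon rule cited above can be expressed as a short pattern search. The following is a minimal sketch, not the procedure used in this study: the function name `find_sequons` is hypothetical, and the optional exclusion of proline at the Xaa position is a commonly applied refinement that the text does not mention. The example peptide is T22 (QAHCNISR) from the tryptic map.

```python
import re

def find_sequons(sequence, exclude_proline=False):
    """Return 1-based positions of Asn residues in Asn-Xaa-Ser/Thr sequons.

    exclude_proline is an assumption added for illustration; the text
    states the rule simply as Asn-Xaa-Ser(Thr).
    """
    middle = "[^P]" if exclude_proline else "."
    # A lookahead is used so that overlapping sequons are all reported.
    pattern = re.compile("N(?=" + middle + "[ST])")
    return [match.start() + 1 for match in pattern.finditer(sequence)]

t22 = "QAHCNISR"  # tryptic peptide T22, one-letter code
print(find_sequons(t22))  # [5]
```

Position 5 of T22 is the asparagine of the Asn-Ile-Ser sequon, which the text later identifies as Asn-302.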
The two enzymes used for deglycosylation were PNGase F and endo H. PNGase F releases all types of N-linked oligosaccharide structures by cleavage of the β-aspartylglucosylamine linkage (Tarentino et al., 1985). Endo H releases only high mannose-type and hybrid-type oligosaccharide structures by cleaving between the two core N-acetylglucosamine residues (Tai et al., 1977). Deglycosylation of a peptide can be monitored by the increase in retention time of the peak corresponding to the glycopeptide in the reversed-phase elution profile. Thus, it was possible to determine which peptides were glycosylated by treatment with PNGase F and, on the basis of susceptibility to endo H, to distinguish those with attached high mannose-type and/or hybrid-type oligosaccharides as the predominant structures.
The 24 potential glycosylation sites of CL44 are contained in 14 tryptic glycopeptides.
Thirteen of these glycopeptides were identified in the tryptic map of RCM CL44 (Fig. 2). As mentioned above, T13 (CNNK) was not identified. The tryptic maps of PNGase F-treated RCM CL44 and endo H-treated RCM CL44 are compared with the RCM CL44 tryptic map in Fig. 5. The peaks corresponding to glycopeptides are labeled in each of the three tryptic maps.
As would be expected for a heavily glycosylated molecule, treatment of RCM CL44 with PNGase F (Fig. 5b) simplified the tryptic map significantly.
Typically, the peaks corresponding to potential glycopeptides in the RCM CL44 tryptic map (Fig. 5a) were broad and often appeared as multiplets. Deglycosylation resulted in sharp, single peaks for each peptide, indicating that the glycopeptide peak multiplicity and broadness were due to carbohydrate heterogeneity. Peaks were collected and identified by AAA (data not shown).
Glycopeptide peaks are labeled according to the nomenclature in Fig. 1.
Of the 13 potential glycopeptides that had been identified in the tryptic map of RCM CL44, all were shifted to later retention times in the tryptic map of PNGase F-treated material. This demonstrates that at least 13 of the 24 potential sites are glycosylated.
Peptide T28 was not recovered after deglycosylation.
This peptide contains a large number of nonpolar amino acids and, after removal of the hydrophilic carbohydrate moieties, may bind irreversibly to the HPLC column. As described above, peptide T22 elutes at two positions in the RCM CL44 tryptic map presumably as a result of conversion of the N-terminal glutamine to pyroglutamic acid. The retention times of both of the T22 peaks were altered in the deglycosylated material produced by treatment with both PNGase F and endo H, confirming that the difference between these forms of peptide T22 in the RCM CL44 tryptic map was not due to carbohydrate heterogeneity. The tryptic map of endo H-treated RCM CL44 (Fig. 5c) indicated that six of the 13 tryptic glycopeptides were endo H-susceptible (peptides T14, T16, T22, T24, T28, and T31). In addition, a small amount of peptide T15 showed endo H susceptibility.
For each of these glycopeptides, the elution time of the endo H-treated glycopeptide was earlier than that of the corresponding PNGase F-treated glycopeptide. This is due to the hydrophilic N-acetylglucosamine residue that remains attached to the asparagine residue following endo H treatment. Peptide T16 was not identified in the tryptic map of endo H-treated RCM CL44. This peptide contains three potential glycosylation sites and was poorly recovered under any circumstances.
Conclusions as to the type of glycosylation present on each of the tryptic glycopeptides based on susceptibility to PNGase F and endo H are summarized in Table IV. Seven of the 13 glycopeptides identified in the tryptic map of RCM CL44 contain only a single glycosylation site and thus could be characterized unambiguously with regard to enzyme susceptibility. Peptides T2' (Asn-58), T26 (Asn-326), and T32 (Asn-433) were deglycosylated only by PNGase F and, therefore, contain attached complex-type oligosaccharide structures. Peptides T22 (Asn-302), T24 (Asn-309), and T31 (Asn-418) were susceptible to both PNGase F and endo H and, therefore, carry high mannose-type and/or hybrid-type oligosaccharide structures. Peptide T15 is only partially susceptible to endo H; therefore, Asn-246 carries primarily complex-type oligosaccharides but must also have some attached high mannose-type and/or hybrid-type oligosaccharide structures. Peptides T6, T9, and T11 each contain two potential glycosylation sites. Each peptide was deglycosylated by PNGase F but not by endo H, indicating the presence of mostly complex-type oligosaccharide structures.
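The reasoning applied to the single-site glycopeptides amounts to a small decision rule. The sketch below is only an illustration of that rule: the function name `classify_site` and the three-level encoding of endo H susceptibility ("none", "partial", "full") are assumptions introduced here, and the observations are transcribed from the paragraph above rather than from Table IV.

```python
def classify_site(pngase_f_shift, endo_h_shift):
    """Infer the predominant oligosaccharide type at a single-site glycopeptide.

    pngase_f_shift: True if PNGase F treatment shifts the peptide's retention time.
    endo_h_shift: "none", "partial", or "full" (hypothetical encoding).
    """
    if not pngase_f_shift:
        return "no shift observed (no evidence of N-glycosylation)"
    if endo_h_shift == "full":
        return "high mannose-type and/or hybrid-type"
    if endo_h_shift == "partial":
        return "primarily complex-type, some high mannose-type and/or hybrid-type"
    return "complex-type"

# Observations for the seven single-site glycopeptides, as stated in the text.
observed = {
    "T2' (Asn-58)": (True, "none"),
    "T26 (Asn-326)": (True, "none"),
    "T32 (Asn-433)": (True, "none"),
    "T22 (Asn-302)": (True, "full"),
    "T24 (Asn-309)": (True, "full"),
    "T31 (Asn-418)": (True, "full"),
    "T15 (Asn-246)": (True, "partial"),
}

for peptide, (pngase_f, endo_h) in observed.items():
    print(f"{peptide}: {classify_site(pngase_f, endo_h)}")
```

The rule applies unambiguously only when a peptide carries a single site; peptides with several sites need the site-by-site analysis described in the text.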
In order to determine whether one or both of the potential glycosylation sites in each peptide were actually glycosylated, the PNGase F-treated glycopeptides were subjected to either FAB-MS or Edman degradation. Treatment with PNGase F converts the attachment asparagine residue to aspartic acid during deglycosylation (Tarentino et al., 1985). This conversion can be detected by FAB-MS as an increase of 1 atomic mass unit in the mass of the peptide for each site deglycosylated (Carr and Roberts, 1986). Edman degradation was performed instead of FAB-MS on deglycosylated peptide T11 because of its high molecular weight (>2000 a.m.u.). Aspartic acid was observed in cycles 8 (derived from Asn-156) and 19 (derived from Asn-167). These combined results indicate the presence of complex-type oligosaccharide structures attached to Asn residues 106, 111, 126, 130, 156, and 167.
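The mass increase of about 1 atomic mass unit per site follows from the difference between the aspartic acid and asparagine residue masses. The sketch below works this out using standard monoisotopic residue masses; the function names are hypothetical, and the example shift is an illustrative value, not a measurement from this study.

```python
# Standard monoisotopic residue masses in daltons (not taken from this paper).
ASN_RESIDUE = 114.04293  # C4H6N2O2
ASP_RESIDUE = 115.02694  # C4H5NO3
SHIFT_PER_SITE = ASP_RESIDUE - ASN_RESIDUE  # about 0.984 Da

def deglycosylation_shift(n_sites):
    """Expected mass increase when n_sites Asn residues are converted to Asp."""
    return n_sites * SHIFT_PER_SITE

def sites_from_shift(observed_shift):
    """Estimate how many sites were deglycosylated from an observed mass increase."""
    return round(observed_shift / SHIFT_PER_SITE)

print(f"{deglycosylation_shift(1):.3f}")  # 0.984
print(f"{deglycosylation_shift(2):.3f}")  # 1.968
print(sites_from_shift(2.0))              # 2
```

For a peptide with two potential sites, a shift near 2 Da relative to the mass predicted from the unmodified sequence would indicate that both sites had carried carbohydrate.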
The remaining three glycopeptides identified in the tryptic map of RCM CL44 contained multiple potential glycosylation sites and were endo H susceptible. Peptides T14, T16, and T28 account for a total of 10 potential glycosylation sites. Characterization of each glycosylation site was achieved by Edman degradation of HPLC-purified peptides that had been subjected to treatment with endo H followed by PNGase F (Scheme 2). When endo H releases the high mannose-type and hybrid-type oligosaccharide structures, it leaves an N-acetylglucosamine residue attached to the asparagine residue of the peptide (Tarentino et al., 1974). PNGase F will not remove this N-acetylglucosamine residue but will release the remaining N-linked oligosaccharide structures by cleavage of the β-aspartylglucosylamine bond, resulting in conversion of the attachment asparagine residue to aspartic acid (Chu, 1986). Therefore, treatment with endo H followed by PNGase F will yield asparagine at an unglycosylated site, GlcNAc-Asn at a glycosylation site that contained primarily high mannose-type and/or hybrid-type oligosaccharide structures, and aspartic acid at a glycosylation site that carried primarily complex-type oligosaccharide structures. Paxton et al. (1987) have shown that it is possible to detect the PTH derivative of GlcNAc-Asn after Edman degradation. Using this approach, it was possible to characterize the remainder of the glycosylation sites of CL44. For example, treatment of glycopeptide T16, which contains three potential N-glycosylation sites, with endo H followed by PNGase F resulted in the appearance of the PTH derivative of GlcNAc-Asn at cycles 7 and 13 and the appearance of PTH-Asp at cycle 19 during Edman degradation. Thus, glycopeptide T16 carries primarily high mannose-type and/or hybrid-type oligosaccharides at Asn-259 and Asn-265 and complex-type oligosaccharides at Asn-271.
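The interpretation of the Edman results for peptide T16 can be written out as a lookup from the PTH derivative seen at each cycle. This is an illustrative sketch only: the names are hypothetical, and the starting residue number of T16 (253) is not stated in the text but inferred from the cycle and residue pairs given above (cycle 7 corresponds to Asn-259).

```python
# Call made for each PTH derivative after endo H followed by PNGase F,
# following the logic described in the text.
SITE_CALLS = {
    "Asn": "unglycosylated",
    "GlcNAc-Asn": "high mannose-type and/or hybrid-type",
    "Asp": "complex-type",
}

# Inferred assumption: first residue of T16, from cycle 7 = Asn-259.
T16_FIRST_RESIDUE = 253

# Edman cycle -> PTH derivative observed at the potential site (from the text).
t16_cycles = {7: "GlcNAc-Asn", 13: "GlcNAc-Asn", 19: "Asp"}

def interpret(cycles, first_residue):
    """Map Edman cycles to residue numbers and oligosaccharide calls."""
    return {
        first_residue + cycle - 1: SITE_CALLS[derivative]
        for cycle, derivative in sorted(cycles.items())
    }

for residue, call in interpret(t16_cycles, T16_FIRST_RESIDUE).items():
    print(f"Asn-{residue}: {call}")
```

The three residue numbers produced (259, 265, and 271) match the assignments reported for T16, which is a useful consistency check on the inferred starting position.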
The results of these experiments are summarized in Table V and indicate that CL44 contains complex-type oligosaccharide [...] 204, 211, 232, 259, 265, 356, and 362. Peptide T13, which contains the remaining glycosylation site, was not identified in any of the tryptic maps presented in this paper. However, FAB-MS data obtained from the void peak of a tryptic map of RCM CL44 treated with endo H followed by PNGase F revealed an ion corresponding to MH+ for that peptide containing an attached N-acetylglucosamine residue (observed: m/z 740.1; calculated: m/z 740.4). The presence of peptide T13 in the void peak was further confirmed by AAA. Therefore, we conclude that Asn-200 is glycosylated and carries primarily high mannose-type and/or hybrid-type oligosaccharide structures.

Characterization of multiple potential glycosylation sites on RCM CL44 tryptic glycopeptides was achieved by Edman degradation of HPLC-purified peptides subjected to treatment with endo H followed by PNGase F. Edman degradation of deglycosylated peptides shows either an Asn residue at an unglycosylated site, a GlcNAc-Asn at a glycosylation site to which had been attached high mannose or hybrid oligosaccharide structures, or an Asp residue at a glycosylation site which had carried complex type oligosaccharide structures.
The data presented here demonstrate that all 24 potential glycosylation sites of gp120 are utilized: 13 sites contain primarily complex-type oligosaccharide structures, while 11 sites contain primarily high mannose-type and/or hybrid-type oligosaccharide structures. The type of glycosylation at each site is summarized in Fig. 6.

DISCUSSION
We have determined the disulfide bonding pattern and the attachment positions of oligosaccharide moieties of rgp120 from the IIIB isolate of HIV-1. A schematic representation of this information is presented in Fig. 6. The rgp120 molecules from which the structural data were obtained possess the functional properties attributed to gp120 produced by HIV-1 virions, including high affinity CD4 binding (Lasky et al., 1987) and HIV-1 neutralizing antigenicity (Lasky et al., 1986). We therefore conclude that the CHO-expressed gp120 is properly folded and that the disulfide-bonded domains reported here for the recombinant molecules are representative of those occurring in gp120 produced by HIV-1 virions.
Functional Aspects of gp120 Structure-The gp120 molecule comprises five disulfide-bonded loop structures. The first and fourth are simple loops formed by single disulfide bonds while the second, third, and fifth are more complex arrays of loops formed by nested disulfide bonds. The fourth disulfide-bonded domain (residues 266-301) has been shown to contain significant type-specific neutralizing epitopes (Matsushita et al., 1988; Rusche et al., 1988; Goudsmit et al., 1988; Javaherian et al., 1989) and the fifth disulfide-bonded domain (residues 348-415) has been shown to be important for CD4 binding (Lasky et al., 1987; Kowalski et al., 1987). No direct functional correlates have been described for the other three disulfide-bonded domains. The amino acid sequence of gp120 varies to a large extent between different viral isolates but the majority of the variability is localized in hypervariable regions which punctuate the otherwise relatively conserved sequences (Willey et al., 1986; Modrow et al., 1987). Modrow et al. (1987) have identified five hypervariable regions which are characterized by sequence variation, insertions, and deletions. Four of these hypervariable regions correspond to well-delineated loops as indicated in Fig. 6. With the exception of the third hypervariable loop (disulfide-bonded domain IV) the functional significance of these regions is unknown.
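The five-domain description can be reproduced from the nine disulfide assignments reported in this paper. The sketch below treats a "domain" as a set of disulfide bonds whose sequence spans overlap or nest; that operational definition and the function name `group_into_domains` are assumptions made here for illustration, not the authors' stated procedure.

```python
# The nine intrachain disulfide bonds of rgp120 as assigned in the text.
DISULFIDES = [
    (24, 44), (89, 175), (96, 166), (101, 127), (188, 217),
    (198, 209), (266, 301), (348, 415), (355, 388),
]

def group_into_domains(bonds):
    """Merge disulfide bonds whose residue spans overlap or nest."""
    domains = []
    for start, end in sorted(bonds):
        if domains and start <= domains[-1]["end"]:
            # This bond begins inside the current domain: extend it.
            domains[-1]["end"] = max(domains[-1]["end"], end)
            domains[-1]["bonds"].append((start, end))
        else:
            domains.append({"start": start, "end": end, "bonds": [(start, end)]})
    return domains

for number, domain in enumerate(group_into_domains(DISULFIDES), start=1):
    kind = "simple loop" if len(domain["bonds"]) == 1 else "nested loops"
    print(f"Domain {number}: residues {domain['start']}-{domain['end']}, "
          f"{len(domain['bonds'])} bond(s), {kind}")
```

Under this definition the bonds fall into five groups spanning residues 24-44, 89-175, 188-217, 266-301, and 348-415, with the first and fourth formed by single bonds, in agreement with the description above.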
The positions of the cysteine residues and, presumably, the disulfide bonding pattern in gp120 are highly conserved between isolates. Among HIV-1 isolates, the only exception to this conservation is the Z3 isolate (Willey et al., 1986) which has an additional pair of cysteine residues in the fourth hypervariable domain (residues 363-384). These residues most likely form a tenth disulfide bond in the gp120 from this isolate. The presence of this extra bond in such a hypervariable region probably has no more effect on the structure and function of the molecule than the other sequence variations that occur in that region. In HIV-2 and SIV, the positions of the cysteine residues in disulfide-bonded domains I, II, IV, and V are conserved (Myers et al., 1989). In domain III there are two additional pairs of cysteine residues (three in SIV isolate MM142) which are presumed to be disulfide bonded within a finger-like domain III structure analogous to that illustrated in Fig. 6. Another major difference between HIV-1, HIV-2, and SIV is that hypervariable region V2 is reduced to five amino acids in HIV-2 and SIV. The functional significance of the differences between HIV-1, HIV-2, and SIV is unknown at this time.
One of the most important functions of gp120 is its ability to bind to CD4 and thereby mediate the attachment of virions to susceptible cells (Klatzman et al., 1984; Dalgleish et al., 1984). The CD4-binding function has been localized by mutagenesis and structural studies (Lasky et al., 1987; Kowalski et al., 1987) to the region between residues 320 and 450, which includes the fifth disulfide-bonded domain. Lasky et al. (1987) showed that deletion of residues 396 to 407 and mutagenesis of Ala-402 to Asp abolished CD4 binding. They also mapped the epitope of a monoclonal antibody that blocks gp120-CD4 binding to residues 392-402. Kowalski et al. (1987) identified three regions as being involved with CD4 binding. Insertions between residues 333-334, 388-390, and 442-443 abolished CD4 binding. In addition, a deletion of residues 441-479 abolished CD4 binding while deletion of residues 362-369 within the fourth hypervariable region had no effect on binding. Cordonnier et al. (1989) have shown that mutagenesis of Trp-397 to Tyr or Phe decreases CD4 binding and changes to Ser, Gly, Val, or Arg abolish binding. Nygren et al. (1988) have reported that a proteolytic fragment of gp120 from residue 322 to near the C terminus retains the ability to bind to CD4. The results of these studies indicate that the CD4 binding capacity of gp120 is localized to the region between residues 320 and 450 and more specifically to the residues around 333-334, 442-443, and the sequence between 388 and 407.
In the course of efforts to map the epitope of monoclonal antibody 5C2-E5 which blocks gp120-CD4 binding, Lasky et al. (1987) treated rgp120 (CL44) with acetic acid to cleave the protein at aspartic acid residues (Ingram, 1963) and isolated the peptide fragment 383-426 from a column of immobilized anti-gp120 monoclonal antibody 5C2-E5. Digestion of reduced rgp120 yielded the same fragment. Consequently, it was concluded that a disulfide bond existed between Cys residues 388 and 415. In the analysis reported here we have failed to find this disulfide bond and, instead, have consistently found the disulfide bonds between Cys-355 and Cys-388, and between Cys-348 and Cys-415 as summarized in Fig. 6. We believe that the true disulfide-bond assignment is as indicated in Fig. 6 and that the acetic acid digestion produced some disulfide bond rearrangement (Ryle and Sanger, 1955) in the earlier work.
The Oligosaccharides of gp120-Approximately 50% of the apparent molecular mass of gp120 is carbohydrate. The structures of the oligosaccharide moieties released by hydrazinolysis of CL44 rgp120 have been exhaustively analyzed (Mizuochi et al., 1988a; Mizuochi et al., 1988b). These authors found that 33% of the N-linked oligosaccharides were of the high mannose type, 4% were of the hybrid type, and 63% were of the complex type. Of the complex oligosaccharides 90% were fucosylated and 94% were sialylated. The complex structures were approximately 4% monoantennary, 61% biantennary, 19% triantennary, and 16% tetraantennary. No O-linked oligosaccharides were found. Geyer et al. (1988) have analyzed the oligosaccharides of gp120 from the IIIB isolate of HIV-1-infected human cells. They found that high mannose-type oligosaccharides accounted for approximately 50% of the carbohydrate structures. The remaining structures were fucosylated, partially sialylated bi-, tri-, and tetraantennary complex-type oligosaccharides.
No novel carbohydrate structures, or moieties that would be expected to act as heterophile antigens in man, have been isolated from gp120 from either source.
We have shown here that all 24 glycosylation sites are utilized, and that 13 of the 24 sites contain complex-type oligosaccharides as the predominant structures while 11 contain primarily hybrid and/or high mannose structures. The demonstration of endo H-susceptible structures at 11 of the 24 sites is consistent with the earlier results of Mizuochi et al. (1988a, 1988b) who determined that nearly 40% of the total oligosaccharide structures released from rgp120 were hybrid and/or high mannose-type oligosaccharides. The 24 potential N-linked glycosylation sites in the gp120 sequence are conserved to a large extent between different viral isolates (Willey et al., 1986; Modrow et al., 1987). Based on the gp120 sequence comparisons in these references, 13 of the sites on gp120 from the IIIB isolate of HIV-1 are absolutely conserved; these include eight of the 11 sites that carry predominantly hybrid-type and/or high mannose-type oligosaccharides. Thus, the less fully processed (i.e. endo H-susceptible) oligosaccharides of gp120 are found preferentially at the most conserved glycosylation sites. The remaining sites (eight complex and three hybrid/high mannose) are relatively conserved, even though many of them occur in the hypervariable regions. The positions of these sites may shift or be deleted, but there is always one or more new site(s) within 5-10 residues of the reference IIIB site. Studies by Willey et al. (1988) demonstrated that mutagenesis of Asn-232 to Gln decreased the infectivity of virions containing the mutant gp120 molecules without affecting CD4 binding or syncytium formation.
At this time, no particular functional significance can be attributed to the type of oligosaccharide structure at any of the sites.
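The statement that endo H-susceptible oligosaccharides occur preferentially at the most conserved sites can be checked with simple arithmetic on the counts given above. The sketch below uses only numbers stated in the text; the variable names are introduced here for illustration, and no statistical test is implied.

```python
# Counts taken from the text.
total_sites = 24
endo_h_sites = 11        # primarily high mannose-type and/or hybrid-type
complex_sites = 13       # primarily complex-type
conserved_sites = 13     # absolutely conserved between isolates
conserved_endo_h = 8     # conserved sites carrying endo H-susceptible structures

# Derived: conserved sites that carry complex-type structures.
conserved_complex = conserved_sites - conserved_endo_h

frac_endo_h_conserved = conserved_endo_h / endo_h_sites
frac_complex_conserved = conserved_complex / complex_sites
frac_sites_endo_h = endo_h_sites / total_sites

# Released oligosaccharides (Mizuochi et al.): 33% high mannose + 4% hybrid.
released_endo_h_percent = 33 + 4

print(f"Endo H-susceptible sites conserved: {frac_endo_h_conserved:.1%}")   # 72.7%
print(f"Complex-type sites conserved:       {frac_complex_conserved:.1%}")  # 38.5%
print(f"Sites that are endo H-susceptible:  {frac_sites_endo_h:.1%}")       # 45.8%
print(f"Released structures, endo H type:   {released_endo_h_percent}%")    # 37%
```

Eight of 11 endo H-susceptible sites are absolutely conserved, compared with five of 13 complex-type sites, which is the basis for the preference noted in the text. The site fraction (about 46%) and the fraction of released structures (37%) need not agree exactly, since sites are classified by their predominant structure.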
The role of the carbohydrate moieties on gp120 in CD4 binding has been investigated by several authors (Lifson et al., 1986; Matthews et al., 1987; Fenouillet et al., 1989). Those that employed enzymatic deglycosylation in the presence of detergents (Lifson et al., 1986; Matthews et al., 1987) have concluded that the carbohydrates are not directly involved with the binding but that they are required to maintain the conformation of gp120 necessary for binding. In contrast, Fenouillet et al. (1989) enzymatically deglycosylated gp120 without detergent and demonstrated that the CD4 binding affinity was preserved. It therefore appears that the carbohydrate moieties of gp120 are not required for its binding to CD4 but that the conformational stability of gp120 to detergents is lost after deglycosylation.
The rgp120 used for these determinations is functionally and structurally equivalent to gp120 produced by HIV-1-infected cells. The structural data presented here will be useful in future attempts to manipulate the structure of gp120 in order to better understand the biology of the virus and to produce an effective vaccine.
Acknowledgments-We wish to thank Dr. Phil Berman and Dr. Larry Lasky for many helpful discussions, Dr. John Chakel for FAB-MS analysis, and Carol Morita and Anne Stone for preparation of illustrations.