Isolation and characterization of the major protein and glycoprotein of hepatitis B surface antigen.

Hepatitis B surface antigen has been purified and its major protein (p-25) and glycoprotein (gp-30) isolated. These isolated proteins have been subjected to amino acid analysis, Edman degradation (30 steps), carboxypeptidase digestion, and peptide mapping following tryptic hydrolysis by polyacrylamide gel electrophoresis in the presence of sodium dodecyl sulfate, two-dimensional thin layer chromatography and electrophoresis, and high performance liquid chromatography. These studies demonstrated that p-25 and gp-30 have identical protein structure, differing only by the presence of carbohydrate in gp-30. Removal of this carbohydrate by treatment with anhydrous hydrofluoric acid converted gp-30 into p-25. The NH2-terminal sequence, carboxyl-terminal sequence, and the amino acid composition of several internal tryptic peptides were found to be consistent with the proposed protein sequence based upon the published sequences of hepatitis B viral DNA. The carbohydrate of gp-30 was demonstrated to be attached within the carboxyl-terminal 104 amino acids, most likely between residues 121 and 170.

The abbreviations used are: HBsAg, hepatitis B surface antigen; SDS, sodium dodecyl sulfate; PTH, phenylthiohydantoin.

related. Shih and Gerin (7) have shown that all of these proteins elicit both group-specific and subtype-specific antibodies which cross-react. Thus, they concluded that these proteins share common antigenic determinants and therefore have common structural features. In support of this conclusion, we have shown that p-25 and gp-30 have identical amino acid compositions and identical NH2-terminal and carboxyl-terminal sequences (9). On this basis, we proposed that p-25 and gp-30 may differ only by the presence of carbohydrate in gp-30, and are in fact the same gene product. In this paper, further evidence in support of this conclusion is presented, and we demonstrate that the carbohydrate is attached to a site or sites in the carboxyl-terminal portion of the molecule. We also show that the protein structure, as far as determined, is consistent with the simple linear read-out of the DNA sequences determined by three different groups (15)(16)(17).

Materials
All electrophoresis reagents were obtained from Bio-Rad. Iodo[2-3H]acetic acid (290.77 mCi/mmol) was obtained from New England Nuclear. All sequenator reagents, anhydrous hydrazine, ethylenimine, and micropolyamide sheets were obtained from Pierce. Trypsin (dicyclohexylcarbodiimide-treated), chymotrypsin, and fluorescamine were obtained from Sigma. Potassium bromide was obtained from Matheson, Coleman & Bell. All solvents for high performance liquid chromatography were spectrophotometric grade obtained from Aldrich. Amino acid analyzer reagents and buffers were obtained from Durrum. Goat anti-normal human serum was obtained from Miles Laboratories.

Methods
Assay of HBsAg
HBsAg was assayed by counter electrophoresis against anti-HBsAg antibodies prepared in guinea pigs.

Purification of HBsAg
All HBsAg used in these studies was obtained from a single chronic carrier of HBsAg (ayw) and purified according to the following procedures.
Ammonium Sulfate Fractionation-To each 100 ml of plasma, 17.6 g of solid ammonium sulfate were added. The solution was stirred at room temperature for 30 min, then centrifuged for 30 min at 10,000 × g. The supernatant was recovered, and to each 100 ml, 12.7 g of solid ammonium sulfate were added. The solution was stirred for 30 min at room temperature, then centrifuged at 10,000 × g for 30 min. The precipitate was collected and dissolved in a minimum of water, and then dialyzed against 0.01 M potassium phosphate buffer, pH 6.8.
Hydroxyapatite Chromatography-The dialyzed solution from step 1 was applied to a column (4 × 15 cm) of hydroxyapatite prepared as described by Jenkins (18). The column was previously equilibrated with 0.01 M potassium phosphate buffer, pH 6.8. The HBsAg was eluted with 0.1 M potassium phosphate buffer, pH 6.8.
Ammonium Sulfate Precipitation-The antigen-containing fractions from step 2 were pooled, and to each 100 ml, 31.3 g of solid ammonium sulfate were added. The precipitate was collected by centrifugation at 10,000 × g for 30 min, and then dissolved in a minimum volume of water, generally about 10 ml final volume.
Agarose 4B Chromatography-The antigen from step 3 was applied to an agarose CL-4B column (4 × 100 cm) equilibrated with 0.1 M phosphate buffer, pH 6.8. Fractions of 5 ml were collected. A typical elution profile is shown in Fig. 1. Antigen-positive fractions were pooled and concentrated to 10 ml by pressure dialysis in an Amicon ultrafiltration cell with an XM-300 membrane.
Ultracentrifugation-The HBsAg from step 4 was layered onto two 32-ml linear cesium chloride gradients, density 1.15 to 1.32, and centrifuged for 18 h at 25,000 rpm in a Beckman SW27 rotor. The antigen forms a visible band at a density of approximately 1.2, which was recovered with a syringe. The antigen-containing bands were then diluted to 20 ml with a KBr solution, such that the final density of the solution was 1.24, and centrifuged for 15 h at 30,000 rpm in a Beckman Ti 50.2 rotor. The antigen floats under these conditions and is readily collected with a syringe.
The purified antigen was examined for the presence of normal human serum protein contaminants by Ouchterlony double diffusion against goat anti-normal human serum antibodies.
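The ammonium sulfate additions in the purification are specified per 100 ml of solution, so the mass of salt scales linearly with the volume measured at each step. The following is a minimal sketch of that arithmetic only; the step labels, the function name, and the example volume are illustrative assumptions and are not part of the published procedure.

```python
# Grams of solid ammonium sulfate added per 100 ml at each step (values from the text).
GRAMS_PER_100_ML = {
    "fractionation_first_addition": 17.6,   # added to plasma
    "fractionation_second_addition": 12.7,  # added to the recovered supernatant
    "precipitation": 31.3,                  # added to pooled hydroxyapatite fractions
}


def ammonium_sulfate_grams(step: str, volume_ml: float) -> float:
    """Scale the per-100-ml salt addition to the actual solution volume."""
    if volume_ml < 0:
        raise ValueError("volume_ml must be non-negative")
    return GRAMS_PER_100_ML[step] * volume_ml / 100.0


if __name__ == "__main__":
    # Hypothetical 250-ml batch of plasma, chosen only to illustrate the scaling.
    print(ammonium_sulfate_grams("fractionation_first_addition", 250.0))
```

The volume passed to each call should be the volume actually present at that step (plasma, supernatant, or pooled fractions), since the text states each addition relative to the solution being treated.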

Polyacrylamide Gel Electrophoresis
Analytical polyacrylamide gel electrophoresis was performed with a Bio-Rad slab gel apparatus and 12 or 23% polyacrylamide gels. A 4% stacking gel was used. The gel and buffer formulations were those of O'Farrell (19). Preparative polyacrylamide gel electrophoresis was performed with the same Bio-Rad slab gel system and buffers, with a 3-mm, 12% gel. Up to 10 mg of HBsAg were applied with bromphenol blue as a tracking dye. Electrophoresis was performed until the dye was eluted, then the proteins were visualized by placing the gels in 1 M cold KCl (20). The gel was then sliced into strips containing the proteins, and the strips were placed in 11.5-mm dialysis bags. Approximately 10 ml of 0.1 M Tris-HCl, pH 8.5, containing 0.01% SDS were then added to each bag. The sealed bags were placed in an electrophoresis cell at right angles to the electric field, and the proteins were electrophoretically eluted from the gel. Generally 3-4 h at 50 mA was sufficient. The gel slices were then removed from the bag and the remaining protein solution was concentrated to the desired volume by dialysis against 25% polyethylene glycol. When necessary, SDS was removed by the addition of 20 volumes of acetone:triethylamine:acetic acid (85:5:5) as described by Henderson et al. (21).

Treatment with Hydrofluoric Acid
The carbohydrate was removed from HBsAg proteins according to the method of Mort and Lamport (22) by treatment with anhydrous HF.

Reduction and Alkylation
For automatic Edman degradation, approximately 500 nmol of p-25 or gp-30 were dissolved in 5.0 ml of 1.5 M Tris-HCl, pH 8.6, containing 0.1% sodium dodecyl sulfate. A slow stream of nitrogen was passed through the solution for 20 min, then 0.1 ml of mercaptoethanol was added. The reduction was allowed to proceed for 4 h at room temperature under nitrogen. A solution of 270 mg of iodoacetic acid, containing 0.6 mg of [3H]iodoacetic acid (290 µCi/µmol) in 1.0 ml of 1 N NaOH was then added. After 10 min the reaction mixture was dialyzed against water, in the dark at 4 °C. Dialysis was continued until no further radioactivity was detected in the dialysate. The extent of alkylation was determined by amino acid analysis and liquid scintillation counting.
Aminoethylation of p-25 and gp-30 was performed on 50-100 µg of protein in 1.5-ml Reacti-Vials (Pierce) by the following procedure. The protein, dissolved in 100 µl of 0.01 M Tris-HCl buffer, pH 8.6, and 0.1% sodium dodecyl sulfate, was freed of oxygen by flushing with nitrogen for 10 min. Mercaptoethanol (10 µl) was then added and the reduction was allowed to proceed for 4 h at room temperature. Following reduction, three additions of ethylenimine (10 µl) were made at 30-min intervals. The protein was precipitated by the addition of acetone:triethylamine:acetic acid (85:5:5) and then washed three times with acetone:triethylamine:acetic acid:water (85:5:5:5), and three times with water. The protein was insoluble after aminoethylation.

Amino Acid Analysis
Aliquots of protein were hydrolyzed in sealed, evacuated tubes with constant boiling HCl for 24-96 h at 110 °C. Amino acid analyses were performed with a Durrum D-500 amino acid analyzer with ninhydrin detection or with a Durrum MBF amino acid analyzer with fluorescent detection using o-phthalaldehyde. For the former, 10-50 µg of protein were used; for the latter, 1-5 µg were used.

Sequence Analyses
Edman degradations were performed either manually or with a Beckman 890 C Sequencer. For automatic sequencing, approximately 200 nmol of protein were applied in 0.5 ml of heptafluorobutyric acid, and the standard sample application subroutine was used. Edman degradation was then performed with the fast protein Quadrol program (972172-C) with 0.1 M Quadrol as buffer. The released thiazolinone derivatives were converted to the phenylthiohydantoins by treatment with 1 N HCl at 80 °C for 10 min. The PTH-derivatives were then identified by the following procedures.
Manual Edman degradations were performed by the method of Tan (25). Generally 1-2 nmol of protein were used. All solvents contained 50 µl/liter of ethanethiol. The released amino acid was identified by amino acid analysis following hydrolysis with HI or NaOH-dithionite.

Hydrazinolysis
Determination of the carboxyl terminus by hydrazinolysis was performed by the method of Fraenkel-Conrat and Tsung (26), with the modifications previously described (9).

Carboxypeptidase Digestions
Carboxypeptidase digestions were performed with carboxypeptidase A in 0.01 M ammonium bicarbonate buffer at an enzyme:substrate ratio of 1:100. Generally, 1-2 nmol of protein were used per time point, and digestion was performed for 0-60 min at 37 °C. The reaction was terminated by drying in a Savant Speed-Vac concentrator. The digest was then dissolved in 50 µl of 0.2 M sodium citrate buffer, pH 2.2, and placed directly on the amino acid analyzer.

Tryptic Digestion of Proteins
p-25 and gp-30 and the reduced and aminoethylated proteins are insoluble in all buffers not containing sodium dodecyl sulfate. Digestions were therefore performed on suspensions of these proteins. The protein (50-100 µg), previously recovered from preparative polyacrylamide gel electrophoresis, freed of sodium dodecyl sulfate by ion pair extractions, and washed with water, was washed with 0.01 M ammonium bicarbonate three times. The protein was then suspended in 100 µl of 0.01 M ammonium bicarbonate containing 1 µg of trypsin. The digestion was allowed to proceed for various times with constant stirring at 37 °C. Following digestion, the suspension was centrifuged at 15,000 × g. The supernatant, containing soluble peptides, was removed and dried on a Savant Speed-Vac concentrator for examination by thin layer or high performance liquid chromatographic peptide mapping. The pellet of insoluble peptides was examined by SDS-polyacrylamide gel electrophoresis.

Thin Layer Peptide Maps
Two-dimensional peptide maps on silica gel plates were prepared essentially as described by Stephens (27). The soluble peptides from 2-4 nmol of protein were recovered as described above. The peptides were dissolved in 10 µl of formic acid and spotted at a distance of 12 cm from one edge, 2 cm from the bottom. Chromatography was performed in the first dimension in chloroform:methanol:ammonium hydroxide (2:2:1), followed by electrophoresis in the second dimension for 20 min at 900 V, with pyridine:acetic acid:water (100:3:897) as buffer.
Peptides were recovered from the plates by scraping the silica gel into microcentrifuge tubes and extracting three times with 0.5 ml of 70% formic acid. The combined extracts were dried in a Savant Speed-Vac concentrator.

SDS-Polyacrylamide Gel Electrophoresis of Insoluble Tryptic Peptides
Those peptides not solubilized by trypsin and recovered as the pellet from the digest as described above were readily soluble in the sample buffer of O'Farrell (19) (2.3% sodium dodecyl sulfate, 10% mercaptoethanol, 0.0625 M Tris-HCl, pH 6.8). The pellet was dissolved in 100 µl of this buffer and aliquots were applied to a 23% polyacrylamide gel. The following standards were run simultaneously: myoglobin, lysozyme, and the cyanogen bromide fragments of myoglobin.
The gel was stained with Coomassie blue or by the periodate-Schiff staining reaction (28). Peptides were recovered from the stained gels by repeated soaking of the finely minced, excised band in 1.5 ml of formic acid. Generally, three extractions for 24 h each were used. The combined extracts were dried under vacuum.

High Performance Liquid Chromatography
Separation of soluble peptides was performed with a Varian model 5000 liquid chromatograph and a Varian Micro-pak MCH-10 column. The column was equilibrated with 54.4 mM sodium phosphate at pH 2.8. The sample was applied in 50 µl of this starting buffer, then eluted with a gradient of this buffer and acetonitrile as second solvent. The percentage of acetonitrile, initially at zero, was increased to 12% during the first 13 min, then increased to 28% during the next 65 min, and finally taken to 62% during the final 30 min (29). The effluent was monitored at 215 nm with a variable wavelength monitor and collected in acid-cleaned tubes in 1-ml fractions. The fractions were dried in a Savant Speed-Vac concentrator and subjected to amino acid analysis.
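The acetonitrile programme described above can be written as a piecewise function of time, which is convenient for relating a fraction's collection time to the solvent composition at which it eluted. This is a sketch under one stated assumption: the text gives only the end points of each segment, and the ramps are taken here to be linear between them. The function name is illustrative.

```python
# Breakpoints as (minutes, % acetonitrile), taken from the text:
# 0 -> 12% over the first 13 min, then 28% over the next 65 min,
# then 62% over the final 30 min.
BREAKPOINTS = [(0.0, 0.0), (13.0, 12.0), (78.0, 28.0), (108.0, 62.0)]


def acetonitrile_percent(t_min: float) -> float:
    """Return % acetonitrile at t_min, ASSUMING linear ramps between breakpoints."""
    if t_min <= BREAKPOINTS[0][0]:
        return BREAKPOINTS[0][1]
    for (t0, p0), (t1, p1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if t_min <= t1:
            # Linear interpolation within the current segment.
            return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)
    # After the programme ends, hold the final composition.
    return BREAKPOINTS[-1][1]


if __name__ == "__main__":
    for t in (0, 13, 45.5, 78, 108):
        print(t, acetonitrile_percent(t))
```

Because the effluent was collected in 1-ml fractions, converting a fraction number to a time would also require the flow rate, which the text does not state.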

RESULTS
Purification of HBsAg-The currently used method of purifying HBsAg is summarized in Table I. An overall yield of about 50% is obtained, based on counter electrophoresis titer, and approximately 3-5 mg of HBsAg are obtained from 100 ml of high titer plasma. The antigen is free of detectable normal human serum proteins, when checked by Ouchterlony double diffusion analysis against anti-normal human serum, at a concentration of 10 mg/ml of protein. However, undetected trace quantities of some serum proteins, or serum proteins which fail to react with our anti-normal human serum antibody, may still remain. This purification procedure has the advantage of being applicable to large quantities of plasma without the need for the use of zonal rotors and large quantities of cesium chloride. The entire procedure is readily performed in 4 days.
The purified HBsAg gives the same pattern of proteins and glycoproteins as obtained from antigen purified solely by ultracentrifugation (9), i.e. a major protein with an apparent Mr = 22,000-25,000, a major glycoprotein with an apparent Mr = 28,000-30,000, and six minor bands with Mr = 40,000-75,000 (Fig. 2). We have previously designated the major protein and glycoprotein as p-22 and gp-28, respectively (30). However, these should more properly be designated p-25 and gp-30, based upon the probable true molecular weight of the protein as deduced from the DNA sequence of the viral gene (15)(16)(17).
Isolation of p-25 and gp-30-Although many other methods have been attempted, the only method by which we have been able to separate and isolate p-25 and gp-30 has been preparative polyacrylamide gel electrophoresis (9). The electrophoresis must be done in sodium dodecyl sulfate (generally 0.1%) and the proteins must be reduced with mercaptoethanol or other reducing agents. Evidently considerable intermolecular disulfide bonding is present in the intact HBsAg particle, since no bands are observed upon sodium dodecyl sulfate polyacrylamide gel electrophoresis of unreduced antigen, even on 4% polyacrylamide gels. In fact, all of the cysteines of HBsAg may be present as disulfides, since reaction with iodoacetate without prior reduction results in the formation of no detectable carboxymethylcysteine, even when performed in 6 M urea or 1% sodium dodecyl sulfate. Fig. 2 shows the results of a typical preparative electrophoretic separation of p-25 and gp-30. Only very minor cross-contamination is observed. When stained for carbohydrate, gp-30 stains strongly while p-25 stains only very weakly; the latter may represent nonspecific staining (Fig. 3, lanes I and L).
It is also apparent in Fig. 2 that small amounts of higher molecular weight proteins are present in the purified p-25 and gp-30, having apparent molecular weights corresponding to dimers of these proteins, and that these minor proteins comigrate with two of the minor components observed in HBsAg. This suggests that at least these minor components of HBsAg are in fact aggregates of the major proteins. These results are consistent with the immunologic cross-reactivity of these various HBsAg proteins (7).
Amino Acid Analysis of p-25 and gp-30-Table II shows the amino acid compositions of p-25 and gp-30 obtained from the ayw subtype of HBsAg. Also shown are the amino acid compositions predicted from the DNA sequence of the HBsAg gene as determined by three different groups (15)(16)(17). The HBsAg gene sequenced by Valenzuela et al. (15) is of unknown subtype, that sequenced by Pasek et al. (16) is of a complex subtype designated "ady," and that sequenced by Charnay et al. (17) is of the ayw subtype. The amino acid compositions determined by amino acid analysis are in good agreement with those predicted from the DNA sequence except for isoleucine, leucine, and phenylalanine, which are consistently low. This may be due to the high content of these residues and their relative resistance to hydrolysis, or to a real difference between our antigen and that for which the DNA sequences have been determined. This can be resolved only by more complete sequence analysis.
NH2-terminal Sequence
... considerably lower than that obtained for most proteins (generally 94-95%), but is consistent with that reported for other hydrophobic proteins (31). Fig. 4 also shows the complete amino acid sequence of p-25 as predicted from the DNA sequences (15)(16)(17). Only step 24 failed to give an identifiable product (arginine, from the DNA sequences). Steps 16 and 30 were identified by amino acid analysis as glutamic acid after hydrolysis of the PTH-derivative, but could not be identified by thin layer chromatography. Therefore, these could be glutamine or glutamic acid. Both are predicted to be glutamine from the DNA sequence. Since step 5 gave alanine by HI hydrolysis, but not by NaOH-dithionite hydrolysis, and contained no radioactivity (1 cysteine would give 4000 cpm), it was identified as serine. This agrees with the DNA sequence. All other steps were unambiguous and all agree with the DNA sequences. We have also sequenced the adw proteins and found them to be identical with the ayw for the first 30 steps. These results are consistent with the fact that all three DNA sequences predict the same NH2-terminal sequence, even though they differ at other regions of the molecule.
Carboxyl Terminus of p-25 and gp-30-Hydrazinolysis of HBsAg (ayw) p-25 and gp-30 gave isoleucine as the carboxyl terminus in each case, in approximately 30% yield. Only traces of other amino acids were observed. We have previously shown the carboxyl terminus of the adw subtype to be isoleucine also (9). Carboxypeptidase digestion of HBsAg (ayw) p-25 and gp-30 gave identical results, with the release of amino acids consistent with the carboxyl-terminal sequence -Val-Tyr-Ile, which is identical with the carboxyl-terminal sequence which we found in the adw subtype. This is also consistent with that reported from two of the DNA sequences (Fig. 4).
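The NH2-terminal sequencing results compare a yield with the 94-95% obtained for most proteins. Assuming that this figure is the repetitive (per-cycle) yield of the Edman degradation, which the surviving text does not state explicitly, the fraction of starting material still available after a given number of cycles falls off geometrically. The sketch below illustrates why a lower per-cycle yield limits a 30-step run; the function name and the example yields other than 94-95% are assumptions for illustration only.

```python
def fraction_remaining(repetitive_yield: float, cycles: int) -> float:
    """Fraction of the starting polypeptide still sequenceable after `cycles` cycles.

    Assumes a constant per-cycle (repetitive) yield, i.e. yield ** cycles.
    """
    if not 0.0 <= repetitive_yield <= 1.0:
        raise ValueError("repetitive_yield must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return repetitive_yield ** cycles


if __name__ == "__main__":
    # 0.94 and 0.95 are the typical values quoted in the text;
    # 0.90 is a hypothetical lower value used only for comparison.
    for r in (0.95, 0.94, 0.90):
        print(r, round(fraction_remaining(r, 30), 3))
```

Under this simple model, a few percentage points of per-cycle yield make a several-fold difference in the signal available at step 30.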
Tryptic Digestion of p-25 and gp-30-When either p-25 or gp-30 or the reduced and carboxymethylated proteins were subjected to tryptic digestion, these insoluble proteins remained completely insoluble. Even prolonged digestion (18 h) at 37 °C with trypsin at an enzyme:substrate ratio of 1:25 did not give rise to soluble peptides. In fact, when the digest was centrifuged, and only the supernatant was examined by thin layer peptide mapping, essentially no peptides were observed for either p-25 or gp-30. However, when the insoluble material from the digest was examined by SDS-polyacrylamide gel electrophoresis, it was apparent that both proteins had been cleaved (Fig. 3). Neither protein was completely cleaved in 3 h at 37 °C (Fig. 3, lanes A and B). However, both p-25 and gp-30 were partially cleaved, each giving two peptides. p-25 was cleaved into p-25-1 and p-25-2, having apparent Mr = 15,000 and 12,000, respectively (Fig. 3, lane B). gp-30 gave rise to a 15,000-dalton fragment, gp-30-1, and a smaller fragment, gp-30-2 (Fig. 3, lane A). With longer digestion times (18 h) the digestion was more complete (Fig. 3, lanes C and D). p-25 still gave rise to only two peptides, p-25-1 and p-25-2 (Fig. 3, lane D). At this stage gp-30-1 and gp-30-2 were not resolved because the bands are so close (Fig. 3, lane C), but when less material was analyzed and electrophoresis was extended, both bands were still apparent (Fig. 3, lane F). When stained for carbohydrate, only gp-30-2 (Fig. 3, lane K) and residual undigested gp-30 stained (Fig. 3, lanes I and K). As expected, neither p-25 nor its digestion products stained for carbohydrate (Fig. 3, lanes J and L). In order to show that the digestion product of gp-30 that stained for carbohydrate was gp-30-2, and not gp-30-1, the position of the periodate-Schiff stain was marked with ink, and then the same gel was stained with Coomassie blue (Fig. 3, lanes E-H). The position of the periodate-Schiff stain exactly coincided with gp-30-2.
Fig. 4. ... (17). The amino acids shown above the sequence are those predicted to occur at that position in an unusual subtype designated ady (16). The amino acids below the sequence are those predicted to be present at those positions in an unknown subtype (15). All three sequences are identical at all other positions. The arrows above the sequence pointing forward indicate those amino acids which were identified by Edman degradation. The reverse arrows indicate those amino acids detected by carboxypeptidase digestion. The amino acids enclosed in boxes indicate the amino acid sequence consistent with the amino acid composition of the peptides (as shown in Table V) which we have isolated from an ayw subtype. The numbers below these peptides correspond to the number of the peptide shown in Fig. 7.
The tryptic peptides p-25-1 and p-25-2 were recovered from the gels by excising the stained bands and extracting with 70% formic acid. Amino acid analysis and two steps of manual Edman degradation were performed on each. The amino acid compositions are shown in Table IV. Also shown are the amino acid compositions of the fragments of p-25 corresponding to residues 1-122 and 123-226. p-25-1 gave the NH2-terminal sequence Met-Glx by hydrolysis of the PTH-derivatives with NaOH-dithionite. p-25-2 gave the NH2-terminal amino acid threonine (as α-aminobutyric acid) by hydrolysis with HI. No product was found in step 2. gp-30-1 and gp-30-2 could not be resolved sufficiently for sequencing.
These results are consistent with the conclusion that trypsin is able to hydrolyze only at lysine 122 of the proposed sequence of p-25. It should be noted that all proposed sequences have arginine and lysine residues at identical places (Fig. 4). Thus, the higher molecular weight fragment, p-25-1, corresponds to the NH2-terminal 122 amino acids of p-25, and p-25-2 corresponds to the carboxyl-terminal 104 residues of p-25. The apparent Mr = 15,000 and 12,000 observed on sodium dodecyl sulfate gel electrophoresis are in fairly good agreement with the calculated true Mr = 13,500 and 11,900, especially in view of the observation that other viral coat proteins may behave anomalously on sodium dodecyl sulfate gels (32). The reason for the failure of trypsin to cleave at all of the lysines and arginines is not known. The Lys-Pro bond at residue 141 would be expected not to cleave; however, failure to cleave at other positions must reflect their inaccessibility. It must be pointed out that the protein during digestion is an insoluble suspension, and therefore much of the protein may not be exposed to the trypsin. Attempts to bring about more complete digestion by performing the digestion in urea were not more successful; however, even 6 M urea does not solubilize the isolated HBsAg proteins. Digestions in 0.05% sodium dodecyl sulfate were also not more extensive, even though the proteins were soluble.
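The cleavage expectations discussed above follow the usual rule for trypsin: hydrolysis on the carboxyl side of lysine or arginine, with a Lys-Pro (or Arg-Pro) bond expected to resist cleavage. A minimal sketch of that rule follows. The sequence used is a short hypothetical string chosen for illustration, not the HBsAg sequence, and the function names are assumptions. The sketch predicts sites from sequence alone and says nothing about accessibility, which is the factor the text invokes for the sites that were not cut.

```python
from typing import List


def tryptic_cleavage_sites(sequence: str) -> List[int]:
    """Return 1-based positions of Lys/Arg residues after which cleavage is expected.

    A K or R followed by P is skipped, and the carboxyl-terminal residue is
    ignored because there is no bond after it.
    """
    sites = []
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    return sites


def tryptic_peptides(sequence: str) -> List[str]:
    """Split the sequence at every predicted cleavage site (complete digestion)."""
    peptides, start = [], 0
    for site in tryptic_cleavage_sites(sequence):
        peptides.append(sequence[start:site])
        start = site
    peptides.append(sequence[start:])
    return peptides


if __name__ == "__main__":
    toy = "MENKPTSRGLKA"  # hypothetical sequence, one-letter code
    print(tryptic_cleavage_sites(toy))
    print(tryptic_peptides(toy))
```

Comparing such a predicted complete digest with the fragments actually recovered is one way to see which potential sites were not cleaved.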
Tryptic Digestion following Aminoethylation-Complete aminoethylation of HBsAg proteins was never accomplished. Typical results of amino acid analysis indicated approximately 10-11 aminoethylcysteines/molecule of p-25 or gp-30. The extent of aminoethylation did not differ for the two proteins. Tryptic hydrolysis of reduced and aminoethylated p-25 and gp-30 proceeded to a greater extent, as expected. However, considerable insoluble material still remained, even after 18 h of digestion at 37 °C (enzyme:substrate ratio of 1:50). When the insoluble peptides were examined by sodium dodecyl sulfate polyacrylamide gel electrophoresis, the results shown in Fig. 5 were obtained. The 15,000-dalton product was still observed in the case of p-25. However, the 12,000-dalton product was greatly reduced, and two lower molecular weight peptides were observed, with apparent Mr = 6,700 and 7,900. gp-30 still gave rise to a product which co-migrated with the 15,000-dalton peptide of p-25, a lower molecular weight peptide with an apparent Mr = 11,000, and two peptides which co-migrate with the 6,700- and 7,900-dalton peptides observed in p-25. When stained for carbohydrate, only the 11,000-dalton protein and the residual undigested gp-30 stained. The 12,000-dalton fragments of both p-25 and gp-30 were extracted as described above and were subjected to two steps of Edman degradation. Both gave the NH2-terminal sequence Met-Glx by amino acid analysis after hydrolysis of the PTH-derivatives with NaOH-dithionite.
These values are those obtained from the sequence shown in Fig. 4 (17).
When the soluble peptides contained in the supernatant from the tryptic digestion of p-25 and gp-30 were examined by thin layer peptide mapping, the results in Fig. 6 were obtained. It is clear that the peptide maps are quite similar, with the differences being largely confined to a few spots in the lower right quadrant of the peptide maps. One might expect some differences due to the presence of carbohydrate on some peptide or peptides of gp-30, and also, one might expect the glycosylated fragments to be more soluble and perhaps therefore more extensively degraded. Both proteins show a group of poorly resolved spots in the central left region of the maps. All of the spots were scraped from the thin layer plates, and subjected to amino acid analysis. Most of the amino acids were contained in this poorly resolved region of the chromatogram and the peptides could not be identified. Therefore, high performance liquid chromatography was used to resolve these peptides.
High Performance Liquid Chromatography of the Soluble Tryptic Peptides of p-25 and gp-30-The soluble peptides from tryptic digestion of approximately 1 mg of p-25 and gp-30 were applied to the high performance liquid chromatograph and eluted as described under "Methods." The elution profiles are shown in Fig. 7. The profiles were nearly identical, differing only in the peptides labeled 1 and 2 in Fig. 7. The peptide fractions were collected and the peptides were subjected to amino acid analysis. Each 1-ml fraction was analyzed separately. The amino acid compositions of these peptides were identical for the two proteins, and are shown in Table V. The compositions are such that assignment to the HBsAg sequence predicted from the DNA sequence of the ayw subtype is unambiguous. These assignments are shown in Fig. 4. All of the peptides obtained from these digests could be assigned to three expected peptides predicted from the DNA sequence (Fig. 4). Peptides 1 and 2 of the chromatogram are residues 139-147 and 140-147, respectively. Peptides 4-8 of the chromatogram are residues 125-137, 138, or 139, and peptide 9 is residue 123-138 or 139. The multiplicity of peaks appears to be due only to differences in the amount of aminoethylcysteine present, with earlier peaks containing more aminoethylcysteine. Edman degradation of these peptides has confirmed these assignments. The dipeptide threonine-aminoethylcysteine (peptide 0) occurs twice in the expected sequence, at residues 123-124 and 148-149, and was eluted with the large absorbance peak at 4-5 min on the chromatogram (Fig. 7).
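Assigning an isolated peptide to a predicted sequence by composition alone amounts to comparing residue counts, with order ignored. The sketch below shows the idea under simplifying assumptions: compositions are treated as exact integer counts in one-letter code, whereas real amino acid analysis gives molar ratios with experimental error and does not distinguish Asn from Asp or Gln from Glu. The candidate peptides and names are hypothetical and are not taken from Table V or Fig. 4.

```python
from collections import Counter
from typing import Dict, List


def assign_by_composition(observed: Dict[str, int],
                          candidates: Dict[str, str]) -> List[str]:
    """Return names of candidate peptides whose residue counts equal `observed`."""
    # Drop zero counts so the comparison does not depend on how zeros are listed.
    target = Counter({aa: n for aa, n in observed.items() if n > 0})
    return [name for name, seq in candidates.items() if Counter(seq) == target]


if __name__ == "__main__":
    # Hypothetical predicted tryptic peptides (one-letter code).
    predicted = {"pep_a": "GSTLK", "pep_b": "AVLLK", "pep_c": "TPSCR"}
    # Hypothetical observed composition: Leu 2, Ala 1, Val 1, Lys 1.
    print(assign_by_composition({"L": 2, "A": 1, "V": 1, "K": 1}, predicted))
```

An assignment is unambiguous in this sense only when exactly one candidate is returned; two predicted peptides with the same composition would both be listed.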
Deglycosylation of HBsAg proteins-The effect of treatment of HBsAg with anhydrous hydrofluoric acid for 1 h at room temperature is shown in Fig. 8. Under these conditions, gp-30 is no longer observed, and a single major protein corresponding to p-25 is present in increased amounts. In addition, smaller amounts of lower molecular weight products are seen, probably due to nonspecific cleavage of some peptide bonds.
Alkaline Sulfite Treatment-Treatment of p-25 and gp-30 with alkaline sulfite resulted in the appearance of essentially identical amounts of cysteic acid, 5.6 and 5.7 nmol/nmol of protein, respectively. The serine and threonine contents were also identical. These results suggest that the carbohydrate is not attached by O-glycosidic bonds to serine or threonine. These results would support the conclusion of Shiraishi et al.
Values in parentheses are those expected, based upon the sequence shown in Fig. 4 (17). ND, not determined.

DISCUSSION
These results demonstrate that p-25 and gp-30 have the same amino acid composition, the same NH2-terminal sequence, and the same carboxyl-terminal sequence. Also, both are cleaved by trypsin into only two large fragments. The fragments corresponding to the NH2 terminus of each protein are identical. However, the carboxyl-terminal fragments differ in that the fragment from gp-30 stains for carbohydrate. Digestion after aminoethylation demonstrated that the carboxyl-terminal fragments of both p-25 and gp-30 were further digested to give a number of identical peptides, with the only difference being demonstrable in a large carbohydrate-containing peptide. These results, and the fact that treatment of HBsAg with anhydrous HF results in the apparent conversion of gp-30 into p-25, strongly support the conclusion that the proteins differ only in the presence of carbohydrate on gp-30.
Although only limited sequence data are available, the data support the conclusion that the protein sequence predicted from the DNA sequence is indeed that of the isolated protein. This conclusion is based on sequence data derived from the NH2-terminal end, the middle, and the carboxyl-terminal end of the protein. It is unlikely that the other regions differ from the expected sequence. However, the only way to verify this conclusion is to obtain further sequence data on these proteins. This has proven to be quite difficult due to their extreme hydrophobicity and insolubility, and their resistance to proteolytic digestion. Although the failure of trypsin to cleave at all of the expected sites in the protein cannot be explained at present, this failure has allowed us to isolate some large peptides such as p-25-1 and p-25-2, which should prove useful as starting material for further digestion with different proteases. Papain and chymotrypsin appear to generate peptide fragments from other regions of the protein in reasonable yield and this should provide a source of additional sequence data.
The structure of the carbohydrate present in HBsAg and the nature of the amino acid-carbohydrate linkages are not known. However, it is apparent that since both p-25 and gp-30 are cleaved by trypsin to a 122-residue, NH2-terminal fragment that does not contain detectable carbohydrate, most, if not all, of the carbohydrate must be attached within the carboxyl-terminal 104 amino acids of gp-30. The proteins contain only three Asn-X-Ser/Thr sequences. Of these only one is found in this carboxyl-terminal region of the molecule, at residue 146. This residue is also found in the only peptide which was found to differ from p-25 on high performance liquid chromatography, although at present we have no evidence that this peptide contains attached carbohydrate. The protein is uniformly rich in serine and threonine residues that could also serve as carbohydrate attachment sites, although we have not detected this type of linkage by alkaline sulfite treatment.
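Candidate sites for asparagine-linked carbohydrate are commonly located by scanning a sequence for the Asn-X-Ser/Thr motif. The sketch below shows such a scan. The sequence is a hypothetical string used only for illustration, not the HBsAg sequence, and the function name is an assumption. The option to exclude proline at position X reflects a widely used refinement of the motif that is not stated in the text.

```python
import re
from typing import List


def find_sequons(sequence: str, exclude_proline: bool = True) -> List[int]:
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr motifs.

    A lookahead is used so that overlapping motifs (e.g. N-N-S-T) are all found.
    """
    x = "[^P]" if exclude_proline else "."
    pattern = re.compile(rf"(?=N{x}[ST])")
    return [m.start() + 1 for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    toy = "MNGTAANPSQNNSTK"  # hypothetical sequence, one-letter code
    print(find_sequons(toy))
    print(find_sequons(toy, exclude_proline=False))
```

A motif found this way marks only a potential attachment site; whether carbohydrate is actually present must be shown experimentally, as the text notes for residue 146.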
Other circumstantial evidence that the carbohydrate is attached in the NH2-terminal region of the carboxyl-terminal 104-residue peptide of gp-30 is obtained from tryptic digestion of the aminoethylated protein. The 104-residue peptide is cleaved into two peptides with Mr = 6,700 and 7,900, neither of which contains detectable carbohydrate. These fragments must both surely contain the expected tryptic fragment consisting of residues 170-221 (Fig. 4), suggesting that the carbohydrate is most likely located between residues 121 and 170. This central region contains the asparagine residue 146 (and numerous threonine and serine residues) for possible attachment sites. We are currently attempting to isolate and identify such glycopeptides.
It is interesting to note that this region of the protein (residues 120-150) is the most variable region of the molecule, based on the DNA sequence data. Of 20 differences in the three known sequences, seven occur between residues 120 and 143. It is tempting to speculate that this region of the molecule contains those amino acids which are responsible for the antigenic variability. The peptides which we observe in our ayw antigen are not completely identical with those predicted from the DNA sequence of a different ayw virus (17). However, the differences we observe are due to amino acid changes which are predicted to occur, based upon the other two DNA sequences (15,16). This demonstrates that at least some changes are not reflected in changes of the subtype. The ease of identifying the peptides from this region of the molecule by high performance liquid chromatography of the tryptic digests of aminoethylated protein should allow us to examine a large number of different HBsAg samples to see if any of these observed differences are consistently associated with different subtypes, or are due to microheterogeneity with no relationship to subtype. Such studies are currently in progress.
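The concentration of sequence differences in this region can be put in rough numerical terms by comparing the share of differences that fall in residues 120-143 with the share of the 226-residue protein that the region occupies. The sketch below does this arithmetic. It assumes, purely as a baseline for comparison, that differences would otherwise be spread evenly along the sequence; the function name is illustrative.

```python
def regional_enrichment(diffs_in_region: int, total_diffs: int,
                        region_start: int, region_end: int,
                        protein_length: int) -> float:
    """Ratio of the observed share of differences to the region's share of length.

    The denominator is the share expected if differences were spread uniformly
    along the protein (an assumption made only for this comparison).
    """
    region_length = region_end - region_start + 1
    observed_share = diffs_in_region / total_diffs
    expected_share = region_length / protein_length
    return observed_share / expected_share


if __name__ == "__main__":
    # 7 of 20 differences lie between residues 120 and 143 (from the text);
    # 226 is the length of p-25 (122 + 104 residues).
    print(round(regional_enrichment(7, 20, 120, 143, 226), 2))
```

With these figures the region holds 35% of the differences in about 11% of the sequence, roughly a threefold excess over the uniform baseline.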