Amino Acid Sequence of Rubber Elongation Factor Protein Associated with Rubber Particles in Hevea Latex*

The amino acid sequence of rubber elongation factor, a recently discovered protein tightly bound to rubber particles isolated from the commercial rubber tree Hevea brasiliensis, is presented. The role of this protein in rubber elongation and its interaction with prenyltransferase and rubber particles have been discussed in the preceding paper in this series (Dennis, M. S., and Light, D. R. (1989) J. Biol. Chem. 264, 18608-18617). Trypsin, Staphylococcus protease, chymotrypsin, acetic acid, and hydroxylamine cleavage were used to generate peptide fragments that were isolated by reverse phase high pressure liquid chromatography and analyzed by amino acid composition and automated Edman degradation. Each digest contained one blocked peptide identified as the amino terminus. The blocked amino-terminal peptide from the tryptic digest was analyzed by amino acid composition, fast atom bombardment mass spectrometry (molecular ion 1659.9), subdigested with Staphylococcus protease for partial sequence analysis, and finally deblocked with bovine liver acyl-peptide hydrolase removing an acetylalanine to allow analysis by Edman degradation. Rubber elongation factor is 137 amino acids long, has a molecular mass of 14,600 daltons, and lacks four amino acids: cysteine, methionine, histidine, and tryptophan. The NH2 terminus is highly charged and contains only acidic residues (5 of the first 12 amino acids). The first four amino acids are highly represented in other known NH2-terminally acetylated proteins. Comparison of the sequence of rubber elongation factor with other known sequences does not reveal significant sequence similarities that would suggest an evolutionary relationship.
In earlier communications we described experiments designed to elucidate the biosynthesis of rubber in Hevea brasiliensis (1-3). Rubber transferase described by Archer et al. (4, 5) was purified in our laboratory from the latex of the commercial rubber tree, H. brasiliensis, based on its ability to elongate rubber molecules by cis additions of isopentenyl pyrophosphate (IPP). After purification, this enzyme was ... (2,3), the identification, purification, and characterization of this protein are described in detail. In this report, we describe the complete amino acid sequence of REF and discuss its role in rubber biosynthesis.

* The determination of T1 by fast atom bombardment mass spectrometry was supported by Grant MH 23861 from the National Institute of Mental Health (to Dr. J. D. Barchas, Stanford). This paper is the fourth in a series on the biochemistry of cis-polyisoprene rubber synthesis. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

‖ To whom correspondence should be addressed.

Purification of REF
Purification of REF was performed as described previously (3). Rubber particles containing greater than 95% pure REF were isolated and extracted with 1% SDS, and the protein was acetone-precipitated after removal of the rubber fraction by centrifugation. Unless otherwise specified, REF was then further purified by SDS-PAGE followed by electroelution from the gel.

Sequence Similarity Search
We searched for sequence similarity between REF and other known sequences using FASTP, a rapid search program based on the Lipman-Pearson method (6), and tested the quality of the alignments using a global alignment program, HOM.GLOBAL, based on an accepted algorithm (7-9). Both programs were written by Colin Watanabe, Genentech, Inc. The data base consists of sequences collected by the National Biomedical Research Foundation (7400 sequences, Release 16, March, 1988), translations of significant coding regions (4000 sequences) from the GenBank Nucleotide Database (Release 55, March, 1988) and EMBL (Release 13, October, 1987), and around 700 in-house sequences.

Secondary Structure Prediction
In an attempt to predict regions of secondary structure in REF, the sequence was analyzed by the program HYDRO, which uses a Fourier transform amphipathic analysis algorithm and which was written by Janet Finer-Moore and Robert M. Stroud, University of California, San Francisco (10). We also used secondary structure prediction algorithms for globular proteins using the Chou-Fasman method (11) and the Robson-Garnier method (12).

Portions of this paper (including part of "Materials and Methods," Figs. 3-11, Footnote 3, and Tables 1-21) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.

Peptide Nomenclature
The peptides were named according to the digestion procedure used, and are numbered from the amino terminus. For double digestions of fragments, the name from the first digest is given followed by a name derived from the digestion procedure employed, numbered starting from the amino terminus of the parent peptide. Where more than one fragment for a given digestion was obtained over the same region owing to alternate cleavage sites, the peptides are distinguished by the letters a or b. Specific designations follow.

Purification and Amino Acid Analysis-REF was purified from the commercial rubber tree H. brasiliensis grown in Costa Rica as previously described (3). When washed rubber particles are extracted with 1% SDS, only one protein of 14.6 kDa is observed by SDS-PAGE (3). SDS was used to release REF from the particles, and then the SDS was removed by acetone precipitation. Following resuspension, REF is suitable for hydrolysis by trypsin, Staphylococcus protease, acetic acid, hydroxylamine, and chymotrypsin. CNBr digestion was ruled out after amino acid composition analysis (Table 1), which revealed a lack of methionine as well as cysteine and histidine in the sequence of the protein. After completing the sequence reported in Fig. 1, the lack of these amino acids and, additionally, the lack of tryptophan (not detected by amino acid analysis) was confirmed. Intact REF was almost completely resistant to digestion with Staphylococcus protease at pH 8 in ammonium bicarbonate, although one peptide V1 (position 114-137) was isolated and the sequence determined. This sequence allowed the alignment of tryptic peptides T10-T13. Since both V1 and T13 end with an asparagine at position 137 there is strong evidence that this is the carboxyl terminus. Compositions of V1, HA7, and T13 are also consistent with Asn-137 being the carboxyl terminus of REF (Tables 1-3). Additional evidence was provided by the lack of a Staphylococcus protease cut at Glu-136. Austen and Smith (13) have shown that this protease will not cut at glutamic or aspartic acids if these residues are within 2-3 residues from the amino or carboxyl terminus.
A potential cleavage site for hydroxylamine (Asn-Gly) at position 72 was observed when the sequence of peptide T6 was determined; subsequent digestion of REF with hydroxylamine (Fig. 4) and identification of the unblocked peptide NG2 (73-137) by sequence analysis provided overlap for T6-T9. Positions 81 and 88 were not identifiable in the first sequencing run. These positions were later identified as threonine in a second sequence analysis with more material.
Acetic acid digestion of REF resulted in poor yields and peptides were difficult to resolve on HPLC (Fig. 5). However, HA7 (positions 100-137) provided overlap between NG2 and V1, and these peptides along with the tryptic peptides T6-T13 provided contiguous sequence from position 67 through the carboxyl terminus. HA4 (position 51-79) was used to align tryptic peptides T4-T6, and HA3 (position 41-50) extended the sequence of T4 back to position 41.
In order to connect the carboxyl-terminal side of HA2 at position 40, a chymotryptic peptide was required. Digestion of REF with chymotrypsin provided a complicated mixture of peptides; however, subdigestion of NG1 by chymotrypsin (Fig. 6) allowed identification of peptide NG1-CY6 (position 38-46), which provided the needed overlap to extend the sequence back to position 23 with HA2. An additional chymotryptic peptide NG1-CY2 (position 17-26) enabled extension of the sequence back to position 17.
Since the composition of T1 (Table 4) was known to be different from that of the NH2-terminal tryptic peptide contained in HA2, it was suspected that a tryptic peptide was missing from the digest in this region of REF. At this point peptide T2, which was previously not observed, was obtained in low yield (Fig. 3), confirming the sequence back to position 17. This peptide was later obtained from a tryptic digestion of NG1 (NG1-T2), where it was recovered with much higher yield (Fig. 7), allowing identification of a tyrosine at position 16. The only remaining place to insert the sequence of T1 is preceding Tyr-16. Based on the composition of the entire protein, this accounted for all the tryptic peptides (Lys = 7, Arg = 4).
Sequence of T1-In order to elucidate the primary sequence of T1, fast atom bombardment mass spectrometry, amino acid composition, subdigestion with Staphylococcus protease, and acyl-peptide hydrolase were used. The amino acid composition of T1 indicated 3 Asp, 3 Gly, 1 Ala, 1 Leu, 1 Lys, and 6 Glu residues, when normalized to 1 Lys (Table 4). To verify the amino acid composition, especially the high level of glutamic acid, and to identify the amino-terminal blocking group, fast atom bombardment mass spectrometry was used to determine an accurate molecular weight for T1. In the positive ion mode, a molecular ion of 1660.895 was observed (Fig. 8A), whereas in the negative ion mode a molecular ion of 1658.98 was observed (Fig. 8B), suggesting a molecular mass of approximately 1659.9 Da. No other major peaks in either mode were detected. The fragment molecular weight was then compared with the calculated core molecular weight from the ...
T1-V1a and T1-V1b were both blocked as determined by sequence analysis, and at least part of T1-V1a is contained in T1-V1b since both peptides contain the single alanine in T1. Peptide T1-V2, which contains the single lysine, is the carboxyl terminus of the tryptic peptide, T1, and must also be partially contained in T1-V1b. Thus alanine, not present in T1-V2, must be contained within the first four amino acids.
Sequence analysis of T1-V2 provided just over half the sequence of T1 (DNQQGQGE); however, the sequence ended three amino acids (G, L, K) short of that predicted by amino acid analysis due to wash out on the sequencer. Repeated analyses failed to provide any further sequence. The composition of T1-V1a (A, D, 2E) indicated the location of the remaining amino acids at the amino terminus.
Two different attempts to remove the blocking group on the amino terminus were made. The first, enzymatic removal of a potential pyroglutamic acid by pyroglutamate aminopeptidase, failed to deblock the T1 peptide (data not shown). The second, enzymatic removal of a potential acetyl blocking group (Fig. 8C) by acyl-peptide hydrolase, yielded an unblocked peptide (Fig. 10), which when sequenced revealed EDEDNQQGQ. The last part of the sequence could not be detected, although an additional 2G, E, L, and K were detected by amino acid analysis. Thus, the only amino acid lost during the hydrolysis is alanine, the amino terminus blocked by an acetyl group.
From the composition of chymotryptic peptides NG1-CY1a and NG1-CY1b, strong evidence was provided that T1 indeed preceded peptide T2: NG1-CY1a has the same composition as T1 with an additional tyrosine, while NG1-CY1b lacked only the lysine when compared to the T1 composition. In addition, subdigestion with Staphylococcus protease of the NG1 peptide in phosphate, pH 7.4 (Fig. 11), provided several peptides in the amino-terminal region of the protein, including NG1-V1, providing an overlap between T1 and T2. This peptide fixed the positions of Gly-13, Leu-14, and Lys-15, and thus, with the sequence from T1-ACE and T1-V2, completed the entire sequence of REF.
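The consistency of the T1 assignment with the mass spectrometric and composition data can be illustrated with a short calculation. The sketch below is only a rough check, not the procedure used in this work. It assumes that T1 is Ac-AEDEDNQQGQGEGLK, a sequence assembled here from the fragments quoted above (acetylalanine, EDEDNQQGQ, DNQQGQGE, Gly-13, Leu-14, Lys-15), and it uses standard average residue masses; the function and variable names are illustrative.

```python
from collections import Counter

# Standard average residue masses (Da) for the residue types found in T1.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "D": 115.0886, "E": 129.1155, "G": 57.0519,
    "K": 128.1741, "L": 113.1594, "N": 114.1038, "Q": 128.1307,
}
WATER = 18.0153    # H2O accounting for the free amino and carboxyl termini
ACETYL = 42.0367   # net mass gain on N-acetylation (C2H2O)
PROTON = 1.0073    # mass of H+ gained or lost in the molecular ions

# ASSUMPTION: T1 sequence assembled from the fragments quoted in the text.
T1 = "AEDEDNQQGQGEGLK"


def acetylated_mass(seq):
    """Average mass of an N-acetylated peptide with a free carboxyl terminus."""
    return sum(AVG_RESIDUE_MASS[r] for r in seq) + WATER + ACETYL


def hydrolysate_composition(seq):
    """Composition as seen after acid hydrolysis (Asn -> Asp, Gln -> Glu)."""
    return dict(Counter(seq.replace("N", "D").replace("Q", "E")))


calculated = acetylated_mass(T1)
# Neutral mass estimated from the positive (M+H) and negative (M-H) ions.
observed = ((1660.895 - PROTON) + (1658.98 + PROTON)) / 2

print(f"calculated Ac-T1 mass: {calculated:.2f} Da")
print(f"observed neutral mass: {observed:.2f} Da")
print("hydrolysate composition:", hydrolysate_composition(T1))
```

Under these assumptions the calculated mass agrees with the observed value to within a fraction of a dalton, and the composition reproduces the 3 Asp, 3 Gly, 1 Ala, 1 Leu, 1 Lys, and 6 Glu reported for T1.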

DISCUSSION
In the previous two papers in this series (2,3), we reported the identification and characterization of REF. The presence of this protein on rubber particles is required for rubber elongation. We proposed that REF interacts with Hevea prenyltransferase to alter the stereochemistry of IPP addition from the normal trans addition to cis and overrides the normal termination after two trans additions to effect the formation of cis-polyisoprene. This protein is only associated with rubber particles and is not detected in latex serum. Enzymatic, chemical, or physical removal of REF from the rubber particles renders them incapable of accepting further IPP additions during incubation with a prenyltransferase. Removal of REF from rubber particles and reconstitution of a rubber biosynthetic system has not been successful to date. However, affinity-purified antibodies raised specifically to REF inhibit rubber biosynthesis in vitro.
Sequence Analysis of REF-The complete protein sequence of REF is shown in Fig. 1. REF was digested with trypsin, acetic acid, hydroxylamine, Staphylococcus protease, and chymotrypsin to yield overlapping fragments that resulted in sequence determination. All peptides from each digest were recovered with the exception of T7 and HA5, which were never observed, suggesting that this region of the protein may present recovery problems due to its hydrophobic nature. The sequence provided by NG2 eliminated the need to pursue this problem, however. One peptide from each digest was observed to be blocked to Edman degradation, and the NH2-terminal blocking group of the protein was identified using fast atom bombardment mass spectrometry and subdigestion of the NH2-terminal tryptic fragment. From the fast atom bombardment data and the known composition of T1, an acetyl group was predicted to be the blocking group. Acyl-peptide hydrolase, which removes an acetylalanine, was then used to allow Edman degradation of the amino-terminal tryptic peptide. This enzyme was not capable of removing the acetylalanine from the intact protein under a variety of conditions tested. Thus acyl-peptide hydrolase digestion of the NH2-terminal tryptic peptide T1 was performed to yield a peptide of which the sequence could be determined. REF is 137 amino acids long with a molecular mass of 14,604 Da and does not contain four of the amino acids: cysteine, methionine, histidine, and tryptophan.
Acetylated NH2 Terminus-NH2-terminal acetylation is a common post-translational modification of eukaryotic proteins. The function of acetylation is unknown, although protection against proteolysis has been suggested (14). We note that both REF and the prenyltransferase we purified from H. brasiliensis serum (1) have blocked NH2 termini, although the blocking group of the latter protein was not determined. When the NH2-terminal sequence of REF is compared to other NH2-terminally acetylated proteins it appears to be very typical of this class of proteins. Thus, acetylalanine is a very common first amino acid, a close second to acetylserine, and the glutamic acids at positions two and four are consistent with the observation that negatively charged residues predominate at these positions (14). However, the acidic amino acid content of the NH2 terminus of REF is noteworthy even in the class of acetylated proteins. Thus, Lys-15 is the first positive charge in the sequence, 5 of the first 12 amino acids are negatively charged, and 9 of the first 12 are from the group aspartic acid, asparagine, glutamic acid, and glutamine.
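The residue counts quoted for the NH2 terminus can be reproduced with a simple tally. The sketch below assumes that the first 15 residues are AEDEDNQQGQGEGLK, as assembled from the T1 fragments described under "Sequence of T1"; that assembly and the variable names are illustrative assumptions rather than data taken from Fig. 1.

```python
# ASSUMPTION: first 15 residues assembled from the T1 fragments in the text.
N_TERMINUS = "AEDEDNQQGQGEGLK"

ACIDIC = set("DE")           # negatively charged side chains
BASIC = set("KR")            # positively charged side chains (REF has no His)
ACID_OR_AMIDE = set("DENQ")  # Asp, Glu, Asn, Gln

first12 = N_TERMINUS[:12]

# Count residues of each class among the first 12 positions.
n_acidic = sum(r in ACIDIC for r in first12)
n_acid_or_amide = sum(r in ACID_OR_AMIDE for r in first12)

# 1-based position of the first positively charged residue.
first_basic = next(
    pos for pos, r in enumerate(N_TERMINUS, start=1) if r in BASIC
)

print("acidic residues in first 12:", n_acidic)
print("Asp/Asn/Glu/Gln in first 12:", n_acid_or_amide)
print("first basic residue at position:", first_basic)
```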
Search for Similar Sequences-We searched extensively for similar sequences to REF. We note that the sequence from amino acid 48-73 shows a 42% sequence similarity to a region in the major chlorophyll a/b binding polypeptide from the light harvesting complex. These plant proteins bind chlorophyll and its isoprenyl side chain (15,16). However, REF is not likely to be related evolutionarily to these proteins, even by this relatively short sequence. There is no compelling functional similarity between the two proteins, and there are too many evolutionarily disfavored pairings in the alignment. If the nonidentical residues in this region are evaluated by a mutation data matrix (24), only 12 out of the 29 point mutations implied by the alignment are either accepted or neutral. All other such alignments can be dismissed by similar arguments.
Secondary Structure Predictions-The NH2 terminus of REF is one of four regions predicted to be predominantly α-helix by both the Chou-Fasman method (11) and the Robson-Garnier method (12). These regions are shown graphically in Fig. 2A in relation to the two-dimensional plot of the Fourier transform of the hydrophobicity of the individual amino acid side chains. Such plots can be used to identify amphipathic secondary structural elements (10). Overall REF is predicted to have a high amount of amphipathic α-helix, since many of the contours throughout the length of the sequence in Fig. 2A fall on the 1/3.6 line, the frequency of an α-helix. The α-helices I and III predicted by the Chou-Fasman (11) and Robson-Garnier (12) methods do not match regions of amphipathic α-helix predicted by this latter method. Predicted α-helix II coincides with several contours centered on the 1/3.6 line, and predicted α-helix IV is also centered around a strongly predictive 10 contour feature on the 1/3.6 line. Predicted α-helix IV is such a prominent feature in this two-dimensional plot that we have plotted this sequence in Fig. 2B as a hydrophobic net analysis (17). Note that Pro-94-Pro-95 and Pro-123-Gly-124 at the beginning and end of this feature should break the α-helix if it extends the length of this 27-amino acid stretch.
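The idea behind reading amphipathic structure from the 1/3.6 line can be sketched with a single-frequency Fourier sum over residue hydrophobicities. The example below is a simplified illustration and not the HYDRO algorithm (10): it substitutes the Kyte-Doolittle hydropathy scale, evaluates one window at two frequencies instead of a two-dimensional contour plot, and uses a hypothetical idealized peptide, not any part of the REF sequence.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values (ASSUMED scale; HYDRO may use another).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def periodicity_amplitude(seq, residues_per_repeat):
    """Magnitude of sum_n H(n) * exp(2*pi*i*n / residues_per_repeat)."""
    omega = 2 * math.pi / residues_per_repeat
    total = sum(KD[r] * cmath.exp(1j * omega * n) for n, r in enumerate(seq))
    return abs(total)


# HYPOTHETICAL peptide with an idealized nonpolar/polar repeat (not REF).
peptide = "LKKLLKKLLKKLLKKL"

helix = periodicity_amplitude(peptide, 3.6)   # frequency 1/3.6, alpha-helix
strand = periodicity_amplitude(peptide, 2.0)  # frequency 1/2, beta-strand

print(f"amplitude at 1/3.6: {helix:.1f}")
print(f"amplitude at 1/2:   {strand:.1f}")
```

A large amplitude near 1/3.6 relative to other frequencies indicates that hydrophobic side chains recur about once per helical turn and would therefore cluster on one face of an α-helix.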
The hydrophobic face that would be an important secondary structural element of an amphipathic α-helix formed in this region is outlined in Fig. 2B. REF coats rubber particles at the interface between the aqueous serum and the extremely hydrophobic polyisoprene molecules in a manner that is at least superficially similar to the protein coating of the lipid and cholesterol ester core of high density lipoprotein. Amphipathic α-helices are important elements of models of self-association and surface binding of apolipoprotein E (18) and apolipoprotein A-I (19) in this latter example of a protein-lipid interface.
As more of the structural elements of REF are determined, we can begin to create a better picture of the role of REF, phospholipids, and cis-polyisoprene on the surface of the growing rubber particle.
Attempts to Clone REF-Isolation of quality mRNA suitable for making a cDNA library from H. brasiliensis has been unsuccessful in our laboratory owing to time delay in obtaining fresh tissue samples. Further, the lack of methionine and tryptophan and the high content of serine, leucine, and arginine in the protein sequence leads to redundancy in any possible DNA probe to locate the gene. It is not known in which part of the Hevea plant REF is synthesized. It is present in latex that is derived from the cytosol of the cells fused to form the laticifers (5), and Western analysis (25) demonstrates its presence in leaves (data not shown). Whether REF is present in leaves in newly formed laticifers or due to synthesis in leaf or cambium cells prior to secretion into the latex is not known. A synthetic gene for REF has been constructed using high frequency codons from known plant genes. The gene product is expressed at low levels in Escherichia coli as measured by immunoblots (data not shown). The lack of an assay for REF that is not bound to rubber particles (3) and the low expression levels impede further characterization of the recombinant product. However, this DNA probe will be a useful tool for ongoing studies in H. brasiliensis.
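The effect of amino acid content on probe redundancy can be made concrete by counting the codon combinations that encode a short peptide. The sketch below uses codon counts from the standard genetic code; the two six-residue example peptides are hypothetical and are not taken from the REF sequence, and the function name is illustrative.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code.
CODONS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2,
    "G": 4, "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2,
    "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide):
    """Number of distinct nucleotide sequences that encode the peptide."""
    return prod(CODONS[residue] for residue in peptide)


# HYPOTHETICAL peptides chosen only to contrast the two extremes.
low_redundancy = "MWDEKN"   # Met and Trp have a single codon each
high_redundancy = "SLRSLR"  # Ser, Leu, and Arg have six codons each

print(low_redundancy, probe_degeneracy(low_redundancy))
print(high_redundancy, probe_degeneracy(high_redundancy))
```

A protein without methionine or tryptophan offers no single-codon positions, and stretches rich in serine, leucine, and arginine multiply the number of oligonucleotides needed to cover every possible coding sequence.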