Amino Acid Sequence of Small and Large Subunits of Seed Storage Protein from Ricinus communis *

The low molecular weight, glutamine-rich storage protein isolated from the seeds of Ricinus communis (castor beans) has been shown to consist of two different polypeptide chains linked by disulfide bond(s). The small subunit is composed of 34 amino acids with a proline at its NH2 terminus, whereas the large subunit contains 61 amino acids with a cyclized glutamine as the NH2-terminal residue. The complete amino acid sequence of both subunits has been determined through characterization of the isolated subunits and selected peptides from trypsin, chymotrypsin, thermolysin, and cyanogen bromide cleavage. The intact protein possesses a large number of glutaminyl and half-cystinyl residues and exhibits sequence heterogeneity as observed from peptide sequences. Comparison of the sequence of this protein and those of other seed proteins indicates some structural similarities between them. The amino acid sequences of the two polypeptide chains of castor bean storage protein are:

The castor bean (Ricinus communis) has many industrial and medicinal uses. However, it cannot be consumed by humans because of the presence of toxic ricins (1) and allergens (2). The storage proteins are of interest because they account for the majority of total seed proteins. It has been reported that the storage proteins contain a high percentage of glutamine, which provides a source of nitrogen for the developing seedlings (3, 4). Storage proteins have been isolated from the seeds of many important crops such as soybean (5), corn (6), wheat (7), barley (8), rice (9), and rapeseed (10). However, their complete structures have not been reported. Lönnerdal and Janson (10) have shown that the rapeseed storage proteins have molecular weights of 12,000 to 14,000 with an isoelectric point around 11 and contain glutamine in excess of 30%. We have previously reported the isolation of low molecular weight, glutamine-rich proteins from the seeds of R. communis (11) and Momordica charantia (12). As previously noted (11), the castor bean storage protein possesses physicochemical characteristics, including partial amino acid sequence, similar to those of allergens and protease inhibitors. During the course of the complete sequence determination reported here, it was found that this castor bean protein consists of two different polypeptide chains which appear to be linked by disulfide bond(s). These studies have also shown evidence for sequence heterogeneity in the castor bean protein and its structural similarities to other seed proteins.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

RESULTS
Molecular Weight and Composition of Subunits-The protein was analyzed by SDS-gel electrophoresis both before and after reduction. As shown in Fig. 1, the intact protein gave a single band of apparent M_r = 9,000, while the reduced protein sample gave two faster migrating bands, indicating the presence of two different polypeptide chains linked by disulfide bond(s). The large subunit appeared to have M_r = 6,600 ± 500 and the small subunit showed a faint band at 4,800 ± 500, possibly due to losses during staining procedures. The complete amino acid sequences of the two subunits determined in this study establish their M_r to be 7,000 and 4,000 for the large and small subunits, respectively. The calculated M_r of the intact storage protein is 11,000, in contrast to the apparent M_r = 9,000 estimated by SDS-gel electrophoresis.
This discrepancy may be due to the faster mobility on SDS-gel of the intact protein, in which the two subunits are tightly linked by disulfide bond(s). After reduction and carboxymethylation of the intact protein, the two subunits could be separated by gel filtration on Sephadex G-25 (Fig. 1). Amino acid composition was calculated on the basis of molar ratios and the assumed minimum molecular weights. As indicated in Table I [...] degradations on 30 nmol of the large subunit failed to yield any identifiable amino acid. Twenty-four cycles of degradation on an oxidized sample of the intact protein gave only the amino acid sequence of the isolated small subunit. These results indicated that the large subunit possesses a blocked NH2-terminal residue. The complete amino acid sequence of the small and large subunits was subsequently established from analyses of selected peptides purified from trypsin, chymotrypsin, thermolysin, and CNBr cleavage, summarized in Fig. 2.
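The subunit M_r values quoted earlier in this section can be roughly cross-checked from chain length alone. The sketch below assumes an average residue mass of 110 Da, a common rule of thumb that is not taken from the paper; the function name `approx_mass` is hypothetical. It is an order-of-magnitude check, not a substitute for a mass computed from the actual composition.

```python
# Rough cross-check of subunit masses from chain length alone.
# ASSUMPTION: 110 Da per residue is a generic rule of thumb,
# not a value reported in the paper.
AVG_RESIDUE_MASS = 110.0  # Da

def approx_mass(n_residues, avg=AVG_RESIDUE_MASS):
    """Approximate polypeptide mass (Da) from the number of residues."""
    return n_residues * avg

small = approx_mass(34)   # small subunit: 34 residues
large = approx_mass(61)   # large subunit: 61 residues
intact = small + large    # two chains joined by disulfide bond(s)

for name, mass in [("small", small), ("large", large), ("intact", intact)]:
    print(f"{name:>6}: ~{mass:,.0f} Da")
```

Under this assumption the subunit estimates round to 4,000 and 7,000, in line with the values derived from the sequences, and the estimate for the intact protein lies above the apparent M_r of 9,000 seen on the SDS gel.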
The compositions of peptides T1, T2, T3, and C1 are consistent with the NH2-terminal sequence of residues 1 to 24 of the small subunit. The sequence of peptides T4, T6, C2, C3, C4, C5, and B1 confirmed the sequence of residues 18 to 34 of the small subunit. The low recovery of CNBr peptide B1 was presumably due to oxidation and cleavage of tyrosine at position 22 by some bromine present in the cyanogen bromide reagent. Glutamic acid, probably resulting from deamidation of glutamine, was identified at positions 18 and 25 from peptides T4 and T6. Peptides C2 and C3 have the same sequence except that peptide C2 has an extra arginine at its COOH terminus. The recovery of peptide C3 was probably due to the residual tryptic activity of chymotrypsin. Peptide T5 containing free arginine was presumably derived from the COOH terminus of the small subunit.
As is the case with the isolated large subunit, no sequence was obtained from automated Edman degradations on peptides T11, H11, and B11, presumably due to cyclization of NH2-terminal glutamine. The sequences of peptides T12, T13, T14, T15, and T17 were determined completely. Trypsin was unable to cleave the Arg-Cys bond at position 20, presumably because two acidic residues were present on either side of this arginine (13). Peptides C11 and C13 resulted from incomplete cleavage by chymotrypsin. The composition of the minor peptide C12 was similar to that of peptide C11, along with some contamination from the neighboring peptides. Peptide C14 was tentatively aligned in the COOH-terminal region of peptide C11. The sequence of peptides C15 and H17 provided the overlap for peptides H16, T17, and C16. The sequence of chymotryptic peptide C16 was identical with that of tryptic peptide T17 except that peptide C16 contains an extra arginyl residue at its NH2 terminus. The sequences of peptides T17, C16, H20, and B12 were aligned at the COOH terminus of the large subunit because peptide B12 did not contain homoserine or homoserine lactone. Peptide H14 had serine instead of leucine at position 33, indicating sequence heterogeneity of the large subunit.

DISCUSSION
The cyclization and deamidation of glutamine residues presented major difficulties during the sequence determination of this glutamine-rich castor bean protein. It may be noted that peptides derived from thermolysin digestion are most useful for sequence determination of glutamine-rich proteins. Since thermolysin specifically cleaves on the amino side of hydrophobic amino acids, but not glutamine, the cyclization of NH2-terminal glutamine during peptide purification and sequence analysis can be avoided (14). Every amino acid residue present in both small and large subunits was positively identified by Edman degradation, except the first three residues from the NH2 terminus and the last COOH-terminal residue of the large subunit, which were deduced from peptide composition. Edman degradation on peptides T11, H11, and B11, as well as the large subunit, did not yield any identifiable residues because of cyclization of NH2-terminal glutamine. The detection of these three peptides by fluorescamine, yielding medium intensity spots on peptide maps, may have resulted from some reversion of pyroglutamic acid to glutamate in the presence of triethylamine during the fluorescamine staining procedure. Peptides T1 and C1 containing the NH2-terminal proline in the small subunit were also detected as faint spots by the fluorescamine staining procedure, as previously reported by Mendez and Lai (15).
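The advantage of thermolysin noted above can be illustrated with a minimal in silico digestion. This is a sketch under stated assumptions: the cleavage rules are deliberately simplified (thermolysin cuts on the amino side of a small assumed set of hydrophobic residues, trypsin on the carboxyl side of Lys and Arg, with no exceptions), the helper names are hypothetical, and `TOY` is an invented sequence, not either subunit of the castor bean protein.

```python
# Simplified cleavage rules (assumptions for illustration only):
#   thermolysin: cleaves on the amino side of hydrophobic residues
#   trypsin:     cleaves on the carboxyl side of Lys (K) and Arg (R)
HYDROPHOBIC = set("LIVFMA")  # assumed residue set, not taken from the paper
BASIC = set("KR")

def cleave_before(seq, residues):
    """Split seq on the amino side of every residue in `residues`."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        if i > 0 and aa in residues:
            fragments.append(seq[start:i])
            start = i
    fragments.append(seq[start:])
    return fragments

def cleave_after(seq, residues):
    """Split seq on the carboxyl side of every residue in `residues`."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        if aa in residues and i < len(seq) - 1:
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments

def new_gln_termini(fragments):
    """Count fragments whose newly exposed NH2 terminus is Gln (Q).

    The first fragment is skipped because its NH2 terminus is that of
    the parent chain, not one created by the digestion.
    """
    return sum(1 for frag in fragments[1:] if frag.startswith("Q"))

TOY = "SEQRQQLEEKQCFQQR"  # hypothetical glutamine-rich sequence

thermolytic = cleave_before(TOY, HYDROPHOBIC)
tryptic = cleave_after(TOY, BASIC)

print("thermolysin:", thermolytic, "new Gln termini:", new_gln_termini(thermolytic))
print("trypsin:    ", tryptic, "new Gln termini:", new_gln_termini(tryptic))
```

In this toy case every thermolytic fragment begins with the hydrophobic residue at the cut site, so no new NH2-terminal glutamine is exposed, whereas two of the tryptic fragments begin with glutamine and would be at risk of cyclization.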
The amino acid sequence of this protein indicates that 38% of 34 residues in the small subunit and 25% of 61 residues in the large subunit are either glutamine or glutamic acid. The identification of glutamic acid at positions 18 and 25 of the small subunit was assumed to be due to the deamidation of glutamine during peptide purification and sequence analysis. However, heterogeneity of genetic origin might also be a possibility. Both serine and leucine were recovered at position 33 in the large subunit of this protein. Further, the existence of sequence heterogeneity appears to be characteristic of many seed proteins (4, 5, 7). Most of the glutamine and glutamic acid residues in the large subunit are clustered within the NH2-terminal two-thirds of the polypeptide chain (35% of the first 40 residues being glutamic acid or glutamine), whereas only a single glutamine residue is present at position 58 in the COOH-terminal 21 residues (5% of 21 residues being glutamine).
The similarity of this castor bean protein sequence to other known sequences of plant proteins was examined by computer search analysis and is shown in Fig. 3. The amino acid sequence of the small subunit was unexpectedly found to be homologous with the sequence of residues 116 to 149 of the sweet protein thaumatin I from the fruit of Thaumatococcus daniellii (16). Of the 34 residues compared, 9 residues including 2 Cys were identical (26% identity). However, sequence homology was not detected between this castor bean protein and another sweet protein, monellin, from the fruit of the West African plant Dioscoreophyllum cumminsii. The monellin protein was found to consist of two subunits of 43 and 50 amino acids (17). The amino acid sequence of the large subunit of castor bean protein also shows sequence homology with plant protease inhibitors. The NH2-terminal 25 residues of the large subunit were aligned with residues 11 to 38 of the Bowman-Birk protease inhibitor of lima bean with a 3-residue gap for maximum homology (18, 19). Of the 28 residues compared, 12 residues, including 4 Cys, are identical (43% identity). In the case of the Bowman-Birk sequence, the Lys-Ser at position 26-27 is the active site of the trypsin inhibitor (20-23). The Arg-Ser at position 3-4 of the large subunit might also be involved in a similar function, since this native castor bean protein seems to be resistant to digestion by trypsin, chymotrypsin, and thermolysin (preliminary result). Thus, the large subunit may function as an endogenous protease inhibitor such as is found in the endosperm of monocots and the cotyledons of dicots (3, 24).
C. Y. Lai, personal communication.
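The composition and identity percentages quoted in this part of the Discussion can be rechecked with a few lines of arithmetic. The sketch below uses only the counts and percentages given in the text; the helper names are hypothetical, and the two aligned strings passed to `percent_identity` are invented toy sequences, since the actual alignments are shown only in Fig. 3. As in the text, gap positions count toward the number of positions compared.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

def percent_identity(a, b, gap="-"):
    """Percent identical columns between two equal-length aligned strings.

    Gap columns are counted in the denominator and never score as matches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    return percent(matches, len(a))

# Glx (Glu + Gln) residue counts implied by the quoted percentages
glx_small = round(0.38 * 34)      # small subunit
glx_large = round(0.25 * 61)      # large subunit
glx_first40 = round(0.35 * 40)    # NH2-terminal 40 residues of large subunit
glx_last21 = 1                    # single Gln at position 58

# Identity figures from the quoted counts
thaumatin = percent(9, 34)        # small subunit vs. thaumatin I
aligned_len = 25 + 3              # 25 residues plus a 3-residue gap
bowman_birk = percent(12, aligned_len)

print(glx_small, glx_large, glx_first40 + glx_last21)
print(f"{thaumatin:.1f}% {bowman_birk:.1f}% {percent(glx_last21, 21):.1f}%")

# Toy alignment (hypothetical sequences) to exercise percent_identity
toy = percent_identity("ACDQ-CE", "ACEQKCE")
print(f"toy identity: {toy:.1f}%")
```

The implied count for the large subunit agrees with the sum of the counts for its first 40 and last 21 residues, and the identity values round to the 26% and 43% given in the text.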
Low molecular weight storage proteins have also been isolated from Brassica napus (rapeseed) and Momordica charantia in addition to Ricinus communis (10-12). However, the storage proteins from other plants (3-9) have been reported to possess much higher molecular weights. Further studies of these seed proteins would be of importance to determine their structural similarities, biological functions, and genetic origins. The nutritional quality of these seed proteins for feed and food purposes could be improved by using genetic engineering techniques to select the genes producing seed proteins with desirable characteristics (25, 26).