Structure of the Rapeseed 1.7 S Storage Protein, Napin, and Its Precursor*

Napin (1.7 S protein) is a basic, low molecular weight storage protein synthesized in rapeseed (Brassica napus) embryos during seed development. Napin is composed of two polypeptide chains with molecular weights of 9000 and 4000 that are held together by disulfide bonds. Comparison of the deduced amino acid sequence of a napin cDNA clone with that of napin peptide fragments established that napin is initially synthesized as a precursor of 178 residues. This polypeptide is subsequently processed through several proteolytic events, which ultimately generate the two mature napin chains, of 86 and 29 residues, respectively. Protein biosynthesis in vitro showed that the initial translation product (Mr 20,000) contains a signal sequence which is removed during transfer of the protein into the endoplasmic reticulum. Two additional peptides, of 22 and 19 residues, as well as the COOH-terminal residue, are also removed during maturation of napin, as deduced from the sequence comparison. Comparisons of the napin sequence with other known protein sequences established that there is a significant homology between napin and two other small seed proteins, the castor bean storage protein and a trypsin inhibitor from barley.

Rapeseed (Brassica napus) has been used as a cultivated plant for 4000 years and is today a major oil-seed crop in many parts of the world. In addition to a high lipid level, the seeds also have a significant protein content, which constitutes some 20-25% of the dry seed weight (1). The predominant protein species are two seed storage proteins: the 12 S and 1.7 S proteins. The 12 S protein, or cruciferin, is a high molecular weight, neutral complex, composed of several polypeptide chains (2). In contrast, the 1.7 S protein, or napin, is a low molecular weight, basic protein, composed of two disulfide-linked polypeptide chains (3,4).
The expression of both cruciferin and napin appears to be strictly regulated, as their synthesis is confined to embryonal and axis cells and occurs only during a limited time of seed development (5). Storage proteins are synthesized on and translocated across the rough endoplasmic reticulum membrane. Eventually they become deposited in distinct cellular vesicles, protein bodies, that most probably are derived from the endoplasmic reticulum (6). The possibility of engineering these proteins to improve their amino acid composition from a nutritional point of view has been given much attention. To this end, more has to be learned about the specific mechanisms underlying storage protein transport to, and deposition in, the protein bodies. No study has so far been able to reveal any general signal common to storage proteins that could serve for these purposes. As a prerequisite for studies on the biosynthesis and intracellular transport of storage proteins we have determined the primary structure of napin. We report here the sequence analyses of a cDNA clone and of napin peptide fragments. Sequence comparisons have enabled us to deduce the primary structure of the precursor and of the two mature napin chains.

* This work was supported by grants from the Swedish Council for Forestry and Agricultural Research and the Swedish Council for Science. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

DISCUSSION
Sequence analyses of napin polypeptide chains and of a cDNA clone encoding napin have enabled the elucidation of the primary structure of both the precursor and the mature napin chains. Napin is synthesized on membrane-bound ribosomes as a precursor consisting of 178 residues. During the transfer of the polypeptide into the microsomal lumen, a signal peptide is removed. Our data do not unambiguously establish where the cleavage of the signal sequence occurs. Nevertheless, we have tentatively assigned the alanine (-1 in Fig. 5) as the processing site based on (i) general features of signal sequences, (ii) the "-1, -3 processing rule" (21,22), and (iii) the shift in molecular weight of the products made in vitro in the absence or presence of microsomal vesicles (Fig. 3). The Ala (-1) residue constitutes the best compromise to meet these three criteria.
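The "-1, -3 processing rule" cited above can be illustrated with a minimal sketch. The residue sets below are a simplified reading of the rule (a small, uncharged residue at -1; no aromatic, charged, or large polar residue at -3) and are assumptions made for illustration, as are the function name and the window of plausible signal peptide lengths. The example sequence is hypothetical and is not the napin signal sequence.

```python
# Simplified "-1, -3" check for candidate signal peptidase cleavage sites.
# The residue sets are illustrative assumptions, not taken from the paper.
SMALL_AT_MINUS_1 = set("AGSCTQ")           # residues tolerated at position -1
FORBIDDEN_AT_MINUS_3 = set("FHYWDEKRNQ")   # aromatic, charged, large polar


def candidate_sites(seq, window=(15, 30)):
    """Return signal peptide lengths n for which cleavage after residue n
    satisfies the simplified rule.

    seq    : precursor sequence in one-letter code, starting at the first Met
    window : inclusive range of signal peptide lengths to test (assumed)
    """
    lo, hi = window
    sites = []
    for n in range(max(lo, 3), min(hi, len(seq) - 1) + 1):
        minus_1 = seq[n - 1]   # last residue of the putative signal peptide
        minus_3 = seq[n - 3]   # two residues upstream of it
        if minus_1 in SMALL_AT_MINUS_1 and minus_3 not in FORBIDDEN_AT_MINUS_3:
            sites.append(n)
    return sites


# Hypothetical sequence, NOT the napin precursor.
example = "MKTLLVALALFLLASTVSAQEEPRW"
print(candidate_sites(example, window=(15, 22)))
```

A check of this kind typically leaves several candidate positions, which is consistent with the need for additional criteria, such as the molecular weight shift observed in vitro, before a site is assigned.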
In addition to the above-mentioned processing, sequence comparisons reveal that amino acids +1 to +22 and +52 to +70, as well as the carboxyl-terminal residue, are removed during maturation of napin. Fig. 6 gives a schematic representation of the processing steps that the initial translation product undergoes to yield the mature napin. The order, intracellular location, and mechanism of the events involved in these latter processing steps of the napin precursor are not known. Enzymatic cleavage of other types of storage protein precursors takes place in the protein bodies (23). However, although the embryonal cells of B. napus do contain protein bodies (24), it has not actually been shown that napin is located in these vesicles. It is nevertheless quite likely that the napin precursor, from which the signal sequence has been removed, is transported to and deposited in protein bodies prior to the final processing. One experimental finding in support of this postulate is that the maturation of pulse-labeled napin is slow in vivo, whereas the removal of the signal peptide is fast (25).

Portions of this paper (including "Materials and Methods," "Results," Figs. 1-4, and Tables I-III) are presented in miniprint at the end of this paper. Fig. 5 is presented in the Appendix. The abbreviations used are: PTH, phenylthiohydantoin; SDS, sodium dodecyl sulfate; DNP, 2,4-dinitrophenyl. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are available from the Journal of Biological Chemistry, 9650 Rockville Pike, Bethesda, MD 20814. Request Document No. 86M-1576, cite the authors, and include a check or money order for $4.00 per set of photocopies. Full size photocopies are also included in the microfilm edition of the Journal that is available from Waverly Press.

Other storage proteins, e.g. those of legumes, have been found to possess conserved sequence motifs around the processing sites (26), which could be an indication of a conserved specificity of the processing enzymes. We were not able to find any analogous sequence motifs in the napin precursor and, consequently, we do not know whether the final processing steps of the napin precursor occur by endoproteolytic events alone or by exoproteolysis following an endoproteolytic cleavage at a single site within the removed peptide. The involvement of exoproteolytic activity in the protein maturation process gains some support from the observation that a fraction of the heavy chains lack the two NH2-terminal amino acids and that all of the heavy chains seem to have lost their COOH-terminal tyrosine residue. We cannot presently strictly rule out the possibility that these modifications occur during the purification of napin. An interesting feature of the napin precursor is that the charged amino acids are unevenly distributed and that 12 out of 16 of the negatively charged residues are removed during maturation. This gives the mature protein (pI = 11, Ref. 3) a substantially higher pI than that of the precursor (pI = 7, Ref. 4). The significance, if any, of this fact for the processing and deposition of the napin precursor remains to be established.
Analyses of the protein fragments, the cDNA clone in this work, and two other cDNA clones characterized by Crouch et al. (4) have confirmed that napin is heterogeneous in sequence. Although this heterogeneity appears to be quite limited, it points to the existence of a small gene family encoding napin variants. Napin has previously been classified among a vast, distantly related seed protein superfamily containing such diverse members as secalin from rye, hordein from barley, the so-called CM-proteins, α-amylase inhibitors, and trypsin inhibitors from different species (27). Considering the molecular weight and overall primary structure of napin, the most obvious kinship seems to be that of napin and the castor bean storage protein or that of napin and some seed protease inhibitor proteins.

Fig. 7. The open regions correspond to sequences with more than 28% identical amino acids and the solid regions to sequences with less homology. Napin was compared to (a) the castor bean storage protein (28) and (b) a trypsin inhibitor from barley (30). Imperfect repeats in napin and the castor bean storage protein are indicated by dashed arrows, and a short perfect repeat in napin is indicated by solid arrows. For the comparisons, we used the Align program described in Ref. 32 with the Mutation Data Matrix + 2 and a break penalty of 6.
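The kind of pairwise comparison behind the Fig. 7 regions (an alignment with a break penalty, followed by scoring the fraction of identical residues) can be sketched as follows. This is a generic global alignment by dynamic programming and is not the Align program of Ref. 32. It assumes a simple match/mismatch score in place of the Mutation Data Matrix and a per-position gap penalty; the function names and the example sequences are hypothetical.

```python
def global_align(a, b, match=10, mismatch=0, gap=6):
    """Global (Needleman-Wunsch style) alignment with a linear gap penalty.

    Assumption: a flat match/mismatch score stands in for a substitution
    matrix, and 'gap' is charged per gapped position.
    Returns (aligned_a, aligned_b, score).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = -gap * i
    for j in range(1, m + 1):
        score[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # pair the residues
                              score[i - 1][j] - gap,     # gap in b
                              score[i][j - 1] - gap)     # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] - gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues among aligned (non-gap) positions."""
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical peptides, not taken from napin or the castor bean protein.
al_a, al_b, total = global_align("ACDEFG", "ACEFG")
print(al_a, al_b, total, percent_identity(al_a, al_b))
```

In a comparison like that of Fig. 7, a region would be drawn open when its percent identity exceeds the stated 28% threshold; how the regions themselves were delimited is not described in the text.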

Napin and the castor bean storage protein are both small, basic, glutamine-rich proteins made up of two subunits linked together by disulfide bonds (28). They show no homology to the 7 S, 11 S, and 12 S storage globulins so far sequenced from Brassica and other dicots. Furthermore, by analogy with napin, the mature storage protein of the castor bean, Mr 10,900, is derived from a Mr 34,000 precursor form (29). A trypsin inhibitor from barley, Mr 13,000 (30), together with the castor bean storage protein, gave high alignment scores when compared to napin in a computer analysis. The results of these comparisons are shown in a schematic drawing (Fig. 7). An Arg-Leu sequence in positions 33 and 34 is the assigned reactive site of the trypsin inhibitor, the position of which is indicated in Fig. 7. The castor bean storage protein has Arg-Arg in the same positions, 33 and 34, and it has been speculated whether the protein might in fact retain some protease inhibitory function (31). Napin, on the other hand, seems to lack such a site in the aligned sequence portion.

Isolation of rapeseed mRNA and cell-free translation
Total RNA was isolated from developing B. napus seeds (a dihaploid line of Svensk Karat, kindly supplied by Dr. Lena Bengtsson, Svalöv AB, Sweden) 35 days after pollination (5,10). Polyadenylated mRNA was enriched by two passages through an oligo-dT cellulose column. A rabbit reticulocyte lysate system, with or without dog pancreas microsomes (11,12), was used for cell-free translations. The translation products were immunoprecipitated with a rabbit antiserum raised against purified napin and were subsequently analysed by SDS-polyacrylamide gel electrophoresis and autoradiography (13). The specificity of the antiserum for napin was verified by immunodiffusion and Western blotting analyses against purified napin as well as against a crude rapeseed protein extract. The antiserum was found to react exclusively with the two napin chains.

Construction of a cDNA library and screening of clones
Double-stranded cDNA was synthesized from 30 µg rapeseed embryo mRNA (14). cDNA (100 ng) was tailed with [3H]dCTP in a 0.1 M potassium cacodylate buffer, pH 7.0, containing 2 mM mCl2, 0.2 mM dithiothreitol, and 1.25 mg/ml bovine serum albumin. The reaction was carried out in a volume of 100 µl at 37 °C for 12 minutes, which yielded the addition of homopolymer tails with an average length of 30 nucleotides per end. The tailed cDNA was size-fractionated by passage through a Sephadex G-100 column. Tailed cDNA molecules longer than 250 base pairs were annealed to Pst I-cleaved, oligo-dG-tailed pUC9 (15). The hybrid plasmids were used to transform E. coli JM83. Transformation was done by the calcium chloride procedure (16).
Recombinant plasmids were prepared from small cultures of individual clones (17). Screening for relevant clones was done by hybrid selection. Briefly, plasmid DNA (approximately 2 µg) from five clones was mixed, sheared to an average length of 0.2-0.8 kilobases by sonication, and immobilized, after alkali denaturation, onto nitrocellulose filters. Hybridization and elution of mRNA were performed as described by Ricciardi et al. (18). The eluted mRNA was subjected to cell-free translation, and the products were immunoprecipitated, separated, and visualized as described above.

Nucleic acid sequence determination
Restriction fragments were labelled at their 5' termini using [γ-32P]dATP and T4 polynucleotide kinase. Purification of labelled fragments and nucleic acid sequence determination were performed in accordance with Maxam and Gilbert (19). Parts of the cDNA were also sequenced by use of the M13 dideoxy method (20).

Characterization of napin
Napin was isolated from B. napus seeds as described in Materials and Methods. SDS-polyacrylamide gel electrophoresis of the napin preparation revealed a single component with an apparent molecular weight of 13,000. Reduction and alkylation of the napin yielded two components with apparent molecular weights of 9,500 and 8,500 (data not shown). To obtain a more reliable estimate of the sizes of the two polypeptides, a sample of napin was reduced and alkylated and then subjected to chromatography, along with molecular weight standards, on an UltroPac TSK-G 2000 SW (LKB, Bromma, Sweden) chromatography column under denaturing conditions (Fig. 1). The molecular weights of the heavy and light chains were found to be 11,500 and 4,500, respectively.
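Size estimation by gel chromatography against molecular weight standards is commonly done by fitting a straight line to log10(Mr) versus elution volume and reading unknowns off that line. The sketch below assumes this approach; the text does not state how the calibration was evaluated, and the standards and elution volumes shown are hypothetical, idealized values, not data from Fig. 1.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mr(elution_volume, standards):
    """Estimate Mr from an elution volume using a log-linear calibration.

    standards : list of (elution_volume, Mr) pairs for the calibration proteins
    """
    volumes = [v for v, _ in standards]
    log_mr = [math.log10(mr) for _, mr in standards]
    slope, intercept = fit_line(volumes, log_mr)
    return 10 ** (slope * elution_volume + intercept)


# Hypothetical calibration points (elution volume in ml, Mr); not from Fig. 1.
standards = [(10.0, 100000), (12.0, 10000), (14.0, 1000)]
print(round(estimate_mr(13.0, standards)))
```

With real standards the points scatter around the line, so the estimate carries the uncertainty of the fit.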
Preparative separation of the individual napin chains was obtained by gel chromatography in 6 M guanidine-HCl after reduction and alkylation. Amino acid compositions of the individual chains were determined and were found to be in good agreement with earlier published data (Ref. 1 and Table I).
In order to obtain protein sequence information on the mature napin chains, amino acid sequencing was performed on the separated subunit polypeptides (Table II). More information was gained by taking advantage of the fact that both polypeptide chains contained methionines: one residue in the light chain and two in the heavy chain. Thus, the two constituent chains were separately fragmented by cyanogen bromide cleavage, and the resulting peptides were separated by gel chromatography (Fig. 2). As expected, two peptides (L1 and L2) were obtained from the light chain and three (H1, H2, and H3) from the heavy. The combined amino acid compositions of the peptides were in good agreement with those of the intact chains, both in the case of the light chain and of the heavy chain (Table I). L2 and H3 lacked homoserine and can accordingly be assumed to constitute the COOH-terminal portions of the light and heavy chain, respectively. Amino-terminal sequence determination of the cyanogen bromide fragments established the order of the fragments to be L1-L2 and H1-H2-H3 in the two chains. Sequence information was obtained for 28 positions of the light chain and for 65 positions of the heavy chain (Tables II and III).
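The fragment counts follow from the chemistry of cyanogen bromide, which cleaves on the COOH-terminal side of methionine and converts that methionine to homoserine. A chain with n methionines therefore gives n + 1 fragments, and only the COOH-terminal fragment lacks homoserine. The sketch below illustrates this bookkeeping; the function name is an assumption, and the sequences are hypothetical, not the napin chains.

```python
def cnbr_fragments(seq):
    """Split a one-letter sequence after every Met, as cyanogen bromide does.

    The cleaved Met is written as lowercase 'm' to mark its conversion to
    homoserine, so a fragment without 'm' is the COOH-terminal fragment
    (unless the chain itself ends in Met).
    """
    fragments = []
    current = []
    for residue in seq:
        if residue == "M":
            current.append("m")            # Met becomes homoserine
            fragments.append("".join(current))
            current = []
        else:
            current.append(residue)
    if current:
        fragments.append("".join(current))
    return fragments


# Hypothetical chains: one Met gives two fragments, two Met give three.
one_met = cnbr_fragments("GAQPMKRLY")
two_met = cnbr_fragments("GAMKRQMLY")
print(one_met, two_met)
```

The same logic underlies the assignment in the text: the fragment of each chain that lacks homoserine must come from the COOH-terminal end.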
Several independent lines of evidence indicate that napin possesses minor heterogeneities in sequence. First, although most amino acids exhibited close to integral values in the amino acid analyses, a few exceptions were found. Analyses of several preparations of napin chains consistently gave nonintegral values for arginine and alanine in the light chain preparations and for isoleucine, glycine, and lysine in the heavy chain samples. Furthermore, amino-terminal sequence determination of the napin chains suggested that napin is heterogeneous in sequence. In several positions more than one residue was found, although each in lower yield than expected for a single residue. An additional source of heterogeneity was identified in the heavy chain preparation. A fraction of the material had been shortened by two amino acid residues, giving rise to a minor sequence identical to the major, but two cycles out of phase (Tables II and III). Finally, the comparison of three napin-encoding cDNAs (see below) is also confirmatory of the notion that napin is slightly heterogeneous.

In vitro biosynthesis of napin
mRNA was isolated from developing rape seeds and assayed for its ability to direct protein synthesis in vitro in a reticulocyte lysate system. The [35S]methionine-labelled polypeptides formed were separated by SDS-polyacrylamide gel electrophoresis. More than a dozen components were evident. One of these polypeptides could be immunoprecipitated using an antiserum against napin (Fig. 3, lane 8 [...] when synthesis in vitro was performed in the presence of microsomal vesicles [...] explanation that products related to only one of the mature polypeptides were [...]

Isolation and sequencing of a cDNA clone encoding napin
A cDNA library consisting of 2,000 clones was constructed from rapeseed mRNA. The cDNA molecules were size-fractionated before annealing to the vector to reduce the frequency of clones with inserts shorter than 250 base pairs. Fifty randomly chosen clones were screened by hybrid selection of mRNA followed by protein biosynthesis in vitro and immunoprecipitation. Three clones were found that hybridized specifically to napin mRNA.
One of these clones, pNAP1, containing the longest insert (118 base pairs), was chosen for detailed restriction enzyme mapping and nucleotide sequence determination (Fig. 4). A single open reading frame consisting of 534 base pairs was found (Fig. 5). Comparison of the amino acid sequence deduced from the nucleotide sequence with the sequences of the napin cyanogen bromide fragments revealed that the light chain corresponds to amino acids +23 to +51 and the heavy chain to amino acids +71 to +156 of the precursor amino acid sequence.
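The residue numbers reported for the napin precursor can be checked against one another with simple bookkeeping. The sketch below uses the coordinates given in the text, with +1 taken as the first residue after the signal peptide. Two points are assumptions rather than statements from the paper: that the single removed COOH-terminal residue is position +157, and that the whole 534-base-pair open reading frame codes for amino acids. The signal peptide length is derived from these, not reported.

```python
# Bookkeeping for the napin precursor segments.
ORF_BASE_PAIRS = 534  # open reading frame length reported for pNAP1

SEGMENTS = [
    ("removed NH2-terminal peptide", 1, 22),
    ("light chain", 23, 51),
    ("removed internal peptide", 52, 70),
    ("heavy chain", 71, 156),
    # Assumption: the one removed COOH-terminal residue is position +157.
    ("removed COOH-terminal residue", 157, 157),
]


def segment_lengths(segments):
    """Map each segment name to its length in residues (inclusive ends)."""
    return {name: end - start + 1 for name, start, end in segments}


precursor_residues = ORF_BASE_PAIRS // 3       # three bases per codon
lengths = segment_lengths(SEGMENTS)

# Derived, not reported: whatever is left over is the signal peptide.
signal_peptide_length = precursor_residues - sum(lengths.values())

for name, n in lengths.items():
    print(f"{name}: {n} residues")
print(f"precursor: {precursor_residues} residues")
print(f"signal peptide (derived): {signal_peptide_length} residues")
```

Under these assumptions the coordinates reproduce the 29- and 86-residue chains, the 22- and 19-residue removed peptides, and the 178-residue precursor, and would leave 21 residues for the signal sequence.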