Molecular Cloning and Gibberellin-induced Expression of Multiple Cysteine Proteinases of Rice Seeds (Oryzains)

We screened a cDNA library of germinating rice seeds with a cDNA for aleurain (cysteine proteinase from barley) and obtained three distinct types of cDNA clones encoding three species of cysteine proteinases (oryzains α, β, and γ). The deduced amino acid sequences are distinct in part but, on the whole, are similar to one another. The three sequences all contain the catalytic triad Cys25-His159-Asn175 (papain numbering). The three oryzains are similar to one another and also to other known cysteine proteinases such as papain and cathepsin H with respect to the sequences around the active site residues and the COOH-terminal Trp-rich region. Amino acid sequence comparison revealed that oryzains α and β are similar not only to each other (70% similarity) but also to actinidin and papain (about 50%), whereas oryzain γ was rather similar to aleurain (85%) and cathepsin H (60%).
Northern blot analysis revealed that the mRNAs for the three oryzains are expressed only in seeds, not in shoots or roots, and show different expression profiles during germination and when the seeds are treated with gibberellic acid. Oryzains α and γ are expressed continuously during germination with a maximum expression 5 days from the start of germination, but are present in neither ripening nor ripened seeds. On the other hand, oryzain β is expressed not only during germination, but also in ripened seeds before germination. It was noted that the expression of the three oryzain mRNAs is enhanced in different manners by gibberellic acid but is not enhanced by other plant hormones such as auxin. The induction of oryzain β mRNA is transient, reaching a maximum 4 h after the addition of gibberellic acid and diminishing rapidly thereafter, while the induction of oryzain α and γ mRNAs continues over 5 days. Thus, multiple systems involving cysteine proteinases must be differentially involved in the germination process, probably under hormonal control.
Cysteine proteinases (EC 3.4.22) are widely distributed in animals (1-6), plants (7-12), and microorganisms (13), where they are involved in many intracellular and extracellular processes of physiological importance. The activities of cysteine proteinases are controlled by protein inhibitors called cystatins. A number of cysteine proteinases and cystatins are already well characterized in animals (14), and refined enzymatic analyses have been carried out on multiple systems involving cysteine proteinases such as cathepsins H, L, and B, and cystatins such as stefins A and B and cystatin β. In plants, however, aleurain, which occurs in the aleurone of barley (12), is the only seed cysteine proteinase that has been well defined at the molecular level.
The nucleotide sequences reported in this paper have been submitted to the GenBank/EMBL Data Bank with accession numbers D90406-D90408.
We have found a cysteine proteinase in germinating rice seeds whose expression reaches a maximum 5 to 8 days after the start of germination (15) and is also enhanced by gibberellic acid (GA3) (16). As for plant cystatins, we have found two types of proteinaceous cysteine proteinase inhibitors in rice seeds. These are oryzacystatin I (17-19) and oryzacystatin II (20), which are expressed during the ripening stage and stored in mature seeds. Oryzacystatins I and II exhibit potent inhibitory activities toward papain and cathepsin H, respectively. It is possible that the activities of the cysteine proteinases in rice seeds are inhibited by either or both oryzacystatins. To understand the physiology of cysteine proteinases and cystatins in plant seeds, their identification and characterization at the molecular level are of primary importance. We have also found that a major storage protein in rice seeds (glutelin) is efficiently degraded in vitro by a cysteine proteinase existing in germinating seeds (15). However, refined molecular analyses of this cysteine proteinase and other proteinases have not been performed.
Here we report on the isolation and characterization of cDNA clones for rice seed cysteine proteinases (oryzains) to define their molecular nature. We also report that the synthesis of mRNAs encoding oryzains is induced during germination by GA3.

EXPERIMENTAL PROCEDURES
Materials-DNA polymerase I from Escherichia coli, its Klenow fragment, bacterial alkaline phosphatase, T4 polynucleotide kinase, and T4 DNA ligase were purchased from Takara Shuzo Co. Restriction enzymes were the products of Takara Shuzo Co. and Toyobo Co. The oligo(dT)12 packed column was a product of Amersham. A multiprime DNA labeling kit ("Rapid Hybridization System-Multiprime"), [γ-32P]ATP (3000 Ci/mmol), and [α-32P]dCTP (3000 Ci/mmol) were purchased from Amersham. The λgt10 phage vector arms and λ phage packaging kit ("Gigapack Gold") were products of Stratagene Cloning Systems. The cDNA synthesis kit was purchased from Pharmacia LKB Biotechnology Inc.
Rice Seeds-Cultivar Nipponbare of the rice Oryza sativa L. japonica was used. The seeds were harvested on days 3, 5, and 8 after germination and stored at -80 °C until used.
Determination of Partial Amino Acid Sequences-A cysteine proteinase was purified from rice seeds as described previously (15) and digested with lysylendopeptidase (Wako) (21). Peptides were fractionated on a Vydac ODS column (Senshu Kagaku). Partial amino acid sequences were analyzed with a protein sequenator (Model 470A, Applied Biosystems) to give three sequences, Asp-Glu-Arg-Xaa-Asp-Val-Asn-Arg-Lys, Xaa-Gly-Xaa-Ile-Asp-Thr-Glu-Xaa-Asp-Tyr-Pro-Tyr-Lys, and Leu-Pro-Glu-Xaa-Asp-Trp-Arg-Xaa-Lys. We also analyzed the NH2-terminal sequence of the purified protein and obtained the sequence Leu-Pro-Glu-Xaa-Asp-Trp-Arg-Xaa-Lys, which corresponds to the third sequence described above.
Construction of cDNA Library and Isolation of cDNA Clones for Oryzains-Total RNA was extracted by the phenol-SDS method (22) from rice seeds harvested on days 3, 5, and 8 after germination. Poly(A)+ RNA was purified by oligo(dT)-cellulose column chromatography (23). Double-stranded cDNA was synthesized using a cDNA synthesis kit (Amersham) (24) and used for construction of a λgt10 cDNA library. Recombinant phages from this cDNA library were propagated on a lawn of E. coli C600Hfl. Plaques were transferred onto Nylon filters (Hybond-N, Amersham), prehybridized for 16 h at 65 °C as described previously (25), and hybridized for 24 h at 65 °C with an aleurain cDNA fragment (12) labeled by the multiprime system (Amersham) as a probe. The filters were finally washed in 2× SSC containing 0.1% SDS at 65 °C (26).
The abbreviations used are: GA3, gibberellic acid; SDS, sodium dodecyl sulfate.
Nucleotide Sequencing-cDNA inserts excised by EcoRI digestion of recombinant phage DNA were subcloned into plasmid vector pUC18 (27) and sequenced by a modified dideoxy method (28).
RNA Blot Hybridization-Total RNA was extracted from rice seeds at various stages of germination as described above. The RNA sample (10-15 µg of total RNA) was denatured and electrophoresed in a formaldehyde-containing agarose gel (26). After electrophoresis, the RNA was transferred onto a Nylon membrane (Hybond-N, Amersham) and hybridized with 32P-labeled cDNAs at 65 °C in rapid hybridization solution (Amersham). The filter was finally washed in 0.1× SSC containing 0.1% SDS at 65 °C.

RESULTS
Isolation and Identification of cDNA Clones for Cysteine Proteinases-Among 1.5 X 10' independent plaques from the λgt10 cDNA library, approximately 20 clones hybridized with aleurain cDNA. These clones were divided into three types based on restriction mapping and cross-hybridization analyses (29). Three clones, λOZA022, λOZB102, and λOZC511, containing the longest inserts of these three types, were selected and sequenced by the strategy shown in Fig. 1. Since the three clones encoded different species of putative cysteine proteinase (to be described in detail later), the encoded proteins were termed oryzain α, β, and γ in correspondence with clones λOZA022, λOZB102, and λOZC511, respectively.
FIG. 1 (legend fragment). ... HincII; K, KpnI; P, PstI; Sc, SacI; and S1, SalI. Arrows indicate the length and direction of sequencing. The sequences of the protein-coding regions are boxed.
The nucleotide sequence of the λOZA022 cDNA insert encoding oryzain α contains a single, long open reading frame of 1374 nucleotides (458 amino acid residues) preceded by an in-frame termination codon (position -12 to -10 in Fig. 2A) and followed by a poly(A) tail. The deduced amino acid sequence contains all the partial amino acid sequences already found in peptides derived from the purified cysteine proteinase by partial digestion (see "Experimental Procedures"), indicating that oryzain α is the cysteine proteinase previously identified by protein chemistry (15). The nucleotide sequence of λOZB102, encoding oryzain β, contains a single open reading frame of 1413 nucleotides (471 amino acids) (Fig. 2B). That of λOZC511, encoding oryzain γ, contains a single open reading frame of 1086 nucleotides (362 amino acids) (Fig. 2C).
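The two consistency checks in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the ORF lengths and peptide sequences are taken from the text, and Xaa is treated as a wildcard matching any residue. The deduced oryzain sequences themselves (Fig. 2) are not reproduced here, so any protein string passed to `find_peptides` would have to be supplied by the reader.

```python
import re

# Three-letter to one-letter codes for the residues that occur in the peptides.
THREE_TO_ONE = {
    "Asp": "D", "Glu": "E", "Arg": "R", "Val": "V", "Asn": "N",
    "Lys": "K", "Gly": "G", "Ile": "I", "Thr": "T", "Tyr": "Y",
    "Pro": "P", "Leu": "L", "Trp": "W",
}

# ORF lengths in nucleotides, as reported in the text.
ORF_LENGTHS = {"alpha": 1374, "beta": 1413, "gamma": 1086}

# Partial peptide sequences from "Experimental Procedures".
PEPTIDES = [
    "Asp-Glu-Arg-Xaa-Asp-Val-Asn-Arg-Lys",
    "Xaa-Gly-Xaa-Ile-Asp-Thr-Glu-Xaa-Asp-Tyr-Pro-Tyr-Lys",
    "Leu-Pro-Glu-Xaa-Asp-Trp-Arg-Xaa-Lys",
]


def orf_residues(orf_nucleotides: int) -> int:
    """Return the number of codons in an ORF of the given length."""
    if orf_nucleotides % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_nucleotides // 3


def peptide_to_regex(peptide: str) -> re.Pattern:
    """Convert 'Asp-Glu-Xaa-...' to a regex; Xaa matches any single residue."""
    parts = ["." if code == "Xaa" else THREE_TO_ONE[code]
             for code in peptide.split("-")]
    return re.compile("".join(parts))


def find_peptides(protein: str) -> dict:
    """Map each peptide to the 1-based start of its first match, or None."""
    hits = {}
    for peptide in PEPTIDES:
        match = peptide_to_regex(peptide).search(protein)
        hits[peptide] = match.start() + 1 if match else None
    return hits


RESIDUES = {name: orf_residues(n) for name, n in ORF_LENGTHS.items()}
```

The residue counts computed this way agree with the 458, 471, and 362 residues quoted above for oryzains α, β, and γ.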
The initiation methionine codons for oryzains β and γ were assigned for the following reasons: 1) the nucleotide sequences around the putative initiation codons are consistent with those ( A A A a G C T ) proposed for plant mRNAs (30, 31); 2) hydropathy profiles (Fig. 3) and sequence similarity strongly suggest the existence of signal and pro-sequences which follow the putative initiation codons; and 3) the lengths of the mRNAs (Fig. 5) approximate the total length of the cDNA inserts.
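A hydropathy profile of the kind referred to in point 2 is a sliding-window average of per-residue hydropathy values. The sketch below assumes the Kyte-Doolittle scale and a window of nine residues; the text does not state which scale or window was used for Fig. 3, so both are illustrative choices, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not named in the paper).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list:
    """Return the mean hydropathy of every window of `window` residues."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]
```

A strongly positive stretch near the NH2 terminus of such a profile is the usual indication of a hydrophobic signal-sequence core.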
Overall Structure and Sequence Homology of the Three Oryzains-The three oryzains are thought to be cleaved by post-translational modification. The signal sequences were estimated based on the general characteristic that a signal sequence contains a charged residue within the first five amino acids followed by a core of at least 9 hydrophobic residues (32). By this general rule, as well as from the hydropathy profiles (Fig. 3), we assigned possible signal sequences at Met1-Ala", Met1-Ala", and Met1-Ala24 for oryzains α, β, and γ, respectively.
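The general rule quoted above can be written as a simple test on the NH2-terminal region. The following is a minimal sketch, not the procedure of reference 32: the sets of charged and hydrophobic residues, the 30-residue scan limit, and the function name are all assumptions made for illustration.

```python
CHARGED = set("DEKR")          # assumption: His is not counted as charged
HYDROPHOBIC = set("AVLIMFWC")  # assumption: one possible hydrophobic set


def looks_like_signal(seq: str, min_core: int = 9, scan: int = 30) -> bool:
    """Check for a charged residue in the first five positions followed by
    a run of at least `min_core` hydrophobic residues within the first
    `scan` residues."""
    charged_positions = [i for i, aa in enumerate(seq[:5]) if aa in CHARGED]
    if not charged_positions:
        return False
    run = 0
    # Look for the hydrophobic core after the first charged residue.
    for aa in seq[charged_positions[0] + 1:scan]:
        run = run + 1 if aa in HYDROPHOBIC else 0
        if run >= min_core:
            return True
    return False
```

A rule of this kind only flags candidates; in the text the assignment is also supported by the hydropathy profiles of Fig. 3.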
The amino acid sequences of oryzains α, β, and γ were aligned for maximal homology (Fig. 4). The NH2 terminus of mature oryzain α was assigned as Leu129 because the NH2-terminal sequence analysis revealed the sequence Leu-Pro-Glu-Xaa-Val-Asp-Trp (see "Experimental Procedures"), which corresponds exactly to the deduced sequence of residues 129 to 135. The NH2 termini of oryzains β (Leu140) and γ (Leu145) were assigned from sequence similarity to the sequence around Leu129 of oryzain α. Thus, the prepropeptides of oryzains α, β, and γ are predicted to be the sequences from Met1 to residue 128, from Met1 to Glu139, and from Met1 to Ala144, respectively. With respect to the homology between the prosequences, oryzains α and β show 53% similarity. However, the similarity between the prosequences of oryzains and other cysteine proteinase precursors is relatively low (20-42%). Oryzains α and β contain sequences similar to the COOH-terminal extension peptide sequence found in actinidin (11) and, therefore, are thought to have had telo-sequences; oryzain γ per se does not contain a telo-sequence. From sequence similarity to actinidin, the COOH termini of mature oryzains α and β should be G~u~~~ and Ala360, respectively. The molecular weight of the mature form of oryzain α is thus calculated to be 23,782, which agrees well with the value of 23,500 estimated from SDS-gel electrophoresis (15).
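Percentage figures such as the 53% quoted for the prosequences are typically obtained by counting matching positions in a pairwise alignment. The sketch below assumes that "similarity" means identical residues divided by aligned (non-gap) positions; the text does not define the measure, so this is one plausible reading, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the columns of a pairwise
    alignment in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * identical / compared
```

The input strings must already be aligned, for example as rows taken from an alignment like that of Fig. 4.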
The sequences around the catalytic triads are also similar to those in other known cysteine proteinases (data not shown). In the COOH-terminal regions of the mature forms, the three sequences contain Trp residues (for example, residues 305, 311, and 315 in oryzain α). This feature is common to other cysteine proteinases, which contain 2 to 5 Trp residues in the corresponding region (33). Therefore, the mature oryzains would be expected to have proteolytic activity. Indeed, we have found oryzain α to have potent proteolytic activity against casein and glutelin, a major storage protein of rice seeds (15). Potential N-glycosylation sites exist at in oryzain α, at A d 4 ' and in oryzain β, and at Asn'" and AsnZm in oryzain γ. In the case of oryzain γ, one of the two potential glycosylation sites is of the form Asn-Ile-Thr, as is the case for cathepsin H (3) and aleurain (12).
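Potential N-glycosylation sites are conventionally located by scanning for the sequon Asn-X-Ser/Thr, where X is any residue except Pro; Asn-Ile-Thr is one instance. A minimal sketch of such a scan follows. The function name is hypothetical, and the oryzain sequences are not reproduced here, so the residue positions mentioned in the text cannot be recomputed from this example alone.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be reported separately.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glycosylation_sites(protein: str) -> list:
    """Return (1-based position of Asn, tripeptide) for each sequon."""
    return [
        (m.start() + 1, protein[m.start():m.start() + 3])
        for m in SEQUON.finditer(protein)
    ]
```

A sequon only marks a potential site; whether it is glycosylated in the mature protein has to be shown experimentally.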

The sequences of the mature forms of oryzains α, β, and γ are comparable to one another and to other known cysteine proteinases (Table I). The sequence similarity is significant for all comparisons and ranges from 32% to 85%. High similarity is observed in comparing oryzain γ and aleurain, suggesting that oryzain γ is a rice counterpart of barley aleurain. Relatively high sequence similarity is also observed between oryzains α and β, between oryzain γ and aleurain, between aleurain and cathepsin H, between oryzain γ and cathepsin H, etc. From these comparative studies, a putative pedigree for cysteine proteinases can be drawn as outlined in Fig. I. This indicates that the seeds contain at least three cysteine proteinases extending over two subfamilies. Thus, in rice as a species of plant, multiple cysteine proteinases exist and probably have important functions. It is also suggested that divergence and evolution of cysteine proteinases occurred at an early stage in the history of evolution.
Expression of Oryzain mRNAs during the Germination of Rice Seeds-Each cDNA insert was used as a hybridization probe to evaluate the expression of oryzain mRNAs. Although the three cDNA sequences are similar, they did not cross-hybridize under high stringency conditions (data not shown). As shown in Fig. 5, single bands were detected at approximately 2.0, 2.0, and 1.5 kilobases for oryzain α, β, and γ mRNAs, respectively, at all stages of germination. However, the three mRNAs were not detected in shoots or roots (Fig. 5, lanes 9 and 10 for oryzain β; data not shown for oryzains α and γ). The mRNA for oryzain α was observed to peak at 5 days after germination and was not detected in mature seeds. This pattern is consistent with the time course profile observed for oryzain α activity (15), indicating that the activity of oryzain α may be controlled at the level of transcription. On the other hand, the mRNA for oryzain β is detected in mature seeds (Fig. 5, lane 5) and reaches a maximal level 3 days after germination. The profile for oryzain γ mRNA is similar to that of oryzain α; the mRNA reaches a maximal level 5 days after germination. These mRNAs are not expressed in the endosperm but in the aleurone or germ (data not shown).
Next, we examined whether the expression of oryzains is induced by GA3, because previous studies on oryzain α (16) and aleurain (12) showed they are GA3-inducible. The mRNAs for oryzains α and γ were induced 24 h after GA3 treatment (Fig. 6, I, lanes 1 and 11) and were constantly expressed for 5 days (lanes 1-4 and 11-14). On the other hand, expression of oryzain β mRNA was strongly induced 4 h after the addition of GA3 (Fig. 6, I, lane 6) and then returned to a nearly basal level in 24 h (lane 9). As shown in Fig. 6, II, other plant hormones had no significant effect on expression.

DISCUSSION
In the present study, we isolated three types of cDNA clones encoding cysteine proteinases, one of which, oryzain α, corresponds to the protein which we had previously purified from germinating rice seeds (15). The amino acid sequences of the three oryzains share the general characteristics common to cysteine proteinases of animal and plant origin. In addition, they have signal and pro-sequences, suggesting that they are secretory enzymes and may exist as precursors. Our preliminary observation is that the mRNAs for oryzains exist in the aleurone or germ, not in the endosperm, although oryzain α exists in germinating endosperm (15). Thus, rice seeds possibly have a transport system for oryzains.
Physiological roles for multiple oryzains are suggested by the following two interesting observations. First, the three mRNAs for oryzains α, β, and γ all responded to the addition of GA3, although the induction of oryzain β mRNA was transient (Fig. 6, I). Oryzains α and β, two closely related oryzains similar to actinidin and papain, are regulated differently by GA3 (Fig. 6). Along with the observation that oryzain β is expressed in both ripening and germinating rice seeds, oryzain β may be involved in rapid, but transient protein degradation induced by GA3 and/or in housekeeping events. Oryzain α may be involved in relatively chronic events during the germination process. Based on its pattern of expression (Fig. 5), oryzain γ should be similar to oryzain α; however, since it lacks a telo-sequence (Fig. 4), differences in regulatory mechanisms of sorting and/or activation processes may reflect different physiological roles. Second, the action of oryzains α, β, and γ should also be considered with respect to their regulation by cysteine proteinase inhibitors (cystatins). Two cysteine proteinase inhibitors, oryzacystatin I (19) and II (20), exist in rice seeds before germination. These two endogenous cystatins have highly similar sequences (20), but show different specificities; oryzacystatin I inhibits papain more effectively than cathepsin H, whereas oryzacystatin II inhibits cathepsin H more effectively than papain (20). Considering this result in relation to the fact that oryzains α and β are highly similar to papain, it is possible that these two proteinases can be effectively inhibited by oryzacystatin I, whereas oryzain γ, which is similar to cathepsin H, may be inhibited by oryzacystatin II.
Indeed, we have observed that the hydrolytic activity of oryzain α toward a major rice storage protein (glutelin) is stoichiometrically inhibited by oryzacystatin I.
FIG. 6 (legend fragment). ... (lanes 3, 9, and 15), with cytokinin (lanes 4, 10, and 16), with ethylene (lanes 5, 11, and 17), or with auxin (lanes 6, 12, and 18) for 4 h (lanes 7-12) and 24 h (lanes 1-6 and 13-18). Probes used were the same as in Fig. 5.

Another important observation concerning oryzacystatins I and II is that their mRNAs are synthesized during the seed-ripening stage and large amounts of the cystatins accumulate in seeds before germination but degrade rapidly after germination (20). Thus, oryzacystatins do not remain at the time when oryzains begin to appear. The existence of oryzacystatins before germination may be important in preventing the degradation of seed storage proteins by oryzains or by exogenous proteinases possibly brought by destructive insects. Since oryzacystatins disappear after germination, oryzains may catalyze the proteolysis of the storage proteins.
The physiological functions of oryzains will be disclosed by histochemical studies to define their compartmentalization in the seed cell. A similar histochemical study with oryzacystatins will also be necessary to elucidate the mechanism of oryzain-oryzacystatin interaction in vivo.