Gastric Procathepsin E and Progastricsin from Guinea Pig PURIFICATION, MOLECULAR CLONING OF cDNAs, AND CHARACTERIZATION OF ENZYMATIC PROPERTIES, WITH SPECIAL REFERENCE TO PROCATHEPSIN


Procathepsin E and progastricsin were purified from the gastric mucosa of the guinea pig. They were converted to the active form autocatalytically under acidic conditions. Each active form hydrolyzed protein substrates maximally at around pH 2.5. Pepstatin inhibited cathepsin E very strongly at an equimolar concentration, whereas the inhibition was much weaker for gastricsin. Molecular cloning of the respective cDNAs permitted us to deduce the complete amino acid sequences of their pre-proforms; preprocathepsin E and preprogastricsin consisted of 391 and 394 residues, respectively. Procathepsin E has unique structural and enzymatic features among the aspartic proteinases. Lys at position 37, which is common to various aspartic proteinases and is thought to be important for stabilizing the activation segment, was absent at the corresponding position, as in human procathepsin E. The rate of activation of procathepsin E to cathepsin E is maximal at around pH 4.0. This behavior is very different from that of the pepsinogens and may be correlated with the absence of Lys37.
Native procathepsin E is a dimer, consisting of two monomers covalently bound by a disulfide bridge between 2 Cys37 residues. Interconversion between the dimer and the monomer was reversible and regulated by low concentrations of a reducing reagent. Although the properties of the dimeric and monomeric cathepsins E are quite similar, a marked difference was found between them in terms of their stability in weakly alkaline solution: monomeric cathepsin E was unstable at weakly alkaline pH whereas the dimeric form was stable. The generation of the monomer was thought to be the process leading to inactivation, and hence degradation, of cathepsin E in vivo.
The aspartic proteinase family, each member of which has 2 essential aspartyl residues at the active site, includes pepsins (pepsin A, gastricsin, and chymosin), cathepsin E, cathepsin D, and renin in mammals (reviewed in Refs. 1-3). All these enzymes are thought to have diverged from a common ancestor. Significant differences, however, have been observed in their characteristics such as hydrolytic specificity and susceptibility to inhibitors, and this is reflected in the significant variations in primary structure among members of these groups. Therefore, to understand structure-function relationships of aspartic proteinases in greater detail, it was thought to be useful to elucidate the primary structures and enzymatic properties of those aspartic proteinases that have unique characteristics. Cathepsin E represents an important example of such aspartic proteinases. Cathepsin E is known to be a nonsecretory, intracellular, but non-lysosomal proteinase. It has been isolated from various tissues, such as human (4-10) and rat (11) gastric mucosa, rabbit (12) and rat (13) spleen, human (14) and rat (15) erythrocyte membranes, and rat neutrophils (16). Although various designations were used previously for the enzyme, the name "cathepsin E" is used at present (16-18). Unlike other aspartic proteinases, cathepsin E is a dimeric enzyme. It has a molecular mass of about 80 kDa and consists of two identical 40-kDa subunits (9-13, 16). The other aspartic proteinases, on the other hand, are single polypeptides of about 40 kDa (1-3). The enzymatic properties of cathepsin E have been shown to resemble those of pepsins; for example, it has hydrolytic activity at acidic pH, with an optimum at pH 2-3 (7-17), and is sensitive to various pepsin inhibitors (7-9, 11, 13-16).
Although the physiological role of cathepsin E is still unclear, it has been suggested to play an important role in intracellular processing of proteins and/or peptides (19,20), or in immune functions because of its distribution in lymphoid-associated tissue (9,21).
Structural studies of cathepsin E have not yet progressed to a level comparable with those of pepsinogens mainly because of the difficulty in obtaining a sufficient amount of the native enzyme. Recently, the primary structure of human cathepsin E was deduced from the molecular cloning and analysis of its cDNA (22), and the presence of a pro-peptide was demonstrated by isolation and NH2-terminal sequence analysis of human gastric procathepsin E and cathepsin E (23,24), indicating that autocatalytic activation of procathepsin E is involved in the generation of active cathepsin E (24).
The structural analysis also suggested that the dimeric form is produced by covalent association of two monomers through a disulfide bridge(s) (25,26). Therefore, to understand structure-function relationships of cathepsin E, it is important to clarify the differences in properties between the proenzyme and the active form and between the dimeric and monomeric forms. Further, it also seems useful to compare the primary structure of procathepsin E and some enzymatic properties of cathepsin E with those of other aspartic proteinases, especially the pepsinogens. Therefore, in the present study, guinea pig procathepsin E and progastricsin (type C pepsinogen) were chosen, since rodents are known to contain both proenzymes at high levels in gastric mucosa (11,21) to permit simultaneous purification.
Thus, we have carried out a series of studies including purification, molecular cloning of its precursor, and elucidation of enzymatic properties of cathepsin E. The results show that the primary structure and the process of activation of procathepsin E are markedly different from those of progastricsin and other aspartic proteinases. A notable difference in enzymatic properties was also found between the dimeric and the monomeric forms of cathepsin E.

EXPERIMENTAL PROCEDURES AND RESULTS
Purification-The results of the purification are summarized in Table I. Procathepsin E and progastricsin were purified simultaneously from guinea pig gastric mucosa (Fig. 5). The level of procathepsin E in gastric mucosa was the highest among the animals examined so far. On the other hand, progastricsin was the predominant pepsinogen species in guinea pig gastric mucosa. Two progastricsin components were resolved by FPLC, and they had quite similar amino acid compositions. The major component, which was eluted earlier on FPLC, was used for further characterization. Progastricsin became unstable during chromatography on the anion exchanger, in part as a result of its autocatalytic activation.
Each purified proenzyme gave a single protein band upon nondenaturing (Fig. 6) and denaturing (Fig. 3) PAGE. The molecular mass determined by SDS-PAGE under reducing conditions was about 43 kDa for each proenzyme. By contrast, the native procathepsin E was eluted at the position corresponding to a molecular mass of about 80 kDa on gel filtration and gave a band of protein with a similar molecular mass on SDS-PAGE under non-reducing conditions (Fig. 3). Therefore, procathepsin E was deduced to be a dimer. Procathepsin E is a glycoprotein, and the content of carbohydrate was estimated to be about 4% by weight. The amino acid compositions of procathepsin E and progastricsin were rather similar except for notable differences in the content of a few amino acids, such as Asp, Ser, Pro, Met, and Tyr (Table II). The NH2-terminal sequences of about 30 residues of the proenzymes were determined by Edman degradation (Figs. 1 and 8). Only a single residue was identified at each step for both proenzymes. Thus, procathepsin E appears to be composed of identical subunits. Although some common residues were observed, the NH2-terminal sequences of procathepsin E and progastricsin are significantly different from each other.
Molecular Cloning of cDNAs and Structural Analysis-Among 2,000 recombinant clones of λgt10 prepared from the gastric mucosa of adult guinea pigs, about 50 clones hybridized very strongly with the radiolabeled 45-base oligonucleotide probe. Five clones were chosen at random, and the inserted DNA fragments were subcloned into pUC18 plasmid and definitively identified by sequence analysis. Since the NH2-terminal sequences of both procathepsin E and progastricsin had been determined at the protein level, identification of clones was rather easy. Thus, these five clones were shown to be those of the cDNA for progastricsin. The restriction map and the nucleotide sequence of a typical clone (pGP461) are shown in Figs. 7 and 8, respectively. Using the cDNA for progastricsin as a probe, we rescreened the 50 clones under high stringency conditions. Two clones that did not hybridize under these conditions with the progastricsin cDNA were isolated and found by sequence analysis to be those of procathepsin E. The restriction map and the nucleotide sequence of one of the clones of procathepsin E (pGP477) are shown in Figs. 7 and 1, respectively.
Portions of this paper (including "Experimental Procedures," Figs. 5-13, and Tables I-III) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.
The abbreviations used are: FPLC, fast protein liquid chromatography; SDS, sodium dodecyl sulfate; PAGE, polyacrylamide gel electrophoresis; kb, kilobase(s).
The deduced amino acid sequences of the two proenzymes consist of three regions, i.e. the pre-peptide (signal peptide), the pro-peptide (activation segment), and the active enzyme. The signal peptides are composed of 19 and 16 residues, the pro-peptides are composed of 32 and 49 residues, and the active enzymes are composed of 340 and 329 residues for procathepsin E and progastricsin, respectively. The amino acid residues that are conserved in other mammalian aspartic proteinases were also well conserved in both proenzymes (Fig. 2). However, procathepsin E has some unique structural features. The lysine at position 37 (numbering for pepsinogen A from monkey), which has been suggested to be important for the function of the activation segment (27, 28), is absent. Some deletions and insertions were noted in the pro-peptide and the NH2-terminal region of the cathepsin E moiety as compared with the sequences of other mammalian aspartic proteinases (Fig. 2). Asn-67 and Asn-311 were found to be the potential N-glycosylation sites (Fig. 1). The molecular masses of procathepsin E and progastricsin were calculated to be 40,086 and 41,150 Da, respectively, based on the amino acid compositions deduced from the cDNAs.
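The segment lengths quoted above can be cross-checked against the total pre-proform lengths given in the summary (391 and 394 residues). The following is a minimal sketch of that arithmetic; the dictionary layout and function names are illustrative assumptions, and the mean residue mass is only a rough consistency check on the calculated molecular masses, not a value reported in the paper.

```python
# Segment lengths (in residues) as stated in the text.
SEGMENTS = {
    "procathepsin E": {"signal": 19, "pro": 32, "active": 340},
    "progastricsin": {"signal": 16, "pro": 49, "active": 329},
}

# Pre-proform lengths and calculated proenzyme masses (Da) as stated in the text.
REPORTED_PREPRO_LENGTH = {"procathepsin E": 391, "progastricsin": 394}
REPORTED_PROENZYME_MASS_DA = {"procathepsin E": 40086, "progastricsin": 41150}


def prepro_length(parts):
    """Length of the pre-proform: signal peptide + pro-peptide + active enzyme."""
    return parts["signal"] + parts["pro"] + parts["active"]


def proenzyme_length(parts):
    """Length of the proenzyme, i.e. the pre-proform without the signal peptide."""
    return parts["pro"] + parts["active"]


def mean_residue_mass(name):
    """Calculated proenzyme mass divided by proenzyme length (Da per residue)."""
    return REPORTED_PROENZYME_MASS_DA[name] / proenzyme_length(SEGMENTS[name])


for name, parts in SEGMENTS.items():
    consistent = prepro_length(parts) == REPORTED_PREPRO_LENGTH[name]
    print(name, prepro_length(parts), proenzyme_length(parts),
          round(mean_residue_mass(name), 1), "consistent" if consistent else "mismatch")
```

This check assumes that the calculated masses refer to the proenzymes without the signal peptide and without carbohydrate, which is how the text describes them.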
Interconversion of the Dimeric and Monomeric Forms of Procathepsin E-The conversion of the dimeric procathepsin E to the monomeric form occurred in the presence of a low concentration of a reducing reagent. A typical result is shown in Fig. 9. The dimer was converted to the monomer to the extent of 30-50% by incubation of the former with 1 mM 2-mercaptoethanol, L-cysteine, or reduced glutathione at 37 °C for 20 min. The conversion was complete with any one of these reagents at 10 mM under the same conditions (Fig. 9A). The conversion was reversible, since the dimeric form was regenerated after removing the reducing reagent (Fig. 9B).
Proteolytic activity was not affected by interconversion. When the monomer was carboxymethylated, carboxymethyl-Cys was determined to be 0.88 mol/mol of monomeric procathepsin E. This partially modified monomeric procathepsin E retained complete proteolytic activity. Carboxymethyl-Cys was identified at position 4 from the NH2 terminus of cathepsin E (position 37 of procathepsin E) by Edman degradation. The result thus provided direct evidence that the dimeric form is generated by formation of a disulfide bridge at Cys37 between the two monomers.
Activation Profile-The profile of activation of the proenzyme was analyzed by SDS-PAGE (Fig. 3). Activation of procathepsin E proceeded autocatalytically under acidic conditions; the rate of activation was maximal at pH 4.0 and decreased gradually as the pH was lowered to 2.0 (Fig. 3A). In addition, appreciable activation occurred both at pH 5.0 and 6.0 upon prolonged incubation. The rate of activation did not change when monomeric procathepsin E was activated under the same conditions. Procathepsin E appeared to be directly converted to cathepsin E, since the intermediate form(s) was generated at a very low level (Fig. 3, B and C). The bands of procathepsin E and cathepsin E were detected at positions of 82 and 76 kDa, respectively, after SDS-PAGE under non-reducing conditions, whereas the pro- and active forms gave a band of 43 kDa and a band of 39 kDa, respectively, after SDS-PAGE under reducing conditions. Therefore, the dimeric form was maintained throughout the activation. Isolation and structural analysis of the active form revealed that the site of cleavage upon activation was the Leu-Asn bond. Thus, the NH2 terminus of cathepsin E is located 4 residues before Cys37. The cleavage site was the same when monomeric procathepsin E was activated under the same conditions. In addition, the profile of activation of guinea pig progastricsin was also examined (data not shown). The process was largely similar to that observed for other progastricsins (29), and the major cleavage site to generate gastricsin was the Phe49-Ser50 bond.
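The inference that the dimer persists through activation rests on comparing band positions under non-reducing and reducing conditions. The sketch below, with hypothetical function names, restates that reasoning numerically using only the apparent masses quoted above; rounding the ratio to the nearest integer is an assumption made for illustration, since apparent masses on SDS-PAGE are approximate.

```python
# Apparent masses (kDa) on SDS-PAGE as quoted in the text:
# (non-reducing conditions, reducing conditions).
BANDS_KDA = {
    "procathepsin E": (82.0, 43.0),
    "cathepsin E": (76.0, 39.0),
}


def subunits_per_molecule(nonreducing_kda, reducing_kda):
    """Estimate the number of disulfide-linked subunits from the mass ratio."""
    return round(nonreducing_kda / reducing_kda)


def mass_lost_on_activation(pro_kda, active_kda):
    """Apparent mass removed per subunit when the pro-peptide is cleaved off."""
    return pro_kda - active_kda


for form, (nonreducing, reducing) in BANDS_KDA.items():
    ratio = nonreducing / reducing
    print(f"{form}: ratio {ratio:.2f}, about {subunits_per_molecule(nonreducing, reducing)} subunits")

print("apparent loss per subunit (kDa):",
      mass_lost_on_activation(BANDS_KDA["procathepsin E"][1], BANDS_KDA["cathepsin E"][1]))
```

Both the proenzyme and the active enzyme give a ratio close to 2, which is the basis for concluding that the dimeric form is retained.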
Enzymatic Properties of the Active Forms-Cathepsin E and gastricsin are optimally active at around pH 2.5 toward hemoglobin as a substrate (Fig. 10). Cathepsin E has higher specific activity than gastricsin and porcine pepsin A. Both enzymes are inhibited by pepstatin, a specific inhibitor of aspartic proteinases (Fig. 11). Susceptibility of cathepsin E to pepstatin was the same as that of porcine pepsin A, the inhibition profile indicating the strong equimolar binding of pepstatin to the active site. The susceptibility of gastricsin was about 100 times lower than that of cathepsin E and porcine pepsin A. Low susceptibility has commonly been observed with gastricsins of other animals (30, 31).
Cathepsin E is easily converted to monomers in the presence of a low concentration of a reducing reagent, as was procathepsin E as described in the preceding section. Therefore, the difference in enzymatic properties between the dimeric and the monomeric forms of cathepsin E was investigated. Although the hydrolytic activity against hemoglobin at pH 2.0 was the same for both forms, a slight increase in activity was observed with the monomer at pH 5.0 as compared to the dimer. Such an increase was not observed, however, when the enzyme was assayed with other protein substrates (Table III). By contrast, a striking difference between the dimeric and monomeric forms was found in terms of stability at weakly alkaline pH (Fig. 4). While the dimer was stable at weakly alkaline pH, the monomer lost its activity very rapidly above pH 7. On the other hand, gastricsin was very unstable at alkaline pH as reported for gastricsins of other animal sources (data not shown).
Expression of the Genes in Various Tissues-Expression of the genes for procathepsin E and progastricsin was examined in various tissues from adult guinea pigs by Northern analysis (Fig. 12). The mRNAs for both enzymes were expressed at a high level in gastric mucosa only. In addition, procathepsin E mRNA was found at a low level in spleen. The predominant species of mRNAs of procathepsin E and progastricsin had the same size of around 1.9 kb. The size is very similar to those of pepsinogen mRNAs of other mammals (32-34), but is different from that of human procathepsin E mRNA, which has been shown to range from 2.2 to 3.6 kb (22).

[Figure: amino acid sequence alignment of aspartic proteinase zymogens, including human procathepsin E (labeled "Human E").]

DISCUSSION
Procathepsin E and progastricsin were purified from the gastric mucosa of guinea pigs. The level of procathepsin E was 4-10 times higher than that in human gastric mucosa (8, 10) and was the highest among those reported to date for various animal tissues. The reason for this high level is not clear, but it seems that the gastric mucosa of the guinea pig may serve as a good source of procathepsin E for future studies at the protein level. Progastricsin was found to be the major pepsinogen component rather than pepsinogen A. This result is consistent with the results obtained with rat stomach (32, 35).
The structures and some enzymatic properties of procathepsin E and progastricsin were determined and compared between the two proenzymes and also with those of other aspartic proteinases. Lys37, which is present in other mammalian aspartic proteinase zymogens, was found neither in guinea pig nor in human procathepsin E. The positive charge of the lysine residue has been shown to provide electrostatic stabilization via hydrogen bonding to one of the net negative charges of the two aspartic acids at the active site (27, 28). Therefore, the lysine residue has been suggested to be essential for maintaining the proenzyme in an inactive form, thereby playing an important role in the activation of these aspartic proteinases. The activation of procathepsin E proceeded most rapidly at pH 4.0, and appreciable activation occurred at even higher pH. This phenomenon was markedly different from the pepsinogens, which are activated most rapidly at pH 2.0 and below (36). The maximal activation at weakly acidic pH may be correlated with the absence of Lys37 in procathepsin E, since electrostatic stabilization is thought to be weak in procathepsin E. Since procathepsin E is a non-secretory intracellular proteinase and since its activation would occur at physiological pH, the maximum rate of activation at weakly acidic pH seems to be well adapted to the physiology of the proenzyme. Lys37 was conserved in guinea pig progastricsin. When the sequences of the connecting region of the pro-peptide and the cathepsin E moiety of guinea pig and human procathepsin E were compared with those of other aspartic proteinases, deletions of several residues around Cys37 appear to be significant (Fig. 2). The cleavage sites associated with activation in other aspartic proteinases, in particular in pepsinogens A and progastricsins, are located in this area (29) and indicate a high degree of conformational lability (28).
Therefore, if the deleted positions of procathepsin E were actually occupied by amino acids, cleavage might occur at these sites after Cys37, resulting in the generation of monomeric cathepsin E after activation. The deletions may thus be essential for the cleavage before Cys37 and, consequently, for maintaining the dimeric form via a disulfide bond during activation. In progastricsin, the activation segment is composed of 49 residues, the longest among known sequences of mammalian aspartic proteinases. The role of this extended segment, however, remains to be clarified, since the cleavage site for activation is the same as that of rat progastricsin, which has a shorter segment of 46 residues.
With respect to the structure of the cathepsin E moiety, the common residues among other aspartic proteinases, including those around the 2 aspartic acid residues of the active site, are well conserved (Fig. 2). One notable point is the lower level of basic residues as compared with cathepsin D, the other intracellular aspartic proteinase (Table II). The level is comparable to that in pepsins and gastricsins, and this similarity may be correlated with the optimal activity at lower pH.
The structure of guinea pig procathepsin E is most similar to that of human procathepsin E, with 86 and 84% identity at the nucleotide and the amino acid levels, respectively (Fig. 2). The identity with other aspartic proteinases is less than 60%. The evolutionary relationships among various gastric aspartic proteinase zymogens, including pepsinogens A, prochymosins, and progastricsins, have been deduced (33, 37, 38). However, the relationship between procathepsin E and these gastric aspartic proteinases and other non-pepsin-type aspartic proteinases, such as cathepsin D and renin, has not been elucidated. Therefore, we constructed a phylogenetic tree to examine the relationships among various aspartic proteinases including procathepsin E (Fig. 13). The tree shows clearly that procathepsin E is closer to pepsinogens than are procathepsin D and prorenin.
The generation of a dimer is characteristic of (pro)cathepsin E. The present results provide direct evidence that a disulfide bridge involving Cys37 between the two monomers is responsible for generating the dimer. The interconversion between the dimer and the monomer is reversible: the dimer is easily converted to the monomer in the presence of a low concentration of a reducing agent (Refs. 25 and 26, Fig. 9A), and the monomer regenerates the dimer in the absence of a reducing agent (Fig. 9B). Such high susceptibility to a reducing agent is thought to be due to the tertiary structure of procathepsin E, in which the region around Cys37 is presumed to be on the surface of the protein, as expected from the tertiary structures of other aspartic proteinases (27, 28). Therefore, it may be reasonable to consider that a reducing agent, such as glutathione, may regulate the interconversion between the two forms in vivo. Indeed, the occurrence of the monomeric form of procathepsin E has been detected in human gastric mucosa. The interconversion between the two forms may have little significance in the case of procathepsin E, since no difference in properties was found between the two forms. On the other hand, the conversion of the dimer to the monomer seems critical in the case of cathepsin E. Monomeric cathepsin E is more unstable than dimeric cathepsin E at weakly alkaline pH (Fig. 4). This characteristic is very similar to that of pepsin, although pepsin is inactivated more rapidly at weakly alkaline pH (39). Considering that a drastic conformational change is involved in the process of alkali denaturation of pepsin (40), monomeric cathepsin E may be more susceptible to a conformational change at weakly alkaline pH than is dimeric cathepsin E. Therefore, the dimeric form is thought to be essential for stabilizing cathepsin E.
Thus, the generation of the monomeric form may be important in the degradation of cathepsin E in vivo. On the other hand, cathepsin E, as well as procathepsin E, is rather unstable at weakly acidic pH in both dimeric and monomeric forms, presumably due to autodigestion, since the enzyme has appreciable proteolytic activity under weakly acidic conditions (41) (Table III).
The tissue distribution of procathepsin E is rather limited. Northern analysis showed that the level of expression of the gene for procathepsin E is high in gastric mucosa while the mRNA for procathepsin E is just detectable in spleen. This distribution suggests that the enzyme has a role that is correlated with gastric physiology. The proenzyme has been shown to be localized in surface epithelial cells of human (9) and rat (42) stomach. It was suggested that cathepsin E could play a role in gastric mucosal injury (9). Furthermore, preferential expression of procathepsin E in fetal gastric mucosa
, R. A., Richards, A. D., Kay, J., Dunn, B. M., Wyckoff, J. B., Samloff,
19. Lees, W. E., Kalinka, S., Meech, J., Capper, S. J., Cook, N. D., and Kay, J.
20. Sakamoto, W., Yoshikawa, K., Yokoyama, A.,
To examine substrate specificity, the enzyme was assayed with various protein substrates including hemoglobin, by the same procedure as described above except that the concentration of substrate was 1% and that a fluorometric assay (53) was used to quantitate trichloroacetic acid-soluble peptides.

All procedures except for FPLC were performed at 0-4 °C.
Chromatography and gel filtration were carried out in 0.01 M sodium phosphate buffer, pH 7.0.
Step 1. Preparation of the Mucosal Extract-Gastric mucosa (total weight ... guinea pigs and homogenized in a Waring blender with 40 ml of the buffer.
The assay mixture of monomeric cathepsin E contained 0.5 mM 2-mercaptoethanol. Each reaction was stopped by the addition of 400 µl of 5% trichloroacetic acid. After centrifugation, an aliquot of the supernatant was subjected to a fluorometric assay with fluorescamine (53) to determine the amount of trichloroacetic acid-soluble peptides. Activity is expressed relative to the activity of the native dimeric form of cathepsin E against bovine hemoglobin, which was taken as 100% and corresponded to the release of 0.31 µmol leucine min⁻¹ ... heart cytochrome c; OVA, egg albumin; γG, human gamma-globulin.

Northern Blot Analysis-Five µg of the total RNA from various guinea pig tissues were denatured and subjected to electrophoresis in a 1% agarose gel that contained 1.1% formamide. After the RNA had been transferred to nitrocellulose paper, the paper was hybridized with the 32P-labelled cDNAs for procathepsin E and progastricsin under high-stringency conditions. The sizes of RNAs were estimated by reference to the mobilities of fragments of λDNA generated by digestion with HindIII.
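Estimating a band size from marker mobilities is commonly done by fitting the logarithm of marker size against migration distance and reading the unknown band off the fitted line. The following is a minimal sketch of that calculation, not the authors' procedure: the migration distances and the band position are hypothetical values chosen for illustration, the function names are assumptions, and the log-linear relationship is itself only an approximation over a limited size range. Only the HindIII fragment sizes of λDNA are standard values.

```python
import math

# Six largest fragments of lambda DNA digested with HindIII, in base pairs.
LAMBDA_HINDIII_BP = [23130, 9416, 6557, 4361, 2322, 2027]

# HYPOTHETICAL migration distances (mm) of those markers; not from the paper.
MARKER_DISTANCE_MM = [12.0, 21.0, 25.0, 29.5, 36.0, 37.5]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance, slope, intercept):
    """Convert a migration distance to a size using the fitted log10 line."""
    return 10 ** (intercept + slope * distance)


# Fit log10(size) against distance for the markers.
slope, intercept = fit_line(
    MARKER_DISTANCE_MM, [math.log10(bp) for bp in LAMBDA_HINDIII_BP]
)

band_distance_mm = 38.0  # HYPOTHETICAL position of an mRNA band
size_kb = estimate_size(band_distance_mm, slope, intercept) / 1000
print(f"estimated size: {size_kb:.1f} kb")
```

Because the markers here are double-stranded DNA and the samples are RNA, a size obtained this way is best read as an approximate value, in keeping with the "around 1.9 kb" wording of the results.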