Cloning and characterization of a mouse cysteine proteinase.

cDNA clones encoding a mouse cysteine proteinase were isolated from a cDNA library constructed from mRNA derived from the macrophage-like cell line J774. The DNA sequence predicts a protein that is closely related to, but distinct from, the lysosomal enzyme cathepsin H. Alignment of the predicted amino acid sequence with the known protein sequences for seven other cysteine proteinases suggests that the cloned DNA encodes a 334-residue protein containing both a 17-amino acid pre-region and a 96-amino acid pro-region. Consistent with this prediction, antiserum raised to a recombinant fusion protein expressed in Escherichia coli immunoprecipitated multiple forms of the cysteine proteinase in mouse peritoneal macrophages and fibroblasts. In pulse-chase experiments, a 36-kDa precursor, presumably the pro-form, was converted intracellularly into a 28-kDa protein and subsequently into a 21-kDa protein. Indirect immunofluorescence microscopy suggested that the cysteine proteinase was localized to lysosomes. Western blot analysis detected significantly more of the proteinase in thioglycolate-elicited peritoneal macrophages than in resident peritoneal macrophages. Northern blot analysis revealed that several cell lines failed to express mouse cysteine proteinase mRNA.

the lysosomal cysteine proteinases is unclear, but they are generally thought to play a role in intracellular protein degradation. Consistent with their lysosomal location, the mammalian cysteine proteinases are optimally active at acidic pH (1). However, tumor cells have been shown to secrete a cysteine proteinase which retains activity at neutral pH and may play a role in tumor metastasis (5, 6).
The best characterized lysosomal proteinases are the cysteine proteinase cathepsin B and the aspartic proteinase cathepsin D. Whereas the biosynthetic intermediates for cathepsin D have been defined (7-11), less is known about the biosynthesis of cathepsin B or other lysosomal cysteine proteinases. Both cathepsin B and cathepsin D appear to be synthesized as pro-enzymes (9, 12). Although the mature forms of these proteins have been characterized enzymatically and fully sequenced (13-15), the pro-forms, being transient, have not been well characterized. Recently, cDNA clones encoding these enzymes have been isolated and sequenced (16, 17), making possible more detailed analysis of the transient intermediates in the biosynthesis of these lysosomal proteinases. In this report, we describe the cDNA cloning and initial characterization of a cysteine proteinase which is expressed at high levels by mouse inflammatory macrophages. DNA sequence analysis suggests the presence of both a pre- and a pro-form of the cysteine proteinase. We use an antiserum raised to the protein expressed in Escherichia coli to localize the proteinase to macrophage lysosomes and to characterize steps in its biosynthesis.

RESULTS
Cloning and DNA Sequence of the Mouse Cysteine Proteinase cDNA-cDNA clones were identified in a macrophage cell line cDNA library using a subtractive cDNA probe as described under "Experimental Procedures." Two overlapping clones of 1050 and 1150 base pairs (pMCP-10 and pMCP-39, respectively) were mapped by restriction enzyme analysis, subcloned into derivatives of M13, and subjected to DNA sequence analysis (Fig. 1). A total of 1294 base pairs, including a poly(A) stretch of 18 nucleotides, were sequenced (Fig. 2). The longest open reading frame begins with a methionine at nucleotide 60 and ends with an asparagine at nucleotide 1061.
Footnote (truncated in source): Portions of this paper (including "Experimental Procedures" and ...
Fig. 1 (legend, partial). The map is oriented 5' to 3' (left to right). All sequences were obtained 5' to 3'; arrows represent the strand and extent of DNA sequenced. The model for the predicted protein was based on a comparison to the amino acid sequence of mature forms of other cysteine proteinases (Fig. 3). S refers to the predicted signal peptide. PRO refers to a transient protein sequence which shows no relatedness to the mature forms of other cysteine proteinases. BP, base pairs.
Fig. 2 (nucleotide sequence with predicted amino acid translation; listing not recoverable).
Comparison to Other Cysteine Proteinases-The nucleotide sequence (Fig. 2) predicts a 36-kDa protein of 334 amino acids. Comparisons with other proteins present in the National Biomedical Research Foundation data base using the computer program FASTP (43) revealed that the mouse protein shared significant amino acid relatedness with five members of the cysteine proteinase family over a stretch of 221 amino acids. Two amino acids (cysteine and histidine) known to be in the active sites of cathepsin H, cathepsin B, and papain (44) were both conserved in the amino acid sequence of the mouse protein, suggesting that we had cloned a member of the cysteine proteinase family.
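The reading-frame coordinates and the predicted protein length quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and the figure of about 110 Da per residue is a general rule of thumb assumed here for illustration, not a value taken from the sequence.

```python
# Coordinates quoted in the text (1-based, inclusive).
ORF_START = 60    # first nucleotide of the initiating methionine codon
ORF_END = 1061    # last nucleotide of the final asparagine codon

# Assumption: ~110 Da per residue is a rule-of-thumb average,
# not a number reported in the paper.
AVG_RESIDUE_DA = 110.0


def orf_codons(start, end):
    """Return the number of codons in an inclusive 1-based nucleotide range."""
    length = end - start + 1
    if length <= 0 or length % 3 != 0:
        raise ValueError("range is empty or not a whole number of codons")
    return length // 3


def approx_mass_kda(n_residues, avg_residue_da=AVG_RESIDUE_DA):
    """Rough polypeptide mass in kDa, ignoring glycosylation and composition."""
    return n_residues * avg_residue_da / 1000.0


if __name__ == "__main__":
    n = orf_codons(ORF_START, ORF_END)
    print("residues:", n)
    print("approximate mass (kDa):", round(approx_mass_kda(n), 1))
```

Under these assumptions the range 60-1061 corresponds to 334 codons, in agreement with the stated protein length, and the rough mass estimate falls close to the 36 kDa quoted in the text.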
To confirm that the encoded protein was a cysteine proteinase, we aligned it with seven cysteine proteinases for which a complete protein sequence of the mature form was available (Fig. 3). Amino acid sequence data from purified proteins were obtained for rat liver cathepsin H (13), papaya plant papain (3), Chinese gooseberry actinidin (45), and human liver cathepsin B (15). Protein sequence was deduced from a gene sequence for barley aleurain (46), for Dictyostelium cysteine proteinase 1 (4), and for rat liver cathepsin B (16). The eight proteins were aligned with gaps introduced to maximize codon similarity. Of the 221 amino acids aligned, 31 (14%) were found to be conserved in all family members including the mouse protein. There were 111 (50%) residues conserved in four or more of the eight aligned sequences. As has been observed previously, the amino- and carboxyl-terminal regions of the proteins were most highly conserved (13). Based on the alignment presented in Fig. 3, the mouse protein is 48% identical to rat cathepsin H, 46% identical to papain, 48% identical to Dictyostelium proteinase 1, but only 32% identical to rat cathepsin B. For comparison, the plant cysteine proteinases actinidin and papain are 50% identical, and rat cathepsin B is 35% identical to rat cathepsin H and 84% identical to human cathepsin B. Therefore, the close sequence relatedness of the mouse protein to the other cysteine proteinases strongly suggests that it is indeed a mouse cysteine proteinase which we hereafter refer to as MCP.
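The identity and conservation figures above are the kind of quantity that can be computed directly from a gapped alignment. The sketch below is illustrative only: the function names and the toy sequences are hypothetical, and the choice to ignore gapped positions when computing pairwise identity is an assumption, since the text does not state which denominator was used for Fig. 3.

```python
from collections import Counter

GAP = "-"


def percent_identity(a, b, gap=GAP):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def conserved_columns(alignment, min_count, gap=GAP):
    """Count columns whose most common residue occurs in at least min_count rows."""
    total = 0
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != gap)
        if counts and counts.most_common(1)[0][1] >= min_count:
            total += 1
    return total


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not taken from Fig. 3).
    toy = ["ACD", "ACE", "AFE"]
    print(percent_identity(toy[0], toy[1]))
    print(conserved_columns(toy, min_count=3))
    # Percentages quoted in the text, recomputed from the stated counts.
    print(round(100 * 31 / 221), round(100 * 111 / 221))
```

Applied to an eight-row alignment, `conserved_columns(rows, 8)` would correspond to residues conserved in all family members and `conserved_columns(rows, 4)` to residues shared by four or more sequences.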
The cDNA sequence of MCP predicts a protein of 334 amino acids. The mature forms of the cysteine proteinases aligned in Fig. 3 begin to show relatedness to MCP after 114 amino acid residues, suggesting that post-translational proteolytic processing generates a 221-residue mature form of MCP. Post-translational glycolytic processing may also occur as there are two potential sites for N-glycosylation, at asparagines 108 and 155 (Fig. 3).
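Potential N-glycosylation sites of the kind mentioned above are conventionally recognized as the sequon Asn-X-Ser/Thr, where X is any residue except proline. A minimal sketch of such a scan follows; the function name and the test sequence are hypothetical, and the MCP sequence itself is not reproduced here.

```python
import re

# Asn-X-Ser/Thr with X != Pro; the lookahead allows overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return 1-based positions of the Asn in each N-X-S/T sequon (X != P)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical peptide: NGT and NAS qualify, NPS does not.
    print(find_sequons("MNGTANPSNAS"))
```

A match only marks a potential site; whether a given asparagine is actually glycosylated has to be established experimentally.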
Immunological Detection of MCP-In order to examine the protein expressed by mouse cells, an antiserum was raised to recombinant protein expressed in E. coli. A plasmid was constructed that encoded a fusion protein comprising 240 amino acids of the coding sequence of MCP (including 27 amino acids of the pro-region) and 81 amino acids of the influenza protein NS1 (40). The resulting fusion protein is 36 kDa, of which 9 kDa are the truncated NS1 protein. The fusion protein (NS1-MCP) represents approximately 10% of total E. coli protein after temperature induction (Fig. 4). NS1-MCP protein was purified by elution from SDS-polyacrylamide gels and was used to elicit a rabbit antiserum that was used to detect the mouse cysteine proteinase in mouse cells.
Western blot analysis (Fig. 5) revealed multiple forms of MCP and different levels of expression by different cells. Resident mouse peritoneal macrophages showed almost undetectable amounts of antigenic material (lanes 1 and 2). However, thioglycolate-elicited mouse peritoneal macrophages synthesized a considerable amount of protein recognized by the antiserum. These cells showed three cellular forms of MCP of approximately 36, 28, and 21 kDa (lane 3) and a secreted form that was slightly larger than the 36-kDa intracellular form (lane 4). Periodate-elicited peritoneal macrophages also showed enhanced expression of MCP compared to resident macrophages (data not shown), suggesting that inflammatory macrophages, in general, express significantly more MCP than do resident macrophages. The macrophage-like cell line J774, from which MCP was cloned, predominantly expressed a secreted 36-kDa polypeptide (lane 6), while only showing low levels of intracellular MCP (lane 5). This is consistent with a previously reported observation that J774 secretes up to 70% of its newly synthesized lysosomal enzymes (47). An embryonic fibroblast cell line, CL.7, possessed the same three forms as the macrophages (lane 7) at concentrations similar to the thioglycolate-elicited macrophages.
Post-translational Processing of MCP-To assess the biosynthetic relationship of the three forms of MCP detected on Western blots, a pulse-chase experiment was performed using thioglycolate-elicited mouse peritoneal macrophages (Fig. 6). A 36-kDa polypeptide was initially immunoprecipitated and either secreted (data not shown) or processed intracellularly by 2 h to a 28-kDa polypeptide, which was converted after 24 h to a 21-kDa protein. These data indicated that the three proteins detected on Western blots were biosynthetically related.
Subcellular Localization of MCP-The relatedness of MCP with lysosomal cysteine proteinases and its intracellular processing suggested that MCP might be localized to lysosomes. Indirect immunofluorescence was performed to test this. Fluorescence microscopy using affinity-purified antiserum on methanol-fixed resident mouse peritoneal macrophages revealed intracellular punctate fluorescence in some of the macrophages (Fig. 7A). An occasional non-macrophage in the adherent population showed numerous fluorescent intracellular granules (Fig. 7C). A control affinity-purified anti-ovalbumin serum showed no staining (data not shown). These data are consistent with MCP being a lysosomal enzyme, although further experiments are necessary to verify its subcellular location.
Detection of MCP mRNA in Mouse Cells-Northern blot analysis (Fig. 8) revealed that MCP mRNA was present in normal liver, embryonic fibroblasts, embryonic liver cells, and, to a lesser extent, in normal T cells isolated from the spleen and lymph node. Three of four mouse macrophage cell lines

DISCUSSION
Two overlapping cDNA clones encoding a cysteine proteinase were isolated from a mouse macrophage library and sequenced. The predicted protein has been identified as a mouse cysteine proteinase (MCP) based on its significant amino acid relatedness to all known members of the cysteine proteinase superfamily (Fig. 3). Each of these enzymes has cysteine and histidine in its active site. In addition, as depicted in Fig. 3, MCP shares 29 other residues with all of the members of the superfamily. Furthermore, of the 221 amino acids which are aligned, 111 are consensus residues present in four or more of the eight sequences. Thus, the cloned cDNA clearly encodes a protein belonging to the cysteine proteinase superfamily.
MCP is most closely related (48%) to rat liver cathepsin H. However, MCP is probably not mouse cathepsin H since the cross-species amino acid relatedness of lysosomal enzymes is much higher. For example, rat cathepsin B is 84% identical to human cathepsin B (Fig. 3), whereas human cathepsin D shows 87% identity with porcine cathepsin D (17). The first 39 amino acids of human liver cathepsin L show 85% amino acid identity with the predicted amino acid sequence of MCP. For comparison, in this same region of the sequence, MCP and rat cathepsin H share 62% identity, whereas rat cathepsin H and human cathepsin L show 64% identity. Thus, MCP is a cysteine proteinase which has not been sequenced previously and is likely to be the mouse analog of cathepsin L, a lysosomal cysteine proteinase originally isolated from rat liver (48). Definitive identification of MCP as cathepsin L will require purification of the MCP protein and characterization of its enzymatic activity on various substrates relative to other characterized cysteine proteinases. By immunofluorescence, MCP was localized in discrete granules in the macrophage cytoplasm, consistent with MCP being a lysosomal enzyme. However, not all macrophages in the adherent cell population stained positive. Thus, the low levels of MCP seen on Western blots of resident peritoneal macrophages compared to elicited macrophages may represent cellular heterogeneity of MCP in macrophages. Non-macrophage cells, probably fibroblasts, showed a pattern of fluorescence similar to that reported for cathepsin L in rabbit fibroblasts (49).
Lysosomal enzymes undergo both proteolytic and glycolytic post-translational processing, giving rise to multiple biosynthetic forms detectable by pulse-chase experiments (50). MCP also exhibits multiple intracellular biosynthetic forms. Three polypeptides of 36, 28, and 21 kDa were observed in Western blots of different cell types probed with an antiserum raised in rabbits to an MCP-NS1 fusion protein expressed in E. coli. Pulse ... revealed that the 36-kDa form was also secreted, which is consistent with previous findings that early biosynthetic forms of lysosomal proteinases are secreted (8-10).
Fig. 8 (legend, partial). ... (41). Lanes 1-7, 2 µg of poly(A)+ mRNA; lane 8, 1 µg of total RNA; lane 9, 2 µg of poly(A)+ mRNA; lanes 10-12, 10 µg of total RNA. THIO-MΦ refers to thioglycolate-elicited mouse peritoneal macrophages. Nick-translated probe DNA (10' cpm/µg) was derived from the PstI fragment isolated from plasmid pMCP-39.
The observation that the amino acid sequences of the mature forms of cysteine proteinases can be aligned with MCP beginning at residue 114 suggests that MCP undergoes post-translational amino-terminal proteolytic processing. Analogous to the aspartyl lysosomal proteinase cathepsin D (9), MCP may be synthesized as a preproprotein. Cathepsin D has been shown by sequence analysis to be synthesized with a 20-residue transient amino-terminal signal sequence mediating translocation across the endoplasmic reticulum membrane (7, 9). We predict that the first 17 amino acids of MCP constitute a signal peptide cleaved co-translationally after Ala-(-97). This region consists of a stretch of 10 hydrophobic amino acids preceded by a charged residue (asparagine), which is consistent with other signal sequences (51). The sequence Ala-X-Ala, here Ala-Leu-Ala, is the most frequent sequence preceding the signal peptidase cleavage site (51). We have recently shown by amino-terminal sequence analysis of the 36-kDa protein that cleavage does occur after Ala-(-97). MCP has a 96-amino acid pro-region between the signal sequence and the conserved amino terminus of mature cysteine proteinases, similar to two other cysteine proteinases for which a complete gene sequence is available, aleurain (46) and Dictyostelium cysteine proteinase 1 (4). The pro-groups for the cysteine proteinases show no relatedness with each other or with the shorter pro-group (44 amino acids) of cathepsin D. Proteolytic removal of the 96-amino acid pro-region of MCP would result in a 28-kDa protein, as was observed. Further proteolysis probably converts the 28-kDa single-chain form to the observed 21-kDa heavy chain and a 7-kDa light chain as seen for cathepsins B and H (13). The expected 7-kDa polypeptide, which would contain only a single methionine, is detected when cells are continuously labeled (data not shown).
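The residue numbering used in this paragraph can be reconciled with the domain lengths given earlier. The sketch below assumes, as the numbers in the text imply, that mature residues are numbered from +1, that pre- and pro-region residues carry negative numbers, and that there is no residue 0. The function names are hypothetical.

```python
SIGNAL_LEN = 17     # predicted pre-region (signal peptide)
PRO_LEN = 96        # predicted pro-region
TOTAL_LEN = 334     # full predicted preproprotein
MATURE_START = SIGNAL_LEN + PRO_LEN + 1   # position 114 in the preproprotein


def mature_numbering(position):
    """Convert a 1-based preproprotein position to mature-relative numbering.

    Assumed convention: mature residues are +1, +2, ...; residues upstream
    of the mature amino terminus are -1, -2, ...; there is no residue 0.
    """
    if not 1 <= position <= TOTAL_LEN:
        raise ValueError("position outside the predicted protein")
    if position >= MATURE_START:
        return position - MATURE_START + 1
    return position - MATURE_START


def region(position):
    """Return 'pre', 'pro', or 'mature' for a 1-based preproprotein position."""
    if not 1 <= position <= TOTAL_LEN:
        raise ValueError("position outside the predicted protein")
    if position <= SIGNAL_LEN:
        return "pre"
    if position < MATURE_START:
        return "pro"
    return "mature"


if __name__ == "__main__":
    for pos in (1, 17, 18, 113, 114, 334):
        print(pos, region(pos), mature_numbering(pos))
```

With these lengths, the last residue of the 17-residue signal peptide maps to -97, matching the cleavage site Ala-(-97), and the mature form spans 334 - 17 - 96 = 221 residues.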
MCP contains two potential sites for N-linked carbohydrate addition. Single sites exist for cathepsin B and H, whereas no N-glycosylation addition sites are present on the mature forms of papain and actinidin, the secretory plant proteinases. On Western blots, the secreted form of MCP appeared to migrate slightly slower than the intracellular form. Carbohydrate analysis will be necessary to determine if this is due to the addition of complex carbohydrate, as observed for the secreted forms of several other lysosomal enzymes (50).
The expression of cysteine proteinases has been shown to be regulated in both plants and animals. RNA for the cysteine proteinase aleurain, isolated from barley aleurone cells, is induced 7-fold upon treatment with the hormone gibberellic acid (46). In Dictyostelium, mRNA specifying cysteine proteinase 1 represents 1% of cellular RNA in differentiating cells but is absent from growing cells (4). MCP mRNA was undetectable in several mouse cell lines, but abundant in others. We can speculate that this reflects cellular heterogeneity of MCP expression in vivo. Furthermore, the amount of MCP protein seen on Western blots increased in inflammatory macrophages as compared to resident macrophages. These results are consistent with the observations of others that inflammatory macrophages show increased levels of certain lysosomal enzymes and secreted neutral proteinases (52, 53). Such increases in enzyme levels are thought to reflect the state of macrophage activation (54). Enhanced secretion of these enzymes may also be partially responsible for the sequelae seen during an inflammatory response. We are currently analyzing the level at which regulation is controlled in macrophages.
Acknowledgments-We thank Dr. Bruce Erickson for the amino acid alignment of the cysteine proteinases, Amalia Pavlovec for excellent technical assistance in DNA sequencing, Sumi Koide for providing T cells, Robert Mason for unpublished data on the sequence of cathepsin L, and James Young and Marty Rosenberg (Smith Kline & French Laboratories) for use of the pB4+ expression vector and bacterial strains.
Note Added in Proof-Comparison of the cDNA sequence for MCP with partial cDNA sequences for the major excreted protein of transformed mouse fibroblasts (D. Denhardt, Cancer Res., in press; B. Troen and M. M. Gottesman, personal communication) revealed that the encoded proteins are identical. Since the first 39 residues encoded