The Gene and Deduced Protein Sequences of the Zymogen of Aspergillus niger Acid Proteinase A*

Proteinase A obtained from the culture medium of Aspergillus niger var. macrosporus is a unique acid endopeptidase that is insensitive (or less sensitive) to specific inhibitors of ordinary acid or aspartic proteinases, such as pepstatin, diazoacetyl-DL-norleucine methyl ester, and 1,2-epoxy-3-(p-nitrophenoxy)propane. In the preceding paper (J. Biol. Chem. 266, 19480-19483), we reported the complete primary structure of the mature enzyme determined at the protein level. The enzyme has a unique two-chain structure with a 39-residue light (L) chain and a 173-residue heavy (H) chain linked noncovalently. As an extension of this study, we isolated genomic and cDNA clones encoding this proteinase and determined their nucleotide sequences. To isolate a genomic clone, the genomic DNA was selectively amplified by polymerase chain reaction using mixed oligonucleotide primers designed from the amino acid sequence of the H chain, and the specific probe thus generated was used to screen a λgt10 genomic library.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M68871.
... diazoacetyl-DL-norleucine methyl ester (DAN) in the presence of cupric ions, and 1,2-epoxy-3-(p-nitrophenoxy)propane (EPNP) (4,5). On the other hand, proteinase A is a different type of acid proteinase (1,6). Its substrate specificity and responses to inhibitors are quite different from those of the ordinary aspartic proteinases (1-5). It is not inhibited by pepstatin or EPNP and is only partially inhibited by DAN in the presence of cupric ions under conditions in which proteinase B is completely inhibited (4). As described in the preceding paper (6), we have determined the complete amino acid sequence of proteinase A by conventional methods of protein chemistry, which revealed that the enzyme has a two-chain structure consisting of a 39-residue light (L) chain and a 173-residue heavy (H) chain bound noncovalently, and has none of the consensus active-site sequences of ordinary aspartic proteinases (6).
Non-pepsin type acid proteinases like proteinase A have also been found in some species of molds, mushrooms, bacteria, and archaebacteria (7-15). Among them, only the sequences of acid proteinase B from Scytalidium lignicolum (15) and thermopsin from Sulfolobus acidocaldarius (16) have been reported. The former is insensitive to pepstatin and DAN, but sensitive to EPNP. Thermopsin, on the other hand, is inhibited by pepstatin, but only slowly and nonspecifically by DAN and EPNP. The amino acid sequence of proteinase A shows no homology with thermopsin, but approximately 50% homology with Scytalidium proteinase B. We could not find any other proteins homologous to proteinase A by computer comparisons with 10,856 proteins in the GenBank data base. Proteinase A and Scytalidium proteinase B thus seem to belong to the same subclass of non-pepsin type acid proteinases. However, there are three major structural differences. First, the former is a two-chain enzyme, whereas the latter is a single-chain enzyme. Second, the former has two disulfide bonds, whereas the latter has three, and only one of them is common to both enzymes. Third, the former has no counterparts to Glu53 and Asp98, the proposed active-site residues of the latter (17,18). It therefore remains to be seen which residues really participate in the catalytic function of proteinase A and how the mechanism operates. It will also be interesting to elucidate the mechanism by which the precursor form is processed to the mature enzyme.
In the present study, we have isolated the gene and cDNA for this enzyme and sequenced them as a further step in this direction. Thus, we have deduced the amino acid sequence, as well as the gene structure, of the single-chain 282-residue precursor form of the enzyme, which is composed of a 59-residue prepropeptide, the 39-residue L chain, an 11-residue intervening sequence, and the 173-residue H chain.
The abbreviations used are: DAN, diazoacetyl-DL-norleucine methyl ester; EPNP, 1,2-epoxy-3-(p-nitrophenoxy)propane; L chain, light chain; H chain, heavy chain; bp, base pair(s); kb, kilobase(s).

DNA Blotting Analysis and Cloning of the Gene for Proteinase A-When the amplified radiolabeled DNA prepared by using mixed oligonucleotide primers (Fig. 1) was hybridized to a blot of A. niger DNA that had been digested with EcoRI, BamHI, or both, a single major band was found to hybridize with the probe in each digest. The bands were approximately 9 kb long in the EcoRI digest and approximately 2.8 kb long in the BamHI and EcoRI/BamHI digests (Fig. 2). These results suggested that the region from residues 41 to 154 of the H chain ...
Portions of this paper (including "Experimental Procedures" and Figs. 1, 2, 4, 5, and 7-10) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.
Nucleotide Sequence and the Deduced Amino Acid Sequence of the Zymogen-The cloned DNA was shown to be 2,714 bp long by sequence determination (Figs. 3 and 4). Assuming the absence of an intron, the clone would have an open reading frame of 846 bp encoding a protein of 282 amino acid residues containing both chains of proteinase A. The nucleotide sequence of the cDNA selectively amplified by using oligonucleotide primers (Fig. 5) was analyzed and shown to be completely identical with that of the genomic DNA. This indicated that there is no intron in this gene.
A TATA box-like sequence, CATAAA, was found 167 bp upstream from the initiation codon ATG, and an Sp1-binding site-like sequence 375 bp upstream from the ATG. On the other hand, no consensus sequence (i.e. AATAAA) for the polyadenylation characteristic of higher eukaryotes was found in the region up to 1.2 kb downstream from the termination codon. Several sequences with a single base pair change from AATAAA are present in the downstream region of the gene: AATAAG (2009-2014), AATATA (2043-2048), and two TATAAAs (2045-2050 and 2150-2155) (Fig. 3). Among these, however, only AATAAG (at 2009-2014) is known to be functional, although with low efficiency, in other genes (19-21).
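The search for hexamers that differ from AATAAA by a single base can be illustrated with a short scan. The following is only a minimal sketch: the function names and the toy sequence are hypothetical and are not taken from the clone, and it does not reproduce whatever procedure was actually used to inspect the sequence.

```python
# Minimal sketch: report hexamers within one mismatch of the canonical
# polyadenylation signal AATAAA. The toy sequence is hypothetical.

CANONICAL = "AATAAA"


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def near_matches(seq: str, motif: str = CANONICAL, max_mismatches: int = 1):
    """Return (start, end, window, mismatches) with 1-based inclusive coordinates."""
    seq = seq.upper()
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, motif)
        if d <= max_mismatches:
            hits.append((i + 1, i + k, window, d))
    return hits


# Hypothetical fragment containing AATAAG and the overlapping pair
# AATATA / TATAAA (as in an AATATAAA stretch).
toy = "GGCAATAAGCCCAATATAAAGG"
hits = near_matches(toy)
for start, end, window, d in hits:
    print(f"{window} at {start}-{end} ({d} mismatch)")
```

Note that an AATATAAA stretch yields two overlapping single-mismatch hexamers offset by two bases, which matches the overlapping coordinates reported for AATATA (2043-2048) and TATAAA (2045-2050).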

DISCUSSION
As described in the preceding paper (6), the mature form of proteinase A is a 212-residue protein composed of a 39-residue L chain and a 173-residue H chain bound noncovalently. On the other hand, by nucleotide sequencing of the cDNA and the genomic DNA for the enzyme performed in the present study, the precursor protein of proteinase A was deduced to be a single polypeptide of 282 amino acid residues containing both H and L chains, as shown in Figs. 3 and 6a. Thus, this precursor protein is 70 residues larger than the mature enzyme (6), containing a 59-residue preprosequence (residues 1-59) at the NH2 terminus and an 11-residue sequence (residues 99-109) intervening between the L and H chains. So far, neither the prosequence nor the intervening sequence has been found in the mature enzyme isolated from the culture filtrate. The amino acid sequences deduced for the H and L chains by DNA sequencing agree completely with those obtained by protein sequencing (6), except that the NH2-terminal amino acid residue of the H chain is glutamine in the former and pyroglutamic acid in the latter. The NH2-terminal glutamine residue is thought to have been converted to a pyroglutamic acid residue after cleavage at the Arg109-Gln110 bond.
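The residue bookkeeping in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the segment lengths stated in the text; the variable and function names are illustrative.

```python
# Minimal sketch: derive segment boundaries of the precursor from the
# segment lengths given in the text (names are illustrative).

SEGMENTS = [
    ("preprosequence", 59),
    ("L chain", 39),
    ("intervening sequence", 11),
    ("H chain", 173),
]


def layout(segments):
    """Map each segment name to its (first, last) residue, 1-based inclusive."""
    bounds = {}
    start = 1
    for name, length in segments:
        bounds[name] = (start, start + length - 1)
        start += length
    return bounds


bounds = layout(SEGMENTS)
precursor_len = sum(length for _, length in SEGMENTS)  # 282
mature_len = 39 + 173                                  # L chain + H chain = 212
orf_bp = 3 * precursor_len                             # coding bases, stop codon excluded

for name, (first, last) in bounds.items():
    print(f"{name}: residues {first}-{last}")
print(precursor_len, mature_len, precursor_len - mature_len, orf_bp)
```

The computed boundaries place the L chain at residues 60-98 and the H chain at residues 110-282, so the bonds that must be cleaved follow residues 59, 98, and 109, and 3 x 282 gives the 846-bp open reading frame.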
The sequence of uncharged amino acid residues with a high content of hydrophobic amino acids from residues 3 to 22 in the NH2-terminal region is characteristic of a signal sequence. This area is the most hydrophobic region in the protein, as can be seen from the hydropathy profile (Fig. 6b), according to Kyte and Doolittle (22). Further, a lysine residue at position 2 is frequently found in signal sequences. The most probable site of signal sequence cleavage was predicted by the method of von Heijne (23,24) to be between residues Ala18 and Ala19 (Fig. 7). The location of alanine residues at positions -1 and -3 relative to the putative cleavage site satisfies the (-3, -1) rule (25), and the amino acid residue at position -2 is leucine, which is most frequently found in known signal sequences (23). The next most probable site of signal sequence cleavage is between Ala16 and Leu17. In either case, the length of the signal sequence is roughly the same as those of pepsinogens (26). Since proteinase A is excreted into the culture medium, either of these sequences is thought to function as a signal sequence.
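A hydropathy profile of the Kyte and Doolittle type is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the window length of 7 and the toy peptide are assumptions made for illustration and are not the parameters or the sequence behind Fig. 6b.

```python
# Minimal sketch of a Kyte-Doolittle hydropathy profile.
# Window length and the toy peptide are illustrative assumptions.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 7):
    """Mean hydropathy of each window; entry i covers residues i+1 .. i+window."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def most_hydrophobic_window(seq: str, window: int = 7):
    """Return (1-based start residue, mean hydropathy) of the top-scoring window."""
    profile = hydropathy_profile(seq, window)
    best = max(range(len(profile)), key=profile.__getitem__)
    return best + 1, profile[best]


# Hypothetical peptide: acidic stretch, hydrophobic core, basic stretch.
toy = "DDDDLLLLLLLKKKK"
start, score = most_hydrophobic_window(toy)
print(start, round(score, 2))
```

On a real precursor sequence, a signal peptide would be expected to appear as a broad positive peak near the NH2 terminus, while strongly negative windows mark hydrophilic stretches.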

In contrast to the abundance of acidic residues in the H and L chains, basic residues are abundant in both the prosequence and the intervening sequence (Fig. 8). Basic residues are particularly prevalent in the regions from residues 24 to 48 (in the prosequence) and from residues 106 to 109 (in the intervening sequence). These basic residue-rich sequences may stabilize the conformation of the proenzyme by interacting electrostatically with acidic residues of the H or L chain, or may be necessary for proper folding of the proenzyme molecule. The precursors of aspartic proteinases such as pepsinogen also contain a prosequence that resembles that in proteinase A in the abundance of basic amino acid residues.
In pepsinogen, indeed, the basic residues of the prosequence are thought to interact with acidic residues of the pepsin moiety to stabilize the zymogen molecule at neutral pH inside cells (27). The prosequence of proteinase A may play the same role as that of pepsinogen. Fig. 9 shows a comparison of the putative prosequence of proteinase A with the prosequences of some typical pepsinogens (28-32) and procathepsin D (33). Interestingly, there appears to be some homology between the prosequence of proteinase A and those of pepsinogens and procathepsin D, especially in the NH2-terminal half of the prosequences. Since organic acids such as citric acid excreted by A. niger make the environment acidic, the proenzyme may be activated just after secretion, like pepsinogen. However, the possibility also remains that activation takes place before secretion.
The mature form of proteinase A composed of the H and L chains is considered to be generated by removal of the NH2-terminal prosequence and the intervening sequence from the one-chain proenzyme. Interestingly, these two sequences are the most hydrophilic ones in the protein, as judged from the hydropathy profile shown in Fig. 6b. The three sites, Asn59-Glu60, Tyr98-Gly99, and Arg109-Gln110, therefore, should be cleaved in the activation process of the proenzyme. They may be located at the surface of the enzyme and easily attacked by proteinases. The secondary structure prediction of the proenzyme according to Chou and Fasman (34) suggests that these sites are all in turn structures (Fig. 10). It remains to be elucidated what kind of proteinase(s) participates in processing. Proteinase A or proproteinase A itself may process the proenzyme intra- or intermolecularly, like pepsinogen, or another proteinase(s) such as A. niger acid proteinase B may be involved. Proteinase A may cleave the Tyr98-Gly99 bond, since it was reported to cleave the Tyr-Thr bond in the oxidized insulin B chain (5) and the Tyr-Gly bond in [Tyr8]-substance P (35). Another question of interest is whether the putative intermediate enzyme whose L and H chains are linked by the intervening sequence is active or not. So far, however, the proenzyme containing the pro part and/or the intervening sequence has not been isolated from the culture medium. Studies are in progress to investigate the processing and activation of proteinase A by preparing the proenzyme through expression of the cDNA in Escherichia coli and yeast.
The fact that the amino acid sequence of proteinase A is approximately 50% identical with that of Scytalidium acid proteinase B seems to suggest that the two enzymes share the same active sites. However, neither Glu53 nor Asp98, the proposed active-site residues of the Scytalidium enzyme (17,18), is conserved in proteinase A. Both amino acid and cDNA sequencing showed that the corresponding residues in proteinase A are Gln and Lys, respectively (6). Neither of the consensus sequences of the active sites of ordinary aspartic proteinases, Asp-Thr-Gly and Asp-Ser-Gly, is present in proteinase A. It remains to be seen which residues in proteinase A are the catalytic residues. Studies are under way to elucidate the active-site residues of this enzyme by several methods, including protein engineering and site-directed mutagenesis.
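The absence of the Asp-Thr-Gly and Asp-Ser-Gly consensus sequences is the kind of statement that can be checked by a simple motif scan over a one-letter amino acid sequence. The sketch below assumes a plain regular-expression search; the function name and test peptides are hypothetical, and the proteinase A sequence itself is not reproduced here.

```python
# Minimal sketch: scan a one-letter protein sequence for the aspartic
# proteinase active-site consensus Asp-Thr-Gly / Asp-Ser-Gly (D[TS]G).
# The peptides used here are hypothetical.
import re

ACTIVE_SITE = re.compile(r"D[TS]G")


def find_aspartic_motifs(seq: str):
    """Return (1-based position of Asp, motif) for every DTG or DSG match."""
    return [(m.start() + 1, m.group()) for m in ACTIVE_SITE.finditer(seq.upper())]


with_motifs = "AAADTGAAADSGAA"   # contains both consensus variants
without_motifs = "AAAQKGAADAGTT"  # contains neither

print(find_aspartic_motifs(with_motifs))
print(find_aspartic_motifs(without_motifs))
```

An empty result for a full-length sequence would be consistent with the statement that neither consensus sequence is present; it says nothing about which residues are actually catalytic.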