Purification, Characterization, and Gene Cloning of Thermopsin, a Thermostable Acid Protease from Sulfolobus acidocaldarius*

A thermostable, acid proteolytic activity has been found both associated with the cells and in the culture medium of Sulfolobus acidocaldarius, an archaebacterium. This acid protease, which has been named thermopsin, was purified to homogeneity from the culture medium by a five-step procedure including column chromatography on DEAE-Sepharose CL-6B, phenyl-Sepharose CL-4B, Sephadex G-100, monoQ (fast protein liquid chromatography), and gel filtration (high pressure liquid chromatography). The purified thermopsin produced a single band on sodium dodecyl sulfate-polyacrylamide gel electrophoresis, and the proteolytic activity was associated with the band. Thermopsin is a single-chain protein, as indicated by gel electrophoresis and by a single NH2-terminal sequence. It has maximal proteolytic activity at pH 2 and 90 °C. A genomic library of S. acidocaldarius was prepared and screened by an oligonucleotide probe


(Received for publication, August 21, 1989)

Xin-li Lin‡ and Jordan Tang

Most of these residues appear to be characteristic of a leader sequence. However, the presence in this region of a short pro sequence cannot be ruled out. Thermopsin contains a single cysteine at residue 237 that is not essential for activity (Fusek, M., Lin, X.-L., Tang, J. (1990) J. Biol. Chem. 265, 1496-1501). Thermopsin has no apparent sequence similarity to aspartic proteases of the pepsin family or to the pepstatin-insensitive acid protease (Maita, T., Nagata, S., Matsuda, G., Murata, S., Oda, K., Murao, S., and Tsura, D. (1984) J. Biochem. 95, 465-475) and thus may represent a new class of acid proteases.
Also absent is the characteristic active site aspartyl sequence of aspartic proteases.

Acid proteases are a well-established group of proteolytic enzymes that digest proteins and peptides in acidic solution. Most of these enzymes, including pepsin, gastricsin, chymosin, and cathepsin D, share similarity in amino acid sequences, three-dimensional structures, and catalytic mechanisms (1-3), and are inhibited by pepstatin, a transition-state analogue inhibitor (4). Because these proteases contain two aspartic acid residues in their catalytic sites, they are called aspartic proteases. The structure and function relationship of aspartic proteases is a topic of current research interest because some aspartic proteases are involved in human diseases, e.g. renin in hypertension and the protease of human immunodeficiency virus in acquired immunodeficiency disease. Also, the availability of primary and high resolution crystal structures of several aspartic proteases has made these enzymes attractive models for the study of structure-function relationships. The aspartic proteases studied so far, including enzymes derived from yeast, fungi, plants, and animal sources, all fall within a normal range of thermostability.
Because the structure-function relationships of acid proteases were of interest to us, we searched for a thermostable acid protease in an archaebacterium, Sulfolobus acidocaldarius, since this bacterium grows optimally at pH 2 and about 70 °C. In this paper, we report the presence of a thermostable acid protease in S. acidocaldarius.
This enzyme, which is named thermopsin, has been purified to homogeneity and characterized. In addition, the gene of thermopsin has been cloned and sequenced. The deduced amino acid sequence of thermopsin is also reported here.

EXPERIMENTAL PROCEDURES

RESULTS
Purification of Thermopsin-Table I contains data on the proteolytic activity of the culture medium and recovered cells of S. acidocaldarius, measured at 80 °C and pH 3.2. Thermostable proteolytic activity of thermopsin was clearly present in both fractions. The activity in the cell fraction, however, appeared to be tightly associated with cellular structure (data not shown) and was more difficult to purify. The purification of thermopsin, therefore, was carried out using culture medium as starting material. The purification steps are summarized in Table I.
The five steps were column chromatography on DEAE-Sepharose CL-6B, phenyl-Sepharose CL-4B, Sephadex G-100, and monoQ (FPLC), and gel filtration (HPLC). The patterns in each of the last two chromatographic steps (Figs. 4 and 5) showed a single peak associated with proteolytic activity, indicating that the enzyme had been purified. Overall, about 2600-fold purification was achieved with a yield of 13%.
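The two summary figures quoted above follow the usual bookkeeping of a purification table: fold purification is the ratio of specific activities (units per mg of protein) and yield is the ratio of total activities. The following is a minimal sketch of that arithmetic. The class name, the function names, and all numerical values are hypothetical and are not taken from Table I.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One row of a purification table (all values here are hypothetical)."""
    name: str
    total_protein_mg: float
    total_activity_units: float

    @property
    def specific_activity(self) -> float:
        # Units of activity per mg of protein.
        return self.total_activity_units / self.total_protein_mg


def fold_purification(start: Step, end: Step) -> float:
    """Ratio of specific activities between two steps."""
    return end.specific_activity / start.specific_activity


def percent_yield(start: Step, end: Step) -> float:
    """Percentage of the starting total activity recovered."""
    return 100.0 * end.total_activity_units / start.total_activity_units


# Illustrative numbers only, NOT the values reported in Table I.
crude = Step("culture medium", total_protein_mg=1000.0, total_activity_units=500.0)
final = Step("gel filtration (HPLC)", total_protein_mg=0.5, total_activity_units=50.0)

print(f"fold purification: {fold_purification(crude, final):.0f}")
print(f"yield: {percent_yield(crude, final):.0f}%")
```

With the actual protein and activity totals from a purification table substituted for the placeholders, the same two functions give the fold purification and yield for any pair of steps.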
The homogeneity of the purified thermopsin was tested by SDS-polyacrylamide gel electrophoresis. Thermopsin stained poorly with various dyes, which gave very poor detection sensitivity. Therefore, purified thermopsin was iodinated with 125I and then electrophoresed on an SDS-polyacrylamide gel. The autoradiogram of the gel showed essentially a single band (Fig. 6A). A light shading of material, which migrated ahead of the thermopsin band, was probably a breakdown product caused by 125I radiation, since the intensity of this "shade" greatly increased during storage of the iodinated enzyme. When the gel was soaked in a solution of bovine hemoglobin, a band located at the same position in the gel was clearly seen (Fig. 6B). The hemoglobin staining was presumably due to the binding of hemoglobin to thermopsin. Longer incubation of the gel with hemoglobin followed by incubation at high temperature produced a clearing band at the same electrophoretic position (Fig. 6C), indicating that hemoglobin had been digested within the area of the band. These results suggest that the final thermopsin preparation was pure and that its activity was associated with the band. Supporting evidence for homogeneity was the amino-terminal sequence of purified thermopsin. Only a single sequence was observed (see below).
Molecular Weight of Thermopsin-The molecular weight of thermopsin from S. acidocaldarius was found to be 46,000 from the elution position of the enzyme on a column of Sephadex G-75, and 51,000 from its electrophoretic mobility in SDS-polyacrylamide gel (using the data from Fig. 6). These values are larger than the value calculated from the amino acid sequence (32,651, see below), probably because thermopsin is a glycoprotein (see "Discussion").
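The size of the discrepancy can be made explicit with a short calculation. The sketch below uses only the three molecular weights quoted above; the function and variable names are hypothetical. Treating the whole excess as attached carbohydrate is an assumption, since glycoproteins can also migrate anomalously in gel filtration and in SDS gels.

```python
SEQUENCE_MW = 32_651  # calculated from the deduced amino acid sequence

OBSERVED_MW = {
    "gel filtration (Sephadex G-75)": 46_000,
    "SDS-polyacrylamide gel": 51_000,
}


def excess_mass(observed: int, calculated: int = SEQUENCE_MW) -> tuple[int, float]:
    """Return (excess mass, excess as a percentage of the observed mass)."""
    excess = observed - calculated
    return excess, 100.0 * excess / observed


for method, mw in OBSERVED_MW.items():
    excess, pct = excess_mass(mw)
    print(f"{method}: excess {excess} ({pct:.0f}% of the apparent molecular weight)")
```

On these figures, roughly 29% to 36% of the apparent molecular weight is not accounted for by the polypeptide chain.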

Thermodependence of Thermopsin Activity-The proteolytic activity of thermopsin was determined at different temperatures using hemoglobin as substrate. As shown in Fig. 7, the maximum activity was found at 90 °C. Residual activity was clearly detectable below 30 °C, as shown in the inset of Fig. 7. At 100 °C, the activity was still significant. The decline of activity from 90 to 100 °C was mainly due to inactivation of thermopsin at higher temperatures (results not shown).
Thermostability of Thermopsin-Thermopsin is stable at 80 °C for 48 h at pH 4.5 without appreciable loss of activity (data not shown). The enzyme is also stable at 4 °C.
pH Dependence of Activity-The primary activity of thermopsin was found between pH 0.5 and pH 5 (Fig. 8). The optimal activity was found near pH 2.0. Interestingly, residual activity was still measurable in the pH range of 8-11 (Fig. 8,

Cloning and Structure of Thermopsin Gene-Restriction maps of the five positive clones, TP1 to TP5, indicated that they were related to one another. The inserts in these clones were each close to 5 kb, and the combined map covered about 8 kb (Fig. 9). To identify the thermopsin coding region, Southern blots were carried out on restriction fragments of these clones using the synthetic mixed-sequence nucleotide probe (see "Experimental Procedures"). From the fragments that were positive in Southern blots, the position of the thermopsin gene was approximated, and the fragments near the estimated area were subcloned and sequenced. In addition, two deletion libraries were made from a subclone of TP2, from which 27 clones were chosen for additional sequence determinations. A region of about 2 kb was completely sequenced on both strands of the DNA to reveal the thermopsin gene, as shown in Fig. 10. The nucleotide sequence contains an open reading frame from bases 146 to 1165. This is apparently the thermopsin gene, since the 35-residue NH2-terminal sequence determined by protein chemistry is found between nucleotides 269 and 373 (Fig. 10). The thermopsin sequence deduced from its gene contains 299 amino acid residues. The amino acid composition of the enzyme generated from this sequence is close to that determined by amino acid analysis (Table II). This further supports the correct identification of the thermopsin gene.
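The nucleotide coordinates quoted above can be checked against one another with simple codon arithmetic. The sketch below uses only the positions and residue counts given in the text; the function names are hypothetical, and the intervals are assumed to be 1-based and inclusive, with the stop codon lying outside the quoted open reading frame.

```python
ORF_START, ORF_END = 146, 1165      # open reading frame
NTERM_START, NTERM_END = 269, 373   # region matching the NH2-terminal protein sequence
NTERM_RESIDUES = 35                 # residues determined by protein chemistry
MATURE_RESIDUES = 299               # deduced thermopsin sequence


def span_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based, inclusive interval."""
    return end - start + 1


def codons(start: int, end: int) -> int:
    """Number of whole codons in an interval; reject partial codons."""
    length = span_length(start, end)
    if length % 3 != 0:
        raise ValueError(f"{start}-{end} is not a whole number of codons")
    return length // 3


orf_codons = codons(ORF_START, ORF_END)               # whole reading frame
nterm_codons = codons(NTERM_START, NTERM_END)         # sequenced NH2-terminal stretch
upstream_codons = codons(ORF_START, NTERM_START - 1)  # codons preceding the mature NH2 terminus

print(orf_codons, nterm_codons, upstream_codons)
print("mature residues implied:", orf_codons - upstream_codons)
```

Under these assumptions the frame holds 340 codons, of which 41 precede the mature NH2 terminus, leaving 299 for the mature enzyme, and the 105-nucleotide stretch at positions 269 to 373 corresponds to exactly 35 codons in the same frame.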

DISCUSSION
The results presented here establish that thermopsin is an acid protease (pH optimum at 2.0) and is unique among known acid proteases in its stability at high temperature. The activity of the enzyme increases with temperature up to 90 °C (Fig. 7). Above this temperature, the enzyme denatures slowly, but the activity is still measurable at 100 °C. The molecular weights determined by gel filtration (46,000) and by electrophoretic mobility (51,000) were much higher than that calculated from the amino acid sequence (32,651). This is probably because thermopsin is a glycoprotein. Within the 299-residue thermopsin sequence there are 11 potential N-glycosylation Asn-X-Thr/Ser signals. The two asparagines located at positions 24 and 28 were the only residues we could not identify in the NH2-terminal sequence determination. These residues are probably glycosylated. At present, it is not known how many glycosylation positions are present in the enzyme.
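Potential N-glycosylation signals of the Asn-X-Thr/Ser type can be located in any protein sequence with a short pattern search. The following is a minimal sketch, not the procedure used in this work. The function name and the toy peptide are hypothetical, and the option of excluding proline at position X is a commonly applied convention that is assumed here rather than stated in the text.

```python
import re


def find_sequons(protein: str, exclude_proline: bool = True) -> list[tuple[int, str]]:
    """Return (1-based position of Asn, tripeptide) for each Asn-X-Ser/Thr signal.

    A lookahead is used so that overlapping signals (e.g. N-N-S-T) are all reported.
    """
    x = "[^P]" if exclude_proline else "."
    pattern = re.compile(rf"(?=(N{x}[ST]))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


# Toy peptide for illustration only; this is NOT the thermopsin sequence.
toy = "MNGTANPSNNSTK"

for position, tripeptide in find_sequons(toy):
    print(position, tripeptide)
```

Applied to the deduced thermopsin sequence of Fig. 10, a search of this kind would list the candidate sites; whether a given site is actually glycosylated has to be established experimentally.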
There are 41 amino acids in front of the NH2-terminal position of thermopsin, including the initiation methionine. In this region, the sequence of the first 30 residues seems characteristic of a leader sequence, with a high content of hydrophobic amino acids. Since thermopsin is found in the culture medium, it seems reasonable to suggest that a thermopsin leader peptide occupies the beginning of the coding region. Leader sequences have been found in other archaebacterial proteins. Some of these sequences resemble eucaryotic and eubacterial leader sequences (13), but others are drastically different (14, 15). Residues 29-40, however, are quite hydrophilic.
Whether this region is part of the leader sequence or whether it represents a proenzyme sequence is not clear at present.
The upstream region of the thermopsin gene appears to contain regulatory sequences. The T-rich region between nucleotides 90 and 106 possibly contains the translation termination signals (16) of the preceding gene. Two possible promoter regions are present in the thermopsin gene. The sequence AAAGCTTATATA, located between nucleotides 112 and 123, is very similar to the promoter sequences of the methanogen archaebacterium, which has a consensus promoter sequence of AAANNTTTATATA (16). A second sequence, AAATTATTTAAA (nucleotides 129-140), which follows the above sequence closely, is very similar to the consensus promoter sequence (AAANNTTTAAA) of a sulfur-dependent thermophilic archaebacterium (16). The transcription termination sequence of thermopsin appears to be located near the T-rich region between nucleotides 1220 and 1232. A putative ribosome binding sequence, GTGAT (nucleotides 143-147), is perfectly complementary to the Sulfolobus 16 S rRNA 3' sequence (17).
About 0.8 kb of nucleotide sequence that follows the thermopsin gene was also sequenced (Fig. 9). This sequence, which appears to code for an as yet unidentified gene, will be submitted to GenBank.
We have searched for sequence similarity to either the thermopsin gene or protein in the data bases of GenBank, EMBL, and NBRF using the GCG Wordsearch program (18). However, the relatedness scores of the closest sequences were not better than could be achieved from a totally random match. These results suggest that the thermopsin gene and protein sequences have not been previously reported and are not significantly related to the known sequences in the data banks. The amino acid sequence of thermopsin was also compared directly with aspartic and thiol proteases by a computer dot matrix method. No sequence homology was observed.
More significantly, the characteristic active site amino acid sequence of aspartic proteases, Asp-Thr-Gly, which is present in all aspartic proteases, even those with only distant evolutionary relatedness, is absent in thermopsin.
These comparisons suggest that thermopsin may have a different type of active site from that of the aspartic proteases.
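The statement in the preceding discussion that the two putative promoter sequences are "very similar" to the archaebacterial consensus sequences can be given a rough numerical form. The sketch below computes an edit distance in which the consensus symbol N matches any base. This measure is introduced here for illustration and is not the comparison method used in the study; the function name is hypothetical, and the sequences are copied from the text as printed.

```python
def wildcard_edit_distance(seq: str, consensus: str) -> int:
    """Levenshtein distance in which 'N' in the consensus matches any base."""
    prev = list(range(len(consensus) + 1))
    for i, base in enumerate(seq, start=1):
        curr = [i]
        for j, symbol in enumerate(consensus, start=1):
            cost = 0 if symbol in ("N", base) else 1
            curr.append(min(
                prev[j] + 1,         # base present in seq but not in consensus
                curr[j - 1] + 1,     # symbol present in consensus but not in seq
                prev[j - 1] + cost,  # match, wildcard match, or substitution
            ))
        prev = curr
    return prev[-1]


comparisons = [
    ("nucleotides 112-123", "AAAGCTTATATA", "AAANNTTTATATA"),
    ("nucleotides 129-140", "AAATTATTTAAA", "AAANNTTTAAA"),
]

for label, seq, consensus in comparisons:
    print(label, wildcard_edit_distance(seq, consensus))
```

Each of the two sequences differs from its consensus by a single insertion or deletion, with every other position matching once the N positions are treated as wildcards.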
Thermopsin is also clearly different from a second group of acid proteases that is pepstatin-insensitive (19). Not only is thermopsin pepstatin-sensitive (I50 is about 0.5 μM, see Ref. 20), but its molecular weight is also considerably larger than that of the pepstatin-insensitive acid protease B from Scytalidium lignicolum, 22,000 (21). In addition, there is no sequence homology between the two enzymes.
Thermopsin activity is found both in the culture medium and in bacterial cells. Although the activity in the medium represents a minor fraction of the total activity, the current procedure is effective for obtaining pure enzyme for property studies. We have now repeated this purification procedure many times and found it to be highly reliable. We have also demonstrated the homogeneity of the final purified enzyme both by electrophoresis and by amino-terminal sequence determination.
We have also examined the relationship between the thermopsin associated with the cells and the enzyme in the culture medium. Various methods were tried to release the bound enzyme from the cells. These included treatments with acids, bases, salts, detergents, sonication, proteolysis, and organic solvents. None of these proved to be significantly effective. We found that incubation of cells in 0.25 M sodium formate at 80 °C for a long time caused partial release of thermopsin (results not shown). The released enzyme was purified to homogeneity using the same procedures reported above. The NH2-terminal sequence of formate-released thermopsin was identical to that of enzyme purified from the culture medium (results not shown). These observations suggest that thermopsin may be linked to cells by covalent linkages through some side chains.
An acid protease in the archaebacterium S. acidocaldarius has not been described prior to this study. A neutral protease, archaelysin, has previously been isolated from a different strain of archaebacterium (22). However, judging from its properties, archaelysin is obviously unrelated to thermopsin. It was difficult to establish the existence of thermopsin at the beginning of our study, partly because it is present at a very low concentration.
This difficulty was overcome by the use of a highly sensitive assay with 14C-methylated hemoglobin as substrate. Although our work has not established the physiological role of thermopsin in S. acidocaldarius, it seems reasonable to suggest that thermopsin is a digestive enzyme that breaks down proteins in the culture medium of this archaebacterium to supply amino acids for bacterial growth and metabolism.