The ATP-dependent Clp Protease of Escherichia coli: Sequence of clpA and Identification of a Clp-specific Substrate


The clpA gene, which codes for the ATP-binding subunit of the ATP-dependent Clp protease of Escherichia coli, has been sequenced. The coding region contains a single open reading frame for a protein of 758 amino acids; within the amino acid sequence are two consensus sequences for ATP-binding sites. The sequence of ClpA does not resemble that of other previously described ATPases or of Lon, the other sequenced ATP-dependent protease of E. coli, except in the ATP-binding site consensus region.
The clpA gene is expressed as a monocistronic message. Primer extension experiments define a major start point of transcription at -183 relative to the start of translation. A rho-independent terminator is located 23 bases beyond the end of the coding region.
The ClpA protein is degraded in vivo in a Clp-dependent fashion (t1/2 ≈ 60 min). A fusion protein containing the first 40 amino acids of ClpA fused in frame to β-galactosidase is degraded very rapidly in a clpA+ host (t1/2 ≈ 3 min) but not in a clpA− host. This fusion protein is the first Clp-specific substrate described.
The rapid degradation of specific regulatory proteins is an important aspect of cellular control mechanisms (1-4). In addition, abnormal and damaged proteins, as well as many foreign proteins introduced into cells by cloning or infection, are rapidly degraded intracellularly. The turnover of these short-lived proteins in Escherichia coli and other organisms is frequently energy-dependent (1-8). It is increasingly evident that the energy dependence of protein degradation in E. coli in vivo is largely attributable to proteases that are either totally dependent on or highly activated by ATP (9-12). Identifying the ATP-dependent proteases found in cells, studying their mechanisms of action, and determining the unique features of their in vivo substrates should help our understanding of this mode of physiological regulation.
The best understood ATP-dependent protease is the Lon protease of E. coli (9, 10). Lon substrates in vivo include the cell division inhibitor SulA, the λ antiterminator N, and the positive regulator of capsule synthesis. In addition, the degradation of many abnormal proteins is at least partially dependent on functional Lon protease in vivo (7, 8, 17-20).
In vitro, purified Lon protease directly cleaves multiple peptide bonds in a variety of denatured proteins and in purified λ N protein; maximal activity requires the continuous presence of ATP or an ATP analog (9, 10, 15, 21). In vitro studies imply that, under physiological conditions, proteolysis by Lon is accompanied by the hydrolysis of two ATPs per peptide bond cleaved (22). The recently reported deduced amino acid sequence of Lon protease contains a sequence motif identical to the nucleotide-binding sites found in many ATP-binding proteins and ATPases; no other recognizable features in common with other proteases were noted. A second ATP-dependent protease of E. coli, which we have called Clp (called Ti by Chung and co-workers), has been identified and purified to homogeneity (11, 12, 24, 25). Clp protease also directly cleaves peptide bonds in various proteins in a process that requires ATP hydrolysis. Clp differs from Lon in structure, in its in vivo substrates, and in the regulation of synthesis of its gene products. Lon is a tetramer composed of identical 87-kDa subunits, whereas the Clp protease consists of two dissimilar subunits, ClpA (81 kDa) and ClpP (21 kDa). ClpA has been shown to bind ATP and to have ATPase activity (24, 25). ClpP is labeled by the serine protease inhibitor diisopropyl fluorophosphate and has low endopeptidase activity against small peptides in the absence of ClpA (26). Lon synthesis is regulated by htpR as part of the heat shock response (27, 28). ClpA is clearly not a heat shock protein (24), but its regulation has not been described. clpA mutants do not share the properties of lon mutants, although ClpA seems to contribute somewhat to the degradation of abnormal proteins in the absence of Lon. A comparison of these two energy-dependent proteases may give us the first opportunity to understand the essential elements of an energy-dependent protease system.
We report here the sequence of the clpA gene and its regulatory regions. We have recently found that ClpA contains two regions highly homologous to proteins from prokaryotic and eukaryotic cells; each of these domains contains a consensus sequence for a nucleotide-binding site. Sequence similarities between ClpA and Lon protease are restricted to a very short region (<50 amino acids) centered on the consensus ATP-binding motif. We have used translational fusions of LacZ to ClpA to define a fusion protein that is specifically degraded by Clp.

… determined with 5,5′-dithiobis(nitrobenzoic acid) (39). Aromatic amino acids were determined spectrophotometrically (40) by second-derivative UV spectroscopy of ClpA in 6 M guanidine hydrochloride. The molar concentration of ClpA was calculated from the tryptophan or tyrosine concentration obtained from the second-derivative UV absorbance and from the tryptophan and tyrosine content of the protein deduced from its sequence.
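The last step of the method above is a division of a measured residue concentration by the number of those residues per chain. The sketch below illustrates it, assuming the second-derivative analysis has already produced molar Trp and Tyr concentrations; the residue counts in the example are hypothetical placeholders, not ClpA's actual Trp/Tyr content, and the helper names are illustrative only. The mg/ml conversion uses the 83,875-Da mass deduced from the sequence.

```python
CLPA_MASS_DA = 83_875  # molecular mass deduced from the clpA sequence (g/mol)


def molarity_from_residue(residue_molar: float, residues_per_chain: int) -> float:
    """Protein molarity implied by a measured residue molarity (e.g. Trp)."""
    if residues_per_chain <= 0:
        raise ValueError("residues_per_chain must be positive")
    return residue_molar / residues_per_chain


def molar_to_mg_per_ml(molar: float, mass_da: float = CLPA_MASS_DA) -> float:
    """Convert mol/L to mg/ml (g/L and mg/ml are numerically equal)."""
    return molar * mass_da


# Hypothetical measurement: 10 uM Trp and 40 uM Tyr, with placeholder
# counts of 5 Trp and 20 Tyr per ClpA chain.
from_trp = molarity_from_residue(10e-6, 5)
from_tyr = molarity_from_residue(40e-6, 20)
print(from_trp, from_tyr, molar_to_mg_per_ml(from_trp))
```

Computing the estimate from Trp and from Tyr separately gives an internal check: the two values should agree if the sequence-derived residue counts are correct.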

RESULTS
Sequence of clpA. In our previous paper (24) we reported the cloning of clpA and showed that the amino acid sequence of the amino-terminal portion of ClpA, determined by protein sequence analysis, agreed with that predicted from partial sequencing of the cloned gene. The remainder of clpA has now been sequenced from two M13 clones, each carrying one of the strands of clpA. The internal primers used for sequencing are shown in Fig. 1. The entire clpA gene and the surrounding regulatory region were sequenced on both strands; the sequence is shown in Fig. 2. In this sequence, the open reading frame beginning at base pair 1 codes for a protein of 758 amino acids with a molecular mass of 83,875 daltons. This is in good agreement with the size of 81 kDa estimated for the purified protein by SDS-polyacrylamide gel electrophoresis. Table I shows a reasonable agreement between the amino acid composition determined experimentally … Ti and provided to us, are found within our predicted amino acid sequence. This agreement confirms that their purified Ti protein is in fact identical to ClpA and that the disagreement in amino acid composition probably results from an error in their earlier preparation.
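A molecular mass predicted from a sequence, like the 83,875 Da quoted above, is the sum of the residue masses plus one water for the free termini. The following is a minimal sketch of that calculation, assuming the average residue masses tabulated by ExPASy and an unmodified chain; the function name is illustrative, and this is not necessarily the program used for the value in the text.

```python
# Average amino acid residue masses in daltons (free amino acid minus water),
# as tabulated by ExPASy; one water is added back for the chain termini.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def average_mass(protein: str) -> float:
    """Average molecular mass (Da) of an unmodified polypeptide."""
    protein = protein.upper()
    unknown = set(protein) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


print(average_mass("G"), average_mass("GG"))  # glycine, glycylglycine
```

Applied to the 758-residue sequence of Fig. 2, a calculation of this kind should land close to the value given above; small differences can come from the choice of residue mass table.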
The UV absorbance spectrum of ClpA in buffer B (24) at pH 7.5 showed a maximum at 278 nm and an absorption coefficient of 0.40 ± 0.02 (mg/ml)⁻¹.
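An absorption coefficient per mg/ml can be turned into a molar coefficient by multiplying by the molecular mass, and into a concentration through the Beer-Lambert law. The sketch below assumes the 0.40 (mg/ml)⁻¹ value refers to a 1-cm path length (the text does not state the path) and uses the 83,875-Da sequence mass; the function names are hypothetical.

```python
def molar_absorption_coefficient(a_per_mg_ml: float, mass_da: float) -> float:
    """Convert an absorbance per (mg/ml) into a molar coefficient (M^-1 cm^-1).

    Assumes the mass coefficient refers to a 1-cm path length.
    """
    return a_per_mg_ml * mass_da


def concentration_mg_per_ml(absorbance: float, a_per_mg_ml: float,
                            path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (coefficient * path length)."""
    return absorbance / (a_per_mg_ml * path_cm)


eps_278 = molar_absorption_coefficient(0.40, 83_875)  # about 33,550 M^-1 cm^-1
print(eps_278, concentration_mg_per_ml(0.20, 0.40))    # A278 of 0.2 -> 0.5 mg/ml
```

The ±0.02 uncertainty in the mass coefficient propagates proportionally, so any concentration derived this way carries roughly a 5% uncertainty.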
The pI of ClpA calculated from the sequence was 6.3.
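A sequence-based pI is the pH at which the summed Henderson-Hasselbalch charges of the ionizable groups cancel. The sketch below finds that pH by bisection; it assumes an EMBOSS-style set of pKa values, which is a choice of this example rather than the table used for the 6.3 figure, and different tables can shift the result by several tenths of a pH unit.

```python
# Side-chain and terminal pKa values (EMBOSS-style defaults, an assumption here).
PKA_BASIC = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(protein: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a polypeptide at a given pH."""
    protein = protein.upper()
    positive = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    for aa, pka in PKA_BASIC.items():
        positive += protein.count(aa) / (1.0 + 10 ** (ph - pka))
    negative = 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa, pka in PKA_ACIDIC.items():
        negative += protein.count(aa) / (1.0 + 10 ** (pka - ph))
    return positive - negative


def isoelectric_point(protein: str, tol: float = 1e-4) -> float:
    """Bisection on pH; the net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(protein, mid) > 0:
            low = mid   # still positive, so the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


print(isoelectric_point("DDDD"), isoelectric_point("KKKK"))
```

Bisection is a natural fit because the net charge is a strictly decreasing function of pH, so exactly one crossing exists between pH 0 and 14 for any chain with both termini.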
The sizes previously observed for ClpA protein fragments encoded by truncated copies of the gene and by kan insertions in the gene (24) agree well with those predicted from the sequence. The NruI site used as the start point for the clpA164 deletion is located about 300 base pairs upstream of the translation start (base pairs -297 to -292). The codon usage pattern of clpA does not differ significantly from that of E. coli genes in general (data not shown).
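Positions such as "base pairs -297 to -292" follow the convention that the A of the initiating ATG is +1 and the base before it is -1, with no position 0. A small sketch of locating a restriction site under that convention follows; the toy sequence is hypothetical (not the clpA sequence), and the helper names are illustrative. NruI recognizes the palindrome TCGCGA.

```python
NRUI_SITE = "TCGCGA"  # NruI recognition sequence (palindromic)


def to_coordinate(index: int, atg_index: int) -> int:
    """Map a 0-based string index to Fig. 2-style numbering, assuming the
    A of the ATG is +1 and the preceding base is -1 (there is no 0)."""
    offset = index - atg_index
    return offset + 1 if offset >= 0 else offset


def find_site(seq: str, site: str, atg_index: int) -> list[tuple[int, int]]:
    """Return (first, last) coordinates of every occurrence of site in seq."""
    hits = []
    start = seq.find(site)
    while start != -1:
        end = start + len(site) - 1
        hits.append((to_coordinate(start, atg_index),
                     to_coordinate(end, atg_index)))
        start = seq.find(site, start + 1)
    return hits


# Hypothetical toy sequence: an NruI site, 291 filler bases, then the ATG.
toy = NRUI_SITE + "A" * 291 + "ATGAAA"
print(find_site(toy, NRUI_SITE, atg_index=297))  # [(-297, -292)]
```

Because the numbering skips 0, a 6-base site spanning -297 to -292 covers exactly six positions, which matches the NruI site length.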
Transcription of clpA. The in vivo start points of clpA transcription from both the chromosome and a plasmid-borne gene were determined by primer extension, as described under "Experimental Procedures." When chromosomal RNA was the template, three relatively strong bands were detected, predicting transcription start points within 200 base pairs of the start of translation (Fig. 3, lanes a, b, and d). All three bands increased about 20-fold when RNA from a transformant carrying multiple copies of the clpA gene was used (Fig. 3, lane e), and all were absent when the RNA was extracted from the ΔclpA164 mutant (Fig. 3, lane c). The first of the putative start sites lies at -183 from the translation start and is preceded by -10 and -35 regions that are reasonably close to consensus (underlined in Fig. 2). … stem and 4-base pair loop, consistent with a transcription termination signal (Fig. 4).

Predicted Structure for the ClpA Protein. ClpA has been shown to have ATPase activity in vitro and to interact with ClpP to activate ATP-dependent proteolysis (24, 25). ClpA would therefore be expected to have an ATP-binding site. Examination of the ClpA sequence revealed that the sequence Gly-X-Y-Gly-Val-Gly-Lys-Thr occurs twice, at amino acid residues 214-221 and 495-502 (underlined in Fig. 2). This sequence corresponds to part A of a two-part ATP-binding consensus sequence (Fig. 5). This or a closely related version of the motif is found in nearly all ATP-binding proteins examined to date (23). According to Walker, part B of the consensus consists of 3 or 4 hydrophobic residues (Φ) followed by aspartate or glutamate (Φ3-4-Asp/Glu) and lies 50-100 amino acids to the carboxyl-terminal side of part A. In ClpA, part B sequences are found at amino acid residues 281-286 and 560-564, about 60 amino acids from their respective parts A (underlined in Fig. 2). Database analyses by Chin et al. (23) indicate that parts A and B of the consensus occur only in proteins that bind ATP. The occurrence of two such consensus motifs in ClpA therefore strongly suggests that ClpA has two ATP-binding sites.
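The two-part consensus described above lends itself to a pattern search: part A as (Gly/Ala)-X4-Gly-Lys-(Thr/Ser), and part B as a hydrophobic run ending in Asp/Glu located 50-100 residues downstream. The sketch below is a simplified illustration, not the database method of Chin et al.; the set of residues treated as hydrophobic, the fixed run length of four, and the start-to-start spacing rule are all assumptions, and the test sequence is a toy, not ClpA.

```python
import re

# Part A: (Gly/Ala)-X4-Gly-Lys-(Thr/Ser). Lookaheads let overlapping hits count.
WALKER_A = re.compile(r"(?=([GA].{4}GK[TS]))")
# Part B: four hydrophobic residues, then Asp/Glu. Which residues count as
# hydrophobic is an assumption of this sketch.
WALKER_B = re.compile(r"(?=([AVILMFWC]{4}[DE]))")


def walker_pairs(protein: str, min_gap: int = 50, max_gap: int = 100):
    """Return 1-based (part A start, part B start) pairs spaced min_gap..max_gap."""
    a_sites = [m.start() for m in WALKER_A.finditer(protein)]
    b_sites = [m.start() for m in WALKER_B.finditer(protein)]
    return [(a + 1, b + 1) for a in a_sites for b in b_sites
            if min_gap <= b - a <= max_gap]


# Toy sequence (not ClpA): part A at residue 1, part B 68 residues later.
toy = "GAAGVGKT" + "S" * 60 + "LLIVD" + "S" * 10
print(walker_pairs(toy))  # [(1, 69)]
```

Requiring a correctly spaced part B is what makes the search more selective than scanning for part A alone, which is the same logic the text uses to argue for two ATP-binding sites.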
We have reported recently that the ATP-binding consensus sequences in ClpA are prominent features of two regions of the primary sequence defined by very close homology to a second E. coli Clp-like protein, ClpB, and to a group of proteins found in other bacteria, lower eukaryotes, and plants. Because each of these regions shows conservation of sequence with the corresponding regions in the different genes, and the first region is bounded by introns in the plant gene, we refer to them as domain 1 and domain 2. Domain 1 of ClpA (amino acid residues 183-415, encoded by nucleotides 547-1245) shares 54% identical and 88% similar amino acids with ClpB, and domain 2 (residues 420-609, encoded by nucleotides 1258-1827) shares 53% identical and 89% similar amino acids with ClpB. Conservation between ClpA and the Clp-like proteins of other organisms is virtually the same as that between ClpA and ClpB of E. coli. In contrast, homology between the two domains of ClpA is limited to the two relatively short regions immediately surrounding the two parts of the ATP-binding consensus sequence (Fig. 5). Domain 1 is longer than domain 2 and contains elements found in the β subunit of E. coli F1 ATPase that are absent from domain 2. Thus, although both domains probably bind ATP, the differences between them suggest that the two domains have functionally distinct roles in the enzyme.
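The residue and nucleotide ranges given for the two domains are related by simple codon arithmetic when base pair 1 is the A of the ATG: residue n occupies nucleotides 3n-2 to 3n. A minimal check of the quoted ranges, with an illustrative function name:

```python
def residues_to_nucleotides(first_res: int, last_res: int) -> tuple[int, int]:
    """Nucleotide span (base pair 1 = A of ATG) encoding residues first..last."""
    return 3 * (first_res - 1) + 1, 3 * last_res


for name, span in {"domain 1": (183, 415), "domain 2": (420, 609)}.items():
    nt = residues_to_nucleotides(*span)
    print(name, span[1] - span[0] + 1, "residues ->", nt)
# domain 1 233 residues -> (547, 1245)
# domain 2 190 residues -> (1258, 1827)
```

The output reproduces the nucleotide ranges in the text and confirms that domain 1 (233 residues) is longer than domain 2 (190 residues).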
As reported by Chin et al. (23), Lon protease has a single ATP-binding consensus sequence. Remarkably, there are no extensive sequence homologies between Lon protease and ClpA outside the narrow region around the ATP-binding consensus sequence. An alignment of the consensus sequences in ClpA, Lon protease, and several other ATPases from E. coli is shown in Fig. 5. No residues are absolutely conserved outside the core consensus, (Gly/Ala)-X4-Gly-Lys-(Thr/Ser)-spacer-Φ4-(Asp/Glu), although groups of proteins have identical amino acids at certain positions. All of the proteins have 2-3 hydrophobic amino acids within the 4 amino acids immediately preceding the first glycine/alanine in part A, and most have hydrophobic amino acids at positions 2-3, 5-6, and 8 following the threonine. Note that the basic amino acid often included in the consensus 5-8 amino acids before the first glycine (23) is absent from several of the proteins and therefore may not be a necessary feature of such a site.
More extensive analyses of the sequences of ClpA and Lon have revealed few similarities that might reflect the common enzymatic properties of the two proteases. The spacing between part A and part B is similar in both domains of ClpA and in Lon protease, but this is not unique to the proteases, since the spacing is also similar in DnaA, RecA, and NtrA. The region between the Gly-Lys-Thr and the Φ4-Asp is very basic in Lon (as it is in RecD, UvrD, and helicase) but is neutral in both domains of ClpA. Sequence alignments between ClpA and the other proteins in Fig. 5 were calculated with either BESTFIT or SEQHP (42), using regions 140-160 amino acids long centered on the ATP-binding consensus sequences. The region in domain 2 of ClpA aligns better with Lon than with the other ATPases, and better than does the corresponding region of domain 1. Domain 2 of ClpA thus appears to be evolutionarily more closely, albeit still quite distantly, related to Lon protease.
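Calling the segment between Gly-Lys-Thr and Φ4-Asp "very basic" or "neutral" amounts to comparing counts of basic and acidic residues. The sketch below is a crude illustration under stated assumptions: Lys and Arg count +1, Asp and Glu count -1, His is ignored as largely uncharged near pH 7, and the cutoff of 3 for calling a segment basic or acidic is an arbitrary illustrative choice. The segments in the example are hypothetical.

```python
def simple_net_charge(segment: str) -> int:
    """Crude net charge near neutral pH: +1 per Lys/Arg, -1 per Asp/Glu.

    His is ignored (assumed largely uncharged at pH 7).
    """
    s = segment.upper()
    return (s.count("K") + s.count("R")) - (s.count("D") + s.count("E"))


def describe(segment: str, basic_cutoff: int = 3) -> str:
    """Label a segment; the cutoff is an illustrative assumption."""
    charge = simple_net_charge(segment)
    if charge >= basic_cutoff:
        return "basic"
    if charge <= -basic_cutoff:
        return "acidic"
    return "neutral"


print(describe("KRKRAKS"), describe("KDAE"))  # basic neutral
```

A fuller treatment would reuse Henderson-Hasselbalch charges at a defined pH, but simple counting is usually enough to separate a strongly basic loop from a neutral one.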
Secondary structure predictions for the regions around the ATP-binding consensus sequences in both domains of ClpA, in RblA (a highly conserved homolog of ClpA found in Rhodopseudomonas blastica (43)), in Lon protease, and in the β-subunit of F1 ATPase are shown in Fig. 6. The ATP-binding consensus sequences would be expected to lie in structures known to form the elements of a nucleotide-binding pocket or Rossmann fold (44), the essential elements of which are β-sheet-Gly-Lys-Thr-loop-α-helix in part A and α-helix-loop-Φ4-β-sheet in part B. Domain 1 of ClpA and RblA conform reasonably well to the equivalent segments of the β-subunit of F1 ATPase (Fig. 6). Domain 2 shows some significant differences from domain 1 and more closely resembles Lon protease in predicted structure, particularly around part B, where the predicted β-sheet is very short and is followed by an α-helix terminating in a strong turn. The positions of predicted turns are quite similar in Lon and domain 2 of ClpA, which partially reflects the locations of proline residues in the primary sequence. The prolines in this region of ClpA …

Fig. 6. The PEPPLOT program (42), which uses the algorithm of Chou and Fasman to calculate α-helix and β-sheet propensities, was used to predict possible secondary structures in ClpA, RblA (a conserved homolog of ClpA from R. blastica (43)), Lon protease, and the β-subunit of F1 ATPase from their respective amino acid sequences. Regions for which α-helices and β-sheets are equally predicted are lightly shaded, and regions for which no preferred structure was predicted were assumed to be coils. Turns are indicated wherever strong predictions, or several weak predictions, of turns were made.
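Chou-Fasman predictions rest on averaging per-residue helix (Pα) and sheet (Pβ) parameters over short windows. The sketch below shows only that windowed-averaging core, using the published 1978 Chou-Fasman parameters; it is not the full nucleation-and-extension procedure implemented in PEPPLOT, and the window size, the 1.03 cutoff, and the 0.05 tie margin for "ambiguous" calls are illustrative assumptions. The function name is hypothetical.

```python
# Chou-Fasman (1978) conformational parameters: (P_alpha, P_beta).
CF = {
    "A": (1.42, 0.83), "R": (0.98, 0.93), "N": (0.67, 0.89), "D": (1.01, 0.54),
    "C": (0.70, 1.19), "Q": (1.11, 1.10), "E": (1.51, 0.37), "G": (0.57, 0.75),
    "H": (1.00, 0.87), "I": (1.08, 1.60), "L": (1.21, 1.30), "K": (1.16, 0.74),
    "M": (1.45, 1.05), "F": (1.13, 1.38), "P": (0.57, 0.55), "S": (0.77, 0.75),
    "T": (0.83, 1.19), "W": (1.08, 1.37), "Y": (0.69, 1.47), "V": (1.06, 1.70),
}


def window_calls(protein: str, window: int = 6, cutoff: float = 1.03,
                 tie: float = 0.05) -> list[tuple[int, str]]:
    """Label each window start (1-based) as helix, sheet, ambiguous or coil."""
    calls = []
    for i in range(len(protein) - window + 1):
        seg = protein[i:i + window]
        p_alpha = sum(CF[aa][0] for aa in seg) / window
        p_beta = sum(CF[aa][1] for aa in seg) / window
        if max(p_alpha, p_beta) < cutoff:
            label = "coil"        # no preferred structure
        elif abs(p_alpha - p_beta) < tie:
            label = "ambiguous"   # helix and sheet about equally favored
        else:
            label = "helix" if p_alpha > p_beta else "sheet"
        calls.append((i + 1, label))
    return calls


print(window_calls("EEEEEE"), window_calls("VVVVVV"), window_calls("GGGGGG"))
```

The "ambiguous" label mirrors the lightly shaded regions described in the legend; proline and glycine, with low values for both parameters, pull windows toward coil, which is one reason turn positions track proline locations.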
These similarities between domain 2 and Lon protease further suggest that these sites have equivalent functions in the respective enzymes.
Domain 1 of ClpA, on the other hand, has a number of features in common with the β-subunit of F1 ATPase in addition to the structural similarities mentioned above. The prolines are similarly spaced in the primary sequences, and most of the predicted turns lie at comparable positions (data not shown). Domain 1 shows a slightly better amino acid alignment with the β-subunit of F1 ATPase than does domain 2. As reported elsewhere, beyond the Φ4-Asp (part B of the consensus sequence), 2 tyrosines known to be located in the ATP-binding pocket of the β-subunit of F1 ATPase are found in the arrangement Tyr-Xs-Thr-X1,l-Tyr, at position +80 in domain 1 of ClpA and at +88 in F1 ATPase. The location of these residues in domain 1 suggests that, as with the β-subunit of F1 ATPase, ATP hydrolysis occurs at this site.
A site very similar to part B of the consensus sequence is found at about +115 in ClpA, measured from the Φ4-Asp, and at +133 in the β-subunit of F1 ATPase. It is followed by an α-helix and a region rich in basic residues. The occurrence of a second part B (Ile-Asp-Val-Ile-Asp in ClpA) in the family of ClpA-like proteins was first noted by C. Squires (personal communication). That this second part B is highly conserved in ClpA-like proteins of other bacteria and of higher organisms implies that it is important for the integrity of ClpA. We have also found it in UvrB, UvrD, DnaB, and RecD, but not in Lon protease, domain 2 of ClpA, the α-subunit of F1 ATPase, RecA, Rep helicase, or NtrA. The more extensive similarities between domain 1 of ClpA and the β-subunit of F1 ATPase underscore the differences between the two domains of ClpA and support the conclusion that they probably have functionally distinct, although interdependent, roles in the activity of Clp protease.
ClpA-LacZ Protein Fusions. Our previous work had demonstrated that clpA, in contrast to lon, is not regulated by the heat shock σ factor, htpR (24). To study changes in ClpA expression under different physiological conditions, we constructed an in-frame fusion between the 5′-terminal region of clpA (up to base pair 121 in Fig. 2) and lacZ. This fusion encodes the amino-terminal 40 amino acids of ClpA, followed by a linker joined to the 9th amino acid of β-galactosidase.
A transcriptional fusion of clpA to lacZ, carrying the same clpA fragment, was also constructed. Both fusions contain 1000 base pairs upstream of the ClpA translation start codon. Both fusions were transferred by homologous recombination to λRS45; these λ derivatives were used to construct single-copy lysogens of the clpA-lac fusions for the study of clpA synthesis and accumulation. Initial tests suggested a major difference between the expression of the translational fusion, SB84, in clpA+ and clpA− hosts: lysogens of the clpA− host were visibly more Lac+ than those of the clpA+ host. The transcriptional fusion, SB85, did not show the same clpA-dependent difference. Such a difference could reflect either ClpA-dependent translational regulation of ClpA synthesis or specific degradation of the ClpA-LacZ fusion protein by the Clp protease. Protein turnover experiments with the fusion demonstrate that the second possibility is true (Fig. 7), although translational regulation may also contribute to the difference. The ClpA-LacZ fusion is degraded with a half-life of about 4 min in clpA+ hosts but with a half-life greater than 20 min in a clpA− host. lon mutants have no effect on the accumulation of ClpA-LacZ (data not shown). Therefore, this fusion, which carries only the first 40 amino acids of ClpA, is degraded in a Clp-dependent fashion. Since anti-β-galactosidase antibody was used to detect the fusion protein in the experiment of Fig. 7, we conclude that the entire LacZ portion of the protein is degraded and that no large intermediates accumulate.

Instability of ClpA in Vivo. Given the instability of the ClpA-LacZ fusion and in vitro observations on the instability of ClpA activity, it seemed possible that ClpA is itself a substrate for Clp-dependent degradation.
The half-life of ClpA protein in vivo was determined by pulse labeling and immunoprecipitation of ClpA, followed by gel electrophoresis and autoradiography (see "Experimental Procedures"). In wild-type cells, ClpA was degraded with a t1/2 of approximately 1 h (Fig. 8A). Although this rate of degradation is not as rapid as some of the regulatory degradation that occurs in E. coli, it is sufficient to remove almost half of the protease from the cell during each generation. Moreover, ClpA was stable in vivo in a mutant lacking ClpP, the proteolytic component of Clp protease (Fig. 8A). Thus, active Clp protease is required for ClpA degradation in the cell. The half-life of ClpP was determined by the same method; ClpP is not degraded in vivo (data not shown). In experiments with purified ClpA and ClpP, excess ClpA protein was rapidly degraded in an ATP-dependent reaction that required both active ClpA and active ClpP. Thus, it is likely that ClpA levels in the cell are modulated at least in part by autodegradation of free ClpA subunits.
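Half-lives from pulse-chase experiments are commonly obtained by assuming first-order decay and fitting a straight line to the logarithm of the label remaining over time; the fraction lost to degradation in one generation then follows as 1 - 2^(-g/t1/2). The sketch below illustrates both steps on synthetic data, not the data of Fig. 7 or 8; the 50-min generation time in the example is an assumption, since the text does not state one, and the function names are illustrative.

```python
import math


def fit_half_life(times_min, fraction_remaining):
    """Least-squares fit of ln(fraction) vs time; returns t1/2 in minutes.

    Assumes simple first-order (exponential) decay of the labeled protein.
    """
    ys = [math.log(f) for f in fraction_remaining]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys))
    slope = sxy / sxx
    return math.inf if slope >= 0 else math.log(2) / -slope


def fraction_degraded_per_generation(half_life_min, generation_min):
    """Fraction of a protein pool lost to degradation in one doubling time."""
    return 1.0 - 2.0 ** (-generation_min / half_life_min)


# Synthetic chase data decaying with t1/2 = 4 min.
times = [0, 2, 4, 6, 8]
remaining = [2 ** (-t / 4) for t in times]
print(fit_half_life(times, remaining))
print(fraction_degraded_per_generation(60, 50))  # about 0.44
```

With a 60-min half-life, a generation somewhat shorter than 60 min gives a loss just under one-half, consistent with "almost half" in the text; a generation of exactly 60 min would give 0.5.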
Accumulation of ClpA in cells was measured by running equivalent amounts of extract from cells grown to low, moderate, or high density on SDS-polyacrylamide gels, followed by blotting and immunochemical detection of ClpA. The amount of ClpA per unit of cells did not appear to vary by more than 10-20% during exponential growth (Fig. 8B). The accumulation of β-galactosidase in clpA-lac fusion strains suggests that ClpA synthesis increases when the cell density reaches A600 = 0.5 (data not shown). Therefore, the combination of any changes in synthesis rate during growth with the degradation of ClpA results in a relatively constant amount of ClpA in the cell.
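The balance between synthesis and degradation described above can be made explicit with a standard first-order model, dP/dt = k_s - (k_d + μ)P, in which k_d comes from the half-life and μ is dilution by growth. The sketch below computes the steady state under that model; it is an illustrative simplification (constant rates, exponential growth), not an analysis performed in the paper, and the rates in the example are arbitrary.

```python
import math


def steady_state_level(synthesis_rate, half_life_min, generation_min):
    """Steady-state amount per unit of cells for dP/dt = k_s - (k_d + mu) P.

    k_d comes from the protein half-life and mu (dilution by growth) from the
    doubling time; pass half_life_min=math.inf for a stable protein.
    """
    k_deg = math.log(2) / half_life_min
    mu = math.log(2) / generation_min
    return synthesis_rate / (k_deg + mu)


# Arbitrary units: the same synthesis rate for an unstable and a stable protein.
print(steady_state_level(10, 60, 60), steady_state_level(10, math.inf, 60))
```

In this model the level stays constant only if any change in synthesis rate is matched by a proportional change in k_d + μ, which is one way to read the observation that ClpA amounts vary little even though synthesis appears to rise at higher cell density.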
Given the instability of both ClpA and the ClpA-LacZ fusion, it seems reasonable to suggest that the amino terminus of ClpA acts as a recognition region for Clp-dependent degradation. The degradation may be prevented when ClpA is in a proper complex with ClpP; the ClpA-LacZ fusion protein, which is presumably unable to form such a complex, may be rapidly degraded because it is always free.

DISCUSSION
The clpA structural gene and its regulatory elements have been sequenced, and the start points of transcription of the gene have been determined. The gene does not seem to be part of a larger operon; this also seems to be true of lon, the gene for another ATP-dependent protease of E. coli. clpP, the gene for the second component of the Clp protease, has been mapped by us to min 10 on the E. coli chromosome, far from clpA.
The two consensus nucleotide-binding sites of ClpA resemble the single site in Lon and in other ATPases from E. coli; explicit amino acid homologies to Lon do not extend significantly beyond these sites, except for the disposition of proline residues noted above. Nevertheless, we tentatively conclude that domain 2 of ClpA is more likely to have structural and functional similarity to Lon. Although secondary structure predictions based on sequence alone are not entirely reliable, the consistency with which the features of nucleotide-binding domains are predicted for the ATPases discussed in this paper makes it more likely that the structural agreements between Lon and domain 2 of ClpA, and between the β-subunit of F1 ATPase and domain 1 of ClpA, are significant. What the functional equivalence of domains 1 and 2 to these other proteins means in mechanistic terms is not yet clear.
ClpA possesses a basal ATPase activity and a substrate-activated ATPase activity. Preliminary results in vitro indicate that it is possible to inhibit protease activity and protease-stimulated ATPase activity without affecting basal ATPase activity. In light of the two putative ATP-binding sites identified by sequencing, basal ATPase activity may occur in one domain and protease-stimulated ATPase activity in the other. Alternatively, one domain could contain the catalytic site and the second an allosteric ATP-binding site whose occupancy is required for ATPase activity at the other site. Although both models are possible, we favor the former, since F1 ATPase and Lon protease, which sequence analysis suggests are functionally analogous to domains 1 and 2, respectively, each possess intrinsic ATPase activity. Experiments are in progress to demonstrate the binding of ATP to both sites in ClpA and to alter the activity of each of the possible active sites by site-directed mutagenesis.
The finding that clpA resembles another gene in E. coli, as well as genes in a variety of prokaryotic and eukaryotic organisms, suggests that the clpA organization may turn out to be a more generally used motif for ATP-dependent proteases than the lon organization.
The cellular function of the Clp protease is not yet clear. The widespread conservation of Clp-like protease genes may suggest that this protease performs a fairly general and central housekeeping function rather than degrading specific substrates unique to particular organisms. It is fairly unusual to find families of duplicated genes in E. coli; the other examples detected so far include the ribosomal RNA genes and tufA and tufB, essentially identical genes for translation elongation factor Tu (45). ClpA and ClpB differ in the amino-terminal 200 amino acids and in a central domain of 180 amino acids that ClpB, but not ClpA, contains. It seems possible that the diverged amino termini of these proteins reflect different targets. If the sensitivity of the fusion protein containing the first 40 amino acids of ClpA to Clp-dependent degradation is due to recognition of this amino-terminal sequence, Clp interaction with its substrates may depend on amino-terminal sequences. Recognition of the amino-terminal amino acid is one component of substrate recognition in ubiquitin-dependent degradation in eukaryotic cells (46). Since ClpA retains an amino-terminal methionine, the recognition of Clp substrates must involve sequences beyond the amino-terminal amino acid.
clpA synthesis does not increase on heat shock as lon synthesis does (24). Instead, the pattern of expression of clpA-lacZ translational fusions may suggest that accumulation increases when cells are in mid-logarithmic growth, when oxygen levels begin to fall (data not shown). The conditions for optimum ClpA synthesis, and possibly for optimum Clp activity, thus appear to be very different from the heat shock, aerobic conditions that favor the synthesis of Lon and the increased degradation of Lon substrates, and may provide some hint of the role of Clp in E. coli.