Structure and Expression of the alkB Gene of Escherichia coli Related to the Repair of Alkylated DNA*

When the alkB gene of Escherichia coli, which controls sensitivity of bacteria to methyl methanesulfonate, was placed under the control of the lac regulatory region on a multicopy plasmid, the gene product, AlkB protein, was overproduced. By monitoring the band on sodium dodecyl sulfate-polyacrylamide gel electrophoresis, the protein was purified to near physical homogeneity. The amino-terminal sequence and total amino acid composition of the purified AlkB protein were in accord with the amino acid sequence deduced from the nucleotide sequence of the alkB gene, determined by the phage M13 dideoxy method. It was concluded that the AlkB protein is composed of 216 amino acids and has a molecular weight of 23,900. The nucleotide sequence analysis also revealed that the ada and alkB genes are adjacent on the E. coli chromosome and that the first initiation codon for AlkB protein overlaps with the termination codon for Ada protein. We constructed hybrid plasmids carrying an alkB'-lacZ' fusion, with or without the ada control region, and investigated expression of the alkB gene in response to the alkylating agent. We obtained evidence that the ada and alkB genes constitute an operon.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number J02607.
To whom correspondence should be addressed.

The ada gene codes for a protein that possesses O6-methylguanine-DNA methyltransferase activity and also acts as a transcriptional regulator for the ada regulon (12-15).
Although the alkB gene has been cloned and its protein product identified (16), the precise role of this gene in DNA repair has remained obscure. To obtain further insight into the nature of this gene, we purified the AlkB protein and determined its primary structure. We also studied expression of the alkB gene in relation to that of the ada gene.

EXPERIMENTAL PROCEDURES
Bacterial Strains-All the bacterial strains used in this study are derivatives of E. coli K12. Strain HK82 (as AB1157 but nalA, alkB22) was constructed by transduction of nalA and alkB22 to AB1157 with P1 phage (3). Strain CSH26 (ara, Δ(lac-pro), thi) was used as the host cell for the β-galactosidase assay (17). Strain WB373 (294 but harboring mini F plasmid, pWB373) was the host cell of mWB2341 and mWB2342, used for DNA sequencing (18).
Plasmids and Phages-Plasmid pUC8 (19), used for protein overproduction, is a temperature-dependent runaway plasmid (13), which was provided by K. Yoshioka of the Research Laboratory for Genetic Information, Kyushu University. Plasmid pMC1403 (20), used for study of the alkB'-lacZ' fused gene, was obtained from H. Shinagawa of the Research Institute for Microbial Disease, Osaka University. Phage mWB2341 and mWB2342, which are derivatives of the M13 cloning vectors (18), were provided by K. Kanda of the Institute of Biological Control, Kyushu University. Plasmid pHK12, which carries the ada'-alkB region (16), was also used. Plasmids pYN3028 and pYN3059 were as described (13, 15).
Chemicals and Enzymes-MMS and o-nitrophenyl-β-D-galactopyranoside were purchased from Nakarai Chemicals, Ltd. (Kyoto). IPTG and DNase I are products of Sigma. Restriction enzymes, T4 DNA ligase, E. coli DNA polymerase I large fragment, and bacterial alkaline phosphatase were obtained from Takara Shuzo Co. Ltd. (Kyoto). Exonuclease III and nuclease Bal31 were obtained from Bethesda Research Laboratories. The reagents used for amino acid sequence determination were purchased from Beckman Instruments Inc., except for benzene and ethyl acetate, which were from Wako Pure Chemical Co. (Osaka).
Purification of AlkB Protein-E. coli strain HK82 harboring plasmid pCF3 was incubated in 2 liters of LB broth containing 50 µg/ml ampicillin in a rotary shaker at 42 °C. At As60 = 0.8, 5 X M IPTG was added and the culture was incubated at 42 °C for 3 h. Cells were harvested, washed with 100 ml of M9S medium (5.8 g of Na2HPO4, 3 g of KH2PO4, 5 g of NaCl, 1 g of NH4Cl/liter of water), and suspended in 40 ml of buffer A (20 mM Tris-HCl, pH 8.5, 5% glycerol, 1 mM EDTA, 1 mM β-mercaptoethanol). The cells were then disrupted by an ultrasonic disintegrator and the lysate was centrifuged at 12,000 × g for 1 h at 4 °C. The supernatant was used as the crude extract (Fraction I, 39 ml). Polymin P (Bethesda Research Laboratories) was diluted with distilled water to give 10% concentration and adjusted to pH 8.5 with HCl. This solution was added to Fraction I (final concentration of Polymin P, 0.5%), the mixture was stirred for 30 min at 4 °C and centrifuged at 12,000 × g for 1 h at 4 °C, and the supernatant was used as Fraction II (38 ml). An equal volume of saturated ammonium sulfate in water was added to Fraction II to remove Polymin P. After standing at 4 °C for 30 min, the precipitate was suspended in 10 ml of buffer A. This fraction was dialyzed against 2 liters of buffer A for 6 h and centrifuged twice at 12,000 × g for 30 min at 4 °C, and the supernatant was obtained (Fraction III, 14 ml). Fraction III was applied to a column of DEAE-Sephacel (bed volume, 25 ml; Pharmacia P-L Biochemicals) equilibrated with buffer A. After washing the column with 80 ml of buffer A, a linear gradient of NaCl (0-0.15 M) in 200 ml of buffer A was applied. Proteins eluted from the column were monitored by UV monitor and SDS-PAGE (12.5%), and fractions containing the 27,000-dalton protein were pooled (Fraction IV, 13.5 ml).
Fraction IV was dialyzed against 2 liters of buffer A for 6 h and centrifuged at 5,000 × g for 30 min at 4 °C, after which the supernatant was applied to a column of phosphocellulose (bed volume, 25 ml; Whatman P11) equilibrated with buffer A. After washing the column with 50 ml of buffer A, 300 ml of a linear gradient of NaCl (0-0.3 M) in buffer A was applied. The AlkB protein was monitored in the same manner as described above, and the fractions were pooled (Fraction V, 28.5 ml). Finally, gel filtration was carried out. Fraction V was concentrated to 5 ml with the use of 40% polyethylene glycol (Mr = 20,000), applied to a column of Bio-Gel P-30 (column size, 2.7 × 100 cm; Bio-Rad), and then eluted with buffer A containing 0.3 M NaCl. The fractions were monitored and pooled as described above (Fraction VI, 13.5 ml).
Amino Acid Composition and Amino-terminal Sequence Analyses-The amino acid composition was determined after hydrolysis of the samples with 5.7 N HCl for 24, 48, and 72 h at 110 °C and with 4 N methanesulfonic acid containing 0.2% tryptamine for 24 h at 110 °C. The amino acid sequence was determined by automated Edman degradation. Details regarding these procedures including the equipment used were as described (11).
Determination of Nucleotide Sequence-The DNA sequence of the alkB gene was determined by the dideoxy method (18). A 3.2-kb HindIII fragment containing the alkB gene was isolated from pYN3028 (13) and recloned into phage mWB2341 and mWB2342 replicative form DNA in both orientations. Both were treated with appropriate restriction enzymes to make two sets of deletions. Dideoxy reactions were performed using M13 sequencing kits from Takara Shuzo Co. Ltd. and following the related instruction manual. Gel electrophoresis was based on the method described by Barnes et al. (18).
Other Methods-The assay of β-galactosidase was essentially the method of Miller (17). General methods for DNA manipulations were carried out using the standard methods described by Maniatis et al. (21). Gel electrophoresis of protein was essentially the procedure of Laemmli (22).
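As an illustration of how β-galactosidase activity is commonly expressed when Miller's method is followed, the sketch below implements the standard Miller unit formula. The function name, the argument names, and the choice of absorbance wavelengths (420, 550, and 600 nm, as in Miller's usual protocol) are assumptions made for this example; they are not taken from the procedures described here.

```python
def miller_units(a420, a550, a600, minutes, volume_ml):
    """Return beta-galactosidase activity in Miller units.

    Standard form: 1000 * (A420 - 1.75 * A550) / (t * v * A600),
    where t is the reaction time in minutes and v is the culture
    volume (ml) used in the assay. All names here are illustrative.
    """
    if minutes <= 0 or volume_ml <= 0 or a600 <= 0:
        raise ValueError("time, volume, and cell density must be positive")
    # 1.75 * A550 corrects A420 for light scattering by cell debris.
    corrected_a420 = a420 - 1.75 * a550
    return 1000.0 * corrected_a420 / (minutes * volume_ml * a600)


# Hypothetical readings, chosen only to show the arithmetic.
example = miller_units(a420=0.9, a550=0.0, a600=0.5, minutes=10, volume_ml=0.1)
print(round(example))
```

With these made-up readings the function returns 1800 units; real values depend on the actual absorbance readings and assay conditions.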

RESULTS
Overproduction of AlkB Protein-Previously we constructed a hybrid plasmid pHK12 which carries the alkB gene of E. coli K12 (16). Transposon Tn3 insertion experiments revealed that the alkB gene is located in the central region of the cloned DNA fragment, as depicted in Fig. 1. Although the cloned DNA fragment contains a part of the ada gene (13), it would not be expressed because of lack of the 5'-distal region.
To overproduce the AlkB protein, the 2.9-kb SalI/HindIII fragment of pHK12 was recloned into a temperature-dependent runaway plasmid pUC8, which possesses the control region for the lac operon. The resulting plasmid, pCF3, carries the 2.9-kb fragment in proper orientation with regard to the lac promoter (Fig. 1).
When an extract of cells harboring pCF3, which were grown in medium containing IPTG at 42 °C, was analyzed by SDS-PAGE, a distinct band corresponding to a 27,000-dalton protein was detected (Fig. 2, track I). No such protein was formed in cells harboring the vector plasmid pUC8. Since the size of the overproduced protein is equal to that of AlkB protein, previously identified by a maxicell experiment (16), we concluded that this protein is the AlkB protein.
Purification of AlkB Protein-AlkB protein was purified by Polymin P and ammonium sulfate fractionation, followed by DEAE-Sephacel and phosphocellulose column chromatography and gel filtration through Bio-Gel P-30. Since the enzyme activity associated with AlkB protein has not been characterized, the 27,000-dalton protein band was monitored throughout the purification procedures. With this preparation, we measured the following enzyme activities: DNA-dependent ATPase activity (23), glycosylase activity acting on methylated DNA (6, 7), DNA-methyltransferase activity (13), protease activity degrading Ada protein (13), and ability to bind DNA containing the ada promoter region. In no case did we detect any significant activity (data not shown).

The Amino Acid Composition and Amino-terminal Sequence of the AlkB Protein-The amino acid composition of the purified AlkB protein (Fraction VI) was determined after hydrolysis with 5.7 N HCl or 4 N methanesulfonic acid. Table I shows the result of amino acid composition analysis. This protein is composed of about 220 amino acid residues.
The first 30 amino acid residues from the amino terminus were determined using an automatic sequencer and the result is shown in Table II. The unique amino terminus provides additional evidence for the physical homogeneity of the AlkB protein preparation.
Nucleotide Sequence of the alkB Gene-The nucleotide sequence of the alkB gene was determined by the dideoxy method, using M13 phage (18). Fig. 3 (Appendix) shows the strategy for the sequencing and target sites of restriction enzymes. Total length of the DNA fragment used for this study was about 3 kb, and the nucleotide sequence was determined for the left two-thirds of this DNA fragment. This 3-kb DNA fragment carried the entire ada gene, which codes for O6-methylguanine-DNA methyltransferase, in the left one-third region of the DNA fragment. Details have been described elsewhere (13).
The nucleotide sequence of the alkB gene and the amino acid sequence deduced are shown in Fig. 4. Only one open reading frame starting from nucleotide position 207 is found, which would code for the protein with a molecular weight of 23,900. Since the amino-terminal sequence and amino acid composition of the purified AlkB protein match well the amino acid sequence of the hypothetical protein predicted from the DNA sequence, the primary structure of this protein shown in Fig. 4 seems to be correct.
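The coordinates implied by these figures can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that position 207 is the first nucleotide of the initiation codon, that numbering is 1-based, and that the 216 residues include the initiating methionine; the function name is hypothetical.

```python
def orf_coordinates(start, n_residues):
    """Return (coding length in nt, last coding position, stop codon span).

    Assumes 1-based numbering and that `start` is the first nucleotide
    of the initiation codon; the stop codon is not counted as a residue.
    """
    coding_nt = 3 * n_residues
    last_coding = start + coding_nt - 1
    stop_codon = (last_coding + 1, last_coding + 3)
    return coding_nt, last_coding, stop_codon


coding_nt, last_coding, stop_codon = orf_coordinates(start=207, n_residues=216)
mean_residue_mass = 23900 / 216  # average mass per residue, in daltons

print(coding_nt, last_coding, stop_codon, round(mean_residue_mass, 1))
```

Under these assumptions the coding region spans 648 nucleotides (positions 207 to 854), the termination codon occupies positions 855 to 857, and the average residue mass is about 110.6 daltons, a typical value for a protein.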
There is a nucleotide sequence corresponding to the "Shine-Dalgarno sequence" (from position 197 to 204) and sequences for "-10" and "-35" regions of the promoter (from position 59 to 95). Since the -10 and -35 regions are separated by 25 bases, far from the 17-base distance for the ordinary E. coli promoters (24), the transcription initiation activity of this promoter, if any, would be very low. Palindromic sequences were found at the regions from position 261 to 276, and from 859 to 874.
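To make the terms used above concrete, the following sketch shows one way to test whether a DNA segment is a perfect palindrome (identical to its own reverse complement) and to compute the length of a numbered region. The helper names are hypothetical, and the test sequences are generic examples (an EcoRI site), not sequences from the alkB region; the palindromes reported here may be imperfect inverted repeats, which this simple check would not detect.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence reads the same on both strands (5' to 3')."""
    return seq.upper() == reverse_complement(seq)


def region_length(first, last):
    """Length of a region given inclusive 1-based end positions."""
    return last - first + 1


print(is_palindromic("GAATTC"))   # EcoRI site, a perfect palindrome
print(is_palindromic("GATTAC"))   # not a palindrome
print(region_length(261, 276), region_length(859, 874))
print(25 - 17)  # excess spacing between the -35 and -10 regions, in bases
```

Both palindromic regions cited in the text are 16 nucleotides long, and the putative promoter spacing exceeds the usual 17 bases by 8.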
The ada gene, the structural gene for O6-methylguanine-DNA methyltransferase, was found immediately upstream of the alkB gene. Interestingly, the first adenine of the initiation codon "ATG" of the alkB gene overlapped the last nucleotide of the termination codon of the ada gene (Fig. 5). Thus, there is the possibility that these two genes are cotranscribed, as discussed below.

Construction of the alkB'-lacZ' Fusion-To measure levels of expression of the alkB gene, we constructed the alkB'-lacZ' fused gene, the product of which can readily be assayed. Fig. 6A shows construction of plasmids carrying the fused gene. Plasmid pYN3059 carries a 1.4-kb HindIII/SmaI fragment derived from the E. coli K12 strain, which contains the entire ada gene and the beginning 150-base pair region of the alkB gene (13). Plasmid pYN3059 was digested with two restriction enzymes, HindIII and SmaI, and the excised 1.4-kb fragment was purified by electrophoresis on 0.7% agarose gel. The purified fragment was treated with E. coli DNA polymerase I large fragment to convert the HindIII cohesive ends to blunt ones. The blunt-ended 1.4-kb fragment was inserted into the SmaI site of pMC1403, which carries the lacZ'/lacY genes without its own promoter (20). Since the correct translational reading frame is conserved by this joining, a product of the alkB'-lacZ' fused gene that possesses β-galactosidase activity would be produced. The plasmid was designated pCF5. To remove the promoter region of the ada gene from the plasmid, pCF5 was digested with EcoRI and self-ligated. The resulting plasmid was designated pCF6.
Cells of E. coli strain CSH26 (ada+, Δlac) harboring one of these plasmids were incubated with or without 0.01% MMS at 37 °C, and their β-galactosidase activity was determined. As shown in Fig. 7a, a high level of β-galactosidase activity was found in cells harboring pCF5, regardless of the presence or absence of MMS during the incubation. On the other hand, only a low level of β-galactosidase activity was detected in cells harboring pCF6, in either condition. These results may be explained as follows. Because of a high copy number of the plasmids (more than 20 copies/cell), a large amount of Ada protein is produced, even in the absence of MMS. The overproduced Ada protein would act on the ada promoter to enhance further transcription of the ada gene. Since levels of β-galactosidase activity in this system are dependent on the presence of the ada promoter, it is likely that the ada and alkB genes are co-transcribed.
Control of Expression of the alkB Gene-To avoid complexity due to the self-induction of ada expression, it was necessary to construct a plasmid carrying a functional ada promoter but a defective ada coding region. For this purpose, plasmid pCF5 was partially digested with SalI, treated with the large fragment of E. coli DNA polymerase I, and then self-ligated. The resulting plasmid, pCF7, carries a 4-nucleotide insertion at the original SalI site, thereby producing a termination codon TAA 87 base pairs downstream from the initiation site (see Fig. 6B). In this plasmid, transcription would occur normally but no functional Ada protein would be formed. Thus, in CSH26 (ada+, Δlac) cells harboring pCF7, Ada protein would be supplied only from the chromosomal ada gene.

Fig. 6. A, pYN3059 was digested with HindIII and SmaI, and the 1.4-kb fragment was purified. The cohesive ends of this fragment were converted to blunt ends by the DNA polymerase I large fragment, after which the fragment was inserted into the SmaI site of plasmid pMC1403 to retain the correct reading frame. The orientation was checked by digestion with EcoRI and BamHI. B, organization of pCF5, pCF6, and pCF7. Plasmid pCF5, which carried the 1.4-kb fragment in a proper orientation, was digested with EcoRI, and pCF6 was constructed by self-ligation. To construct pCF7, pCF5 was partially digested with SalI, and the cohesive ends were converted to blunt ends and self-ligated. *, insertion of 4 bases; V, TAA termination codon formed; P:ada, promoter for the ada gene; E, EcoRI; H, HindIII; S, SalI; Sm, SmaI; B, BamHI.
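The origin of the 4-nucleotide insertion in pCF7 can be illustrated with a short sketch. SalI cuts its recognition site G^TCGAC to leave a 4-base 5' overhang (TCGA); filling in both ends and ligating them duplicates those four bases on the top strand. The function name and the flanking sequence below are hypothetical and are not taken from the ada sequence.

```python
SALI_SITE = "GTCGAC"  # SalI cuts after the first G: G^TCGAC


def fill_in_and_religate(seq, site=SALI_SITE, cut_offset=1, overhang=4):
    """Simulate cutting at the first SalI site, filling in the 5' overhangs,
    and blunt-end ligating the two ends (top strand shown, 5' to 3')."""
    index = seq.find(site)
    if index < 0:
        raise ValueError("no SalI site found")
    cut = index + cut_offset
    # After fill-in, the left end carries a copy of the overhang bases.
    left_filled = seq[:cut + overhang]
    # The right fragment still begins with the same overhang bases.
    right = seq[cut:]
    return left_filled + right


original = "AAAGTCGACTTT"          # toy sequence with one SalI site
modified = fill_in_and_religate(original)

print(modified)                      # AAAGTCGATCGACTTT
print(len(modified) - len(original)) # 4
```

Because the insertion length (4) is not a multiple of 3, the downstream reading frame is shifted, which is consistent with a new in-frame termination codon appearing shortly after the modified site.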
When this strain was used, β-galactosidase activity increased in a methylating agent-dependent manner. The activity increased with duration of incubation with 0.01% MMS, and increase in activity in the absence of MMS was nil (Fig. 7b). These results clearly indicate that expression of the alkB gene as well as of the ada gene is controlled by the ada promoter.

Fig. 7. Cells harboring pCF5, pCF6, or pCF7 were grown in M9 medium at 37 °C to 5 X 10' cells/ml, and the culture was divided into two portions, each of which was diluted with M9 medium to 1 X 10' cells/ml. One portion was untreated and the other was treated with 0.01% MMS. The cultures were then incubated at 37 °C, and aliquots were removed periodically. Total β-galactosidase activity in the culture was determined by the method of Miller (17).

DISCUSSION
Kataoka and Sekiguchi (16) have shown that introduction of a small deletion into the alkB region of the bacterial chromosome results in inactivation of both the alkB and ada genes, thereby suggesting that the two genes are adjacent on the E. coli chromosome. Teo et al. (12) later noted the presence of an open reading frame next to the ada gene and implied that it might be part of the alkB gene. Data presented in this paper support this proposal and provide further evidence that the ada and alkB genes constitute an operon. The nucleotide sequence analyses of the ada and alkB region revealed that the last nucleotide of the termination codon for ada overlaps the first nucleotide of the initiation codon for alkB. Furthermore, expression of the two genes is controlled by a single promoter, present in the upstream region of the ada gene.
Even though the expression of the two genes is co-regulated, levels of expression of these genes do differ. In cells harboring plasmids that carry the two genes, considerably larger amounts of Ada protein were formed as compared with the AlkB protein. Moreover, Nakabeppu et al. (15) showed that the maximum level of β-galactosidase activity produced by the ada'-lacZ' fused gene is about 10,000 units/A,, while the activity level owing to plasmid pCF5 was no more than 1,000 units/As, (Fig. 7). Thus, additional regulation at the level of transcription and/or translation must take place. It is conceivable that transcription from the ada promoter might terminate partially inside the alkB coding region, probably due to its secondary structure. Indeed, a palindromic sequence is present from nucleotide positions 261 to 276. If such is the case, the total amount of messenger RNA for the alkB gene would be reduced, as compared with that for the ada gene.
The previous finding that alkB mutants are specifically sensitive to MMS and lack the host cell reactivation capacity for MMS-treated phage (3) strongly suggested that the alkB gene product is involved in some step(s) in the pathway for repair of alkylated DNA. The present finding that alkB is a part of the ada operon also supports this view. Hence, using the purified AlkB protein, we measured several enzyme activities relevant to the repair of alkylated DNA. However, significant activity has so far not been detected. We also compared turn-on and turn-off patterns of the adaptive response in the mutant and wild type cells, and here too no significant difference was apparent.
We made a computer search for homology between AlkB protein and other proteins, using the Protein Sequence Database of the PIR, Release 6.0 (25, 26). There are certain low levels of homology between some parts of the AlkB protein and of proteins belonging to the oxidoreductase superfamily. For instance, there is about a 40% match between the 189th to 217th amino acid sequence of AlkB protein and the 164th to 192nd sequence of E. coli fumarate reductase. This raises the possibility that AlkB protein might function to convert alkylated bases to other forms by an oxidoreductase reaction.
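A "percent match" between two aligned segments of equal length is usually the fraction of positions carrying identical residues. The sketch below shows that calculation under the assumption of an ungapped alignment; the function names and the toy peptide strings are hypothetical and do not reproduce the AlkB or fumarate reductase sequences.

```python
def segment_length(first, last):
    """Number of residues in a segment with inclusive 1-based ends."""
    return last - first + 1


def percent_identity(seq_a, seq_b):
    """Percent of identical positions in two ungapped, equal-length segments."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Both segments cited in the text span the same number of residues.
print(segment_length(189, 217), segment_length(164, 192))

# Toy peptides (not real data) differing at 2 of 10 positions.
print(percent_identity("ACDEFGHIKL", "ACDEYGHWKL"))
```

Both cited segments are 29 residues long, so a match of about 40% would correspond to roughly 12 identical positions.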
