Determination of the nucleotide sequence for the exonuclease I structural gene (sbcB) of Escherichia coli K12.

The complete nucleotide sequence of the structural gene for Escherichia coli exonuclease I has been determined. The coding region corresponds to a 465-amino acid protein with a molecular weight of 53,174. The partial amino acid sequence of purified exonuclease I agrees with that predicted by the DNA sequence. Two putative weak promoters have been localized by S1 nuclease analysis. The sbcB coding sequence contains many non-optimal codons, characteristic of many poorly expressed E. coli genes.

Exonuclease I acts processively in a 3' to 5' direction, releasing mononucleotides (1). Two classes of exonuclease I-deficient mutants have been phenotypically characterized. sbcB mutations are able to indirectly suppress the deficiencies in both genetic recombination and DNA repair associated with recB and recC (exonuclease V) deficiency (2). The xonA mutation, however, is only able to suppress the sensitivity to DNA-damaging agents, with the cells remaining recombinationally deficient (3). While the biochemical nature of the suppression is still poorly understood, it has been proposed that the loss of exonuclease I activity in recB- and recC-deficient strains allows an intermediate which is normally degraded to be utilized by the RecF recombinational pathway (2).
In order to facilitate the purification and characterization of the exonuclease I protein, the sbcB gene was originally cloned on a 17-kb HindIII fragment (4) and subsequently obtained on a smaller 7.6-kb EcoRI to BamHI fragment (5). Using this plasmid a new purification for exonuclease I was developed and the enzyme was physically characterized (6). Exonuclease I has a molecular weight of 55,000 and is active as a monomer (6). It has also been estimated that there are only 40-60 copies of exonuclease I protein per cell.
In this report the complete nucleotide sequence of the exonuclease I structural gene is presented along with the transcription start site as determined by S1 nuclease analysis.
Of particular interest is the high abundance of rare codons in the coding sequence, along with a relatively poor promoter.

* This work was supported in part by United States Army Research Fellowship D-AAG29-83-G-0111 (to G. J. P.) and National Institutes of Health Grant GM27997 (to S. R. K.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBankTM/EMBL Data Bank with accession number(s)
The validity of the open reading frame was confirmed by amino acid sequencing of the first 12 amino acids. The coding sequence contains only a limited number of hydrophobic regions.

EXPERIMENTAL PROCEDURES
Bacterial Strains and Plasmids-SK4258 is C600 transformed with the runaway replication plasmid pDPK13 (5). pDPK20 is a pBR322 derivative plasmid containing the wild type sbcB gene (5). The M13 vectors mp18 and mp19 (7) were used for construction of all templates, and JM103 (8) was used as the host strain.
Materials-All enzymes were purchased from Boehringer Mannheim or New England Biolabs and used as specified by the manufacturers. Radiochemicals ([α-35S]dATP and [γ-32P]ATP) were obtained from New England Nuclear. Agarose was obtained from FMC Corp.; acrylamide, bisacrylamide, ultra-pure urea, and ammonium persulfate from Bio-Rad; TEMED from Eastman-Kodak; antibiotics from Sigma; deoxynucleotides and dideoxynucleotides, 17-mer synthetic primer, and probe-primer from P-L Biochemicals.
DNA Sequencing-The Sanger dideoxy chain termination method (9) was used to determine the DNA sequence, using the modifications recommended by Biggin et al. (10) for the use of [α-35S]dATP. Templates were generated by a combination of forced subcloning into the M13 vectors mp18 and mp19 and the exonuclease III deletion procedure of Henikoff (11).
S1 Nuclease Mapping-A modification of the Berk and Sharp (12) procedure for S1 nuclease protection was used to determine the site of transcription initiation. RNA was extracted from SK4258 by the phenol method, employing modifications described by Markham et al. (13). End labeling of the 563-bp ClaI-EcoRI DNA fragment by T4 polynucleotide kinase and isolation of the radiolabeled fragment were as described by Maniatis et al. (14). 200 µg of RNA were hybridized with 80,000 cpm of labeled DNA fragment for 16 h at 52 °C, in a solution of 80% formamide, 100 mM Pipes, pH 6.8, 400 mM NaCl, and 10 mM Na2EDTA. The hybridization mixture was diluted into a 200-µl volume of S1 digestion buffer (100 mM sodium acetate, 300 mM NaCl, and 1 mM ZnSO4), and the RNA-DNA hybrids were digested with 500 units/ml of S1 nuclease (BM) for 30 min at 37 °C. Control experiments were done by incubating the identical 5'-end-labeled DNA fragment with 200 µg of tRNA under equivalent conditions prior to S1 digestion. No nonspecific protection was detected. Following S1 digestion, the RNA-DNA hybrids were phenol-extracted, precipitated with 2 volumes of absolute ethanol, and suspended in 20 µl of loading buffer (15) before electrophoresis on an 8% polyacrylamide/urea gel. Maxam and Gilbert (15) sequencing reactions were performed on the same 563-bp DNA fragment and run parallel to the S1 protection experiment. A correction factor of 1 to 1.5 bases (16) was used in determining the base at which transcription initiated.
Nucleotide Sequence Analysis-The computer program DNASEQ (17), developed in the Department of Genetics, University of Georgia, was used for editing, translation, and homology searches. The Intelligenetics system was used for the hydropathy plot and for additional homology searches.
Amino Acid Analysis-Exonuclease I protein was purified as previously described (6). 100 picomoles of the protein was analyzed on an Applied Biosystems Amino Acid Sequencer.
Other Methods-Strand-specific hybridization probes were made from M13 mp18 and mp19 derivative templates containing the 1.2-kb EcoRI-PvuII fragment (Fig. 1). Primer extension was carried out as outlined by Hu and Messing (18). RNA dot blots were done as previously described (19).

RESULTS
Nucleotide Sequence of sbcB-We have previously reported the localization of the sbcB coding region (5), as shown in Fig. 1. The direction of transcription of sbcB was determined by hybridizing two strand-specific probes to total cellular RNA (data not shown). The probes were generated by subcloning the 1.2-kb EcoRI-SmaI segment into the M13 vectors mp18 and mp19 (7). The probes were radiolabeled by the primer extension procedure described under "Experimental Procedures."

[Fig. 2: nucleotide sequence of sbcB and its 5'-flanking region, with the predicted amino acid sequence.]

FIG. 4. Determination of the in vitro transcription start sites by S1 nuclease protection analysis. The 563-bp EcoRI-ClaI fragment, 5'-end-labeled at the EcoRI site, was hybridized to 200 µg of total cellular RNA from SK4258, an exonuclease I overproducing strain (6). The hybridization conditions, S1 nuclease treatment, and electrophoresis were performed as described under "Experimental Procedures." The S1-protected fragments were run in parallel with Maxam and Gilbert sequencing reactions (15) of the same ClaI-EcoRI fragment. The arrow indicates two potential transcription initiation sites.

The DNA sequence of 1968 nucleotides was determined from the M13 clones shown in Fig. 1. Sets of nested deletions were produced by exonuclease III digestion of the templates employing the method of Henikoff (11). Using this procedure, greater than 95% of the gene was sequenced in both directions. Fig. 2 shows a potential ribosome binding site (20) located 9 nucleotides preceding an ATG codon which begins an open reading frame of 465 codons. In addition, a potential promoter sequence was also identified by visual inspection. No other putative regulatory sequences, e.g. regions of dyad symmetry, were found upstream of the promoter region, consistent with the notion that sbcB is constitutively expressed. An additional feature of the sequence is a potential stem-loop structure which is found between positions 1428 and 1449 following the open reading frame. This structure is potentially a rho-independent termination signal (21).
Amino Acid Sequence of sbcB- Fig. 2 also shows the predicted amino acid sequence for a translational open reading frame of 1398 nucleotides. This open reading frame, which begins with an ATG initiation codon at position 25 and terminates with a TAA at position 1421, corresponds to a 465-amino acid protein with a molecular weight of 53,174. The predicted molecular weight is in close agreement with the experimentally determined value of 55,000 for the native protein (6) and 53,700 for the denatured polypeptide (5). The predicted amino acid composition of exonuclease I is shown in Table I.
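The figures quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that the quoted open reading frame length of 1398 nucleotides includes the TAA termination codon; the variable and function names are illustrative only and do not come from the DNASEQ program.

```python
# Arithmetic check on figures quoted in the text (no sequence data needed).
CODONS = 465              # amino acid codons in the open reading frame
ORF_NT = 1398             # open reading frame length quoted in the text
PREDICTED_MW = 53_174     # molecular weight predicted from the DNA sequence
NATIVE_MW = 55_000        # experimental value, native protein (6)
DENATURED_MW = 53_700     # experimental value, denatured polypeptide (5)

coding_nt = CODONS * 3                     # nucleotides that encode amino acids
stop_nt = ORF_NT - coding_nt               # remainder, assumed to be the stop codon
mean_residue_mass = PREDICTED_MW / CODONS  # average mass per residue


def percent_difference(measured, predicted):
    """Relative difference of a measured value from the predicted one, in percent."""
    return 100.0 * (measured - predicted) / predicted


print(coding_nt, stop_nt)                                        # 1395 3
print(round(mean_residue_mass, 1))                               # 114.4
print(round(percent_difference(NATIVE_MW, PREDICTED_MW), 1))     # 3.4
print(round(percent_difference(DENATURED_MW, PREDICTED_MW), 1))  # 1.0
```

Under that assumption, 465 codons account for 1395 nucleotides and the remaining 3 for the stop codon, and both experimental molecular weights lie within about 3.5% of the predicted value.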
To determine if the predicted amino acid sequence corresponded to that of the purified exonuclease I protein, the first 12 amino acid residues were determined as described under "Experimental Procedures." These residues, as underlined in Fig. 2, corresponded to the predicted amino acid sequence.
It is interesting to note that there are two ATG codons which initiate the open reading frame. Amino acid sequence analysis of the protein revealed that methionine is the amino-terminal amino acid. Presumably the first methionine is removed from the mature protein.
A hydropathy plot (22) of the exonuclease I sequence is shown in Fig. 3 and reveals several hydrophilic regions, consistent with its role as a nucleic acid specific enzyme.
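A hydropathy profile of this kind is a moving average of per-residue hydropathy values along the sequence. The sketch below assumes the Kyte-Doolittle scale and a user-chosen window; the text cites only reference 22 for the method and does not give the window used, and the analysis itself was run on the Intelligenetics system, so the function name, the window, and the toy peptide are all hypothetical.

```python
# Kyte-Doolittle hydropathy index per residue (assumed scale; positive values
# are hydrophobic, negative values hydrophilic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return the mean hydropathy of every full window along the sequence."""
    if window < 1 or window > len(protein):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical toy peptide, not the exonuclease I sequence.
toy_peptide = "MMNDGKLLVIA"
profile = hydropathy_profile(toy_peptide, window=3)

# Report 1-based start positions of windows that score as hydrophilic.
hydrophilic_starts = [i + 1 for i, score in enumerate(profile) if score < 0]
print(hydrophilic_starts)
```

Windows with a negative mean are hydrophilic on this scale, so a plot of `profile` against position gives the kind of curve described for Fig. 3.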
Mapping the Transcriptional Start Site and Promoter Characterization-In order to determine the start of transcription of the sbcB gene, S1 nuclease protection experiments were done. Total cellular RNA was prepared from SK4258, a strain of E. coli which contains the cloned sbcB gene on a runaway replication plasmid (2). Previous measurements had shown that exonuclease I activity was amplified up to 400-fold in this strain. Total RNA was hybridized to a 563-bp ClaI-EcoRI fragment, 5' terminally labeled with [γ-32P]ATP at the EcoRI site (Fig. 1). Fig. 4 shows the results of a typical S1 protection experiment run parallel to a Maxam and Gilbert sequencing ladder. After consideration of the 1-1.5-bp correction factor necessary when comparing a DNA fragment with a Maxam and Gilbert sequencing reaction (16), two potential sites of transcriptional initiation were identified. The same two protected DNA fragments were also found in experiments using increased concentrations of S1 nuclease. Two putative -10 promoter regions upstream from these transcriptional start sites can be found. These sites, designated "A" and "B", are shown in Fig. 2. Both -10 regions appear to share a common -35 region (Fig. 2); however, the spacing between -10 region "A" and -35 is 17 bases, a distance that is highly conserved among prokaryotic promoters (23).
It has been estimated from exonuclease I purification data that there are 40 to 60 molecules/cell of this protein in logarithmically growing cultures. One explanation for the poor expression of this gene is provided by an examination of the proposed promoter region. Using the method of Mulligan et al. (24) for comparing promoter strengths, homology scores of 44.4 for -10 region "A" and 34.9 for -10 region "B" were obtained. Such homology scores are considered indicative of a weak promoter. These values compare with scores of 38.2 for lacI and 51.5 for lexA, two genes known to encode regulatory proteins present in low cellular abundance.
Codon Usage-A correlation has been observed between codon usage biases in E. coli and low cellular abundance, particularly for regulatory proteins (25). Table II shows the codon usage for the sbcB structural gene. Konigsberg ... E. coli genes. The percent synonymous codon usage for these genes is also shown in Table II. For 18 of the 23 codons designated as infrequently used, the sbcB gene has a higher than expected frequency of occurrence. In addition, it is known that these 23 rare codons occur at a higher frequency in the 2 out-of-frame sites than in the reading frame for highly expressed proteins, while for poorly expressed genes their occurrence is nearly equal across the three reading frames (25). Table III shows that the distribution of the 23 rare codons in sbcB is very similar to that of the poorly expressed regulatory proteins.
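The reading-frame comparison described above amounts to counting rare codons in the true reading frame and in the two frames offset by one and two nucleotides. The following is a minimal sketch of that count; the codon set is a placeholder chosen for illustration and is not the set of 23 rare codons of reference 25, which is not listed in the text, and the function name and toy sequence are hypothetical.

```python
# Placeholder set of codons for illustration only; NOT the 23 rare codons
# designated in ref. 25.
RARE_CODONS = {"AGA", "AGG", "CGA", "CGG", "ATA", "CTA"}


def rare_codon_counts(coding_seq, rare=RARE_CODONS):
    """Count rare codons in frame 0 (the reading frame) and in frames 1 and 2.

    coding_seq is expected to start at the first base of the initiation codon.
    """
    seq = coding_seq.upper().replace("U", "T")
    counts = {}
    for frame in range(3):
        # Step through complete triplets starting at this frame offset.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts[frame] = sum(1 for codon in codons if codon in rare)
    return counts


# Hypothetical toy sequence, not taken from sbcB.
toy_seq = "ATGAGACTAGGA"
print(rare_codon_counts(toy_seq))  # {0: 2, 1: 0, 2: 1}
```

For a highly expressed gene the counts in frames 1 and 2 would be expected to exceed the count in frame 0, whereas roughly equal counts across the three frames would match the pattern the text describes for poorly expressed genes.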
Homology Comparisons-A comparison of the amino acid sequence of exonuclease I with that of the lambda exonuclease and the T7 gene 6 exonuclease revealed no significant homology. In addition, no significant homology was seen with E. coli DNA polymerase I, which possesses both 3' to 5' and 5' to 3' exonucleolytic activities (26).

DISCUSSION
We have reported the complete nucleotide sequence of the E. coli sbcB gene, encoding the enzyme exonuclease I. Characterization of the 5'-flanking sequences indicates that sbcB is constitutively expressed from two possible inefficient promoters. In addition, codon usage biases similar to those determined for low abundance regulatory proteins have been observed. Since the ribosome binding site appears more than adequate, it is not clear at this time if the low abundance of the exonuclease I protein is the result of poor transcription, poor translation, or some combination of both factors. However, placing the sbcB structural gene under the control of a T7 promoter only resulted in a %fold increase in activity* over that previously reported (6).