Sequence analysis, expression, and conservation of Escherichia coli uracil DNA glycosylase and its gene (ung).

The complete nucleotide sequence of the Escherichia coli ung gene is described. Transcription initiation and termination sites were determined by S1 nuclease and RNase mapping. The common prokaryotic -35, -10, and the ribosome binding site sequences are represented by TGTTCTGTA, TAAGCTA, and AGGAGAG at their respective locations. A putative hairpin transcription terminator structure is present at the major transcription terminator sites. The open reading frame of the ung gene codes for a protein of 229 amino acids (25,664 daltons). The molecular weight, amino acid composition, and the N-terminal amino acid sequence of the uracil DNA glycosylase purified from E. coli cells match with the open reading frame of the ung gene. The protein sequence analysis shows that the N-terminal methionine is cleaved off in the mature protein. The in vitro transcription coupled translation of the ung gene directs the synthesis of a protein which comigrates with uracil DNA glycosylase. Also, the CNBr cleavage of the protein synthesized in vitro confirms the positions of the methionines deduced from the DNA sequence. The levels of ung gene expression remain constant up to the early stationary phase, but decline in the late stationary phase of the E. coli culture. The E. coli gene showed a strong sequence homology to Shigella, a weak sequence homology to Salmonella and Citrobacter, and a very weak sequence homology to Proteus genes. No sequence homologies were seen for Pseudomonas, Clostridium, Micrococcus, and several eukaryotic genomes.

DNA glycosylases excise damaged or unconventional bases from DNA and initiate the DNA repair pathway. DNA glycosylases have been identified and purified from prokaryotic and eukaryotic sources (1, 2). These enzymes are present in all organisms examined so far with the possible exception of Drosophila, where a search for several DNA glycosylases has failed (3, 4). Recently, however, at least two DNA glycosylases, which excise oxidized thymine (5) and uracil (A. R. Morgan, personal communication) residues, have been identified in Drosophila.
Uracil DNA glycosylases are found to be the most abundant of all the glycosylases in the cell. Both a nuclear and an organellar uracil DNA glycosylase are found in eukaryotic cells (1, 2). Uracil DNA glycosylase excises uracil residues from DNA, which can arise either from misincorporation of dUMP residues by DNA polymerase or from deamination of cytosine. None of the uracil DNA glycosylases studied so far requires metal ions for activity, and both single-stranded and double-stranded DNAs containing uracil are used as substrates. However, the enzyme from most sources has higher activity with single-stranded substrates. The tetramer (dU)4 is the shortest substrate for the bacterial enzymes. The uracil DNA glycosylases are product inhibited by free uracil and a limited number of its derivatives, for example, 6-aminouracil, 5-azauracil, and 5-fluorouracil. The uracil DNA glycosylases have been shown to excise some of the analogues that are effective as inhibitors if incorporated into DNA. Another category of inhibitors of uracil DNA glycosylases is represented by proteins induced by Bacillus subtilis phage PBS2 and Escherichia coli phage T5 (6, 7). E. coli uracil DNA glycosylase has been purified to homogeneity (8). The enzyme is a single polypeptide monomer of about 25 kDa molecular mass and contains a single residue of cysteine which is not involved in the catalytic activity of the enzyme (2). The uracil DNA glycosylase (ung) gene of E. coli has also been cloned and overexpressed in E. coli (9). However, neither the amino acid sequence of the enzyme nor the nucleotide sequence of the ung gene has been reported.
* This work was supported by the Medical Research Council of Canada and the Alberta Heritage Foundation for Medical Research. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank(TM)/EMBL Data Bank with accession number(s) J03725.
In order to study the mechanism of action of uracil DNA glycosylase, the complete nucleotide sequence of the E. coli ung gene was determined and is presented in this paper. The open reading frame of the gene, confirmed by N-terminal protein sequence analysis, codes for a protein of 25,664 Da which contains a single cysteine residue towards the C terminus. Furthermore, the structure, expression, and some aspects of the ung gene conservation among other organisms are also reported.

Purification, Amino Acid Analysis, and N-terminal Sequence Analysis of Uracil DNA Glycosylase
Uracil DNA glycosylase was purified to homogeneity from E. coli BD438 exactly as described previously (8). The BD438 strain of E. coli harbors the cloned uracil DNA glycosylase (ung) gene of E. coli (9) on a plasmid (pBD15) and was kindly provided to us by Dr. B. K. Duncan of Duncan Laboratories, Philadelphia, PA. Fluorescence loss assays (slightly modified from Ref. 10) were performed to follow the activity of uracil DNA glycosylase through the various purification steps, whereas [3H]uracil release assays (8) were performed for the final purification analysis. Amino acid analysis of the pure protein was carried out on an automated Beckman 6300 amino acid analyzer following its hydrolysis in 6 N HCl at 110 °C for 24 h in evacuated, sealed tubes. Cysteine and tryptophan were not analyzed. Sequence analysis was carried out on an automated sequence analyzer (Applied Biosystems 477A).

Preparation of DNA
Genomic-Cultures of E. coli (wild type, HB101, JM109), Proteus vulgaris, Shigella sonnei, Citrobacter freundii, Salmonella typhimurium, and Pseudomonas aeruginosa were grown in LB broth (11). Cells from log phase cultures (30 ml) were harvested by centrifugation and treated with 4 mg of lysozyme in 4 ml of 50 mM Tris-HCl (pH 7.8), 20 mM Na2EDTA, 8% sucrose, 0.5% Triton X-100 (STET buffer) for 5 min at room temperature. The cells were lysed by the addition of 1% SDS and digested with proteinase K (50 µg/ml) at 60 °C for 2 h. The DNA was precipitated with 1 volume of ice-chilled isopropyl alcohol in the presence of 0.3 M sodium acetate, spooled out, washed with 70% ethanol, resuspended in 4 ml of 50 mM Tris-HCl (pH 7.8), 5 mM Na2EDTA, and digested with RNase A (50 µg/ml) at 37 °C for 30 min. The DNA was further purified by phenol/chloroform extractions, recovered by ethanol precipitations (11), and resuspended in 1 ml of 10 mM Tris-HCl, 1 mM Na2EDTA (pH 7.8).
Genomic DNAs of Clostridium perfringens and Micrococcus lysodeikticus were purchased from Sigma and further purified by phenol/chloroform extractions and ethanol precipitations. The DNA was resuspended in 10 mM Tris-HCl, 1 mM Na2EDTA (pH 7.8) at a concentration of approximately 1 mg/ml.
Plasmid DNA-Plasmid DNAs were prepared from overnight cultures of the plasmid-harboring E. coli cells using the boiling method of lysis (12) and further purified by CsCl-EtBr density gradient centrifugation (13).
Subcloning and DNA Sequence Analysis
A 1.4-kb HpaI fragment (size revised to 1.5 kb following sequence analysis) containing the ung gene (9) was isolated and subcloned into the SmaI site of a multifunctional vector, pTZ19R (Pharmacia LKB Biotechnology Inc.), in clockwise and anticlockwise orientations, giving pTZUng2 and pTZUng4, respectively. Further constructs used in DNA sequence analysis and in vitro transcription coupled translation studies are detailed in Fig. 5. Standard recombinant DNA techniques were used throughout (11).
DNA sequence was obtained from both strands of DNA by a modified enzymatic chain termination method (14) and chemical degradation method (15).

Synthesis of RNA Probes
RNA probes were prepared by carrying out in vitro transcriptions in the presence of [α-32P]UTP. The vector used in these studies, pTZ19R, contains a T7 phage promoter. The transcription reactions (16) were performed with some modifications as follows. About 1 µg of the linearized plasmid DNA was incubated in a 25-µl reaction consisting of 40 mM Tris-HCl (pH 7.5), 6 mM MgCl2, 2 mM spermidine, 10 mM dithiothreitol, 20 units of RNA guard (Pharmacia), 500 µM each of ATP, GTP, and CTP, 100 µM UTP, 50 µCi of [α-32P]UTP (Amersham Corp., specific activity approximately 400 Ci/mmol), and 7 units of T7 RNA polymerase (Pharmacia) at 38 °C for 1 h. The DNA template was digested by incubating with 10 units of RNase-free DNase I (Pharmacia) at 37 °C for 10 min. The reaction was then treated with proteinase K (50 µg/ml) in the presence of 0.5% SDS at 45 °C for 30 min, extracted once with phenol, and fractionated on a G-75 column to remove the unincorporated nucleoside triphosphates. The RNA probe was recovered by ethanol precipitation in the presence of carrier yeast total RNA (50 µg/ml). The specific activity of the probe was approximately 5 X 10' cpm/µg.
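The reaction composition above implies that the labeled UTP is diluted by a large excess of unlabeled UTP. The following is a minimal sketch of that arithmetic, using only the volume, concentration, and activities quoted in the text. The function name `utp_pool` is hypothetical, and the sketch assumes that the labeled UTP adds negligible volume and that the quoted 400 Ci/mmol applies exactly; it is an illustration, not part of the published protocol.

```python
def utp_pool(volume_ul, cold_utp_um, label_uci, label_ci_per_mmol):
    """Return (cold_nmol, hot_nmol, pool specific activity in Ci/mmol)."""
    # Unlabeled UTP: micromolar x microliters = picomoles; /1000 gives nanomoles.
    cold_nmol = cold_utp_um * volume_ul / 1000.0
    # Labeled UTP: microcuries / (Ci/mmol) = 1e-6 mmol = nanomoles.
    hot_nmol = label_uci / label_ci_per_mmol
    # microcuries per nanomole is numerically the same as Ci per mmol.
    pool_ci_per_mmol = label_uci / (cold_nmol + hot_nmol)
    return cold_nmol, hot_nmol, pool_ci_per_mmol


# Values quoted in the text: 25-µl reaction, 100 µM UTP, 50 µCi at ~400 Ci/mmol.
cold, hot, pool = utp_pool(25, 100, 50, 400)
print(f"unlabeled UTP: {cold:.3f} nmol, labeled UTP: {hot:.3f} nmol")
print(f"specific activity of the UTP pool: {pool:.1f} Ci/mmol")
```

Under these assumptions the labeled nucleotide is roughly one-twentieth of the total UTP, so the pool specific activity is far below that of the stock label.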

Southern Blot Analysis
Genomic DNAs (0.5 µg each) were cleaved with restriction endonucleases according to the suppliers' recommendations and electrophoresed on a horizontal, submerged 0.9% agarose gel using TAE electrophoresis buffer (40 mM Tris-HCl, 2 mM acetic acid, 2 mM Na2EDTA, 1 µg/ml EtBr, pH 8.1). The DNA was transferred onto a nylon membrane (Zeta probe, Bio-Rad) using 0.4 M NaOH as the transfer buffer (17).
Hybridizations were performed essentially as described previously (18) except that the 10 X Denhardt's solution was replaced with 1% Carnation low fat dairy milk powder during prehybridization and with 0.5% during hybridization and the first post-hybridization wash. The SDS concentration was increased to 0.5% throughout, and the hybridization and the washings were carried out at 68 °C in the absence of any formamide. The probe for hybridization was derived by in vitro transcription of DraI-linearized pTZUng4S (Fig. 5).
The abbreviations used are: SDS-PAGE, sodium dodecyl sulfate-polyacrylamide gel electrophoresis; kb, kilobase pair; PIPES, 1,4-piperazinediethanesulfonic acid.
Preparation of RNA
RNA was prepared from 10-ml cultures at different times during the growth cycle as total nucleic acids. Cultures were chilled in ice and centrifuged at 5000 X g for 5 min. The pellet was resuspended in 0.1 ml of the STET buffer and quickly lysed by the addition of 0.4 ml of 7 M urea, 2% SDS, 10 mM Tris-HCl (pH 7.5), 0.15 M NaCl, and 1 mM Na2EDTA at 60 °C. The total nucleic acids were then vigorously extracted three times with H2O-saturated phenol at 60 °C, followed by two to three extractions with chloroform/isoamyl alcohol (24:1). The nucleic acids were ethanol precipitated, washed with 70% ethanol, and resuspended in 0.5 ml of H2O.

S1 Nuclease Mapping
Transcription Initiation Site-Plasmid pTZUng4B was digested with BamHI and 5'-end labeled with T4 polynucleotide kinase (19) and [γ-32P]ATP (Du Pont-New England Nuclear, specific activity approximately 7000 Ci/mmol). Secondary digestion with DraI yielded a 0.38-kb fragment (DraI-BamHI, see Fig. 2A), which was eluted from a 5% polyacrylamide gel by the diffusion method (11). This fragment, 5'-end labeled on the antisense strand, contained the 5'-flanking region, promoter region, and part of the coding region sequences. The labeled fragment was ethanol precipitated with carrier yeast total RNA and used as a probe for S1 nuclease mapping to determine the transcription initiation site (20). Aliquots (250,000 cpm) of this probe were lyophilized with 50 µg each of the total nucleic acids extracted from log phase cultures of E. coli HB101 (wild type for the ung gene) and E. coli JM109 harboring the plasmid pTZUng2.
The nucleic acids were resuspended in 25 µl of hybridization buffer (80% formamide, 400 mM NaCl, 40 mM PIPES, 1 mM Na2EDTA, pH 6.5), overlaid with mineral oil, denatured at 85 °C for 5 min, and hybridized at 49 °C for about 16 h. At the end of hybridization, the reaction was diluted with 400 µl of ice-chilled S1 nuclease buffer (300 mM NaCl, 30 mM sodium acetate, pH 4.6, 1 mM ZnSO4). Aliquots (200 µl) were digested with 250 or 500 units of S1 nuclease (Pharmacia) at 37 °C for 45 min. The reaction was stopped by phenol/chloroform/isoamyl alcohol (25:24:1, v/v) extraction, and the nucleic acids were recovered by precipitation with 2.5 volumes of ethanol in the presence of carrier yeast total RNA (50 µg/ml) and resuspended in 10 µl of sequencing dye. Aliquots (5 µl) were analyzed on 6% polyacrylamide, 8 M urea gels. Chemical sequencing reactions (A + G and C + T) of the probe were used as markers (15).
Transcription Termination Site-Plasmid pTZUng2B was digested with BamHI and 3'-end labeled by using T4 DNA polymerase (Pharmacia) and [α-32P]dCTP (Amersham, specific activity approximately 3000 Ci/mmol). Secondary digestion with EcoRI resulted in a 0.75-kb fragment which was 3'-end labeled on the antisense DNA strand and contained most of the coding region and 3'-flanking region sequences. This fragment was purified from a polyacrylamide gel and used for S1 nuclease mapping as described above except that the hybridization was performed at 56 °C. Markers were obtained by 3'-end labeling of MspI-digested pBR322 using T4 DNA polymerase and [α-32P]dCTP.

RNase Mapping
An antisense RNA probe for the RNase protection experiments was derived by in vitro transcription of pTZUng4 linearized with EcoRI. About 250,000 cpm of the probe were lyophilized with 25 µg of total nucleic acids, resuspended in 15 µl of hybridization buffer, and overlaid with mineral oil as above. The nucleic acids were hybridized at 56 °C for 16 h without any prior denaturation step. After hybridization, the reaction was diluted with 200 µl of 30 mM Tris-HCl (pH 8.0), 375 mM NaCl, 2 mM Na2EDTA and digested with 25 µg of RNase A and 25 units of RNase T1 at 34 °C for 1 h. The contents were then digested with proteinase K (50 µg/ml) for 30 min at 45 °C in the presence of 0.5% SDS, extracted with phenol, and precipitated with 2.5 volumes of ethanol. Nucleic acids were resuspended in 10 µl of an 80% formamide dye mixture, and 5-µl aliquots were analyzed on 6% polyacrylamide, 8 M urea gels. MspI- or XhoII-digested pBR322, 3'-end labeled using T4 DNA polymerase and [α-32P]dCTP, was used as size markers.

In Vitro Transcription Coupled Translations
The S-30 extracts of E. coli (21, 22) were purchased from Amersham. Reactions (5 µl) containing 0.5 µg of circular or linearized plasmid DNA were carried out in the presence of [35S]methionine according to the supplier's recommendations. The reaction was then diluted to 25 µl with H2O, and 2-µl aliquots were analyzed on 15% SDS-polyacrylamide gels (SDS-PAGE; Ref. 23). Proteins were detected by autoradiography with or without fluorography (24).
For CNBr digestion, a 25-µl reaction containing 2.5 µg of circular plasmid pTZUng2 was carried out in the presence of 60 µCi of [3H]Leu. At the end, the reaction was passed through a 0.5-ml hydroxylapatite (Bio-Rad) column, using 12 mM Pi, 200 mM KCl, 1 mM dithiothreitol (pH 7.4) buffer. The flow through was mixed with 50 µg of carrier bovine serum albumin and dialyzed against 10 mM Tris-HCl (pH 7.5), 60 mM NaCl, 1 mM Na2EDTA. Aliquots containing about 5 µg of bovine serum albumin were lyophilized and resuspended in 50 µl of 70% formic acid containing 20 mM CNBr (25). The reaction was incubated at room temperature for 24 h. The contents were recovered by freeze-drying and analyzed by 17% SDS-PAGE (23). The gels were fluorographed by using Amplify (Amersham) and autoradiographed using Kodak X-AR films at -70 °C.

Purification of Uracil DNA Glycosylase
Uracil DNA glycosylase was purified (approximately 2500-fold) to homogeneity from an overproducing strain (BD438) The amino acid analysis of these two bands following their electroelution from the gel was found to be identical (data not shown). It is not clear at present why the purified protein occasionally runs as a doublet.
Structure and Expression of E. coli ung Gene
DNA Sequence Analysis-The restriction map of the HpaI fragment containing the E. coli ung gene (9) is presented in Fig. 2A. The DNA sequence was determined from both strands as described under "Materials and Methods" and is shown in Fig. 2B. Only one open reading frame was found which could code for a polypeptide of the molecular weight corresponding to that of the purified enzyme. The reading frame starts with an ATG (Met) at position 533 and ends at position 1,219, followed by the termination codon TAA, for a total of 229 amino acids, giving a protein of the expected molecular weight of 25,664. The initiation codon is preceded by a polypurine sequence 5'-AGGAGAG-3', starting at position 522 and ending at 528; this sequence, known as the Shine-Dalgarno sequence, is a common feature of most prokaryotic genes (27) and represents the ribosome binding site in mRNA. Sequences homologous to the -10 and -35 prokaryotic promoter regions (28) are also seen in the ung gene and are shown in  observations strongly suggest that the putative stem and loop structure may represent the transcriptional terminator.
If these analyses based on the DNA sequence are correct, S1 nuclease protection mapping must locate the transcription initiation site upstream but near to the ribosome binding site; also the transcription terminator site should be very near to the putative stem and loop structure found in the 3' region of the sequence.
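The open reading frame coordinates quoted above can be checked for internal consistency with simple arithmetic. The sketch below uses only the positions, the Shine-Dalgarno sequence, and the molecular weight given in the text; the function name `orf_check` and the dictionary keys are hypothetical names introduced for illustration.

```python
def orf_check(orf_start, orf_end, sd_start, sd_end, sd_seq, protein_da):
    """Check ORF coordinates against the reported codon count and SD site."""
    coding_nt = orf_end - orf_start + 1          # inclusive coordinates
    codons, leftover = divmod(coding_nt, 3)      # leftover must be 0 for an ORF
    return {
        "coding_nt": coding_nt,
        "codons": codons,
        "in_frame": leftover == 0,
        # The SD coordinates should span exactly the quoted sequence.
        "sd_length_ok": (sd_end - sd_start + 1) == len(sd_seq),
        # Nucleotides between the end of the SD sequence and the ATG.
        "sd_to_atg_spacer_nt": orf_start - sd_end - 1,
        # Mean residue mass implied by the reported molecular weight.
        "mean_residue_da": protein_da / codons,
    }


# Positions from the text: ATG at 533, last coding nucleotide at 1219,
# Shine-Dalgarno sequence AGGAGAG at 522-528, protein of 25,664 Da.
result = orf_check(533, 1219, 522, 528, "AGGAGAG", 25664)
for key, value in result.items():
    print(key, value)
```

The 687 coding nucleotides divide evenly into 229 codons, in agreement with the reported protein length, and the implied mean residue mass of about 112 Da is in the usual range for proteins.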
Transcription Initiation Site-The DraI-BamHI DNA fragment (Fig. 2A), 5'-end labeled at the BamHI site, was prepared as described under "Materials and Methods." The end-labeled DNA strand represents the antisense sequence between positions 343 and 725 of Fig. 2B. This probe, containing the 5'-flanking and part of the coding region sequences, was hybridized to the ung mRNA and digested with S1 nuclease. The length of the protected DNA fragment was determined by electrophoresis on a 6% polyacrylamide, 8 M urea gel (Fig. 3A). Sequence markers G + A and C + T obtained by chemical degradation (15) of two separate aliquots of the probe were also run on the same gel. S1 nuclease protections were performed by using total nucleic acids from E. coli HB101 (wild type for ung) (Fig. 3A, lane 2), E. coli JM109 containing pTZUng2 (Fig. 3A, lane 3), and mouse LMTK- cells (Fig. 3A, lane 4). The major protected bands in lanes 2 and 3 (Fig. 3A) comigrate and their lengths correspond to an Ado in the sequence shown in Fig. 3A. This Ado in turn corresponds to the nucleotide Thd at position 517 of Fig. 2B. As expected, no protection of the probe was seen in lane 4 (Fig. 3A), where total nucleic acids from a heterologous source were used (mouse LMTK- cells; Ref. 18). The intensity of the protected band in lane 3 is greater than that of lane 2 (Fig. 3A); this is expected because pTZUng2 replicates as a multicopy plasmid and, therefore, the bacteria harboring this plasmid will have higher levels of ung mRNA than the wild type bacteria. In another control (lane 1, Fig. 3A) where S1 nuclease was not added, no endogenous degradation of the probe is seen.

FIG. 3. S1 nuclease protections. A, determination of the transcription initiation site. The 0.38-kb DraI-BamHI DNA fragment, 5'-end labeled at the BamHI site of the antisense strand, was hybridized to the total nucleic acids and digested with 250 units of S1 nuclease. The S1 nuclease protected material was recovered by ethanol precipitation and analyzed on sequencing gels. Lanes A + G and C + T represent sequencing markers from two separate aliquots of the probe. Results of S1 nuclease protection using total nucleic acids from E. coli HB101, wild type for ung (lane 2), E. coli JM109 harboring pTZUng2 (lane 3), and a heterologous source, mouse LMTK- cells (lane 4), are shown. Lane 1 is a control where the treatment was the same as in lane 2, except that it was not digested with S1 nuclease. The A marked with an asterisk represents the limit of the protected fragment. This corresponds to nucleotide T at position 517 in Fig. 2B. B, determination of the transcription termination site. A BamHI to EcoRI fragment (see pTZUng2, Fig. 5), 3'-end labeled at the BamHI site of the antisense strand, was used as a probe. Details and descriptions of lanes 1-4 are the same as in A. M represents markers obtained by 3'-end labeling of MspI-digested pBR322 fragments with [α-32P]dCTP and T4 DNA polymerase. Sizes of the fragments (in nucleotides) are as shown.

structure. The amino acid sequence of the uracil DNA glycosylase enzyme, deduced from the gene, is shown below its nucleotide sequence. A horizontal arrow (position 517) indicates the transcription initiation site, whereas the vertical arrows around nucleotide 1260 show transcription termination sites.
These results are in agreement with the predictions from the DNA sequence analysis and further confirm the assignment of the -10 and -35 regions of the ung gene (Fig. 2B).
Determination of the 3' Termini of the ung mRNA-The 3'-end of ung mRNA was also mapped by S1 nuclease protection. The antisense strand of the BamHI-EcoRI fragment was 3'-end labeled at the BamHI site and used as a probe. This probe, containing the coding region, the 3'-flanking region sequences, and a few nucleotides of the multiple cloning site of the vector (see Fig. 5, construct pTZUng2), was prepared and hybridized as described under "Materials and Methods." The lengths of the protected fragments represent the 3' termini of the mRNA. Fragments in the major region of protection are about 530-535 nucleotides long (Fig. 3B), placing the 3'-end of the ung mRNA about 530 nucleotides downstream of the BamHI site. This corresponds to the region around position 1260, marked with vertical arrows in Fig. 2B.
Unlike the S1 nuclease mapping for the 5' analysis, where a fragment of fixed length represents the major transcriptional initiation site, the region of protection in the 3' analysis consists of a number of fragments varying in size from approximately 530 to 535 nucleotides. This suggests that the 3'-end of the ung mRNA is somewhat heterogeneous. Also, after prolonged autoradiography, additional bands longer than 600 nucleotides were seen, suggesting some degree of run-through transcription.
RNase Protection-From S1 nuclease protection experiments, transcription initiation and termination sites are placed at positions 517 and around 1260, respectively. This suggests that the total size of the ung mRNA is approximately 0.74 kb. This was confirmed by RNase protection experiments. An antisense RNA probe encompassing the whole sequence (Fig. 2B) was synthesized using T7 RNA polymerase by in vitro transcription of EcoRI-linearized pTZUng4. Following its hybridization to the ung mRNA, it was digested with a mixture of RNase A and RNase T1. The size of the protected RNA probe, determined on a denaturing 6% polyacrylamide, 8 M urea gel, corresponds to about 775 nucleotides as compared to the DNA markers (Fig. 4). However, since RNA migrates slower than DNA of the same size, the size of the protected RNA probe actually corresponds to about 0.74 kb, as predicted from the S1 nuclease protection experiments. This experiment also demonstrated the relative abundance of the ung mRNA at different stages of cell culture growth (2-11 h). The amount of ung mRNA in the cells remains constant during the log phase and up to the early stationary phase at approximately 9 h (Fig. 4). A decline in the amount of ung mRNA late in the stationary phase is likely due to cell death. The minor amounts of ung mRNA detected in an ung- strain of E. coli during its log phase (lane ung- (5 h)) correspond to the low levels of uracil DNA glycosylase activity detected in the cellular extracts of this mutant (data not shown). As expected, no protection of the RNA probe was seen when total nucleic acids from a heterologous source, the mouse LMTK- cell line, were used (Fig. 4, lane LMTK-).
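The transcript size quoted above follows directly from the mapped end points. As a rough sketch, the calculation below combines the initiation site (517), the approximate termination site (1260), and the open reading frame coordinates (533-1219, followed by a 3-nucleotide stop codon). The function name `transcript_layout` is hypothetical, and because the 3' ends are heterogeneous, the termination position and everything derived from it should be read as approximate.

```python
def transcript_layout(tss, term, orf_start, orf_end, stop_codon_nt=3):
    """Return (transcript length, 5' leader length, 3' trailer length) in nt."""
    length = term - tss + 1                        # inclusive coordinates
    leader = orf_start - tss                       # nt upstream of the ATG
    trailer = term - (orf_end + stop_codon_nt)     # nt after the stop codon
    return length, leader, trailer


# Positions from the text; 1260 is the approximate center of the 3' ends.
length, leader, trailer = transcript_layout(517, 1260, 533, 1219)
print(f"transcript: ~{length} nt (~{length / 1000:.2f} kb)")
print(f"5' leader: {leader} nt, 3' trailer: ~{trailer} nt")
```

The leader, coding region, stop codon, and trailer add up to the transcript length, which matches the approximately 0.74 kb estimated from the S1 nuclease mapping.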

In Vitro Transcription Coupled Translations
These studies confirm the conclusions made from the sequence analysis of the ung gene and show that the product of the ung gene synthesized in vitro corresponds to the uracil DNA glycosylase purified from E. coli. Various constructs of the plasmids used for these studies are outlined in  and visualized by autoradiography. Fig. 6 shows the results of these analyses. When S-30 extracts are incubated in the absence of any DNA (lanes 7 and 9), only one major polypeptide of approximately 60 kDa is seen. This polypeptide seems to be a result of endogenous labeling (21). In another control, pTZUng4B linearized with restriction endonuclease DraI was incubated with the extract. Even though linearization of this plasmid with DraI leaves it devoid of any functional gene (Fig. 5), synthesis of several polypeptides of molecular weight lower than 21,500 is seen (lane 6, Fig. 6). This control, therefore, suggests that these bands are of nonspecific origin in the different reactions. If, however, the S-30 extracts are supplemented with circular plasmids containing the ung gene, e.g. pTZUng2 (lanes 1 or 13), pTZUng4 (lane 3), or pBD15 (lane 11), synthesis of several new proteins of sizes between the 21.5- and 45-kDa markers is seen. Of these, the band marked with an arrow represents the ung gene product. Since these plasmids contain only one other functional gene, β-lactamase (AmpR), the other bands may represent the β-lactamase gene products. Moreover, when the plasmids pTZUng2 and pTZUng4 are linearized with DraI, which cleaves the β-lactamase gene (Fig. 5), synthesis of the putative β-lactamase gene products is eliminated, while the ung product is still synthesized (lanes 2 and 4), albeit at a lower rate (see Ref. 21). In another set of reactions performed to further confirm the identification of these bands, protein synthesis was also directed by a control (circular) plasmid, pAT153 (a derivative of pBR322; Ref. 11), which possesses the β-lactamase gene but is devoid of the ung gene. Purified uracil DNA glycosylase and bovine chymotrypsinogen A were co-electrophoresed with these reactions. The products of pAT153 (lane 10) comigrate with the putative β-lactamase gene products of pTZUng2 (lane 13), and the ung gene product (indicated by an arrow) comigrates with the purified uracil DNA glycosylase (lane 14) and bovine chymotrypsinogen A (lane 15). Therefore, the ung gene directs the synthesis of a protein under in vitro conditions which corresponds to the uracil DNA glycosylase purified from the E. coli cells.

FIG. 5. Subcloning of the ung gene and various plasmid constructs. The HpaI fragment shown on top was isolated from plasmid pBD15 (9) and subcloned into the SmaI site of pTZ19R in both orientations, clockwise pTZUng2 and anticlockwise pTZUng4. Further constructions were done as follows: pTZUng2B and pTZUng4B were obtained by a BamHI digestion, followed by religation of their parent plasmids pTZUng2 and pTZUng4, respectively. Similarly, pTZUng2S and pTZUng4S were constructed by an SphI digestion, followed by religation of the parent plasmids. Important restriction sites are shown. Blocked arrows indicate the location of different genes: e.g. uracil DNA glycosylase (ung), β-lactamase (β-lact or AmpR), and β-galactosidase Z' peptide

Each lane is labeled with the plasmid used to direct protein synthesis using S-30 extracts. Plasmid names, when followed by (L), indicate that these plasmids were first linearized with restriction endonuclease DraI and then used for directing protein synthesis. Lanes 1-7, 9-13, 17, and 18 were visualized following autoradiography. The contents of lanes 8-16 were electrophoresed on the same gel, and the visualization of lanes 8, 14, 15, and 16 was done by Coomassie Blue staining. Since the translation products could only be visualized by autoradiography, the autoradiograph and the Coomassie Blue-stained gel were carefully aligned with the help of markers on both sides (lanes 8 and 16) to arrive at a composite picture (lanes 8-16). Description of lanes 14 and 15 is the same as for lanes 1 and 3, respectively,

passing the whole HpaI fragment sequence was prepared by in vitro transcription using EcoRI-linearized pTZUng4 and hybridized to total nucleic acids extracted from E. coli HB101 (wild type for ung) at different times (2.5-11 h) or from an ung- strain of E. coli at 5 h. Hybrids were treated with RNase A and T1 and analyzed on sequence analysis gels. LMTK- shows a control where total nucleic acids from LMTK- cells were used. Lane C (-RNase) is another control where the treatment was the same as in the lane marked 5 h except that
Protein synthesis directed by DraI-linearized and circular pTZUng2B (shown in lanes 5 and 12 (Fig. 6)) shows a specific polypeptide of approximately 22 kDa (marked with a solid triangle). It is evident upon close examination that during the construction of this plasmid (Fig. 5), a hybrid gene is created which codes for the N-terminal region (first 25 amino acids) of the β-galactosidase Z' peptide and the remainder of the ung protein (amino acid positions 65-229 of Fig. 2B). The size of the protein coded by this hybrid gene, using the β-galactosidase promoter (see Fig. 5), is expected to be approximately 4 kDa shorter than the ung protein, as was observed.
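The expected size of the hybrid protein can be estimated from the residue counts given above. The sketch below is a rough estimate only: it assumes that the mean residue mass of the ung protein (25,664 Da over 229 residues) also applies to the 25 residues contributed by the β-galactosidase Z' peptide, which the text does not state. The function name `hybrid_size` is hypothetical.

```python
def hybrid_size(n_leader, ung_first, ung_last, ung_len, ung_da):
    """Estimate residue count and mass (kDa) of the Z'-ung hybrid protein."""
    mean_residue_da = ung_da / ung_len               # ~112 Da per residue
    ung_part = ung_last - ung_first + 1              # inclusive residue range
    residues = n_leader + ung_part
    hybrid_kda = residues * mean_residue_da / 1000.0
    shortfall_kda = (ung_len - residues) * mean_residue_da / 1000.0
    return residues, hybrid_kda, shortfall_kda


# From the text: 25 residues of the Z' peptide fused to ung residues 65-229.
residues, hybrid_kda, shortfall_kda = hybrid_size(25, 65, 229, 229, 25664)
print(f"hybrid protein: {residues} residues, ~{hybrid_kda:.1f} kDa")
print(f"shorter than the ung protein by ~{shortfall_kda:.1f} kDa")
```

Under this assumption the hybrid is 39 residues shorter than the full-length protein, which is in line with the approximately 22-kDa band and the roughly 4-kDa difference described in the text.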

CNBr Digestion of the in Vitro Synthesized ung Gene Product
DNA sequence analysis predicts the presence of 3 methionine residues at positions 1, 92, and 221. Cleavage of the uracil DNA glycosylase with CNBr should result in two major polypeptides of approximately 10 and 14 kDa. The in vitro synthesized (in the presence of [3H]Leu) ung gene product from plasmid pTZUng2 was purified on a hydroxylapatite column. All of the in vitro synthesized radioactive proteins remain bound to the column except the 3H-labeled ung (see lane 17, Fig. 6), which is collected in the flow through. Upon digestion of this protein with CNBr, two major peptides, 14 and 10 kDa, are seen (lane 18, Fig. 6), confirming the positions of the methionine residues in the sequence. The 10-kDa peptide is seen to run as a doublet, which might relate to the doublet band migration of the ung protein itself (Fig. 1B). A partial digestion is not expected to result in a doublet of 10 kDa.
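The predicted CNBr fragments can be derived from the methionine positions, since CNBr cleaves the peptide bond on the C-terminal side of methionine. The following sketch lists the fragments for a 229-residue protein with methionines at positions 1, 92, and 221. Fragment masses are rough estimates that assume the mean residue mass of the whole protein (25,664 Da over 229 residues) and ignore the chemical conversion of methionine during cleavage; the function name `cnbr_fragments` is hypothetical.

```python
def cnbr_fragments(length, met_positions):
    """Return (start, end) residue ranges produced by complete CNBr cleavage."""
    fragments = []
    start = 1
    for met in sorted(met_positions):
        fragments.append((start, met))   # cleavage occurs after each Met
        start = met + 1
    if start <= length:
        fragments.append((start, length))
    return fragments


MEAN_RESIDUE_DA = 25664 / 229  # mean residue mass of the ung protein

fragments = cnbr_fragments(229, [1, 92, 221])
for first, last in fragments:
    n = last - first + 1
    print(f"residues {first}-{last}: {n} aa, ~{n * MEAN_RESIDUE_DA / 1000:.1f} kDa")
```

Only the two internal fragments (residues 2-92 and 93-221) are large enough to be resolved on the gel, and their estimated sizes agree with the 10- and 14-kDa peptides reported above.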

Amino Acid Composition and N-terminal Sequence Analysis of the Uracil DNA Glycosylase
Amino acid composition of the purified protein was determined by acid hydrolysis of the purified protein (Table I). Also shown are the amino acid compositions of uracil DNA glycosylase determined in an earlier study (8) and the amino acid composition expected from the gene sequence. The values obtained by acid hydrolysis of the protein agree with those determined from the DNA sequence, supporting the assignment of the open reading frame of the ung gene.
The amino acid sequence of 31 N-terminal amino acid residues is shown in Fig. 7. The first amino acid residue was Ala (which is the second amino acid from the DNA sequence analysis, the first being Met). This suggests that in the mature protein Met is cleaved off. The remaining amino acid sequence matches completely with the sequence deduced from the DNA sequence, confirming the start of the open reading frame.

Conservation of E. coli Uracil DNA Glycosylase Gene Sequence
Uracil DNA glycosylase is an enzyme of wide occurrence among prokaryotic and eukaryotic organisms. Therefore, the sequence relatedness of the E. coli gene to various bacterial and eukaryotic genes was determined. Genomic blots of EcoRI as well as HindIII digests were hybridized to an RNA probe prepared by in vitro transcription of DraI-linearized pTZUng4S (see Fig. 5). This RNA probe covers most of the coding region and part of the 5'-flanking region. Results of the hybridization of the prokaryotic genomic blots are shown in Fig. 8. The Shigella gene showed a strong homology to the E. coli gene, but the hybridization signals for Citrobacter and Salmonella were weak. The Proteus genome also showed a weak hybridization signal, but only after prolonged autoradiographic exposures of 3-4 weeks (data not shown).

DISCUSSION
The complete DNA sequence of the HpaI fragment possessing the E. coli ung gene (9) is reported. The 5' and the 3' boundaries of the ung coded mRNA were determined by S1 nuclease mapping. The antisense DNA strands 5' or 3'-end labeled at the BamHI site were used to locate the transcriptional initiation and termination sites of the ung mRNA at positions 517 and approximately 1260, respectively. The size of the ung mRNA was thus estimated to be approximately 0.74 kb. This was further confirmed by RNase protection experiments where an antisense RNA encompassing the whole sequence of the HpaI fragment was used as a probe.
Fig. 8. Genomic DNAs (0.5 µg) from each bacterium were digested with EcoRI or HindIII (as shown), electrophoresed on agarose gels, and transferred to a nylon membrane (Zeta probe, Bio-Rad). The genomic blot was hybridized to a radioactive RNA probe derived from DraI linearized pTZUng4S and washed as described under "Materials and Methods." Results obtained by autoradiography are as shown. Each lane is labeled by the bacterium from which the DNA was used. The molecular weight markers (in kb) are from λ DNA digested with HindIII.
in the loop. This putative transcriptional terminator is followed by an A + T-rich region. The ends of the mRNA transcripts also map near this structure.
Expression of the ung gene was studied under both in vivo and in vitro conditions. The relative abundance of the ung mRNA, determined by RNase mapping experiments at various stages of growth, suggests that there are no major changes in the level of ung gene expression up to late log or early stationary phases. Later in the stationary phase, ung mRNA levels decline, presumably because of cell death.
When the in vitro synthesis of proteins in an S-30 extract was directed by plasmids pTZUng2 and pTZUng4, which contain the ung gene in opposite orientations (Fig. 5), synthesis of the same ung gene product is seen, suggesting that its synthesis was directed by its own promoter. The ung gene product was also synthesized when these plasmids were linearized with restriction endonuclease DraI prior to their incubation in the extract. Since a DraI site is present at position 342 of the sequence shown in Fig. 2, the promoter of the ung gene must be located downstream of this restriction site, which agrees with the results of sequence analysis showing the presence of the -35 and -10 regions between positions 476 and 503 (Fig. 2B). In vitro transcription coupled translation studies were also carried out to identify the gene product as well as the relative strength of the ung gene promoter. As expected for a protein of minor abundance, the promoter of the ung gene is not a strong promoter. This is evident from the relative levels of β-lactamase and ung proteins (Fig. 6), where an intact plasmid has been used to direct protein synthesis. In most cases, synthesis of the ung protein is less than that of β-lactamase; however, in the case of pBD15, synthesis of the ung gene product is more than that of β-lactamase. This is because the plasmid pBD15 is an overproducing plasmid which contains the ung gene downstream of a λ left promoter (9).
The open reading frame of the ung gene and thus the amino acid sequence of the ung protein have been confirmed through several lines of evidence. The molecular weight (25,664) and the amino acid composition of the protein coded by the open reading frame correspond exactly to those determined directly from the uracil DNA glycosylase purified from E. coli (Fig. 1 and Table I).
The ung gene product synthesized in vitro comigrates with the purified uracil DNA glycosylase and has the same elution characteristics from hydroxylapatite. In addition, the CNBr digestion of the ung gene product gives rise to two major polypeptides of 14 and 10 kDa, which correspond to cleavage at the positions of methionine deduced from the gene sequence (Fig. 6). Finally, the N-terminal amino acid sequence of the first 31 residues of the uracil DNA glycosylase (Fig. 7) completely matches the open reading frame of the gene. The N-terminal sequence analysis also revealed that the N-terminal methionine is cleaved off in the mature uracil DNA glycosylase, a phenomenon common to many prokaryotic proteins (31).
Relatedness of the E. coli ung gene sequences to other bacterial and eukaryotic DNAs was also determined. Among the bacteria, four Gram (-ve) members of the family Enterobacteriaceae, viz. E. coli, C. freundii, P. vulgaris, and S. typhimurium, one Gram (-ve) member of the family Pseudomonadaceae, viz. P. aeruginosa, and two Gram (+ve) organisms, C. perfringens and M. lysodeikticus, were studied. Neither of the Gram (+ve) organisms shows any sequence relatedness; the genomes of C. perfringens and M. lysodeikticus, having G + C contents of approximately 26.5 and 72%, respectively, are very different from that of E. coli (50% G + C), and it may be that the sequences have diverged significantly. The results obtained on the homology of the E. coli ung gene with that of the other members of Enterobacteriaceae are similar to those obtained for many other genes (32). In the present study, however, Salmonella shows somewhat less homology to the E. coli ung gene than generally expected; it is quite possible that the Salmonella ung gene has diverged somewhat more than average. The Pseudomonas gene also did not show any detectable homology to E. coli. Judging from the relatedness of the E. coli ung gene to the members of Enterobacteriaceae, especially Proteus, it is not surprising that the Pseudomonas ung gene has diverged sufficiently to show no relatedness to the E. coli gene. Also, no homologies to eukaryotic genomes were detected even under lower hybridization stringencies. These results suggest that even though uracil DNA glycosylase is of wide occurrence among living organisms, its gene sequence is not conserved in unrelated bacteria or eukaryotes.
In order to possibly determine the DNA binding/active site of the uracil DNA glycosylase, its amino acid sequence was compared with those of the E. coli tag and alkA gene products (33, 34). Products of both of these genes catalyze excision of 3-methyladenine residues from the DNA. The product of the tag gene, like the ung gene product, is an enzyme with limited substrate specificity; however, the gene product of alkA is less specific, since it also excises other methylated bases (2). No significant sequence homologies between the uracil DNA glycosylase and the tag and alkA gene products were detected, even though there seems to be similarity in the amino acid compositions of tag and ung gene products (33). To analyze if there were some other features of the sequence in common among these three enzymes, a computer program based on Kyte and Doolittle's calculations (35) was used to derive hydrophobicity plots by using a window of 7 amino acids. The results of this analysis are shown in Fig. 9. It is interesting to note that the tag and ung gene products display larger variations between their hydrophobic and hydrophilic regions than the alkA product, whose values are mostly scattered around the base line. It is not clear if larger changes in the hydrophobic and hydrophilic characteristics are important in conferring substrate specificities to these two enzymes.
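The hydrophobicity calculation described above can be sketched as a sliding-window average over the Kyte and Doolittle hydropathy scale. This is a minimal illustration, not the program used in the study: the function name and the example peptide are hypothetical, and the peptide is not taken from the ung, tag, or alkA sequences. Only the scale values and the window of 7 residues follow the method named in the text.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Average hydropathy over each window of consecutive residues.

    Returns one value per window position; a sequence of length n
    yields n - window + 1 values (none if the sequence is shorter).
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical peptide, used only to show the call; not a real enzyme sequence.
example_peptide = "MKTLLVAGGDEKRSIIVLA"
for index, score in enumerate(hydropathy_profile(example_peptide), start=1):
    print(f"window starting at residue {index}: {score:+.2f}")
```

Positive values mark hydrophobic stretches and negative values hydrophilic ones; a profile that stays close to zero corresponds to the behavior described for the alkA product.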
Studies reported on substrate specificities of the uracil DNA glycosylase by other workers (1, 2) indicate that the enzyme is very specific in its recognition of the uracil residue in its substrate. Such recognition is possible only through a very specific binding site in the protein. Currently we are attempting to localize the binding/active site(s) of the enzyme by mutational analysis of the gene. Expression of these mutated genes may provide information about the active site of this enzyme.