Cloning, sequencing, in vivo promoter mapping, and expression in Escherichia coli of the gene for the HhaI methyltransferase.

A 1476-base pair DNA fragment from Haemophilus haemolyticus containing the HhaI methyltransferase gene was isolated from a cell library and cloned into pBR322. The nucleotide sequence of this fragment was determined. The structural gene is 981 nucleotides in length, coding for a protein of 327 amino acids (Mr 37,000). The translational start signal (ATG) is preceded by the putative ribosome-binding site (TAAG). Recombinant plasmids containing the 1476-base pair fragment are completely methylated when isolated from Escherichia coli, as judged by their insusceptibility to the HhaI restriction endonuclease. However, the presence of an active HhaI methylase gene in certain E. coli strains results in a very poor yield of transformants and/or in vivo-originated deletions due to the Rgl functions of these hosts. The in vivo transcription initiation sites have been identified by S1 protection and primer-extension experiments using specific probes with total RNA prepared from E. coli cells (HB101 or RR1) which tolerate the expression of MHhaI.

1 Present address: University of Calabar, Calabar, Nigeria.

... conformational alterations at certain sequences. For example, cruciform formation in supercoiled molecules can be monitored by the inhibition of endonuclease cleavage at sites contained in the inverted repeats (11,12). Also, the effect of left-handed Z-DNA conformations can be analyzed; structural alterations near or at the recognition sites can influence the ability of restriction-modification enzymes to utilize them as substrates (13-16).
In this paper, we report the DNA sequence of the gene coding for the HhaI methylase, which recognizes the tetranucleotide GCGC and modifies the internal cytosine at position 5 (17). Analysis of the modification pattern of recombinant molecules containing the MHhaI gene demonstrates its expression in Escherichia coli. The transcription initiation sites and the putative ribosome-binding site and promoter sequences were identified. In addition, we find that certain E. coli strains are methylase-sensitive since they are poorly transformed by MHhaI gene-containing plasmids, thus indicating a possible deleterious effect of C-methylation.

MATERIALS AND METHODS
Plasmids-Plasmid pNW2801 contains the HhaI methylase gene on a 1476-bp fragment cloned into the HindIII site of pBR322 (see below and Fig. 1). pRW951 (not shown) contains the 1476-bp insert cloned into the HindIII site of pRW751 (18), in the opposite orientation relative to pNW2801. pRW959 (Fig. 1) harbors the same Haemophilus haemolyticus fragment in the HindIII site of pSP65 (19) (Promega Biotec). pRW962 (Fig. 1), a derivative of pRW959 obtained as a spontaneous deletion during the course of a cloning procedure, lacks the polylinker region of pSP65 from the BamHI to the HindIII site and the first 382 bp of the H. haemolyticus insert. DNA plasmid isolation, after chloramphenicol amplification, was as described previously (20).
Strains-H. haemolyticus was obtained from the New England Biolabs strain collection; it is also available from the ATCC, strain No. 10014. The E. coli strains, obtained from personal collections, were as follows: RR1 was the original recipient of pNW2801; HB101 was used as a host for all the other plasmid DNAs; TG1 (a derivative of JM101, kindly provided by J. Engler, this department) harbored all the M13 recombinant phages; DH1 and W3110 were obtained from N. P. Higgins (this department); 294 was a generous gift from A. Shatzman and M. Rosenberg (Smith Kline & French, Inc., Philadelphia).
Enzymes-All enzymes were purchased from Boehringer, New England Biolabs, or Bethesda Research Laboratories. Assay conditions were as recommended by the manufacturers.
Purification of H. haemolyticus DNA-10 g of freshly grown H. haemolyticus cells were suspended in 20 ml of 25% sucrose, 50 mM Tris, pH 8.0. 10 ml of 0.25 M EDTA, pH 8.0, and 6 ml of 10 mg/ml lysozyme in 0.25 M Tris, pH 8.0, were added. After 2 h on ice, 24 ml of 1% Triton X-100 in 50 mM Tris, 67 mM EDTA, pH 8.0, and 5 ml of 10% sodium dodecyl sulfate were mixed in to achieve cell lysis. 70 ml of equilibrated phenol were added and the solution was emulsified by shaking. 70 ml of chloroform were added and the solution was again emulsified. The mixture was centrifuged at 10,000 rpm for 30 min and the upper layer was re-extracted with equal volumes of phenol and chloroform. The upper layer was dialyzed against four changes of 10 mM Tris, pH 8.0, 1 mM EDTA, and then digested with RNase at 100 µg/ml for 1 h at 37 °C. NaCl was added to 0.4 M and 0.55 volume of isopropyl alcohol was layered on top of the solution. The DNA was spooled out, air-dried, and redissolved in 12 ml of 10 mM Tris, pH 8.0, 1 mM EDTA.

The abbreviation used is: bp, base pair.
Preparation of the H. haemolyticus Library-H. haemolyticus DNA was digested partially with HindIII, ligated into HindIII-cut and dephosphorylated pBR322, and transformed into E. coli RR1. 10-µg aliquots of DNA at 100 µg/ml in 10 mM Tris, pH 7.5, 10 mM MgCl2, 50 mM NaCl, 10 mM mercaptoethanol, 0.005% Triton X-100 were digested with nine 2-fold serial dilutions of HindIII (ranging from 2 to 0.008 units of enzyme/µg of DNA). The tubes were incubated for 1 h at 37 °C and heated for 15 min at 72 °C to stop the reactions, and 10 µl from each were analyzed by agarose gel electrophoresis. Tubes in which moderate, but incomplete, digestion had occurred were combined. 40 µl (4 µg) of the combined solution were mixed with 10 µl of HindIII-cleaved and dephosphorylated pBR322 (2 µg). After addition of ligase buffer plus 250 units of T4 DNA ligase and incubation for 4 h at 16 °C, the mixture was transformed into 2 ml of E. coli RR1, made competent by calcium treatment (21). The transformed cells were grown to saturation in 50 ml of Luria broth, centrifuged, and resuspended in a volume of 2.5 ml, and 250-µl quantities were plated onto Luria agar plates containing 50 µg/ml ampicillin. After incubation at 37 °C, the plates were each flooded with 3 ml of 10 mM Tris, pH 7.5, 10 mM MgCl2, and the transformed colonies were scraped together and combined to form the cell library.
Selection of HhaI Modification Clones-3 ml of the cell library were inoculated into 500 ml of Luria broth containing 50 µg/ml ampicillin and grown to saturation without chloramphenicol amplification. Plasmid DNA was prepared from this culture by cesium chloride/ethidium bromide banding (22), dialysis, and isopropyl alcohol precipitation. 300 µl of purified plasmid DNA at 50 µg/ml in 50 mM Tris, pH 8.0, 5 mM MgCl2, 0.5 mM dithiothreitol, were prepared, dispensed into five tubes, and digested for 1 h at 37 °C with five 2-fold serial dilutions of HhaI (ranging from 20 to 1.25 units of enzyme/µg of DNA). The reactions were terminated by extraction with 20 µl of chloroform and the completeness of digestion was checked by gel electrophoresis. 10 µl from each tube were transformed into competent E. coli RR1 and the transformed cells were plated onto Luria agar plates containing 50 µg/ml ampicillin. After incubation the plates were examined; digestion of the library reduced the number of transformants from each tube by a factor of approximately lo', compared to the undigested control. The plasmids harbored by individual transformants were purified by the alkaline sodium dodecyl sulfate miniprep procedure (23) and analyzed by restriction endonuclease digestion and gel electrophoresis.
Assay for HhaI Modification Activity-Modification activity was assayed by incubating unmodified DNA with serial dilutions of cell extract prior to digestion with HhaI and gel electrophoresis. 30 ml of a freshly grown culture was harvested by centrifugation at 4,000 rpm for 5 min. The cells were resuspended in 1.5 ml of 10 mM Tris, pH 7.5, 10 mM mercaptoethanol, 1 mM EDTA, 1 mg/ml lysozyme, left on ice for 1 h, and then gently sonicated with two 20-s pulses. The cell debris was removed by microcentrifugation for 2 min. 600 µl of 50 µg/ml λ DNA in 50 mM Tris, pH 7.5, 10 mM EDTA, 5 mM mercaptoethanol, and 0.1 mM S-adenosylmethionine were prepared, dispensed into five tubes, and incubated with five 3-fold serial dilutions of cell supernatant (from 1 to 0.01 µl of extract/µg of DNA). The tubes were incubated at 37 °C for 1 h, and then heated to 72 °C for 10 min to stop the reactions and to denature interfering nucleases. 100 µl of 50 mM Tris, pH 8.0, 40 mM MgCl2, 5 mM mercaptoethanol and 30 units of HhaI (1.5 µl) were added to each tube. The solutions were again incubated at 37 °C for 1 h, and 20 µl from each were analyzed by gel electrophoresis. Modification was detected by the lack of digestion.
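The dilution ranges quoted in the three preceding protocols can be checked with a short calculation. The following is a minimal sketch, not part of the original methods; the helper name `dilution_series` is hypothetical, and the starting values, fold factors, and tube counts are taken from the text.

```python
def dilution_series(start, fold, steps):
    """Return `steps` concentrations, each `fold` times lower than the previous one."""
    return [start / fold**i for i in range(steps)]

# HindIII partial digests: nine 2-fold dilutions starting at 2 units/µg of DNA
hindiii = dilution_series(2.0, 2, 9)
# HhaI selection digests: five 2-fold dilutions starting at 20 units/µg of DNA
hhai = dilution_series(20.0, 2, 5)
# Cell extract in the modification assay: five 3-fold dilutions starting at 1 µl/µg
extract = dilution_series(1.0, 3, 5)

for name, series in [("HindIII", hindiii), ("HhaI", hhai), ("extract", extract)]:
    print(name, [round(x, 4) for x in series])
```

The last entries of the three series (about 0.008, 1.25, and 0.01) agree with the lower limits given in the text after rounding.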
Cloning in M13-M13mp10 and M13mp11 (24) and M13mWB2348 (25) replicative forms were used as vectors for subcloning the HindIII insert of pNW2801; the 1476-bp fragment, purified from the pBR322 vector on a 5% acrylamide, 0.25% N,N'-diallyltartardiamide gel, was digested with Sau3AI, and the mixture was ligated with BamHI-digested M13mp10 or BamHI-HindIII-digested M13mp10 or M13mp11. Alternatively, the same HindIII fragment was cut with RsaI and the mixture was ligated with SmaI-linearized M13mp10. Finally, the entire HindIII insert was cloned in the corresponding site of M13mWB2348. Transformation, phage growth, and single-stranded DNA isolation were as described (24,26).
Sequencing-Sequence determination for the HindIII fragment carrying the HhaI methylase gene was essentially according to the dideoxy-chain termination method described by Sanger (27,28). The universal M13 single-stranded primer (Amersham Corp.) was used as well as synthetic oligonucleotides corresponding to internal regions of the HindIII insert (Fig. 2). [α-35S]dATPαS was the labeling deoxynucleotide. Electrophoresis on denaturing gels was as described (29,30). After fixation in 10% acetic acid, 10% methanol, the gels were transferred onto 3MM paper (Whatman) and dried. Autoradiography was carried out at room temperature for 24-48 h with Kodak or Fuji films.
The chemical degradation method was performed according to Maxam and Gilbert (30). Plasmid DNA, digested with appropriate enzymes, was 3' end-labeled with [α-32P]dNTPs and DNA polymerase (Klenow fragment) or 5' end-labeled with [γ-32P]ATP and T4 polynucleotide kinase following dephosphorylation.
Deoxy- and dideoxy-NTPs were from New England Biolabs and Amersham Corp. Radioactive nucleotides were from Amersham Corp.
Primer Extension Analysis-Approximately 200,000 cpm (Cerenkov counts) of labeled oligonucleotide, purified on a denaturing polyacrylamide gel, were hybridized to 20-30 µg of total RNA, isolated from RR1 cells harboring pNW2801, at temperatures ranging from 45 to 30 °C. The conditions for hybridization and reverse transcriptase reaction were essentially as described by Townes et al. (31). After phenol/chloroform extractions and ethanol precipitation, the samples were electrophoresed on denaturing polyacrylamide gels in parallel with sequencing ladders obtained with the same oligonucleotide and the relevant M13 clone.
S1 Nuclease Protection Analysis-The conditions were essentially as described by Berk and Sharp (32); 20-30 µg of total RNA, isolated from RR1 cells harboring pNW2801, were hybridized to 100,000 cpm (Cerenkov counts) of 5' end-labeled double-stranded DNA probe for 12 h at 54 °C after denaturing for 10 min at 70 °C. After 20-40 min incubation at room temperature with 130 units of S1 nuclease (Bethesda Research Laboratories), the samples were phenol/chloroform-extracted, ethanol-precipitated, and analyzed on a denaturing polyacrylamide gel in parallel with Maxam-Gilbert sequencing ladders of the same probe.
In Vitro Filter Binding Assay-Filter binding experiments were performed on appropriate restriction fragments as described by Krause et al. (33).
Other Methods-Synthetic oligonucleotides were obtained by the phosphoramidite chemical method using an Applied Biosystems Model 380A DNA synthesizer. Computer analyses were performed with a VAX 11/750 computer (J. Engler, this department); software was as described (34). Agarose gel electrophoresis was performed in 40 mM Tris, pH 8.2, 20 mM sodium acetate, 1 mM EDTA, 0.5 µg/ml ethidium bromide. Agarose gels were prepared with the same buffer plus 1% agarose (Bio-Rad).

Isolation of the HhaI Modification Gene-Recombinants carrying the HhaI modification gene were isolated by selection as survivors from a library of plasmid clones that had been digested with the HhaI restriction endonuclease. A similar approach has been used to isolate clones of the BspRI (35), MspI (36), BsuRI (37), and DdeI² modification genes. The library was prepared by ligating HindIII fragments of H. haemolyticus DNA into pBR322 (39); it was propagated in E. coli RR1 to allow self-modification of molecules carrying the HhaI modification gene to occur, then the plasmids were purified, digested with HhaI, and retransformed into RR1. Among the plasmids that survived, many were found to carry a common, 1.5-kilobase pair HindIII fragment. The fragment was judged to encode the HhaI modification gene by three criteria: (a) plasmids carrying the fragment were resistant to digestion by HhaI (GCGC) and HaeII (RGCGCY), yet sensitive to digestion by FnuDII (CGCG), HaeIII (GGCC), HpaII (CCGG), and AhaII (GRCGYC), enzymes that recognize similar, but distinct sequences; (b) ... prepared from cells harboring the plasmids displayed HhaI modification activity in vitro. No restriction endonuclease activity was detected in cell extracts, suggesting that the fragment does not encode the HhaI restriction gene, at least in its entirety. A representative plasmid, pNW2801 (Fig. 1), that carried only the 1.5-kilobase pair fragment, was chosen for analysis.

² ... Wilson, and J. E. Brooks, manuscript in preparation.

FIG. 1. Schematic maps of the ... (19). pRW962 was derived from pRW959 as described (see "Materials and Methods"). The arrow in each circular map indicates the start and orientation of the MHhaI coding region. Circular maps are not drawn to scale. The linear map shows a partial restriction map of the sequenced 1476-bp HindIII fragment as well as the length and location of the coding region for MHhaI (arrow).
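The logic behind the restriction pattern used to identify the modification gene (resistance to HhaI and HaeII, sensitivity to the other four enzymes) can be illustrated by asking which recognition sequences necessarily contain the HhaI site GCGC. The following is a minimal sketch under stated assumptions: the helper names `expand` and `always_contains` are hypothetical, only the IUPAC codes R (A or G) and Y (C or T) are handled, and GCGC sites that merely overlap a recognition sequence through flanking bases are ignored.

```python
from itertools import product

# IUPAC codes needed for the recognition sequences quoted in the text
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT"}

# Recognition sequences as given in the text
SITES = {
    "HhaI": "GCGC",
    "HaeII": "RGCGCY",
    "FnuDII": "CGCG",
    "HaeIII": "GGCC",
    "HpaII": "CCGG",
    "AhaII": "GRCGYC",
}

def expand(site):
    """List every concrete sequence matched by a degenerate recognition site."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in site))]

def always_contains(site, core="GCGC"):
    """True if every sequence matched by `site` contains `core`."""
    return all(core in s for s in expand(site))

for enzyme, site in SITES.items():
    partial = [s for s in expand(site) if "GCGC" in s]
    print(enzyme, site, always_contains(site), partial)
```

Only the HhaI and HaeII sites always contain GCGC, in line with the observed resistance to these two enzymes. One of the four AhaII variants (GGCGCC) also contains GCGC, so the reported AhaII sensitivity presumably reflects cleavage at the remaining variants; the sketch does not model that.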
Sequence Determination-The nucleotide sequence of the 1476-bp DNA fragment from pNW2801 (Fig. 1) containing the HhaI DNA methyltransferase gene was determined. The dideoxy-chain termination method (27-29) was employed as well as the Maxam-Gilbert chemical degradation technique (30), when required. More than 90% of the sequence was determined on both strands; portions of the fragment for which data were available from only one of the two strands were sequenced several times.
The sequence strategy is shown in Fig. 2; each arrow corresponds to sequence information derived from at least two independent experiments. The Sau3AI and RsaI sites are indicated since they were utilized to subclone overlapping segments of the HindIII fragment in the phage vectors M13mp10 and M13mp11 (24) (see "Materials and Methods").
A more detailed restriction map is presented in Fig. 1. In the first set of experiments, the M13 single-stranded universal primer was employed for all the subclones obtained. Since some of the DNA inserts expected from the random subcloning were missing, an entire strand of the HindIII fragment was cloned in the phage vector M13mWB2348 (25). This particular phage is less deletion-prone than the other M13 phages of the mp series, thus allowing larger inserts to be stably propagated.
Six oligonucleotides, ranging in size from 15 to 17 bases, were synthesized and used as specific primers; four were complementary to internal regions of the top strand of the HindIII fragment and the other two to part of the bottom strand of the 635-bp RsaI segment. Dotted lines in Fig. 2 indicate the portion of data obtained by this approach. The chemical degradation method was employed essentially to cover regions of the sequence for which only one strand was available. In addition, this technique was necessary, coupled with the use of a 20% denaturing acrylamide gel, to resolve a region complicated by secondary structure formation. Indeed, the sequence data revealed the presence of a perfect inverted repeat with a 7-bp stem and with 3 nucleotides in the potential loops, situated 5 bases downstream from the termination codon of the HhaI methylase gene (Fig. 3).

[FIG. 3: nucleotide sequence with the predicted amino acid translation; the sequence listing is not reproduced here.]

The putative ribosome-binding site TAAG, indicated in Fig. 3, is 5 nucleotides away from the ATG. Although it appears to be a weak site for translation in E. coli, we showed that pNW2801 does indeed express a methylase activity when E. coli RR1 and HB101 are the recipient strains (see below).
Transcription Signals and Expression in E. coli-MHhaI activity in E. coli was detected by testing purified DNA for insusceptibility to HhaI endonuclease cleavage. pNW2801 was undigested following a 2-h incubation with the HhaI restriction enzyme, whereas an unmethylated internal control DNA (pRW451 (18)) was completely digested (data not shown). pRW951, in which the HindIII insert was inverted relative to pNW2801 (see "Materials and Methods"), shares the same full methylation pattern; this in vivo methylase activity in either orientation was the first indication for the presence of a MHhaI promoter on the 1476-bp fragment. Moreover, this fragment retains methylase activity when cloned in either orientation in both vectors pSP65 and pACYC184. Also, the replicative form of M13mWB2348 with the 1476-bp insert in one orientation was found to be methylated.
Another method for testing in vivo expression of MHhaI was used. Purified pRW951 was reacted in vitro with [methyl-3H]AdoMet and MHhaI as described previously (14). No significant incorporation (less than 0.2 sites per molecule) was detected relative to an unmethylated control DNA, thus indicating that all the HhaI sites were indeed modified in vivo (data not shown).
In order to identify sequences required to promote the transcription of the MHhaI gene, we carried out primer-extension experiments to map at the nucleotide level the 5' end of the mRNA produced in vivo. For this purpose, two out of the six synthetic oligonucleotides utilized for sequencing were selected; the primers, a 17-mer and a 15-mer, are complementary to portions of the sequence located 12 and 229 nucleotides, respectively, downstream from the ATG (Fig. 2). Total RNA from E. coli RR1 carrying pNW2801 was isolated since we found that the plasmid purified from this strain is completely methylated at the HhaI sites. The RNA was hybridized to each of the two 5' end-labeled oligonucleotides as described (31) with some modification. The temperature was varied from 45 to 30 °C in order to obtain the most stable hybrids under these conditions. Both primers showed an optimum at 37 °C. Fig. 4 shows the sizes of the cDNAs synthesized by reverse transcriptase in parallel with dideoxy-sequencing ladders obtained with the same primers. The hybridization temperatures were 30 °C (lanes 1 and 3) and 37 °C (lanes 2 and 4). With both primers the major products of the reverse transcriptase reactions reveal that the 5' end of the in vivo mRNA synthesized from the MHhaI gene corresponds to positions 415 (T) and 416 (A) on the MHhaI sequence in Fig. 3.
To confirm this result, we also performed S1 nuclease protection experiments. Fig. 5 shows the results obtained using a probe extending from the MstII site (position 606) to the RsaI site (position 404) and the same total RNA preparation as above. These experiments confirmed that transcription initiates 21-22 nucleotides upstream from the ATG and ruled out the possibility of a premature termination of reverse transcriptase. The -10 and -35 consensus sequences constituting this promoter region are underlined in Fig. 3.
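The distance between the mapped 5' ends and the start codon follows directly from the coordinates. The following is a minimal sketch, assuming that position 437 (the ATG position given in the Discussion) refers to the A of the ATG; the variable names are illustrative only.

```python
ATG_POSITION = 437              # first base of the start codon (from the text)
TRANSCRIPT_STARTS = (415, 416)  # mapped 5' ends of the mRNA (from the text)

# Number of nucleotides between each start site and the A of the ATG,
# i.e. the length of the untranslated leader
leader_lengths = [ATG_POSITION - tss for tss in TRANSCRIPT_STARTS]
print(leader_lengths)  # [22, 21]
```

This reproduces the 21-22 nucleotide spacing quoted above.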
Two additional lines of evidence support the finding that these sequences are indeed required to promote the transcription of the HhaI methylase gene. First, during the course of a subcloning procedure using pRW959 (Fig. 1), we found a spontaneous deletion which lacks most of the sequence preceding the MHhaI ATG but still retains methylase expression in E. coli. This molecule, pRW962 (Fig. 1), was further characterized through restriction analyses and sequencing around the relevant region. The deletion extends from the 3' end of the SmaI site in the polylinker region to position 382 in the MHhaI gene insert (solid triangle in Fig. 3; see also "Materials and Methods"). The sequence of the new -35 element is GGGATT. Therefore, a region of 53 nucleotides upstream from the ATG contains all the necessary information to promote MHhaI expression. Second, in vitro filter binding experiments, using E. coli RNA polymerase and different sets of restriction fragments derived from the 1476-bp HindIII insert, revealed preferential retention of the 130-bp TaqI-RsaI segment (Fig. 1), located just upstream from the ATG of the MHhaI gene and containing the -35 and -10 regions (data not shown).
Base Composition and Codon Usage-An analysis of the base composition for the sequence shown in Fig. 3 ... position of synonymous codons and, when there is a choice, also in the first position. The sequences of the only other two genes from the same organism which were reported, the HhaII restriction endonuclease and methyltransferase, also have a high A+T content of 68 and 65%, respectively (43). However, if this feature is not a general characteristic for H. haemolyticus DNA, a regulatory mechanism at the level of translation could be implicated, as suggested in the case of the EcoRI restriction-modification system (42).

... (Fig. 3). The S1-resistant bands (which contain 3'-OH ends) at positions 415 and 416 are displaced approximately 1.5 residues upwards from the corresponding bands in the sequencing ladders (which contain 3'-phosphate ends).
Comparison of Protein Sequence with Other Methylases-The predicted amino acid sequence of MHhaI was compared with those of the SPR, BspRI, and BsuRI methylases (37,49) which are also C-methylases with exclusively GC base pairs in their recognition sequences. The MALIGN program was used to search for segments of identity and optimal alignment among these sequences, taking two residues as the minimum length of identity. The MHhaI protein sequence could thus be aligned to 27.2% to MSPR, 18.7% to MBspRI, and 21.4% to MBsuRI, whereas the randomly scrambled sequences of MHhaI and MSPR gave an optimal alignment of only 14.4% of MHhaI. With the DFASTP program it was found that MHhaI shares identity with MSPR to 38.9% over a region of 190 residues, with MBspRI to 26.4% over 178 residues, and with MBsuRI to 22.7% over 286 residues. When conservative amino acid replacements were included in this comparison in addition to identical residues, the homology of MHhaI increased to 79.5, 74.2, and 66.8%, respectively. Several regions are strikingly similar in all four protein sequences. 4 out of 6 residues are identical in a region containing the Pro-Cys unit, which is proposed to form the active ...

... poorly by pNW2801 (efficiency <10²/µg), or other plasmids containing a functional MHhaI gene, even though they accepted pBR322 with normal efficiency. Conversely, plasmids carrying artificially created or spontaneously occurring deletions that affect MHhaI expression could transform those strains with an efficiency equivalent to pBR322 used as a control. E. coli strains HB101 and RR1 were found to tolerate HhaI methylase activity, although the growth rate of the transformants was severalfold slower compared to pBR322-containing cells. TG1 cells were transformed by pNW2801, but the morphology of the colonies appeared to be quite different compared to the control (irregular versus rounded shape).
The "sensitive" behavior of these strains is in agreement with observations reported by other groups (37,38,40) working with plasmids which code for cytosine methyltransferases. It has been analyzed in detail using a variety of cloned modification genes and using pBR322 modified in vitro with purified modification enzymes.
The analysis indicates that inefficient transformation is the result of restriction of methylcytosine-containing DNA by the E. coli Rgl functions: inefficient transformation occurs only when the transforming DNA is modified at cytosine residues and the bacterial strain is Rgl-proficient (38).

DISCUSSION
The DNA sequence of the HhaI methyltransferase gene reveals a coding region of 981 nucleotides which predicts a protein of 327 amino acids (Mr 37,002) in excellent agreement with the size of the protein purified from E. coli (Mr 37,000 (47)). Each of the restriction-modification systems that have recently been cloned has a characteristic organization of the genes coding for the two related proteins. For the EcoRI and BsuRI systems, both enzymes are coded by the same DNA strand and the endonuclease gene precedes the methylase gene (37,41,42). For HhaII and PaeR7, the two genes are colinear, but the methylase is located upstream from the endonuclease (43,46). However, the PvuII, PstI, and EcoRV systems consist of two divergently arranged coding regions (40,44,45). The nucleotide sequence flanking the HhaI methylase structural gene does not reveal any other open reading frame suitable for coding for the HhaI restriction endonuclease. The sequence of the 436 nucleotides preceding the MHhaI gene shows several translation termination signals in all possible frames, whereas the region separating the end of the gene from the end of the cloned fragment contains no start codons.
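The relationship between the length of the coding region, the number of residues, and the approximate molecular weight can be checked with simple arithmetic. The following is a minimal sketch; the average residue mass of 110 Da is a common rule of thumb and an assumption here, not a value from the paper, so the resulting mass is only a rough estimate compared with the reported Mr of 37,002.

```python
CODING_NT = 981           # length of the coding region (from the text)
FRAGMENT_NT = 1476        # length of the cloned HindIII fragment (from the text)
UPSTREAM_NT = 436         # nucleotides preceding the gene (from the text)
AVG_RESIDUE_MASS = 110.0  # Da; rule-of-thumb average, an assumption

codons, remainder = divmod(CODING_NT, 3)
approx_mass = codons * AVG_RESIDUE_MASS
# Nucleotides between the last sense codon and the end of the fragment
# (this count includes the stop codon)
downstream_nt = FRAGMENT_NT - UPSTREAM_NT - CODING_NT

print(codons, remainder)  # 327 0
print(approx_mass)        # 35970.0
print(downstream_nt)      # 59
```

The estimate of roughly 36,000 lies within a few percent of the reported value, which is as close as a fixed average residue mass can be expected to come.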
Little is known regarding control mechanisms for the expression of the two related genes inside the cell. A colinear arrangement of the coding regions, with the methylase upstream, appears to be the most simple way to ensure a delay in the endonuclease synthesis. Alternatively, a difference in the strength of promoter or ribosome-binding site could regulate the transcription and/or translation efficiency (37, 44). Folding of the mRNA in a particular secondary structure could also affect protein synthesis (45). Compartmentalization, differential ion requirements, and stability of the protein itself could also be involved in controlling enzymatic activity.
In this regard we have observed that E. coli HB101, carrying an active MHhaI gene on pNW2801, contains fully methylated plasmid DNA when grown to saturation, even at higher temperatures (40 and 43 °C).
We were interested in defining the sequences promoting transcription and translation of the MHhaI gene in order to obtain regulated expression of this enzyme in E. coli under the control of a different promoter. The nucleotide sequence determination, in combination with transcription mapping experiments, was essential in identifying those regions of interest. We localized the coding region and the orientation of the gene on the basis of the longest open reading frame. The first ATG of this open reading frame appears at position 437 on the HindIII fragment; the putative ribosome-binding site (TAAG) is situated 5 nucleotides upstream. An identical sequence exists 7 nucleotides upstream from the start codon of the MHhaII gene.³ We have searched for other potential fMet codons of the MHhaI gene. A GTG and an ATG triplet are present at positions 308 and 587, respectively, in the same open reading frame. We believe that neither of them is the true start codon since the size of the protein would be too large or too small, respectively, compared to the established Mr (47), and the transcriptional start sites at positions 415 and 416 automatically rule out the GTG triplet.
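The argument against the alternative start codons can be made explicit with the coordinates given above. The following is a minimal sketch, assuming that each quoted position is the first base of its triplet and that the reading frame ends after the 981-nucleotide coding region starting at position 437; the function name `evaluate` is hypothetical.

```python
ATG_START = 437    # accepted start codon (from the text)
CODING_NT = 981    # length of the coding region (from the text)
TSS = (415, 416)   # mapped transcription start sites (from the text)
CANDIDATES = {"GTG": 308, "ATG (accepted)": 437, "ATG (internal)": 587}

# First position after the last sense codon
ORF_END = ATG_START + CODING_NT

def evaluate(pos):
    """Return (in_frame, residues, transcribed) for a candidate start position."""
    in_frame = (pos - ATG_START) % 3 == 0
    residues = (ORF_END - pos) // 3
    transcribed = pos >= min(TSS)  # the codon must lie within the mapped mRNA
    return in_frame, residues, transcribed

for name, pos in CANDIDATES.items():
    print(name, pos, evaluate(pos))
```

Under these assumptions the GTG at 308 would give a 370-residue product and lies upstream of the mapped mRNA 5' ends, while the ATG at 587 would give only 277 residues; both triplets are in frame with the accepted start, consistent with the reasoning in the text.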
The end of the gene is marked by a TAA codon followed by a perfect inverted repeat 7 nucleotides further downstream. This inverted repeat (AAGGGGCATAGCCCCTT) predicts a hairpin with 7 nucleotides in the stem and 3 in the loop. A hairpin structure at the 3' end of some bacterial mRNAs has been suggested as a transcription termination signal for RNA polymerase to release the DNA coding strand (48). The transcriptional start sites were located at positions 415 and 416 through S1 protection and primer extension experiments using total RNA isolated from E. coli RR1 harboring pNW2801. The -10 and -35 consensus sequences are indicated in Fig. 3. We do not know whether any of these sequences ... around position -35 with a spacing of 17 nucleotides from the -10 region. Only part of this sequence (GATT) is still present in the -35 region (GGGATT) of the deleted plasmid pRW962 that, nevertheless, was isolated as a fully methylated DNA.
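The stem and loop assignment of the inverted repeat quoted above (AAGGGGCATAGCCCCTT) can be verified by checking that the two arms are reverse complements of each other. The following is a minimal sketch; the helper names `reverse_complement` and `hairpin_parts` are illustrative, and no thermodynamic folding is attempted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def hairpin_parts(seq, stem_len):
    """Split a sequence into left arm, loop, and right arm for a given stem length."""
    left = seq[:stem_len]
    loop = seq[stem_len:len(seq) - stem_len]
    right = seq[len(seq) - stem_len:]
    return left, loop, right

repeat = "AAGGGGCATAGCCCCTT"  # inverted repeat quoted in the text
left, loop, right = hairpin_parts(repeat, 7)
print(left, loop, right)
print(reverse_complement(left) == right)
```

The left arm (AAGGGGC) is the exact reverse complement of the right arm (GCCCCTT), leaving a 3-nucleotide loop (ATA), which matches the 7-nucleotide stem and 3-nucleotide loop described in the text.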
An interesting characteristic of the MHhaI gene sequence is the absence of HhaI sites (GCGC), frequently found in most other DNAs. An obvious speculation would be to propose the existence of selective pressure during evolution to preserve the integrity of the gene and therefore its essential function in the presence of the related endonuclease. Alternatively, an overall high A+T content (65%) could be responsible for this feature.
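The second explanation, that a high A+T content alone could account for the absence of GCGC sites, can be examined with a rough estimate of the expected number of sites. The following is a minimal sketch under strong assumptions that are not from the paper: bases are treated as independent, A and T (and likewise G and C) are taken as equally frequent, the 65% A+T figure is applied to the 981-nucleotide coding region, and the chance of finding no site is approximated with a Poisson distribution.

```python
import math

def expected_sites(length, site, at_fraction):
    """Expected occurrences of `site` in a random sequence of given A+T content."""
    p_at = at_fraction / 2        # probability of A (and of T)
    p_gc = (1 - at_fraction) / 2  # probability of G (and of C)
    p_site = 1.0
    for base in site:
        p_site *= p_at if base in "AT" else p_gc
    return (length - len(site) + 1) * p_site

GENE_NT = 981       # coding region length (from the text)
AT_FRACTION = 0.65  # overall A+T content (from the text)

expected_at_rich = expected_sites(GENE_NT, "GCGC", AT_FRACTION)
expected_uniform = expected_sites(GENE_NT, "GCGC", 0.50)
p_zero_at_rich = math.exp(-expected_at_rich)  # Poisson approximation

print(round(expected_at_rich, 2), round(expected_uniform, 2), round(p_zero_at_rich, 2))
```

With these assumptions a sequence of this size and composition would be expected to carry about one GCGC site, compared with almost four for a sequence of even base composition, and finding none would occur by chance in roughly 40% of cases. This is compatible with the composition-based explanation but does not exclude selective pressure.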
The comparison of the predicted amino acid sequence of MHhaI with other cytosine methylases revealed extensive homologies among these proteins which are derived from different systems but catalyze the same chemical modification reaction. Sequence homologies between the procaryotic methylase of BsuRI and the phage methylase of SPR and the potential origin of their similarities have been discussed in detail elsewhere (49 and references therein).
Finally, we find that several E. coli strains do not tolerate expression of the MHhaI gene; they are poorly transformed by plasmids carrying an active gene and most of the transformants have lost HhaI methylase activity following extensive deletions. Other groups have observed the same phenomenon in E. coli with other cytosine methylases (37, 40). A more detailed analysis indicates that the function responsible for this sensitive phenotype is located in the RglB locus, presumably coding for an unidentified restriction system (38). The RglA and RglB functions were originally identified by their ability to restrict nonglucosylated T-even bacteriophages (reviewed in Refs. 50 and 51) that contain hydroxymethylcytosine in their DNA (52, 53). It has been generally assumed that Rgl recognizes only 5-hydroxymethylcytosine-containing DNA but this is evidently not so. In light of this observation, investigators might find it prudent to use only Rgl-deficient strains of E. coli, such as K802, for the primary cloning of prokaryotic and eukaryotic DNA that contains 5-methylcytosine. Moreover, it may be legitimate to speculate that the correlation between the methylation state of the DNA and gene expression, well known in eukaryotic systems, may also apply to prokaryotes.

in a Computer-readable Format
The sequence of the 1476-bp HindIII fragment containing the MHhaI gene is presented in Softstrip data strips for a Cauzin strip reader. This softstrip was prepared with a Macintosh Plus computer using Microsoft Word (Version 1.05) and the Cauzin Systems Stripper program. The data in this strip can be read directly into Apple Macintosh, IBM PC, or Apple II desk-top computers that are equipped with a strip reader. Both the strip reader and the necessary software programs can be obtained at nominal cost from Cauzin Systems, Inc., 835 South Main Street, Waterbury, CT 06706.
This figure is being reproduced as an experiment to test the feasibility of publishing nucleotide sequences in computer-readable form. Written comments from readers are invited, and should be sent to the Journal editorial office, 9650 Rockville Pike, Bethesda, MD 20814.