DNA Site Recognition and Reduced Specificity of the Eco RI Endonuclease*

It has been shown previously (Polisky, B., Green, P., Garfin, D. E., McCarthy, B. J., Goodman, H. M., and Boyer, H. W. (1975) Proc. Natl. Acad. Sci. U. S. A. 72, 3310-3314; Hsu, M., and Berg, P. (1978) Biochemistry 17, 131-138) that the cleavage sequence specificity of Eco RI endonuclease can be "relaxed" by various means. In this paper this phenomenon is explored in detail, in order to obtain further insight into the nature and selectivity of sequence recognition patterns between proteins and double-stranded nucleic acids. Using conditions of low ionic strength and alkaline pH, we have mapped the positions of potentially cleavable sites in the (completely sequenced) replicative form of the bacteriophage φX174 genome, and have deduced their sequence. The time course of digestion of φX174 DNA suggests that double-stranded sequences reading GGATTT, AAATTT, GAATTT, and GAATTA (only "top" strands, written 5' → 3', are shown) are cleaved readily under these conditions, while sequences reading CAATTN (N = A, T, G) resist attack. Cleavages at (at least) the more labile sites result in cohesive ends that are religatable. End group analysis of cleaved φX174 DNA fragments indicates the presence of a 5'-terminal adenine residue on most of the fragments; some fragments may carry a 5'-terminal guanine residue, consistent with the cleavage site sequences suggested above. Addition of Mn2+ to cleavage reactions carried out at moderate salt concentrations and near-neutral pH induces the same pattern of cleavage seen at low ionic strength and alkaline pH. These results are combined with those from other studies, and are interpreted in terms of a model for the site-specific interaction of the Eco RI endonuclease with its substrate, considering both the effects of changes in DNA sequence and of environmental alterations. The resulting model is compared with data developed on similar grounds for Eco RI methylase (see Woodbury, C. P., Downey, R. L., and von Hippel, P. H. (1980) J. Biol. Chem. 255, 11526-11533), and attempts are made to define both common and differing molecular facets of the DNA recognition specificity of these companion (but genetically distinct) enzymes.

at the 1979 American Society of Biological Chemists Meeting in Dallas, Texas (Woodbury and Hagenbuchle, 1979). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

In the preceding paper, we examined the relaxation in the specificity of methylation by Eco RI methylase induced by various alterations in the solvent environment. Our results were consistent with a compensation of the loss of certain specific favorable (presumably hydrogen-bonding) recognition contacts by an increase in the strength of nonspecific electrostatic interactions, resulting in methylation at sites of less than optimal base pair sequence. In this paper we report parallel studies with the Eco RI endonuclease, focusing on the details of the sequences of the reduced specificity sites and stressing particularly the effects of base replacements and extended sequence symmetries in and around these sites.

MATERIALS AND METHODS
Reagents and Buffers-Most of the reagents and buffers used have been described previously. 32P (as carrier-free orthophosphate) was purchased from New England Nuclear. ATP was purchased from P-L Biochemicals. [γ-32P]ATP was synthesized as described by Maxam and Gilbert (1977). [α-32P]dATP was pur- [...] a 1% agarose slab gel (0.3 cm × 12 cm × 16 cm) for 3 h at 35 mA using E buffer, and the positions of the labeled DNA bands determined by autoradiography. DNA fragment bands were excised and the DNA eluted by electrophoresis into dialysis tubing. The DNA was phenolized once, ether-extracted three times, concentrated by ethanol precipitation, resuspended in 10 mM Tris, pH 8.0, 1 mM EDTA, and stored at 4°C.
Eco RI nuclease digests of whole or 32P-labeled fragments of φX174 DNA under conditions of reduced Eco RI nuclease specificity were carried out with 1 to 5 μg of φX174 DNA in 20 mM Tris, pH 8.0, 2 mM MgCl2 at 37°C, and with varying amounts of enzyme. Reactions were quenched with SDS and EDTA as described above, the DNA was ethanol-precipitated and resuspended in 10-fold dilute TBE buffer containing the tracking dyes xylene cyanol FF (0.025%, Matheson, Coleman, and Bell) and bromphenol blue (0.025%, Matheson, Coleman, and Bell) plus glycerol to 5% (w/v). Samples of Hpa II- and Hae III-digested φX174 DNA, and of λb2 DNA digested with Eco RI and HindIII, were prepared similarly. Electrophoresis of the digests on agarose or polyacrylamide slab gels was performed at room temperature at 30 mA for approximately 3½ h. Positions of the bands were determined by autoradiography using standard methods. The developed films were examined in some cases with a Grant Instruments scanning comparator-microphotometer. For determination of 5'-terminal residues at noncanonical cleavage sites, 13 μg of φX174 DNA was digested at 37°C for 24 h with 50 units of Eco RI endonuclease in 125 μl of the reduced specificity buffer described above. After phenol and ether extractions, followed by ethanol precipitation, a 0.6-volume aliquot of the resuspended DNA (approximately 8 μg) was phosphorylated with 32P at the 5'-end by the method of Maxam and Gilbert (1977). A 0.9-volume aliquot of the ethanol-precipitated, resuspended DNA (in TE buffer) was electrophoresed on a 1.5% agarose gel slab; six distinct DNA fragment bands were recovered and purified as described above. Following ethanol precipitation, the samples were digested thoroughly with bovine pancreatic DNase and venom exonuclease and the resulting mononucleotide phosphates were separated by paper electrophoresis (Brownlee, 1972).
After autoradiography, nucleotide assignments were made by comparison to the relative mobility of co-electrophoresed standards of dAMP, dCMP, dGMP, and dTMP. Relative incorporation of 32P into the four nucleotides was determined by excising the respective spots on the paper and counting by liquid scintillation. "Overmethylated" φX174 (RF) DNA, for the demonstration of blockage of noncanonical attack by Eco RI endonuclease, was prepared by incubating 6 μg of DNA with 4 μg of Eco RI methylase in 200 μl of 25 mM Tris, pH 9.0, 2 mM EDTA, 2.5 mM dithiothreitol, 4 μM [3H]AdoMet, and 400 μg/ml of bovine serum albumin for 45 h at 37°C, leading to an average incorporation per φX174 molecule of 84 methyl groups (based on total 3H incorporation per μg of DNA; Woodbury et al., 1980). Following extraction with phenol and ether, the DNA was concentrated by precipitation with ethanol and resuspended in 50 μl of TE buffer. Twenty-microliter aliquots of the resuspended DNA were then digested for 1 or 4 h at 37°C with 50 units of Eco RI endonuclease in 100 μl of 20 mM Tris, pH 8.0, 2 mM MgCl2, and prepared for electrophoresis as described above.

RESULTS
Previously, Polisky et al. (1975) had demonstrated that under low salt and high pH conditions Eco RI endonuclease shows reduced site specificity; i.e. it can cleave many double-stranded DNA sequences in addition to the canonical double-stranded Eco RI-specific G↓AATTC (only top strand of sequence shown) sites. This reduced specificity activity, termed Eco RI* by these workers, was reported to be specific for sequences reading N↓AATTN. Here we examine this reduced specificity in more detail.
The replicative form (RF) DNA of bacteriophage φX174 provides a very convenient substrate for these studies. The entire nucleotide sequence of φX174 has been determined (Sanger et al., 1977; revised, 1978), and has been shown to contain no canonical Eco RI recognition sites. The φX174 double-stranded genome does, however, contain a large number of sequences which are more or less closely related to the canonical site, including 25 Eco RI* sequences (defined as NAATTN'; Polisky et al., 1975).
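The enumeration of Eco RI* sequences in a known genome can be illustrated with a short script. What follows is only a minimal sketch, not the procedure used in this work: the function name `find_star_sites`, the `circular` flag, and the toy sequence are all hypothetical, and the toy sequence is not taken from φX174. Because the central AATT is self-complementary, a match on the top strand occupies the same six base pairs on the lower strand, so scanning one strand is sufficient.

```python
import re


def find_star_sites(seq, circular=False):
    """Return 1-based start positions of hexamers matching NAATTN.

    If circular is True, the first five bases are appended so that sites
    spanning the origin of a circular molecule are also reported.
    """
    seq = seq.upper()
    scanned = seq + seq[:5] if circular else seq
    # A lookahead is used so that overlapping matches are all reported.
    pattern = re.compile(r"(?=[ACGT]AATT[ACGT])")
    return [m.start() + 1 for m in pattern.finditer(scanned)]


# Hypothetical toy sequence for illustration only (NOT phi X174).
toy = "CCGAATTTGGCAATTGACGGATTTAAATTA"
print(find_star_sites(toy))  # start positions of GAATTT, CAATTG, AAATTA
```

Note that GGATTT, which is discussed later as a labile site, does not match NAATTN and is therefore not reported by this particular scan.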
We planned to determine the preferred sites of cleavage by generating a series of partial digests of φX174 (RF) DNA, under conditions that stimulate the reduction of site specificity of the endonuclease. By mapping these cleavage sites on φX174, we could then compare the sequence about these preferred sites to sequences digested subsequently (less favorable sites), and perhaps to sites where cleavage occurs only very slowly or not at all. Previous workers (Goodman et al., 1977) have shown that there seems to be a hierarchy of preference in cleavage sites for the Eco RI* activity, and have suggested, on the basis of relative nucleotide frequencies, that sequences reading GAATTT might be preferred over sequences reading GAATTA, which in turn seemed to be preferred over all other NAATTN sequences. Using φX174 (RF) DNA we could examine this hypothesis further, as well as investigate cleavage sites for other features of interest, including those which might lie outside the hexameric canonical sequence region.
Mapping Cleavage Sites in φX174 DNA-We followed two procedures in the preparation of partially digested φX174; the first was suggested by the technique developed by Smith and Birnstiel (1976) for mapping cleavage sites of restriction endonucleases. This technique uses a series of limited digests of a 32P-terminally labeled DNA molecule to develop an overlapping set of partially digested molecules with a common 32P-labeled terminus. For the study of the noncanonical Eco RI activity, where we expected some sites to be cleaved much more rapidly than others, this approach served well to map labile cleavage sites. However, it did not work well for mapping the more refractory sites, since one or more highly susceptible cleavage sites were generally found between these refractory sites and the 32P-labeled termini. These latter sites were mapped using a second approach which involved preparation of both very limited and very prolonged digests of (here, unlabeled) φX174 DNA. Comparison of the resulting gel patterns for various times of digestion, together with our map of highly labile cleavage sites developed from the first approach, provided a means to deduce the positions of the more refractory sites.
To prepare φX174 DNA with a 32P-terminus for use in the Smith-Birnstiel technique, the DNA was first digested with restriction endonuclease Xho I. This enzyme cuts at a single site on the φX174 genome and produces protruding 5'-termini, suitable for kinase-exchange labeling. Following 32P-label incorporation at these termini (as detailed under "Materials and Methods"), we digested the labeled DNA with restriction endonuclease Pst I, which cleaves φX174 DNA at a single site.
This resulted in two terminally labeled fragments, a short and a very long (5226 base pairs) fragment. These were easily resolved by electrophoresis. Following purification of the large fragment, we incubated it for varying (short) periods with sufficient Eco RI endonuclease under conditions designed to stimulate the reduced specificity activity of the enzyme (20 mM Tris, pH 8.0, 2 mM MgCl2), electrophoresed the resulting partial digests, and examined the DNA band pattern of the resulting autoradiogram. To deduce the size of the various fragments represented by these bands, we co-electrophoresed a sample of λb2 DNA, labeled with 32P-containing nucleotides via the nick translation procedure and doubly digested with both Eco RI and HindIII. The sizes of the resulting fragments from this cross-digestion of λb2 are well known (Robinson and Landy, 1977), and provided a calibration standard for determining the sizes of the φX174 DNA fragments. Fig. 1 is an autoradiogram of a series of limited digests of the large 32P-labeled φX174 fragment. Scanning the autoradiogram by densitometer allowed us to determine quite precisely the relative mobilities of the various DNA fragments; a plot of the relative mobilities of the λb2 fragments versus the logarithm of their molecular weight was linear over the range from 520 base pairs to 5700 base pairs, and provided an accurate calibration curve for determining the sizes of the φX174 DNA fragments (data not shown). Densitometer scans of the gel band patterns of the partially digested large φX174 fragment showed several prominent bands, as well as several bands of lower intensity; we suggest that the prominent bands represent fragments resulting from cleavages at very labile sites, and that the less intense bands represent fragments from subsequent cleavages at less favorable sites.
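A calibration of this kind (relative mobility linear in the logarithm of fragment size) can be sketched as a least-squares line. The example below is an illustration under stated assumptions, not the calibration used here: only the end points of the size range (520 and 5700 base pairs) come from the text, while the intermediate marker sizes and all mobility values are invented, and the helper names `fit_line` and `make_calibration` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_calibration(marker_sizes_bp, marker_mobilities):
    """Return a function mapping relative mobility to estimated size (bp)."""
    logs = [math.log10(size) for size in marker_sizes_bp]
    slope, intercept = fit_line(marker_mobilities, logs)

    def estimate(mobility):
        return 10 ** (intercept + slope * mobility)

    return estimate


# Hypothetical marker data: sizes (bp) and relative mobilities are invented,
# apart from the 520 and 5700 bp end points of the stated linear range.
MARKER_SIZES_BP = [5700, 3000, 1500, 900, 520]
MARKER_MOBILITIES = [0.16, 0.35, 0.55, 0.70, 0.86]

size_from_mobility = make_calibration(MARKER_SIZES_BP, MARKER_MOBILITIES)
print(round(size_from_mobility(0.60)))  # estimated size of an unknown band
```

Estimates for bands that migrate outside the marker range are extrapolations of the fitted line and carry correspondingly larger uncertainty.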
In Fig. 1 we see five intense bands, resulting from cleavage at five very labile sites in the large φX174 fragment. The intensity of four of these bands decreases with time, while that of the fifth (and lowest in molecular weight) increases. The sizes of four of the five "early" fragments are easily established by reference to the HindIII/Eco RI-digested λb2 DNA fragment series (Fig. 1). The apparent sizes (in base pairs) of these early fragments are 1100 ± 100, 1950 ± 200, 3250 ± 300, and 3600 ± 300. The smallest of these five fragments lies outside the calibration range of the λb2 DNA fragments, but linear extrapolation of the calibration assigns to it a size of 310 ± 30 base pairs. (Fragment separation in gels of this type follows the logarithmic dependence of fragment size to below 140 base pairs.) Based on these fragment sizes we have assigned provisionally the points of cleavage to (approximate) positions 470, 1300, 2100, 3400, and 3800 (relative to the Pst I cleavage site) in the φX174 sequence (see Fig. 3 below). In more extended digestions of φX174 we find bands corresponding to fragments of 570 ± 50, 800 ± 50, and 1700 ± 700 base pairs, with terminal points of cleavage mapping at approximately positions 730, 960, or 1860 on the φX174 sequence.
The Smith-Birnstiel technique thus allowed us to determine the positions of a total of eight cleavage sites on the large φX174 fragment. As pointed out above, this approach could not be applied to the mapping of very refractory cleavage sites, and so we supplemented this technique by examining the fragment patterns in prolonged digests of unlabeled whole φX174 (RF) DNA. Fig. 2, A and B, are photographs of typical slab gels of φX174 (RF) DNA digested with varying amounts of enzyme and for various times, to generate a spectrum of partial digestion products. Sizes of the various fragments have been determined by comparison to the mobilities of Hpa II- and Hae III-generated fragments of φX174 (Sanger et al., 1977).
As can be seen from these figures, attack on φX174 DNA by Eco RI endonuclease occurs first at a limited number of sites, generating only a few very large fragments. As digestion proceeds, these large fragments are cleaved into a host of smaller ones; transient species of intermediate size appear and disappear, with a general shift in intensity of the bands to fragments of lower molecular weight. In the most extended digestion achieved (Track c of Fig. 2B), we can resolve fragments of approximately 145, 150, 220, 235, 275, 280, 290, 310, 470, 575, 690 to 715 (multiplet band), 850, 890, and 950 base pairs. Not all of these fragments can represent limit digests (i.e. complete cleavage at all available sites), since the sum of the base pairs of all the observed fragments in this sample gives 7040 base pairs, exceeding the 5386 base pairs of the whole φX174 DNA (RF) molecule. In fact, certain of these bands can be identified as partially digested intermediates. For example, the progressive loss in intensity for the bands corresponding to 310, 575, and 890 base pairs indicates that they are most likely intermediate species containing one or more potential cleavage sites. Subtracting the contribution of these putative intermediates from the previous sum (7040 base pairs), we obtain 5275 base pairs, a value comparable to the 5386 base pairs expected for whole φX174.
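The bookkeeping that connects a set of cleavage positions on the circular RF molecule to the fragment sizes expected in complete and partial digests can be sketched as follows. This is a minimal illustration, assuming the five approximate labile positions mapped by the Smith-Birnstiel technique (470, 1300, 2100, 3400, and 3800) and a genome length of 5386 base pairs; the function names are hypothetical, and the map positions themselves carry the experimental uncertainties quoted above.

```python
from itertools import combinations

GENOME_BP = 5386
LABILE_SITES = [470, 1300, 2100, 3400, 3800]  # approximate map positions


def circular_fragments(sites, genome_bp=GENOME_BP):
    """Fragment lengths obtained by cutting a circle at every listed site."""
    s = sorted(sites)
    # A single cut merely linearizes the circle, hence the "or genome_bp".
    return [(s[(i + 1) % len(s)] - s[i]) % genome_bp or genome_bp
            for i in range(len(s))]


def partial_digest_sizes(sites, genome_bp=GENOME_BP):
    """All fragment sizes that can arise when any subset of sites is cut."""
    sizes = set()
    for k in range(1, len(sites) + 1):
        for subset in combinations(sites, k):
            sizes.update(circular_fragments(subset, genome_bp))
    return sorted(sizes)


print(sorted(circular_fragments(LABILE_SITES)))  # limit digest at five sites
print(partial_digest_sizes(LABILE_SITES))        # limit plus partial products
```

With these inputs the limit digest evaluates to fragments of 400, 800, 830, 1300, and 2056 base pairs, which sum to the full genome length; the partial products are the sums of adjacent limit fragments.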
The various partially digested fragments observed can be utilized to formulate a hierarchy of cleavage site labilities. For example, using the Smith-Birnstiel technique we found that cleavage occurs rapidly at roughly positions 470, 1300, 2100, 3400, and 3800 on φX174. Thus, a spectrum of fragment sizes corresponding to cleavages at these sites might be expected to appear in very limited digests of whole φX174. In particular, if cleavage of φX174 occurred only at the above five sites, a complete digest should generate fragments of approximately 400, 800, 830, 1300, and 2100 base pairs, and a partial digest would generate fragments of roughly 1600, 1700, 2100, 2500, 2900, 3300, 3700, and 3800 base pairs. Examination of Fig. 2 [...] bands corresponding to fragments of approximately 570, 900, 1700, and 2500 base pairs, also consonant with early attack at the five labile sites. A consistent pattern of cleavage thus emerges, with cleavages occurring first at the five sites mapped out by the Smith-Birnstiel technique, followed by subsequent cleavage of the large precursor fragments so generated. A cleavage map summarizing the assignment of the most readily cleaved noncanonical sites, as well as the sequence-numbering system, is shown in Fig. 3.

Effects of Divalent Ions-Hsu and Berg (1978) have shown that Mn2+ stimulates the reduced specificity activity of the Eco RI endonuclease, and Goodman et al. (1977) have reported that Mn2+ (at concentrations of 100 to 150 μM) can replace Mg2+ (at 2 mM) in the Eco RI* reaction. We ask if the Mn2+-stimulated activity follows the same pattern of attack on φX174 as that found under conditions of alkaline pH and low ionic strength. As before, we used the Smith-Birnstiel technique to compare partial digests in 20 mM Tris, pH 8.0, 2 mM MgCl2 to partial digests in 100 mM Tris, pH 7.5, 2 mM MnCl2. Fig. 4 is an autoradiogram of the resulting slab gel band pattern. In the main, the pattern of attack is the same for both sets of conditions.
The low salt, pH 8.0, reaction yields (in a brief digest) fragments of roughly 300, 1070, 2000, 3400, and 3800 base pairs (Track e, Fig. 4), in good agreement with the pattern seen before (cf. Fig. 1). The reaction carried out in the presence of Mn2+, with the same amount of enzyme and for the same period of time (Track b, Fig. 4), shows the same high molecular weight bands, but at reduced intensity and with many additional (some very faint) bands at intermediate positions, corresponding to fragments of roughly 410, 440, 540, 710, 850, 1050, 1600, 1700, 1900, and 2000 base pairs. These same bands also appear in the digests carried out in the absence of Mn2+, but seem to require more time for their appearance. The approximate positions of the cleavage sites of these fragments are roughly 570, 600, 700, 870, 1000, 1200, 1760, 1860, 2100, and 2200 on the φX174 map relative to the Pst I cleavage site. A most striking feature of Fig. 4 is the intensity of the band corresponding to the 300-base pair fragment, which strongly suggests preferential cleavage at about position 470 (Position A of Fig. 3) early in all the Mn2+-stimulated reactions. The early appearance of the many partial digest bands in the Mn2+ incubation indicates a general stimulation of activity toward the less labile sites, as well. In separate experiments (not shown), we surveyed transition metal ions other than Mn2+ for their possible stimulation of any reduced specificity activity for the Eco RI endonuclease; we were unable to show any effect with 2 mM Co2+, Zn2+, or Cu2+ dissolved in 20 mM to 100 mM Tris buffers, at pH 7.5 or pH 8.3. Hsu and Berg (1978) also tested Cu2+ and Zn2+, with similar (negative) results.

Nature of DNA Fragment Termini Generated in Noncanonical Site Cleavage-In their work on the Eco RI* activity, Polisky et al. (1975) found that double-stranded SV40 or pMB9 DNAs were cleaved in a staggered fashion to generate cohesive termini. Such cohesive ends could be rejoined by E. coli ligase, and could serve as template-primers for the DNA polymerase from avian sarcoma virus (which has a strict requirement for internal 3'-ends and protruding 5'-ends). The pattern of attack was found to match that seen for the canonical Eco RI activity (Hedgpeth et al., 1972). As a check on our procedures, we have used the same approach to examine our φX174 (RF) DNA digests. An extensively digested sample of φX174 (RF) DNA was tested for religation capacity by treating it for 72 h at 4°C with Escherichia coli ligase under the conditions defined by Modrich et al. (1973). Agarose gel electrophoresis of ligase-treated and untreated digests showed that a substantial portion of the fragments present could be rejoined covalently (data not shown). Similar experiments with ColE1 plasmid DNA gave parallel results. Moreover, samples of either DNA that had been digested in the presence of Mn2+ (2 mM MnCl2, 100 mM Tris, pH 7.5) were just as easily ligated as those generated under conditions of low ionic strength and alkaline pH (data not shown), in agreement with the findings of Hsu and Berg (1978). Polisky et al. (1975) reported that attack by the Eco RI* activity yields primarily adenine residues at the 5'-terminus; we wished to determine whether the termini generated by our enzyme preparation followed this pattern. Using a 32P-terminally labeled preparation of φX174 DNA that had been digested previously with Eco RI endonuclease, we electrophoresed the DNA on an agarose slab gel and selected six distinct fragment bands for further purification and analysis (see "Materials and Methods" for details of the purification and end group analysis).
Table I summarizes the results; the samples are ranked in order of decreasing fragment size with Sample 1 representing a band several thousand base pairs in length, while Sample 6 (the smallest) represents a band less than 200 base pairs in length. (It is possible that a single apparent band may represent two or more poorly resolved fragment species.) We find that the bulk of 32P-label (70% to 95% of total 32P incorporated) is found in pA for every sample examined, in agreement with the results of Polisky et al. (1975). The low level of incorporation into pT, pC, and pG most probably represents labeling at nicks or nonspecific breaks in the DNA induced by the various manipulations preceding the kinase-exchange incorporation of 32P.

TABLE I
a Quantitation of label incorporation at pC and pA for Sample 4 may be in error due to incomplete resolution of the two species by paper electrophoresis. [...] not sum to 100% due to round-off.
An interesting trend toward increasing incorporation into pG is found in Samples 5 and 6. Assuming that incorporation into pT and pC in these samples represents a nonspecific labeling background of about 3% and 7%, respectively, for the experiments, then labeling of pG for these two samples is apparently at about twice the background level (7.2% and 14.8%, respectively). We return to this point under "Discussion."

Protection of Noncanonical Cleavage Sites by "Overmethylation" with the Eco RI Methylase-Berkner and Folk (1978) reported that overmethylation of polyoma DNA by the Eco RI methylase partially protects that DNA against attack by the Eco RI* activity of the endonuclease, suggesting that the methylase acts in or near Eco RI* sequences. Protection of the DNA was incomplete, however, because of the rather low level of overmethylation obtained in those experiments (approximately 4 times the expected level of methyl incorporation, based on the single canonical site in polyoma DNA molecules). With the much higher levels of overmethylation we have achieved using the Eco RI methylase and various DNAs (see preceding paper), we wished to determine whether full protection against the noncanonical Eco RI endonuclease activity could be attained, and if not, which noncanonical endonuclease sites remained unmodified by the Eco RI methylase. We prepared overmethylated φX174 (RF) DNA (as detailed under "Materials and Methods") which had, on the average, over 80 methyl groups incorporated per φX174 molecule. Digestion of this DNA under Eco RI* conditions results in very little fragmentation, compared to unmethylated φX174 controls (Fig. 5), confirming that overmethylation confers protection against the noncanonical cleavage activity of Eco RI endonuclease.
In Fig. 5 we can detect eight fragment bands for the overmethylated-digested φX174 DNA. Two of these bands, approximately 4500 and 3900 base pairs in length, contain the bulk of the digested DNA; however, six minor fragment bands, approximately 2000, 1500, 1200, 940, 710, and 300 base pairs in length, are also apparent. We suggest that most of the methylase-treated molecules resist endonuclease attack, and are susceptible at only one or two sites; this resistant population constitutes the two major high molecular weight bands seen in Fig. 5.
A comparison of the cleavage map established by the Smith-Birnstiel technique (Fig. 3) to the fragmentation pattern seen here reveals a surprising degree of agreement. This cleavage map would predict fragments of approximately 4500 base pairs and 900 base pairs resulting from attack at (unmodified) Sites A and B (Fig. 3), while attack at Site B and a site mapping at around position 5200 in the φX174 sequence (which we shall refer to as Site F) would generate fragments of about 1500 and 3900 base pairs. (Attack at Site F might well have gone undetected by the Smith-Birnstiel technique, for reasons discussed above.) Additionally, cleavages at Sites A and F would yield fragments of roughly 700 and 4700 base pairs. This rather neatly accounts for the two major fragment bands as well as three of the six minor bands seen in Fig. 5 for the "protected" DNA. Furthermore, attack at Sites C and D would account for the observed fragment bands of 1200 and 900 base pairs, while a partially digested fragment of 2100 base pairs might also result. This scheme, however, does not account completely for all the bands observed in the digest of overmethylated φX174 (RF) DNA. In particular, the fragment of about 300 base pairs cannot be accommodated within the limited set of cleavage sites proposed here. The simplest explanation for this incongruency is the presence of one or more additional sites of cleavage, left unmodified and unprotected by the methylase treatment and not included in the original set of highly labile cleavage sites.
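The pairwise predictions for molecules cut at only two sites can be checked with a few lines of arithmetic on the circular map. The sketch below is illustrative only and rests on assumptions: Site A is taken at position 470, Site B is assumed to be the labile site near position 1300 (entered here as 1349), and Site F is taken at the approximate position 5200; the site letters are defined in Fig. 3, which is not reproduced here, and the function name is hypothetical.

```python
from itertools import combinations

GENOME_BP = 5386
# Assumed positions: A and F as given in the text, B taken as the labile
# site near position 1300 (an assumption about the lettering in Fig. 3).
SITES = {"A": 470, "B": 1349, "F": 5200}


def two_cut_fragments(pos1, pos2, genome_bp=GENOME_BP):
    """Sizes of the two fragments released by cutting a circle twice."""
    arc = abs(pos2 - pos1) % genome_bp
    return sorted((arc, genome_bp - arc))


for (name1, pos1), (name2, pos2) in combinations(SITES.items(), 2):
    small, large = two_cut_fragments(pos1, pos2)
    print(f"Sites {name1} + {name2}: {small} and {large} base pairs")
```

Under these assumptions the three pairs give 879 and 4507, 656 and 4730, and 1535 and 3851 base pairs, respectively, which round to the approximate fragment sizes quoted in the text.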

DISCUSSION
The following general conclusions can be drawn from these results: (i) there are (at least) five mapped sites in φX174 that are very susceptible to cleavage by the Eco RI endonuclease under conditions of low ionic strength and alkaline pH; (ii) there are a number of more refractory sites whose positions (and sequences) are yet to be deduced; (iii) addition of Mn2+ produces the same pattern of cleavage as does low salt and alkaline pH; (iv) cleavage at these sites generates cohesive ends which carry principally 5'-adenine termini (although some may carry 5'-guanine); (v) protection of most, but not all, such sites is afforded by overmethylation with Eco RI methylase, implying a substantial overlap in specificity of the two enzymes in their reduced specificity mode of action. (The lack of complete protection may provide additional insight into differences between the sequence recognition mechanisms of the two enzymes.)

Possible Contamination by Other Endonucleases-As in connection with the noncanonical methylation considered in the previous paper, we must ask here whether an additional nuclease, which might be co-purified with Eco RI in our preparative procedure, could be responsible for the noncanonical cleavage activity observed.
The following lines of evidence argue against a contaminating enzyme hypothesis: (i) more than 98% of the protein in our preparation migrates (at the expected molecular weight of the endonuclease) as a single band in SDS-polyacrylamide gel electrophoresis experiments; (ii) in common with Hsu and Berg (1978), we point out that any putative contaminant purported to be responsible for noncanonical cleavage must be inactive with Mg2+, but active with Mn2+ (under standard endonuclease assay conditions), and active with either ion under conditions of low ionic strength and alkaline pH; this represents a most unusual pattern of divalent ion activation; (iii) in common with the canonical cleavage products of Eco RI endonuclease, our enzyme preparation generates (ligatable) DNA fragments with cohesive ends carrying primarily a 5'-terminal adenine residue; (iv) all the noncanonical sites cleaved share most of the canonical site sequence, and not the sequences recognized by most other known endonucleases; (v) most of the noncanonical cleavage sites can be protected by prior overmethylation with the Eco RI methylase; and (vi) a single gene is responsible for both the canonical Eco RI and the Eco RI* activities (Polisky et al., 1975; Goodman et al., 1977). Taken in aggregate, these results argue strongly against any significant contribution from a contaminating nuclease.
Noncanonical Cleavage Sequences-In what follows, we define the position of a candidate hexanucleotide recognition sequence by the number of the first base pair of that sequence in the overall φX174 base pair sequence of Sanger et al. (1978) (see Fig. 3). We consider first hexameric, rather than shorter sequences, since the endonuclease is quite specific for such sequences under "standard" conditions; moreover, work on the nicking of noncanonical sites (Bishop, 1979) as well as on noncanonical double strand cleavage (Polisky et al., 1975; Goodman et al., 1977) suggests that a requirement for four or five base pairs of the canonical sequence is retained under most conditions. Thus, in considering sequences in φX174 (RF) DNA which might be cleavage candidates, we search first for hexanucleotide blocks that resemble the canonical sequence at four or more positions, and that yield 5'-terminal adenine (or possibly guanine) residues when cut in the staggered fashion characteristic of the canonical specificity.
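The two selection criteria just stated (agreement with GAATTC at four or more positions, and 5'-terminal A or G after a staggered cut) can be expressed compactly. The following is a minimal sketch under stated assumptions rather than the search actually performed: the function names are hypothetical, the cut is assumed to fall between the first and second residues of each strand as at the canonical site, and requiring both new 5'-termini (rather than either one) to be a purine is an interpretive choice.

```python
CANONICAL = "GAATTC"
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def matches_to_canonical(hexamer):
    """Number of positions at which a hexamer agrees with GAATTC."""
    return sum(a == b for a, b in zip(hexamer.upper(), CANONICAL))


def five_prime_termini(hexamer):
    """5'-terminal residues left on the top and lower strands by a
    staggered cut after the first residue of each strand (assumed)."""
    h = hexamer.upper()
    top = h[1]                  # second residue of the top strand
    lower = COMPLEMENT[h[4]]    # second residue of the lower strand, 5' to 3'
    return top, lower


def is_candidate(hexamer, min_matches=4):
    """Apply both criteria; 'both termini A or G' is an assumption."""
    termini_ok = all(base in "AG" for base in five_prime_termini(hexamer))
    return matches_to_canonical(hexamer) >= min_matches and termini_ok


for site in ["GAATTC", "GGATTT", "AAATTT", "GAATTA", "AAATCT"]:
    print(site, matches_to_canonical(site), five_prime_termini(site),
          is_candidate(site))
```

A filter of this kind is only a first pass; it says nothing about the relative lability of the sites that it admits.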
As described under "Results," five early cleavage sites were mapped to the approximate positions 470, 1300, 2100, 3400, and 3800. An examination of the sequence of φX174 around these positions shows one or more candidates for noncanonical cleavage near each mapped site. There are several possibilities for cleavage sites around the 2100 and the 3800 map positions; these possibilities will be sorted out below. First we consider the candidates for recognition sequences at the other three early cleavage sites. These three sites include the cleavage site where endonuclease activity was so markedly stimulated by Mn2+ (i.e. position 470), as well as positions 1300 and 3400. For each of these three cleavage sites the sequence, GGATTT, appears to be the only reasonable choice. This sequence is found at positions 470, 1349, and 3369. The agreement between mapped cleavage position and the position of these sequences is quite good, certainly well within the experimental error in map position. Other candidate sequences are too far removed (generally by 300 or more base pairs) to be considered. In addition, the presence of G residues in the "second" position of this sequence is compatible with the observation that G residues seem to occur at the 5'-termini of some noncanonically cleaved φX174 fragments (Table I).
We note that these are the only three sites in the φX174 genome containing this particular hexameric sequence (although there are many other possible cleavage sites). This sequence does match the canonical Eco RI recognition sequence at four points (the first, third, fourth, and fifth positions): GGATTT. It should also be pointed out that this sequence, unlike the canonical cleavage sequence, is not palindromic (centrisymmetric); in the 5' → 3' direction along the "lower" strand it reads AAATCC. We come back to this point below.
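The two properties noted here, the positions of agreement with the canonical sequence and the absence of twofold symmetry, are easy to verify by direct comparison. The snippet below is a small illustrative check; the function names are hypothetical.

```python
def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def is_palindromic(seq):
    """True when both strands read the same in the 5' to 3' direction."""
    return seq.upper() == reverse_complement(seq)


def matching_positions(seq, reference="GAATTC"):
    """1-based positions at which seq agrees with the reference hexamer."""
    return [i + 1 for i, (a, b) in enumerate(zip(seq.upper(), reference))
            if a == b]


print(reverse_complement("GGATTT"))   # lower strand of the labile site
print(is_palindromic("GAATTC"), is_palindromic("GGATTT"))
print(matching_positions("GGATTT"))   # positions shared with GAATTC
```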
Returning to the other two cleavage sites, with their multiple candidate sequences, there are five potential cleavage sites near position 2100, and another five sites around position 3800. To distinguish between these candidates we must consider the data from digests of whole φX174 (Fig. 2A). First, consider the candidate sites for cleavage around position 2100 (Site C; listed in Table II). [...]

TABLE II
Nucleotide sequences of candidate sites in φX174 RF DNA for early attack by Eco RI endonuclease
DNA sequences resembling the canonical GAATTC sequence at four or more positions in the sequence (underlined) were selected, based on proximity to the deduced position of cleavage.

Next, we consider the five candidate sites for cleavage around position 4000 (Site E). Their sequences begin at positions 3860, 3899, 3939, 3962, and 4147 (Table II). (We include as a candidate sequence here the one beginning at position 3962, although it has no more than a three-base pair homology with the canonical sequence, because it does share a number of key recognition features which will be elucidated below.) The first of these candidates may be dismissed, since cleavage at that point (3860) and at position 3369 would generate a fragment of ~490 base pairs early in the digestion. Such a fragment is not seen. Early cleavage at position 3899 is argued against by the lack of a 530 base pair fragment, expected for early attack at this point in conjunction with attack at position 3369. A fragment or set of fragments produced early in the digests of whole φX174 runs around 570 to 590 base pairs; early cleavage at positions 470, 1359, 2222, or (most particularly) 4147 cannot yield a fragment in this size range by subsequent attack at any other candidate site, and so we can eliminate position 4147. We are left with the two remaining possible sites, at positions 3939 and 3962. Because of the very short separation between these sites (23 base pairs), and the experimental errors in determining fragment sizes from slab gel mobilities, we are unable to state definitely which of these sites is preferred. Cleavage at either, in combination with cleavages at nearby sites, would fit the observed gel band patterns. We tend to be biased against the Eco RI* site at position 3939 since its sequence reads CAATTG, and other evidence presented here, as well as the work of Goodman et al. (1977), suggests that the Eco RI* sequences beginning with cytosine are strongly discriminated against. We cannot, however, exclude this possibility completely.
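The elimination argument for Site E amounts to simple fragment-size arithmetic, sketched below in Python. This is an illustration only: it pairs each candidate with the single early cut at position 3369, treats the distance between cuts as a plain difference of sequence positions, and assumes a ±10 base pair tolerance on gel sizing. The tolerance and all names are assumptions, not values from the paper.

```python
# Positions quoted in the text; tolerance and names are assumed.
REFERENCE_CUT = 3369                      # early GGATTT site
SITE_E_CANDIDATES = [3860, 3899, 3939, 3962, 4147]
OBSERVED_RANGE = (570, 590)               # early fragment size, base pairs
TOLERANCE_BP = 10                         # assumed gel sizing error


def fragment_length(cut_a, cut_b):
    """Length of the fragment between two cuts on the same stretch of DNA."""
    return abs(cut_b - cut_a)


def consistent_candidates(reference, candidates, size_range, tolerance):
    """Candidates whose fragment with `reference` falls in the observed range."""
    low, high = size_range
    return [
        pos for pos in candidates
        if low - tolerance <= fragment_length(reference, pos) <= high + tolerance
    ]


if __name__ == "__main__":
    for pos in SITE_E_CANDIDATES:
        print(pos, fragment_length(REFERENCE_CUT, pos))
    print(consistent_candidates(REFERENCE_CUT, SITE_E_CANDIDATES,
                                OBSERVED_RANGE, TOLERANCE_BP))
```

Under these assumptions only positions 3939 and 3962 survive, which matches the two sites the text is unable to distinguish.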
Temporarily dropping the latter site from consideration, we compare the sequences of the five remaining noncanonical cleavage sites we have defined for points of similarity. Three of these share the sequence GGATTT; the two other sites read AAATTT (position 2222) and AAATCT (position 3962). All five sites (reading in at least one direction) contain the sequence AAT, and four of the five sites share the sequence ATTT. The tetrameric sequence, AATT, is also shared by most of the candidates for the cleavage sites deduced by the [...]

There are, however, two special Eco RI* sites we must consider: those which contain C (or G) in the apparently favorable first (or sixth) position in the hexameric sequence (e.g. positions 2988 and 3860). Cleavage at either site would yield fragments in either partial or complete digests which are not observed. In this connection, we recall the refractory site at position 4269 (GAATTG) and note the apparently unfavorable terminal guanine residue. A similarly resistant site of the same sequence is found at position 744; facile cleavage at this site should have been readily detected, but was not seen with the Smith-Birnstiel technique. An apparent contradiction to this rule regarding the position of guanine or cytosine residues is the cleavage site or sites at positions 956 to 962.
Here the close proximity of two candidate recognition sequences (AATT) for the reduced specificity activity may overcome the inhibiting influence of terminal guanine or initial cytosine residues. Another possibility is that introduction of a single nick in each of the two sites, but on opposite DNA strands, would generate a full double strand cut in this fashion:

CAATTTTJAATTG
GTTAATAATTAAC
(note the apparent reversed polarity in strand scission). There is precedent for such "one-sided nicking" by Eco RI (Bishop, 1979). Moreover, the preference against terminal guanines is not absolute, since bands are observed (with extended digestion) that are consistent with eventual cleavages of the refractory sites at positions 744 and 4269; the initial guanine residues in the sequences for these two sites may possibly exert a counterbalancing favorable influence.
This brings us to the hypothesis advanced by Goodman et al. (1977), i.e. that Eco RI* activity involves preferential cleavage at sites containing guanine as the initial residue (GAATTN, where N is T, A, or G). There are a number of such sequences in the φX174 genome (positions 81, 102, and 139 would, of course, not have appeared since these sites lie on the small DNA fragment generated by Xho I-Pst I cleavage; Eco RI endonuclease attack on this fragment was not studied). It does appear likely that cleavage occurs at some of these Eco RI* sites rather soon after the initial cuts have been made at the early sites, judging from the digests of whole φX174, and that eventually most or all of these sites are cleaved. We conclude that a guanine residue preceding the core tetramer, AATT, has a favorable effect for recognition and cleavage by the endonuclease, but that this is not necessarily the most favorable combination of bases (short of the canonical sequence), since our evidence points toward the sequence GGATTT as the most readily cleaved under our conditions. In any case, cleavages seem to occur more readily at sites with a purine in the first position and a pyrimidine in the sixth.
Other possibly important features of the sequences at cleavage sites are the symmetry and the base composition of immediately neighboring regions. We are unable to discern any particular bias in cleavage activity toward sites flanked by regions rich in either A·T or G·C base pairs, or rich in purine or pyrimidine residues on any given strand. Symmetry in base sequence that extends beyond the core tetramer, AATT, for Eco RI* sites seems to have no especially favorable influence on enzyme activity at a number of sites (positions 1983, 2650, 3939, 4436, or 5340), but conceivably, symmetry plays a role in stimulating cleavage at positions 2222, 956 to 962, and 2324 to 2330. The "tandem" nature (close proximity of two potential Eco RI* sites) of the last two pairs of sites may also play a role.
Mn2+-induced Noncanonical Cleavage Sequences-As indicated under "Results," we have observed that replacement of Mg2+ by Mn2+ in moderate ionic strength buffer promotes Eco RI cleavage at many of the same noncanonical sites attacked by the endonuclease under conditions of low salt and alkaline pH. Comparison of gel patterns for the two systems suggests that much the same hierarchy of relative site lability is followed in those two sets of noncanonical cleavages as well. Furthermore, this effect is Mn2+-specific; Co2+, Zn2+, or Cu2+ are not effective in reducing the specificity of the enzyme. This result was also obtained by Hsu and Berg (1978) in examining Eco RI digests of SV40 DNA in the presence of Mn2+.
These workers found six highly labile sites of Mn2+-stimulated Eco RI cleavage in SV40 DNA, in addition to the single canonical site present in the molecule, and located these sites on the SV40 restriction map. At the time their results were published, the sequence of SV40 had not been determined fully; subsequently, Fiers and co-workers (1978) and Weissman and colleagues (Reddy et al., 1978) have published the complete sequence of this DNA. We have compared the DNA sequence with the approximate map positions of the six labile sites observed by Hsu and Berg (1978). Even though there are a number of candidate sequences near each mapped point of cleavage, in every case there is a candidate sequence reading GAATTN (N = A, T, or G). The occurrence of this sequence around all six cleavage loci goes beyond happenstance; it appears that GAATTN is the favored recognition sequence for the Mn2+-stimulated noncanonical cleavage activity in these digests.
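A search of this kind can be sketched in a few lines of Python. The sketch below is illustrative only: it scans one written strand for GAATTN (N = A, T, or G) and for the corresponding pattern that such a site on the opposite strand presents when read along the written strand (N'AATTC, with N' = A, C, or T). The toy sequence is hypothetical and is not taken from SV40 or φX174; the function and constant names are assumptions.

```python
import re

# Lookahead patterns so that overlapping sites are all reported.
TOP_STRAND = re.compile(r"(?=(GAATT[ATG]))")
BOTTOM_STRAND = re.compile(r"(?=([ACT]AATTC))")  # GAATTN on the other strand


def find_gaattn_sites(seq):
    """Return sorted (1-based position, hexamer as written, strand) tuples."""
    seq = seq.upper()
    hits = [(m.start() + 1, m.group(1), "top") for m in TOP_STRAND.finditer(seq)]
    hits += [(m.start() + 1, m.group(1), "bottom")
             for m in BOTTOM_STRAND.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    toy = "CCGAATTACCGAATTCTTTAATTCGG"   # hypothetical test sequence
    for hit in find_gaattn_sites(toy):
        print(hit)
```

The canonical GAATTC in the toy sequence is deliberately not reported, since only the noncanonical GAATTN class is being searched for.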
We do find one significant discrepancy between our results with φX174 DNA and those of Hsu and Berg (1978) with SV40. As indicated above, we find the sequence GGATTT to be the most labile under noncanonical cleavage conditions. Although SV40 DNA contains four such sequences, none lie near the map positions of the six early cleavage sites assigned by Hsu and Berg (1978). While we can provide no definitive interpretation of this discrepancy at present, it may reflect the possibility that flanking sequences play a part in controlling the relative lability of noncanonical, as well as canonical, sequences (see below).
Hierarchy of Recognition Elements for Eco RI Endonuclease-As a point of departure for consideration of site recognition mechanisms, we summarize the relative rates of attack of the enzyme on the various potential Eco RI endonuclease cleavage sites observed. The full canonical sequence, GAATTC, is most reactive, although the rate of cleavage of sites with this sequence appears to depend in addition on the nature of the flanking base pairs (Thomas and Davis, 1975; Forsblom et al., 1976). The sequence GGATTT seems to be next most reactive. A check of the whole φX174 sequence shows that possible cleavages at the sequences GGATTA or GGATTG will not fit the observed patterns of attack (φX174 contains no sequences GGATTC), and so we discard these latter two possibilities as candidates for recognition sites. Following GGATTT in the hierarchy, we place AAATTT as well as GAATTN (where N is either A or T), basing this assignment again on their relative reactivity in φX174 (RF) (as well as SV40) DNA. After this we suggest that the general Eco RI* sequence NAATTN' (both N and N' are unspecified nucleotides) provides the bulk of the (slowly reacting) noncanonical cleavage sites (Polisky et al., 1975). We conclude the hierarchy with the central base pair dimer, AT, which is found in all the above suggested recognition sequences and contains the adenine residue which lies closest to the center of the recognition sequence and which, when methylated, is responsible for blocking the cleavage reaction.*

* To investigate further the similarity of the relaxed specificity of the Eco RI endonuclease with the companion methylase, we carried out preliminary experiments to determine whether the nuclease can cleave poly[d(A-T)]. To this end we compared nuclease-treated and control samples of poly[d(A-T)], using a kinase technique to look for the introduction, by the enzyme, of new 5'-ends that could incorporate 32P in a kinase-mediated exchange reaction with [γ-32P]ATP. Double-stranded poly[d(A-T)] (5 μg) was digested for 24 h at 37°C in [...] were added and the mixture incubated at 37°C for 10 min. The reaction was quenched and the DNA precipitated by addition of sodium pyrophosphate and trichloroacetic acid; the precipitate was collected on a prewetted GF/C filter and washed, then dried and counted. After 25-h treatment with restriction enzyme, approximately 1 of every 2 poly[d(A-T)] molecules had been cleaved once, as evidenced by a 43% rise in incorporated [32P]phosphate (11.6 × 10^3 cpm versus 8.1 × 10^3 cpm in the control sample). Arguments ruling out possible contamination of the endonuclease preparation (see above) strongly suggest that these cleavages are due to the Eco RI endonuclease. Thus, the reduction in specificity of the Eco RI endonuclease appears to parallel that of the companion methylase, even to recognition of (and slow action at) the central two base pairs (A·T) of the recognition sequence.

Recognition Elements Utilized by the Endonuclease-There now exists a moderate amount of information on the effects of alterations in the canonical sequence with respect to the activity of both the Eco RI methylase and the endonuclease; here we attempt to combine this body of information with our own results on the specificity of the noncanonical activity in order to deduce specific points of contact on the DNA which may be important for the binding specificity of the endonuclease.

The bulk of the available evidence suggests strongly that the endonuclease makes most, if not all, of its recognition contacts with the initial G·C base pair of the canonical sequence in the major groove of the DNA helix. Thus, Berkner and Folk (1977) and Ito et al. (1975) have shown that glycosylated hydroxymethylcytosine residues in the canonical recognition sequence will block cleavage activity completely, implicating contacts in the major groove; however, the presence at the C5 position of cytosine of an hydroxymethyl group does not abolish cleavage activity (Kaplan and Nierlich, 1975), suggesting that the cytosine C5 position itself is not an intrinsically important point of contact. Furthermore, substitution of inosine for guanine in the canonical sequence has no appreciable effect on endonuclease activity (Modrich and Rubin, 1977), demonstrating that the N2 amino group of guanine (in the minor groove) is not required by the endonuclease for site recognition.

As pointed out by Seeman et al. (1976), a single directed hydrogen bond to a given base will not suffice to distinguish that base pair from all other possibilities; a minimum of two directed interactions is required for unambiguous recognition. For recognition here of the initial G·C (or the terminal C·G) base pair, two interactions are required which are sufficiently specific to discriminate these pairs from the "wrong" pairs: A·T, T·A, and C·G. The strong prejudice shown against cleavage at sequences beginning with C (or ending with G) suggests that at least one of these specific interactions is lost (and perhaps unfavorable contact points are introduced) in these cases. Hydrogen bond contacts in the minor groove are quite similar for G·C and C·G (Seeman et al., 1976); in addition the evidence of Modrich and Rubin (1977; discussed above) argues against important contacts in this groove. In the major groove, for G·C pairs, we can eliminate the C5 position on cytosine as an important contact point, since hydroxymethylation at this position does not block cleavage. Remaining sites for possible hydrogen bonds are the N7 position on the purine, the N4 amino group of cytosine, and the O6 carbonyl of guanine. Seeman et al. (1976) present a very useful set of diagrams illustrating the stereochemistry of double-helical A·U (or A·T) and G·C base pairs. By inspection, it is apparent that G·C and C·G base pairs are readily distinguished via the major groove by contacts at any of the donor or acceptor groups mentioned above. Ambiguous points of contact in the major groove are completely absent, and this probably explains the strong discrimination shown against the replacement of G·C base pairs by C·G pairs in the canonical sequence. By comparison, G·C and T·A base pairs would be difficult to resolve if the endonuclease probed only the center of the major groove for hydrogen-bonding groups, since the O4 and O6 carbonyls of thymine and guanine differ in position by only 1.1 Å (as do the N4 and N6 amino groups of cytosine and adenine). Unambiguous resolution of T·A from G·C could be provided, however, by the N7 imino group on the purine; any pyrimidine replacement for guanine would suffer from the lack of a comparably positioned hydrogen bond acceptor group.
In turn, A·T and G·C base pairs are unambiguously distinguished by contacts in the center of the major groove of the helix; this discrimination is ambiguous only if contact is made solely with the N7 position of the purine ring, at the sides of the major groove. We note here that a single hydrogen-bonding donor probe from the protein probably could not distinguish between the two different purines by bonding to the N7 position; moreover, a thymine replacement for guanine would place an acceptor group (the O4 carbonyl) rather close to the original N7 donor group (approximately 2 Å toward the center of the groove and 1 Å farther out), and so a slight displacement of the probing moiety might still enable it to make sufficient contact for the enzyme to bind and cleave the altered sequence.
Turning now to recognition of the four A·T or T·A base pairs of the canonical sequence, we note that the endonuclease shows a moderate sensitivity to alterations of the C5 substituent on thymine. Berkner and Folk (1977) have shown that while a uridine replacement has no effect on Vmax or Km for the endonuclease, substitution by hydroxymethyluridine (-CH2OH replacing -CH3 at the C5 position of thymidine) lowers Vmax 20-fold, while showing little or no effect on Km.
More recently, Marchionni and Roufa (1978) have reported that 5-bromouridine-containing DNA is digested more slowly by the endonuclease than is unsubstituted DNA. The effects of these various substituents at the C5 position may reflect either steric interference by the bulkier bromine or hydroxymethyl group, or the altered polarizability of the C5 substituent, with possible consequences for hydrophobic interactions (Berkner and Folk, 1977).
A central major groove interaction involves the methylation or non-methylation of the N6 group of the central adenine residues in the canonical recognition sequence. The endonuclease is highly sensitive to methylation of these adenine residues, since half-modified sites are as well protected against cleavage as fully modified sites (Modrich and Zabel, 1976). Previous work from this laboratory on the conformation of N6-methyladenine residues in polynucleotides has demonstrated that such residues destabilize the DNA helix by ~0.4 to ~0.9 kcal/mol (Engel and von Hippel, 1978). The helical structure, however, is still intact; bases are fully paired, with no "eversion" of the modified residues of the double helical conformation. The N6-methyl groups project into the major groove of the helix, not inward to the interior of the helix, and so they can provide an obvious recognition feature. It is not clear exactly how introduction of the methyl group blocks cleavage. Perhaps it interferes with a hydrogen bond to the same adenine moiety, at either the N6 or N7 positions, or it might interfere sterically with groups on the protein and block either cleavage or binding. Protection of half-modified sites may reflect the dimeric structure of the endonuclease and a possible symmetry of interaction with the basically symmetric recognition sequence (Modrich and Zabel, 1976); interference with a single subunit by a single N6-methyladenine might be sufficient to protect the site fully. An intriguing possibility is that the N6-methyladenine, through its destabilizing effect on the DNA helix locally, might affect binding or cleavage by the enzyme. With regard to this last possibility it is interesting to note that a double-stranded substrate is necessary for enzyme activity; Greene et al. (1975) observed neither cleavage by the nuclease, nor modification by the methylase, of a self-complementary oligonucleotide containing the canonical recognition sequence at temperatures above the melting point of the oligomer.
These observations, taken together, indicate that the endonuclease makes most or all of its critical recognition contacts in the central region of the canonical sequence via the major groove of the helix. We can bolster this assertion further by considering the effects on sequence recognition within this region by various base pair replacements: (i) replacement of A·T by T·A presents much the same sort of array of hydrogen-bonding donor groups in the minor groove for either base pair, but differences in the arrangement of such groups are readily apparent in the major groove; (ii) replacement of A·T by C·G is readily apparent through hydrogen-bonding contacts in either the major or the minor groove, which easily explains the strong discrimination shown against this particular substitution; (iii) replacement of A·T by G·C could be detected through contacts in either groove; however, the endonuclease does accept a G·C replacement for A·T in some noncanonical sites (e.g. GGATTT). This last fact suggests that minor groove contacts are relatively unimportant, as G·C and C·G base pairs are not readily distinguished from the minor groove. This forces us to consider contact points in the major groove where G·C base pairs might bear some similarity to A·T pairs. A leading candidate is the N7 hydrogen bond acceptor present on both purine rings; this readily discriminates against T·A and C·G replacements, while allowing some ambiguity in distinguishing A·T from G·C base pairs. More stringent discrimination of A·T base pairs could be provided by a second hydrogen-bonding contact, for example, at the N6-amino group of adenine, or by hydrophobic or steric interactions around the C5-methyl group of thymine.

A Model for Endonuclease Site Recognition-The points of contact within the recognition sequence that have been deduced can be summarized as follows: (i) the leading G·C base pair of the sequence is probably recognized by contact at the N7 position of guanine, and also possibly by contacts at the O6 carbonyl of guanine or the N4-amino group of cytosine; (ii) the innermost A·T (or T·A) base pair is probably recognized by contact with the N7 imino group and the N6-amino group of adenine, and possibly by contact with the C5 methyl group of thymine; (iii) the outer A·T (or T·A) base pair is probably recognized by contact at the N7-imino group of adenine, the O4-carbonyl of thymine, or the C5-methyl group of thymine. Using Fig. 6 (top), we show (in a highly schematic form) the set of contacts in the canonical sequence just enumerated for the endonuclease. Briefly, each base pair is represented by a line from which extend (vertically out into the major groove of the helix) a set of hydrogen bond acceptor (A) or donor (D) moieties, the linear arrangement of which follows that dictated by the orientation (e.g. G·C versus C·G) and nature (e.g. A·T versus G·C) of the base pair. For a more detailed explanation of this scheme, with applications to other site-specific DNA-protein interactions, see [...].
Recalling that the sequences GGATTT, AAATTT, and GAATTA represent especially favorable noncanonical cleavage sequences, we can compare in this fashion the trimer (half-site) sequences GAA, GGA, AAA, and TAA (Fig. 6, bottom). A very intriguing feature, common to all four half-sites, emerges, i.e. an alignment along the axis of the helix of two hydrogen bond acceptor-donor pairs which can easily be bridged by a bifunctional hydrogen-bonding amino acid moiety such as glutamine or asparagine. Recognition of specific base pairs by such "bridging" probes on a protein has been previously proposed (Seeman et al., 1977) but consideration was limited to contacts made in the plane of the base pairs (perpendicular to the helix axis). We propose (see also von Hippel, 1979) that bridging contacts are equally likely to be made across base pairs in the recognition sequence (parallel to the helix axis), as illustrated by the shaded rectangles in Fig. 6, drawn about the proposed hydrogen bond donor-acceptor pairs. This model is entirely consistent with previous deductions concerning specific base contacts, it accounts for the preferences enumerated above in noncanonical cleavage site sequences, and it predicts that sequences where such bridging contacts are very distorted or totally absent will resist cleavage even under noncanonical conditions.

One of the four noncanonical half-site sequences, GGA, exhibits a lateral displacement within the major groove of one of the two cross-base bridges; the resulting distorted recognition element could be responsible for the reduced rate of cleavage at the sequence GGATTT, as compared to canonical site cleavage. (A comparable distortion is required to form the double cross-bridge structure for the half-site sequences AGA and CAA, which have been suggested for the noncanonical site at φX174 position 3962.) We note that other possible candidate half-site sequences at which cleavage has not been observed (GTA or GCA) can be eliminated a priori by the total absence of the proposed set of double bridge contacts (GCA) or by the need to grossly distort both bridging contacts (GTA).
Sequences such as GGATTT are not palindromic (symmetric); since cleavage of the two strands appears to involve independent binding and catalytic acts (see Modrich, 1979), it may be that in such cases the "better aligned" half-site sequence (e.g. here AAA) is cleaved first, introducing additional flexibility for the enzyme to better align (and better bind to?) the other half-site sequence. The single strand nicking of the nonsymmetrical sequence GAATTA (Bishop, 1979) under canonical conditions is certainly consistent with such a hypothesis; the TAA half-site sequence here is somewhat distorted (Fig. 6B) and thus would be expected to be cleaved only under noncanonical conditions.

Recognition Elements Utilized by the Methylase-Based on these results for the endonuclease, we consider now a parallel analysis of the recognition mechanism of the Eco RI methylase. From the results of Modrich and Rubin (1977) on the effects of inosine-substituted recognition sites, the methylase apparently uses the N2 amino group of guanine (in the minor groove) to distinguish the initial G·C base pair of the canonical sequence. This is supported by the observation that the methylase acts readily on phage T4 DNA (Berkner and Folk, 1977) which contains substantial amounts of glucosylated hydroxymethylcytosine; one might expect the bulky sugar substituent to block contacts not only at the C5 position of cytosine, but at other nearby points in the major groove of the helix as well (e.g. at the N4-amino group of cytosine or the O6-carbonyl of guanine). The fact that the methylase apparently disregards such potentially blocked contacts implies their lack of significance in the site-recognition mechanism of the methylase.
However, based on our observation that in overmethylated φX174 (RF) DNA sequences such as AAATTT are protected against noncanonical attack by the endonuclease, we conclude that the N2 amino group of the initial guanine residue of the canonical sequence is not absolutely essential for the methyl transfer reaction. We speculate further that the methylase might make some other contact in either the major or minor groove with the initial G·C base pair, since the enzyme obviously discriminates between G·C and C·G base pairs at this position in the recognition sequence; these base pairs differ only slightly in terms of contact at the N2 of guanine (Seeman et al., 1976), and a second, independent contact-interaction (in either groove of the helix) would seem necessary for the precise discrimination seen. Perhaps it is the interaction with this (postulated) second contact point, even in the absence of the guanine N2 donor group, that allows methylation of such noncanonical sequences as AAATTT, GAATTT, or GAATTA, with concomitant protection against attack by the endonuclease.
The methylase is quite sensitive to substitution at the C5 position on thymine, since replacement of the methyl group here by either hydrogen or an hydroxymethyl group will essentially eliminate canonical site methylation under standard conditions (Berkner and Folk, 1977). In contrast to this, replacement of this methyl group by bromine allows both canonical and noncanonical reactions to proceed with little apparent effect on the rate of reaction. It is not clear whether these differing results are due to steric effects on binding or catalysis, or to some more subtle effect of the varying electronegativities of the substituents.
Our results on the apparent lack of protection of sequences such as GGATTT imply a strong discrimination by the methylase of A·T from G·C base pairs at the second position of the recognition sequence. The observed protection afforded sequences such as GAATTT argues against the lack of methylation being solely due to replacement of the terminal (canonical) C·G base pair by T·A. Possibly the G·C for A·T substitution (in GGATTT) does not provide a proper contact on the pyrimidine (see above), since the C5 hydrogen substituent of cytosine would mimic the disfavored uridine substitution in this respect; many other possible contacts in this base pair substitution are distorted or absent as well, of course, and may play a role here also.
Concerning recognition of the central A·T base pair to be methylated, we suggest that the sensitive discrimination of the (proper) A·T base pair from a C·G base pair (which offers a similarly positioned exocyclic amino group in the major groove) may be due in part to contacts at the C5 position of the thymine residue; the C·G base pair would, of course, completely lack a comparably positioned substituent.
Methylation of canonical sites is apparently insensitive to the difference between half-modified and unmodified sites, implying an asymmetry in recognition contacts in the canonical sequence (Modrich and Rubin, 1977). More particularly, the methylase must "ignore" the state of modification of the central adenine residue on one strand while it operates on the corresponding adenine of the other strand of the recognition sequence. This implies that the N6-amino group of the former adenine residue (modified or not) is not a contact point for the methylase during this process.
A Model for Methylase Site Recognition-We can draw these various results and speculations together in the form of a template model for recognition contacts made by the methylase (Fig. 7), including only those contacts which appear to be important. New features are introduced into this template representation because of the possible importance of contacts in the minor groove and with methyl groups in the major groove. To represent possible hydrogen-bonding groups in the minor groove, these acceptors or donors are drawn downwards from the line representing the base pair under consideration (major groove contacts still extend upward). Thymine C5 methyl groups appropriately positioned on the template are represented as me, and extend upward from the base-line into the major groove; monomethylated N6-amino groups of adenine are represented by a bracketed donor symbol (D).
Comparison of Figs. 6 (top) and 7 brings out the differences in recognition contacts made by the Eco RI endonuclease and methylase. The asymmetry of site recognition by the methylase appears in the center of Fig. 7, where the one N6 adenine contact is methylated and "ignored" (bracketed "D"). By contrast, the array of contacts made by the endonuclease is fully symmetric. Another difference which is apparent in these representations is the predilection of the endonuclease for contacts in the center and to the "purine" side of the major groove, while contacts of the methylase in the major groove are on the opposite ("pyrimidine") side. There is, of course, the important contact made in the minor groove by the methylase; no good evidence exists for any important minor groove contacts for the endonuclease. With the present information on Eco RI methylase recognition contacts we are unable to discern any pattern of intra- or cross-base pair bridging contacts such as those suggested for the Eco RI endonuclease.
Effects of Environmental Alterations on Site Specificity of Interaction-The dependence of Eco RI endonuclease site specificity on pH, low ionic strength, and polar organic solutes could be due to titration of groups on the protein that are responsible for recognition contacts. Alternatively, titration of certain groups on the protein (not necessarily directly involved in interactions with the nucleic acid) could change the protein's conformation at key points sufficiently to alter the specificity of interaction by other groups on the protein. The mechanism(s) by which Mn2+ ions affect the specificity of the endonuclease must be quite similar to the mechanism(s) for pH, ionic strength, etc., since the same trends in site specificity are observed. It seems unlikely that Mn2+ produces its effects by simply binding to the nucleic acid and somehow altering recognition contacts on the bases. Although Mn2+ does show an appreciable affinity for the N7 position on purines, relative to binding to phosphate groups (reviewed by [...]), transition metal ions such as Cu2+ and Co2+, which have even greater base versus phosphate affinities, do not stimulate Eco RI* activity by the endonuclease. Thus, it seems more likely that the reduction in sequence specificity effected by Mn2+ is due to protein-Mn2+ interactions. Possibly the Mn2+ ion acts as a bridge between groups on the protein and on the nucleic acid (at the phosphate backbone, at the sugars, or at the bases themselves). Another possibility, consistent with a possible titration of groups on the protein, is that the Mn2+ ion forms a complex with the titrated form of this group or groups, thus shifting its (their) pKa value(s) downward (Freeman, 1973). This could explain why Mn2+ effects on nuclease site specificity are seen under considerably less alkaline conditions, and at higher ionic strengths, than the other perturbations examined.
The overall rate of reaction at noncanonical sites exhibited by the endonuclease or methylase is only 1% to 10% of that seen at canonical sites. We assume that this difference is due to a change in the stability of the enzyme-nucleic acid complex, and in particular, is manifested in a low rate of dissociation of the enzyme from "good" sites, as compared to "poor" sites.
Noncanonical sites presumably lack one or more necessary contacts for a stable complex (or they have gained new destabilizing interactions) and are thus generally "rejected" by a (relatively) rapid dissociation of the putative enzyme-substrate complex before the catalytic step can take place. If, for example, a 100-fold difference in rate reflects a 100-fold difference in binding affinity, at 37°C (310 K) this corresponds to a difference in binding free energy between the complexes of roughly 2.8 kcal/mol, a quantity easily accounted for by minor alterations in the hydrogen-bonding interactions responsible for the extra specificity of the canonical sequence. (We note that a destabilization of 2.8 kcal/mol is about 1½ times the destabilization of repressor-operator complexes observed for single base pair changes in operator-constitutive mutations of the lac operator sequence (see Goeddel et al., 1978; von Hippel, 1979).) To illustrate these notions in a semiquantitative fashion, we conclude with a heuristic calculation on the relative effects of ionic and base-specific interactions on the stability of the enzyme-substrate complex. Various results (Goppelt et al., 1980; Woodhead and Malcolm, 1980; Modrich, 1979) suggest that at least the Eco RI endonuclease can bind specifically to potential cleavage sites in the absence of divalent cations (and thus of catalysis). (This result also supports the assumption that it is binding, rather than catalysis, which is nucleotide sequence-specific.) In addition, preliminary results in our laboratory³ and by Goppelt et al. (1980) indicate that nuclease binding decreases appreciably with increasing salt concentration. Following the analysis of Record et al. (1976, 1978), these results can be used to calculate the contribution of electrostatic interactions to the binding free energy of complex formation as follows:

$$\Delta G_{\mathrm{ionic}} = m' \psi RT \ln[\mathrm{Na}^{+}]$$
where $m'$ is the number of protein-nucleic acid charge-charge interactions formed in the complex and $\psi$ is the fraction of counterions bound thermodynamically (per phosphate) to the unliganded DNA. Using a value of $m' = 2$ charge-charge interactions (Goppelt et al., 1980), and $\psi = 0.88$ (Record et al., 1976), we calculate that for the noncanonical sites at 37°C $\Delta G_{\mathrm{ionic}}$ changes (becomes more favorable) by approximately -2.5 kcal/mol with an approximately 10-fold decrease in salt concentration. This increase in electrostatic binding free energy just about offsets the loss of nonelectrostatic binding free energy associated with the loss of one to two base pair-specific interactions in the complex, as estimated from the lac repressor-operator studies (see above), and suggests that the general approach outlined here may indeed have quantitative validity.
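The two free energy estimates quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch, not part of the original analysis: the function names are hypothetical, the gas constant is taken as 1.987 x 10^-3 kcal/(mol K), and the temperature is assumed to be 310 K (37°C) throughout.

```python
import math

# Gas constant in kcal/(mol K); an assumed standard value, not quoted in the text.
R_KCAL = 1.987e-3


def ddg_from_affinity_ratio(ratio, temp_k=310.0):
    """Free energy difference (kcal/mol) implied by a ratio of binding constants.

    Uses ddG = RT ln(ratio), the usual relation between an equilibrium
    constant ratio and a free energy difference.
    """
    return R_KCAL * temp_k * math.log(ratio)


def ddg_ionic(m_prime, psi, salt_fold_change, temp_k=310.0):
    """Change in the electrostatic term m' * psi * RT * ln[Na+] (kcal/mol)
    when the salt concentration is multiplied by salt_fold_change."""
    return m_prime * psi * R_KCAL * temp_k * math.log(salt_fold_change)


# 100-fold difference in binding affinity at 37 degrees C
specific = ddg_from_affinity_ratio(100.0)

# m' = 2, psi = 0.88, 10-fold decrease in salt (fold change of 0.1)
ionic = ddg_ionic(2, 0.88, 0.1)

print(f"loss of specific contacts: {specific:+.2f} kcal/mol")  # +2.84
print(f"gain from lower salt:      {ionic:+.2f} kcal/mol")     # -2.50
```

Under these assumptions the two terms nearly cancel (a net of roughly +0.3 kcal/mol), which is the sense in which the electrostatic gain "just about offsets" the loss of base pair-specific interactions.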