Mutagenesis by Transient Misalignment

Based upon a consideration of two mutational hot spots produced during DNA synthesis by a eukaryotic DNA repair polymerase, we suggested that certain base substitution errors result not from direct miscoding but from correct coding by a transiently misaligned template-primer (Kunkel, T. A., and Alexander, P. S. (1986) J. Biol. Chem. 261, 160-166). This model, which we called dislocation mutagenesis, has been directly tested. Introducing a single, phenotypically silent G → A base change into the template switches the base substitution specificity at the immediately adjacent hot spot, a T residue, from T → G transversions to T → A transversions. The cumulative change in frequency, represented by the disappearance of the T → G events and the appearance of the T → A events, is >300-fold. These data demonstrate that during DNA synthesis in vitro, a base at one position can code a mutation at another position. This mechanism can operate over greater distances to produce complex mutations as well. We present one example in which a 123-base deletion containing three base changes at one end of the deletion can be precisely explained by transient misalignment. It remains to be established whether mutagenesis by dislocation operates in vivo to produce biologically significant changes in genetic information.

DNA is a structurally dynamic macromolecule capable of assuming a variety of conformations. This presents a significant challenge to an organism during the replication and maintenance of genetic information. The ability of DNA strands to misalign, creating unusual intermediates that can be processed into mutations, provides explanations for a variety of mutations involving the loss or gain of one or more bases (1-11). As early as 1960, Fresco and Alberts (12) suggested that misalignments could in principle also lead to base substitution errors, a suggestion advanced again by Fowler et al. (13) in discussing the base substitution specificity of an Escherichia coli mutator strain.
We have been examining molecular mechanisms of mutagenesis during in vitro DNA synthesis catalyzed by purified eukaryotic DNA polymerases acting on defined templates (14-19). For this purpose we use a mutagenesis assay which scores a variety of base substitution, frameshift, deletion, and complex errors at numerous positions within the lacZα DNA present in M13mp2 DNA. In the course of studies with rat DNA polymerase-β (pol-β), we observed two mutational hot spots that appeared to be related both by DNA sequence and error specificity. At one site pol-β produced primarily frameshift errors, while at the other site this polymerase produced a high frequency of T → G transversions at a T bordered by a template G neighbor. These data led us to propose a model (14), which we called dislocation mutagenesis, to explain certain base substitution errors based upon a transient template-primer misalignment. In this report we describe results that support this model and then extend it to explain a complex deletion in which a block of genetic information is moved from one position in the genome to a distant position.

EXPERIMENTAL PROCEDURES
All materials and procedures have been described in Refs. 14, 18, and 19.
M13mp2 Mutagenesis Assay-The assay measures the fidelity of a single round of in vitro gap-filling DNA synthesis by a DNA polymerase. A gapped M13mp2 DNA substrate is constructed such that the 390-base single-stranded gap contains the 250-base mutational target, the lacZα complementation sequence. Correct synthesis to fill the gap produces DNA that, when used to transfect the appropriate E. coli host strain, will produce dark blue M13 plaques. Errors during synthesis are scored as lighter blue or colorless plaques. Since the assay measures loss of a gene function that is not essential for phage production, a wide variety of mutations at many different sites can be recovered and scored, including all 12 base substitution errors, frameshifts, deletions, and complex errors.

RESULTS
The M13mp2 forward mutation assay was used to describe the spectrum of errors produced by DNA polymerase β during a single round of in vitro gap-filling DNA synthesis (14). This enzyme was found to be highly inaccurate, committing a variety of base substitution, one-base frameshift, and deletion errors.
The Dislocation Model-The dislocation model for base substitution errors was suggested by considering two mutational hot spots in the spectrum of errors produced by pol-β. These two hot spots were related in both their template DNA sequences and their error specificities. At the template sequence 5'-C-G-T-T-T-T-A-C-3', a high frequency of -T frameshifts was observed. At a lower (but still high) frequency, T → G transversions were observed at the 5'-most T in the run (position 70), having as a 5'-nearest neighbor a template G residue. The second hot spot, 33 bases downstream, contained a quite similar template sequence, 5'-C-G-T-T-A-C-3'. Here both -T frameshifts and T → G transversions at the 5'-most T in the run (position 103) were again observed, but the relative frequencies of the two errors were reversed. At this shorter (two-base) T run, the frameshift frequency was much lower and the T → G events predominated.
The specificity of the base substitutions, exclusively T → G transversions, was unexpected. This mutation involves a template T·dCMP mispair. The simplest way to produce this intermediate is by direct misinsertion by pol-β of dCMP opposite the template T residue (Fig. 1A, pathway B, for Base substitution). This possibility seems reasonable since pol-β is error prone for a variety of misinsertions throughout the lacZα mutational target, precedents exist for large and specific site-to-site differences in mutational phenomena, and the precise rules governing misinsertion fidelity, particularly with eukaryotic enzymes, have not yet been defined. Once misinsertion has occurred, pol-β, which cannot remove the mistake since it lacks associated proofreading exonuclease activity (20), could process the mispaired intermediate through the base substitution pathway (Fig. 1A, pathway BB). Alternatively, because the misinserted nucleotide (dCMP) is complementary to the next template base (a G), a rearrangement could occur (pathway BF) to produce a -T frameshift.
Considering the chemistry and structure of mispairs (21-23) and measurements of misincorporation of pyrimidines opposite pyrimidines with prokaryotic DNA polymerases (24, 25), such direct misinsertion might be expected to occur only rarely, while the other two base substitutions that are known to be detectable at these positions (16) should occur at equal or higher frequencies. However, no T → C transitions (via a more favorable T·dGMP intermediate) or T → A transversions (via a T·dTMP mispair) were observed. This unique base substitution specificity, the tendency of pol-β to produce minus-one-base frameshifts within runs of a common base (14) by a slippage mechanism (17), and the fact that errors occurred at the 5'-most T in a run of T residues followed by a template G residue led to the model shown in Fig. 1A (pathway F, for Frameshift). In this instance, correct synthesis proceeds until a misalignment occurs, resulting in an extrahelical template T. Continued correct incorporation of additional nucleotides from the misaligned template-primer (pathway FF) fixes the extrahelical T and results in the -T event. However, if a single correct incorporation (of dCMP) occurs followed by realignment of the extrahelical T prior to the next incorporation event (pathway FB), a T·dCMP mispair is created at the 5'-most T in the run. Continued correct incorporation from this mispair results in the T → G base substitution. This base substitution is not due to miscoding by pol-β but rather to repetitive use of the template G, first from a misaligned intermediate and then from a mispaired primer terminus.
Oligonucleotide-directed Mutagenesis of the Hot Spots-The dislocation model invokes multiple and complex interactions to explain an unexpectedly frequent event. The specificity and symmetry observations for the hot spots do not reveal whether misalignment or misinsertion is the initiating event for errors at these sites. To directly distinguish between these alternatives, we performed oligonucleotide-directed mutagenesis (26) to change a G to an A at both 5'-neighbors of the positions where the T → G errors were observed. If the initiating event is misinsertion of dCMP, then the expected results with this new template would continue to be T → G transversions (Fig. 1B, pathway B then BB) and either a double change (-T and A → G, following pathway BF) or a decrease in -T frequency, since the realigned intermediate contains both an A·C mismatch and an extrahelical T. Alternatively, if the initiating event for the base substitution errors is misalignment, then the base substitution specificity should switch from T → G to T → A transversions.
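The prediction of the dislocation model for the two templates can be illustrated with a minimal sketch. The function name `synthesize`, the pathway labels as string arguments, and the representation of the product in template-strand sense are assumptions made for illustration only; the sketch models the logic of pathways F, FF, and FB and is not a simulation of polymerase kinetics.

```python
# Toy model of the dislocation pathways (illustrative assumption, not from the paper).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesize(template, hot, pathway="correct"):
    """Copy a template (written 5'->3') and return the product in template sense.

    hot     -- index of the 5'-most T of the run (its 5' neighbor is hot - 1)
    pathway -- "correct": faithful copying
               "FF": misalignment, then continued synthesis (-T frameshift)
               "FB": misalignment, one incorporation, then realignment
    """
    if pathway not in ("correct", "FF", "FB"):
        raise ValueError("pathway must be 'correct', 'FF', or 'FB'")
    if pathway != "correct" and hot < 1:
        raise ValueError("the hot-spot base needs a 5' neighbor")

    incorporated = {}
    # The polymerase reads the template 3'->5', i.e. from right to left.
    for pos in range(len(template) - 1, -1, -1):
        if pos == hot and pathway == "FF":
            # The extrahelical T is never copied: one-base deletion.
            continue
        if pos == hot and pathway == "FB":
            # The misaligned primer copies the 5' neighbor; after realignment
            # that nucleotide sits opposite the hot-spot T as a mispair.
            incorporated[pos] = COMPLEMENT[template[pos - 1]]
        else:
            incorporated[pos] = COMPLEMENT[template[pos]]

    # Express the new strand in template sense, as a sequenced mutant is reported.
    return "".join(COMPLEMENT[incorporated[p]] for p in sorted(incorporated))


if __name__ == "__main__":
    templates = (("wild-type (G102)", "CGTTAC"), ("pseudowild-type (A102)", "CATTAC"))
    for name, template in templates:
        for pathway in ("correct", "FF", "FB"):
            print(name, pathway, synthesize(template, 2, pathway))
```

With the hot spot at index 2, pathway FB converts the wild-type sequence CGTTAC to CGGTAC (T → G) and the altered sequence CATTAC to CAATAC (T → A), while pathway FF removes one T from either template.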
In order to determine which base substitution errors at the hot spots (positions T70 and T103) were detectable as light blue or colorless α-complementation plaques when their neighbors (positions 69 and 102) were either G or A, oligonucleotide-directed mutagenesis experiments were performed to establish the phenotypes of all codons of interest. Fortunately, both G → A changes are phenotypically silent, i.e. the resulting M13mp2 plaques are dark blue on X-gal plates, indistinguishable from true wild-type plaques. The altered template can therefore be used in the standard M13mp2 forward mutation assay (14) in which mutants are identified as light blue or colorless plaques. With wild-type DNA (i.e. G69 and G102), all three possible base substitutions were constructed at both sites, and all were found to result in scorable color phenotypes (all light blue). With the pseudowild-type derivative (A69A102), all three substitutions were again constructed at both sites. With this template, three of the six possible changes were found to be detectable (also light blue phenotypes). These are T → G and T → A at position 103 and T → G at position 70.
Mutational Specificity of Pol-β at the Hot Spots-To test the dislocation model, gapped templates were prepared (19), using either the original wild-type M13mp2 DNA (to confirm our previous observations performed with different preparations of DNA and pol-β) or M13mp2A69A102 DNA. Pol-β gap-filling synthesis reactions were performed and an aliquot of each reaction was analyzed (14), confirming that the 390-base gap was filled (data not shown). A second aliquot of the products was used to transfect competent cells to score light blue and colorless mutants (18, 19). The overall frequency of mutants produced by pol-β was similar for the two templates (Table I), being only slightly lower with the A69A102 derivative, and was similar to previous measurements of pol-β fidelity (14).
In order to determine mutational specificity at the hot spots, the sequence of the DNA from each of 494 independent and randomly chosen mutants was determined (Table I and Fig. 2). With wild-type DNA the results were essentially as previously described, demonstrating the reproducibility of the original observations. Thus, -T frameshifts and primarily T → G transversions were produced at high frequency at the two hot spots and the ratio of these two errors differed at the two sites. However, the substitution of A for G at position 102 had a marked effect on base substitution specificity at position 103. Only one T → G transversion was observed with the new template while T → A transversions were now produced at a high frequency. Comparing the two templates at the base substitution hot spot where both errors could have been scored with both templates, the frequency of T → G events decreased more than 30-fold, while the frequency of T → A events increased more than 10-fold. The cumulative effect is thus >300-fold and precisely as predicted by the dislocation model, following pathways F and FB. These frequencies, plus the absence of T → G mutants and -T103, A → G102 double mutants in the pseudowild-type spectrum, suggest that base substitutions are produced at position 103 at least 10-30-fold more frequently by dislocation than by direct miscoding.
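The cumulative figure follows from multiplying the two fold changes. A short sketch of that arithmetic is given here; the function name is hypothetical, and the inputs are the lower bounds quoted in the text (30-fold and 10-fold), not the measured frequencies from Table I.

```python
def cumulative_fold_change(loss_fold, gain_fold):
    """Combine a fold decrease of one event with a fold increase of another.

    Both arguments are lower bounds, so the product is also a lower bound.
    """
    if loss_fold <= 0 or gain_fold <= 0:
        raise ValueError("fold changes must be positive")
    return loss_fold * gain_fold


# T -> G events fell more than 30-fold; T -> A events rose more than 10-fold.
print(cumulative_fold_change(30, 10))
```

Because each input is a lower bound, the result of 300 should be read as ">300-fold", matching the statement in the text.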

TABLE I
Pol-β error frequency and mutational specificity at wild-type versus pseudowild-type hot spots
Reactions were performed with rat pol-β as described (14). Transfections were performed as described (14), using an E. coli S90C strain containing the recA56 mutation and completely lacking the lacZ gene, to eliminate any potential contribution of in vivo recombination to the spectrum of sequenced mutants. The background mutation frequency, determined as described (14), was 6.7 x [...]. Not one of 128 spontaneous mutants obtained by transfection and then sequenced contained any of the mutations described in this work. The mutation frequency values in parentheses are taken from Refs. 14 and 16 and represent the initial observations (with 296 sequenced mutants) upon which this work is based.

In the wild-type spectrum, two -TT frameshifts, both at the run of four Ts, were also detected (not shown). The T → C change at position 103 in the lower spectrum is a silent change which was linked to a phenotypically detectable change at a distant position. Pol-β-generated double mutants of this type were observed previously (14) as well.

The 9-fold decrease in T → G frequency at position 70 (Table I) also is predicted by the dislocation model. Unfortunately, T → A events are not scorable at this site. The appearance of several T → C transitions at the hot spots and a T → G transversion at position 70 when an A is present at position 69 demonstrates that, as expected, simple direct miscoding can contribute to the base substitution spectrum at these sites.
The mutation frequencies at both hot spots are lower when A rather than G is present at positions 69 and 102. The decrease is 4-fold for -T70 mutations, &fold for -T103 mutations, and 2.3-fold for dislocation base substitutions at position 103. The reason(s) for these decreases is not known.
Complex Deletion Mutagenesis by Transient Misalignment-The steps involved in the dislocation process (misalignment, limited incorporation, realignment, and continued incorporation) are not limited to base substitution errors. The original pol-β mutant collection contained a number of deletions (14). One particular deletion was recovered five times and at a frequency >200-fold above the spontaneous background frequency. It was the loss of 317 bases of the 390-base single-stranded gap (Fig. 3A). This included one of two 5-base direct repeat sequences, having the template sequence 5'-C-C-C-G-C-3' (positions 166-170 in the lacZα coding sequence), and all 312 intervening bases.
The simplest model (Fig. 3B) to explain this mutation involves the following steps: 1) synthesis by pol-β of nine nucleotides starting with the provided 3'-OH primer and proceeding through the first C-C-C-G-C template repeat, 2) disruption of (at least) the five G·C primer-terminal base pairs, 3) rearrangement of the DNA with reformation of five hydrogen-bonded G·C base pairs involving the newly made DNA and the second direct repeat 317 bases downstream. This intermediate may be further stabilized by three G·C base pairs that can potentially form a stem within the resulting loop of the heteroduplex intermediate. Through this step, this mechanism is formally equivalent to forming a frameshift intermediate (pathway F, Fig. 1), but it involves more nucleotides, greater distances, and a larger misaligned heteroduplex. 4) Continued synthesis from the intermediate (equivalent to pathway FF in Fig. 1) produces the heteroduplex which upon transfection and expression of the minus strand yields the 317-base deletion.
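The net outcome of these four steps, loss of one repeat copy plus everything between the two copies, can be sketched as a string operation. The sequence below is a made-up stand-in (hypothetical flanks and a 312-base placeholder spacer), since the actual lacZα sequence is not reproduced in this text; only the repeat 5'-C-C-C-G-C-3' and the 312-base spacing come from the description above. The sketch ignores strand polarity and the heteroduplex intermediates and shows only the final deletion.

```python
def delete_between_direct_repeats(seq, repeat):
    """Return seq with one repeat copy and all intervening bases removed.

    Models the end product of primer relocation from the first copy of a
    direct repeat to the second copy: a single repeat copy is retained.
    """
    first = seq.find(repeat)
    if first == -1:
        raise ValueError("repeat not found")
    second = seq.find(repeat, first + len(repeat))
    if second == -1:
        raise ValueError("second copy of repeat not found")
    # Keep everything before the first copy, then resume at the second copy.
    return seq[:first] + seq[second:]


# Hypothetical toy template: flank + repeat + 312-base spacer + repeat + flank.
REPEAT = "CCCGC"
toy_template = "GGA" + REPEAT + "AT" * 156 + REPEAT + "TTG"

product = delete_between_direct_repeats(toy_template, REPEAT)
print(len(toy_template) - len(product))  # number of bases deleted
print(product)
```

For this toy template the deletion spans 5 repeat bases plus 312 intervening bases, i.e. 317 bases, the same accounting used for the mutant described in the text.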
The collection of mutants generated by rat pol-β also included a complex mutant in which 123 bases were deleted and three base changes were present at one end of the deletion (Fig. 3A). This complex deletion can be precisely explained by a transient misalignment mechanism (Fig. 3C). The model involves four blocks of DNA sequences positioned at three different locations spread over the entire 390-base single-stranded gap. The initial events (steps 1 and 2 in Fig. 3C) are as just described for a simple direct repeat mechanism, but in this case involving two 5'-G-C-C-C-G-3' sequences 370 bases apart. The first of these repeats is almost the same repeat as for the 317-base deletion, except that one additional base has been incorporated; the repeat is thus at positions 165-169. The resulting intermediate contains a five-G·C base pair primer stem and a 375-base loop, in this case without a stem of hydrogen-bonded bases to stabilize the intermediate. This intermediate retains five bases of single-stranded template to be filled before the 5'-end of the gapped template is encountered. Pol-β is capable of completely filling gaps to the last nucleotide (27) and is also capable of limited strand displacement synthesis (28, 29). Thus, gap completion followed by strand displacement synthesis of six bases would produce the intermediate shown in Fig. 3, step 3. Seven of the eight bases in the sequence shown in block 3 are complementary to the sequence in block 4, which is 253 bases away, in what is still single-stranded DNA. Rearrangement would form the intermediate shown in step 4. The final step is continued synthesis to fill the remaining gap, followed by transfection and expression of the newly made strand.
This model precisely explains the 123 bases deleted and, most importantly, the three unexpected bases at the deletion end point. The explanation is formally equivalent to base substitutions by dislocation in that a misalignment occurs, followed by a limited amount of correct incorporation from the misaligned intermediate, then realignment, followed by continued synthesis from the second heteroduplex intermediate. The result is that a block of 11 nucleotides has been moved to a position 253 nucleotides distant.

DISCUSSION
The results presented here demonstrate that a template sequence at one position can code for what is observed as new genetic information, i.e. a mutation, at an entirely different position. The dislocation mechanism has implications for a variety of mutational phenomena. The data presented here provide a novel alternative to direct misinsertion to explain a subset of base substitution errors. In addition, just as these results demonstrate that base substitutions can be initiated by misalignments, frameshifts could be initiated by misinsertions, by following pathway B and then BF in Fig. 1A. This could explain frameshifts at nonrun or short run sequences. The interplay between mispaired and misaligned intermediates depicted in Fig. 1 provides one framework for considering multiple mutational endpoints induced by a single DNA adduct, such as frameshifts resulting from miscoding lesions. Furthermore, transient misalignment provides one possible mechanism by which various barriers to DNA synthesis, such as DNA adducts or palindromic sequences, might be bypassed, albeit mutagenically. Each of these possibilities can be examined with existing mutational assays.
Dislocation is only one of several mechanisms for producing errors during DNA synthesis in vitro. Thus far, base substitution mutagenesis by dislocation has been clearly demonstrated at only one template site with one set of reaction conditions and the simplest eukaryotic DNA polymerase known. It does not operate at detectable frequency with more complex polymerases, including the human KB cell DNA polymerase α-primase complex or DNA polymerase γ (15), which contains a proofreading exonuclease (19). Nevertheless, the mechanism was suggested by considering two hot spots that contained 35% of the mutants in the collection (14). While we do not yet know why this site is so prone to dislocation mutagenesis, the site and enzyme specificity demonstrate that both the primary DNA sequence and protein interactions with the template-primer stem are obviously important. A number of approaches are available to examine the parameters that influence this process, including further engineering of the DNA sequence at the hot spots and measuring polymerase kinetic constants for the intermediates shown in Fig. 1. Whether the dislocation mechanism operates to generate mutants in vivo remains to be determined. The idea that distant sequences can template mutations during DNA replication, repair, or recombination has been proposed to explain a variety of mutations generated in vivo. For example, the processing of imperfect palindromes (6, 7) is one mechanism that could explain certain complex mutations in the E. coli lacZ (30), T4 rIIB (31), and yeast iso-1-cytochrome c (2, 32) genes. The data in Fig. 3 suggest that dislocation during DNA synthesis provides an additional mechanism to explain the origin of certain other mutations recovered in these same genetic systems (2, 11, 32-35).
This mechanism could have biological significance for the evolution of mammalian multigene families such as the human interferon genes (36) or for systems requiring rapid generation of diversity such as the immunoglobulin genes (37). We have, in fact, already reported that synthesis by terminal deoxynucleotidyltransferase, either alone or in combination with pol-β, produces, through imperfect, misaligned intermediates, complex errors that are similar to sequences in rearranged immunoglobulin genes (38, 39). It will also be interesting to see if any of the mutants presently being generated and recovered from eukaryotic systems in vivo and analyzed by DNA sequencing (9, 10 and, for review, see Ref. 40) are consistent with this mechanism. Even more importantly, it should be possible in the future to directly test this model in vivo.