The Structural Basis for Processing of Unnatural Base Pairs by DNA Polymerases

Abstract Unnatural base pairs (UBPs) greatly increase the diversity of DNA and RNA, furthering their broad range of molecular biological and biotechnological approaches. Different candidates have been developed whereby alternative hydrogen‐bonding patterns and hydrophobic and packing interactions have turned out to be the most promising base‐pairing concepts to date. The key in many applications is the highly efficient and selective acceptance of artificial base pairs by DNA polymerases, which enables amplification of the modified DNA. In this Review, computational as well as experimental studies that were performed to characterize the pairing behavior of UBPs in free duplex DNA or bound to the active site of KlenTaq DNA polymerase are highlighted. The structural studies, on the one hand, elucidate how base pairs lacking hydrogen bonds are accepted by these enzymes and, on the other hand, highlight the influence of one or several consecutive UBPs on the structure of a DNA double helix. Understanding these concepts facilitates optimization of future UBPs for the manifold fields of applications.


Introduction to Unnatural Base Pairs
Genetic information in all living organisms is encoded in DNA, which consists of nucleotides with four different nucleobases that form nucleobase pairs. Adenine pairs with thymine (or uracil in RNA) through two hydrogen bonds and cytosine pairs with guanine through three hydrogen bonds (Figure 1A). Decades ago, the plan emerged to design synthetic nucleotides that can form additional base pairs, so-called artificial or unnatural base pairs (UBPs). [1] The benefits of having a third base pair are diverse. As UBPs are structurally different from the natural pairs (differences can range from minimal to large), a clear gain is the increased chemical and structural diversity of DNA and RNA strands that consist of six instead of four building blocks. Increased diversity is, for example, useful in the search for affinity binders such as aptamers. Including a UBP in SELEX (systematic evolution of ligands by exponential enrichment) processes can be used to generate aptamers that bind to proteins and cells, as has already been demonstrated successfully. [2][3][4][5][6][7] Apart from generating diversity in DNA and RNA, a third base pair can be used to incorporate non-proteinogenic amino acids into a polypeptide chain by ribosome-based translation. Generation of proteins containing unnatural amino acids by the use of UBPs has already been realized in vitro [8,9] but would be even more useful in vivo, and a first success in this field has already been achieved. [10] One ultimate aim of synthetic biology is the generation of a semisynthetic organism (SSO) in which the artificial base pair is stably retained during growth and reproduction. A future practical application of such an SSO would be the production of new proteins with therapeutic or diagnostic value, which include non-natural amino acids at specific sites. Artificial base pairs can also be used for site-specific post-amplification labeling of DNA, [11,12] for example, to identify DNA lesions. [13] Furthermore, Benner and co-workers showed, for example, the beneficial contribution of their artificial pairs in multiplexed polymerase chain reaction (PCR), [14] diagnostics of different viral RNA sequences in a complex environment, [15] and the synthesis of large DNA constructs from short fragments. [16]
To be applicable in the above-mentioned approaches, a UBP candidate needs to fulfill several requirements. The UBP needs to be fully orthogonal to the natural pairs, efficiently and selectively replicated by DNA polymerases (during multiple cycles of PCR), and transcribed into RNA by RNA polymerases. Thereby, the pairing partners should be inserted into DNA with an error rate per base pair (also termed fidelity) at least as low as 10⁻³, [17] meaning one error in 1000 incorporation reactions. For comparison, natural DNA is replicated with fidelities of up to 10⁻⁵ to 10⁻⁶ when using a DNA polymerase with an associated 3'-5' exonuclease activity. [17] One error in 1000 reactions would require a selectivity of at least 99.9% per replication step. A 99.9% selectivity in turn leads to 97% retention of the UBP after 30 cycles of PCR (0.999³⁰ = 0.97) and only 90% retention after 100 cycles of PCR. Even though this degree of selectivity is sufficient for a number of applications (e.g., in the use of primers containing UBPs in nested PCR or use in diagnostics), [13,14] for others, where high amplification of the DNA or plasmid containing the UBP is performed and loss of the UBP is critical (e.g., if implemented in an SSO that should produce proteins containing an unnatural amino acid), [9] a selectivity truly approaching that of natural pairs is crucial.
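The retention arithmetic above can be reproduced with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes, as the text does, that every PCR cycle retains the UBP independently with the same per-cycle selectivity.

```python
import math


def ubp_retention(selectivity_per_cycle: float, cycles: int) -> float:
    """Fraction of strands still carrying the UBP after `cycles` rounds,
    assuming the same independent selectivity in every round."""
    return selectivity_per_cycle ** cycles


def cycles_until(selectivity_per_cycle: float, min_retention: float) -> int:
    """Largest whole number of cycles for which retention stays at or above
    `min_retention` under the same simplifying assumption."""
    return math.floor(math.log(min_retention) / math.log(selectivity_per_cycle))


if __name__ == "__main__":
    # 99.9% selectivity per replication step, as discussed in the text
    for n in (30, 100):
        print(f"{n} cycles: {ubp_retention(0.999, n):.2f}")
```

With a selectivity of 0.999, the sketch gives the 97% and 90% retention quoted above for 30 and 100 cycles. The same function can be applied to the per-doubling fidelities reported later in this review, keeping in mind that real amplification efficiencies vary by sequence context.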
In this review, we feature different UBPs with the main focus on their acceptance by DNA polymerases and on structural studies investigating the base pairs in free duplex DNA and in the active site of KlenTaq DNA polymerase (Klenow fragment of DNA polymerase I of Thermus aquaticus). In the following, we abbreviate UB nucleosides as dN and UB nucleotides as dNMP (monophosphate) and dNTP (triphosphate).

Different UBPs and their Acceptance by DNA Polymerases
Chiefly, three different groups headed by Benner, Romesberg, and Hirao have most significantly advanced the development of UBPs in the past few decades, and all three groups have developed different candidate molecular scaffolds, which are well replicated by DNA polymerases (Figure 1B,C). In this review, we only introduce the currently most successful and best-investigated pairs developed by these research groups. Thereby, we differentiate between two families: hydrogen-bonding UBPs (including the candidates from the Benner lab) and hydrophobic, non-hydrogen-bonding UBPs (comprising the most recent pairs developed in the Hirao and Romesberg labs). A detailed history of the development of UBPs is described elsewhere. [18][19][20][21][22][23] Furthermore, the numerous and diverse applications of the well-replicated UBPs in the creation of DNA aptamers and an SSO with a six-letter alphabet, as well as other in vitro applications, are described in the following reviews. [22,24,25,26]

Hydrogen-bonding UBPs
Based on orthogonal hydrogen-bonding patterns, the Benner lab developed a full Artificially Expanded Genetic Information System (AEGIS) including 12 nucleotides that in total form six specific nucleobase pairs. All pairs have a different, distinct arrangement of hydrogen-bond donor and acceptor groups, form three hydrogen bonds, and retain Watson-Crick geometries. [18,27] The most prominent members of the AEGIS system are the nucleobases 2-amino-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one (P for short) and 6-amino-5-nitro-2(1H)-pyridone (Z for short), which form a P-Z base pair through three hydrogen bonds [28] (Figure 1C). The dP-dZ pair is replicated by diverse DNA polymerases of the A- and B-family, albeit with lower efficiency compared with the natural counterparts. [29] In PCR reactions using Taq (family A), Vent (exo-), and DeepVent (exo-) (both family B) DNA polymerases, the fidelity (or selectivity) per round is reported to be 94.4, 97.5, and 97.5%, respectively. A more recent protocol achieves 99.2% retention of one dP-dZ pair in an amplified DNA strand per theoretical PCR cycle with standard triphosphate concentrations and even 99.8% under optimized triphosphate concentrations. [30] The study by Yang et al. revealed that the highest retention of the dP-dZ pair is reached at a pH of 7.8-8.0 by using Vent (exo-) or DeepVent (exo-), but with the drawback that natural dC-dG pairs are likely converted to dP-dZ pairs. For an optimal overall fidelity (low misincorporation of unnatural opposite natural nucleotides plus high retention of the unnatural nucleotides), Taq DNA polymerase appeared to be better. [29] Several consecutive (up to four) dP-dZ pairs can be enzymatically incorporated into a DNA strand before the incorporation stops. Furthermore, DNA templates containing up to four consecutive dP-dZ pairs can be PCR amplified by Taq and Phusion DNA polymerases. [30]
Owing to the rather low incorporation efficiency, directed evolution of Taq DNA polymerase was performed to improve the enzyme properties. The generation of a KlenTaq DNA polymerase mutant (M444V, P527A, D551E, and E832V) increased the incorporation efficiency of dZMP opposite dP (judged by primer extension experiments). [31,32] The reverse process, incorporation of dPMP opposite a templating dZ, however, was inefficient. A major drawback of the dP-dZ pair in general is the mispairing with natural nucleotides (mainly misincorporation of dGMP opposite deprotonated dZ). [30,33]

2.2. Non-hydrogen-bonding, hydrophobic UBPs
A different strategy to develop artificial base pairs was followed by the Hirao and Romesberg groups. Both groups decided to investigate base pairs that structurally differ from the natural base pairs and that pair through hydrophobic and packing forces rather than hydrogen bonds. This approach was inspired by the work of Kool and co-workers, who showed that hydrogen bonds are not necessarily needed to form a base pair that can be replicated efficiently and selectively by DNA polymerases. [34,35] The Hirao group thereby focused on the concept of shape complementarity by combining one smaller and one larger scaffold, like a natural pyrimidine-purine pair. The Romesberg group relied on structures with little to no homology to the natural counterparts (Figure 1B).
The most prominent base pair from the Hirao group is the pair formed between 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds) and 2-nitro-4-propynylpyrrole (Px) (Figure 1B). [36] This pair can be replicated efficiently and selectively by DNA polymerases and is successfully used in different applications, for example, in the generation of aptamers. [2] The Px base can carry different functional groups at the propynyl linker such as amino, diol, and aromatic groups (see Figure 1 in ref. [36]) or azide, ethynyl, and biotin (see Figure 1 in ref. [37]). Several of these dPx nucleotides can easily be modified further (before and after insertion into DNA) with even large functional groups, which is a powerful tool for the generation of site-specifically modified DNA. [37] As the diol-dPx (shown in Figure 1B, gray box) was shown to be the best pairing partner of dDs in PCR amplification, [36] we used it in our structural studies together with the Hirao group and refer to the diol-dPx as just dPx throughout this review.
The Hirao pair dDs-dPx is most efficiently replicated by family B DNA polymerases. Members of the A-family (Taq and TITANIUM Taq DNA polymerases, which intrinsically have no 3'-5' exonuclease activity) showed much lower selectivity for the dDs-dPx pairing in PCR amplification with dDs-containing templates. [36] In an optimized protocol for PCR amplification of DNA containing dDs and dPx, the DeepVent (exo+) DNA polymerase (B-family) is used. [36,37] The fidelities reached in these experiments (dependent on template sequence and modification at dPx) are 99.96 to >99.97% per doubling event. Although dDs-dPx is efficiently replicated within various DNA sequences, some sequences are still preferred over others (for details see ref. [38]). PCR amplification of DNA containing two dDs bases separated by 4, 6, 9, or 12 natural bases showed that at least six natural bases between two dDs bases are needed to achieve high amplification efficiency under the tested conditions. [38]
In several rounds of screening and optimization based on structure-activity relationship data, the Romesberg group developed the base pair between 2,6-dimethyl-2H-isoquinoline-1-thione (d5SICS) and 2-methoxy-3-methylnaphthalene (dNaM; Figure 1B). The UBP dNaM-d5SICS was intensively studied and was the first artificial base pair to be replicated by the endogenous replication machinery in a plasmid in E. coli cells. [39] The dNaM-d5SICS pair is most efficiently amplified by OneTaq, a mixture of the A-family Taq DNA polymerase and the B-family DeepVent (exo+) DNA polymerase. [40] Depending on the template sequence, remarkably high amplification fidelities per doubling of 99.66 to >99.98% are reached. [40] As PCR amplification with exonuclease-deficient DNA polymerases proceeded with higher efficiency than with exonuclease-proficient enzymes (with standard concentrations of natural substrates) but exonuclease activity is needed to reach high fidelity, a mixture of two enzymes was found to yield the best results. Further optimization of the pair led to dNaM-dTPT3, to date the pair most efficiently replicated in vitro (fidelity per doubling in replication: >99.98% [12]), which was first used in the creation of an SSO that not only stores [41] but also retrieves increased information. [10] For the realization of an SSO, in vivo screening of base pair candidates led to the pairs dMTMO-dTPT3, dPTMO-dTPT3, [42] and dCNMO-dTPT3 [43] (Figure 2), which all show increased retention in an SSO compared with the dNaM-dTPT3 pair. The fact that these pairs show increased replication proficiency, meaning higher retention rates, in vivo but not in vitro emphasizes the importance of the way candidates are evaluated. Important factors contributing to the different results in vitro and in vivo might be the different uptake of the substrates into the cell or their stability within the cell, the different DNA polymerases (e.g., E. coli Pol III and/or Pol II) that replicate DNA in the SSO, [43] and the presence of other components in the in vivo replisome (e.g., the β-clamp processivity factor or DNA repair mechanisms). [44]

Further artificial base pairs
Apart from the hydrogen-bonding and hydrophobic UBPs just mentioned, new base-pairing concepts have emerged. Size-expanded base pairs, also termed benzo-expanded DNA or xDNA, [45,46] and base pairs with four instead of three or two hydrogen bonds [47] have been introduced, but fidelities and efficiencies in polymerase reactions are currently low. [17] Additionally, metal-mediated base pairs that consist of two ligand-type nucleobases connected through a central metal ion have been developed. [48] Through its coordination, the metal ion stably crosslinks two strands, and therefore these pairs are interesting in DNA nanotechnology. [49,50] Furthermore, DNA containing such pairs functions as a metal ion sensor and is, for example, used for the detection of Hg²⁺ in specimens. [48,50]

3. Structure of Hydrophobic UBPs: The Hirao and Romesberg Pairs
It is remarkable that such high amplification efficiencies and fidelities are reached with artificial base pairs that differ significantly in shape from the natural base pairs and rely only on hydrophobic and packing forces. Understanding the structure of the hydrophobic UBPs themselves, their influence on the DNA structure in solution, and their processing by enzymes is key to understanding the molecular basis of these processes and might enable optimizing candidates for different applications. In the following, we review structural data gained from experimental and computational studies on hydrophobic UBPs either as free pairs, in duplex DNA, or in the active site of a DNA polymerase.
The structures of hydrophobic artificial base pairs have been studied in different ways and contexts. dNaM-d5SICS and related base pairs from the Romesberg group were investigated as isolated pairs by computational methods and in free duplex DNA by means of computational and experimental methods. Further, the structures of dNaM-d5SICS and dDs-dPx in complexes with KlenTaq DNA polymerase were studied by using X-ray crystallography in our group in collaboration with the Hirao and Romesberg groups.

Computational studies
Dispersion-corrected density functional theory (DFT) calculations of free nucleobases (sugar and phosphate moieties omitted) revealed that NaM-5SICS, NaM-TPT3, and related pairs favor a "slipped parallel stacked dimer arrangement" [51,52] with the nucleobases positioned on top of each other rather than forming a Watson-Crick-type planar structure. The interplanar distance of the stacking bases is 3.3 to 3.5 Å and the exemplary center-to-center distance of NaM-5SICS is 3.6 Å. [51] Negi et al. found similar distorted parallel geometries that enable π–π stacking for a number of hydrophobic pairs from the Romesberg group (including NaM-5SICS), but contradicting results were found for the pair NaM-TPT3. [53] In that study, the nucleobases in NaM-TPT3 do not stack but pair in a highly bent structure, which may in part be stabilized by a weak interaction between the sulfur of TPT3 and the methoxy group of NaM.

Computational studies of structures within duplex DNA
In contrast to the results from free nucleobases, classical molecular dynamics simulations show that when positioned within a DNA double strand (11-mer), the dNaM-d5SICS pair forms a natural-like planar structure with a C1'–C1' distance of 10.7 Å. [51] In this orientation, π-stacking with natural bases above and below can be maximized. In this study, it was concluded that the stability of the UBPs arises from the rather strong dispersion interactions between the planar d5SICS and dNaM nucleobases and their neighboring stacked natural bases (intrastrand interactions) rather than the interactions within the UBP (interstrand). A similar result was found in molecular dynamics simulations by Negi et al., which show that d5SICS-dMMO2 (a precursor of d5SICS-dNaM) and d5SICS-dNaM (C1'–C1' distance of 10.0 Å) adopt a nearly planar geometry and dTPT3-dNaM (C1'–C1' distance of 10.9 Å) takes a completely planar geometry within DNA [53] (Figure 3A). Thereby, dTPT3-dNaM shows the least perturbation of the DNA double helix geometry. This result is consistent with the higher replication fidelity and efficiency of dTPT3-dNaM compared with many other hydrophobic UBPs. [12] In another computational study, published by Galindo-Murillo et al., a 13-mer DNA strand containing one, three, or five dNaM-d5SICS base pairs was investigated. [54] In the presence of one artificial pair, the group observed, similar to Jahiruddin et al. and Negi et al., a rather planar orientation of the base pair in the most populated structure within a 10 μs simulation.
This structure has a C1'–C1' distance for the UBP of 11.4 Å, which is similar to natural pairs; however, other base pair and backbone geometry parameters differ greatly. DFT computations of a trimer with the dNaM-d5SICS pair in the middle again yield a slightly different result. The dNaM-d5SICS pair still adopts a roughly edge-to-edge structure with a C1'–C1' distance of 10.8 Å but with a propeller angle of −13°, which deviates significantly from planarity. In the same study, molecular dynamics simulations show that embedding more than two dNaM-d5SICS pairs within a short DNA strand (13-mer) heavily disturbs the structure of the double helix until it collapses (with five unnatural pairs). [54]
All in all, computational studies indicate stacking arrangements for isolated hydrophobic artificial base pairs but rather edge-to-edge oriented nucleobases with different extents of distortion when embedded within short sequences of natural DNA (5SICS-FEMO being an exception, for details see ref. [53]). The different geometries of the UBPs found by computational methods in DNA and for isolated pairs emphasize the importance of interactions with neighboring (natural) nucleotides for the structure. The computational results (within short DNA sequences) support the use of hydrophobic artificial base pairs, as they would not significantly hamper the stability and geometry of double-helical DNA, at least as long as only one UBP is embedded within natural nucleotides. [51][52][53] The highly planar structure of dNaM-dTPT3, which only weakly disturbs the overall DNA double helix, correlates well with the high efficiency and fidelity of the pair in PCR. This correlation renders computational studies useful in screening for even better-performing artificial base pair candidates.

NMR studies
Several years before the computational studies were performed, NMR studies revealed that the hydrophobic artificial base pair dNaM-d5SICS [55] and the related dMMO2-d5SICS [56] do not pair edge-to-edge like the hydrogen-bonding natural pairs but adopt a partially intercalating structure in free duplex DNA (Figure 3B). In detail, in the studied 12-mer DNA duplex containing dNaM-d5SICS in the center of the duplex, the edges of the two nucleobases lie on top of each other with an average distance of 3.5 Å. In this state, the stacking interactions between the pairing partners seem to be maximized. The internucleotide distance between the C1' atoms of the 2'-deoxyribose moieties is 9.1 Å, which is significantly shorter than in natural base pairs (usually around 10.4-10.5 Å for A-T and G-C pairs [57]). Interstrand intercalation has also been observed for other hydrophobic nucleobase analogs, for example, biphenyl and bipyridyl nucleotides, [58][59][60] the self-pair PICS-PICS, [61] and aromatic chromophores. [62,63] Thus, intercalation seems to be a general feature of large aromatic nucleobase analogs, which is consistent with the important role of hydrophobicity and dispersion interactions. [22]
For the dDs-dPx pair, no structural information in free duplex DNA is available and it is thus not known which pairing geometry is adopted. A precursor of dDs-dPx, the pair dQ-dPa (Figure 3C), however, was structurally studied by NMR spectroscopy within a 12-mer DNA duplex. [64] In this structure, the dQ-dPa pair forms a geometry similar to a Watson-Crick base pair, although with some minor variations (bases are tilted with respect to each other, enlarged C1'–C1' distance) and higher structural flexibility, indicated by the broad NMR signals. Compared with dQ, dDs contains an additional thienyl moiety, and dPx exhibits a nitro group instead of the aldehyde group in dPa as well as an additional propynyl moiety. The main structures of these two pairs, however, are the same.
Thus, it is likely that dDs-dPx can pair in a similar way as observed for dQ-dPa, in an edge-to-edge, planar manner closely resembling the geometry of a Watson-Crick base pair.
Taken together, the results of the NMR study with dNaM-d5SICS do not match the computational results that were obtained later (described in section 3.2). The discrepancy between an intercalating structure of dNaM-d5SICS and dMMO2-d5SICS within a DNA strand in the NMR studies and a more planar orientation of dNaM-d5SICS (and related pairs) in DNA in simulation studies needs to be evaluated further. It would furthermore be interesting to see whether the dDs-dPx pair from the Hirao group, for which such studies have not been made, likewise behaves differently in structural and computational studies.
The intercalating structure of dNaM-d5SICS was somewhat surprising considering the proficient acceptance of the pair by DNA polymerases. This circumstance raised the question of how DNA polymerases deal with the artificial pair(s) on a molecular level and motivated structural studies.

Structure of UBPs in the active site of KlenTaq DNA polymerase
To shed light on the mechanisms of UBP recognition by DNA polymerases, we, together with the Romesberg and Hirao groups, decided to investigate hydrophobic artificial base pairs in the active site of the structurally and functionally well-characterized KlenTaq DNA polymerase. For the enzymatic incorporation of artificial nucleotides into DNA, a template containing an artificial nucleotide at the templating position as well as a cognate substrate triphosphate have to be bound and recognized as a "correct" pair in the active site of a DNA polymerase. For the two hydrophobic artificial base pairs dNaM-d5SICS and dDs-dPx, several crystal structures with KlenTaq were solved. Our study in total resulted in eight crystal structures in four different reaction states of the enzyme with components of the dNaM-d5SICS pair, whereas for the Hirao pair dDs-dPx one structure is available. An overview of all structures with KlenTaq is given in Figure 4 together with the Protein Data Bank (PDB) codes. Based on the KlenTaq complexes with the dNaM-d5SICS pair and previously obtained functional and structural data, [56,65,66,67,68] a mechanism of replication for hydrophobic artificial base pairs was proposed, [69] which probably also holds true, at least in some aspects, for other similar or less hydrophobic artificial base pairs lacking hydrogen bonds. In this section, we introduce the obtained crystal structures containing hydrophobic unnatural nucleotides and compare them with the analogous natural binary complexes with a dG or dT at the templating position (termed KlenTaq_dG and KlenTaq_dT, respectively) and the natural ternary complex with a dG-dCTP pair in the insertion site (termed KlenTaq_dG-dCTP).
[Figure 3: UBP structures in duplex DNA from A) a computational study [51] and B) an NMR study. [53,54] The studied DNA duplexes containing a dNaM-d5SICS pair are shown on the left side with a close-up of the base pair on the right side. dNaM, d5SICS, and dTPT3 are shown in marine, dark blue, and cyan, respectively. A) The pairing of dNaM-dTPT3 investigated in the same study is shown in addition. B) A second orientation of dNaM-d5SICS is shown to better visualize the stacking. Distances between the C1' atoms of the ribose moieties and the average distance of the edges between dNaM and d5SICS in the NMR structure are given in Å. C) Structure from an NMR study investigating the dQ-dPa pair, a precursor of the dDs-dPx pair discussed in this review. The dQ-dPa pair is arranged in an edge-to-edge-like manner with a slightly larger C1'–C1' distance compared with the natural pairs in the strand (average distance of eleven natural pairs: 10.5 Å).]
The general crystallization strategy for binary and ternary complexes is the following: the KlenTaq DNA polymerase is purified and mixed with a previously annealed primer/template complex (for sequences of duplexes, see Figure 4). For the pre-insertion complexes, a terminator dideoxy nucleotide is added, which after insertion terminates the primer owing to the lack of the 3'-OH group. Binary crystals are grown and either measured or soaked with the respective substrates to obtain ternary crystal structures. Soaking conditions differed for the three substrates (for details, see refs. [55,69,70]).
KlenTaq DNA polymerase consists of four domains that are named according to the topology of a hand: the finger, thumb, palm, and N-terminal domain (Figure 5). Upon DNA binding, the thumb domain closes and together with the [...] (Figure 5, transition of A to B). During this rearrangement, the tyrosine moves away and the templating nucleotide rotates towards the insertion site, where it pairs with the incoming substrate triphosphate. The O-helix of the finger domain is placed on top of the newly formed base pair. Thereby, a closed complex is formed in which the enzyme can geometrically select (in addition to previous selection steps [68]) for the conserved Watson-Crick structure of the natural base pairs. [65,66,67,71] The catalytic residues Asp785 and Asp610 are situated in the palm domain and coordinate, together with the triphosphate moiety of the substrate, two magnesium ions (see Figure 5B). In this arrangement, all components involved in catalysis are positioned in such a way that the 3'-OH group of the primer terminus (which is not present in the crystals) can attack the α-phosphate of the triphosphate substrate to form a phosphodiester bond. After the reaction, the enzyme translocates on the primer/template strand while the newly formed base pair is handed on to the post-insertion site.

Binary complexes
The overall structures of the binary complexes with a templating dNaM or a templating d5SICS unnatural nucleotide (KlenTaq_dNaM and KlenTaq_d5SICS) are very similar to KlenTaq_dG and KlenTaq_dT. All binary structures are characterized by an open finger domain, which shows flexibility indicated by elevated B-factors. One significant difference in the structures is the position of the templating nucleotide and the 5'-single-stranded template overhang. In KlenTaq_d5SICS and KlenTaq_dG, the templating nucleotides are rotated away from the insertion site and are positioned at the pre-insertion position pointing towards the solvent (Figure 6A, shown for KlenTaq_d5SICS). The three upstream 5'-single-stranded template nucleotides are flexible and not resolved in the structures. In KlenTaq_dNaM and KlenTaq_dT, the templating nucleotides are also flipped away from the insertion site, however, to a different position. The single-stranded template is rotated to the developing DNA duplex, where two of the nucleotides stack between the base pair in the post-insertion site and Phe667 of the finger domain O-helix (Figure 6B, shown for KlenTaq_dNaM). The different arrangements seem to depend on the templating nucleotide and the sequence of the single-stranded overhang and most probably also its length. Longer single-stranded overhangs might not undergo the backward rotational movement observed in KlenTaq_dNaM and KlenTaq_dT, and therefore this template arrangement is most probably not relevant in insertion reactions in solution.
As the different arrangements are observed for both natural and unnatural templating nucleotides, it is concluded that neither dNaM nor d5SICS in the templating position perturbs the structure of the enzyme in the open state.

Closed ternary complexes
As it was found in NMR studies that dNaM-d5SICS prefers an intercalated structure in free duplex DNA that does not resemble a correct natural nucleobase pair, it was difficult to imagine how the efficient replication observed in functional studies can be accomplished by the enzyme. To investigate this, crystallization of ternary KlenTaq DNA polymerase complexes with hydrophobic UBPs in the active site was pursued. Closed ternary complexes were obtained with dPxTP paired opposite dDs (termed KlenTaq_dDs-dPxTP) and d5SICSTP paired opposite dNaM (termed KlenTaq_dNaM-d5SICSTP) and compared with the fully natural complex KlenTaq_dG-dCTP. In both complexes, addition of the substrate triphosphate induced the transition from an open to a closed state of the DNA polymerase by closure of the finger domain, during which the templating nucleobases are flipped back from their extrahelical positions into the insertion site, where the two nucleotides pair. The overall structures are very similar to KlenTaq_dG-dCTP with rmsd values for Cα atoms of 0.188 and 0.236 Å for KlenTaq_dDs-dPxTP and KlenTaq_dNaM-d5SICSTP, respectively.
The triphosphate moieties together with Asp610, Asp785, and the backbone of Tyr611 coordinate two magnesium ions, characterizing an active closed complex prior to the insertion reaction (Figure 7A-C). The distance between the primer 3'-end (C3' used for measuring as the 3'-OH is missing) and the α-phosphate is virtually identical for the two modified complexes and the natural complex (3.8 Å for KlenTaq_dDs-dPxTP and KlenTaq_dNaM-d5SICSTP, and 3.9 Å for KlenTaq_dG-dCTP). In addition to the metal coordination, the triphosphate substrates seem very well stabilized at their positions through diverse interactions with the enzyme (Figure 7A-C). The triphosphate moieties interact with the side chains of Lys663, Arg659, and His639 and the backbone of Gln613. On the minor groove side, the nitro group of dPxTP and the sulfur atom of d5SICSTP engage in [...]
The most interesting finding from our study was that both hydrophobic UBPs do not intercalate but form a coplanar structure similar to the cognate Watson-Crick base pairs (Figure 8A). The dNaM-d5SICS pair is positioned edge-to-edge (with an average distance of 4.2 Å between the hydrophobic edges of the nucleobases) with a C1'–C1' internucleotide distance of 11.0 Å. This distance indicates that the pair is slightly enlarged in width compared with the natural dG-dCTP pair (10.6 Å). The dDs-dPxTP pair also adopts a planar orientation of the pairing partners. The average distance between the edges is 4.9 Å, resulting in a pair that is even larger in width, with a C1'–C1' distance of 11.3 Å between the pairing partners. To accommodate the wider base pair, in both cases the templating nucleotide is shifted towards the template backbone, whereas the triphosphate residues stay in the same well-defined position found for the natural substrates (Figure 8A). Along with the shift of the templating nucleotide, interacting amino acids (Arg677, Ser674, and Met673) are also shifted in both structures in a similar way (Figure 7D,E).
To accommodate an artificial base pair with an increased width, KlenTaq DNA polymerase seems to adjust the insertion site such that only residues on the template side are rearranged, whereas residues in the catalytic site that interact with the triphosphate stay in place. This behavior would ensure proper alignment of the triphosphate substrate for attack by the 3'-OH group of the primer end, even for pairs that differ slightly in their dimensions from the natural consensus structure. Besides the enlarged base pair width, both hydrophobic pairs have different heights compared with the Watson-Crick pairs (Figure 8A), although the exact dimensions differ in the two pairs and in their strand context (the orientation of the pairing partners in the primer or template). On the major groove side, the base pair is restricted by the O-helix of the finger domain. In both structures, the larger base moieties cause a small shift of the overall O-helix and connected helices away from the insertion site, whereas the enzyme residues confining the base pair on the minor groove side stay unperturbed (Figure 7D,E). More specifically, Thr664, which is situated closest to the thiophenyl moiety of dDs and to both dNaM and d5SICSTP, shifts upwards by 0.6 Å (measured at Cα atoms) in KlenTaq dDs-dPxTP and 0.9 Å in KlenTaq dNaM-d5SICSTP. To accommodate the propynyldiol moiety of dPxTP, Arg660, located at the N-terminus of the O-helix, shifts away in a similar way as already observed for other KlenTaq structures with modified substrates (Figure 7E). [72][73][74] Owing to the shift, an interaction of … (Figure 8B). Apparently, the finger domain cannot close as tightly as in the natural complex, which could explain why the studied hydrophobic artificial base pairs are still formed with somewhat diminished efficiency compared with the natural counterparts.
The more detailed orientation of the hydrophobic pairing partners with respect to each other can be described by so-called base pair parameters. These parameters were determined by using the 3DNA webserver. [75] As the artificial nucleotides are not recognized by the software, dG and dCTP were superposed manually in the program Coot [76] on the artificial nucleotides, fitting the glycosidic bond and the nucleobase plane. The propeller twist (relative torsion between pairing partners with respect to the base pairing axis) and the buckle angle (angle of the bend between the two base planes across the line of base pairing) are similar between dDs-dPxTP and dG-dCTP but not for dNaM-d5SICSTP, where they are significantly smaller (Figure 8C). In addition, the dNaM-d5SICS pair exhibits a larger relative shift of the bases along the z axis (stagger = −1.2 Å) compared with dG-dCTP and dDs-dPxTP, where the stagger is close to zero. We show that although both artificial base pairs studied adopt an edge-to-edge orientation in the active site of KlenTaq DNA polymerase, just as natural pairs do, the base pair non-planarity parameters can still vary between the different base pair candidates. All in all, concerning the three base pair non-planarity parameters buckle, propeller, and stagger, the dDs-dPxTP pair is more similar to the natural dG-dCTP pair in the active site of KlenTaq than dNaM-d5SICSTP.
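To give a feeling for what non-planarity parameters measure, the following Python sketch computes two simplified quantities: the angle between two base planes and the signed out-of-plane displacement of one base relative to the other. This is explicitly not the 3DNA algorithm, which derives buckle, propeller, and stagger from standard base reference frames; it is a rough geometric analogue under the assumption that each base plane can be represented by three atoms. All function names and coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_normal(p1, p2, p3):
    """Unit normal of the plane through three non-collinear points."""
    n = _cross(_sub(p2, p1), _sub(p3, p1))
    length = math.sqrt(_dot(n, n))
    if length == 0.0:
        raise ValueError("points are collinear")
    return (n[0] / length, n[1] / length, n[2] / length)


def interplanar_angle(base_a, base_b):
    """Angle in degrees (0 to 90) between two base planes, each given by three atoms."""
    cos_angle = abs(_dot(plane_normal(*base_a), plane_normal(*base_b)))
    return math.degrees(math.acos(min(1.0, cos_angle)))


def out_of_plane_offset(base_a, point_b):
    """Signed distance of point_b from the plane of base_a.

    The sign depends on the orientation of the plane normal, that is, on the
    order in which the three atoms of base_a are listed.
    """
    return _dot(_sub(point_b, base_a[0]), plane_normal(*base_a))


# Toy geometry (hypothetical coordinates in Å, not from a crystal structure).
flat = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
tilted = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0))
print(interplanar_angle(flat, tilted))
print(out_of_plane_offset(flat, (5.0, 0.0, -1.2)))
```

A perfectly coplanar pair would give an interplanar angle and an offset of zero in this simplified picture; deviations from zero loosely correspond to the buckle/propeller and stagger contributions described above.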
In summary, binding of d5SICSTP opposite dNaM as well as dPxTP opposite dDs to KlenTaq DNA polymerase induces the formation of a closed enzyme complex that is poised for catalysis. In this complex, the sum of interactions between the developing artificial base pairs and the active site of KlenTaq DNA polymerase seems to stabilize the hydrophobic pairs well in the natural Watson-Crick-like geometry. This finding thereby explains the high incorporation efficiencies of hydrophobic artificial base pairs by DNA polymerases despite the lack of connecting hydrogen bonds.

Partially closed ternary complexes
In contrast to the fully closed complexes KlenTaq dNaM-d5SICSTP and KlenTaq dDs-dPxTP, a different reaction state was trapped in the complex KlenTaq d5SICS-dNaMTP with the opposite sequence context compared with KlenTaq dNaM-d5SICSTP (now d5SICS in the template and dNaMTP added as substrate). [69] Soaking of binary crystals containing d5SICS at the templating position with dNaMTP led to a ternary complex; however, the transition to a closed ternary complex did not fully take place. The finger … (Figure 9B) and the templating d5SICS slightly moves from its extrahelical position towards the insertion site. Similar partially closed complexes have been described for BF DNA polymerase (Klenow-like fragment of Bacillus stearothermophilus DNA polymerase I) with mismatched nucleotides in the insertion site. [68] It has been suggested that this conformation is a pre-selection state in which the DNA polymerase tests for complementarity between the incoming substrate and the templating nucleotide before transitioning to the closed, catalytically competent state. Additionally, partially closed ternary structures are reported for KlenTaq with an abasic site analog in the templating position. [77,78] Apart from these structural observations, a partially closed state was also found in Förster resonance energy transfer (FRET) studies for the homologous E. coli DNA polymerase I. [79][80][81][82] It has been shown that the intermediate state is especially favored in the case of incorrect nucleotide or ribonucleotide substrates bound to the enzyme and only sparsely populated with complementary dNTPs. Therefore, it is suggested that the state is a primary checkpoint for nucleotide selection on the pathway to the chemical step. [83] For efficient dNaMMP incorporation, we assume that a planar arrangement similar to that observed in KlenTaq dNaM-d5SICSTP is adopted, which would be reached by additional conformational changes starting from our partially closed structure.
The fact that a fully closed complex was readily obtained for dNaM-d5SICSTP, whereas for d5SICS-dNaMTP only the described intermediate complex was trapped, is consistent with the often lower insertion efficiency of dNaMMP opposite d5SICS compared with d5SICSMP opposite dNaM. [84] The lower incorporation efficiency could be explained by changes or clashes in the active site of a fully closed enzyme with the d5SICS-dNaMTP pair. A superimposition of the d5SICS-dNaMTP pair on the dNaM-d5SICSTP pair in the closed KlenTaq dNaM-d5SICSTP reveals that the methyl group of the d5SICS nucleobase in the templating position could come close to the O-helix residue Thr664 (Figure 9C). [85] This potential clash might make it more difficult for the finger domain to fully close. The base moiety, however, should still have enough freedom to rotate around the glycosidic bond, which would enable a closed complex but with an additional energetic penalty.

Post-chemistry extension complexes
To understand the process of elongation after a UBP is formed in a DNA duplex, binary crystal structures of KlenTaq with dNaM-d5SICS in the post-insertion site in both strand contexts (dNaM in the template and d5SICS in the primer and vice versa) and in different sequence contexts were solved (Figure 10A-C). [69] The structures reveal that after synthesis and transition from the closed to the open enzyme complex, the artificial pair forms an intercalated structure similar to the one observed in the NMR study of free duplex DNA containing dNaM-d5SICS (see section 3.3). [55] With either a dG or dC nucleotide 5' to the templating nucleotide, mainly two different modes of intercalation are observed (Figure 10A-C; the different sequence contexts are termed I and II here). With a dG 5' to the template dNaM, the d5SICS at the primer terminus is placed on top of dNaM in the template (Figure 10A). With a dC 5' to dNaM or d5SICS in the template, the hydrophobic primer nucleotide is placed below its pairing partner (Figure 10B,C). All intercalating structures are characterized by a decreased C1'–C1' distance of the pairing partners (between 8.4 and 10.0 Å). Both intercalation modes show unique stabilization patterns with surrounding protein residues (for more details, see ref. [69]). As a consequence of intercalation in the post-insertion site, shifts in the thumb domain and the position of the primer/template duplex are observed. These become apparent in an overlay of the post-insertion complexes with the natural binary complexes, shown as an example for the KlenTaq complex with dNaM-d5SICS and the template 5' overhang with the sequence "GA" (termed KlenTaq dNaM-d5SICS(I)) and KlenTaq dG (Figure 10D). The primer as well as the template nucleotides are shifted compared with their natural position. The 3'-OH group responsible for the next insertion reaction shifts by 4.5 Å (arrow in Figure 10D) and to a similar extent in the other post-insertion complexes.
Based on the observed arrangement, extension of the primer with the next incoming nucleotide seems difficult. It is assumed that reversal of the intercalation of the artificial pairing partners has to take place before or during finger domain closure after binding of a new cognate substrate. The necessity of large conformational changes before extension of the hydrophobic artificial base pairs likely explains the low extension efficiencies observed in primer extension reactions, rendering extension the bottleneck in the replication of DNA containing the UBP. [86]

3.4.5. Proposed mechanism of replication for hydrophobic UBPs and its consequences
Based on the described data of the pre-insertion and the elongation complexes of the dNaM-d5SICS pair as well as previously reported kinetic and structural data, [65][66][67][68] a replication mechanism for hydrophobic artificial base pairs was proposed (Figure 11). In a first step, the hydrophobic substrate binds to the O-helix, after which the enzyme samples different conformations and transitions to a closed state as soon as sufficiently stabilizing hydrophobic and packing interactions are made (Figure 11A). The enzyme closure in turn induces the UBP to adopt a planar, Watson-Crick-like structure that fits into the constraints of the active site and enables insertion of the substrate into the growing primer strand. Depending on the substrate bound, the intermediate states can be more or less transient. In the case of a bound dNaMTP, the crystallographically trapped partially closed state seems to be more stable than the corresponding closed complex. After the insertion reaction, the DNA polymerase returns to the open state and pyrophosphate is released. [87] In this state, the UBP adopts a cross-strand intercalated structure, which would hamper continued primer elongation (Figure 11B,D).
It is assumed that additional thermal fluctuations are necessary to resolve intercalation of the terminal base pair and to reorganize the DNA polymerase active site before the next nucleotide can be incorporated (Figure 11C). The postulated mechanism shows that the critical reaction step is elongation, as reversal of intercalation and reorganization of the active site are needed to continue the synthesis (Figure 11C). To overcome this drawback, it was concluded that intercalation properties have to be reduced to ease elongation with natural nucleotides. This can be realized, for example, by reducing the aromatic surface area of the nucleobases. In addition, interstrand intercalation should also be reduced by favoring intrastrand packing (meaning stabilization of unnatural pairing partners with natural nucleotides in the same strand) as opposed to interstrand packing. [42] Both concepts were realized by the Romesberg group in the development of dTPT3 as a pairing partner for dNaM (Figure 1B). Distal ring contraction and heteroatom derivatization of d5SICS resulted in the dTPT3 nucleotide, which shows improved incorporation and elongation properties. [12] Whether, and to what extent, the dNaM-dTPT3 pair adopts an intercalated post-insertion structure is not known. If intercalation occurs, it is possible that the arrangement is resolved more easily owing to its lower stability. This might be the reason for the higher PCR efficiency and fidelity observed for dNaM-dTPT3 compared with dNaM-d5SICS. Variations similar to those that turned d5SICS into dTPT3 were made for dNaM and yielded the nucleotides dMTMO and dPTMO, which carry a thiophene moiety (Figure 2). As mentioned above, the base pairs dMTMO-dTPT3 and especially dPTMO-dTPT3 show improved in vivo retention rates compared with dNaM-dTPT3 despite their inferior properties in vitro. [42] A modeling study suggests that the thiophene sulfur atoms of both dMTMO and dPTMO favor internucleotide interactions with the primer nucleotide, which in turn disfavors intercalation. [42] For dCNMO, which bears a smaller, single-ring nucleobase (Figure 2), pairing with dTPT3 also shows superior in vivo performance compared with dNaM-dTPT3. It is assumed that the smaller ring is less prone to cross-strand intercalation and that the pair more likely adopts an edge-to-edge structure even without the constraints of the closed DNA polymerase. [43]
Figure 10. A-C) Intercalating structures of dNaM-d5SICS at the primer/template end in two different sequence contexts (I and II). For sequence context II, the structure was solved in both strand contexts (dNaM in the primer and d5SICS in the template and vice versa; B and C). C1'–C1' distances between dNaM and d5SICS (marine and dark blue, respectively) are given in Å. 5'-Single-stranded template overhanging nucleotides are shown as lines and are indicated in bold letters in the sequence below the structures. D) Displacement of the intercalating primer and template nucleotides is shown as an example for KlenTaq dNaM-d5SICS(I) (sand) compared with the natural situation in KlenTaq dG (pale yellow). C3' atoms of the primer terminus are shown in black and the displacements in the primer and template are indicated by black arrows.

4. Structure of Hydrogen-Bonding UBPs: The Benner Pair(s)
Besides the discussed hydrophobic base pairs, the second well-replicated base pair family consists of the pairs developed in the Benner lab. The structure of the best candidate, the dP-dZ pair, in free duplex DNA and in the active site of KlenTaq DNA polymerase was also studied. [32][88][89][90]

Structure in free duplex DNA
The dP-dZ pair does not significantly perturb the double helical structure of DNA. Crystallographic studies showed that within a 16-mer DNA double strand, dP forms hydrogen bonds with dZ with geometries and distances similar to the canonical base pairs, and the DNA duplex adopts known helical forms: A-form for six consecutive dP-dZ pairs and mostly B-form for two consecutive dP-dZ pairs (Figure 12A). [88] One characteristic is that the major groove width is enlarged by up to 1 Å with respect to comparable G-C pairs in A- and B-DNA, which may be necessary to accommodate the nitro group on dZ. Another unique feature is the stacking interaction of the nitro group in dZ with the adjacent nucleobase in the A-form duplex. Even with the addition of a different UBP of the AEGIS family (the dB-dS pair, Figure 1C), the double helix structure remains intact. [90] The three studied 16-mer DNA duplexes consisting of four different base pairs (including six consecutive UBPs) only show minor geometrical differences compared with unmodified DNA. [90] This sequence-independent structural regularity is attributed to a large extent to the presence of hydrogen bonding in the UBPs and is a key prerequisite for different molecular biological applications.
In addition to these crystallographic studies, the DNA duplex containing six consecutive dP-dZ pairs was studied by using long-timescale (50 μs) molecular dynamics (MD) simulations. [89] Here, a significantly wider major groove and differing average values of stagger, as well as of the dinucleotide step parameters slide, twist, and h-twist, were identified compared with an analogous natural oligonucleotide (for more details, see ref. [89]). Interestingly, a cumulative effect of the number of dP-dZ pairs on the major groove width was observed. This finding could imply that inclusion of a large number of consecutive dP-dZ nucleobase pairs could result in an unstable DNA double helix.
Figure 11. Scheme of the proposed mechanism of replication for hydrophobic artificial base pairs. The steps corresponding to incorporation of the unnatural monophosphate (dark green) and subsequent extension of the nascent unnatural base pair are shown. The O-helix of the protein is shown as blue rectangles, phosphates are indicated with circles, natural nucleosides are shown as gray rectangles, and unnatural nucleotides are shown as dark- and light-green rectangles. The structure of the unnatural pair in free duplex DNA [53,54] after several rounds of extension and enzyme dissociation is shown in the gray oval.

Structure in KlenTaq DNA polymerase
The acceptance of the dP-dZ pair by DNA polymerases was studied by using X-ray crystallography. [32] For this purpose, a KlenTaq mutant (M444V, P527A, D551E, and E832V) that showed improved incorporation of dZMP opposite dP was used. [31] More specifically, a closed ternary pre-insertion complex with dZTP paired opposite templating dP (KlenTaqM dP-dZTP) and a post-incorporation complex with dP-dZ at the primer/template end in an open binary complex (KlenTaqM dP-dZ) were trapped (for primer/template sequence and PDB codes, see Figure 5).
The overall structure of KlenTaqM dP-dZTP in the insertion site is similar to KlenTaq dG-dCTP (rmsd: 0.347 Å for Cα atoms). As expected, dP and dZTP pair through three hydrogen bonds and the pair is oriented edge-to-edge with geometric parameters similar to those of dG-dCTP (Figure 12B). The base pair width characterized by the C1'–C1' distance is virtually identical between dP-dZTP (10.7 Å) and dG-dCTP (10.6 Å). The interactions among the enzyme, the primer/template duplex, and the incoming dNTP are similar to those found in wild-type (WT) KlenTaq with natural substrates (Figure 12C). As a main difference of KlenTaqM dP-dZTP compared with KlenTaq dG-dCTP, the Benner group identified a larger closure angle of the mutant's finger domain when comparing the transitions of the mutant and WT binary to ternary complexes. However, higher B-factors at the tip of the finger domain compared with KlenTaq dG-dCTP, and the fact that parts of the finger domain could not be modeled, indicate a less stable closed complex, which is similar to our finding in KlenTaq dDs-dPxTP and KlenTaq dNaM-d5SICSTP.
The overall structure of the binary post-insertion complex KlenTaqM dP-dZ is again similar to the binary complex KlenTaq dG (rmsd: 0.334 Å for Cα atoms). Minor groove and major groove interactions of the respective terminal base pair with the enzyme are almost identical for either the UBP or a natural base pair (Figure 13A). In a superimposition of the two structures (for details, see ref. [32]), the Benner group identified relevant differences in the template region in the vicinity of the active site (Figure 13B, circled regions). The slightly different positioning of the phosphate moiety of the templating dG and the presence of the nitro group of dZ would cause two clashes within the WT structure. Therefore, it is concluded that the post-incorporation product (dP-dZ in the post-insertion site) presents a challenge for the WT enzyme, which would need to be resolved by additional movements within the enzyme. The position of the ribose C3' carrying the catalytic 3'-OH at the primer terminus, however, is only slightly displaced when comparing KlenTaqM dP-dZ and KlenTaq dG (Figure 13C). This is in marked contrast to the binary post-insertion complexes with the hydrophobic dNaM-d5SICS pair (Figure 10D).

Comparison of Different UBP Candidates
Apart from their molecular structure and pairing concept, the three UBP candidates discussed in this review differ in their processing by DNA polymerases, their structure in the active site of KlenTaq DNA polymerase, and their structural influence on free duplex DNA. Pronounced differences exist between the two families, hydrogen-bonding versus non-hydrogen-bonding base pairs, but some differences are also present within the group of hydrophobic UBPs.
Figure 12. A) Structure of dP-dZ in free duplex DNA for two different sequences. The duplexes forming B-DNA or A-DNA are shown as cartoons and the artificial nucleotides dP and dZ are colored raspberry and pink, respectively, and are shown as sticks in a close-up representation. Sequences used for crystallization are shown below the duplexes. B,C) Superimposition of KlenTaqM dP-dZTP (raspberry) and KlenTaq dG-dCTP (gray). B) Structure of the dP-dZTP pair and the natural dG-dCTP pair in two different orientations. The C1'–C1' distance is given in Å. C) Residues interacting with the substrate triphosphate as well as coordinating magnesium ions (raspberry and light gray) and water molecules (pink and dark gray) are shown. Interactions are indicated by dashed lines.

Processing by DNA polymerases
The Hirao pairs dDs-dPx (meaning dDs and differently modified dPx) are replicated more efficiently by a B-family DNA polymerase (Deep Vent exo+), whereas for the Romesberg pairs (dNaM-d5SICS and related pairs), the best in vitro results are obtained with a mixture of Taq (family A) and Deep Vent DNA polymerase (family B). [36,37,40] As replicative DNA polymerases share a common selection mechanism, [66] our structural studies, although made with the A-family KlenTaq DNA polymerase, can explain the high incorporation efficiency and selectivity that is reached by other DNA polymerases. We assume that the utilized B-family DNA polymerase Deep Vent (and of course also the full-length Taq DNA polymerase) also enforces a Watson-Crick-like pairing of the hydrophobic artificial base pairs upon closure of the finger domain. The different UBP acceptance by different members of A- and/or B-family polymerases suggests that this parameter should be considered in optimizing in vitro amplification conditions for new candidates.

Structure in the active site of KlenTaq
Regarding the ternary structures KlenTaq dDs-dPxTP and KlenTaq dNaM-d5SICSTP, only small differences are observed between the two UBP/enzyme complexes, and both differ from the fully natural complex in similar ways. Both hydrophobic artificial base pairs have an increased base pair width and height, and amino acid side chains shift on the template and the major groove side of the UBPs in both cases. The finger domain is more flexible in both UBP/enzyme complexes compared with the natural complex, which indicates that the unnatural pair leads to a less stable closed complex, explaining the still lower insertion efficiency of these unnatural substrates. In a detailed analysis of the base pair parameters, [70] the Hirao pair dDs-dPxTP in the active site of KlenTaq dDs-dPxTP is more similar to the natural dG-dCTP pair than dNaM-d5SICSTP in KlenTaq dNaM-d5SICSTP, which, however, does not lead to significantly better incorporation properties.

Processing by DNA polymerases
Generally, the hydrogen-bonding dP-dZ pair shows lower amplification fidelity in PCR experiments compared with the hydrophobic UBPs discussed here (see values in section 2.1), mainly owing to mispairing with natural nucleotides. [30,33] Therefore, in terms of orthogonality to natural pairs, it seems to be advantageous if pairing relies on a different principle. The reported pairs that rely on hydrophobic and packing interactions rather than hydrogen bonds show low incorporation efficiencies opposite natural pairs, leading to high fidelities in replication. If, however, several consecutive UBPs are to be inserted in a DNA strand, hydrogen-bonding UBPs perform distinctly better. Whereas enzymatic incorporation of up to four consecutive dP-dZ pairs into a DNA strand can be accomplished, [30] consecutive incorporation of hydrophobic UBPs is more difficult. In the case of dDs-dPx, highly efficient amplification could only be reached if two dDs bases were separated by at least six natural bases. [38] For the dNaM-d5SICS pair, sequences containing two consecutive unnatural pairs or two unnatural pairs separated by one or six natural nucleotides could indeed be amplified but with lower fidelity than with only one UBP in the investigated DNA strand. [40] This difference most probably stems from the two different types of pairing, hydrogen bonding as opposed to hydrophobic and stacking interactions; in this case, hydrogen bonding is advantageous.

Structure in duplex DNA
Pairings based on hydrophobic and packing interactions favor intercalation of the pairing partners under some conditions, for example, in free duplex DNA in the reported NMR studies [55,56] or in post-incorporation complexes with KlenTaq DNA polymerase, [69] which distorts the structure of the DNA double helix at the site of the UBP. In contrast, even several consecutive dP-dZ pairs do not destroy the helical structure of a DNA duplex. [88] Nevertheless, the three-dimensional structure is affected here as well (wider major groove and differing step and helix parameters than observed for the analogous natural oligonucleotide), as was shown by molecular dynamics simulations [89] and crystal structures. [88] Experimental structures of free duplex DNA containing two or more consecutive hydrophobic artificial base pairs do not exist to our knowledge. However, MD simulations by Galindo-Murillo et al. show that the double helical structure of DNA is disturbed to a great extent if more than one dNaM-d5SICS pair is included in the DNA and completely collapses into a globular structure with five UBPs present in the sequence. [54]
Figure 13. A) Minor and major groove interactions of KlenTaq with the terminal primer/template pair in the binary complexes KlenTaq dG (pale yellow) and KlenTaqM dP-dZ (violet). B,C) Superimposition of KlenTaq dG and KlenTaqM dP-dZ to visualize potential clashes if dP-dZ is formed in the WT enzyme (B) and to visualize differences in the position of the primer terminus (C, indicated by arrows).

Structure in the active site of KlenTaq
In the closed ternary complexes, the three UBPs dP-dZ, dNaM-d5SICS, and dDs-dPx behave similarly. All pairs adopt a Watson-Crick-like planar edge-to-edge structure and induce the DNA polymerase to close. The enzyme establishes interactions with the triphosphate substrate through the same residues in the three complexes. In contrast to the hydrophobic UBPs, dP-dZ does not show an increased base pair width and differs only in height owing to the nitro group of the substrate dZ, which does not seem to disturb the closure of the finger domain. Base pair geometric parameters (stagger, buckle, and propeller) are similar for dP-dZTP, dDs-dPxTP, and dG-dCTP but differ in the case of dNaM-d5SICSTP. This difference, however, does not seem to directly influence the incorporation efficiency, as dNaM-d5SICSTP is well replicated by the related Taq DNA polymerase. [12] A significant difference exists regarding the binary post-insertion complexes with either the hydrophobic dNaM-d5SICS pair or the hydrogen-bonding dP-dZ pair. Intercalation of dNaM and d5SICS in the post-insertion site distinctly distorts the primer/template duplex, and we assume that large conformational rearrangements are necessary to enable an elongation reaction. In KlenTaqM dP-dZ, the position of the primer 3'-end is not significantly different from that in the natural complex, and elongation seems easier.

Summary and Conclusions
In this review, we discussed two different families of artificial base pairs. The dP-dZ pair is based on an alternative hydrogen-bonding pattern (rearranged hydrogen-bonding donor and acceptor groups) compared with the natural base pairs dA-dT and dG-dC and adopts structures in free duplex DNA and the active site of KlenTaq DNA polymerase similar to those of the natural pairs. Therefore, known DNA double helical forms (A- and B-DNA) are adopted even if several consecutive dP-dZ or related pairs are present in a DNA strand. Compared with completely natural DNA, however, several structural parameters that characterize the double helix differ, resulting, for example, in a double helix with a wider major groove. The fidelity of replication by DNA polymerases is still lower for dP-dZ than for the discussed hydrophobic UBPs, mainly owing to the higher propensity for mispairing with natural nucleotides.
The second family, the hydrophobic UBPs, differs in structure and pairing mechanism from the natural nucleotides. Nevertheless, both dNaM-d5SICS and dDs-dPx are replicated with high fidelity in PCR. Our structural studies of KlenTaq with dNaM-d5SICS and dDs-dPx emphasize that the pairs relying on hydrophobic and packing forces are sufficiently plastic to adopt the edge-to-edge structure necessary for positive selection by a DNA polymerase in the insertion site. In free duplex DNA or at the post-insertion site within the binary DNA polymerase/primer/template complex, in contrast, dNaM-d5SICS pairs in an intercalative mode in which stacking interactions between the pairing partners seem to be maximized. [55,69] This is in contrast to the later performed computational studies, which showed that in free duplex DNA, dNaM-d5SICS and related pairs adopt a rather planar orientation. For dDs-dPx, structures in free duplex DNA or post-insertion KlenTaq complexes do not exist. Partly inspired by structural data and the proposed mechanism of replication for hydrophobic artificial base pairs, the optimized dNaM-dTPT3 pair was developed in the Romesberg group, which is PCR-amplified with even higher fidelities compared with dNaM-d5SICS. Insertion of consecutive hydrophobic UBPs or several hydrophobic UBPs separated by only a few natural nucleotides into a DNA strand is still challenging. This is consistent with MD simulations, which show that DNA strands containing several hydrophobic UBPs do not form stable DNA double helices. For many applications (e.g., the coding for unnatural amino acids), however, the presence of several UBPs within a short sequence is not necessary. An additional feature of the dDs-dPx pair is that DNA containing the dPx nucleotide can be further modified with functional groups of interest through Schiff base formation involving the diol moiety.
All described artificial base pairs and potential emerging ones show different properties and are useful in diverse applications. Each of these pairs has its own advantages and disadvantages, which supports their parallel existence. Fields of application are manifold. This motivates us to develop and characterize different families of artificial base pairs in the future, thus generating a pool of candidates from which one can select according to the respective requirements.