Exploring the Chemistry of Genetic Information Storage and Propagation through Polymerase Engineering

■ CONSPECTUS
Nucleic acids are a distinct form of sequence-defined biopolymer. What sets them apart from other biopolymers such as polypeptides or polysaccharides is their unique capacity to encode, store, and propagate genetic information (molecular heredity). In nature, just two closely related nucleic acids, DNA and RNA, function as repositories and carriers of genetic information. They are therefore the molecular embodiment of biological information. This naturally leads to questions regarding the degree of variation from this seemingly ideal “Goldilocks” chemistry that would still be compatible with the fundamental property of molecular heredity. To address this question, chemists have created a panoply of synthetic nucleic acids comprising unnatural sugar ring congeners, backbone linkages, and nucleobases in order to establish the molecular parameters for encoding genetic information and its emergence at the origin of life. A deeper analysis of the potential of these synthetic genetic polymers for molecular heredity requires a means of replication and a determination of the fidelity of information transfer. While non-enzymatic synthesis is an increasingly powerful method, it currently remains restricted to short polymers. Here we discuss efforts toward establishing enzymatic synthesis, replication, and evolution of synthetic genetic polymers through the engineering of polymerase enzymes found in nature. To endow natural polymerases with the ability to efficiently utilize non-cognate nucleotide substrates, novel strategies for the screening and directed evolution of polymerase function have been realized. High-throughput plate-based screens, phage display, and methods based on water-in-oil emulsion technology have yielded a number of engineered polymerases, some of which can synthesize and reverse transcribe synthetic genetic polymers with good efficiency and fidelity.
The inception of such polymerases demonstrates that, at a basic level at least, molecular heredity is not restricted to the natural nucleic acids DNA and RNA but may be found in a large (if finite) number of synthetic genetic polymers. It has also opened up these novel sequence spaces for investigation. Although these spaces remain largely unexplored, first tentative forays have yielded ligands (aptamers) against a range of targets and several catalysts elaborated in a range of different chemistries. Finally, taking the lead from established DNA designs, simple polyhedron nanostructures have been described. We anticipate that further progress in this area will expand the range of synthetic genetic polymers that can be synthesized, replicated, and evolved, providing access to a rich sequence, structure, and phenotypic space. “Synthetic genetics”, that is, the exploration of these spaces, will illuminate the chemical parameter range for encoding and decoding information, 3D folding, and catalysis and yield novel ligands, catalysts, nanostructures, and devices for applications in biotechnology and medicine.


■ INTRODUCTION
The natural nucleic acids DNA and RNA serve as the repositories and carriers of genetic information for all life on Earth. They may be viewed as a specialized form of aperiodic polymer composed of repeating scaffolds of ribofuranose and phosphodiester units on which the four bases are arranged in a linear sequence, the embodiment of information content (Figure 1). Arguments can be made that this structure is uniquely suited to the task of information storage and readout. These include the unusual kinetic stability of phosphodiester bonds to hydrolysis (compared to other esters including the closely related arsenate diesters), 1 the decoupling of physicochemical properties from information content (i.e., nucleotide sequence) due to the dominant influence of the polyanionic phosphodiester backbone, and the extended backbone conformation (facilitating complementary strand pairing and information readout) owing to charge repulsion along the backbone. 2 Nevertheless, nucleic acids are not simple linear information strings but can fold into intricate three-dimensional shapes to form specific ligands (aptamers), sensors (riboswitches), and catalysts (ribo- and deoxyribozymes). Thus, within the same molecule, nucleic acids encode genetic information, the genotype (i.e., the sequence of nucleobases), as well as the phenotype (the three-dimensional fold or function) and thus can be evolved directly at the molecular level. 3,4 Despite this "Goldilocks" chemistry, variations to all three components (nucleobase, sugar ring, and backbone linkage) are possible and have been systematically explored by synthetic chemists, with a view to finding the critical chemical parameters for molecular information encoding, as well as the chemical etiology of life's genetic system. 5 These fundamental studies have uncovered the profound influence of even minor chemical variations on nucleic acid structure, conformation, and the capacity for helix formation and information transfer. 
These variations can in turn engender a wide range of non-canonical double helical structures, 6 as well as novel useful properties such as increased biostability. 7 However, deeper examination of the potential of these candidate genetic polymers for molecular heredity and evolution necessitates a replication system.
Non-enzymatic replication of nucleic acid monomers and oligomers has been explored with a view to understanding the emergence of replication at the origin of life, 8,9 as well as enabling the evolution of non-canonical nucleic acid backbones. For example, non-enzymatic replication of peptide nucleic acids (PNA) (16), a potential alternative prebiotic genetic polymer, 10 has been explored using tetra- and pentanucleobase PNA units. 11 However, although non-enzymatic strategies potentially offer a wide range of chemistries (even beyond the chemical neighborhood of nucleic acids), their limited efficiency and challenges in decoding polymer sequences have so far restricted their application in selection experiments.
We reasoned that enzymatic replication could be immensely powerful, provided the stringent substrate selectivity of natural polymerases could be overcome. If polymerases could be engineered to accept non-cognate nucleotide-triphosphates as substrates at full substitution, they would provide customizable precision nanomachines for the encoded synthesis, replication, and evolution of novel synthetic genetic polymers. Furthermore, enzymatic synthesis and replication of non-canonical nucleic acid substrates by engineered polymerases might illuminate the mechanisms of substrate recognition, discrimination, and fidelity of replication in natural enzymes.

■ NATURAL POLYMERASES
Indeed, there are precedents for the efficient incorporation of unnatural substrates whereby chemical modifications on the nucleotide exploit a natural promiscuity of the polymerase active site. For example, modifications to the C5 position of pyrimidines (8) (or C7 (7) of N7-deazapurines) project into the major groove and cause relatively minor clashes with the polymerase or DNA structure even for large and bulky substituents. 12,13 Recent examples include incorporation of horseradish peroxidase (HRP) conjugated to dTTP by a linker at the C5 position 14 and nucleotides conjugated to side chains of the amino acids His, Ser, and Asp. 15 SomaLogic has systematically exploited this principle to create Slow-Off-Rate Modified Aptamers (SOMAmers), in which the C5 of dU has been modified to bear a range of hydrophobic substituents (9−11). 16 Recent crystal structures of SOMAmers bound to their targets show a remarkable expansion in aptamer functionality, including folding motifs not previously seen in DNA. 17 Another route utilizes C5-ethynyl substituted pyrimidines, which are readily incorporated by polymerases (see above) and quantitatively functionalized with an azide-containing substituent using copper-dependent azide−alkyne Huisgen cycloaddition (CuAAC) "click chemistry". This approach has been validated by the isolation of a DNA aptamer specific for green fluorescent protein, with binding dependent on "clicked" indole substituents (12). 18 Another approach is the design of nucleotides that fit the natural polymerase active site either through close geometric and electronic analogy with the canonical system or by an iterative medicinal chemistry approach. Both of these strategies have been successfully applied to the expansion of the genetic alphabet by the creation of novel base pairs; for example, Benner and colleagues devised four novel base pairs by permutation of the H-bond donor and acceptor groups on purine- and pyrimidine-like heterocycles. 19 Romesberg and colleagues screened a large number of hydrophobic compounds to eventually arrive at a highly efficient heteropair (4) that adopts native-like planar geometries in the polymerase active site 20 and could even persist in vivo when incorporated into a plasmid. 21 Using a similar shape complementarity approach, Hirao and colleagues designed artificial base pairs, including Ds and Px (2), which are retained in PCR over 100 cycles 22 and have enabled the selection of higher affinity aptamers to a number of targets. 23,24

■ OVERCOMING POLYMERASE SUBSTRATE SPECIFICITY
However, for many non-cognate substrates, and indeed to access polymerase phenotypes not found in nature, engineering of novel polymerase variants is preferable. Early polymerase engineering efforts include the rational design of mutants with improved incorporation of dideoxynucleotides 25 for Sanger sequencing or improved incorporation of ribonucleotides (NTPs) by removal of the steric gate residue in the polymerase active site, which precludes incorporation of nucleotides with a 2′ OH group (or other bulky 2′ substituents) by steric exclusion. 26,27 A key mutation for expanding the substrate spectrum of polymerases, A485L, was discovered by scientists at New England Biolabs in the 9°N exo− polymerase 28 (sold commercially as Therminator) and is transferable to a range of polB family polymerases. Although its mechanistic basis remains obscure, the Therminator mutation appears to improve the incorporation of a wide range of non-cognate substrates 29−31 and, together with adjacent residues, provides the key to enabling Next Generation Sequencing (NGS) on the Illumina platform. 32

Accounts of Chemical Research

Polymerase variants with desirable properties can be discovered by selection (see below) or screening approaches (Figure 2), including in vivo complementation, 33 high-throughput liquid handling, 34 or polymerase arrays. 35 These approaches have yielded polymerases with useful properties including increased NTP incorporation, 36,37 RNA reverse transcriptase (RT) activity, 38 or 2′SeMeUTP (24) incorporation, 39 improved discrimination against mismatches 40 or epigenetic methylation marks, 41 or, as a result of screening variants of T7 RNA polymerase, a mutant efficient at transcribing all four 2′-O-methyl (21) (2′OMe) nucleotides. 42 We developed an ELISA-like screening assay for polymerase function (polymerase activity assay, PAA) based on the capture of an extended biotinylated primer on a streptavidin surface followed by readout using a complementary digoxigenin (DIG)-labeled oligonucleotide and an anti-DIG HRP-conjugated antibody (Figure 2A). 43

■ POLYMERASE EVOLUTION
Larger polymerase repertoires can be examined using selection technologies such as phage display, whereby the polymerase and primer/template substrate are tethered proximally on the phage tip (Figure 2B). 4,46 These approaches have identified polymerases with novel properties including RT activity, 47 improved NTP or 2′OMe-dNTP incorporation, and improved extension of a PICS−PICS unnatural base pair. 48 Recent improvements to the phage display method include expressing an unnatural amino acid, p-azidophenylalanine, on the pIII protein, allowing a cycloalkyne-primer/template duplex to be attached by CuAAC click chemistry. 49 This improved display method yielded a Stoffel fragment mutant with the ability to transcribe fully modified 2′-OMe templates (60mer) and also reverse transcribe into DNA. Remarkably, another selected polymerase enabled PCR amplification with partial substitution of dNTPs with 2′OMe-dATP or 2′F-purines (17).
We sought to develop methodologies for the directed evolution of polymerases, where polymerase function could be interrogated in solution. The first selection strategy we developed, compartmentalized self-replication (CSR), 3 is based on a feedback loop, whereby the polymerase replicates its own encoding gene within the aqueous compartments of a water-in-oil (w/o) emulsion (Figure 2C). In such a system, each polymerase replicates only its own encoding gene (to the exclusion of those in other compartments), and adaptive gains by the polymerase translate into an increase in the postselection copy number of the encoding gene. We developed thermostable emulsion mixtures that allowed selection of self-replicating polymerases during PCR thermocycling 3 as well as the emulsion PCR approaches used in several NGS and digital PCR platforms. 50 CSR has proven a versatile method for evolving enzymes for different activities, including polymerases with increased thermostability 3 and resistance to potent polymerase inhibitors such as the anticoagulant heparin 3 or a range of environmental inhibitors such as humic acid. 51 CSR also proved useful for the identification of polymerases with an expanded substrate spectrum, including variants capable of PCR amplification from damaged DNA 52 or of amplifying DNA with complete substitution of dNTPs with phosphorothioate (αS)-dNTPs (5), thus creating all-αS-DNA, 53 or with a generic ability to extend or replicate large hydrophobic base analogues. 45 This polymerase also displayed a significantly enhanced ability to copy and PCR amplify bisulfite-treated DNA, possibly due to an enhanced capacity to utilize dU as well as the bisulfite adduct 5,6-dihydrouridine-6-sulfonate (dhU6S) as template base and decode them correctly as dT. 54 This may enhance sensitivity and efficiency in the bisulfite sequencing workflow, a key methodology in epigenomics.
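The CSR feedback loop described above can be illustrated with a toy enrichment model. This is a minimal sketch and not the authors' protocol: it assumes that each compartment contains a single variant, that there is no cross-compartment amplification, and that each variant copies its own gene with a fixed per-cycle efficiency between 0 and 1. The function name `csr_round`, the variant names, and all numerical values are hypothetical.

```python
from typing import Dict


def csr_round(freqs: Dict[str, float],
              efficiency: Dict[str, float],
              cycles: int = 20) -> Dict[str, float]:
    """Return post-selection gene frequencies after one toy CSR round.

    freqs: pre-selection frequency of each variant's gene (sums to 1).
    efficiency: hypothetical per-cycle self-amplification efficiency (0..1).
    cycles: number of PCR cycles performed inside the emulsion.
    """
    # Each gene is amplified only by the polymerase it encodes, so its
    # copy number grows as (1 + efficiency)^cycles.
    copies = {v: f * (1.0 + efficiency[v]) ** cycles for v, f in freqs.items()}
    total = sum(copies.values())
    # Normalize copy numbers back to frequencies.
    return {v: c / total for v, c in copies.items()}


if __name__ == "__main__":
    # Illustrative library: mostly wild type, two rare improved variants.
    pool = {"wild_type": 0.98, "variant_A": 0.01, "variant_B": 0.01}
    eff = {"wild_type": 0.2, "variant_A": 0.6, "variant_B": 0.9}
    for rnd in range(1, 4):
        pool = csr_round(pool, eff)
        print(rnd, {v: round(f, 4) for v, f in pool.items()})
```

In this simplified picture, even a rare variant comes to dominate the pool if its self-replication efficiency is higher, which is the sense in which adaptive gains translate into postselection copy number.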
CSR has also been used successfully by a number of other groups and biotechnology companies for the discovery of novel polymerases, for example, in the isolation of Pfu mutants capable of incorporating nucleotides bearing a bulky γ-phosphate-O-linker-dabcyl substituent (6). 55 CSR also proved useful in evolving polymerases for the expansion of the genetic alphabet by improved incorporation of the unnatural base dZTP opposite its cognate dP template (3). 56 In a similar approach, CSR was used to evolve RT activity using chimeric RNA−DNA primers requiring reverse transcription through the RNA portion for self-replication. 57 The most proficient RT demonstrated 3-fold improved fidelity compared to natural RTs. The selected polymerase was also able to reverse transcribe a 2′OMe template, albeit with greatly reduced activity compared to its RNA reverse transcriptase activity.
CSR has been adapted to improve the thermostability of a mesophilic polymerase from the Bacillus subtilis phage Φ29 (Phi29). 58 Phi29 is a replicative polymerase with high fidelity, exceptional processivity and strong strand displacement activity with important applications in whole genome amplification and single molecule sequencing. 32,59 By freeze−thawing bacterial cells for lysis within a w/o compartment, the typical heat lysis step could be circumvented. Multiple displacement amplification of plasmids encoding Phi29 variants in emulsion allowed enrichment of library members with greater activity at higher temperatures. An evolved Phi29 mutant maintained activity at 40°C and generated up to five times more product than wild type (wt) Phi29. 58 Furthermore, deep sequencing of whole genome amplification products generated by the mutant polymerase displayed sequence coverage with reduced bias compared to wt Phi29.
CSR requires full replication of the >2 kb polymerase genes, thereby imposing a rigorous adaptive burden restricting recovery to only the most active polymerases. To increase the sensitivity of the method, we devised short-patch CSR (spCSR), in which both diversification and self-replication are limited to a short, defined segment of the polymerase gene (Figure 2D). spCSR has allowed the isolation of variants of Taq with an expanded substrate spectrum allowing enhanced incorporation and replication of 2′-substituted nucleotides 43 as well as a variant of Pyrococcus furiosus DNA polymerase (Pfu) capable of complete replacement of dCTP with the fluorescent dye labeled nucleotides Cy3- and Cy5-dCTP in PCR. 44 The resulting CyDNA is brightly colored and highly fluorescent due to the dense display of cyanine heterocycles on the DNA helix. It also exhibits significantly altered physicochemical properties, including organic phase partitioning during phenol extraction and a 40% increased diameter as determined by atomic force microscopy. 44 CyDNA probes allowed significant signal gains in microarray applications and show promise for applications in super-resolution microscopy. 60
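The adaptive burden of full-gene replication, and why restricting self-replication to a short patch raises sensitivity, can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, as a deliberate simplification that is not taken from the source, that a weak polymerase completes each nucleotide addition independently with a fixed probability, so that the chance of producing a full-length copy falls off exponentially with template length. The per-nucleotide probability and the 100 nt patch length are hypothetical values chosen only for illustration.

```python
def full_length_probability(per_nt_success: float, template_length: int) -> float:
    """Probability of copying a template end to end in this toy model.

    per_nt_success: assumed independent probability that a single
        nucleotide addition succeeds (hypothetical parameter).
    template_length: number of nucleotides that must be copied.
    """
    if not 0.0 <= per_nt_success <= 1.0:
        raise ValueError("per_nt_success must lie between 0 and 1")
    if template_length < 0:
        raise ValueError("template_length must be non-negative")
    return per_nt_success ** template_length


if __name__ == "__main__":
    p = 0.999  # hypothetical per-nucleotide success of a weak variant
    for label, length in (("full gene (CSR)", 2500), ("short patch (spCSR)", 100)):
        print(f"{label}: {full_length_probability(p, length):.3f}")
```

Under these assumptions, a variant that only rarely completes a full-length gene would still complete most short-patch copies, so its gene can be recovered by spCSR even when it would be lost in CSR.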

We have continued technology development in the areas of polymerase evolution and design, specifically for the detection and evolution of very weak and nonprocessive polymerase activities with challenging substrates. To this end, we have developed compartmentalized self-tagging (CST). 30 CST is based on a positive feedback loop, whereby a polymerase tags its own encoding gene (or rather the plasmid containing said gene) by extension of a metastable biotinylated oligonucleotide primer (Figure 2E). Extension stabilizes the oligonucleotide−plasmid complex and enables the selective capture of plasmids encoding active polymerases. In contrast to CSR and spCSR, CST decouples selection from self-replication; thus recovery of a synthetic polymerase is not dependent on simultaneous RT activity within the same polymerase. We have systematically optimized CST working parameters for selection efficiency and sensitivity. 61 For even higher sensitivity, we developed compartmentalized bead tagging (CBT) with sensitivities below single turnover events per polymerase molecule, which was validated during the selection for improved RNA polymerase ribozymes. 62 CST proved a key enabling technology for the discovery of polymerases for synthetic genetic polymers with entirely unnatural backbones. 30 In these xeno-nucleic acids (XNAs), the canonical ribofuranose ring structure found in DNA or RNA is replaced with synthetic congeners. Specifically, we examined HNA (14, 1,5-anhydrohexitol nucleic acid), CeNA (13, cyclohexenyl nucleic acids), LNA (19, 2′-O,4′-C-methylene-β-D-ribonucleic acids; locked nucleic acids), ANA (18, arabinonucleic acids), FANA (15, 2′-fluoro-arabinonucleic acid), and TNA (20, α-L-threofuranosyl nucleic acids), which has been proposed as a predecessor to the RNA world. 63 To expedite the discovery of XNA polymerase activities, we created 22 separate repertoires of Tgo, a polB family DNA polymerase from the hyperthermophilic archaeon Thermococcus gorgonarius. 
Diversity comprising phylogenetic variability as well as targeted random mutations at conserved positions was focused on short sequence motifs located within 10 Å of the nascent DNA strand and its hydration shell as modeled in the ternary complex of the related polB family DNA polymerase from phage RB69. Together with the CST and PAA methods, this enabled the discovery of efficient XNA synthetases for all six XNAs. 30 To pinpoint potential key residues involved in XNA RT function, we used statistical coupling analysis (SCA) 64 of polB phylogeny to identify functionally important residues. Rooting the SCA to the vicinity (5 Å) of L408, a residue implicated in RNA RT activity in the related Pfu DNA polymerase, 65 we identified a single point mutation (I521L) in Tgo that displayed RT activity for several XNAs. The same mutation also provided proficient TNA synthesis and RT activity in the same polymerase framework. 30

TNA synthesis and reverse transcription had previously been investigated by Szostak and Chaput. 66,67 Initially, Therminator polymerase was shown to be adept at transcribing a defined DNA template into TNA 68 but failed to synthesize unbiased random TNA repertoires comprising all four nucleotides. However, using a three-letter TNA alphabet (tA, tG, tT) and a TNA-display approach, 69 Chaput and colleagues were able to isolate TNA aptamers against thrombin. To improve TNA synthesis and generality, they recently developed a microfluidic double emulsion droplet-based optical polymerase sorting method (DrOPS) to evolve polymerases. 70 DrOPS involves encapsulation of an Escherichia coli cell expressing polymerase in a w/o droplet with TNA triphosphates and a DNA primer−template duplex (Figure 2F). After TNA synthesis, a second encapsulation step is employed to generate double emulsions. TNA products are detected using a fluorescent readout, allowing double emulsions to be sorted by flow cytometry. Focused libraries of 9°N DNA polymerase at positions that had previously been shown to expand the substrate specificity of Tgo 30 and 9°N 28 yielded TNA polymerases with improved activity and ∼40-fold higher fidelity.
Polymerase engineering experiments have also revealed key aspects of polymerase function (Figure 3). Specifically, CST and SCA approaches have begun to map out the broad structural features critical for processive XNA synthesis and XNA reverse transcription. Key mutations enabling XNA synthesis were found to cluster predominantly in the polymerase thumb subdomain in proximity to the nascent strand but unexpectedly distant (>20 Å) from the polymerase active site. We hypothesized that these mutations likely reshape a postsynthetic checkpoint region in the thumb domain. Indeed, careful dissection of selected mutations enabled the optimization of XNA as well as RNA polymerase activity. 31 Key to the engineering of the RNA polymerase activity was the identification of a critical gatekeeper mutation (E664K) in the thumb domain, which together with the steric gate mutation (Y409G) yielded a processive primer-dependent RNA polymerase (TGK) from a DNA polymerase scaffold in just two mutational steps. 31 Primer-dependent RNA synthesis is useful, for example, for the generation of mRNAs with defined cap structures and post-transcriptional modifications such as m6Am. 71 Combination of the E664K gatekeeper mutation with the previously identified I521L mutation in Tgo (RT521K) also yielded reverse transcriptase activity on LNA and CeNA. 30
While nucleic acids generally exhibit exclusively 3′−5′ phosphodiester backbone linkages, regioisomeric 2′−5′ linkages have been identified in nature and are involved in innate immune signaling. 72 The 2′−5′ linkages destabilize DNA or RNA duplexes and have been postulated to facilitate primordial RNA replication and evolution. 73 Starting from the processive RNA polymerase TGK (see above), we discovered a polymerase that was capable of synthesizing DNA and RNA with site-specific regioisomeric 2′−5′ linkages by full replacement of dATP or dGTP with the corresponding 3′-deoxy- (23) or 3′OMe-ATP or -GTP (22) nucleotides. 74 Taq, avian myeloblastosis virus RT, and RT521K were adept at reverse transcribing partial 2′−5′-DNA and -RNA to canonical 3′−5′-DNA, demonstrating genetic interconversion of 2′−5′ linkages to canonical 3′−5′ linkages in DNA and RNA.

■ APPLICATIONS OF SYNTHETIC GENETIC POLYMERS
The ability to synthesize and reverse transcribe XNAs demonstrated a capacity for genetic information storage and propagation (molecular heredity) beyond the natural nucleic acids DNA and RNA. This also enabled an XNA replication cycle progressing through a DNA intermediate (conceptually similar to retroviral replication) and opened up the novel XNA sequence spaces for exploration and Darwinian evolution. As a stringent test for evolution and the acquisition of higher order functions such as folding and specific ligand binding, we initiated selection of XNA aptamers directly from diverse repertoires of XNA sequences. We obtained multiple HNA aptamers directed against the HIV-TAR RNA motif or hen egg lysozyme (HEL), which bind their targets with high affinity and specificity. 30 DeStefano and colleagues exploited XNA polymerases for aptamer isolation, including a FANA aptamer that binds to HIV RT with picomolar affinity. 75 The demonstration that XNAs can fold into defined three-dimensional structures and bind to diverse ligands with high affinity encouraged us to investigate whether XNAs could also support the evolution of catalysts. Indeed, we were able to discover a range of XNA catalysts (XNAzymes). These included RNA endonucleases elaborated in four different chemistries (ANA, FANA, HNA, and CeNA), an RNA ligase in the FANA system, and an XNA−XNA (FANA−FANA) ligase metalloenzyme (dependent on Zn2+ and Mg2+). 76 These results demonstrated catalysis in synthetic genetic polymers not found in nature and established technologies for the discovery of catalysts in a wide range of polymer scaffolds. XNA aptamers and catalysts exhibit increased resistance to nuclease degradation, which increases their potential for in vivo applications in comparison to DNA or RNA equivalents. XNA also provides the potential to open up more structural space with expanded motifs in aptamer or catalyst libraries.
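For the replication cycle through a DNA intermediate mentioned above (DNA to XNA and back to DNA), the fidelity of information transfer can in principle be estimated by comparing the recovered DNA with the starting template. The following is a minimal sketch of such a comparison, not the procedure used in the cited studies: it assumes reads that are already aligned and trimmed to the reference length, it counts substitutions only (no insertions or deletions), and the function name `aggregate_error_rate` and the example sequences are hypothetical.

```python
from typing import Iterable


def aggregate_error_rate(reference: str, reads: Iterable[str]) -> float:
    """Substitutions per nucleotide over a full DNA -> XNA -> DNA round trip.

    reference: sequence of the starting DNA template.
    reads: DNA sequences recovered after reverse transcription,
        assumed pre-aligned and equal in length to the reference.
    """
    mismatches = 0
    compared = 0
    for read in reads:
        if len(read) != len(reference):
            raise ValueError("reads must be pre-aligned to the reference length")
        # Count positions where the recovered base differs from the template.
        mismatches += sum(1 for a, b in zip(reference, read) if a != b)
        compared += len(reference)
    if compared == 0:
        raise ValueError("no reads supplied")
    return mismatches / compared


if __name__ == "__main__":
    template = "ACGTACGTAC"                  # illustrative 10 nt template
    recovered = ["ACGTACGTAC", "ACGTTCGTAC"]  # second read has one substitution
    print(aggregate_error_rate(template, recovered))
```

The value obtained this way is an aggregate of the synthesis and reverse transcription steps; separating the two contributions would require additional experiments or assumptions.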
Finally, we have recently been able to exploit the synthetic power of the engineered polymerases to synthesize and assemble the building blocks for nanostructures composed entirely of XNAs, such as the classic "Turberfield" tetrahedron elaborated in a range of chemistries including 2′F-RNA, FANA, HNA, and CeNA, as well as a FANA octahedron requiring synthesis of a 1.7 kb FANA "origami" strand for assembly. 77 Future advances in methodologies for the synthesis, replication, and evolution of chemically ever more divergent genetic polymers should help to resolve questions such as the comparative phenotypic richness of the respective XNA versus DNA/RNA sequence spaces. We anticipate that such advances will also yield an increasing range of XNA catalysts and ligands that fully exploit their expanded range of physicochemical properties and biostability, with potential applications ranging from medicine to nanotechnology.