tRNA sequences can assemble into a replicator

Can replication and translation emerge in a single mechanism via self-assembly? The key molecule, transfer RNA (tRNA), is one of the most ancient molecules and contains the genetic code. Our experiments show how a pool of oligonucleotides, adapted with minor mutations from tRNA, spontaneously formed molecular assemblies and replicated information autonomously using only reversible hybridization under thermal oscillations. The pool of cross-complementary hairpins self-selected by agglomeration and sedimentation. The metastable DNA hairpins bound to a template and then interconnected by hybridization. Thermal oscillations separated replicates from their templates and drove an exponential, cross-catalytic replication. The molecular assembly could encode and replicate binary sequences with a replication fidelity corresponding to 85–90 % per nucleotide. The replication by a self-assembly of tRNA-like sequences suggests that early forms of tRNA could have been involved in molecular replication. This would link the evolution of translation to a mechanism of molecular replication.

Joyce, 2016). In an interesting alternative to strand separation by temperature, Schulman et al. used moderate shear flows to separate DNA tile assemblies (Schulman et al., 2012).

In the past, metastable hairpin states have been prepared in a physically separated manner, and the reaction was triggered by mixing. For example, the mixing of hairpins with a trigger sequence has been shown to form long concatemers (Dirks & Pierce, 2004). With a similar logic, mixing a low-entropy combination of molecules was used to create entropically driven DNA machines, including exponentially amplifying assemblies (Zhang et al., 2009). These reactions run downhill into the binding equilibrium. However, the preparation of the required initial low-entropy state needs human intervention or a unique flow setting for mixing.

[...] (Sharp et al., 1985), with stem loops consisting of 30-33 nt and the information-encoding interjacent domains of 15 nt. As the replication mechanism is based on hybridization only, it is expected to perform equally well for DNA and RNA. Here, we implemented the system with DNA for practical reasons. Nevertheless, due to short heating times and very moderate magnesium concentrations, we also estimate that an RNA version can survive for weeks (Li &

Replication mechanism. The replication mechanism is a template-based replication, where instead of single nucleotides, information is encoded by a succession of oligomers. The domain at the location of the anticodon in tRNA is the template sequence and thus contains the information to be replicated. We therefore term it the information domain. The goal is to replicate the succession of information domains.

To allow longer replicates, we chose the resulting meta-sequences to be periodic with a periodicity of four different hairpins. This makes the minimal cyclic meta-sequence large enough to keep the information domains accessible even in a cyclic configuration. The information domains feature a binary system and contain sequences marked by "0" and "1" (blue/red). For [...]

[...] replication cross-catalytic in a subsequent step. Later, high-temperature spikes can unbind and recycle all molecules for new rounds of replication.

Because of the initial fast cooling, all hairpins are closed in free solution. This inhibits the formation of replicates without a template. While the binding of adjacent hairpins on a template happens within minutes, hairpins in free solution connect without a template only on timescales longer than hours and thus give false positives at a very low rate.

The core principle of this replication mechanism was previously explored in a minimal system that amplified single hairpins into dimers (Krammer et al., 2012). However, these experiments

All complexes were formed at concentrations of 200 nM of each strand and could be resolved despite their branched tertiary structure. Friction coefficients of complexes of two to four strands were 1.6-1.8-fold higher than for linear dsDNA, and 2.4-fold higher for larger complexes (4:4 configuration, ca. 660 nt, Fig. S1). This agrees with the branched structure of the suggested strand assembly geometry (Fig. 1a). Partially assembled complexes of two or three strands bound to a four-strand template could be resolved (Fig. S3). Complexes containing single bound information domains were not stable during electrophoresis (Fig. 2, lane 2 and Fig. S3). This made it possible to distinguish fully assembled complexes from those in which individual strands are bound to a template but have not formed backbone duplexes. Covalent end labels and two reference lanes on each gel were used to quantify concentrations from gel intensities using image analysis as described in Methods.

Selection by agglomeration and sedimentation. For a replicator to be autonomous, there must be a mechanism in place to select, assemble and (re-)accumulate its molecular components at a single location. We argue that DNA hydrogels could offer such a solution. While DNA often assembles into agglomerates, as it does in our case, DNA hydrogels have been

We combined eight matching hairpin sequences of the design introduced in Figure 1 at moderately elevated concentrations and cooled the system to only 25 °C after separating the molecules at 95 °C (Fig. 3). We observed the spontaneous formation of agglomerates that were large enough to sediment under gravity. The initially homogeneous fluorescence turned into micrometer-sized grains, which sedimented within hours. The fluorescence was provided by a label covalently attached to either strand 0A or 1A. Since the double hairpins have a periodic boundary condition, they are able to create large assemblies (Fig. 3a).

The sedimentation was very selective. When only seven of the eight matching hairpins were present, sedimentation was much weaker and, in most cases, undetectable (Fig. 3b, c). For the

The above results suggest that agglomeration could serve as an efficient, autonomous way to assemble matching hairpins from a pool of much less structured and less selected sequences.

Assembly rates showed a strong dependence on incubation temperature (Fig. 4c). At 39 °C, the reaction proceeded significantly slower than at 42 °C or 45 °C. This is because at the lower temperature the hairpins are predominantly in the closed configuration and cannot bind to neighboring molecules in the assembly. Binding between complementary information domains still occurs, but the formation of bonds between neighboring strands becomes rate limiting. Above the melting temperature of the information domain (48 °C, see Fig. S2), template-directed assembly becomes slower. However, the slower kinetics of template-directed product formation are partially masked by the spontaneous product formation lacking an initial template (

The ability to withstand consecutive dilutions is characteristic of exponentially growing [...] (Fig. 5d). This high frequency of dilutions prevented the reaction from transitioning into the saturating regime. The cross-catalytic model was fitted to the data with the dilution factor as the single free parameter, which was found to be 0.43. The difference from the theoretical value of 0.50 was likely due to strands sticking to the reaction vessels before dilution. As a control, a reaction with the same initial concentration of template 0A0B0C0D, but without monomers 0A, 0B, 0C, 0D, was subjected to the same protocol. As the control could not grow exponentially, it gradually died out (Fig. 5d, open circles).
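The link between exponential growth and survival under serial dilution can be illustrated with a minimal sketch. It is not the cross-catalytic model fitted in Fig. 5d: it assumes a constant amplification factor between two dilutions (the hypothetical parameter `growth_per_round`, whose value of 3.0 is chosen only for illustration) and uses the fitted dilution factor of 0.43 from the text. The control corresponds to a reaction without monomers, which cannot amplify.

```python
def serial_dilution(c0, growth_per_round, dilution=0.43, rounds=10):
    """Return the concentration after each grow-then-dilute round.

    c0               -- initial template concentration (arbitrary units)
    growth_per_round -- assumed amplification factor between two dilutions
                        (hypothetical; not a value from the experiments)
    dilution         -- fraction of material kept at each dilution step
    """
    trace = [c0]
    c = c0
    for _ in range(rounds):
        c *= growth_per_round  # amplification phase
        c *= dilution          # dilution step
        trace.append(c)
    return trace


# A replicator survives if growth_per_round * dilution > 1.
replicator = serial_dilution(1.0, growth_per_round=3.0)
# Without monomers there is no amplification, so the template is only diluted.
control = serial_dilution(1.0, growth_per_round=1.0)

for n, (r, c) in enumerate(zip(replicator, control)):
    print(f"round {n:2d}  replicator {r:8.3f}  control {c:8.5f}")
```

In this simplified picture the template persists only when the product of amplification and dilution factor exceeds one, while the control decays geometrically with every round.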

Sequence replication. The above-mentioned reactions amplified, but did not replicate, actual sequence information, as they only contained strands with 0/0 information domains. To study the replication of arbitrary sequences of binary code, replication reactions with all 16 strands encoding "0" and "1" were performed. To discriminate sequences encoded in equally sized complexes and deduce error rates, we compared these results to those from different reaction

Leaving out a single strand (reaction label "+++−", e.g. leaving out 0D for template 0A0B0C0D) reduced the yield of full-size product to 40 % (Fig. 6a, b). Instead, mostly complex 0A0B0C:0A0B0C0D (3:4) was formed, in particular during the first few cycles (Fig. S3). This was expected given the lack of strand 0D and provides an upper limit on the error rate of the full replication. The fact that the full reaction produced almost no 3:4 or 4:3 complexes indicates that the incomplete product was indeed caused by the lack of a particular strand.

Removal of a further strand, either directly next to the previous one ("++−−", missing strands 0C/0D) or not ("+−+−", missing strands 0B/0D), reduced the yield of tetramers even further. Replication of the other two templates 0A1B0C1D and 0A0B1C1D produced very similar results. End points after 6 cycles are given in Fig. 6c for each of the three templates as well as an

[...] (panels a, b), 0101, and 0011 after 6

Replication fidelity. The observed rate of erroneous product formation can be attributed to the spontaneous background rate (Fig. 4b, c and Fig. 6b). Reaction "+−+−" (dark green) proceeded the same as the untemplated reference reaction (solid line), as it did not contain any strands that could bind next to each other to the template and form a backbone duplex (Fig. 6b).

[...] with 0* containing three point mutations, met that criterion (Fig. 7a).

To calculate the per-nucleotide fidelity p, we then, for simplicity, assumed that the replication did not differentiate between information domain 0 and any information domain 0* with fewer than K point mutations. The fidelity per information domain is given by a cumulative binomial distribution (Eq. 2). Here, N is the information domain length, and p the per-nucleotide replication fidelity. Using
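The cumulative binomial relation described above can be sketched numerically. Because Eq. 2 itself is not reproduced here, the form used below is a reconstruction from the stated definitions, namely the probability that a copied domain of length N carries fewer than K point mutations when each nucleotide is copied correctly with probability p: F = sum over k from 0 to K-1 of C(N, k) p^(N-k) (1-p)^k. The default N = 15 follows the 15 nt information domains mentioned earlier; the default K = 3 and the function names are assumptions made only for illustration.

```python
from math import comb


def domain_fidelity(p, n=15, k_max=3):
    """Probability that an n-nt domain is copied with fewer than k_max errors.

    p     -- per-nucleotide fidelity (probability of a correct nucleotide)
    n     -- information domain length (15 nt in the text)
    k_max -- tolerated-mutation threshold K (assumed value, for illustration)
    """
    return sum(comb(n, k) * p ** (n - k) * (1 - p) ** k for k in range(k_max))


def per_nucleotide_fidelity(f_domain, n=15, k_max=3, tol=1e-9):
    """Invert domain_fidelity for p by bisection.

    domain_fidelity increases monotonically with p on [0, 1],
    so a simple bisection converges to the unique solution.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if domain_fidelity(mid, n, k_max) < f_domain:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Domain fidelity for the per-nucleotide range quoted in the abstract.
for p in (0.85, 0.90):
    print(f"p = {p:.2f} -> domain fidelity {domain_fidelity(p):.3f}")
```

The inverse function shows how a measured per-domain fidelity could be converted back into a per-nucleotide value under the same assumptions; the result depends on the chosen K.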

The replication mechanism is expected to also work with shorter strands, as long as the order of the melting temperatures of the information domain and the backbone duplexes is preserved. Smaller strands would also be easier to produce by an upstream polymerization process, simply because they contain fewer nucleotides. In addition, binding of shorter information domain duplexes could discriminate even single base mismatches, resulting in an increased selectivity. It is not straightforward to estimate a minimal sequence length for the demonstrated mechanism. However, it is worth noting that it has been suggested that tRNA arose from two proto-tRNA sequences (Hopfield, 1978).

Pre-selection of nucleic acids for the presented hairpin-driven replication mechanism can be provided by highly sequence-specific gelation of DNA. This gel formation has been shown to

The replication mechanism could serve as a mutable assembly strategy for larger functional

The proposed replication mechanism of assemblies from tRNA-like sequences invites speculation about a transition from an autonomous replication of successions of information domains to the translation of codon sequences encoded in modern mRNA (Fig. 1a). Short

Strand design. DNA double-hairpin sequences were designed using the NUPACK software package (Zadeh et al., 2011). In addition to the secondary structures of the double-hairpins, the design algorithm was constrained by all target dimers. Candidate sequences were selected for optimal homogeneity of binding energies and melting temperatures. Backbone domains connecting consecutive strands (e.g. 0A0B0C) had to be the most stable bonds in the system, in particular more stable than between a template and a newly formed product complex (e.g.

For the determination of reaction yields, the intensities of all gel bands containing strands of the sequence length of interest were added up. For strings of four strands, these were the single tetramer as well as its complexes with di-, tri-, and tetramers. Single strands separated from their complements during electrophoresis (Fig. 2 and Fig. S3).

[...] Zeiss, Germany) with two LEDs (490 nm and 625 nm, Thorlabs, Germany) using a 2.5× objective (Fluar, Zeiss, Germany). The observed sedimentation was independent of the attached dye and its position (Fig. S6c). The ratio of sedimented fluorescence relative to the first frame after heating was used to quantify sedimentation (Fig. 3). The sedimentation time traces (Fig. 3b) were fitted with a sigmoid function to determine the final concentration increase c/c0 (Fig. 3c). The experiment was also performed with random 84 nt DNA strands at 5 µM total concentration to exclude unspecific agglomeration (Fig. S6c).

The authors declare no competing financial interests.