Unbounded growth patterns of reproducing, competing polymers—similarities to biological evolution

Since the origin of life the interplay between reproduction, variation, and selection has been driving the emergence of new species. The evolution of the Earth’s biosphere appears to innovate unceasingly instead of coming to a stall. Here, we introduce a model system of linear molecules where new polymers appear by spontaneous ligation. The polymers proliferate following a template-based mechanism. Our combined experimental and theoretical study shows that for sufficiently rapid autocatalysis the reproduction process selects particular lengths—while ever longer polymers emerge. We suggest similarities to biological evolution.


Rationale behind the DNA sequences used in the experiments
In this section we present and discuss the DNA sequences chosen for our experiments.
For the first experiment ( Fig. 3A) we used 20 b DNA single strands of sequence Σ = GGC ATC TTA GTA CAT ATC AG and its complement Σ c = CTG ATA TGT ACT AAG ATG CC. These sequences were chosen randomly, given that the melting temperature of the resulting double strand is in the range of highest ligase activity, and single strands form no secondary structures.
The experiment presented in Fig. 3C is based on sixteen pair-wise complementary 10 b long single DNA strands. Their sequences were chosen to minimize cross-hybridization among noncomplementary sequences. At the same time, ligated sequences that act as templates have to hybridize very specifically to avoid overhangs. The melting temperatures of the eight corresponding 10 bp double strands, need to be as close as possible.
To find appropriate sequences we use the following strategy: We initially generated random 10 bp single-stranded sequences and their complements. To narrow the expected melting temperatures, in the following we consider only those sequences with the same number of (C or G)s as (A or T)s. This way, we drastically reduce the computational time for the following steps. From the remaining pool of 290,000 sequence pairs, we create all possible sets, consisting of sixteen 10 b as well as eight 20 b and four 40 b pairwise complementary single-stranded sequences that result from ligating pairs of the 10 b sequences.
We determine the expected melting temperatures for all possible pairs of sequences in a set using melt.pl from the UNAFold [51,52] package for a fixed concentration of both DNA (1 µM) and salt (Na: 25 mM and Mg: 10 mM). We choose sets where the cross-hybridization temperature for ligation of the 20 b single-stranded sequences is below 7 • C, for the 40 b singlestranded sequences below 36 • C, and for the 80 b single-stranded sequences below 66 • C.
A suitable set of sequences for the sixteen pair-wise complementary DNA single strands following the conditions outlined above is composed of (and that we used in the experiment presented in Fig. 3C): S 1 = TGT GCC TTT G and S c 1 = CAA AGG CAC A, S 2 = ACA CCT GAT C and S c 2 = GAT CAG GTG T, S 3 = GGA GTT GGT T and S c 3 = AAC CAA CTC C, S 4 = ATT TGG ACC G and S c 4 = CGG TCC AAA T, S 5 = CTT ACA CGA G and S c 5 = CTC GTG TAA G, S 6 = CTA CAT GAC G and S c 6 = CGT CAT GTA G, S 7 = GGG AAG TTG T and S c 7 = ACA ACT TCC C, S 8 = GAA GAC TAG C and S c 8 = GCT AGT CTT C.
We chose the following pair-wise complementary 20 b sequences to result from ligation: S 1 S 2 = TGT GCC TTT GAC ACC TGA TC and S c 2 S c 1 = GAT CAG GTG TCA AAG GCA CA, S 3 S 4 = GGA GTT GGT TAT TTG GAC CG and S c 4 S c 3 = CGG TCC AAA TAA CCA ACT CC, S 5 S 6 = CTT ACA CGA GCT ACA TGA CG and S c 6 S c 5 = CGT CAT GTA GCT CGT GTA AG, S 7 S 8 = GGG AAG TTG TGA AGA CTA GC and S c 8 S c 7 = GCT AGT CTT CAC AAC TTC CC. The 40 b molecules have the sequences: S 1 S 2 S 3 S 4 = TGT GCC TTT GAC ACC TGA TCG GAG TTG GTT ATT TGG ACC G and S c 4 S c 3 S c 2 S c 1 = CGG TCC AAA TAA CCA ACT CCG ATC AGG TGT CAA AGG CAC A, S 5 S 6 S 7 S 8 = CTT ACA CGA GCT ACA TGA CGG GGA AGT TGT GAA GAC TAG C and S c 8 S c 7 S c 6 S c 5 = GCT AGT CTT CAC AAC TTC CCC GTC ATG TAG CTC GTG TAA G.
The 80 b molecules have the sequences: S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 = TGT GCC TTT GAC ACC TGA TCG GAG TTG GTT ATT TGG ACC GCT TAC ACG AGC TAC ATG ACG GGG AAG TTG TGA AGA CTA GC and S c 8 S c 7 S c 6 S c 5 S c 4 S c 3 S c 2 S c 1 = GCT AGT CTT CAC AAC TTC CCC GTC ATG TAG CTC GTG TAA GCG GTC CAA ATA ACC AAC TCC GAT CAG GTG TCA AAG GCA CA.

Quantification of DNA abundance
To quantify the amounts of DNA in a sample, we perform polyacrylamide gel electrophoresis (PAGE, Fig. S1A) with fluorescent staining (see Methods). Gel pictures are analyzed using ImageJ [54,55]. After subtracting background fluorescence, the total area beneath a peak is proportional to the amount of double-stranded DNA in that band (Fig. S1B,C). To quantify the amounts of DNA double strands, in Figure 3, we directly use the corresponding calibration data for DNA lengths matching the calibration DNA (Fig. S1B). For other lengths, we use interpolated values.
For the experiments presented in Figure 4, we normalized all fluorescence intensities to the fluorescence intensity of the DNA band of the smallest DNA molecules of the reference sample extracted at the time point zero. From these ratios we derive the relative amount of DNA molecules in each band.

Determination of the spontaneous ligation and autocatalytic proliferation rates
In this section we show how we estimate the template-directed proliferation rate constant α and the free ligation rate constant β from the experimental data. To determine the value of β , we consider situations where new molecules are mainly formed by spontaneous ligation. In that case, the proliferation of molecules A n is described by Here, the [A n ] denotes the number of double-stranded DNA molecules A n , and n corresponds to the number of base pairs of the initially present DNA double strands. Spontaneous ligation dominates in the early phases of an experiment, when predominantly the initial DNA strands are present. We obtain the value of β from   where t 1 is the time frame until the ligated DNA double strands A 2n can be detected for the first time.
We determine the autocatalytic reproduction rate through ] is approximated by the difference of the amount [A 2n ] between two successive time points. Note that in general the value of α obtained in this way presents an upper estimate, because other processes than free and autocatalytic ligation could contribute to the generation of A 2n .

Further enzymatic approaches to suppress overhang-ligation
In the experiment presented in Fig. 4C, Mungbean nuclease removes overhangs and thus suppresses overhang ligation. At the same time, this enzyme also digests single-strands in a se- ω 20 0-6 10.14 · 10 −6 10.35 6-12 8.75 · 10 −6 8.93 quence unspecific way. Digestion of freely ligated templates effectively inhibits the templatedirected reproduction of DNA strands. As a consequence, only a small population of 40 bp double strands emerges, and longer strands are no more detectable (Fig. 4C).
We check whether a reduced activity of Mungbean nuclease leads to the emergence of populations of longer than 40 bp strands. To this end, Mungbean nuclease is added for the first time at a later time point, namely, after six cycles. Before the initial addition, the samples are incubated as in the experiment presented in Fig. 3A followed by a single incubation step at 30 • C for 5 min.
Again, from the seventh cycle on, we use modified cycles: 15 min at 30 • C, 10 min at 60 • C, 1 min at 66 • C, 10 min at 60 • C, 1 min at 84 • C followed by 5 min at 30 • C. Furthermore, now only half of the concentration (0.5 units) of Mungbean nuclease is added to all samples at the beginning of each cycle. The concentration of Taq DNA ligase is 8.0 units/µl and the concentration of each of the two complementary 20 b single-stranded DNA of sequences Σ and Σ c is 500 nM. Fig. S3 shows that the reduction of the enzymatic activity of the nuclease leads to more intense 40 bp bands, but also to the appearance of weak 60 bp bands that result from overhangligation. Beyond the 60 bp double strands no longer DNA bands are detectable. This result supports our idea that unbounded elongation results from overhang ligation.
The two restriction enzymes AfeI and BstUI in the experiment presented in Fig. 4D reduce overhang formation since they digest the inhomogeneous DNA double strands of sequences σ σ c and σ c σ that primarily form overhangs. However, the homogeneous DNA double strands of sequences σ σ and σ c σ c are not digested. These strands also have a low probability for Table 2: Autocatalytic proliferation rate α[A n ] and its ratio ω n to the free ligation rate for the experiment presented in Fig. 3A. [A n ] denotes the number of double-stranded DNA molecules A n , and n corresponds to the number their base pairs. t 1 and t 2 define the time interval (cycles) in which the autocatalytic redproliferation of a [A n ] is analyzed. The average value for ω ≈ 40.    . We observe that 60 bp strands occur, however, unbounded strand elongation cannot be seen.
overhang formation that leads again to unbounded strand elongation in the course of the experiment. In the following, we combine Mungbean nuclease, AfeI and BstUI, to completely suppress overhang-ligation and to test if overhang ligation ceases. The incubation scheme and the concentration of the Taq DNA ligase agrees with the experiment presented in Fig. 4D. However, temperatures are increased a later time point, namely, after 22 cycles. Also, the concentration of the 10 bp strands is reduced to 200 nM. As in the experiment presented in Fig. 4D, 1 unit of AfeI and of BstUI are added to the samples after eight and twelve cycles. 0,5 units of both enzymes as well as 0,25 units of Mungbean nuclease are added after 15 and 18 cycles. A final addition of 0,5 units of the two restriction enzymes is performed after 22 cycles. Fig. S4 shows a regime where 10 bp, 20 bp and 40 bp double strands can be detected, whereas the DNA abundance of the 30 bp double strands is below the detection limit. However, the resulting abundance of the 40 bp strands does not suffice to form detectable amounts of longer DNA strands.

Duration of the plateau-phase
In the supercritical case α > α c , there are periods during which one molecule length dominates the system (Fig. 3D, S5). For further quantitative analysis, we define the molecule of length A n to dominate if n[A n ] exceeds a threshold-value that we fix to 70% of [A 1 ] at t = 0. The qualitative results do not depend on the exact threshold-value.
In periods with a dominating molecule length, the corresponding number of molecules changes only weakly with time (Fig. S5). There are two possible mechanisms leading to this plateauphase: first, the production and loss rates of the dominating molecule length balance each other (Fig. S5A). Second, both rates are essentially zero (Fig. S5B). In the latter case the average duration of the plateau-phase is and the distribution of lifetimes is exponential, 1−e −t/τ (Fig. S6) indicating that the appearance of a new length-doubled molecule is a Poisson process. This occurs if autocatalytic reproduction is very large so that all shorter molecules are quickly consumed in formation of the dominating molecules. Further changes of the length distribution of the molecules is inhibited until a new length-doubled molecules forms spontaneously. In our simulations with [A 1 ] = 2 17 this occurs for α ≥ 10 3 . The values of τ obtained from the simulation agree with the theoretical values (4): τ A 4 = 2.35 · 10 −10 , τ A 8 = 9.37 · 10 −10 , τ A 16 = 3.79 · 10 −9 , and τ A 32 = 1.50 · 10 −8 (Fig. S6). Furthermore, with increasing α the periods of coexistence shorten (Fig. S5).
The dependence of τ on [A n ] remains the same as in the previous case if the formation and disappearance rates of the dominating molecule length are finite (Fig. S7). Deviations can be detected only if α is close to α c .  Figure S5: Plateau phases in the stochastic simulations. A) Plateau phases due to balanced production and loss (same data as in Fig. 3D). The preferentially occurring molecules initially grow exponentially (dashed lines). The characteristic times are 6.67(5) · 10 −10 for A 2 , 2.63(7) · 10 −9 for A 4 , 1.1(6) · 10 −8 for A 8 , 1.0(5) · 10 −8 for A 16 , 2.1(4) · 10 −7 for A 32 , and 1.1(4) · 10 −6 for A 64 . Fluctuations of the plateau duration are small compared to the duration. B) Plateau phases for α = 10 3 α c .

Exponential growth during plateau phase
Let us now consider the case when the production and loss rates equal each other. Let α > α c , β be negligible and A 2n be the dominating molecule length. In this case A 4n grows exponentially ( Fig S5A). To understand this exponential growth let us neglect reactions of all molecules other than A n , A 2n , and A 4n . We are thus left with ] = 0 and consequently the decay of A n is also exponential. The characteristic timescale is α −1 [A 2n ] −2 . Simulations confirm these relations (Fig. S5A).

Effects for α > α c
We observe that during the plateau phase of [A 2n ] not all molecules [A n ] are necessarily concatenated. In this case, following an initial decay after the plateau, A 2n can rise again because A 4n is consumed to produce A 8n (Fig. S8 for n = 1). If A k is not completely consumed during the plateau phase of A l , autocatalytic production of A k+l may be initiated. This can lead to co-development of other molecule families.

Distribution of appearance times
In Figure S9 we present the average times of appearance of molecule A n as a function of n. This dependence is quadratic for α > α c . For α = 0.1, t app,n ∼ n 2.10(2) , the red line is a guide to the eye. Figure S10: Consequences of backreactions, disintegration into monomers. Parameters as in Fig. 3D but δ = 1, 000. Length-doubling is virtually unaffected by the back reactions and the molecules with n = 3 · 2 k remain readily detectable.

Back-reactions
As we did not observe DNA breakage in the experiments, we neglect back reactions in our theoretical analysis. To give a measure on how strong the impact of back reactions is on the stability of the dynamics of the preferentially occurring molecules, we consider disintegration into monomers A n − → δ nA. For rates δ < 1000, the time course is essentially the same as for the case without back reactions, δ = 0 (Fig. S10).