Engineering multistate DNA molecules: a tunable thermal band-pass ﬁ lter

Engineering of biopolymer ‘ partner folds ’ that exist in competitive equilibrium with the native state to produce exotic behaviours remains a relatively unexplored area of molecular engineering. Previously, a temperature-sensitive DNA nanodevice that operates by harnessing such a partner fold to implement a thermal band-pass ﬁ lter was proposed, modelled, and experimentally validated. Due to its peculiar hill-shaped ef ﬁ ciency pro ﬁ le, which differs markedly from the sigmoidal melting curves of simple DNA hairpins, this device could be used to implement temperature-speci ﬁ c control of other molecular machines, and thus represents a promising biotechnological advance. However, no effort was made to examine the detailed dependencies of the peak temperature T † , width Δ T 50 , and maximum ef ﬁ ciency ε max on the stabilities of device components. In this work, closed-form expressions for T † and ln ε max are derived and validated. The functional behaviours of these expressions are then examined and harnessed to construct an ef ﬁ cient algorithm for producing designs with target T † and Δ T 50 values and optimised ε max , thereby establishing the feasibility of algorithmic device design. Method effectiveness is validated via production of a target ﬁ lter, with detailed simulations of device behaviour. Finally, a discussion is presented regarding model effectiveness, extension, and scope.


Introduction:
The DNA molecule has recently attracted substantial attention as a material for constructing functional devices, allowing the programming of structural changes and self-assembly at the nanometre scale via sequence design according to Watson-Crick (WC) base-pairing rules [1, 2]. Numerous molecular algorithms and devices based on hairpin folding have been proposed for various applications, including the Whiplash polymerase chain reaction [3], molecular memory [4], and as simple sensors [5]. Furthermore, DNA hairpin formation, along with extension by polymerase, is one of the most important operations in the theory of DNA computing models [6]. However, attention to substable 'partner folds' has largely been restricted to error estimation and prevention [7].
Meanwhile, although multistate RNAs are regarded to play a role in RNA evolution [8], relatively few studies have addressed the design of such partner folds, which may exist in equilibrium with the native state, to achieve custom behaviours. In this regard, in [9], a temperature-sensitive bistable DNA hairpin-based nanodevice was proposed, modelled in terms of partner-fold occupancy, and experimentally validated as a potential platform for exploiting partnerfold behaviour to implement a thermal band-pass filter. In related work [10], an algorithm was proposed for designing a bistable RNA to implement a T-dependent structural switch, although the development focused on the energy barrier between the alternative structures, rather than tuning the occupancy of the partner fold to support custom application, as in [9].
As shown in Fig. 1, this nanodevice consists of a single DNA strand capable of transforming between two competing hairpins (dominant fold, S 1 ; less stable partner fold, S 2 , which is the device target) in response to changes in T. As shown in Fig. 2 for the device encoding in [9] under an improved parametrisation (cf., Section 2.4), S 2 exhibits a hill-shaped thermal stability characterised by a peak value e max at temperature T † and width ΔT 50 . This behaviour, which differs remarkably from the S-shaped melting curves of simple hairpins, was experimentally observed in [9] via fluorescence measurements (0.95 correlation with predicted behaviour), confirming both the hill-shaped device efficiency and the ability of an equilibrium model to provide quantitative predictions.
As noted in [9], the unusual thermal response of this DNA device is promising, as it might be used to implement a tunable thermal band-pass filter. If practically designable to operate at target T † and ΔT 50 values, this device could be applied to control DNA selfassembly, or as a multi-purpose, T-dependent modulator of chemical circuits (e.g. control a polymerisation reaction via gated activation), similar to the use of RNA hairpins in nature as in-vivo thermometers to regulate transcription [11]. Accordingly, in the current work, the statistical thermodynamic model of this device is extended to derive closed-form expressions for T † and ln e max . Based on the predicted scaling behaviours, an algorithm is then constructed for producing designs with targeted T † and ΔT 50 values and optimised e max , establishing the feasibility of algorithmic design. Both model validity and method effectiveness are verified in the context of production of a target filter design, with simulations of device behaviour at each step. A preliminary version of partial results was presented at IEEE-NEMS 2016 [12].
2. Statistical thermodynamic analysis 2.1. Device efficiency: An expression for the device efficiency, e at equilibrium has previously been derived [9], using the tools of equilibrium chemistry in [13]. For clarity, an abbreviated treatment is also presented here. The derivation begins by invoking strand conservation for the coupled equilibrium shown in Fig. 1, yielding C tot = C ss + C 1 + C 2 , where C tot denotes the total strand concentration, and C ss , C 1 , and C 2 are the equilibrium concentrations of melted coils and hairpins S 1 and S 2 , respectively. The mass action expression for each simple folding equilibrium has the form C i = C ss K i , where K i is the equilibrium constant for formation of fold S i . Equating device efficiency to the fractional occupancy of the target fold, e = C 2 /C tot , followed by insertion of the above expressions with rearrangement yields The equilibrium constant for formation of each folded hairpin, i = {1, 2} may be estimated via the Gibbs factor where R is the gas constant, T is the absolute temperature, and DG • i is the Gibbs free energy, which is estimated via 2.2. Device peak temperature, T † : A closed-form expression for T † , which corresponds to the peak of the device efficiency curve, may be derived from (1) using standard analytical methods for locating an extremum, as follows. The derivation begins by defining the quantity, j ; (1 + K 1 )/K 2 , so that Calculation of the derivative of e then proceeds, as follows and This expression may be reduced via the following basic relations Insertion of these expressions for K 1 and K 2 , with rearrangement then yields Insertion into (3), followed by evaluation at T † and equating the result with zero to correspond to an extremum yields where ' †' denotes evaluation at T † . As the first factor is strictly positive, the last factor must be equal to zero Expansion of K † 1 via (2), and taking the logarithm of both sides with rearrangement then yields Finally, rearrangement to isolate T † yields the desired result where a ; DH • 1 /DH • 2 is a useful parameter for discussing system design. This equation may also be expressed in terms of T m (1) = DH • 1 /DS • 1 , the melting temperature of S 1 in isolation, as As discussed in Section 3.2, the tendency of T † to approach T m (1) with increasing α provides insight towards algorithmic filter design. In this regard, the value, α = 2, at which point T † = T m (1), represents an important special case, and is termed as the design zero-point (ZP) for a given design process with fixed T m (1).
2.3. Device peak efficiency, e max : Equation (4) provides a convenient method of obtaining an exact value for e max via insertion into (1) as e(T † ) for any encoded instance, and also enables visualisation of the accompanying scaling behaviours. However, characterisation of the functional forms of these scaling tendencies requires a closed-form expression for e max . A useful approximate form may be produced via successive approximation, as follows. First, evaluating (1) at T † and noting that K † 1 ≫ K † 2 , by design, yields the form For the purpose of deriving a closed-form expression for e max which Fig. 1 Bistable DNA nanodevice formed by two competing hairpins, S 1 and S 2 , with stem lengths L 1 = 13 and L 2 = 12 bps, and codewords for hybridisation separated by poly-T spacers, sp3 and sp5, with lengths L sp3 = 28 and L sp5 = 29 bases. The less stable partner fold, S 2 is the targeted fold of the device is useful in clarifying the scaling behaviour of designs in the primary design range, viz., beneath ZP (i.e. 1 < α < 2), this expression can be further simplified via application of a second, less general approximation, K † 1 ≫ 1, which yields This approximation can be expected to perform reasonably well for designs operating well below ZP (i.e. with large K † 1 values). As a concrete example, for the reference encoding, K † 2 ≃ 1.2 and K † 1 ≃ 10, which corresponds to a relative error of η ≃ 0.22 between the two approximate forms. However, at ZP, K † 1 = 1 and η ≃ 1, so that agreement has degraded to order-of-scale.
Combining the exponentials in (6) and taking the logarithm of both sides, followed by isolation of T † on the left-hand side and equating the result with (4a) for T † then yields Elimination of T † , with rearrangement to isolate ln e max yields where β is defined for convenience as b ; ln (a − 1) + DS • 1 /R. However, as the magnitude of the logarithmic term in β is very small relative to that of the trailing term for all relevant designs, b ≃ DS • 1 /R. Insertion of this approximation then yields Finally, this expression can be simplified as 2.4. Model implementation: A statistical thermodynamic approach was used to model the K eq for each device hairpin [13], employing a nearest-neighbour (NN) model of duplex energetics for each hairpin stem, and a Jacobson-Stockmeyer (J-S) power law to model the loop entropy. The NN model was applied using the parameters in [14] for estimating the stacking enthalpy and entropy of WC base-pair (bp) doublets. Based on conditions in [9], a net ionic strength of [Na + ] = 0.165 M was adopted for all simulations, necessitating an entropic correction for variation from 1.0 M Na + . The energetic penalty for helix unwinding at the two duplex ends was estimated using the helix initiation parameters in [14], which estimate the cooperativity parameter, σ, when applied as a statistical weight. As an improvement over the original model parameterisation in [9], the statistical weight of hairpin loop closure was modelled using a J-S inverse 2.44 power law, as suggested by Goddard et al. [15], and applied in [14] in a piecewise manner for the combined modelling of all types of DNA loops. However, for theoretical clarity, a pure J-S power law employing this exponent was selected, as DS • loop = −2.44R ln (n + 1). Parameterisation for mean-case NN energetics (hereinafter, 'mean-case') simulations was performed similarly, using: (i) mean WC doublet stacking parameters, DH • nn and DS • nn obtained via the respective averages over the NN parameters in [14]; and (ii) mean initiation parameters, DH • nuc and DS • nuc obtained via respective averages over the accompanying parameters for initiation of helices with GC and AT terminal bps. Note that in terms of the net ΔH°and ΔS°of hairpin formation, this yields the linear form For clarity, each hairpin K eq was estimated as a product of the subweights for loop closure, initiation, and stacking, from the respective ΔH°and ΔS°values, as K eq = s exp (−DG • stem /RT) (n + 1) −2.44 , where DG • stem is the net free energy of stacking. These equilibrium constants were then combined with the derived model equations to simulate behaviours for both the original device encoding and mean-case scaled implementations, under their respective parametrisations. Simulations were performed via 3. Results 3.1. Functional behaviour of T † : An examination of the behaviour of T † against the stabilities of hairpins S 1 and S 2 is essential for gaining traction in development of an algorithmic method for device design. First, (4) displays a surprising lack of dependence on DS • 2 . In addition, the form of (4b) indicates that T † / T m (1), suggesting a strong sensitivity to variations in the thermal stability of the dominant hairpin, S 1 . In contrast, the placement of DH • 2 inside the logarithm indicates a much lower sensitivity to variations in the stability of S 2 . Finally, (4b) indicates that T † reduces to T m (1) at ZP, where DH • 2 = DH • 1 /2. This represents a useful special case that divides the T † values of designs obtained by successive decreases in L 2 into two very roughly linear regimes in the vicinity of T m (1): (i) for DH • 2 , DH • 1 /2 (more-stable S 2 ), T † , T m (1), and approaches T m (1) from below; and (ii) for DH • 2 . DH • 1 /2 (less-stable S 2 ), T † . T m (1), and diverges from T m (1) above.
To illustrate these behaviours, Fig. 3 presents a mean-case simulation of variations in T † against changes in the system's component hairpin stem lengths, which confirms the predicted relative insensitivity to changes in L 2 . In addition, Fig. 3b shows the scaling behaviour of T † for a specific set of fixed L 1 values, with T † values plotted relative to the fixed T m (1) value for each curve, indicating that T † slowly approaches T m (1) from below as ΔL is increased, reaching T m (1) near L 2 = L 1 /2, as predicted.
3.2. Functional behaviour of e max : The form of (8) indicates a log-linear relationship between e max and each of the bulk thermodynamic parameters of S 2 . However, since DH • 2 and DS • 2 directly oppose, the net dependence on L 2 is not immediately evident. Given adoption of a mean-case NN model (9), this dependence can be clarified via differentiation, yielding where g ; 1 − T m (1)/T m (1), and T m (1) ; DH • nn /DS • nn corresponds to the T m value of a hairpin with mean NN energetics in the infinite stem-length limit, and thus bounds T m (1), above. Accordingly, γ is strictly negative, so that ln e max ∝ L 2 with a constant positive slope. Expressed in terms of finite changes, this equation indicates that an isolated unit decrease in L 2 will be accompanied by a constant log-linear efficiency decrease of The behaviour of ln e max against L 1 can be addressed similarly, Unlike (10), this expression is a complex function of the independent variable, so that a discussion in terms of finite changes is more involved, and is thus deferred to future study. However, it indicates that isolated, small variations in L 1 will act to oppose like variations in L 2 in terms of effect on ln e max , and will be dampened by a factor of 1/α, which approaches 1/2 at ZP. Accordingly, given that T † is primarily fixed by L 1 , algorithmic attempts to offset efficiency losses due to repeated L 2 decreases during design via a compensating increase in L 1 appear unlikely to be well motivated.
To illustrate these behaviours, Fig. 4 presents a mean-case simulation of variations in ln e max against changes in L 1 and L 2 , as predicted by (8), along with exact values obtained by e(T † ) via (4b). As shown in log 10 scale for clarity, both curve sets predict e max to scale log-linearly against L 2 , but to have a more complex dependence on L 1 . Based on these curves along with values listed in Table 1, (8) provides an effective means of examining the scaling behaviour of e, and a reasonable approximation of ln e max for specific designs (within ≃ 10%). However, when converted to e max values, agreement deteriorates to order of scale at ZP, as predicted.
The scaling behaviour of ln e max against n 2 may also be addressed similarly. Adoption of a mean-case NN model and approximating ln (n + 1) as ln (n) in the J-S loop entropy, due to this architecture's large loops, followed by differentiation yields d( ln e max ) dn 2 ≃ − 2.44 n 2 , which indicates a log-log relationship between e max and n 2 . This equation may be re-expressed in terms of finite changes via separation of variables and definite integration of the respective sides over d(e max ) and dn 2 , yielding where '(0)' denotes initial design values. This equation predicts an isolated fractional decrease in n 2 to be accompanied by a fractional increase in e max , as discussed more fully in the next section, in the concrete context of the proposed design method, viz., Algorithm 1 (see Fig. 5).
3.3. Algorithm for filter design: Although no closed-form expression for predicting the scaling behaviour of ΔT 50 is available, prospective simulations in [9] indicate that ΔT 50 consistently narrows with increasing ΔL. Combined with this empirically determined behaviour, the behaviours predicted by (4b) for T † suggest an efficient method for targeted filter design. In particular, if the device is designed to initialise T m (1) to roughly the desired T † target value, then ΔL can be iteratively increased from unity by decreasing L 2 , while the stability of S 1 is maintained via fixed L 1 and L sp5 . Successive increases in ΔL will then result in a gradual decrease in ΔT 50 , with a simultaneous approach to T † target . Convergence occurs when a suitably narrow ΔT 50 value is obtained, given that T † remains suitably close to T † target . Herein, each L 2 reduction is implemented via a 3′-word-5′-trim, which deletes the 5′ base of the 3′ DNA codeword for S 2 formation. While this also increases n 2 by unity, causing a small change in DS • 2 as a side-effect, T † is predicted to be invariant to this change.
The scaling behaviour predicted by (14), combined with the invariance of T † to changes in DS • 2 suggests the utility of a postconvergence maximisation of e max via an isolated reduction in n 2 . However, in the absence of a closed-form expression for ΔT 50 , the accompanying effect on curve width is unclear. Accordingly, this optimisation step is implemented via an iterated, single-base trimming of sp3, while monitoring ΔT 50design , and halting if W-convergence is lost, which ensures that the performance achieved during convergence is maintained.
A substantial simplification may also be achieved by adopting a mean-case approach, which exploits the dependency of device behaviour on the encoded bulk thermodynamic properties, rather than specific sequence. By effectively classifying large sets of encodings as thermodynamically equivalent, this approach obviates the need for repetitive, computationally intensive sequence-specific search. This approach is notably less fruitful in the related problem, DNA Strand Design [16], which seeks to design strand sets with minimal mis-hybridisation, which is highly sequence dependent.
This design strategy is expressed formally as Algorithm 1 (see Fig. 5). The initial seed of the design process is taken as the reference structure in Fig. 1, implemented using mean-case NN energetics and appropriate variations of L 1 and L 2 . The process goal is production of a system design which satisfies the following two properties: (i) T-convergence: DT † ; |T † target − T † design | ≤ s T ; and (ii) W-convergence: ΔT 50design ≤ ΔT 50thr , where σ T is a tolerance for T † from its target value and ΔT 50thr is a threshold maximum width. It is also necessary to account for the possibility that input condition ΔT 50thr is too narrow to permit W-convergence. In this case, T † will rise above T m (1) until T-convergence is lost, a detectable condition referred to as 'T-divergence', which serves as a test for non-convergent halting. Note that Algorithm 1 (see Fig. 5) also employs several helper functions: InitLG(T † target ), which uses the model in Section 2.4 to determine the L 1 value producing the hairpin T m nearest to T † target ; TDag(L 1 , L 2 ), which computes T † using (4a); and, Width(L 1 , L 2 , L sp3 ), which computes ΔT 50design via numerical estimation of T − and T + (cf. Fig. 2) using (1) and (4a).
As a concrete example, Fig. 6 shows the results of application of Algorithm 1 (see Fig. 5) to produce a filter design with target values: T † target = 60.0 • C, which is intentionally near that of the meancase implementation of the reference system in Fig. 1, σ T = 2.0°C, and ΔT 50thr = 12.5°C. As shown in Fig. 6 for efficiency plots for ΔL = 1-5 bps, application of Algorithm 1 (see Fig. 5) results in the narrowing of ΔT 50 for successive structures, while the accompanying T † values simultaneously approach T m (1). For clarity, the trajectory taken by the algorithm during this process is also shown in Figs. 3 and 4. Algorithm 1 (see Fig. 5) produced an initial S 1 stem length of L 1 = 13 bps (T m (1) = 61.0 • C) and converged after five iterations, yielding a first design with {L 1 = 13, L 2 = 8; ΔL = 5} bps, and operating characteristics: T † = 60.4 • C, ΔT 50 = 12.1°C, and e max = 0.00036. The detailed thermal profile of this untrimmed design is shown in the panel inset of Fig. 6. As shown in Table 1, T-and W-convergence were achieved after three and five iterations, respectively. For completeness, T † values provided by (4a), along with numerical (exact) values for T † , e max , T − , T + , and ΔT 50 are shown in Table 1 for the sequence-specific simulation in Fig. 2 as well as the mean-case simulations in Fig. 6. As expected, the agreement between T † values produced by (4a) and numerical values is exact. As anticipated, numerical values obtained for ΔT 50 with fixed L 1 decrease consistently with decreasing L 2 . Table 1 also shows both exact and approximate values for log 10 e max provided by e(T † ) and (8), respectively. Although the iterative 3′-word-5′-trim of Algorithm 1 (see Fig. 5) converged successfully, it also resulted in a log-linear reduction in e max , as expected from the scaling behaviour of (8). The magnitude of this efficiency loss was predicted at D( log 10 e max ) ≃ −0.50 per step by e (T † ), which was reasonably well approximated by (8) as ≃ − 0.45. The individual contributions due to the 1 base L 2 decrease and n 2 increase inherent in each 3′-word-5′-trim were resolved into approx. − 0.43 and − 0.014 per step, respectively, via appropriate application of (11) and (14), showing that the latter change makes only a minor contribution of about 3% to the net efficiency decrease, as expected.
4. Discussion: By exploiting the functional behaviour of T † with respect to its tendency to approach T m (1) with an iterated decrease in L 2 , and its invariance to changes in DS • 2 , which suggests the effectiveness of an e max -optimising final sp3 trim, Algorithm 1 (see Fig. 5) establishes the feasibility of algorithmic device design. Based on the success of the design process in Figs. 6 and 7, this method is expected to be effective in producing filters with tailored T † and ΔT 50 values, and optimised e max , given reasonably narrow target widths. Nevertheless, the scaling behaviour and tunability limits of ΔT 50 require further investigation. While a monitored stepwise reduction in n 2 , as implemented via the final sp3 trim of Algorithm 1 (see Fig. 5) is well motivated given lack of information regarding the functional form of ΔT 50 , its insensitivity to the trim suggests that like T † , it may be invariant to DS • 2 . If so, this operation could be replaced with simple elimination of n 2 regions with no other function, which would aid the design of device variants with different word placement. Accordingly, derivation of a closed-form expression for ΔT 50 is required to clarify these points. In addition, the co-tuning of n 1 with L 1 during initialision of ZP remains an unexplored avenue to achieve more flexible device tuning, which bears investigation.
A practical consideration is the need to produce specific encodings for the mean-case designs generated by Algorithm 1 (see Fig. 5). Fortunately, selecting DNA sequences to encode oligonucleotide helices with desired ΔH°, ΔS°, and T m values is known to be easy [14]. Accordingly, the selection of device encodings which closely satisfy a set of thermodynamic characteristics determined via mean-case simulations is not likely to be challenging. Another issue involves the practicability of filter applications, given a relatively low ε max . As shown in Table 1, the narrowing of ΔT 50 implemented by Algorithm 1 (see Fig. 5) carries the cost of a reduced occupancy of S 2 , in solution. However, the very high copy number typical of DNA experiments indicates that this efficiency level will be adequate, in practice. For instance, the wet-lab implementation in [3] employed ≃ 1.2 × 10 13 DNA strands immobilised onto streptavidin beads in a 400 μL reaction volume. An equivalent implementation of the designed filter operating at T † = 60.4 • C at e max = 1.09 × 10 −3 would contain a plentiful 1.31 × 10 10 properly folded target structures at equilibrium, which would suffice for applications that do not require a large excess of product.
As noted in Section 1, the engineering of partner folds to implement exotic behaviours, such as a thermal band-pass filter, is largely unexplored. In this regard, the bulk equilibrium model developed herein (e, T † , e max ) is general in scope for multistate biopolymer engineering. Accordingly, this model has potential applications beyond the current device, for engineering the behaviours of partner folds in other contexts and biopolymers (e.g. RNA, proteins).   Efficiency behaviour for designs produced on the path to convergence during an implementation of Algorithm 1 (see Fig. 5) with fixed L 1 = 13 bps and ΔL = 1-5 bps, prior to application of the final sp3 trim. Inset: scaled curve for the pre-trim convergent design