Sequence-Selective Formation of Synthetic H-Bonded Duplexes

Oligomers equipped with a sequence of phenol and pyridine N-oxide groups form duplexes via H-bonding interactions between these recognition units. Reductive amination chemistry was used to synthesize all possible 3-mer sequences: AAA, AAD, ADA, DAA, ADD, DAD, DDA, and DDD. Pairwise interactions between the oligomers were investigated using NMR titration and dilution experiments in toluene. The measured association constants vary by 3 orders of magnitude (10² to 10⁵ M⁻¹). Antiparallel sequence-complementary oligomers generally form more stable complexes than mismatched duplexes. Mismatched duplexes that have an excess of H-bond donors are stabilized by the interaction of two phenol donors with one pyridine N-oxide acceptor. Oligomers that have a H-bond donor and acceptor on the ends of the chain can fold to form intramolecular H-bonds in the free state. The 1,3-folding equilibrium competes with duplex formation and lowers the stability of duplexes involving these sequences. As a result, some of the mismatch duplexes are more stable than some of the sequence-complementary duplexes. However, the most stable mismatch duplexes contain DDD and compete with the most stable sequence-complementary duplex, AAA·DDD, so in mixtures that contain all eight sequences, sequence-complementary duplexes dominate. Even higher fidelity sequence selectivity can be achieved if alternating donor–acceptor sequences are avoided.


■ INTRODUCTION
The encoded recognition properties of nucleic acids are currently unrivaled in any other material. High-fidelity sequence-selective duplex formation is the molecular basis for replication of the genetic information encoded by DNA and is finding widespread applications in the programmed assembly of complex nucleic acid nanostructures. 1 There is no fundamental reason that these properties should be restricted to biological polymers, and a range of synthetic nucleic acid analogues have been demonstrated to form duplexes. 2−5 In principle, any synthetic polymer equipped with complementary recognition units has the potential to show sequence-selective duplex formation and the associated properties found in nucleic acids. Figure 1a shows a minimalist blueprint for such polymers. A two-letter recognition alphabet would be sufficient to encode sequence information in binary form. Then all that is required is reliable chemistry for the synthesis of oligomers and a compatible backbone to link the components together.
A number of synthetic duplex-forming oligomers have been reported, 6−10 and in some of these systems, it was possible to investigate the effect of changing the sequence of the building blocks. Lehn et al. showed that oligomeric bipyridine and terpyridine ligands form duplexes with complementary metal ions, demonstrating both length and sequence selectivity. 6 Gong et al. have described oligomers that form duplexes due to H-bonding interactions between amide groups located in the backbone. It was possible to control the recognition properties of these systems by changing both the sequence and the geometrical spacing of H-bond donors and acceptors along the chain. 7 Yashima et al. have demonstrated sequence-selective duplex formation between oligomers equipped with carboxylate and amidinium recognition units that form salt bridges. 8 We have been investigating a range of different duplex-forming oligomer systems based on the blueprint in Figure 1. 11 The most promising system that we have characterized to date is shown in Figure 2. Strong H-bonding interactions between the phenol and pyridine N-oxide recognition units give rise to stable duplexes in toluene solution. For duplexes formed between homo-oligomers, the stability increases by an order of magnitude for every additional recognition unit in the chain, which is indicative of cooperative H-bond formation along the duplex. The X-ray crystal structure of the duplex formed by the self-complementary AD 2-mer is shown in Figure 2b. 11e The recognition units are too far apart in the duplex for the long-range secondary electrostatic interactions that are observed in other H-bonded arrays to be important in this system. 12 The solution phase self-assembly properties of the AD 2-mer also show that there is no intramolecular 1,2-folding between adjacent H-bond donors and acceptors in the monomeric free state. This system is therefore ideally suited for a more detailed investigation of the selectivity of duplex formation for longer mixed sequence oligomers.
Figure 1. Blueprint for assembly of a polymer that forms a duplex with sequence selectivity based on a two-letter recognition alphabet. The key design components are the covalent chemistry used for synthesis (red), the noncovalent chemistry used for recognition (blue), and the backbone linker that determines the geometric complementarity of the two chains (black).
The simplest systems for which the sequence selectivity of duplex formation can be studied are the mixed sequence 3-mers. Figure 3 shows the structures of all possible 3-mer sequences of the system shown in Figure 2. In this paper, we describe the synthesis of these eight compounds and measurement of the pairwise binding affinities in toluene. The results allow quantification of the fidelity of the single H-bond recognition system and provide insights into competing processes that could be targeted to improve the sequence selectivity of duplex formation.

Journal of the American Chemical Society
Article
as the triisopropylsilyl ether, and the protecting groups were removed using tetra-n-butylammonium fluoride (TBAF) in the final step of the synthesis. In some cases, the acetal protecting group on the end of the 3-mer was removed during workup, so these compounds were isolated as the aldehyde as indicated in Scheme 1. The H-bond acceptor properties of aldehydes and acetals are both poor compared with pyridine N-oxide, so the presence of different terminal groups should not significantly affect the assembly properties of the oligomers.
The use of amino−aldehyde monomer units confers directionality on the oligomer backbone, so we will describe the sequence of recognition units in the direction of synthesis, starting from the amino-terminal end (the nitrobenzyl group) to the aldehyde-terminal end (acetal or aldehyde group). For example, the oligomers described as ADD and DDA in Scheme 1 differ in the orientation of the backbone with respect to the sequence of the recognition units.
NMR Titrations. Interactions between all pairwise combinations of the 3-mer sequences were investigated by ¹H NMR titration and dilution experiments in toluene-d₈. The titration data all fit well to 1:1 binding isotherms, and the dilution data fit well to dimerization isotherms. The resulting association constants are reported in Table 1. The stabilities of the complexes span 3 orders of magnitude. The most stable complex is the sequence-complementary AAA·DDD duplex, which has an association constant of 10⁵ M⁻¹, but some of the other sequence-complementary complexes are significantly less stable.
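To give a feel for what association constants between 10² and 10⁵ M⁻¹ mean in practice, the following minimal sketch evaluates the standard closed-form 1:1 binding and dimerization isotherms. It is not the authors' fitting procedure. The function names and the 1 mM concentration are assumptions made for illustration only.

```python
import math

def complex_1to1(k_a, host_total, guest_total):
    """[HG] in M for H + G <=> HG with K_a = [HG] / ([H][G])."""
    b = host_total + guest_total + 1.0 / k_a
    # Physically meaningful root of the mass-balance quadratic.
    return (b - math.sqrt(b * b - 4.0 * host_total * guest_total)) / 2.0

def dimer(k_dim, total):
    """[A2] in M for 2 A <=> A2 with K_dim = [A2] / [A]^2 and total = [A] + 2 [A2]."""
    monomer = (-1.0 + math.sqrt(1.0 + 8.0 * k_dim * total)) / (4.0 * k_dim)
    return k_dim * monomer * monomer

HOST_TOTAL = 1e-3  # M; illustrative concentration (assumption, not from the paper)
for k_a in (1e2, 1e3, 1e4, 1e5):
    bound = complex_1to1(k_a, HOST_TOTAL, HOST_TOTAL) / HOST_TOTAL
    print(f"K_a = {k_a:.0e} M^-1: fraction of host bound at 1 equiv guest = {bound:.2f}")
```

In a real titration the association constant would be obtained by fitting the observed chemical shift, taken as a population-weighted average of free and bound shifts, to such an isotherm.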
For each sequence-complementary combination of recognition units, up to four different duplexes are possible due to the directionality of the backbone. For example, Figure 4a shows the structures of four different duplexes that have the same arrangement of H-bonded recognition units but different backbone directions. Using the N-to-C terminal description of sequence, these four duplexes are designated DDA·AAD, DDA·DAA, ADD·AAD, and ADD·DAA. In these systems, the structure of the duplex is dictated by the sequence of the recognition units, so it should be possible to distinguish parallel (DDA·AAD and ADD·DAA) and antiparallel (DDA·DAA and ADD·AAD) arrangements of the backbone. If the sequence of recognition units is symmetric, then it is possible for parallel and antiparallel duplexes to coexist in equilibrium. For example, for the AAA·DDD duplex both parallel and antiparallel directions of the backbone are compatible with the arrangement of the recognition units (Figure 4b). To simplify the discussion, we will start by considering only the arrangement of the recognition units, but we will return to the directionality of the backbone later.
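The parallel and antiparallel designations can be made concrete with a short sketch. It assumes only the rule stated in the text: sequences are written from the amino-terminal to the aldehyde-terminal end, and A pairs with D. The helper names are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "D", "D": "A"}

def complement(seq):
    """Swap every donor for an acceptor and vice versa, keeping the order."""
    return "".join(COMPLEMENT[unit] for unit in seq)

def classify_pair(seq1, seq2):
    """Backbone arrangement(s) in which seq1 and seq2 are fully complementary."""
    parallel = seq2 == complement(seq1)
    antiparallel = seq2 == complement(seq1)[::-1]
    if parallel and antiparallel:
        return "both"
    if parallel:
        return "parallel"
    if antiparallel:
        return "antiparallel"
    return "mismatch"

# All 3-mer sequences over the two-letter alphabet.
ALL_3MERS = ["".join(units) for units in product("AD", repeat=3)]

for s1, s2 in [("DDA", "AAD"), ("DDA", "DAA"), ("ADD", "AAD"), ("ADD", "DAA"), ("AAA", "DDD")]:
    print(f"{s1}·{s2}: {classify_pair(s1, s2)}")
```

The classification considers only full sequence complementarity; it says nothing about the stability of mismatched pairs.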
Single-Site Mismatch Analysis. Figure 5 shows an analysis of the data in Table 1 comparing the stabilities of duplexes formed by complementary sequences of recognition units with the stabilities of the corresponding duplexes containing a single mismatch. Where different arrangements of the backbone are possible, the results for all backbone arrangements are plotted side by side in the same bar of the chart. For example, the association constants for the four duplexes illustrated in Figure 4a are shown as four different values that make up the first sequence-complementary entry in Figure 5c. The data in Table 1 can therefore be analyzed in terms of three sequence-complementary trimer duplexes: the homo-oligomer duplex, AAA·DDD, the alternating oligomer duplex, ADA·DAD, and the four duplexes shown in Figure 4a. If a single recognition unit is modified in a sequence-complementary 3-mer duplex, then a total of three A → D and three D → A mutations are possible. For symmetric sequences of recognition units, some of the mutations are equivalent, and in these cases, the data appear twice in Figure 5. Figure 5a shows that AAA·DDD is the most stable duplex and that mutation of any of the recognition units leads to a decrease in stability of an order of magnitude. The stability of the ADA·DAD duplex is 4 times lower than that of AAA·DDD (Figure 5b). Again the sequence-complementary duplex is the most stable complex for this system, but some of the mismatch sequences are surprisingly stable. For example, the mismatched duplex involving DDD is only 2-fold lower in stability than the sequence-matched duplex. Figure 5c shows that for the third type of duplex the stabilities of the sequence-complementary complexes span an order of magnitude, and they are 10−100 times less stable than AAA·DDD. Moreover, the sequence-matched duplexes are not the most stable complexes in Figure 5c, and the mismatched duplexes involving DDD are more stable.
Stabilization of D-Rich Complexes. Closer examination of Figure 5 reveals some interesting patterns. In general, the A → D mutations give complexes that are more stable than the D → A mutations. For example, in Figure 5a, the D → A mutants have stabilities of (2−8) × 10³ M⁻¹, whereas the A → D mutants have stabilities of (1−2) × 10⁴ M⁻¹. These values can be compared with the stabilities of the corresponding 2-mer duplexes, AA·DD and AD·AD, where only two H-bonds are made ((2−5) × 10³ M⁻¹). The association constants for formation of the 2-mer duplexes are comparable to the values for the D → A mismatch 3-mer complexes and significantly lower than the values for the A → D mismatch 3-mer complexes, suggesting that A → D mutations introduce additional stabilizing interactions. There is a fundamental difference between the D → A and A → D mutations: phenol has one H-bond donor site and so can only interact with one H-bond acceptor; in contrast, pyridine N-oxide can accept more than one H-bond from multiple donors. Thus, a D → A mutation removes all possibility of forming a H-bonding interaction, because there are no additional H-bond donor sites in the oligomers that could interact with the new mismatch pyridine N-oxide acceptor. However, when an A → D mutation is made, the two unpaired phenols that do not have complementary pyridine N-oxide partners to interact with can form additional interactions with pyridine N-oxides that are already H-bonded to complementary phenols.
Molecular mechanics calculations on the structures of the duplexes support this hypothesis. Figure 6 shows the lowest energy conformations of three different duplexes: AAD·DDA, ADA·DDD, and AAA·DAD. The sequence-complementary 3-mers form a duplex with three H-bonds as expected (Figure 6a). In the mismatch duplex that has an excess of H-bond donor recognition units (Figure 6b), one of the pyridine N-oxide acceptors is H-bonded to one phenol donor, but the other pyridine N-oxide is H-bonded to two phenol donors (this structure also shows an additional phenol−phenol interaction). In the mismatch duplex that has an excess of H-bond acceptor recognition units (Figure 6c), two intermolecular H-bonds are formed as expected, and the unsatisfied acceptor units dangle freely from the side of the duplex.
Figure caption (beginning not recovered): 13 The backbone is shown in gray, the H-bond donor recognition units are in blue, and the H-bond acceptor units are in red. The terminal groups were simplified to methyl and phenyl and are shown as lines for clarity.
The thermodynamic consequences of doubly H-bonded acceptor units can be tested directly by measuring the interaction of a simple pyridine N-oxide monomer with the DD 2-mer. ¹H NMR titrations of p-cresol (D) or DD into 4-methylpyridine N-oxide (A) were carried out in toluene-d₈. Performing the titrations in this way ensures that the concentration of A is too low for the 2:1 A₂·DD complex to be formed. The titration data fit well to 1:1 binding isotherms in both cases, and the resulting association constants were (3.3 ± 0.8) × 10² M⁻¹ for the A·D complex and (1.7 ± 0.2) × 10³ M⁻¹ for the A·DD complex. The larger association constant observed for the compound with two H-bond donor sites suggests that interactions of the type illustrated in Figure 6b stabilize 3-mer duplexes with A → D mutations.
The observed equilibrium constant for the formation of the 1:1 A·DD complex is given by eq 1.
where K_1 and K_2 are the stepwise equilibrium constants illustrated in Figure 7.
Rearranging eq 1 gives eq 2, which allows estimation of the value of K_2, the equilibrium constant for formation of a second intramolecular H-bond to a H-bonded pyridine N-oxide, assuming that the value of K_1 is 2K_A·D. The statistical factor of 2 accounts for the degeneracy of the singly H-bonded complex.
The value of K_2 for the system shown in Figure 7 is 3, which means that the doubly H-bonded complex represents 75% of the bound state. The presence of the second H-bond donor in the DD·A complex increases the observed association constant by a factor of 5 compared with the D·A complex, where only one H-bond can be formed. This value represents an upper limit on the increase in association constant that is expected due to formation of a second intramolecular H-bond to a bound pyridine N-oxide in the 3-mer mismatch duplexes, because the overall geometry of the duplex is likely to restrict the possible arrangements of the recognition units. However, stabilization by a factor of 5 is consistent with the higher association constants observed for D-rich mismatch complexes in Figure 5.
Intramolecular Folding. The analysis above indicates the complexes with D → A mutations are not perturbed by additional interactions involving unsatisfied recognition units. However, there is some variation in the stabilities of the complexes with D → A mutations. In both Figure 5a and b, making D → A mutations at the chain ends leads to complexes that are significantly less stable than mutating the recognition unit in the center. The common feature of the less stable complexes is that they contain oligomers that have a H-bond donor at one end of the chain and a H-bond acceptor at the other. Such sequences could fold via intramolecular H-bonding interactions between the terminal recognition units, and folding would compete with duplex formation. This observation would also account for the exceptionally low stability of the sequence-complementary duplexes in Figure 5c, because for these systems, both oligomers can fold in the unbound state (Figure 8).
The potential of the oligomers to fold was investigated using molecular mechanics calculations. Figure 9 shows an overlay of the lowest energy conformation found for each of the oligomers AAD, ADD, DAA, and DDA. In all four cases, there is an intramolecular H-bond between the terminal recognition units, and the backbones adopt very similar conformations in order to achieve this interaction. Thus, there appears to be a well-defined conformation of the backbone that places the recognition units in an arrangement that allows intramolecular H-bonding in a 1,3-folded state.
It is possible to estimate the extent of folding experimentally by comparing the stabilities of complexes involving oligomers that can and cannot fold. We have shown previously that 1,2-folding between neighboring recognition units does not compete with duplex formation in AD 2-mers, and so we assume that none of the 3-mers discussed here suffer from 1,2-folding. For the AAA, DDD, ADA, and DAD sequences, intramolecular 1,3-folding is not possible, so folding equilibria can only compete for duplex formation for complexes involving AAD, ADD, DDA, and DAA. Complexes with A → D mutations are complicated by additional H-bonding interactions as discussed above, so we will consider only complexes with D → A mutations. A direct comparison can be made between AAA·DAD, where 1,3-folding is not possible, and AAA·DDA, AAA·ADD, ADA·DAA, and ADA·AAD, where one of the two binding partners can form an intramolecular H-bond between the terminal recognition units. The four duplexes that compete with 1,3-folding equilibria have very similar association constants (1.6 × 10³, 1.7 × 10³, 2.6 × 10³, and 1.0 × 10³ M⁻¹), and these values are on average 5 times lower than the association constant for AAA·DAD (8.4 × 10³ M⁻¹), where there are no competing folding equilibria.
Figure 7. Pyridine N-oxide can accept two H-bonds, leading to enhanced stability in complexes with an excess of H-bond donors. The stepwise equilibrium constants for formation of the doubly H-bonded complex between DD and A are K_1 and K_2. The global minimum conformation of the 1:1 complex obtained from a molecular mechanics conformational search is shown (right). 13 The backbone is shown in gray, the H-bond donor recognition units are in blue, and the H-bond acceptor unit is in red.
For duplexes where one of the two binding partners folds, the observed association constant, K_obs, is given by eq 3.
$$K_{\mathrm{obs}} = \frac{K_{\mathrm{duplex}}}{1 + K_{\mathrm{fold}}} \qquad (3)$$
where K_fold is the equilibrium constant for 1,3-folding, and K_duplex is the association constant for formation of the duplex from the unfolded state (Figure 8).
Equation 3 can be rearranged to estimate the value of K_fold using the association constants measured for complexes that compete with one intramolecular folding equilibrium (K_obs) and complexes that do not (K_duplex) (eq 4).
$$K_{\mathrm{fold}} = \frac{K_{\mathrm{duplex}}}{K_{\mathrm{obs}}} - 1 \qquad (4)$$
The analysis of the D → A mismatch complexes described above therefore indicates that K_fold for 1,3-folding is approximately 4 for oligomers with complementary terminal recognition units. The folded state therefore represents 80% of the monomeric unbound state for these oligomers, with 20% in the unfolded state. For complexes where both binding partners fold, the observed association constant for duplex formation is given by eq 5.
$$K_{\mathrm{obs}} = \frac{K_{\mathrm{duplex}}}{(1 + K_{\mathrm{fold}})^{2}} \qquad (5)$$
Using K_fold = 4 in eq 5 suggests that for sequence-complementary duplexes where both binding partners can form intramolecular H-bonds between the terminal recognition units, the stability of the duplex will be reduced by a factor of 25 compared with sequences where 1,3-folding is not possible. This estimate accounts rather well for the results shown in Figure 5c: the association constants for formation of the four sequence-complementary duplexes are between 10 and 100 times lower than the association constant for formation of the AAA·DDD duplex. It should be noted that although 1,3-folding competes with duplex formation in these systems, folding does not abolish duplex formation. For example, in a 10 mM sample of a 1:1 mixture of AAD and ADD, the population of duplex is 10 times greater than the population of the 1,3-folded state.
Backbone Arrangement in Duplexes. Although the duplexes illustrated in Figure 4a have the same arrangement of recognition units, the measured association constants span an order of magnitude (see the first sequence-complementary bar in Figure 5c). The major difference between the structures of these duplexes is the parallel and antiparallel arrangement of the backbone. The association constants for formation of the two antiparallel duplexes (4.2 × 10³ and 8.0 × 10³ M⁻¹) are higher than for formation of the two parallel duplexes (1.0 × 10³ and 2.2 × 10³ M⁻¹). These results suggest that on average the antiparallel arrangement of the backbone is preferred by a factor of 4. For systems where the arrangement of the backbone is not dictated by the sequence of the recognition units, a similar preference is expected, i.e., a 20% population with the backbone in a parallel arrangement in equilibrium with 80% in an antiparallel arrangement.
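The arithmetic behind the folding and backbone estimates can be checked with a short sketch that uses only the association constants quoted in the text. It assumes that eq 3 has the form K_obs = K_duplex/(1 + K_fold), with one such factor per folding partner, which is consistent with the factors of 5 and 25 quoted above. The function names are hypothetical, and simple arithmetic means are used because the text refers to average values.

```python
from statistics import mean

# Association constants (M^-1) quoted in the text.
K_NO_FOLD = 8.4e3                          # AAA·DAD, no 1,3-folding possible
K_ONE_FOLD = [1.6e3, 1.7e3, 2.6e3, 1.0e3]  # one binding partner can fold
K_ANTIPARALLEL = [4.2e3, 8.0e3]
K_PARALLEL = [1.0e3, 2.2e3]

def k_fold_from(k_duplex, k_obs):
    """Rearranged eq 3 (assumed form): K_fold = K_duplex / K_obs - 1."""
    return k_duplex / k_obs - 1.0

def folded_fraction(k_fold):
    """Fraction of the free oligomer in the 1,3-folded state."""
    return k_fold / (1.0 + k_fold)

def attenuation(k_fold, n_folding_partners):
    """Factor by which folding lowers K_obs; one (1 + K_fold) term per folding partner."""
    return (1.0 + k_fold) ** n_folding_partners

k_fold = k_fold_from(K_NO_FOLD, mean(K_ONE_FOLD))
preference = mean(K_ANTIPARALLEL) / mean(K_PARALLEL)

print(f"K_fold ~ {k_fold:.1f}, folded fraction ~ {folded_fraction(k_fold):.0%}")
print(f"attenuation for two folding partners (K_fold = 4): {attenuation(4.0, 2):.0f}")
print(f"antiparallel preference ~ {preference:.1f}, "
      f"antiparallel population ~ {preference / (1.0 + preference):.0%}")
```

The unrounded values (K_fold slightly below 4, a preference slightly below 4) round to the figures given in the text.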
Sequence Selectivity in Mixtures. In order to assess the sequence selectivity of duplex formation in this system, it is important to define what is meant by selectivity. The selectivity of a recognition event depends on what the competition is. For example, consider formation of the ADD·AAD duplex. If ADD competes with DDD for duplex formation with AAD, then DDD will win because, as illustrated in Figure 5c, the AAD·DDD mismatch complex is more stable than the sequence-complementary duplex. However, if this competition is repeated in the presence of AAA, then the two sequence-complementary duplexes AAA·DDD and ADD·AAD will be formed, because the AAA·DDD duplex is much more stable than any other complex in this system.
The association constants in Table 1 can be used to estimate the speciation of complexes in mixtures of the 3-mers. Figure 10a illustrates the populations of all possible duplexes calculated for an equimolar mixture of all eight 3-mers at a concentration at which all of the compounds are fully bound (>95% bound at 100 mM). 14 There are six complexes for which association constants are not reported in Table 1. However, the titration experiments suggest that these are all weak binding systems, and assigning association constants in the range 10²−10³ M⁻¹ to these complexes has no significant effect on the speciation of the other complexes or the appearance of Figure 10a. The sequences in Figure 10a are organized so that all of the antiparallel sequence-complementary duplexes lie on the diagonal of this plot, and it is clear that these duplexes are the most populated complexes (blue regions). At first sight, this result is counterintuitive, because, as illustrated in Figure 5, some of the mismatch duplexes are more stable than the corresponding sequence-complementary duplexes. However, in a system where all sequences compete for optimal binding partners, the effects that are apparent in the mismatch analysis are damped, and the result is that sequence-complementary duplexes dominate.
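The competition argument can be illustrated with a minimal speciation sketch. It is not the authors' calculation: it considers heterodimers only, solves the mass-balance equations by a damped fixed-point iteration, and uses placeholder association constants. Only the AAA·DDD value of 10⁵ M⁻¹ is taken from the text. The other constants, the 10 mM concentrations, and the function name `speciate` are assumptions chosen to mimic the qualitative ordering described above.

```python
def speciate(totals, k_assoc, tol=1e-12, max_iter=200_000):
    """Free and complex concentrations (M) for pairwise 1:1 heterodimers.

    totals:  {name: total concentration in M}
    k_assoc: {(name_a, name_b): association constant in M^-1}; unlisted pairs do not bind.
    """
    names = list(totals)
    k = {}
    for (a, b), value in k_assoc.items():  # symmetric lookup table
        k[(a, b)] = value
        k[(b, a)] = value
    free = dict(totals)
    for _ in range(max_iter):
        new_free = {}
        for a in names:
            load = sum(k.get((a, b), 0.0) * free[b] for b in names if b != a)
            target = totals[a] / (1.0 + load)        # mass balance for species a
            new_free[a] = (free[a] * target) ** 0.5  # geometric-mean damping
        change = max((abs(new_free[a] - free[a]) / free[a]
                      for a in names if free[a] > 0), default=0.0)
        free = new_free
        if change < tol:
            break
    complexes = {f"{a}·{b}": value * free[a] * free[b]
                 for (a, b), value in k_assoc.items()
                 if a in free and b in free and a != b}
    return free, complexes

# Placeholder constants (M^-1); only AAA·DDD = 1e5 comes from the text.
K = {("AAA", "DDD"): 1e5, ("AAD", "ADD"): 8.0e3,
     ("AAD", "DDD"): 2.0e4, ("AAA", "ADD"): 1.7e3}

without_aaa = speciate({"AAD": 1e-2, "ADD": 1e-2, "DDD": 1e-2}, K)[1]
with_aaa = speciate({"AAA": 1e-2, "AAD": 1e-2, "ADD": 1e-2, "DDD": 1e-2}, K)[1]
for label, result in (("without AAA", without_aaa), ("with AAA", with_aaa)):
    print(label, {pair: f"{conc * 1e3:.2f} mM" for pair, conc in result.items()})
```

With these placeholder values the mismatch AAD·DDD complex is the more populated AAD complex in the three-component mixture, and the matched AAD·ADD duplex takes over once AAA is present to sequester DDD, which is the behavior described in the text.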
The noncomplementary duplexes that compete most effectively with sequence-complementary duplex formation are ADA·DDA and DAD·DAA. These two duplexes correspond to the four off-diagonal peaks in Figure 10a (each duplex appears twice due to symmetry). The populations of the corresponding sequence-complementary duplexes (ADA·DAD and DDA·DAA, which each appear twice on the diagonal in Figure 10) are somewhat reduced by population of the two mismatch duplexes. The two mismatch complexes are both less stable than the sequence-complementary ADA·DAD duplex, but they are both slightly more stable than the sequence-complementary DDA·DAA duplex. The appearance of mismatch duplexes is therefore the result of competition between all of the different possible complexes in the system.
The off-diagonal mismatch peaks in Figure 10a suggest that longer mixed sequence oligomers are unlikely to exhibit high-fidelity sequence recognition. However, the identity of the mismatch duplexes provides a clue as to how this fidelity might be improved. Although the ADA·DAD duplex is relatively stable, these two 3-mer sequences participate in the most significant mismatch duplexes. The speciation in a mixture of the other six 3-mers that does not contain ADA or DAD is illustrated in Figure 10b. In this case, the sequence selectivity is excellent, with a high degree of discrimination between matched and mismatched duplexes. This result provides an important strategy for enhancing the fidelity of sequence recognition in longer oligomers. If alternating donor−acceptor sequences are avoided, then mismatch duplexes will be suppressed.

■ CONCLUSIONS
Selective recognition between oligomers programmed with information encoded in the form of a sequence of recognition sites is the basis for the unique chemical properties of nucleic acids. We describe a synthetic oligomer system that recapitulates some of these properties. Oligomers equipped with a sequence of phenols (H-bond donors, D) and pyridine N-oxides (H-bond acceptors, A) show sequence-selective duplex formation due to H-bonding interactions between complementary recognition sites. This paper describes the synthesis of all eight 3-mer sequences and measurement of the pairwise binding affinities of the oligomers in toluene. The stabilities of the complexes vary by 3 orders of magnitude depending on sequence complementarity. There are three factors that govern the overall stabilities of the complexes in addition to the number of complementary H-bonding interactions.
1. Backbone Orientation. For the oligomer sequences AAD, DAA, ADD, and DDA, it is possible to characterize the relative stabilities of duplexes that have parallel and antiparallel backbones, because the orientation of the backbones is dictated by the sequence of recognition units. These systems show that the antiparallel arrangement of the backbones is more stable than the parallel arrangement by a factor of 4. The other duplexes presumably exist as an 80:20 mixture of the antiparallel and parallel backbone arrangements.
2. Doubly H-Bonded Acceptors. A single site mismatch analysis reveals that an A → D mutation leads to unexpectedly stable complexes, because the pyridine N-oxide recognition units can accept a second H-bond from an unpaired phenol recognition unit. These additional H-bonding interactions can stabilize D-rich mismatch complexes by up to a factor of 5.
3. 1,3-Folding. We have previously shown that 1,2-folding of adjacent complementary recognition sites does not take place in this system. However, for 3-mers that have a H-bond donor at one end of the oligomer and a H-bond acceptor at the other, 1,3-folding is significant in the monomeric free state. Folding equilibria compete with duplex formation and reduce the stability of the corresponding duplex by a factor of 5 for each binding partner that can fold.
The latter two factors conspire to make the stabilities of some of the mismatch complexes greater than the stabilities of some of the sequence-complementary duplexes. However, the measured association constants show that in a mixture of all eight 3-mer sequences the sequence-complementary duplexes are the predominant species present in solution. The most problematic sequence from the point of view of mismatched duplex formation is DDD, which competes effectively with the fully matched sequence in a number of cases. However, DDD has a much higher affinity for AAA than for any other sequence, and so, in the presence of one equivalent of AAA, DDD will not form a mismatched duplex with other sequences. Thus, the fidelity of the recognition system in a complex mixture is higher than might be expected by comparing the stabilities of individual duplexes. Moreover, if alternating donor−acceptor sequences are avoided, it is possible to show that competition from mismatched duplexes can be almost completely eliminated. It should therefore be possible to extend these studies to longer oligomers to obtain high-fidelity sequence recognition.
The issue of doubly H-bonded acceptors can be addressed in a straightforward manner by replacing the pyridine N-oxide recognition units with pyridines, which can only form a single H-bond with phenol. We have shown previously that although the pyridine−phenol H-bond is weaker than the pyridine N-oxide−phenol interaction, the increased conformational restriction imposed by the oriented pyridine nitrogen lone pair compensates to yield stable duplexes. 11c The issue of folding equilibria is more difficult. Folding will always compete with duplex formation in synthetic information molecules of this type, because the oligomers carry mutually complementary recognition units. However, the properties of nucleic acids show that, for long oligomers, sequence complementarity can be used to ensure that duplex formation predominates or that the absence of a complementary partner can be used to ensure that intramolecular folding predominates. The same should be true of the systems described here, and this duality of behavior offers interesting avenues for future research.
There are some differences between the synthetic H-bonded duplexes and nucleic acid duplexes that may lead to differences in behavior. In nucleic acids, formation of the first base pair is thermodynamically unfavorable, which leads to a nucleation and growth mechanism of duplex assembly, whereas formation of the first base pair in the synthetic duplexes is thermodynamically favorable. Nucleic acid duplexes form compact structures that promote cooperativity and selectivity, whereas the synthetic duplexes are less organized. However, the well-defined assembly properties of nucleic acids are not apparent for short oligomers and only emerge for sequences several bases long. Work on longer synthetic oligomers will reveal whether more organized structures emerge for larger molecules, how the fidelity of sequence recognition is affected, and the impact on the kinetics of strand exchange.