Nearest neighbor rules for RNA helix folding thermodynamics: improved end effects

Abstract Nearest neighbor parameters for estimating the folding stability of RNA secondary structures are in widespread use. For helices, current parameters penalize terminal AU base pairs relative to terminal GC base pairs. We curated an expanded database of helix stabilities determined by optical melting experiments. Analysis of the updated database shows that terminal penalties depend on the sequence identity of the adjacent penultimate base pair. New nearest neighbor parameters that include this additional sequence dependence accurately predict the measured values of 271 helices in an updated database with a correlation coefficient of 0.982. This refined understanding of helix ends facilitates fitting terms for base pair stacks with GU pairs. Prior parameter sets treated 5′GGUC3′ paired to 3′CUGG5′ separately from other 5′GU3′/3′UG5′ stacks. The improved understanding of helix end stability, however, makes the separate treatment unnecessary. Introduction of the additional terms was tested with three optical melting experiments. The average absolute difference between measured and predicted free energy changes at 37°C for these three duplexes containing terminal adjacent AU and GU pairs improved from 1.38 to 0.27 kcal/mol. This confirms the need for the additional sequence dependence in the model.


INTRODUCTION
Over 80% of the human genome is transcribed into RNA, but <3% of the RNA codes for proteins (1,2). Functions for most RNA in the biosphere are still being discovered but already include catalysis (3), control of transcription, translation and expression (4)(5)(6), templating for synthesis of DNA (7) and RNA (8), recognition of sites for modification and editing (9)(10)(11) and sometimes combining such functions (12). RNA is the genomic material for many viruses, including human pathogens such as SARS and SARS-CoV-2, influenza, HIV, Ebola and Hepatitis C. RNA can also be the basis for vaccines against some of these viruses. For example, mRNA vaccines are effective against SARS-CoV-2 infections (13).
RNA sequence determines the base pairing and 3D structure as well as function of the RNA. Prediction of secondary structure, i.e. the canonical set of Watson-Crick-Franklin (WCF) and GU base pairs, from sequence is a first step in predicting 3D structure (14) and in finding RNAs with common structures and functions (15,16). Some RNA, such as riboswitches, have more than one structure, and the ability to change structure is critical to function (5).
Secondary structure can be predicted from one or more sequences by minimizing free energy change for folding, G • , often augmented with information from chemical mapping and/or sequence comparison. Usually, about half the nucleotides in structured non-coding RNA are canonically paired (17,18). GU pairs play important roles in RNA structure and function as sites for binding metal ions (19,20), therapeutics (21), proteins or metabolites (22).
A database of thermodynamic measurements for helices with canonical pairs and model non-canonical motifs forms the foundation for folding free energy predictions of RNA structure. These data are then fit to a nearest neighbor (NN) model to estimate parameters that can be used to predict folding stabilities of any RNA secondary structure (23).
Hallmarks of the NN model are that each stability increment depends on local sequence and that total stability is the sum of the increments.
The model and parameters for approximating stabilities of WCF base-paired helixes have not changed substantially since 1998 (24,25). Individual parameters for nearest neighbors containing at least one GU pair, however, were revised on the basis of new data (25). In that revision, the penalty of 0.45 kcal/mol applied to terminal AU pairs and previously assumed for terminal GU pairs (24), was found unnecessary for GU pairs. Expansion of the database for duplexes, particularly those with terminal GU pairs (26) and the data presented here, make possible more extensive considerations of terminal effects on base pair stability. In particular, the data allow expansion of the model to include six new parameters specific for terminal nearest neighbors, i.e. sequence-specific terms for the ends of helices that account for the last and penultimate base pairs. Surprisingly, stabilities of terminal GU and AU pairs depend on whether the neighboring pair is a GC, AU or GU pair. Incorporating this effect in the NN model also produces significant revision of parameters for internal 5 GU 3 UG and 5 AG 3 UU nearest neighbors, where stacks are shown for a top strand in the 5 to 3 direction pairing to a bottom strand in the opposite direction. With these changes, 5 GGUC 3 CUGG fits the NN model rather than being an outlier as considered previously (27).
Thus, the new model presented here will be especially important for predicting structures containing GU pairs. It is not surprising that GU pairs are more idiosyncratic than WCF pairs. Guanine has more hydrogen bonding groups and a larger dipole moment than other bases (28). Base stacking and hydrogen bonding that stabilize GU pairs can vary depending on local context, including position in a helix. Base stacking depends on interactions with both bases of a nearest neighbor. GU pairs can adopt different hydrogen bonded configurations and stacking interactions ( Figure 1) (22). In Figure 1A, the terminal GU pair in the foreground is in a 5 UG 3 GU nearest neighbor and has a single hydrogen bond while the penultimate UG pair has two hydrogen bonds. The conformation of the terminal GU pair may be influenced by solvent interactions or crystal contacts through stacking interactions with the terminal GU pair of an adjacent molecule. In Figure 1B Figure 1C), each GU pair has a single bifurcated hydrogen bond, and there is no cross-strand stacking. Reported thermodynamic stabilities for the internal NN GU stacks in Figure 1B  terminal consecutive GU pairs, i.e. a GU end on a GU pair, like those shown in Figure 1A. In contrast, previous models did not add favorable folding free energy for this sequence motif. In many NMR structures with terminal GU and AU pairs, these pairs show more dynamic behavior relative to internal pairs. This is observed as broad imino proton resonances and fewer NOE restraints (26,(29)(30)(31)(32)(33). Thus, sequence orientation, stacking interactions, hydrogen bonding and nucleotide dynamics are important factors in the structure and stabilities of GU pairs. They are now more accurately accounted for in the new thermodynamic parameter set. This should improve predictions of secondary structure from sequence.

Optical melting experiment database
For this analysis, optical melting experiments were compiled through an extensive literature review (25,26,29,. Enumeration of all melting experiments included in Nucleic Acids Research, 2022, Vol. 50, No. 9 5253 this analysis is available in Supplementary Table S1 and in the spreadsheet provided in the supplementary materials. Experiments are included if only unmodified nucleotides are present and buffer has 1 M Na + with pH between 6.5 and 7.5. Additionally, duplexes that were reported by the original authors to have non-2-state unfolding transitions were excluded from this analysis (43). A total of 223 experiments were included in the fits, with 125 for Watson-Crick-Franklin pair parameters and 98 for GU pair parameters.
Most melting experiments for generating NN parameters have used 1 M Na + . This was initially chosen to assure formation of duplexes rather than hairpins, as also seen for deoxy A-T oligonucleotides (63) and to allow measurements of the concentration dependence of duplexes that could only be synthesized with many AU pairs (64,65). The high melting temperatures dependent on 1 M NaCl became more important when comparisons with calorimetry revealed that interpretation of optical melting improved if sloping upper and lower baselines were considered (66).
The most important result from thermodynamic studies is the relative sequence dependence of nearest neighbor stability. This is expected not to depend on salt conditions because there is no site binding of Na + to fully base paired RNA (67)(68)(69)(70)(71)(72). Local concentrations of mobile cations around large folded RNAs, however, depend on the local charge density of phosphate groups. Manning developed a first order cation condensation model that predicts local concentrations of cations around RNA do not depend on bulk concentrations (73). For A-form double helical and single strand RNA, respectively, the local 'ion atmosphere' in the absence of multiple charged cations is predicted to have 1.7 M and 0.4 M of M + ions (74). They respectively neutralize 0.8 and 0.6 of the phosphate charge. In the absence of M + cations, M 2+ cations are predicted to neutralize 0.9 and 0.8, respectively, of backbone charge. More detailed computations and experiments agree qualitatively with expectations from Manning theory (29,52,58,67,68,70,71,(75)(76)(77). GU pairs can be the sites of metal ion binding (20,58,78,79), but optical melting experiments of duplexes with consecutive GU pairs did not find differences in stabilities in 1 M Na + and in 150 mM K + with 10 mM Mg 2+ (26). Together these suggest that, while salt conditions vary between and within cells (80)(81)(82)(83)(84), they are unlikely to affect dramatically the relative stabilities of NN canonical pairs.

Feature correlations
Feature correlations were calculated for each model using the R statistical programming language. The resulting correlation matrices were then visualized with the R corrplot library, available at https://github.com/taiyun/corrplot.

Fitting linear models
Parameter models were fit using measured G • 37 and H • values for each optical melting experiment. For the fit of WCF stacking parameters, the theoretical contribution of RT ln (2) due to 2-fold symmetry of self-complementary duplexes, was subtracted from the experimentally measured duplex G • 37 (24,85). For the fit of GU stacking parameters, the contributions due to sequence symmetry and the WCF stacks from each duplex with any GU base pairs were subtracted from measurements. The calculated G • 37 and H • are then used to fit linear models in the R statistical programming language using the base function lm. S • values for the nearest neighbor parameters are calculated from the G • 37 and H • values. To estimate uncertainty in NN parameter values, a covariation analysis was used to account for the dependencies (due to the nested nature of the regressions) and correlation (due to a base pair appearing in up to two neighboring stacks) between parameters (86,87). To perform covariation analysis, the optical melting data were resampled within experimental error ( H • = 12% H • and S • = 13.5% S • (24)). The resampling was performed with the mvrnorm function from the R MASS library (88), which preserves the observed correlation between H • and S • ( = 0.9996 (24)). The updated experimental values are then used to recalculate multiple sets of model parameters. The sets of model parameters are then used to calculate average values for each parameter as well as covariation (86,87). The standard errors of regression, which neglect the correlations and the effect of nested regressions, for the NN parameters can be found in Supplementary Table S2.

Leave-one-out analysis
To assess the impact of any one experimental value on the fit models, models were fit in which each experimental value was individually excluded from the fitting data. The root mean square deviations (RMSDs) in parameter values were calculated from the model fit to the full data set to measure the impact of excluding each individual experimental value.

Optical melting experiments to test the revised model
Optical melting experiments were conducted on three additional duplexes, (5 UGUCGAUA) 2 , (5 AUAGCUGU) 2 and (5 AUUCGAGU) 2 , following standard protocols described in (89). Oligonucleotides were purchased from Integrated DNA Technologies including purification with standard desalting procedures and assessment of purity by mass spectrometry. Oligonucleotides were dissolved in milliQ water, and the absorbance at 260 nm at 80 • C was measured. The appropriate amount of oligonucleotide was dried in a speed vac and resuspended in standard melting buffer of 1 M NaCl, 20 mM sodium cacodylate, pH 7, and 0.5 mM Na 2 EDTA. Optical melting experiments were conducted in a Beckman DU800 UV-Vis spectrometer with a custom sample holder and cuvettes at 0.1 cm and 1.0 cm path lengths. Absorbance vs. temperature was measured at 280 nm. Data was analyzed with Meltwin software (52).

Stacking term counts
An archive of RNA sequences of known secondary structure (18,90) was analyzed to count the number of occurrences of each NN stacking parameter. A Python script was used to parse each structure into individual helices and then to parse each helix into component NN stacking and helix end parameters.

AU end parameters depend on penultimate pair
Prior work demonstrated that multiple GU terminal base pairs impact the stability of helical duplexes (26), and this motivated a reexamination of the treatment of helix ends. New terms to account for the end of a helix were introduced into the NN model. This was done by including a parameter for an AU terminal pair on an AU penultimate pair (not accounting for orientation of the two pairs and therefore applying to 5  Comparisons between the updated model and those used in the 1998 and 2004 NN models are in Table 1A.  Figure S1). The correlations between the model feature frequencies are modest and are mostly limited to expected correlations between stacks that can extend on each other (Supplemental Figure S2). The predicted folding G • 37 were within 0.5 kcal/mol of the measured value for 86.4% of the experiments (Supplemental Figure S3). Predicted H • are within 5 kcal/mol of the measured value for 76% of the experiments (Supplemental Figure S3).
The impact of each optical melting experiment was determined by fitting the NN parameters on a data set that excluded that individual experiment and comparing the resulting parameter values to those fit on the full data set. The root mean squared deviations (RMSDs) in G • 37 and H • parameter values for these leave-one-out (LOO) data sets can be seen in Supplemental Figure S4

GU stacking parameters
A similar model for terminal AU and GU stacks was used when fitting duplexes with GU base pairs. The model re- Experimental ΔG° (kcal/mol) Predicted ΔG° (kcal/mol) Figure 3. Correlation between predicted and observed G • 37 for duplexes with WCF and GU pairs. G • 37 values predicted from parameters in Table  1 (Figure 3 and Supplemental Figure S5) Figure 3) and H • (R 2 = 0.7659, Supplemental Figure S6). Correlations between model feature frequencies are mostly limited to expected correlations between the GU on GU end feature and the three possible stacks that can form that end (Supplemental Figure S7). For folding G • 37 , 53.1% of experiments had predicted values within 0.5 kcal/mol of the measured value (85.7% were within 1 kcal/mol) (Supplemental Figure S8). For H • , 57.1% of experiments had predictions within 5 kcal/mol of the measured value (79.6% were within 10 kcal/mol) (Supplemental Figure S8).
RMSDs in G • 37 and H • parameter values for the LOO data sets can be seen in Supplemental Figure S9 Uncertainty in parameter values for the updated NN model presented in Tables 1A and 1B were determined from a covariation analysis, which randomly perturbed the experimental values within experimental uncertainty and calculated the covariance matrix from observed changes in parameter values. We previously found that this approach is important for estimating the uncertainties of folding free energies and enthalpies because the correlated nature of the NN parameters and the use of sequential regressions break the assumptions used in the calculations of standard errors of regression (87). Covariances between parameters are presented in Supplemental Figures S10 and S11. Covariances were generally small, with the strongest interactions between the intermolecular initiation parameter and parameters for individual stacks. There are also weaker interactions between GU end terms and equivalent internal GU stacks that can form that end term. For example, the GU on GU end parameter value is negatively correlated with the values for internal 5 GU 3 UG , 5 UG 3 GU , and 5 GG 3 UU stacks.

DISCUSSION
The database of thermodynamic parameters forms the foundation for predictions of RNA structure and function in many widely used software suites (14,(91)(92)(93)(94)(95)(96)(97)(98). These RNA structure prediction programs enable design of mRNA vaccine sequences (13,99,100), analysis of metaproperties of transcriptomic changes in response to stress (101-103), determination of effects of nucleotide modifications on folding stability (104)(105)(106), discovery of accessible regions to target with antisense DNA or siRNA (107)(108)(109)(110), and rational design of small molecules targeting RNA (111)(112)(113)(114). Curation and improvement of the RNA thermodynamic database facilitates hypothesis-driven RNA research in many fields and has significant impact on the RNA community. The progress reported here expands, compiles, and presents the thermodynamic NN parameters for WCF and GU pairs. Statistical significance of the new parameters is robust. Inclusion of helix-end effects for AU and GU pairs improves predictions of helices with these common motifs and resolves previously poorly understood terms for 'special cases' of motifs containing GU pairs. Supplemental Figure S12 shows the improvement in the residuals for the new model compared to the previous.
To illustrate a NN calculation to estimate helix stability, Figure 4 provides two example calculations. The first is the sequence (5 UGUCGAUA) 2 , with experimental stability provided in Table 2. The second is 5 UAGGUCAG paired with 5 CUGGUCUA. This calculation illustrates that the 5 GGUC 3 CUGG motif, an outlier in prior nearest neighbor models, is now handled with nearest neighbor stacks. An Excel spreadsheet is provided with the Supplementary Materials to calculate user-inputted helical NN stabilities. This work presents the next advance in development of a robust NN model for predicting RNA duplex stabilities. The NN model for G • of RNA helixes composed of canonical pairs uses stacks of adjacent base pairs (115)(116)(117). This assumes that the total G • and temperature dependence, H • , for helix formation can be approximated by summing G • s and H • s assigned to nearest neighbors of canonical pairs. The experimental foundation for this approach was laid by Uhlenbeck and Martin in the Doty lab when they used optical melting to measure thermodynamics of duplex formation (64,65). Uhlenbeck and the Tinoco lab used biochemical methods to expand the database of sequences. Because WCF base pairing depends on strong, local hydrogen bonding and stacking interactions, a NN model developed for polynucleotides was tested and found to fit the database (118). This suggested that the NN model would allow predictions for unmeasured sequences (116). In collaboration with related efforts in the Crothers lab, this led to original rules for predicting the thermodynamics of RNA folding (115). Predicted G • 37 are calculated from the previous models described in (25) and (24)  Subsequent insights and research have continually improved success of the NN method. A rotational symmetry term was added to the model to account for the difference between duplexes formed by self-or non-selfcomplementary strands (43,85,119). Application of T4 RNA ligase and development of chemical synthesis on polymer supports allowed expansion of sequences available (42,43,120,121). Particularly important was addition of duplexes not beginning with multiple AU pairs and having melting temperatures near 37 • C, human body temperature. Analysis of the number of parameters allowed by the model (122,123) led to discovery that duplexes with the same nearest neighbors can have different thermodynamics depending on the terminal base pair (24).
Applications of the method were expanded to larger RNAs by modifying dynamic programming algorithms to predict folding that optimized G • (91,124,125) rather than base pairing (92). The thermodynamic approach lends itself to modeling ensembles of structures, including calculations of base pairing probabilities and stochastic samples (98,126,127). Additional applications include predicting structures for multiple interacting strands (128), designing sequences to fold to specific structures (129,130), and integrating mapping or conservation data into structure prediction (131)(132)(133)(134). Recently, applications to even larger RNAs have become possible due to linearization of the algorithms (135,136).
NN parameters for canonical pairs have undergone substantial revisions over time, including treatment of end effects (24,25,27,43,116). GU terminal base pairs were initially assumed to be equivalent to AU terminal base pairs and were given the same penalty term (27). Expansion of the database and refitting of the model indicated that GU terminal base pairs do not require an end penalty (25). Measurements on helices with consecutive terminal GU pairs, however, revealed they are surprisingly more stable than predicted (26). Our updated NN model includes new parameters that account for end effects of both AU and GU pairs, including dependence on the penultimate pair.
Context-dependent variation of GU pair conformations ( Figure 1) provides a structural rationale for treating terminal GU pairs differently. A fundamental assumption of a NN model is that strong local interactions dominate the energetic contributions determining conformation and stability for a particular nucleotide sequence. This implies that a stack of two WCF pairs will have the same thermodynamic stability in the middle of a helix as at the end of a helix. This NN approximation is consistent with structures of WCF RNA helices and the regular periodic shape of an RNA double helix. The diversity of GU pair conformations and stabilities, however, introduces variation into WCF paired helices. The unique structures of GU pairs facilitate binding recognition and specificity for metal ions, RNA tertiary interactions, protein interactions and drug binding (22). The challenge is incorporating this functionally important and structurally diverse motif into a NN model.
Prior models attempting to combine GU and WCF pairs into one set of thermodynamic parameters always had a few unexplained exceptions. For example, the motif 5 GU 3 UG   Table 2 to have an experimentally determined G • 37 of -6.10 kcal/mol. This sequence is self-complementary and therefore the symmetry penalty is added. Panel (B) shows the stability calculation for 5 UAGGUCAG paired to 5 CUGGUCUA. This demonstrates the difference in treatment for the (GGUC) 2 motif. For both sequences, calculations are provided for the current parameters derived here and the previous parameters (24,25). The total stability is the sum of the stability increments.
had one NN parameter value with an exception for the motif 5 GGUC 3 CUGG , which had an extra bonus. Analysis of NMR structures and crystal structures of this motif, however, did not indicate a reason for this additional stability (52,53,137,138). In addition, the crystal structure of consecutive terminal GU pairs in (5 GGUGGCUGUU3 ) 2 had three slightly different helical conformations in the asymmetric unit but an overall remarkably A-form like structure that did not reveal a physical explanation for exceptional thermodynamic parameters (30). NMR studies of duplexes with consecutive terminal GU pairs usually showed broad resonances and few or weak NOEs in the final two GU pairs (26,30). Fluorescence and NMR studies have quantified different base pair dynamics in the middle and ends of helices for various types of base pairs (139)(140)(141)(142). Consistent with this, the new NN model has increments for terminal nearest neighbors to distinguish them from internal nearest neighbors. Interestingly, these increments are penalties of +0.22 and +0.44 kcal/mol at 37 • C for an AU end pair on a penultimate AU or CG pair, respectively (Table 1). A similar penalty of +0.45 kcal/mol has previously been attributed to the presence of one fewer hydrogen bond when duplexes with identical nearest neighbors have two terminal AU pairs rather than terminal GC pairs (24). In contrast, incremental bonuses of -0.31 to -0.74 kcal/mol are assigned to terminal nearest neighbors consisting of an AU and a GU pair or two GU pairs. This would be consistent with a S • bonus due to increased base pair dynamics at the ends of helices. For example, equal populations of three conformations at the end of a helix would provide a G • bonus of -RT ln (3) = -0.68 kcal/mol at 37 • C.
The NN approximation is essential for efficient dynamic programming approaches to computing the minimum G • secondary structure for an RNA sequence (91,92). In prior models, the special cases for GU pairs required additional considerations in dynamic programming algorithm computations. In current application of the nearest neighbor parameters (143), a helix end occurs not only at the 5 and 3 ends of an RNA molecule but also at every junction, internal loop, hairpin loop, and mismatch or bulge in an RNA secondary structure. GU helix end pairs occur in accepted RNA secondary structures at a rate of approximately 13 per 1000 bases in the sequences (Supplemental Figure  S13). They occur at a much higher rate in predicted structural ensembles. Thus, consideration of GU pairs and special rules for positional dependence present a frequent step in the computations. The NN parameter model presented here improves predictions for sequences and structures with terminal AU and GU pairs and will also accelerate computation of the minimum free energy structure for any sequence.
For example, several terminal AU and GU motifs occur in the secondary structure for the packaging sequence in HIV-1 RNA (21) and the motif 5 UUUU 3 GAGG binds a novel drug. Each helix in the three-way junction that binds the drug has an AU or GU pair at the end, and the new NN parameters in this work would estimate that the G • 37 for these three helices is at least 0.9 kcal/mol more stable than current predictions.
Another recent example is the SL3 helix that forms between the 5 and 3 ends of SARS-CoV-2. This helix has been identified experimentally (144) and computationally (145). One end of SL3 terminates in a 5 UG 3 GU nearest neighbor. Results in Table 1B assign a G • 37 of -0.38-0.74 = -1.12 kcal/mol to this end, which is more stable than previous predictions.
Free energy predictions from nearest neighbors for RNA secondary structures provide the base line for analysis of the stabilities of RNA interactions with drugs and proteins, and thus provide a foundational resource for RNA struc-Nucleic Acids Research, 2022, Vol. 50, No. 9 5259 ture and function studies. Our future analyses will evaluate the impact of the new NN parameters on the thermodynamic parameters for mismatches, internal loops, bulges, and helix junctions. These loop motifs form many of the recognition sites for proteins, metal ions, and therapeutics.
While the parameters in Tables 1A and 1B provide excellent predictions of the measured duplex stabilities at 37 • C, there is slightly less agreement for duplexes with WCF and GU pairs (Figures 2 and 3). This is not surprising because the database with GU pairs is smaller than that with only WCF pairs. Additionally, GU pairs are more likely to have different structures (Figure 1) (22). One example of this is (GGCGUGCC) 2 , where measured and predicted values for G • 37 are, respectively, -9.72 and -11.24 kcal/mol. On the basis of NMR and nucleobase substitution results (137), and mesoscopic modeling based on melting temperatures (146), the GU pairs in (GGCGUGCC) 2 were determined to have only one H-bond each ( Figure 1C) rather than the usual two. That may explain the overprediction of thermodynamic stability.
There are known limitations to the current parameterization of the nearest neighbor rules. These parameters depend on the two-state fits of melting data and assume that the enthalpy and entropy changes are temperature independent. It has been shown that parameters can be determined without assuming two-state melting by fitting directly to the optical melting data (absorbance as a function of temperature) (147), although much of these data are not currently available for the duplexes studied here. It is also known that enthalpy change and entropy change both depend on temperature (148)(149)(150)(151). The changes have antagonizing effects with respect to free energy change, however, so folding free energy estimates at 37 • C are probably little affected by the temperature dependencies because most strands are designed to have melting temperatures close to 37 • C (149). On the other hand, extrapolation of folding free energies to other temperatures and estimates of melting temperatures are likely to be affected by these temperature dependencies. Future work could develop new nearest neighbor parameters that do not rely on these assumptions.
In summary, the updated NN model is consistent with previous parameters for WCF pairs, includes new parameters accounting for increased base pair dynamics at ends for helices ending in AU or GU pairs, improves predictions for duplexes with terminal AU or GU pairs, and resolves a prior exceptional parameter for a specific GU motif. The model for the NN parameters has low uncertainty in G • and H • and low correlations between parameters. The statistically robust model maintains the physical basis that differences in hydrogen bonding, stacking, and nucleotide dynamics determine the sequence dependence of NN base stacks. The new thermodynamic parameters will help improve RNA structure prediction tools and facilitate discoveries in RNA biology, catalysis, and therapeutics.

SUPPLEMENTARY DATA
Supplementary Data are available at NAR Online.