Context-sensitivity of isosteric substitutions of non-Watson–Crick basepairs in recurrent RNA 3D motifs

Abstract Sequence variation in a widespread, recurrent, structured RNA 3D motif, the Sarcin/Ricin (S/R), was studied to address three related questions: First, how do the stabilities of structured RNA 3D motifs, composed of non-Watson–Crick (non-WC) basepairs, compare to WC-paired helices of similar length and sequence? Second, what are the effects on the stabilities of such motifs of isosteric and non-isosteric base substitutions in the non-WC pairs? And third, is there selection for particular base combinations in non-WC basepairs, depending on the temperature regime to which an organism adapts? A survey of large and small subunit rRNAs from organisms adapted to different temperatures revealed the presence of systematic sequence variations at many non-WC paired sites of S/R motifs. UV melting analysis and enzymatic digestion assays of oligonucleotides containing the motif suggest that more stable motifs tend to be more rigid. We further found that the base substitutions at non-Watson–Crick pairing sites can significantly affect the thermodynamic stabilities of S/R motifs and these effects are highly context specific indicating the importance of base-stacking and base-phosphate interactions on motif stability. This study highlights the significance of non-canonical base pairs and their contributions to modulating the stability and flexibility of RNA molecules.


INTRODUCTION
Compared to double stranded DNA, the interaction patterns of bases in structured RNA molecules are considerably more complex. Atomic resolution structures of large RNAs have revealed that a large fraction of interacting bases do not form Watson-Crick (WC) base pairs. In fact, the standard cis Watson − Crick/Watson-Crick (cWW) AU and GC base pairs only account for ∼70% of all base pairs in structured RNAs including the small and large ribosomal RNA subunits (1)(2)(3). The remaining one-third of all base pairs are non-WC base pairs. The non-WC base pairs occur primarily in hairpin, internal or junction loops. They serve to structure these loops and to expose functional groups, especially those on the Watson-Crick edges, to form specific binding sites for proteins, small molecule ligands, metals, or other RNAs (4).
Previous work on RNA adaptation to different environments has focused on the ratio of GC to AU Watson-Crick base pairs (5) as GC base pairs are generally more stable than AU pairs. These authors found that in ribosomal RNA (rRNA) the relationship between the proportion of GC base combinations at Watson-Crick base paired positions and the optimal growth temperatures is very weak up to ∼60 • C optimal growth temperature. Above this temperature, there is a roughly linear relationship between the percent of GC pairs and the optimal temperature, as GC base pairs generally confer more stability to Watson-Crick helices than AU pairs. The same positive correlation of GC content with optimal growth temperature was found in helices of the RNA component of the signal recognition particle among Archaea (6). In contrast, no such simple relationship was found to exist for genomic DNA; the average GC content varies widely from approximately 30% to more than 60% in bacteria irrespective of the optimal growth temperature (7). Many loop regions are important for the function and overall structure of RNA molecules. Therefore, we suspect that the sequences of loop regions may also adapt to temperature.
In our recent work, we compared the 3D structures of the rRNAs of the distantly related bacteria Escherichia coli (mesophile) and Thermus thermophilus (thermophile) by aligning the structurally conserved base-paired positions in the 5S, 16S and 23S rRNAs (1). We aligned all non-WC as well as Watson-Crick base pairs, and found that (i) the geometric families of aligned base pairs were nearly 100% conserved, and (ii) almost all base substitutions resulted in isosteric or near isosteric base pairs. This result strongly supported the hypothesis that base pair isostericity is fundamental for understanding the rules of sequence adaptation during RNA evolution. Isostericity indicates which base pairs can potentially substitute for each other preserving the 3D structure of the motif. However, nothing is known about how the isosteric base substitutions impact the stabilities of well-structured RNA 3D motifs and, if there is a selection for particular base combinations depending on optimal growth temperature of the organism. These related questions were addressed in this paper.
The energetic values or RNA base pairs have been studied for several decades. Initially, the focus was on the canonical (i.e., G-C, A-U, and G-U) base pairs and later studies were included non-canonical base pairs (8)(9)(10)(11). Particularly, thermodynamic studies of RNA internal loops have taken a 'bottom up' approach which attempts to cover the sequence space of internal loops starting systematically with small loops (1 × 1, 1 × 2, 2 × 2, 1 × 3 and 2 × 3) (12). Such an approach quickly confronts the problem of combinatorial explosion. It is simply not possible to study all possible sequence variants of internal loops of size 4 × 4 or greater without a high throughput methodology.
In this work, we use a different approach. We chose a recurrent, well-structured and relatively large internal loop (5 × 4 nucleotides not counting the closing bases or 6 × 5 with closing bases) for detailed study. We have chosen the sarcin-ricin (S/R) motif as the focus of study of phylogenetic and ecological sequence variation and thermodynamic measurements for these reasons: (i) The motif is recurrent and widespread in structured RNA molecules (13)(14)(15); (ii) It is highly conserved, yet different sequence compositions can give essentially the same 3D structure of S/R motif as it will be shown below; (iii) the S/R motif is modular and autonomous, it occurs independently in different molecules and in different locations in the same molecule (16); (iv) the motif is unique, the G-bulged region of the S/R motif was found to have unusually complex electronic structure arrangement due to the combination of S-turn, stacking and base-phosphate (BPh) interactions which was not observed in similar small RNA internal loops (17) making the S/R motif very difficult target for molecular modeling and simulation force fields.
We began by cataloguing the sequence variations observed in atomic-resolution 3D structures, and then we examined sequence variations of the motif at corresponding positions in aligned sequences. We collected these data to choose sequence variants for thermodynamic characterization. We also studied sequences containing single mutations that are predicted to disrupt the motif, guided by base pair isostericity matrices (18).
Thermodynamic analysis of S/R motif variations found in organisms occupying different temperature niches and present in 3D structures indicates that a single change in a non-WC pair can significantly affect the overall stability of the motif. To verify whether sequence variants that have different thermodynamic stabilities still form the same motif, we probed their solution structures using single-strand specific nucleases T1 and A, double strand specific nuclease V1, as well as, DEPC. We found that isosteric base pair substitutions preserve the overall conformation of the motif, while single base substitutions that disrupt key non-WC base pairs destabilize the entire S/R motif.

RNA oligonucleotides
RNA oligonucleotides used for thermodynamic studies were obtained from IDT DNA (www.idtdna.com) and purified using preparative denaturing polyacrylamide gel electrophoresis (PAGE) or preparative 20 cm × 20 cm thinlayer chromatography (TLC) (n-propanol:ammonium hydroxide:water, 55:35:10). RNA was eluted from gel slices or TLC matrix using a buffer containing 0.3 M NaCl, 10 mM Tris pH 8.0, 0.5 mM EDTA, then ethanol precipitated with 2.5 volume of cold 100% ethanol, rinsed twice with cold 80% ethanol, dried, and dissolved in 30 l of double deionized (dd H 2 O) water. Purified RNAs produced single bands on a 16.5 cm × 20 cm analytical denaturing 20% polyacrylamide gel. For UV-melting measurements, the RNA stock solutions were extensively dialyzed first against 10 mM EDTA, 100 mM NaCl, and 10 mM sodium cacodylate and then against dd H 2 O and dissolved in dd H 2 O for storage. For each melting profile, a small volume of the oligomer stock was dissolved in 0.1 ml of buffer containing 10 mM sodium cacodylate (pH 6.9), 0.5 mM EDTA and either 10 mM MgCl 2 and 100 mM NaCl or just 1.0 M NaCl.
The RNA hairpin molecules for structural probing were prepared by runoff transcription of PCR amplified DNA templates. Synthetic DNA molecules coding for the antisense sequence of the desired RNA were purchased from IDT DNA (www.idtdna.com) and amplified by PCR using primers containing the T7 RNA polymerase promoter. Template strands for T7 transcription were designed using the online program written by Jesse Stombaugh (link: http://rna.bgsu.edu/oldwebsite/rnatodna.html). PCR products were purified using the QiaQuick PCR purification kit (Qiagen Sciences, Maryland 20874). RNA molecules were prepared by in vitro transcription using T7 RNA polymerase (Takara Bio Inc., http://www.takara-bio.com) and purified on denaturing PAGE as describe above. 32 P ATP labeling of RNA oligonucleotides. RNA hairpins, prepared by in vitro transcription, were labeled with gamma 32 P at the 5 terminus. Molecules were first dephosphorylated using Calf Intestinal alkaline phosphatase--10 000 U/ml (CIP, http://www.neb.com) using the manufacture's procedure. The dephosphorylated RNAs were 5 -end labeled with [␥ -32 P] ATP (PerkinElmer Co.) using T4 polynucleotide kinase (New England Biolabs Inc.) according to procedure adopted from Ambion Applied Biosystems INC (http://www.ambion.com/techlib/ misc/RNA5 labeling.html). Labeled RNAs were purified on 20% denaturing PAGE before use.

Thermodynamic experiments
Optical melting experiments. Absorbance versus temperature profiles were measured to determine the melting transitions of the duplex RNA molecules and to characterize the temperature-dependent transitions of the single strands. To form duplexes, complementary single strands were mixed at equimolar concentrations. All melting curves were obtained in buffer containing 10 mM sodium cacodylate buffer (pH 6.9), 0.5 mM EDTA and either 0.1 M NaCl/10 mM MgCl 2 or just 1 M NaCl. Single-stranded extinction coefficients were calculated from the extinction coefficients for dinucleotide monophosphate and nucleosides using the literature parameters (19). Strand concentrations were determined from high-temperature (90 • C) absorbance at 260 nm (20). Absorbances were measured in 1 cm path length quartz cuvettes (Starna Cells, Inc), using a 100 CARY-BIO UV-Visible spectrophotometer (Varian, Inc.) equipped with a thermoelectrically controlled cell holder. Absorbances were obtained as a function of temperature at 260 nm or 280 nm with a heating rate of 1 • C/min. Absorbance readings were taken every 0.5 • C over the range 15-90 • C. At least five melting transitions were obtained for each sample.
The transitions of RNA duplexes melted in presence of 1 M NaCl were reversible for two temperature cycles and no significant changes in T M were observed. However, transitions in the presence of Mg 2+ were not reversible since Mg 2+ catalyzes the hydrolysis of RNA backbone (21).
Data analysis of UV denaturation curve. Thermodynamic parameters ( G 37 • , H • , S • ) for duplex formation of RNA S/R motif constructs were calculated from the melting profiles by curve fitting using the MeltWin (version 3.0) software. This program fits the shape of each curve to the two-state model with sloping base lines given by Equation (1) using a nonlinear least-squares procedure (22).
In Equation (1): A T is the total absorbance, A SS is the absorbance of RNA single strands, A DS is the absorbance of the RNA double strand complex and f is the fraction of total RNA in the duplex form. Assuming the transitions from the duplex to single strands conform to the two-state model, then, the fraction of RNA duplexes can be related to equilibrium dissociation constant K D according to Equation (2): where C T is the total RNA concentration. This allows for the extraction of the thermodynamic parameters assuming linear sloping baselines and temperature independent H • and S • values (22): Correlation coefficients (R 2 ) of ≥0.99 were observed for the fits to the melting curves using Equation (1). Reported H • and S • values and their standard deviations were based on at least five separate experiments for each sample.
To confirm the validity of the two-state model for the duplexes we measured T M as a function of C T for the U2G/C12A variant at four concentrations ranging from 5 to 20 M. We constructed a Van't Hoff plot to calculate thermodynamic parameters of non-self complementary duplexes according to the method of Borer et al. (23): where R is the gas constant, 1.987 cal mol −1 K −1 ; T m is melting temperature of a duplex. All transitions confirm the two-state model, H • values from the two methods generally agree within 10% error (24,25).
The Gibbs free energy change at 37 • C was obtained using the following equation: RNA structure probing Diethylpyrocarbonate modification and cleavage of RNA hairpin bases. RNA chemical modification by a diethylpyrocarbonate (DEPC) was performed under native conditions. Briefly, labeled RNA hairpins (1 l, ∼ 100 000 cpm) were annealed in SC buffer (pH 6.9) in presence of 10 g yeast tRNA. Next, 15 l of DEPC (SigmaAldrich) was added to each RNA sample. Samples were incubated for 30 min at 20 • C with mixing every 10 min. Treated RNA strands were precipitated with 2.5 volume of cold ethanol and dried RNA pellets were resuspended in 20 l anilineacetate buffer (1 M Aniline, 13.9 N glacial acetic acid, pH 4.5). The cleavage reaction was performed by incubating the mixture for 15 min at 60 • C in the dark. Samples were frozen on dry ice and then lyophilized in the SpeedVac. Pellets were resuspended in 50 l of dd H 2 O, frozen, and dried again on the SpeedVac to remove organics. Finally, cleaved RNA products were resuspended in 6 l of denaturing loading dye solution (8 M urea, Tris-HCl pH 7.4, EDTA 1 mM, 0.1% (w/v) bromophenol blue) and analyzed on 8 M urea-15% sequencing PAGE.

Searching for sarcin-ricin motifs in the 3D structure database
Like many other 'loops' the nucleotides of S/R motif are represented as single-stranded in 2D structure. The standard S/R motif is an asymmetrical 5 × 4 internal loop ( Figure 1, left), although some versions comprise two more To examine the sequence variation in the S/R motifs present in atomic resolution 3D structures, we carried out mixed symbolic and geometric searches, using WebFR3D (26). We searched a non-redundant set of annotated RNA 3D structures from the PDB with resolution better than 4Å. (http://rna.bgsu.edu/rna3dhub/nrlist) using geometric discrepancy 0.8Å/nucleotide as the similarity cutoff (27). The non-redundant set contained ribosomal structures from six different organisms. Most notably, the existing data allowed us to compare sequence variation between two related microbial organisms, the mesophile Deinococcus radiodurans and the thermophile T. thermophilus, that have adapted to different thermal environments, as well as between two phylogenetically diverged organisms that have adapted to similar thermal ranges, the mesophiles E. coli and D. radiodurans. Figure 2 shows 45 instances of S/R and S/R-like motifs that have been identified in 15 distinct PDB files (listed in Table 1). Consistent with previous results (18), five recurrent S/R motifs are found in conserved positions in the large ribosomal subunit (LSU) of each of the six organisms and three in the small ribosomal unit (SSU). Throughout this paper we will refer to the S/R motif located in H95 of 23S rRNA from Haloarcula marismortui as the prototype, the sequence of which is indicated by the red rectangle in the upper row of Figure 2A. The H95 motif of bacterial 23S rRNA is sensitive to the toxins sarcin and ricin giving the motif its name (28).
The base pair families and the order in which they occur in S/R motif are conserved, but the sequence varies in different organisms and locations. For example, in almost all eubacterial 23S rRNA the tHH base pair in the H95 motif is AC, while in most Archaeal and Eukaryal LSUs it is AA. In addition, some S/R motifs have one more non-WC base pair forming 6 × 5 loop. Overall, however the H95 motif is very conserved in sequence.
Generally, S/R motifs in other locations display more sequence variability, especially, in the tSH base pair, where we observe seven different base combinations in LSU, SSU and 5S rRNA 3D structures AG, AA, UC, CC, AC, GG and UA (Figures 1 and 2). All these sequence variants, with the exception of G-G, form isosteric tSH base pairs (29).
Unusual substitutions are observed in some S/R motifs. For example, in the S. cerevisiae version of the LSU H11 S/R motif, there is an unusual bulged A in place of highly conserved G which forms a cSH/tWH base triple with conserved U and A bases of the motif. Also, an unusual GA tHH base combination is observed in the H. marismortui version of 23S rRNA S/R motif that is embedded in the complex junction from domain II.
Within 16S rRNA, S/R-like motifs occur in three places: h27; the three-way junction comprising h28, 29 and 43; and h17 ( Figure 2D). These motifs differ from the prototypical structure in that the h27 S/R motif contains a conserved bulged C, the 3WJ S/R motif is missing one tSH base pair and the S/R motif from h17 shares only tHS AG and tWH UA base pairs with the prototype. Nevertheless, the overall geometry of these motifs is similar to the prototypical structure, and we include them in the analysis of sequence variations.

Base pair isostericity in S/R motifs
Two base pairs are said to be isosteric when they meet three criteria (1): (1) The C1 -C1 distances are nearly the same; (2) the paired bases are related by identical rotations in 3D space; and (3) H-bonds are formed between equivalent atoms of the bases. Consequently, isosteric base substitutions in RNA 3D motifs are structurally neutral and do not interrupt the 3D geometry, while non-isosteric mutations distort the structure or prevent it from forming. Table 2 represents the observed sequence variations from 3D structures of S/R motif tabulated in 4 × 4 Isostericity matrices. Squares representing isosteric base pairs in the same base pair family are shaded the same color. Different colors indicate different isosteric groups. For example, there are two isosteric groups in the tHS family and almost all sequence variants belong to the same family for both tHS base pairs of the motif. Supplementary Figure S1 compares structural and geometrical parameters of isosteric and near isosteric tHS and tHH base pairs that appear frequently in S/R motifs. While substitution of isosteric base pairs does not distort the 3D structure of the motif, it may affect the thermodynamic stability or the rigidity/flexibility of the structure. This aspect of isosteric substitutions in conserved motifs has not been addressed adequately in previous work.
Guided by the structural analysis, we suspected that some of the observed sequence variation in non-WC base pairs in the S/R motifs may be due to thermal adaptation. For example, Figure 2 demonstrates that in both thermophilic organisms (T. thermophilus, and T. maritima) the upper tHS base pair is always AG, whereas in mesophilic groups AA also occurs (e.g. E. coli h27 16s rRNA in Figure 2D and B. subtilis RNAse P in Figure 2C). The lower tSH base pair accommodates multiple sequence combinations; however, it is not known to what extent, if any, these variations affect the stability of the S/R motif. For example, there are two N-H . . . N bonds in the tSH GA base pair but only one amine -carbonyl N-H . . . O = C H-bond in the isosteric tSH UC pair. Does this mean that motifs in which GA substitutes for UC at this position are significantly more stable? Are there correlations between the organism's optimal growth temperature and the sequence variations? The present study aims to address these questions.

Thermodynamic study of RNA duplexes containing sarcinricin motifs
Having analyzed sequence variability in the S/R motifs found in the 3D structural database, we designed RNA duplexes ( Figure 3) that systematically incorporate the key sequence variations identified by these comparisons. Next, we describe the design of these duplexes and the results of UVmelting experiments that were carried out to quantify their stabilities.
Design of sequence variants and nomenclature. For consistency, we numbered each nucleotide in the S/R motif sequences from the 5 to 3 direction starting with the leftmost strand and including the flanking base pair, as shown in Figure 3 (the red rectangle). Numbered nucleotides are enclosed in red rectangles and nucleotides that differ from prototype in Figure 3 are colored in red in all figures.
Each duplex has its unique number 1 through 16 and is named and labeled to reflect the following: a) Nucleotide substitution(s) relative to the prototype duplex. For example, the '#2 U2C' duplex U at position 2 changed to C. Each nucleotide substitution is colored in red ( Figure 3). b) Motif location in rRNA, indicated by the helix number and organisms name, e.g. 23S-H13 E. coli. c) Optimal growth temperature class (psychrophilic, mesophilic, thermophilic, hyperthermophilic).
We searched for S/R motif instances in the nonredundant RNA 3D structural database using WebFR3D (26) and identified 14 unique S/R motif sequences ( Figure  2). These 4 × 5 internal loops were inserted in RNA duplexes flanked by two canonical base pairs. To prevent sequence end flaying of the flanking helices prior to melting of the motif we terminated each helix with GC Watson-Crick pairs. The sequences were checked with Mfold (www. mfold.bioinfo.rpi.edu, Zuker 2003) to avoid the formation of self-complementary secondary structures, internal hairpin structures or misaligned duplexes. We used duplexes for melting experiments rather than hairpin constructs to avoid specific interactions of hairpins with divalent metal ions (30,31). Moreover, base substitutions are easier and cheaper to design in duplex RNAs and data analysis in duplexes is more robust.
We also added two control molecules that were not observed in 3D structures. These controls contain base substitutions intended to prevent the formation of the S/R motif by disrupting one of the two tHS base pairs. For example, in the #4 U2G molecule U2, which forms a tSH base pair with C12, is substituted by G. The resulting GC juxtaposition is not expected to form a tSH pair since GC is never observed to form this type of base pair in any 3D structure. On the other hand, the GC combination can form a canonical and stable Watson-Crick base pair, which is expected to disrupt the formation of the S/R motif. Likewise, the A6C substitution in the #16 U2G/A6C/C12A is expected to disrupt the formation of the other tHS base pair and thus the whole motif by producing a CG between positions 6 and 9 of the motif. All 16 designed RNA duplexes, fourteen of which are observed in 3D structures and two of which are unobserved controls are shown in Figure 3 with differences from the prototype indicated by red rectangles.

Selecting S/R motif flanking base pairs
The flanking Watson-Crick base pairs of S/R motifs can vary. In addition to the canonical cWW GC and AU flanking pairs, in 3D structures cWW GU, CC, UC, UU and CA base pairs also occur ( Figure 2 and summarized in Table 2). It is not possible to perform thermodynamic studies on all variations of flanking pairs. Thus, we designed a controlled study varying one flanking cWW pair of the motif (molecule #10 U2G/C12A). We kept the G7C8 closing base pair constant and tested four different variants of the other closing base pair (C1G13, G1C13, U1G13 and U1A13). The results of these UV-melting studies are shown in Table 3. Except for the wobble U1G13 flanking pair all variants showed distinct transition curves. For duplex design we chose the C1G13 cWW base pair for two reasons. First, this variant has a high melting temperature of 37.3 • C, and secondly, it is easier to purify this molecule on TLC plates because it has fewer guanines on the left strand.

UV-melting of duplexes containing S/R motifs
UV-melting experiments of S/R motif RNA duplexes were performed by monitoring of the absorbance change at 260 nm. Some RNA melting curves were also recorded at 280 nm as a check and gave similar results. Melting controls of single strands did not exhibit distinct transitions. Table 4 shows the thermodynamic parameters for duplex formation that were obtained from averaging the independently fitted melting curves to the two-state model, adjusted to 100 M Ct with the assumption that extinction coefficients for single strands depend linearly on temperature. To confirm two state behavior we generated a Van't Hoff plot of T M −1 versus log (Ct/4) of the #10 U2G/C12A duplex and obtained   Mg 2+ and Na + buffers. It is standard to perform UV melting experiments of RNA duplexes in the presence of 1 M NaCl, to compensate for the absence of divalent ions. However, 3D motifs like S/R do not fold correctly in the absence of divalent ions. Defined conformation of RNA bases prior to melting is crucial and often depends on the addition of divalent ions. Serra et. al. showed that the stability of RNA structural motifs depends on Mg 2+ ions, in particular, a melting temperature increase in the presence of 50 mM MgCl 2 has been demonstrated for eukaryotic 5S loop E (32). Moreover, Brownian dynamic simulation studies suggested three potential but delocalized metal-binding sites in the S/R motif (33). This was confirmed by extensive molecular dynamic simulation (34). Therefore, to investigate the role of Mg 2+ on the S/R motif RNA duplex stability we performed UV-melting studies in two distinct buffer compositions, one with Mg 2+ and one without: 1) 10mM Sodium Cacodylate buffer pH 6.9, 0.5 mM EDTA with 0.1 M NaCl and 10 mM MgCl 2 . 2) 10mM Sodium Cacodylate buffer pH 6.9a, 0.5 mM EDTA and 1 M NaCl.
The differences in thermodynamic parameters of RNA duplexes from 10 mM Mg 2+ and in 1M NaCl are summarized in Table 3 (grey shaded cells correspond to experiment with no Mg 2+ ions). The data suggest that majority of RNA duplexes generally have higher stability in Mg 2+ containing buffer. The differences in thermodynamic parameters obtained between two buffers are summarized in Table 4.
To find the optimal Mg 2+ concentration for the S/R motif formation we measured T M of the #10 U2G/C12A RNA duplex as a function of [Mg 2+ ] from 1 to 50 mM. Figure 4 shows that at 10 mM MgCl 2 the T M curve reaches its plateau and further increase in [Mg 2+ ] does not significantly affect the T M . Thus, we used 10 mM MgCl 2 in all experiments involving magnesium. Typical thermal denaturation curves of five representative RNA duplexes containing S/R motif sequence taken at 10 mM MgCl 2 are shown in Figure  5. The samples containing Mg 2+ were only melted once due to Mg 2+ driven RNA cleavage.

Influence of tHS base pair combinations on S/R motif duplex stability
The tHS base pair family frequently occurs in functional RNA motifs including S/R motif, Kink-turn, loop E, multiway junctions as well as hairpin loops (e.g. GNRA hairpin loop). The most well known member of this family is the   tHS A-G pair, also known as sheared AG base pair which often contributes to RNA stability by extensive cross-strand purine stacking.
Sarcin/Ricin motifs contain two conserved tHS base pairs with opposite orientations. We refer to the tHS pair between nts 6 and 9 as the 'upper tHS' pair and the pair between 2 and 12 as the 'lower tSH' pair.
Upper tHS position. Four variants of the upper tHS pairing position (nt 6 and 9) were studied: A6-G9, A6-A9, C6-U9 and C6-G9 corresponding to molecules #10, #12, #15 and #16 respectively. In all these duplexes the lower tSH position was G2-A12 (see Table  4, yellow shaded cells). Three of these combinations are isosteric with the prototype and should preserve the conformation of S/R motif. The fourth variant juxtaposes C6 and G9, which cannot form the tHS pair and likely forms cWW, which is expected to disrupt the motif structure. Comparison of their thermodynamic parameters reveals the following results: the most stable RNA duplex is #10 U2G/C12A, which has A6-G9 with G • 37 of -15.9 kcal/mol in the presence of Mg 2+ ions. This tHS base combination is the most common in S/R motifs according to WebFR3D results (Summarized in Table 2). The single mutation in the upper tHS position in molecule G9A generates the isosteric A6-A9 tHS base pair. This substitution dramatically decreased the stability of the RNA duplex by ∼6 kcal/mol compared to tHS A6-G9 (see entry for the #10 U2G/G9A/C12A duplex in Table 2). The intrinsic energies for these isosteric tHS AG and tHS AA pairs were compared previously (35) using quantum chemical calculations and it was found that the tHS AG base pair is substantially more stable than tHS AA consistent with our observations. Molecule #15 U2G/A6C/G9U/C12A with isosteric C6-U9 has a G • 37 value of -12.9 kcal/mol which is also less stable than A6-G9, but more stable than A6-A9.
Data shows that many of these duplexes have relatively similar G • 37 of ∼15.5 kcal/mol and the order of stability follows: tSH U2-C12 ( G • 37 = -16.6 kcal/mol) > tSH A2-C12 ( G • 37 = -16.5 kcal/mol) > tSH G2-A12 ( G • 37 = -15.9 kcal/mol) > tSH C2-C12 ( G • 37 = -15.6 kcal/mol) > tSH A2-A12 ( G • 37 = -15.3 kcal/mol) >> tSH U2-A12 ( G • 37 = -12.2 kcal/mol). All lower tSH pairs are isosteric (see above). And only tSH U2-A12 is ambiguous. The A at position C12 could form a Watson-Crick base pair with U2, which is expected to disrupt the motif. However, UA is also an allowed base combination for tSH, and so it is possible that the tSH pairing geometry is conserved. This sequence variant tests the Table 5. Effect of Mg 2+ ions in stabilization of S/R motif duplex a a Values obtained by subtracting S/R motif data (see Table 5) melted in presence of MgCl 2 and its absence. For example for prototype S/R motif molecule: resilience of the motif to ambiguous base substitutions. In fact, this sequence variant occurs in the central junction of yeast 26S rRNA as shown in Figure 2. In the 3D structure (PDB ID: 3U5H) we observe the U and A forming the tSH geometry, so we expect this variant to form the motif. On the other hand, the control molecule #4 U2G comprises tSH G2-C12. The tSH GC combination has never been observed in any 3D structure and thus is not expected to form this combination. UV-melting data shows that this RNA duplex has a G • 37 of -11.5 kcal/mol and suggests that this substitution significantly reduces the stability of the S/R motif.
Comparison of the lower tSH isosteric sequence combinations of the S/R motif RNA duplexes melted in the presence of 1 M NaCl shows a slightly different order of stability as compared to the experiment with 10 mM Mg  Interestingly, all isosteric combinations at the lower tSH depend on the presence of Mg 2+ , and, generally Mg 2+ ions stabilize the S/R motif of these duplexes, while we do not observe this effect at the upper tHS position.' 'Moreover, while isosteric base pair substitutions at the lower tSH position do not significantly impact the stability of S/R motif, substitution in the upper tHS may depending upon the sequence. For example, comparison the free energy parameters between RNA duplex #10 U2G/C12A with upper and lower tHS A-G and #11 U2A/C12A RNA duplex with upper tHS A6-G9 but lower tSH A2-A12 shows the difference in G • is small, only -0.6 kcal/mol (15.9-15.3). However, comparison between molecules #10 U2G/C12A and #12 U2G/G9A/C12A with upper tHS A6-A9 and lower tSH G2-A12 shows a much larger difference in G • of -6.8 kcal/mol (see Table 4, their T M are in green colored cells) So why do we observe such big changes in stabilities of RNA duplexes mutated in the upper tHS and not in the lower tSH? The stability of S/R motif must be defined by the context of tHS base pair family within the structure. 'For example, while lower tSH base pair stacked between tHH and cWW pairs, the upper tHS stacked between cWW on one side and tWH pair of the triple on the other. These different nearest neighbor nucleotides of the upper and lower tHS pair, in fact, could explain the overall duplex stability.' In addition, the stability of S/R motif structure depends not only on the stacking interactions but also on base-phosphate (BPh) interactions (36,37). 3D structures of S/R motifs with different upper tHS base pair combinations revealed formation of different BPh hydrogen bonding as demonstrated in Figure 6. This figure shows that in the prototype S/R motif, nucleotide U2693, a part of the triplet, interacts with nucleotide G2701 forming 5BPh interaction. The same hydrogen bond between corresponding nucleotides U891 and A906 in S/R motif containing upper tHS AA belongs to 6BPh family. This suggests that BPh interactions might be crucial in stabilizing the S/R motif. In fact, the importance of BPh interactions was pointed out by Zirbel et al. (37), and revealed their evolutionary roles in ribosomal RNA. It was shown that ∼12% of nucleotides in large RNAs are involved in direct BPh interactions, and these interactions are the most conserved during evolution. This may explain why we see such large differences in stability when the upper tHS varies. Surprisingly, while there is no BPh interaction between nucleotides U269 and U294 when the upper tHS is C-U, as shown in Figure 6, these duplexes are more stable than tHS A-A.

Bulged nucleotides stabilize the S/R motif duplex by high entropy contribution
We studied three S/R duplexes (#1, #8 and # 9) that have differences only in the bulged nt G4, A4 and C4 (Table 4 in brown shaded cells). The prototype RNA duplex with highly conserved bulge G4 revealed the lowest G • 37 value of -16.6 kcal/mol, while G • 37 for the S/R motifs with bulges A4 and C4 were found to be -10.6 kcal/mol and -11.5 kcal/mol respectively. The free energy difference of the bulged G4 compared to A4 and C4 is ∼-5.5 kcal suggesting that the prototype duplex much more stable than A4 When the upper tHS is AG there is a hydrogen bonding between oxygen of phosphate group of U and aminogroup of G which belongs to 5BPhfamily (A). When the upper tHS is AA then interaction between the phosphate group of U and aminogroup of A belongs to 6BPh family (B). There is no BPh interaction when the upper tHS has CU base pair combination (C). and C4 (duplexes G4A and G4C). However, mutations in bulged nucleotides result in significant differences in calculated entropy values. The calculated entropy values for these duplexes are shown in Table 4 in brown colored cells for clarity. The prototype motif with the bulged G4 has a S • of -260.5 e.u. while substitution to bulged A4 or C4 resulted in significantly higher S • values of -114.2 e.u. and -158.3 e.u. Even larger entropy change was observed in presence of 1 M NaCl for A4 and C4 (see Table 4). Evidently, the high entropy values for S/R motifs containing bulged a A4 or C4 contributes to the overall stability of the entire duplex. Why there is a big difference in entropy parameters of A4 and C4 compare to prototype G4? The survey of 3D structure demonstrates that A and C bulged nucleotides in the S/R motif context are very rare ( Figure 2). As shown in the crystal structure of 23S rRNA in H-11 of S. cerevisiae, the bulged A does not participate in the triple interaction and the nucleotide is bulged out from the motif entirely suggesting the flexibility of entire S/R motif. The same pattern is observed for the bulged U (not used in our studies) at equivalent position from S/R motif of 16S rRNA of H17 ( Figure  7). Moreover, the previously reported NMR data also suggest that the replacement of bulged G to A destabilizes the internal loop while overall conformation appears to be similar (38). Our data is consistent with reported observations in the literature and demonstrates that even if the bulged A or C do not participate in triplet interaction, they help to stabilize the formation of S/R motif by a large entropy contribution. In most S/R motifs the bulged base is G, as exemplified in the S/R motif from H-95 of 23S rRNA (Panel A). The G is always observed to form a base triplet and base phosphate interaction. When this G is replaced by A (Panel B) or U panel (C), the base is no longer observed to form these interactions.

Contribution of trans Hoogsteen/Hoogsteen base pair combinations to stability of S/R motif
Three RNA duplexes (#1, #6 and #7) were studied in which only tHH the base combination differed: tHH A3-A11, tHH G3-A11 and tHH A3-C11 (Table 4, violet shaded cells). The calculated G • 37 revealed that the tHH A3-A11 base pair combination is more stable than tHH A3-C11 and the least stable is tHH G3-A11. The order of stability is well correlated with the occurrences of tHH base combinations, as tHH AA is the most frequent in existing 3D structures ( Figure 2, and summarized in Table 3) and most stable. These data clearly demonstrate that the tHH base pair is important in the S/R motif, as a single mutation of the tHH A3-A11 pair combination to tHH A3-C11 or tHH G3-A11 influences the overall motif stability. In addition, the stability of the S/R motif with tHH G3-A11pair is entropically driven as its S • value of -131.9 e.u. is significantly higher than that of tHH A3-A11, S • = -260.5, and tHH A3-C11, S • = -187.4 in presence of MgCl 2 .

Structural probing of S/R motifs
Next, to determine which of the sequence variants studied by thermodynamic methods likely form the S/R motif, we employed established enzymatic and chemical probes, including RNases T1, A and V1 and the chemical probe DEPC (diethylpyrocarbonate).
To directly probe the RNA duplexes used in UV-melting studies is not feasible, so we designed longer hairpin shaped molecule which included a control motif for probing experiments. The sequences of twelve of the sixteen RNA constructs designed for structure probing are shown Figure 8. Each contains a stable GAGA hairpin tetra-loop, which was intended to form a standard GNRA structure. All constructs are 62 nucleotides long. Using the hairpin design we were able to implement an efficient method for S/R motif structure probing: Each probing construct has a constant (control) and a variable region, corresponding to S/R motif sequences from the melting experiment as shown in Figure  8. The S/R motif closest to the 5 and 3 ends of each construct is the control motif (enclosed in the dotted-line rectangle in Figure 8) and corresponds to the sequence of the #10 U2G/C12A S/R variant. It is the same in all molecules and serves as a control for establishing the relative intensities of the cleavage when probing different molecules under different conditions. The motif closest to the hairpin loop is variable and is highlighted in a red rectangle in Figure 8. It corresponds to the different sequence variants inserted in RNA duplexes for UV-melting studies. This design strategy allows us to analyze the probing results for all RNAs simultaneously as the same control S/R motif is presented in each construct.

Structure probing by nucleases
RNA structure probing was carried out with the helix specific RNase V1 and two single-strand specific nucleases T1 and A. RNAs were 5 -end labeled with 32 P phosphate. All RNAs were probed in their native conformations in the presence of 10 mM MgCl 2 . The buffer conditions and annealing procedures were identical to the UV-melting experiments described above to ensure that the RNA molecules adopt the same conformation. The cleavage data averaged over several experiments are summarized in Figure 8 and corresponding autoradiogram images are provided in Figure 9.
RNase T1. The cleavage pattern of RNA hairpins treated with RNase T1 is indicated by green arrow in Figure 8 (for cleavage pattern see Figure 9A). These data show that in all constructs, nucleotides G33 and G31 in the hairpin loop are the most sensitive to RNase T1 digestion and likely are most accessible to enzymes. These nucleotides are located in the hairpin loop region and the GAGA sequence could fold into GNRA-type structure. The difference in sensitivity between G33 and G31 to RNase T1 suggests that G33 remains unpaired, while G31 makes some interaction, consistent with 3D structures of known GNRA motifs, which shows that G33 is unpaired while G31 forms a tSH non-WC base pair with A34 (39,40). Thus, the probing data is consistent formation of the GNRA motif as intended in our design.
All other G's in the probed constructs remained unreactive to RNase T1 except for G24 in the molecule that contains the #12 U2G/G9A/C12A sequence variant. Nucleotide G24 corresponds to the bulged G that forms a cSH pair with U25. In this hairpin, the base pair that stacks on the cSH G24-U25 is tHS A26-A39 instead of tHS A26-G39. The increased sensitivity observed for G24 suggests that tHS A26-A39 pairing makes the S/R motif more flexible and thus lowers its thermostability. This is consistent with the data obtained from UV-melting analysis for the duplex containing the tHS A6-A9 and tHS A6-G9 pairs, demonstrating that the A6-G9 pair favors formation of the S/R motif by G • 37 ∼ -6 kcal/mol compared to the tHS A6-A9 pair (see Table 4).
RNase A. To further characterize the flexibility of S/R motifs we employed RNase A which is specific for unpaired pyrimidine nucleotides. The RNase structure probing data are summarized in Figure 8, using blue arrows to indicate the cleavage sites, and Figure 9B. The cleavage pattern obtained after subsequent treatment of hairpins with the RNase revealed consistent cleavage of U12 across all S/R motifs, suggesting structural similarity in the constant region of all constructs. U12 participates in the base triple and is very conserved among all S/R variants obtained from crystal structures ( Figure 2). However, we observed different cleavage patterns in the variable regions of hairpins. The conserved U25 in the variable region was sensitive to RNase A among all hairpins with the exception of the hairpin #16 U2G/A6C/C12A (lane #16 U2G/A6C/C12A in figure 9B). As discussed above, this hairpin sequence was designed to disrupt the S/R formation thus we assume that the U 25 forms a canonical cWW pair with A40, which prevents the cleavage of U25. This indicates that mutation of tHS A26-G39 to the non-isosteric C26-G39 pair distorts the S/R motif conformation and thus the stability is reduced (Table 4).
U's are observed to be sensitive to RNase A when they are present in tHS basepairs ( Figure 8). Interestingly, the C's in the tSH C22-C42 and U22-C42 pairs are not sensitive to RNase A, except when there are two adjacent Y-Y tSH pairs (molecules #13 C1U/U2C/G13C and #14 C1U/U2C/G4AC/G13C). The RNase A data provides evidence that the G22-C42 juxtaposition does not affect the formation of the base triple, because U25 shows the same strong cleavage as in the other sequence variants. These results are consistent with those obtained by RNase A treatment of 5S rRNA from X. laevis, where loop E also forms an S/R motif (41). The U that participates in the base triple UAG interaction was found to be sensitive to RNase and is very conserved in S/R motifs (42).
RNase V1. Strong RNase V1 cuts are observed in all regions of the RNA constructs that we expected to form Watson-Crick helices (Figure 8 red arrows and Figure 9C). The regions of constructs that contain the S/R and hairpin motifs are resistant to nuclease V1 cuts. Only in the #16 U2G/A6C/C12A construct are the nucleotides U25 and C26 susceptible to cleavage. This is additional evidence that the hairpin containing disrupting mutations has a distinctly different structure in variable sequence region.
Overall, these results are consistent with all 12 designed molecules folding into the intended long hairpin conformation with two internal loops and with U25 forming a non-Watson-Crick base pair in all cases except for the disrupting Nucleic Acids Research, 2021, Vol. 49, No. 16 9587   molecule where it is likely forming a Watson-Crick conformation instead.

Chemical probing
The reactivity of the N7 purine positions of three sequence variants #10 U2G/C12A, #12 U2G/G9A/C12A and #16 U2G/A6C/C12A was probed with DEPC. This probe reacts more strongly with A/N7 than G/N7. The reactivity of the adenine residues was determined by subsequent aniline cleavage of 5 -end labeled RNA hairpins. An autoradiogram image and the results from several such experiments are provided in Figure 10.
Comparison of chemical reactivity data for variable and constant regions of the hairpin constructs indicates that the conformation depends on sequence. DEPC accessibility of A32/N7 and A34/N7 in the hairpin loop is very similar for all studied RNAs. This is also true for Gs at position 11 and 24 located in the core base triple. Moreover, we do not observe a significant difference in cleavage patterns of RNAs between native and semi-denaturing conditions. Under the semi-denaturing condition, where the Mg 2+ ions were eliminated from reaction buffer, only slightly increased cuts were observed.
The variable region in the variant #16 U2G/A6C/C12A is designed to disrupt the S/R motif by preventing the upper tHS base pair from forming, because C26-G39 juxtapositions do not form tHS pairs. As expected, the #16 U2G/A6C/C12A hairpin behaves very differently in the variable region ( Figure 10B, lanes 6 and 9). In addition to G24 reactivity, nucleotides G39 and A40 become fully reactive with DEPC in native and semi-denaturing buffers indicating that this internal loop does not have a stable conformation ( Figure 10B, rightmost molecule). These data are also consistent with the nuclease digestion experiments discussed above.
In conclusion, our structure probing experiments demonstrate that all S/R sequences fold into the 3D structure characteristic of this motif with an exception of one control molecule which is not expected to form the motif. Although exact orientations of all nucleotides cannot be established based on structure probing alone, G24 and U25 are cleaved in a manner consistent with forming the core base triple with A40. The structure probing data validates our observations made during the UV-melting experiments.

Sequence variation in S/R motifs observed in 16S rRNA sequences from Eubacteria with known optimal growth temperatures
Thus far our analysis was based on the sequence variation data derived from the RNA X-ray structures. These structures come from relatively few organisms and provide limited information on sequence variability. To expand our knowledge of S/R motif sequence variation we analyzed ribosomal sequence alignments. We hypothesize that the organisms living at higher temperatures have more rigid RNA 3D motifs while organisms living at lower temperatures have less rigid structures. In order to test this hypothesis, we constructed a special dataset containing 16S alignments with sequences annotated by the optimal growth temperature and phylogenetic group of the source organism.
Finding organisms with known optimal growth temperatures. Bacterial type strains associated with the optimal growth temperature were manually extracted from the literature, with a particular effort devoted to identifying extremophiles. Manual extraction was used because the previous attempt at extracting this data, the PGTdb (43), is outdated and no longer maintained. Species which grow optimally <20 • C were classified as psychrophilic; species growing from 20 • C to <50 • C were classified as mesophilic, species growing between 50 • C and less than 70 • C were thermophilic while species growing optimally at or >70 • C were hyperthermophilic (44). The distribution of organisms in different phylas by their optimal growth temperatures can be found in Supplementary table.
Selecting a source of sequence alignments. There are several alternative sources of aligned rRNA sequences. We examine 16S alignments because the 16S is often sequenced and therefore provides a large dataset to examine. Unfortunately, very few 23S sequences for extremophiles are available, so we did not attempt to analyze the large ribosomal subunit. In order to choose one 16S alignment, we aligned two 16S 3D structures from distantly related bacteria E. coli (PDB 2AVY) and T. thermophilus (PDB: 1J5E) using R3DAlign (45). Next, we compared the resulting structural alignment with sequence alignments downloaded from Green Genes, RDP and SILVA (46)(47)(48). As a result, Green Genes alignment has been chosen as it had the most agreement with the R3DAlign structural alignment.
Selecting a representative sequence alignment dataset. The representative dataset was created by extracting all sequences corresponding to each species and all of its subspecies with known optimal growth temperatures from the Green Genes 16S alignment updated on October 6, 2010 (http://greengenes.lbl.gov). Each species and its subspecies were represented with a single sequence selected based on the highest A, C, G and U content. This was done in order to reduce the bias due to over representation of some organisms in the dataset. This dataset contains 561 sequences; within these sequences 82 belong to psychrophilic, 328 mesophilic, 108 thermophilic and 44 hyperthermophilic organisms (Supplementary table). We used Uniprot's taxonomy to obtain the phylogenetic information for each organism.
Correlating structures with sequence alignments. Since some nucleotides in crystal structures are not resolved, the Green Genes alignment and the sequence from the 3D structure may not match. Therefore, we need to determine the correlation between the columns of the sequence alignments and the structure positions. To this end, we extracted all rows from Green Genes, which came from E. coli, and removed all gaps, then aligned each sequence to the sequence from the E. coli 3D structure (PDB: 2AVY). The Green Genes sequence with the largest number of aligned columns was used to correlate positions in the Green Genes alignment with the PDB sequence.
AG and AA trans Hoogsteen-Sugar base pairs correlate with organism growth temperature. Using the correlations between sequence alignments and 3D structures we now examine the variations in S/R-like motifs H17, H27 and 3WJ found in the 16S rRNA. We extracted the columns from the alignment which corresponded to each motif. We then grouped each sequence by the phylum of its source organism and optimal growth temperature class and then counted unique sequences. The resulting tables can be found in Supplementary Table S1.
Examining the S/R motif in the H27 of the 16S shows a strong correlation between AG and AA in the upper tHS base pair and growth temperature. As temperature increases the fraction of tHS AG also increases ( Figure 11 top panel). This does not appear to be due to phylogeny as the variation is found in Bacteriodes, Firmuctues, and Betaproteobacteria. Other positions remain conserved or nearly conserved. This correlation is noticeable only in one tHS base pair of one S/R motif, but it is supported by our observations based on the UV-melting experiments.
The bottom panel in Figure 11 shows the sequence variations of the S/R motif in H17. The bulged base in the triplet is a G in thermophiles and hyperthermophiles, while in psychrophiles it may be a U. All other examined motifs do not show a strong correlation between sequence and optimal growth temperature.
Overall, we do see some variations which may be due to temperature adaptation. However, our current methods are not precise enough to detect subtle changes. Further statistical analysis on a larger dataset is needed to validate these initial, limited findings.

CONCLUSION
In the present work, we systematically study sequence variation at structurally conserved positions of S/R motif in rRNA from different organisms. Herein, our effort is aimed at gaining insight into the thermodynamics of isosteric and non-isosteric substitutions within the S/R motif. Besides the stability investigation, a structural probing analysis is performed to observe which mutations would disturb the motif.
This study demonstrates that isosteric substitutions do not disturb the S/R motif as it was expected from survey of 3D structures and survey of 16S rRNA from organisms adapted to different temperatures.
The most frequent variation was observed at the 'lower' tSH base pair and any neutral substitutions at this place do not significantly impact on the overall stability of the motif. On the other hand, non-isosteric substitutions destabilize the motif but the conformation is appears to be modest as reflected from structural probing experiments.  Substitutions at the 'upper' tHS pair are the most crucial for the stability and conformation of the motif. Nonisosteric mutations destroy the geometry of S/R motif. Isosteric mutations tHS A6-A9 or C6-U9 destabilize the motif but the geometry is preserved. Moreover, tHS A6-A9 makes the motif more flexible as reflected by RNAse T1 digestion pattern.
Any mutations at bulged nucleotides participating in triplet interaction significantly impact on the stability of the motif as shown by UV-melting experiments.
Our phylogenetic survey suggests a correlation between non-canonical base pairs and optimal growth temperatures. In high temperature environments tHS AG is the most frequent base pair in S/R motifs, whereas tHS AA is prevalent in cold environments. In addition, the bulged G, which is conserved in most organisms, mutates to U, A or C mostly in psychrophiles and mesophiles. This suggests that organisms may adapt to different temperature environments not only by adjusting the number of GC and AU Watson-Crick pairs in the helical regions as demonstrated previously (5), but also by selecting optimal isosteric sequence combinations for their non-canonical base pairs.
Using S/R motif as an example, we have demonstrated that non-canonical interactions in structured RNA motifs are very important to overall structure and stability. By changing base pair at certain non-canonical positions we can control stability, structure, and perhaps function of large RNAs.