Allosteric Logic of the V. vulnificus Adenine Riboswitch Resolved by Four-dimensional Chemical Mapping

The structural logic that define the functions of gene regulatory RNA molecules may be radically different from classic models of allostery, but the relevant structural correlations have remained elusive in even intensively studied RNA model systems. Here, we present a four-dimensional expansion of chemical mapping called lock-mutate-map-rescue (LM2R), which integrates multiple layers of mutation with nucleotide-resolution chemical mapping. This technique resolves the core mechanism of the adenine-responsive V. vulnificus add riboswitch including its gene expression platform, a paradigmatic system for which both Monod-Wyman-Changeux (MWC) conformational selection models and non-MWC alternatives have been proposed. To discriminate amongst these models, we locked each functionally important helix through designed mutations and assessed formation or depletion of other helices via high-throughput compensatory rescue experiments. These LM2R measurements give strong support to the pre-existing correlations predicted by MWC models, disfavor alternative models, and reveal new structural heterogeneities that may be general across ligand-free riboswitches.


INTRODUCTION
Conformational changes in RNA molecules are ubiquitous features of gene regulation in all living cells and viruses. Recent years have seen an explosion of discoveries of compact cis-acting mRNA elements that sense small molecules, proteins, and other RNAs and modulate transcriptional termination, ribosome recruitment, splicing, and other genetic events. [1][2][3] Beginning with classic work by Yanofsky and colleagues in the 1970s, these discoveries have been associated with elegant models of allostery in which each molecule interconverts between at least two conformational states. [4][5][6] In particular, the allostery is typically presumed to occur through Monod-Wyman-Changeux mechanisms (also called 'conformational selection' or 'population shift' models) [6][7][8] . In these pictures, ligand binding to an aptamer region of the RNA shifts a preexisting equilibrium between strikingly different secondary structures, resulting in allosteric exposure or sequestration of another sequence-distal region harboring a ribosome binding site or other gene expression platform. [9][10][11] If these multiple structures could be determined in detail, they would each offer a potential new target for biological control and antibiotic development. 12 Despite decades of work on RNA allostery, experiments directly testing the base pair correlations posited for specific systems have been difficult. A paradigmatic example is the add riboswitch, which has become a dominant model system for studying ligand-sensing riboswitches. This compact adenine-sensing element resides in the 5´ untranslated region (UTR) of the add adenosine deaminase mRNA of human pathogenic bacterium Vibrio vulnificus, and controls translation of the mRNA in response to adenine. Domains of this RNA have been subjected to nearly every in vitro biochemical and biophysical technique available, including time-resolved Xray laser diffraction. [13][14][15][16][17][18][19][20][21][22][23][24] Almost all studies to date have assumed some variation of an MWC model of action. 15 Recently, however, a tour de force study, bringing together multidimensional NMR spectroscopy, NMR relaxation studies, designed model systems, and supporting measurements from stopped-flow and calorimetry techniques, revealed detailed singlenucleotide-resolution base pairing information and intriguing temperature dependences for the complex structural ensemble of the add riboswitch. 25 Although similar in some respects to previously proposed MWC models, the new model contradicted a standard MWC assumption [7][8]26 : it proposes that, in the absence of adenine, riboswitches that sample the correct aptameric secondary structure do not also concomitantly open their ribosome binding sites and increase gene expression, as predicted by MWC conformational selection. Despite this fundamental distinction, the non-MWC model appears fully consistent with all available data, and it remains unclear what structural correlations, if any, underlie the allosteric logic of the add riboswitch. As a reflection of current uncertainties in the system, the same group that first proposed an non-MWC model for the add RNA has recently revived the MWC framework to interpret newer single molecule measurements on the same system. 27 To resolve the allosteric logic of the add riboswitch, we have developed a high-dimensional expansion of single-nucleotide-resolution chemical mapping (Figure 1). We previously showed that structural perturbations from mutating each nucleotide of an RNA could be read out at every other nucleotide through chemical mapping to give rich two-dimensional data sets and accurate blind predictions for RNAs with single, well-defined structures (mutate-and-map, M 2 ). [28][29][30] Expanding to three dimensions, testing compensating mutations at pairs of nucleotides through their effects on the reactivities of other nucleotides gave incisive evidence for specific base pairs (mutate-map-rescue, M 2 R), again for molecules with single dominant structures. [31][32] Here, we expand the M 2 R method to allow dissection of more complex ensembles involving numerous functionally important structures, taking advantage of a Bayesian framework and extensive simulations to connect experimental observables to underlying helix frequencies. By identifying 'lock' mutants that stabilize helices posited for each state, we infer how the presence of one helix enhances or suppresses the presence of other helices, through further rounds of M 2 R on the mutant backgrounds ( Figures 1A and B). This four-dimensional chemical mapping workflow (termed lock-mutate-map-rescue, LM 2 R) resolves fundamental questions raised by the existing models of the add riboswitch and also exposes new ensemble properties that were not expected in prior models.

Inferring helix frequencies through quantitative compensatory rescue
Our key methodological innovation arose from attempts to more quantitatively analyze compensatory mutation experiments for RNA structure read out through chemical mapping (mutate-map-rescue, M 2 R). [31][32] Briefly, if mutation of either side of a base pair (mutant A or mutant B) disrupts the reactivities at other nucleotides in the RNA, but the compensatory double mutation (mutant AB) restores the SHAPE reactivities to the wild type (WT) profile, these observations suggest that the two sites were base paired in the starting WT sequence. Our prior work assumed that such observed compensation would arise with negligible probability if the mutated nucleotides were not paired. To test this assumption and automate the protocol, we carried out simulations of these experiments and developed a quantitative 'rescue factor' metric that accurately recovered expert assessments from simulated and experimental data, including the where the Euclidean difference diff between two SHAPE profiles d over probed nucleotides i = 1 to N is given by: These in silico tests confirmed that observation of strong rescue implies high helix frequency ( Figure 1C) but unexpectedly also showed that modest rescue ratios could be informative. Values lower than one but greater than zero corresponded to 'partial' rescue, where the reactivities at some nucleotides revert to wild type (WT) values but others do not ( Figure 1D, center). While individual base pair tests were noisy, averaging recue factors over tests on three or more base pairs gave values that showed a striking correlation with helix frequencies ( Figure 1C). Furthermore, absence of rescue (rescue factor close to zero; Figure 1D, bottom) gave upper bounds on the helix frequency (Supporting Figure S1). To estimate uncertainties in these estimates, we used a Bayesian framework to derive posterior probabilities of each helix frequency given the observed rescue factor, calibrated based on these simulations. Below, we present cumulative distributions of these posterior probabilities in figures and tables (see also Supporting Figure S1); for ease of reading in the main text, we give simply median values of the helix frequency posterior probability distributions. We also developed a separate category-based classification to check that compensatory rescue of different base pairs within the same helix gave concordant results, and that automated analysis reproduced manual analysis (Supporting Results).
These simulation and experimental studies suggested that compensatory rescue measurements, even ones that resulted in partial or no detectable rescue, could be used to estimate helix frequencies and uncertainties for those values, without relying on complex fits 16 , though we sought more detailed tests, as described next.

The add riboswitch system
To test the applicability of M 2 R helix frequency estimation to understanding complex RNA structural ensembles, we applied it to the paradigmatic V. vulnificus add adenine-sensing riboswitch including its expression platform ( Figure 2). On one hand, the NMR analysis of this system offered detailed data for validating or falsifying helix frequencies. On the other hand, that study and other analyses have revealed an unresolved but fundamental question in how the RNA's multiple structures give rise to function. There is currently no controversy about the riboswitch's adenine-bound state (termed holo; Figure 2A, right). Helices P1, P2, and P3 are formed in the 5´-UTR, defining an adenine binding aptamer at nucleotides 15-81, with a welldefined tertiary structure that has been determined through crystallography. Also, a linker including an additional hairpin (P5, nts 89-110) connects the aptamer to the Shine-Dalgarno sequence and AUG start codon (nts 112-122), which are open and available for ribosome recruitment. In the absence of adenine, all available models posit that the riboswitch forms a secondary structure ensemble dominated by a new P4 domain that pairs with the Shine-Dalgarno sequence and the AUG start codon (nts 112-122) with nucleotides that were originally part of P1 (nts 75-88, here called the 'switch region' 15 , cross-hatched in Figure 2A, left). Concomitantly, the aptamer helices P1 and P2 are rearranged into an alternative structure. In keeping with the previous NMR study, these conformations with a rearranged adenine aptamer secondary structure are grouped into a state apoB. The major uncertainty regards what happens when adenine is not present but the RNA nevertheless samples a secondary structure poised to bind adenine, with P1, P2, and P3 formed. This state is termed apoA.
At 20-30 °C, several studies have estimated that the fraction of the riboswitch in the adeninebinding-ready apoA conformation occurs with a frequency of ~40% both in vitro and in cellulo. 13,33 An MWC conformational selection model would assume that this thermally sampled state should have not only the same adenine aptamer secondary structure as adenine-bound holo, but also the same open Shine-Dalgarno conformation as the adenine-bound holo, without the P4 domain. That is, apoA should be ON, able to recruit ribosomes. A key prediction of the MWC conformational selection model is therefore an exact structural identity of the gene expression platform in the molecule's active configuration, whether sampled at thermal equilibrium (apoA) or stabilized by binding of ligands at the distal pocket (holo) (compare Figure 2A, center and right). (We note that the MWC model allows for local rearrangements in the aptamer binding pocket in going from apoA to holo, but these adenine-induced changes are not allowed to propagate to the gene expression platform, as the fixed stem P1 spatially separates the aptamer and the expression platform in either apoA and holo.) At first glance, the NMR-informed portrait 25 is similar to the MWC conformational selection model, with high-population and low-population adenine-free states that display distinct structures incompatible and compatible with adenine binding (apoB and apoA), and a third adenine-bound state (holo) that is similar to the adenine-free apoA. However, the key feature 8 of the MWC conformational selection model is lost ( Figure 2B). In the absence of ligand, the NMR measurements could not directly detect opening of the gene expression platform at low frequency in the absence of ligand, i.e., transient sampling of the ON conformation. While this lack of signal could be due to ambiguities in the NMR measurements, both ligand-free states apoA and apoB were interpreted to be OFF with respect to gene expression, in conflict with the MWC conformational selection model. 34 The new model, referred to as the 'non-MWC' model henceforth, suggests instead a mixed conformation for apoA, where a shortened P1, P2, and P3 form the adenine aptamer secondary structure but also some parts of the P4 domain (here denoted P4A and P4B) are retained, blocking the Shine-Dalgarno site, i.e., apoA should be mostly OFF.
The non-MWC model does not explain how subsequent adenine binding to the aptamer of apoA might communicate through P1 to release the Shine-Dalgarno site and promote gene expression, raising the prospect of a radically different mechanism for allostery in RNA compared to classic MWC systems. Posed as a simple question, then: is apoA ON (MWC) or OFF (non-MWC)?
Before carrying out extensive mutagenesis and compensatory rescue experiments, we tested whether one-dimensional SHAPE profiling on the WT RNA might discriminate between MWC and non-MWC models ( Figure 2 and SI Figure S3). In the absence of adenine, the add riboswitch SHAPE profile ( Figure 2C) showed protections within the 'switch region', which is predicted to have pairings in both apoA and apoB, in both MWC and non-MWC models; other regions show variable levels of protection. Incubation with 5 mM adenine ( Figure 2D) gave protections in the 5´-most aptamer region, consistent with formation of P1, P2, and P3, and RNA tertiary structure formation at and around the adenine binding pocket, as expected for the holo state. In addition, adenine-dependent increases in SHAPE reactivity in the expression platform, the Shine-Dalgarno ribosome binding site and the AUG codon (nts 112-122), supported the adenine-coupled opening of the expression platform that is the hallmark of riboswitch function ( Figure 2D). Unexpectedly, in between these two regions, nucleotides 105-111 also modestly increased in SHAPE reactivity upon adenine binding, although, in both the MWC and non-MWC models, this segment is predicted to be sequestered into a P5 helix in all states ( Figure 2). However, this observation can be accommodated into both the MWC and non-MWC models by assuming partial opening of P5 in the adenine-bound holo state (see Discussion, below) and therefore does not discriminate between the models.

Three-dimensional mutate-map-rescue recovers ensemble helix frequencies
We then tested if quantitative mutate-map-rescue ( Figure 1) might confirm or contradict prior NMR results and enable discrimination between MWC and non-MWC models. We carried out M 2 R method on the separate adenine-bound and adenine-free ensembles. Prior to rescue experiments, which involve double compensatory mutants, we systematically scanned single mutants of each nucleotide to its complement, followed by SHAPE profiling across the riboswitch, which gave the two-dimensional patterns in Figure 3. Automated analyses of these mutate-and-map (M 2 ) data to determine the dominant secondary structure and to infer an approximate structural ensemble supported both the MWC and non-MWC models but did not carry the precision to discriminate between them (Supporting Figure S4). Nevertheless, the M 2 data were suggestive. As expected for a multi-state RNA, several mutations produced dramatic perturbations across the SHAPE profile. Compared to the WT sequence, many of these mutants gave protections or enhancements in the 5´-most aptamer region of the riboswitch (nts 25-32) and concomitant exposures or protections (respectively) near the 3´ end of the riboswitch (nts 105-122), which included both the start codon and Shine-Dalgarno ribosome binding sequence (arrows, Figure 3A-B). Such correlations between structuring the aptameric region and opening the gene expression platform, even in the absence of ligand, qualitatively supported predictions of the MWC conformational selection model, but we sought stronger, quantitative tests through deeper mutational analysis.
Compensatory rescue experiments assessed helix frequencies in the presence and absence of adenine and quantitatively agreed with prior models ( Figure 3E-F). First, in all prior models, the adenine-bound riboswitch (holo, Figure 2) was expected to display P1, P2, and P3 in the aptamer and P5. Indeed, in the presence of 5 mM adenine, SHAPE measurements for single and compensatory mutations targeting these helices and rescue factor analysis gave helix frequencies of 79%, 64%, 56% and 29%, respectively ( Figure 3E,3G). Each of these cases showed visually striking rescue in their SHAPE profiles as mutations disrupted adenine binding, and compensatory mutations restored the adenine-bound profiles ( Figure 3C, top 'quartet' of profiles); the lower helix frequency of P5 was consistent with its assumed partial opening, as noted with one-dimensional mapping. Measuring rescue factors for other tested helices, including P1B, P2B, P4A, P4B, and P4C predicted for the apoB state and numerous helices (P6, P7, through P17; SI Figure S4) proposed to occur at low frequencies from M 2 -guided modeling 16 , gave no evidence for their presence. We derived upper bounds on these helices' frequencies of 16% or lower; in cumulative distributions of posterior probabilities, there was a clear separation from the high-frequency helices ( Figure 3C, bottom quartet of profiles; Figures 3E-3G; SI Figure   S5; and Table 1). In addition, we used a double-base-pair mutation scheme for M 2 R, which induced stronger perturbations and rescuing effects, and supported the same pairings as above (SI Figure S6 and Table 2).
In addition to their predictions above for the adenine-bound state, all prior models of the add riboswitch expected the RNA to show a diverse set of helices in the absence of adenine, reflecting an admixture of apoA vs. apoB. Helices P3 and P5 are expected to be present at high frequencies, and both are detected by compensatory rescue experiments, with median helix frequencies of 73% and 75%, respectively ( Figure 3F). In all prior models (Figure 2), other helices were predicted to be present at lower but significant frequencies. Compensatory rescue experiments recovered evidence for the apoB-specific helices P4A (48%), P1B (44%), and P2B (17%); the apoA-specific helices P1 (53%) and P2 (31%); and the apoB helices whose presence in apoA is under question, P4B (55%) and P4C (68%) ( Figure 3D, 3F, 3H). As with the adenine-bound conditions, the double-base-pair M 2 R results were consistent with the single-base-pair M 2 R results (SI Figure S6). In addition, helices P6, P7, through P17 (SI Figure S5) again served as negative controls and gave low helix frequencies by compensatory rescue, as expected (pink cumulative distributions in Figure 3H; Table 1). In principle, the results for helices P4B and P4C could discriminate between the models under question (Figure 2A-B), as the non-MWC model predicts their frequencies to be close to 100%, while the MWC conformational selection model predicts their frequencies to be lower, in the same range as P4A (48%; compare yellow and blue cumulative distributions in Figure 3H). While the low helix frequency estimates favor the predictions from the MWC conformational selection model, remaining uncertainties in these values (broadness of probability distributions in Figure 3H; and Tables 1-2) precluded incisive discrimination between the models based purely on M 2 R experiments with the WT add sequence, and we turned to the four-dimensional chemical mapping method that involves M 2 R in mutational backgrounds that lock each riboswitch state.

Four-dimensional Lock-Mutate-Map-Rescue enables dissection of ensemble states
To stringently discriminate between the MWC and non-MWC models, we sought to measure the presence or absence of helices P4B and P4C in the apoA state, taking advantage of mutant backgrounds that 'lock' this state and the ability of quantitative compensatory rescue to estimate helix frequencies (lock-mutate-map-rescue, LM 2 R). While our original single-mutant scan ( Figure 3B and SI Figure S4) suggested several such mutants, we were concerned that their effects involved destabilizing the P4 domain and might bias our measurements towards favoring the MWC model. Instead, we took advantage of mutations outside P4 that stabilized P1 or P2, components of the aptamer secondary structure that define apoA in all available models. A number of double-base-pair compensatory mutants for P1 and P2 exhibited SHAPE profiles that agreed well with each other ( Figure 4A), supporting their use as mutant backgrounds for isolating apoA. We noted that all of these apoA-stabilizing mutants exhibited increased reactivity in the expression platform region (nts 105-122) relative to the adenine-free WT RNA, at a level comparable to the adenine-bound WT RNA ( Figure 4A); this observation already suggested that apoA is similar to holo in this region, favoring the MWC model over the non-MWC model. We also observed similarities in the gene expression platform between these adenine-free apoAstabilizing mutants and adenine-bound WT RNA with other chemical modifiers (dimethyl sulfate, glyoxal, ribonuclease V1, terbium(III), and ultraviolet irradiation at 302 nm; SI Figure S7).
We selected two mutants for lock-mutate-map-rescue probing apoA: lock-P1 (A19C, U20G, A76C, U77G) and lock-P2 (U28G, A29C, U41G, G42C). As a consistency check, we confirmed that compensatory rescue gave a strong P1 helix frequency (72%) when P2 was locked. Similarly, compensatory rescue targeting P2 gave a significant helix frequency (42%) when P1 was locked ( Figure 4C, SI Figure S8, and Table 2). As a further check, both models under question predicted the absence of P4A in apoA; indeed, in either the lock-P1 or lock-P2 background, we did not detect compensatory rescue targeting P4A, and could set an upper bound on its frequency of 4%.
Finally, we tested the helix frequency of P4B, which was predicted to be absent (0%) in apoA by the MWC model and present (100%) in the non-MWC model (Figure 2A). We saw no evidence for compensatory rescue of P4B in either apoA-stabilizing background. Measurements in the lock-P1 background and in the lock-P2 background gave P4B helix frequencies of 4%, in both cases ( Figure 4C; and compare blue and yellow curves in Figure 4E). Separate experiments on a variant MutP2 (A29C, A30G, U40C, U41G) used to stabilize apoA in prior studies 25 similarly gave no evidence for P4B (SI Figure S9, Table 3). Taken together, these measurements provided strong evidence against the non-MWC model and favor the MWC model.
As an independent test of the helix content in apoA vs. apoB, we carried out a 'flipped' lockmutate-map-rescue experiment compared to the experiments above. Rather than locking P1 and P2 and assessing the presence of P4B, we locked P4B and assessed the presence of P1 and P2. In the MWC model ( Figure 2A background showed no rescue for P1 and P2 (median helix frequencies estimated at 5% and 6%, respectively), again favoring the MWC model over the non-MWC model (SI Figure S10).

Four-dimensional chemical mapping for complex RNA ensembles
Complex secondary structure ensembles underlie many, and perhaps most, RNA regulatory elements. Understanding how a panoply of structures underlies the allosteric logic of these elements requires assessing not just the frequencies of different helices in the RNA structural ensemble but how the presence of one helix enhances or suppresses the frequency of others. Fourdimensional chemical mapping offers an experimental route to this information that is unique in its high throughput and use of standard equipment (capilly electrophoretic sequencers). The strategy ( Figure 1) is to 'lock' one helix into place through mutations (dimension 1) and to then introduce mutations elsewhere (dimension 2). Perturbation of the chemical mapping profile at other nucleotides (dimension 3) provides evidence of involvement by the mutated nucleotide in RNA structure, perhaps a second helix. If compensatory mutation of a candidate partner in the second helix (dimension 4) then gives no, partial, or complete rescue of the mapping profile, the frequency of the second helix can be inferred to be low, medium, or high, with uncertainties estimated from simulation. The current throughput of PCR assembly and capillary electrophoretic sequencing enables single-nucleotide-resolution mapping of many hundreds of variants per experiment at the signal-to-noise needed for this four-dimensional lock-mutate-map-rescue (LM 2 R) workflow. With this throughput, dissection of the co-occurrence or exclusivity of variable helices underlying a 100 to 200 nucleotide RNA domain becomes feasible.

Resolving the core mechanism of a paradigmatic riboswitch
This work demonstrates application of four-dimensional chemical mapping to resolve a fundamental biochemical question. Applying the methodology to the add riboswitch from V. vulnificus gives the model shown in Figure 5, consistent with a Monod-Wyman-Changeux-like 'standard model' of allostery rather than a non-MWC revision proposed after detailed NMR experiments. The two models differed in one core aspect: how ligand sensing in an aptamer region is allosterically communicated to expose a ribosome binding site and turn on add mRNA gene expression ( Figure 2). Early M 2 experiments showed that, even in the absence of adenine, mutations that gave protections in the 5´ region of the add riboswitch also gave exposures in the 3´ region including the ribosome binding site ( Figure 3A). Furthermore, different mutants that locked the correct adenine secondary structure without adenine exhibited SHAPE data that were indistinguishable from the adenine-bound riboswitch in the gene expression platform and, more broadly, at all nucleotides downstream of the aptamer region ( Figure 4A-B). The most incisive analysis came from LM 2 R. In experiments locking P1 and P2, which are specific to the aptameric secondary structure, the rescue data give evidence against the formation of P4 helices that would sequester the ribosome binding site. In other words, the add gene expression platform is ON when the sequence-separate aptamer region is folded even in the absence of the adenine ligand ( Figure 2; Figure 5, apoA). Conversely, LM 2 R experiments locking P1B, P4A, or P4B turn the riboswitch OFF and give evidence against the formation of P1 and P2 helices.
These data favor anti-correlation between aptamer secondary structure formation and structures sequestering the ribosome binding site even without adenine binding, as predicted in the MWC conformational selection model, and disfavor a non-MWC revision that omits this coupling (compare Figure 2A and Figure 2B). Stated differently, allosteric communication of the aptameric region and the gene expression platform is an intrinsic property of the folding landscape of the add riboswitch sequence that can be established without adenine. A simple conceptual understanding of this property involves considering a "switch region" 15 (nucleotides 75-81, Figure 5) that is able to form base pairs with nucleotides at the 5´ end of the RNA (anti-switch region) but is also able to form pairs (in P4A) with four nucleotides in the Shine-Dalgarno ribosome binding site at the 3´ end of the untranslated region (SD, Figure 5). Depending on whether the former or latter pairing is 'chosen' by the switch region, the RNA either closes the P1 helix of the aptamer secondary structure (P1) or closes the P4 domain that sequesters the ribosome binding site. As noted above, despite its similarity to MWC allosteric mechanisms for proteins, this model for the add riboswitch shows much more dramatic changes in structure than is typically seen in protein allostery, where conformational shifts typically preserve secondary structure 8,26 .

Comparison to prior data and proposals
The model in Figure 5 is consistent with all data collected in this study as well as all measurements on the V. vulnificus add riboswitch published to date. First, a number of prior studies applied techniques that do not directly read out base pairing but are sensitive to ligand binding and can quantify energetics and kinetics. Functional measurements in vitro and in vivo, use of fluorescent reporters, and single-molecule force experiments all have given apoA frequencies of ~40% at 20-30°C. [13][14][15][16][17][18][19][20][21][22][23]33 Second, X-ray crystallographic studies have been limited to constructs with just the aptameric region 24,[35][36][37][38][39][40][41][42] ; these structures all show P1, P2, and P3 helices, in accord with Figure 5. Third, the detailed NMR measurements that suggested the alternative apoA ( Figure 2B) were also consistent with the apoA base pairings in the MWC model ( Figure 2A): nuclear Overhauser effect (NOE) spectroscopy unambiguously established that pairings in both P1 and P4A occur in the absence of adenine, but could not infer whether these helices might co-occur or be mutually exclusive (Figure 2A-B). Fourth, analogous to our locking approach, the NMR study used a variant MutP2 (A29C, A30G, U40C, U41G) to stabilize the aptamer secondary structure. While our chemical mapping measurements suggest that this mutant only partially stabilizes P2 (SI Figure S5), both the NMR study and our LM 2 R measurements detect P1 and no P4A in this background (SI Figure S9), supporting the MWC model in Figure 5.
Fifth, our SHAPE and multi-probe chemical mapping data suggest partial opening of P5 in apoA and holo ( Figure 2G-H), and our LM 2 R analysis suggests only partial formation of P2 in apoA ( Figure 3E) and partial formation of P2B in apoB ( Figure 3F). These observations are again consistent with NMR studies which could unambiguously detect P5 and precisely measure relative frequencies of, e.g., P2 vs. P2B, but could not establish these helices' absolute frequencies. Sixth, newer single molecule FRET measurements detect transient formation of the L2/L3 tertiary contact prior to adenine binding in apoA and stabilization upon adenine binding to holo 27 . These data are consistent with, and indeed were interpreted within, an MWC framework in which formation of the aptameric secondary structure occurs in concert with opening of the gene expression platform, even without adenine to shift the equilibrium in favor of the 'ON' state.
Seventh, the prior detailed NMR analysis led to a compelling non-MWC model of riboswitch temperature robustness that involves compensating the improving affinity of the aptamer for adenine at colder temperatures by a pre-equilibrium that favored apoB over apoA at those temperatures. Although the precise predictions of the switching efficiencies of this model are affected by the interpretation of whether apoA is ON or OFF, approximately the same temperature compensation occurs in the MWC conformational selection model of Figure 5 (SI Figure S11).
There is now concordance across numerous measurements for an MWC model that posits strong secondary structure similarities between apoA and holo. We also propose herein that further heterogeneity is likely to be present in apoB, the 'OFF' state, as LM 2 R measurements with different helix stabilizers gave different results for this state, suggesting incomplete correlation of the tested 'signature' helices (P1B and P4A). Numerous alternative secondary structures are possible for this state (see, e.g., SI Figure S4). The presence of these myriad helices, each at low population, would explain their detection difficulty with this and prior bulk equilibrium techniques but their appearance in single-molecule measurements. 19 Supporting this picture, higher-order M 2 analysis in single mutant backgrounds (SI Figure S12) isolate and confirm numerous alternative apoB structures. We note that there are no functional reasons for apoB to maintain a single secondary structure -it simply has to disallow adenine binding while keeping the gene expression platform closed in P4. Without selection for a pure single structure, we therefore suggest that apoB has remained structurally heterogeneous, and the exact populations of its helix pairings outside P4 may shift in different solution conditions and flanking sequences while still being consistent with riboswitch function. We further speculate that 'non-functional' states of other riboswitches and RNA gene regulatory modules may have highly heterogeneous structures and, indeed, this feature might explain why those states have been refractory to conventional structural biology approaches developed primarily to dissect protein structure/function. This picture also implies that antibiotics targeting any specific apo structure of the add and other riboswitches are unlikely to succeed, as single mutations that disrupt the targeted structure while still maintaining the apo ensemble should be easy to find and would offer resistance.

General applicability of 4D RNA chemical mapping
The next challenge for understanding riboswitches and other cis-regulatory RNA elements is to test how the structural ensembles defined through in vitro studies are retained or altered by cotranscriptional effects, protein binding, helicases, crowding, noise due to low numbers of molecules, and other complexities of these molecules' native biological environments. 32,43 Amongst available biochemical and biophysical techniques, chemical mapping methods read out by sequencing have unusual promise in delivering single-nucleotide-resolution structural information in such environments. As techniques improve to edit genomes across organisms and to amplify and measure chemical mapping signals in cellulo, the lock-mutate-map-rescue approach developed here offers the possibility of dissecting complex structural ensembles for these shape-shifting molecules.

ACKNOWLEDGMENTS
We thank M. Ali, S. Doniach, and C. VanLang for early discussions and experiments on the add riboswitch. We thank Drs. H. Schwalbe and B. Fürtig for personal communication regarding temperature compensation and NMR analysis and sharing a manuscript before publication. We acknowledge financial support from NIH R01 GM102519 (to R.D.).

RNA Synthesis and Construct Design
Double-stranded DNA templates were prepared by PCR assembly of DNA oligomers with maximum length of 60 nt ordered from IDT (Integrated DNA Technologies). DNA templates contain a 20-nt T7 RNA polymerase promoter sequence (TTCTAATACGACTCACTATA) on the 5´ end and a 20-nt Tail2 sequence (AAAGAAACAACAACAACAAC) on the 3´ end. The sequence of interest is flanked by one hairpin with single-stranded buffering region on each end. 44 The primer assembly scheme and plate orders for all constructs were automatically designed by Primerize-2D. 45 PCR reactions, including 100 pmol of terminal primers and 1 pmol of internal primers were carried out as previously described. PCR products were purified using Ampure XP magnetic beads (Agencourt) on a 96-well microplate format following manufacturer's instructions. DNA concentrations were measured on a Nanodrop 1000 spectrophotometer (Thermo Scientific). In vitro transcription reactions were described previously, followed by similar purification (using Ampure XP beads with externally added 10% PEG-8000) and quantification steps.

Chemical Modification
M 2 , M 2 R and LM 2 R chemical mapping were carried out in 96-well format as described previously 29,31 . Prior to chemical modification, 1.2 pmol of RNA was heated to 90 °C for 2 min and cooled on ice for 2 min to remove secondary structure heterogeneity, then folded for 20 min at 37 °C in

CE Data Processing
The Hit RACE 2.0 software was used to analyze CE (capillary electrophoresis) data. [46][47][48] Electrophoretic traces were aligned and baseline subtracted using linear and non-linear alignment routines as previously described. 44 Sequence assignment was accomplished semi-automatically with human supervision. Band intensities were obtained by fitting profiles to Gaussian peaks and integrating. Normalization, correction for signal attenuation, and background subtraction were enabled by inclusion of referencing hairpin loop residues (GAGUA) at both 5´ and 3´ ends, 10x dilution replicates, and no-modification controls. Briefly, true values for saturated peaks were obtained from 10x dilutions. Signal attenuation was corrected from 5´ to 3´ ends based on the relative reactivity between 5´ and 3´ referencing hairpin loop intensities. 44 Reactivities of SHAPE profiles were normalized against GAGUA, while other modifiers in multi-probe mapping were normalized to subsets of GAGUA which are reactive to that particular modifier.

Data Deposition
All chemical mapping datasets, including M 2 , M 2 R, LM 2 R and multi-probe mapping, have been  Table 1. Helix frequency estimate results for single base-pair M 2 R Median helix frequencies from Rfam simulations are reported along with 90% confidence range from the same bin. The rescue factor for each helix is averaged across all tested base pairs. Median helix frequencies from Rfam simulations are reported along with 90% confidence range from the same bin. The rescue factor for each helix is averaged across all tested base pairs. Median helix frequencies from Rfam simulations are reported along with 90% confidence range from the same bin. The rescue factor for each helix is averaged across all tested base pairs.  (C) Correlation of observed rescue factor against helix frequency for helices predicted with length longer than 2-bp from 325 Rfam families mutated and folded in silico. Each data point represents a helix whose rescue factor and simulated helix frequency has been averaged across all its basepairs. Helices are colored by their length. The rescue factor estimates the extent of similarity restored in a double mutant's reactivity compared to WT (wild type), scaled by perturbations observed in single mutants.

LEGENDS
(B) In silico M 2 R quartets illustrating full compensatory rescue (high rescue factor), partial rescue (mid rescue factor) and no rescue (low rescue factor).

Mutate-Map-Rescue (M 2 R) -Application Generality and Automated Classification
Before interrogating add riboswitch ensembles, we sought a method that would enable rapid and unbiased readout of helix frequencies. Our prior work on mutate-map-rescue (M 2 R) tested RNA structures by disrupting a base pair by single mutation, and testing whether compensatory double mutations restore such perturbations, but relied on manual inspection to assess rescue, and focused on a single model system, a four-way junction domain of E. coli 16S rRNA. 1 We first sought to assess the generality of Uncertain; the last one was assigned for cases that either 1) single mutants failed to introduce discernible perturbations, or 2) the rescuing effect was 'half-way' and hard to assign. After training with the in vitro M 2 R data on the GIR1, P4-P6, and IRES measurements as well as in silico simulated counterparts, we obtained a classifier that recovered human expert calls (SI Figure S2).
Next, we simulated M 2 R in silico on 325 RNA families from the Rfam database 6 whose lengths were in range between 100 and 250 nt. More than 37,000 base pairs were tested from helices that were longer than 2 bp and with predicted base pairing probability (BPP) greater than 1%. SI Figure S13 shows the BPP distribution of the 4-category classification. These results confirmed that the Validated and Falsified categories corresponded to base pairs with high and low BPP. Unexpectedly, we also observed a quantitative correlation between the in silico predicted BPP and the 'rescue factor' metric that underlies our classification, even for base pairs that appear with BPP frequencies between 0.3 and 0.7 ( Figure 2A).
Furthermore, for helices of longer length, this relationship became more well-defined. We reasoned that this correlation would enable estimation of BPP based on helix length and the 'rescue factor' metrics measured and averaged over all the base pairs for a given helix (SI Figure S13), and that this quantitative metric would be more informative than a qualitative 4-category classification. Further support for using the 'rescue factor' to estimate a posterior probability for tested helix frequencies arose from our studies on the adenine riboswitch with and without adenine, which could be compared to 'gold standard' helix frequency estimates from NMR analysis, as described in the main text.

Current Limitations of M 2 R and Classifier
We trained an automated classifier based on previous M 2 R in vitro data combined with in silico simulations. The classifier was later independently tested on add single-base-pair M 2 R, double-base-pair M 2 R, and LM 2 R datasets, showing agreement with calling by an expert human referee (S.T.) ( Figure 3A).
Moreover, the BPP distribution of the simulated in silico counterparts for add riboswitch follows the same trend as Rfam tests (SI Figure S14), in agreement with its generality. The performance of such a simple classifier was acceptable and supported our heuristic for the rescue factor (SI Figure S13). Our preliminary attempt on 4-category classification gave conservative conclusions for M 2 R quartets, with 'Partially Validated' as weaker cases for incomplete compensatory rescue. For final analyses, we chose to present final rescue factors and corresponding helix frequencies (see, e.g., main text Figures 3 and 4); the classifier based on the rescue factor gives a more qualitative picture of the results, but allows simple visual checks that for each helix, different base pairs tested for compensatory rescue give concordant 'calls' for the frequency of the helix ( Figure S14).

Adenine riboswitch construct design
We performed the M 2 R pipeline on a 128-nt add riboswitch construct, which has 15 extra nucleotides on the 3´ end into the coding sequence compared to a 112-nt one used in a recent study. 7 We chose this longer add version after reproducing prior work in our laboratory [8][9] finding that the sequence context has an effect on the folding landscape of this riboswitch (SI Figure S5A). Although the 112-nt construct showed weak ligand-responsive chemical reactivity changes in the SD region under the salt conditions used in NMR 9 , it does not show noticeable switching under our in vitro solution conditions. The extended construct gives chemical reactivity changes under all conditions tested. Besides M 2 (SI Figure   S5B-C), we performed single-base-pair M2R on the add riboswitch ( Figure 3, SI Figure S6).

Multi-probe chemical mapping tests partitioning predicted by the MWC model
As discussed in the main text introduction, the MWC model requires that, even in the absence of adenine ligand, the add riboswitch samples two states: apoA and apoB, that have secondary structures compatible and incompatible with aptamer adenine binding, respectively, and that these states are more likely or less likely, respectively, to have free ribosome binding sites. We tested this model with the aid of the lock-P1 and lock-P4A mutants, which isolated the apoA and apoB states, respectively, and chemical mapping to monitor the riboswitch structural ensemble at nucleotide resolution. To stringently test the prediction, we expanded our chemical mapping protocol to include five additional modifiers beyond SHAPE: DMS (dimethyl sulfate) 10 , glyoxal 11 , RNase V1 12 , terbium(III) [13][14] , and ultraviolet irradiation at 302 nm 15 , giving a total number of 1920 measurements measured in three sequences that would test a two-state partitioning. As expected from the model of apoA vs. apoB, these multiple modifier profiles varied significantly between the lock-P1 and lock-P4A mutants, especially in the expression platform regions (see, e.g., nts 98-112). Nevertheless, as predicted from the MWC model, a simple linear combination of the profiles, assuming 48% of the apoA state and 52% of the apoB state recovered the data measured for the adenine-free wild type riboswitch within experimental errors, except at nucleotides 79-81 (SI Figure   7A-B). We note that there was only one parameter optimized in this fit; scaling of each data profile was carried out based on well-defined flanking hairpins as normalization standards 16 . These values for the apoA vs. apoB state frequencies agree within errors with helix frequencies measured above in compensatory rescue experiments as well as with the prior NMR experiment at similar temperature (40% apoA at 30 °C) 17 . However, the discrepancies at nucleotides 79-81 were reproducible, suggesting that there are additional secondary structures, potentially many at low frequencies, beyond those detected here in the ligand-free ensembles.

Supporting Methods
In silico Partition, Autoscore Classifier and Rfam Sampling In silico RNA SHAPE profiles were simulated using the partition executable from the RNAstructure package version 5.6 [18][19] . The resulting pair-wise probability matrices was projected into a onedimensional vector to get per-residue base pairing probabilities (BPP). For M 2 R quartet simulations, the targeted base pair was mutated to G-C or C-G pairs (i.e., the 'Stable' library from Primerize-2D 20 ).
Flanking sequences (5´ or 3´) were not included for simulation.
The autoscore classifier utilizes a 'ratio' metric for classification. Normalized SHAPE profiles are used as input. First, the difference between two SHAPE profiles are calculated by: If the value of (1) is less than a CUTOFF, the Uncertain class is assigned. Otherwise, continue to determine the amount of rescuing effect by the double mutant (AB): 6 [2] defines the 'ratio' metric, i.e., the ratio between the distance of AB to WT and the maximum distance of A or B to WT. If the value of [2] is less than LOW, the Validated class is assigned due to the relatively small residual of difference of AB to WT compared to single mutants. If the value of (2) is greater than HIGH, the Falsified class is assigned. For values of (2) between LOW and HIGH, the Partial class is assigned, which captures cases that show incomplete rescue or a mixed result. The

A A G A C U C A UG A A U U A C U U U G A C C U G C C G
20 40 60 80 100 120 140

C G C U U C A U A U A A U C C U A A U G A U A U G G U U U G G G A G U U U C U A C C
60 80 100 120 140