Cotranscriptional RNA strand exchange underlies the gene regulation mechanism in a purine-sensing transcriptional riboswitch

Abstract RNA folds cotranscriptionally to traverse out-of-equilibrium intermediate structures that are important for RNA function in the context of gene regulation. To investigate this process, here we study the structure and function of the Bacillus subtilis yxjA purine riboswitch, a transcriptional riboswitch that downregulates a nucleoside transporter in response to binding guanine. Although the aptamer and expression platform domain sequences of the yxjA riboswitch do not completely overlap, we hypothesized that a strand exchange process triggers its structural switching in response to ligand binding. In vivo fluorescence assays, structural chemical probing data and experimentally informed secondary structure modeling suggest the presence of a nascent intermediate central helix. The formation of this central helix in the absence of ligand appears to compete with both the aptamer’s P1 helix and the expression platform’s transcriptional terminator. All-atom molecular dynamics simulations support the hypothesis that ligand binding stabilizes the aptamer P1 helix against central helix strand invasion, thus allowing the terminator to form. These results present a potential model mechanism to explain how ligand binding can induce downstream conformational changes by influencing local strand displacement processes of intermediate folds that could be at play in multiple riboswitch classes.


INTRODUCTION
RNA folds during transcription (1)(2)(3). During this cotranscriptional folding process, RNA molecules traverse sequential structural transitions that lead to the formation of helices, loops, junctions, pseudoknots and other elements that are important for RNA function in many biological processes (4,5). Cotranscriptional RNA folding has been shown to be important for establishing the functional folds of catalytic RNAs (6,7), assembling the ribosome and spliceosome (8)(9)(10), regulating RNA processing (11) and facilitating gene regulation (12). While we are beginning to understand the importance of cotranscriptional RNA folding, we still lack a complete mechanistic understanding of how specific RNA sequences can encode folding pathways, and how these folding pathways facilitate diverse cellular functions (13).
Riboswitches have emerged as important model systems for uncovering principles of cotranscriptional folding (14,15). These compact non-coding RNA elements regulate gene expression in cis in response to the binding of a specific chemical ligand, thus acting as RNA biosensors in a wide distribution of bacteria species as well as fungi and plants (16)(17)(18)(19). Riboswitches achieve this function through their architecture which consists of a ligand binding aptamer domain (AD) that lies upstream of an expression platform (EP) whose folding state determines the functional gene expression outcome (20). Riboswitches are typically classified by the cognate ligand they respond to, the type of regulatory control the EP exerts on either the translational, transcriptional or RNA splicing level, and whether they upregulate or downregulate gene expression in response to ligand binding (17,21). Transcriptional riboswitches contain an EP that includes an intrinsic terminator hairpin that, once folded, prevents transcription elongation (20). These riboswitches are particularly important model systems for understanding principles of cotranscriptional folding because AD folding, ligand binding and EP folding need to occur within the short time window of active transcription (22).
While studies have extensively characterized the structures of riboswitch ADs (23)(24)(25)(26)(27), current research is only just beginning to understand how structural features induced by ligand binding in the AD can alter the EP structure and determine gene expression outcomes during transcription (28)(29)(30)(31)(32)(33)(34). Extensive progress has come from the study of transcriptional 'ON' riboswitches that contain an overlap in sequence between the AD and EP that gives rise to mutually exclusive folds of these two domains (Supplementary Figure S1). In these cases, studies have shown that ligand binding stabilizes the AD fold, which prevents the EP from forming a terminator hairpin (28)(29)(30)35). On the other hand, when a ligand is not bound, EP folding disrupts the AD structure through RNA strand displacement. During this process, the mutually exclusive helices interconvert--a set of nucleating base pairs seed the formation of a new helix while unwinding the existing helix (36,37). Thus, when there is an overlap in sequence between the ligand binding portion of the AD and EP, there appears to be a clear mechanism by which ligand binding can affect downstream folding of the EP through RNA strand displacement (38,39).
However, out of all identified riboswitch classes to date, many of them do not encode a direct sequence overlap between the AD and EP (21,40,41) (Figure 1). This observation raises an important question for understanding the mechanisms of these riboswitches: how can ligand binding in the AD 'communicate' downstream structural changes and determine folding outcomes of the EP (38,39)? Given the role of strand displacement mechanisms among diverse riboswitches (30,41,42), as well as within the cotranscriptional folding pathways of other important noncoding RNAs such as the signal recognition particle (SRP) RNA (43), we hypothesize that riboswitches without direct AD and EP overlap still rely on a ligand-dependent strand exchange process to differentiate their folded states and enact regulatory function, however with an unknown mechanism to inhibit strand exchange in the presence of ligand.
To investigate this hypothesis, we focused our studies on the Bacillus subtilis yxjA guanine riboswitch, a transcriptional 'OFF' riboswitch that downregulates the expression of a nucleoside transporter protein in response to excess guanine as a part of nucleotide biosynthesis pathway regulation (44,45). As part of the purine-sensing riboswitch family, the yxjA riboswitch shares 75.4% sequence conservation and a similar predicted AD structure with the adenineresponsive B. subtilis pbuE riboswitch (Figure 1 and Supplementary Figure S1). These AD aptamers consist of a P1 helix at the base of a three-way junction (3WJ), with two protruding P2 and P3 helices that are oriented through a kissing hairpin pseudoknot interaction (46-48) ( Figure  1A). Within the ligand binding pocket provided by the 3WJ, the J2/3 junction sequence forms a triple helix with the top of P1 while a single nucleotide in the J3/1 junction sequence determines the ligand binding specificity. Despite the similarities between the yxjA and pbuE ADs however, important differences remain, including that the pbuE riboswitch is an 'ON' regulator with inverse expression logic to the yxjA riboswitch. The pbuE riboswitch EP also shares direct sequence overlap with a large portion of the AD, extending through the P3 helix, which explains how ligand binding can dictate which mutually exclusive structure to adopt (Supplementary Figure S1). In contrast, the yxjA AD core 3WJ structure and the predicted EP fold appear to be compatible with each other at a secondary structural level ( Figure  1C and D). Thus, yxjA is a model riboswitch for investigating how ligand binding may inhibit cotranscriptional RNA strand displacement processes that help form downstream RNA structures.
In this work, we present a series of functional gene expression studies of rationally designed yxjA riboswitch mutants, motivated by preliminary RNA structure predictions, to propose a mechanism for the yxjA riboswitch to serve as a model for several identified riboswitch classes. We conducted SHAPE-Seq structure probing experiments of two intermediate and the full length riboswitch sequences within stalled transcription elongation complexes and under equilibrium refolded conditions to generate reactivity data, which we then used as restraints for RNA structure prediction algorithms. The resulting experimentally informed structure models revealed an intermediate central helix structure that is mutually exclusive with both the bottom of the P1 stem and the terminator hairpin of the EP. Mutants that disrupt base pairing of this central helix break regulatory function, while rescue mutants restore function, demonstrating its importance and suggesting that ligand binding in the AD structurally regulates the formation of the central helix. To investigate this further, we performed restrained all-atom molecular dynamics (MD) simulations that demonstrated ligand binding helps to stabilize the P1 helix against strand invasion by the central helix, thereby providing a mechanism to explain how ligand binding in the AD can influence folding of the downstream EP.
Based on our results, we propose that RNA strand exchange plays a critical role in the yxjA riboswitch mechanism and that the mechanism may rely on a competition between two strand exchange processes--central helix formation through unzipping the base of the P1 stem, and terminator formation through unzipping of the central helix--with the winner of the competition determining the riboswitch folded state and regulatory outcome. Furthermore, we suggest that ligand binding can bias this strand exchange process by stabilizing the P1 helix, providing a larger energetic barrier for strand displacement by the central helix and thereby allowing the terminator to form more efficiently to make an 'OFF' regulatory decision. We believe our observations add valuable knowledge for understanding the mechanism of how riboswitches with modular or non-overlapping aptamer domains and expression plat- The ligand is colored pink, the 5 side of the P1 helix purple, and the 3 side of the P1 helix teal (58). (B) Dose-response curve of the yxjA riboswitch regulation in Escherichia coli cells. E. coli cells were transformed with plasmids containing riboswitch expression constructs and characterized for SFGFP fluorescence using flow cytometry after 5 h of growth in M9 minimal media subcultures with or without 2-aminopurine (2AP) added. Dots represent averages from three biological replicates, each performed in triplicate technical replicates for a total of nine data points (n = 9), with error bars representing standard deviation. A blank control with a constitutive promoter and no SFGFP was used to measure any cellular background fluorescence levels in response to ligand in the media. A non-ligand-responsive riboswitch construct (C79A) was used as a negative control to account for non-riboswitch regulated changes in fluorescence in response to ligand in the media. WT represents the wild-type yxjA riboswitch sequence in the expression construct, mutated (C79U) to recognize adenine and 2AP (see Materials and Methods). Riboswitch secondary structure models in (C) represent the predicted anti-terminated conformation consisting of an intact aptamer domain with C79U mutation, a partially disrupted P1 stem, and the central helix formed. Secondary structure models in (D) represent the predicted terminated conformation consisting of the intact aptamer domain with C79U mutation, a fully formed P1 helix and the formed intrinsic terminator expression platform. Secondary structure models are based on minimum free energy predictions from Mulhbacher and Lafontaine, 2007 (44). Nucleotide regions are colored to represent base pairs that are mutually exclusive between the anti-terminated and terminated structure models (teal), nucleotides that are predicted to complete the P1 helix on the 5 side (purple), and nucleotides predicted to complete the terminator on the 3 side (green). In (D), the ligand is shown in proximity to the 3WJ ligand-binding pocket, where the residue U79 that makes ligand contacts is highlighted in orange. forms can make genetic decisions. These results also provide a possible overarching mechanism for a broad number of riboswitch classes and reveals insights into how competing RNA strand displacement processes may influence cotranscriptional RNA folding pathways relevant to a range of important biological phenomena, from RNA processing to macromolecular assembly. We also anticipate this work to directly inform the design of synthetic riboswitch systems for a range of important synthetic biology applications (38,(49)(50)(51) as well as the design and mechanistic understanding of RNA-targeting drugs (52).

Cloning and plasmid construction
To prepare each riboswitch construct (Supplementary Data File S1), the Bacillus subtilus yxjA (nupG) 5 -untranslated region (UTR) sequence encoding the riboswitch as wildtype (WT) and mutants was cloned into p15a plasmid backbones with chloramphenicol resistance. The wild-type sequence was mutated (C79T) to recognize adenine and 2aminopurine (2AP) (44) and is referred to as WT in this work. Inverse polymerase chain reaction (iPCR) was used to introduce the riboswitch sequence downstream of the Escherichia coli sigma 70 consensus promoter (annotated as J23119 in the registry of standardized biological parts, Supplementary Data File S1) and upstream of a ribosome binding site and the coding sequence of the superfolder green fluorescent protein (SFGFP) reporter gene. Various lengths of the riboswitch leader sequence from the B. subtilis genome were similarly inserted between the promoter and riboswitch sequence when optimizing constructs for testing functional response to ligand binding. For creating different mutants using iPCR, primers were ordered from Integrated DNA Technologies (IDT) and phosphorylated using a New England Biosciences (NEB) phosphorylation kit (PNK enzyme, 10X T4 DNA Ligase Buffer, primers). The 100 l PCR reactions were mixed with 1-10 ng/l of DNA template, 200 M dNTPs, 1X Phusion Buffer, 400 nM each of phosphorylated forward and reverse primers and 0.25 l of Phusion Polymerase (2000 U/ml) (NEB). PCR products were run on 1% agarose gels to confirm successful amplification. DpnI (NEB) was used for digestion and T4 DNA Ligase (NEB) for ligation. Final amplified products were transformed into NEBTurbo competent cells and plated on chloramphenicol agar plates and incubated overnight at 37 • C. Single isolated colonies were selected the next day to inoculate cultures in LB media, mini-prepped, and then sequence confirmed using Sanger Sequencing (Quintara).

In vivo flow cytometry fluorescence reporter assay
Fluorescence assays (Supplementary Figure S2) on the flow cytometer were performed in E. coli strain TG1 cells with three independent biological replicates. Plasmids tested for each replicate (Supplementary Data File S1), including yxjA WT (C79T mutation, pJBL730), yxjA ON mutant control (C79A mutation, pJBL735), yxjA experimental mutants, empty control plasmid (pJBL001) and constitutively expressed SFGFP control plasmid (pJBL736), were transformed into competent E. coli TG1 cells, plated on LB agar plates containing 34 g/ml chloramphenicol, and incubated overnight at 37 • C, for approximately 18 hours. After overnight incubation, the plates containing transformed E. coli colonies were removed from the incubator and kept at room temperature at benchtop for approximately 5 h. The three replicates of riboswitch constructs and controls were picked from different plate colonies to inoculate separate aliquots of 300 l of LB media containing 34 g/ml chloramphenicol in a 2 ml 96-well block. The block was covered with a breathable seal and incubated in a benchtop shaker at 37 • C overnight while shaking at 1000 rpm for 18 h. The next day, cultures for each biological replicate were used to inoculate subcultures in triplicate for each of three technical replicates. Each technical replicate was created by transferring 4 l of an overnight shaking culture to each of two 96-well blocks containing 196 l of M9 media (1X M9 salts, 1 mM thiamine hydrochloride, 0.4% glycerol, 0.2% casamino acids, 2 mM MgSO 4 , 0.1 mM CaCl 2 , 34 g/ml chloramphenicol) with either 1 mM 2-aminopurine (2AP) dissolved in 1% acetic acid (with ligand condition) or an equal volume of 1% acetic acid (no ligand condition). The block was covered with a Breath-Easier sealing membrane (Sigma-Aldrich) and incubated in a benchtop shaker at 37 • C while shaking at 1000 rpm for 5 h unless otherwise indicated. Samples for fluorescence analysis were prepared after incubation by transferring 4 l of exponential phase culture to 196 l of phosphate-buffered saline (1X PBS) and kanamycin (50 g/ml) in 96-well v-bottom plates (Costar).
Fluorescence measurements (Supplementary Data File S2) were collected on a BD Accuri C6 Plus with a CSampler Plus using the BD CSampler Plus software. E. coli events were collected using the FSC-A threshold set to 2000. GFP fluorescence was measured by collecting FITC signal on the FL1-A channel.
Analysis was carried out by using the FlowJo software (v10.6.1) (Supplementary Figure S6). An ellipsoid gate was drawn around the E. coli cell population on the FSC-A versus SSC-A profile. Corresponding arbitrary fluorescence units collected by the flow cytometer were plotted against known standard molecules of equivalent fluorescein (MEFL) values from Spherotech rainbow 8-peak calibration beads (Lot: 8137522). The calibration curve was created by calculating the linear regression between measured relative fluorescence units (RFU) of each bead peak with the manufacturer provided MEFL value for each peak, with the y-intercept set to zero. The mean fluorescence of peaks in the FITC channel corresponding to the gated E. coli cell population was converted to a MEFL value by multiplying by the slope of the calibration curve. File S2). Arbitrary fluorescence units for each well were divided by OD 600 measurements in that well to normalize for the amount of cells in each sample, prior to performing averages and standard deviations across replicates. For the time course optimization experiment, measured samples were successively collected from the same subculture sampled.

DNA template preparation
DNA template libraries for cotranscriptional and equilibrium SHAPE-Seq experiments (Supplementary Table S1) were prepared by PCR amplifying plasmids containing the yxjA riboswitch sequence. About 500 l PCR reactions were mixed to a final concentration of 1X ThermoPol Buffer, 250 nM each of forward and reverse primers (Supplementary Table S2), 1.25 mM dNTPs, 5 l of Taq Polymerase (5000 U/ml) (NEB) and 100-300 ng of DNA template. Templates were amplified for 30 cycles and then purified on a 1% agarose gel. Bands were visualized using UV and then extracted using a Qiagen Gel Extraction kit. Final concentrations were measured on a Qubit 3 Fluorometer.

In vitro transcription assay with SHAPE modification
About 50 l total reaction mixtures contained 5 nM linear DNA template for each of the template lengths tested (Supplementary Table S1 For RNA extraction and purification, after adding 20 l of chloroform, 70 l of the aqueous phase was extracted and then ethanol precipitated with the addition of 50 l of isopropanol (following TRIzol extraction protocol) and rested for 10 min at room temperature. Samples were spun down for 10 min at 4 • C into a pellet, then aspirated and washed with 500 l of 70% ethanol. Dried pellets were resuspended in 43 l of water and incubated with 5 l of 10X TURBO DNase buffer and 2 l of TURBO DNase (Thermo Fisher) for 1 h at 37 • C. After digestion, 150 l of TRIzol and 40 l of chloroform were added for another round of extraction and purification. About 140 l of the aqueous phase was extracted then ethanol precipitated with the addition of 100 l of isopropanol and following the same steps as above.

Cotranscriptional SHAPE-Seq library preparation
Extracted RNA from IVT and SHAPE modification was used to prepare SHAPE-Seq libraries with the methods described in Watters et al. and Yu et al. (28,43). Additional oligonucleotides (IDT) used for library preparation can be found in Supplementary Table S2. Several modifications to previously published protocols used in this work are as follows: • Linker ligation: 0.2 pmol of RNA linker was added to each reaction. • Adapter ligation: The DNA adapter sequence and ligation protocol was adapted from the Structure-seq2 protocol described in Ritchey et al. (53). Reverse transcribed cDNA product was resuspended in 7 l of water and mixed with 0.5 l of 100 M of dumbbell adapter (final concentration 50 pmol). Reactions were incubated at 95 • C for 2 min followed by incubation at room temperature for 3 min. The ligation reaction was then set up by combining the cDNA and adapter mixture with 1X T4 DNA Ligase Buffer (NEB), 500 mM betaine, 20% PEG-8000 and 2.5 l T4 DNA Ligase in a 25 l volume solution. The ligation reaction was incubated on a thermocycler at 16 • C for 6 h, 30 • C for 6 h, 65 • C for 15 min, then 4 • C until ready for ethanol precipitation. Subsequent steps were performed according to previously published cotranscriptional SHAPE-Seq protocols.
For sequencing library preparation, the average DNA length and quality of each library sample was determined by running samples on an 8% non-denaturing PAGE gel stained with SYBR Gold (Thermo Fisher Scientific). Each library sample concentration was measured on a Qubit 3.0 and mixed sample pools were balanced with each library at equimolar amounts. The pooled libraries also included an 8% PhiX spike-in to increase sequence diversity. Sequencing data (Supplementary Table S3) were then obtained by sequencing the pooled library samples on an Illumina MiSeq using the v3 MiSeq Reagent Kit and 2 × 75 bp paired-end reads.
Reactivity analyses (Supplementary Table S4) using FASTQ files were carried out using Spats v2.0.5 and run with Python 2.7 through Anaconda 2.4. Additional options used during Spats analysis are provided in Supplementary Note S1.
SHAPE-Seq-informed secondary structure predictions were generated using Reconstructing RNA Dynamics from Data (R2D2) with the protocol and code detailed in the methods of Yu et al. (43). Additional options used during R2D2 analysis are provided in Supplementary Note S1. Predicted structures were visualized as bowplots using RNAbow software (55) and secondary structures using VARNA version 3.9 (http://varna.lri.fr/) (56).

Template-based homology and molecular dynamics modeling of the yxjA riboswitch
To model the aptamer states, recently solved X-Ray Free Electron Laser (XFEL) structures of the pbuE adenine riboswitch aptamer (PDB IDs: 5E54 and 5SWE) were used as templates to model the apo1, apo2 and holo states of the yxjA aptamer. The sequence of the wildtype yxjA aptamer (C79) was obtained from the B. subtilis genomic sequence. All modeling was performed using the Molecular Operating Environment (MOE) software package (57). In MOE, the yxjA aptamer sequence was aligned with the pbuE aptamer sequence such that the length of the conserved regions was identical (nucleotides 8-50 and 53-74), and such that the two additional nucleotides 51-52 in the yxjA aptamer fell in the variable L3 loop region. The pbuE aptamer sequence was then mutated in place based on secondary structure predictions generated by R2D2. For the apo1 state, chain A of PDB structure 5E54 was used. For the apo2 state, chain B of PDB structure 5E54 was used. For the holo state, chain A of PDB structure 5SWE was used.
To model the ligand binding pocket, a guanine nucleobase was manually inserted into the 3WJ of the modeled yxjA holo aptamer structure in MOE such that all proposed hydrogen bonding and stacking interactions were represented. A previous study showing that the global structure of the guanine binding pocket is nearly identical to that of the adenine binding pocket was used as a guide (58). Bound guanine is held in place with the same network of hydrogenbonding interactions, and C74 is the specificity-determining pyrimidine. Additionally, there are two conserved recognition triples that stack on top of each other to organize the aptamer in response to bound guanine.
To model the expression platform, the sequence and structure for nucleotides 1-7 and 75-124 containing the remainder of the P1 helix, a section of the P4 helix, and the rest of the nucleotides that form the central helix, was obtained from secondary structures of the full length terminated and antiterminated conformations, respectively. This information was provided as input into the open-source tertiary structure generator RNAComposer (59). We narrowed down the 60 conformations predicted by RNAComposer based on two criteria: (i) whether the base pairs involved in central helix formation were equidistant between their P1 and central helix complements, and (ii) whether the conformation showed good agreement with the R2D2based secondary structure predictions of the 160 nt intermediate state. The existing apo2 and holo aptamers was stitched onto the chosen 53 nt expression platform conformation in MOE to construct 124 nt hybrid models. Additionally, we created a separate holo hybrid model with a U80G mutation to mimic the broken 'ON' mutant (U91G in Figure 4C) that was studied experimentally.

All-atom molecular dynamics simulations
All-atom molecular dynamics (MD) simulations were performed using the GROMACS 2019.4 software package (60); RNA was represented by the Amber-99 force field (61) with Chen-Garcia modifications for RNA bases (62) and Steinbrecher-Case modifications for the backbone phosphate (63). Each structure was placed in the center of a cu-bic box such that there was at least 1.5 nm from all RNA atoms to the box edge in each direction. Structures were solvated with sufficient TIP4P-EW water and K + and Clions to mimic 1 M excess salt conditions. The systems were then energy minimized using the steepest descent algorithm for 10 000 steps with a 2 fs timestep and a force tolerance of 800 kJ mol -1 nm -1 . Once energy minimized, all systems were simulated in the N,P,T-ensemble using a leapfrog Verlet integrator with a time step of 2 fs. An isotropic Berendsen barostat was used for pressure coupling (64) at 1 bar with a coupling coefficient of p = 1 ps, and a velocity-rescale Berendsen thermostat was used for temperature coupling (65) with a time constant of t = 0.1 ps. The LINCS algorithm was used to constrain all bonds, and a cutoff of 1.0 nm was employed using the Verlet scheme for Lennard-Jones and Coulomb interactions.
The apo1, apo2 and holo aptamer models were equilibrated at 310 K for a total of 50 ns. Base pairs predicted using R2D2 were enforced by applying distance-dependent piecewise flat-bottomed harmonic bias restraints between the central hydrogen bond donor and acceptor of paired bases at a strength of 5 kcal/mol. After equilibration, all aptamer models were simulated at a wide range of temperatures (310-400 K) for a total of 50 ns without any restraints.
The 124 nt apo2 and holo hybrid models were also equilibrated at 310 K for a total of 50 ns. Base pairs in the AD predicted using R2D2-generated secondary structure predictions were reinforced by applying harmonic bias restraints at a strength of 5 kcal/mol.

Two-dimensional replica exchange molecular dynamics simulations
After equilibration, two-dimensional replica exchange molecular dynamics (2D REMD) simulations were run on the apo2 and holo hybrid models to directly probe the strand exchange process. The first series of 2D REMD simulations were performed using nine replicas at temperatures 373, 374 and 375 K, each performed in two attempts for a total of 18 attempts (Supplementary Table S5). Harmonic bias restraints were applied at strengths of 4.5 kcal/mol to base pairs 75A-107U, 76A-106U and 77A-105U and 5 kcal/mol to base pairs 78G-104C, 79U-103A and 81U-101A (Supplementary Figure S15). As a result of this biasing, successful strand invasion was observed under 50 ns frequently in the apo2 hybrid model but infrequently in the holo hybrid model. As such, these initial simulations were run for 22 ns each, for a total of 198 ns per simulation. To further confirm what was observed in these simulations, a second series of 2D REMD simulations was performed using nine replicas at temperatures 373, 374 and 375 K, each performed in two attempts for a total of 18 attempts (Supplementary Table S6). Harmonic bias restraints were applied at strengths of 4 kcal/mol to base pair 74G-108C, 4.5 kcal/mol to base pairs 75A-107U, 76A-106U and 77A-105U, and 5 kcal/mol to base pairs 78G-104C, 79U-103A and 81U-101A. Again, due to this biasing, central helix formation was observed under 50 ns frequently in the apo2 hybrid model but infrequently in the holo hybrid model. These simulations were run for 47 ns each, for a total of 423 ns per simulation. Finally, two 38 ns 2D REMD simulations were Nucleic Acids Research, 2022, Vol. 50, No. 21 12007 performed using nine replicas each with identical temperatures and harmonic bias restraints to probe the strand exchange process in our broken 'ON' (U80G in hybrid model, U91G in in vivo assays) holo hybrid model (Supplementary  Table S7). Altogether, the cumulative 2D REMD simulation time amounted to ∼1.6 s.

The yxjA riboswitch reporter construct downregulates GFP expression in response to ligand
We began our studies by developing a cellular assay to evaluate the function of the yxjA riboswitch. We assembled several versions of an expression construct consisting of the E. coli sigma 70 consensus promoter, a variable portion of the pre-aptamer leader sequence, an adenine-responsive B. subtilis yxjA riboswitch sequence, and the superfolder green fluorescent protein (SFGFP) coding sequence (Supplementary Figure S2). We mutated these constructs at position 79 to make the riboswitch responsive to 2-aminopurine (2AP) (C79U labeled as WT) based on a prior study that demonstrated this mutation in the yxjA riboswitch produces one of the highest 2AP binding affinities among guanine riboswitches (K D ∼ 0.59 M) (44). Although this value is weaker than its natural guanine binding affinity (K D ∼ 0.5 nM), we found 2AP convenient to use for in vitro and in vivo experiments and had confirmation that 2AP produces identical structural changes in the yxjA riboswitch from inline probing experiments in the same study (44). In addition to the wild-type aptamer sequence, we also designed a ligand-unresponsive yxjA mutant (C79A) to serve as a control. Constructs were transformed into E. coli TG1 chemically competent cells, grown overnight in LB media, then used to inoculate subcultures in M9 minimal media with or without 2AP ligand added.
We first optimized subculture growth time by collecting a time course of SFGFP fluorescence data over 6 h of subculture growth to identify the optimal time point that gave the maximum difference in fluorescence between the with and without ligand conditions. We found this time point to lie at five hours after subculture inoculation (Supplementary Figure S3). After determining this optimal time point, we collected cellular growth data at the same time point to test for potential ligand toxicity. This experiment determined that extracellular concentrations of 1 mM 2AP in the media caused some changes in cellular growth compared to the 0 mM 2AP condition but was still tolerable for our assay conditions (Supplementary Figure S4). An additional point of construct optimization involved examining the 5 pre-aptamer leader sequence that is encoded before the aptamer domain (AD). Initial studies of this construct architecture demonstrated that introducing an additional five nucleotides of the pre-aptamer leader sequence provided the maximal dynamic range of ligand response over other lengths (Supplementary Figure S5). Finally, we used this optimized construct and assay condition to characterize the dose response of the riboswitch construct to a range of ligand concentrations (Supplementary Figure S6 and Figure 1B). Collectively, these results demonstrate that our reporter system and assay conditions can characterize the function of the yxjA riboswitch in E. coli cells.
The P1 helix of the yxjA riboswitch has specific length requirements for ligand-responsive activity Once our cellular function assay was established, we next investigated the dependence of yxjA riboswitch function on the length of the aptamer P1 helix. Previous work has shown across multiple riboswitches that the P1 helix is a critical site for controlling the ligand-responsiveness of transcriptional riboswitches (66). This is particularly true for riboswitches such as the pbuE adenine or tetrahydrofolate (THF) riboswitches where ligand binding occurs within junctions, and the P1 stem not only helps to maintain the base of the ligand binding pocket but also shares direct overlapping sequence with the EP (67). In these cases, this overlapping sequence maintains mutual exclusivity between the terminated and anti-terminated structures. However, at the secondary structure level in the yxjA riboswitch, the terminator can fully form when the P1 helix is only partially disrupted, leaving the AD and ligand binding pocket intact ( Figure 1C and D). We therefore wanted to investigate if P1 length is important in this architecture, and if it would reveal details about the switching mechanism. To test this idea, we designed mutants that altered the 5 side of P1 to disrupt base pairing while preserving base pairs involved in terminator formation.
Characterization of a range of P1 shortening and lengthening mutations showed that the optimal P1 length for the yxjA riboswitch to function lies between 7 (WT − 6) and 14 (WT + 1) base pairs ( Figure 2). As P1 base pairs are removed, mutant riboswitches functioned similarly to the WT until seven base pairs remained (Figure 2A). At six base pairs or shorter, SFGFP fluorescence remained relatively high even in the presence of ligand, representing a riboswitch broken toward the 'ON' or anti-terminated structure. As P1 base pairs were lengthened, the mutants functioned similarly to WT only up until one additional base pair ( Figure 2B). Any P1 helix >14 base pairs was characterized by low SFGFP expression even without ligand present, which we interpreted to mean that the riboswitch was broken toward the 'OFF' or always terminated structure.
These results suggest that the P1 helix region between 7 and 14 base pairs plays an important role in the switching mechanism. Since this region shares sequence overlap with the terminator (Figure 1C and D), we hypothesized it could play an analogous role to the 'overlap region' between the AD and EP that ensures exclusivity between the anti-terminated and terminated structures in other riboswitches. This overlap region differs from those in the pbuE riboswitch and other 'ON' riboswitches that extend into the direct ligand binding portion of the AD. Interestingly, for yxjA to function, it requires P1 to be at least seven base pairs long, which is longer than the minimum three base pairs required for function in the pbuE riboswitch (68). Overall, this suggests that the overlap region is critical for facilitating the yxjA regulatory mechanism.

Modeled structures based on chemical probing data suggest a possible mechanism involving strand exchange
To investigate the structural intermediates that mediate the yxjA riboswitch regulatory mechanism, we performed co-A B Figure 2. Flow cytometry gene expression characterization data of P1 mutants. (A) Shortened P1 mutants were designed by mutating the 5 side of the P1 helix into the same nucleotide identity as the 3 side to prevent base pair formation (orange). Mutations were introduced in successive order beginning from the bottom of P1 toward the 3WJ (arrow). (B) Lengthened P1 mutants were designed by successively inserting additional nucleotides on the 5 side of the P1 helix (orange) upstream of the riboswitch AD and downstream of the pre-aptamer sequence, to lengthen the P1 stem (arrow). Data were collected for each group of mutants as described in Figure 1 using the indicated amount of ligand (2AP) in the media. Bars represent averages from three biological replicates, each performed in triplicate technical replicates for a total of nine data points (n = 9), with error bars representing standard deviation. transcriptional in vitro Selective 2 -Hydroxyl Acylation Analyzed by Primer Extension Sequencing (cotranscriptional SHAPE-Seq) experiments using benzoyl cyanide (BzCN) to chemically modify the RNA at more flexible residues during stalled transcription at various lengths (28,69). The resulting chemical probing reactivity data were then used as inputs into the Reconstructing RNA Dynamics from Data (R2D2) algorithm to generate experimentally informed models of cotranscriptionally folded structural intermediates (28,43). Our goal with these experiments was to observe evidence of a central helix--a structure formed by the complementary regions of P1 and the terminator ( Figure  1C)--that could form during the folding pathway, or an alternative structural model that we could use to further investigate the riboswitch mechanism. As such, we focused these experiments on transcription conditions without ligand.
We selected three distinct lengths of the riboswitch: before where strand invasion by the central helix should occur, after where strand invasion by the central helix should have occurred and the full-length riboswitch. We generated secondary structure predictions using RNAStructure (54) to help inform our selection (Supplementary Figure S7) and designed purified DNA templates that would allow the E. coli RNA Polymerase (EcRNAP) to transcribe the ri-boswitch up to a length of 135 nt (before the central helix has invaded P1), 160 nt (central helix has invaded part of P1) and 220 nt (the full length of the riboswitch).
We chemically probed these lengths of the yxjA riboswitch in stalled transcription elongation complexes with EcRNAP and tested two replicates for each length. For each transcribed RNA, we included an additional 14 nt on the 3 end of the DNA template design to account for the EcRNAP footprint (70). We also performed additional equilibrium-refolding experiments where the RNA was synthesized, purified, denatured and equilibrium refolded in the absence of EcRNAP and therefore did not include the additional 14 nt. Reverse transcription and highthroughput sequencing allowed us to uncover SHAPE reactivity information for each nucleotide of these transcripts. Our resulting SHAPE data revealed general consistencies in reactivity patterns between replicates ( Supplementary Figures S8 and S9). We then used the SHAPE data to experimentally inform computational structure models with R2D2 to uncover details of cotranscriptional structures that may not be detectable by minimum free energy (MFE) structure predictions (43).
For each RNA length, R2D2 generated 100 structures that are most consistent with the observed chemical probing data. These data are conveniently visualized as RN- Reactivity data were then incorporated into R2D2 pathway modeling to generate 100 structures that are maximally consistent with the reactivity data. This family of structures is displayed as RNAbow plots, where an arc between two positions indicates a base pair in a specific structure, and the opacity of the arc indicates the prevalence of the base pair among the selected structures. Two replicates were performed at each length. The consensus structure over the population of selected structures is shown as a secondary structure alongside each bow plot. In the secondary structures, the central helix is colored teal while the 5 side of the P1 helix is purple and the 3 side of the terminator is green. Colored boxes drawn around the P1 helix, the central helix, or the terminator, as well as the color of the P2/P3/P4 helices, match the color of arcs in the RNAbow plots.
Abow plots (55) that capture a population of RNA structures. In these diagrams, arcs between two positions indicate a base pair between the two connected positions and the opacity of the arc line indicates how frequent that pair occurred in the population of structures. Viewing the RN-Abow plots from our cotranscriptional SHAPE-Seq experiments of these lengths revealed the presence of the P1 helix at length 135 nt, the potential presence of a central helix at length 160 nt, and a terminator structure at full length (Figure 3).
At length 135 nt, we observed that most structures in the population indicated P1 formation, as well as base pairs in P2 and P3 ( Figure 3A). We saw similar features in the equilibrium data, with the additional presence of the P4 stem (Supplementary Figure S10A). At length 160 nt, we saw differences between our two replicates. One replicate showed the formation of P2, P3 and a central helix, resembling an anti-terminated structure. The second replicate still contained an intact P1, along with P2 and P3 in the AD and the P4 stem ( Figure 3B). In the equilibrium data, both replicates revealed the presence of the central helix (Supplementary Figure S10B). Finally, at the full length, we saw the base pairs of the previously predicted terminator structure reflected in the R2D2 output, but also the presence of other sets of base pairs that did not correspond to previously predicted or proposed secondary structure features ( Figure 3C). The equilibrium data contained similar features as the cotranscriptional data, most notably the terminator in the full-length riboswitch data (Supplementary Figure S10C). This is consistent with previous studies of other riboswitch systems where we observed the formation of a more thermodynamically favorable terminator structure in equilibrium refolding conditions independent of ligand being present (28,30). In addition, we note that the P1 helix was not always properly formed at the full length, which could indicate that P1 formation is not preferred once the terminator forms, or could be due to lower read depth in the full length data introducing experimental noise that make these results less clear for interpretation. These results do not influence our observation of the P1 helix in the intermediate riboswitch length (135 nt) across both cotranscriptional and equilibrium conditions.
Overall, these results suggest that an intermediate structure consisting of a central helix that competes with part of the P1 helix sequence can form during transcription. The presence of the terminator at full-length may also imply that the terminator may need to invade through the central helix, or compete with it, to fully form as the final structure.
Based on this structural data, we therefore hypothesized that the yxjA riboswitch mechanism depends on a competition of two strand exchange processes to regulate gene expression: formation of the central helix via strand displacement of the P1 stem, and formation of the termina-tor via strand displacement of the central helix. Depending on which strand exchange process can properly form base pairs (central helix versus terminator), the more favored helix will determine which of the two mutually exclusive final structures the riboswitch will adopt.

Naturally encoded mismatches in the central helix control riboswitch response to ligand binding
We next sought to investigate how mutations in key regions may affect strand invasion in our hypothesized riboswitch mechanism. The region of the P1 helix shown to be critical for function in our previous length requirement experiments ( Figure 2) encodes single mismatches that are not expected to form base pairs in either the P1 or central helix in secondary structure predictions ( Figure 4A). Mismatches within helical regions are predicted to slow the progression of strand displacement mechanisms by introducing a barrier that stalls the progress of an 'invading' strand from gaining an additional base pair with the 'substrate' strand and displacing the 'incumbent' strand (71). Depending on where they occur, these mismatches may slow the formation of the P1 helix or the central helix and bias the folding pathway accordingly. We therefore hypothesized that if we repaired a mismatch to instead form a base pair within the P1 helix, it would favor P1 formation and outcompete central helix formation to facilitate terminator formation and fold into a broken 'OFF' state. Correspondingly, if we mutated the mismatch to form a base pair in the central helix instead, favoring the central helix would inhibit terminator formation and lead to an always 'ON' phenotype.
We tested this hypothesis by making several point mutations and characterizing riboswitch function. We found that when we mutated a single nucleotide to form a base pair with either P1 or the central helix, the riboswitch would either break 'OFF' or 'ON' according to our hypothesis, regardless of ligand in the media. For example, extending the central helix by a single base pair by mutating position 83 from an A to a C broke the riboswitch into the 'ON' phenotype ( Figure 4B). When the opposing nucleotide at position 21 was mutated to a G so that both P1 and the central helix could form this pair, the riboswitch function was restored ( Figure 4B).
In another example, positions 88 and 91 encode an A:A mismatch or G:U wobble pair, respectively, in the P1 helix. In this case, the A88U mutation, which allows a base pair to form within P1, but makes the corresponding pair in the central helix a mismatch, broke the riboswitch into the 'OFF' phenotype ( Figure 4C). A complementary pattern held true for position 91 which can normally form a wobble pair within P1. When mutated to cause a G:G mismatch in P1, the results yield a broken 'ON' phenotype. But, when mutated to cause a strong G:C base pair in P1 and a broken C:C pair in the central helix, the alteration yields a broken 'OFF' riboswitch ( Figure 4C).
These results demonstrate that these naturally occurring P1 and central helix mismatches are critical for determining the regulatory outcomes of the yxjA riboswitch. By altering them to form base pairs in either the P1 or the central helix, we could distinctly favor the P1-terminator structure or the central helix/anti-terminator structure. This observation also suggests that the ability to strand exchange plays an important role in the riboswitch folding process and that mismatches may introduce a stalling effect on invading strands.

Central helix base pairing is required for riboswitch function
After discovering specific requirements for the P1 helix length and single mismatches that could completely shift riboswitch function in one direction or the other, we next designed mutants to test the importance of base pairing in the central helix structure for riboswitch function. We created these mutants by introducing broken base pairs in the central helix to observe how breaking multiple continuous base pairs would affect riboswitch function. Based on observations from our previous results, we hypothesized that breaking these base pairs would prevent the central helix from forming and disrupt riboswitch ligand response, favoring the formation of the terminator and leading to a broken 'OFF' phenotype. To test this hypothesis, we introduced these mutations by swapping the base pairs in the terminator so the central helix was broken while still maintaining base pairing context in the terminator ( Figure 5).
As hypothesized, when we introduced these broken base pairs, the riboswitch displayed a broken 'OFF' phenotype. When the broken base pairs were introduced starting from the top of the central helix (U158A:A167U), the riboswitch could only tolerate one broken base pair before breaking 'OFF' even in the presence of ligand ( Figure 5A). Introducing broken base pairs from the bottom (A152U:U173A) yielded the same results ( Figure 5B).
However, we noticed that introducing more base pair swaps in the terminator caused the riboswitch to display a broken 'ON' phenotype. We hypothesized that this was due to introducing too many sequence changes that ultimately impaired terminator function. To test this hypothesis, we designed constructs that removed the AD to test the terminator's baseline ability to terminate without interference from a competing AD. This experiment allowed us to test the function of mutated terminators with the same cellular gene expression assay. As expected, the terminator experiments showed that termination was impaired when either four or more swaps were introduced from the top (U158A:A167U) (Supplementary Figure S11), or seven swaps were introduced from the bottom (A152U:U173A) of the central helix, matching the same regions that created a broken 'ON' phenotype in the swapped riboswitch constructs (Supplementary Figure S12).
Overall, these results demonstrate that base pairing within the central helix is critical for riboswitch function.

All-atom molecular dynamics simulations suggest the P1 helix becomes more conformationally rigid upon ligand binding
Our data suggests that the yxjA regulatory mechanism is mediated by the competition between the formation of the P1 helix and the central helix. A full mechanistic model of the regulatory mechanism would then need to explain how ligand binding biases the initial folding of the P1 helix structure toward the 'OFF' functional state. Based on our mutational results, we hypothesized that ligand binding to the AD junction prevents the central helix from invading a fully formed P1, though our functional data are not able to explain the structural basis for P1 stabilization. To investigate this further, we conducted all-atom molecular dynamics (MD) simulations to probe specific structural differences between the apo and holo states of the yxjA aptamer.
Prior time-resolved crystallographic studies investigated adenine-binding to the pbuE riboswitch aptamer and found the 3WJ and P1 helix differs between the apo and holo aptamer states (72). Specifically, this study showed that in the holo form, the P1 helix has less conformational freedom compared to that observed in the apo form. Due to the high level of sequence structural conservation between adenine and guanine riboswitches (46)(47)(48), we hypothesized that the same difference in P1 helix flexibility observed in the apo and holo states of the pbuE aptamer would be observable in the apo and holo states of the yxjA aptamer.
To test this hypothesis, we used X-ray crystal structures of the two distinct pbuE aptamers in apo conformation found present in solution as templates to construct homologybased models for two yxjA apo conformations: apo1 and apo2 (72). Between the two apo conformations, regions such as the J1/2 hinge and P1 helix differ slightly in arrangement, and only apo2 is ligand binding competent. We also used the pbuE crystal structure to construct the yxjA holo aptamer state. We used these three models as our starting points for all-atom MD simulations for 50 ns at 310 K. We aligned the resulting trajectories from each simulation, using the initial structure of the AD (aside from the P1 helix) as the reference state. To analyze these aligned trajectories, we overlaid every 500 th frame of the 3 -strand of the P1 helix, which is involved in base pair formation in both the terminated and anti-terminated conformations ( Figure 6A). We also calculated the per-residue root mean square fluctuation (RMSF)  Figure 1 using the indicated amount of ligand (2AP) in the media. Bars represent averages from three biological replicates, each performed in triplicate technical replicates for a total of nine data points (n = 9), with error bars representing standard deviation. of the 3 -strand of the P1 helix to quantify the average flexibility of this region over time (Table 1).
These simulation results show a clear difference in P1 helix flexibility between apo and holo states. Specifically, in the apo1 state, we observed a range of simulation conformations of the P1 helix and the largest RMSF distances, indicating that the P1 helix is flexible and dynamic in this state. In contrast, in the holo state, there is a distinct reduction in possible P1 conformations with reduced RMSF distances of approximately 0.2-0.3Å in the RMSF of residues 70-74 in the holo state. We found that the apo2 state is in between the apo1 and holo states, with slightly reduced conformational flexibility from the apo1 state, indicating that the helix is more compressed. From these results, we can conclude that, in agreement with the observations for the pbuE aptamer, the binding of guanine to the yxjA aptamer causes the P1 helix to become more rigid, particularly in the 3 -strand of P1, which is involved in the formation of the central helix during strand exchange. It should be noted these fluctuation differences are readily observable even in much shorter (<10 ns) simulations, and therefore, although the exact RMSF values reported in Table 1 would be expected to vary slightly for longer trajectories, the site-specific trends would be identical.

All-atom MD simulations of hybrid models of the yxjA riboswitch suggest that a more conformationally rigid P1 helix interrupts strand exchange by the central helix
Having established that ligand binding can alter the flexibility of the P1 stem, we next used two-dimensional replica exchange molecular dynamics (2D REMD) simulations to investigate the strand exchange process by which the central helix is formed in place of the P1 helix in the ligand-free apo state. To directly probe the process of strand exchange, we designed a 124 nt model of the yxjA riboswitch, consisting of both the AD and an intermediate state of the EP. The remainder of the P1 helix was taken from a secondary structure prediction of the 195 nt terminated conformation, while the P4 helix was adapted from a secondary structure prediction of a 195 nt antiterminated conformation (Figure 6B). To generate the three-dimensional structure of the EP, we provided the sequence and secondary structure information for our 124 nt intermediate as input into the open-source tertiary structure generator RNAComposer (59). We narrowed down the 60 conformations predicted by RNAComposer based on two criteria: (i) whether the base pairs involved in central helix formation were equidistant between their P1 and central helix complements and (ii) whether the conformation showed good agreement with the R2D2-based secondary structure predictions of the 160 nt intermediate state. We then stitched our existing apo2 and holo aptamers onto our chosen EP conformation using Molecular Operating Environment (MOE) (57). Finally, we ran single-temperature all-atom MD simulations on our final 124 nt apo2 and holo hybrid structures at 310 K for a total of 50 ns to equilibrate the AD and the expression platform ( Figure 6C). We enforced base pairs at the start of each helix, at the end of each helix, the kissing loop interactions, and in the binding pocket of the AD by applying distancedependent harmonic bias restraints (Supplementary Figure  S13).
With the model established, we next aimed to identify the mechanisms by which bound guanine in the 3WJ rigidifies the P1 helix. We first ran all-atom MD simulations of our equilibrated apo1, apo2 and holo aptamers at a wide range of temperatures (310-400 K) for 50 ns each (Supplementary Video S1). Again, we used per-residue RMSF to quantify average flexibility over time versus temperature. Our RMSF analysis revealed that the J1/2 hinge, which is thought to  (72), is nearly three times more flexible in apo1 and apo2 as compared to holo (Supplementary Figure S14A). This observation suggests that guanine bound to the 3WJ tethers the J1/2 hinge to the AD such that it can no longer serve as a point of ro-tation for the P1 helix. Similarly, the J2/3 latch is found to be nearly twice as flexible in apo1 and apo2 as compared to the holo state due to the formation of base triplets and stacking interactions that occur in the binding pocket upon ligand recognition (Supplementary Figure S14B). Following the same trend, the kissing loop interaction between L2 and L3 is significantly more stable in the holo form than either apo1 or apo2, where it is dynamic and constantly dissociating and reassociating ( Supplementary Figure S14C). As the only difference between the two simulations is the presence of bound guanine (i.e. no restraints were placed on the kissing loop interaction for either state), this effect must be a direct result of increased binding pocket organization upon recognition of guanine. In contrast to the overall trend, we found that the P3 helix shows greater flexibility in the holo state than it does in the apo states, which is most noticeable at the P3-L3 interface, where some residues show a three-fold increase in RMSF ( Supplementary Figure S14D). This increase is a direct result of the stabilized kissing loop interaction in the holo state. Conversely, the P2 helix exhibits greater stability in the holo state and increased flexibility in the apo1 and apo2 states ( Supplementary Figure S14E).
Taken together, these results are in broad agreement with single-molecule pulling studies of the xpt guanine riboswitch (73), where increased kissing loop stability in the holo state led to increased flexibility in the P3 helix, especially at the P3-L3 interface, ultimately increasing P1 helix stability. The same study also observed the opposite trend for the apo state, where decreased kissing loop stability correlated with increased P3 helix stability and decreased P1 helix stability. We therefore conclude that this mechanism also holds true for the highly homologous yxjA riboswitch.
To observe the effect of bound guanine on the ability of the yxjA riboswitch central helix to undergo strand exchange, we ran a series of 2D REMD simulations on our 124 nt apo2 and holo hybrid models (Supplementary Video S2). To ensure that structurally plausible strand exchange events could be observed in feasible simulation timescales, we applied a gradient of harmonic bias restraints to the base pairs involved in the strand exchange process to form the central helix (Supplementary Figure S15). These restraints biased the model toward forming the anti-terminator conformation but were kept sufficiently weak that they would stall rather than unfold the AD if strand exchange was difficult to achieve. We hypothesized that the apo2 hybrid model would have a higher likelihood to successfully undergo strand exchange and form the anti-terminated conformation due to the increased flexibility of its P1 helix. Therefore, for the central helix to form, the 5 -end of the P1 helix would gradually dissociate from its starting position. Conversely, we predicted the holo hybrid model would not be able to form the anti-terminated conformation. Instead, the 5 -and 3 -ends of the central helix would move closer together to satisfy the distance restraints, but the structure would stall and would be unable to form the central helix. The starting P1 helix would remain largely intact, and strand exchange would not occur.
To test these hypotheses, we first ran a series of 2D REMD simulations (2 sets of 9 replicas, 22 ns per replica) on each hybrid model with a gradient of weak harmonic bias restraints applied to 6 central helix base pairs (Supplementary Table S5). The strongest harmonic bias restraint was applied to the lowermost base pair, and the weakest harmonic bias restraint was applied to the uppermost base pair (Supplementary Figure S15). Strand exchange was determined to be successful if at least 80% of the six restrained central helix base pairs formed over the course of a simulation trajectory, and each of the nine replica trajectories was treated as an independent strand exchange 'attempt'.
As hypothesized, we found that successful strand exchange ( Figure 6D) occurred in the apo2 hybrid model on a consistent basis. In 10 out of 18 attempts, at least 80% of the restrained central helix base pairs formed. In 9 of these attempts, 100% of the restrained central helix base pairs formed. On the other hand, strand exchange was difficult for the holo hybrid model to achieve ( Figure 6E). In only 3 out of 18 attempts, 80% of the restrained central helix base pairs formed. Notably, in 6 out of 18 attempts, all the restrained base pairs stalled, and the original P1 helix remained entirely intact. The highest percentage of central helix formation achieved out of all attempts was 83%.
To further confirm these results, we ran a second series of 2D REMD simulations (2 sets of 9 replicas, 47 ns per replica) on each hybrid model with a gradient of weak harmonic bias restraints applied to seven central helix base pairs (Supplementary Table S6). For the apo2 state, strand exchange was successful in 10 out of 18 attempts and at least 14% of restrained central helix base pairs formed in every attempt. Conversely, for the holo state, strand exchange was not successful in any of the 18 attempts. In 4 of these attempts, the restrained central helix base pairs stalled, and the P1 helix remained fully intact. The highest percentage of central helix formation achieved out of all attempts was 71%. The agreement of the 47 ns/replica and 22 ns/replica results indicates the robustness of the observed results with respect to total simulation length, as all strand exchanges sterically able to occur under the specific biases forces applied have either already happened or completely stalled within this simulation timeframe.
We also explored the experimental finding that the U91G mutant ( Figure 4C) (position 80 in our holo hybrid model) yields a broken 'ON' phenotype. We mutated position 80 to G in our holo hybrid model to create a G:G mismatch in the P1 helix. We ran two sets of 2D REMD simulations (2 sets of 9 replicas, 38 ns per replica) on this mutant and again applied a gradient of harmonic bias restraints to seven central helix base pairs (Supplementary Table S7). Strand exchange was successful ( Figure 6F) in 10 out of 18 attempts, with at least 85% of the central helix base pairs formed. The U91G mutant allowed more strand invasion than the WT holo complex and behaved similarly to the WT apo2 complex. The observation that the U91G broken 'ON' mutant exhibits favorable strand exchange even with the ligand bound indicates the UG base pair normally formed by U91 at the bottom of the P1 helix plays a crucial role in modulating overall strand exchange efficiency. Altogether, the 2D REMD strand exchange calculations entailed ∼1.6 s of cumulative all-atom MD simulations.

DISCUSSION
Previous secondary structure predictions of the yxjA riboswitch show that the ligand binding junction of the AD and EP do not share direct overlapping sequences (44). This feature differs from the well-characterized pbuE purine riboswitch where a significant portion of the junction of the AD can be broken by the terminator when there is no ligand bound (Supplementary Figure 1). Yet, in the yxjA riboswitch, which serves as an interesting contrasting model as an 'OFF' purine riboswitch, ligand binding is still able to induce a conformational change in the downstream EP, even without the junction of the AD overlapping with the EP. Our structural, functional and simulation studies of the yxjA riboswitch in this work support a model in which the riboswitch relies on the formation of an intermediate central helix, mutually exclusive with both the P1 helix and the terminator helix, that mediates the structural switching mechanism. More specifically, our data also suggest that ligand binding in the AD induces a conformational change in the P1 helix and prevents the central helix from forming, thereby allowing the terminator to form in the presence of ligand.
In either scenario, for this process to occur fast enough to respond to a bound ligand and influence whether RNAP continues to transcribe or terminate, the central helix invading the P1 helix must occur cotranscriptionally. The transition through sequential structures during transcription is a critical feature that allows transcriptional riboswitches to regulate gene expression. Our investigation of how strand exchange is involved in the mechanism of the yxjA riboswitch adds another example to the growing repertoire of studies showing the important role of strand exchange and cotranscriptional folding in functional RNAs, such as the rearrangement of the E. coli SRP RNA into a large hairpin, and conformational switching of riboswitches such as ZTP, FMN and pbuE (29)(30)(31)35,43). We note that many cellular factors and conditions may play a role in influencing cotranscriptional folding as well. For example, the transcription elongation factors NusA and NusG that facilitate pausing or RNA hairpin formation can influence the rate of transcription, providing additional time for strand invasion processes to occur within the cell (6,(74)(75)(76). In addition, ion concentrations known to stabilize RNA structures, such as magnesium, can also impact riboswitch folding and regulatory function (77). More studies are needed to elucidate the effect of these conditions on cotranscriptional folding pathways.
Our work also builds on previous studies about cotranscriptional folding and riboswitch-mediated gene regulation by combining several complementary approaches to build a comprehensive understanding of the yxjA riboswitch mechanism. While our previous work using cotranscriptional SHAPE-Seq has attempted to chemically probe RNA structures during transcription at all intermediate lengths during transcription (28,30), we decided to focus our experiments in this study on a fewer number of specific single length RNAs. This approach allowed us to achieve a higher read depth for a single length when sequencing, although at the cost of losing structural information from other lengths that would have informed a full matrix of re-activity values for illustrating an entire 'folding pathway' of structures. However, we were still able to observe largely replicable reactivity patterns across replicates and test hypotheses by using in vivo functional reporter assays, experimentally informed structure modeling algorithms, and in silico modeling that produced results that agreed with each other.
Regarding our mechanism model, we also present how a riboswitch without a directly overlapping AD junction and EP can be influenced by ligand binding. In the case of the yxjA riboswitch, we have found that ligand binding creates a more stable P1 helix, which can explain how the AD 'communicates' over a relatively longer distance with the downstream EP to influence its conformation. Compared to the pbuE riboswitch and more broadly riboswitches where the AD and EP directly overlap, this mechanism could be a central factor in determining how similar aptamers can have different expression logic depending on variations in the EP.
Several riboswitches that have been studied to date with a similar modular architecture that have no direct overlap between the AD junction and EP may involve this mechanism as well. Our results in this study suggest that a central helix motif that lies in between the terminator and the AD P1 helix may be the key feature that determines the 'OFF' switching expression logic. For example, a previous study by Ceres et al. noted the modularity of select riboswitches that turn 'OFF' gene expression in response to ligand-binding (38). Based on their sequence complementary patterns, these riboswitches, such as SAM (metE and yitJ), lysine (lysC), purine (xpt) and FMN (ribD), appear to also have a P1 helix and central helix that do not overlap into the junction part of the AD, similar to the yxjA riboswitch. We hypothesize that these riboswitches also rely on a similar P1 helix/central helix interaction to communicate ligand binding between the AD to the EP.
We believe this work has implications for RNA biology, RNA drug targeting and RNA engineering beyond the knowledge learned about the yxjA riboswitch. With regards to RNA engineering, the modular architecture of riboswitches without directly overlapping AD and EP folds have been recognized as a more engineerable platform to create synthetic riboswitches that act as biosensors and gene expression controllers in both in vivo and in vitro systems (38). Specifically, the mechanistic insights of this work may inform strategies to fuse synthetic aptamers generated by laboratory evolution and screening (78) onto modular expression platforms to create new riboswitches. In addition, our work has also shown that small sequence changes can result in large functional changes, providing new design principles to optimize the function of these systems for a range of applications including controlling translation initiation, splicing, or producing new classes of biosensors (79).
Beyond engineering riboswitches, future work may also take advantage of our growing understanding of how ligand-aptamer interactions influence RNA folding pathways to create small molecule drugs that disrupt riboswitch function. For example, this work demonstrates that ligand binding can mediate and influence large scale changes in RNA structure to produce specific riboswitch regulatory outcomes, suggesting that metastable ligand binding states may be attractive drug targets, especially for discovering new classes of antibiotics (52,80).
Broadly speaking, riboswitches remain an informative model for understanding how RNAs fold along specific cotranscriptional folding pathways to enact their functions, especially in the context of gene regulation. For RNA biology, it is well known that the formation of RNA structures at the right time and in the right context can regulate transcription and translation in prokaryotes and RNA processing events in eukaryotes, such as polyadenylation and splicing (11,81,82). However, the details of what the RNA cotranscriptional folding pathways are that mediate these functions are only beginning to emerge. Investigating riboswitch mechanisms as model systems is key to understanding how cotranscriptional folding can play a reoccurring role in facilitating the folding of RNA into their specialized roles in the cell. We anticipate that RNA strand displacement, a seemingly simple mechanism that exploits subtle features of RNA folding to enact large scale RNA structural rearrangements, will be a vital feature of cotranscriptional folding mechanisms throughout biology (71).

DATA AVAILABILITY
Detailed installation and usage guide for Spats is available at Read the Docs (https://spats.readthedocs.io/en/master/).
Sequencing data generated in this work were deposited on the Small Read Archive (http://www.ncbi.nlm.nih.gov/ sra) and is accessible via BioProject ID PRJNA776034 or individual accession numbers listed in Supplementary Table  S3. SHAPE-Seq reactivity data generated in this work were deposited on the RNA Mapping Database (http://rmdb. stanford.edu/repository/) and were accessible using RMDB IDs listed in Supplementary Table S4.
Flow cytometry data generated in this work were deposited on FlowRepository (https://flowrepository.org/) with the following IDs: • Figure 1B