The roles of structural dynamics in the cellular functions of RNAs

RNAs fold into 3D structures that range from simple helical elements to complex tertiary structures and quaternary ribonucleoprotein assemblies. The functions of many regulatory RNAs depend on how their 3D structure changes in response to a diverse array of cellular conditions. In this Review, we examine how the structural characterization of RNA as dynamic ensembles of conformations, which form with different probabilities and at different timescales, is improving our understanding of RNA function in cells. We discuss the mechanisms of gene regulation by microRNAs, riboswitches, ribozymes, post-transcriptional RNA modifications and RNA-binding proteins, and how the cellular environment and processes such as liquid–liquid phase separation may affect RNA folding and activity. The emerging RNA-ensemble–function paradigm is changing our perspective and understanding of RNA regulation, from in vitro to in vivo and from descriptive to predictive. The functions of many regulatory RNAs depend on how their 3D structure changes in response to cellular conditions. Recent studies have revealed that RNA exists as a dynamic ensemble of conformations, which form with different probabilities in different cellular conditions and thus modulate RNA function.

A central and bold goal of biomedical sciences is to achieve a predictive understanding of living cells, tissues and, ultimately, whole organisms based on the properties of their constituent biomolecules. To date, much effort has been directed towards understanding the functions of biomolecules by determining their 3D structures at atomic resolution, but we still lack the ability to quantitatively predict from these structures key biochemical properties such as folding stability, catalytic efficiency and binding affinity and specificity. In reality, all macromolecules dynamically alternate between conformational states to carry out their biological functions. Decades ago, it was realized that the structures of biomolecules are better described as 'screaming and kicking' 1, constantly undergoing motions on timescales spanning 12 orders of magnitude, from picoseconds to seconds, and that in addition to sequence and structure, these motions are important for their activities. The goal of this Review is to show that although the dynamics of biomolecules can be dizzyingly complex, they can be described, classified and, ultimately, used to quantitatively predict molecular behaviour.
RNA molecules have crucial functions both in normal, physiological conditions 2 and in pathological conditions 3 (Fig. 1). The number of RNAs targeted to treat infectious diseases, cancer and genetic disorders is rapidly growing 4, and so is the number of applications in which RNA is engineered into drugs, cellular devices and tools for molecular and synthetic biology 5. RNAs such as microRNAs readily fold into stem-loop secondary structures, which can be recognized by proteins 6,7, and other RNAs such as ribozymes and riboswitches can also fold into complex tertiary structures and exhibit activities that rival those of proteins 8-10. Another subset of RNAs form large quaternary assemblies of ribonucleoprotein (RNP) machines such as the ribosome 11,12 and spliceosome 13,14 (Fig. 1). Moreover, RNA fine-tunes and expands its functionality through its ability to change structure in response to specific cellular cues (reviewed in ref. 15). Indeed, mutations that disrupt RNA structural dynamics have been linked to human diseases 16-20 (Box 1) and many antibiotics function by disrupting RNA structural dynamics 21.
In solution, an RNA molecule dynamically samples a vast number of conformations. Some conformations may form with high probability and account for a large (~10%) proportion of the population of the RNA at any given time, whereas others, such as the unfolded state, may be extremely rare (0.00001% of the population). Some conformations may form rapidly, within picoseconds, whereas others may form slowly, within seconds. The population-weighted distribution of all conformations of an RNA is referred to as an 'ensemble'. (The term ensemble is often used loosely to describe any collection of structures of a given biomolecule, and these descriptions should be distinguished from the rigorous statistical mechanical description of ensembles as the Boltzmann distribution of conformations discussed in this Review 15,22-24.) In this Review, we discuss how the ensemble description of RNA structural dynamics 15,25-27 is providing a more predictive understanding (compared with that provided by static structures) of how regulatory RNAs fold and function in cells, thereby illuminating their relevance to human diseases (Box 1) and their therapy. We discuss the mechanisms of gene regulation by ensembles of microRNAs, riboswitches, ribozymes, post-transcriptionally modified RNAs and RNA-binding proteins (RBPs), and how the cellular environment affects RNA folding and activity. Although dynamics studies of the assembly and function of RNP machines such as the ribosome 11,12 and spliceosome 13,14 are important and well-established examples of the concepts discussed here, they are beyond the scope of this Review.

Laura R. Ganser, Megan L. Kelly, Daniel Herschlag and Hashim M. Al-Hashimi

Activation of RNA cellular functions
The activities of many regulatory RNAs arise from changes to their structure (Fig. 1). The structural changes can be induced by the binding of proteins, metabolites, ions, DNA and other RNAs and by post-transcriptional modifications, changes in environmental conditions such as temperature and solute concentration, mutations to the RNA sequence or even by the act of RNA synthesis itself through co-transcriptional folding of the RNA. The outcomes of RNA structural changes and their dependent interactions include turning gene expression on or off, alternative splicing, regulating microRNA maturation and RNP assembly (Fig. 1). A wide variety of experimental methods have been developed to probe these functionally relevant RNA conformational dynamics (Supplementary Box 1).
Changes in RNA conformation can occur at the secondary, tertiary or quaternary structural levels. For example, riboswitches control gene expression by folding into alternative secondary structures upon ligand binding 8,28-30 (Fig. 1a). The 5′ leader of the HIV-1 RNA genome undergoes changes in secondary structure that direct genome dimerization and are proposed to regulate the switch between translation and packaging of the viral genome 31 (Fig. 1b). Many RBPs, including alternative splicing factors, bind single-stranded RNA motifs and thus may require melting of the RNA secondary structure before binding 32 (Fig. 1c). On the other hand, ribozymes cycle through different tertiary structures to enable substrate binding, catalysis (often through multiple reaction steps) and product release 33,34 (Fig. 1d). Proteins also induce changes in RNA tertiary structure that can trigger various outcomes, such as the quaternary assembly of the ribosome and other RNP machines 35 (Fig. 1e). Proteins can also inhibit RNA activity by stabilizing inactive RNA conformations. For example, LIN28A binds the let-7 pre-microRNA and inhibits its maturation in part by changing its structure such that it can no longer be recognized by the microRNA maturation factors Drosha and Dicer 36 (Fig. 1f).

Fig. 1 | … Therefore, alternative splicing depends on the equilibrium between structured and unstructured conformations of the RNA at the binding site. d | Ribozymes undergo changes in tertiary structure during their catalytic cycles. e | Binding of ribosomal protein S15 induces a conformational change in ribosomal RNA to direct the ordered assembly of the ribosome. f | MicroRNAs interact with protein partners in specific conformations, which can be important for recognition, binding affinity and downstream activity. LIN28A binds to the primary let-7 microRNA or the precursor let-7 (pre-let-7) microRNA and induces a conformation that prevents binding of the microRNA processing factors Drosha and Dicer, respectively. g | Long non-coding RNAs have many cellular functions, including as scaffolds that direct RNA-protein, RNA-RNA and RNA-DNA interactions in epigenetic regulation. h | RNA can be found in phase-separated granules, in which it forms dynamic RNA-protein and RNA-RNA interactions. RNP, ribonucleoprotein.

Riboswitches
RNA structures typically found in 5′-untranslated regions of bacterial mRNAs, which regulate transcription or translation through a ligand-induced conformational change.

Tertiary structures
Typically long-range interactions between distal RNA structural elements or nucleotides involved in base pairing.

Quaternary assemblies
Higher-order organizations of RNA molecules in complex with other molecules, including with other RNAs, proteins and DNA.

Boltzmann distribution
A probability distribution that describes the likelihood that a system will be in a specific state based on the relative energy of that state and the temperature of the system.
Nature Reviews | Molecular Cell Biology

of different conformations 15,43,46,49 (Fig. 2a). Some cellular modifiers, such as RNA chaperones 50,51, accelerate the rates of interconversion by lowering energetic barriers. To understand how cellular modifiers activate RNA, we need to describe the RNA ensemble and understand how it is redistributed by cellular modifiers.

Describing RNA ensembles
Relative to a single structure, ensembles are more difficult to describe and to experimentally characterize. The free energy landscape, which was first developed to describe complex systems such as glasses and, later, proteins 15,24, provides a powerful framework for describing ensembles of macromolecules by specifying the energetic stability (free energy, $G^0_i$) of every allowed conformation $i$ 15,24 (Fig. 2a). The population size of any given conformation will depend on its energetic stability relative to other conformations, whereas the rates at which any two conformations interconvert will depend on the energetic barriers that separate them.
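The link between energetic stability and population described above can be sketched numerically. The following is a minimal illustration, assuming a Boltzmann distribution over a small set of discrete conformations; the function name, the three-state landscape, its energies and the temperature of 298.15 K are hypothetical choices made for illustration and are not taken from the article.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_populations(free_energies, temperature=298.15):
    """Return fractional populations for conformations whose free
    energies are given in kcal/mol, at the given temperature in K."""
    rt = R_KCAL * temperature
    # Shift by the minimum energy so the largest weight is exactly 1.
    g_min = min(free_energies.values())
    weights = {
        name: math.exp(-(g - g_min) / rt)
        for name, g in free_energies.items()
    }
    partition = sum(weights.values())
    return {name: w / partition for name, w in weights.items()}


# Hypothetical three-state landscape (illustrative energies only).
landscape = {"native": 0.0, "non_native": 1.3, "unfolded": 9.0}
pops = boltzmann_populations(landscape)
for name, p in pops.items():
    print(f"{name}: {p:.2e}")
```

Only the energy differences matter here: adding a constant to every free energy leaves the populations unchanged, which is why the landscape is usually drawn relative to its lowest minimum.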
The experimentally determined 44,52 ensemble for the transactivation response element (TAR) RNA of HIV-1 (ref. 53) highlights general features of the free energy landscape that appear to be common to many RNAs (Fig. 2a). TAR is a model system for studying RNA structural dynamics, and one of the few RNAs for which a detailed and comprehensive ensemble and free energy

Dynamic ensembles
The many conformations adopted by an RNA molecule over time and their abundance or probabilities of formation as described by the Boltzmann distribution.

Free energy landscape
A depiction of the free energy of every conformational state in a macromolecule.

Ground state
The lowest-energy and therefore most populated structural conformation of an RNA molecule.

RNA junction topology
In a structured RNA molecule, the lengths of the different single strands that adjoin helices.

Four-way junction
A structural element in which four helices come together.

Box 1 | The relevance of RNA ensembles to human diseases
In addition to clarifying RNA folding and its roles in cellular processes, an ensemble perspective of RNA structure is required to understand how some mutations, including potentially disease-causing mutations, affect RNA activity. Single-nucleotide polymorphisms (SNPs) are the most common type of genetic variation in the human population and many are associated with diseases. There is mounting evidence that many disease-associated SNPs induce changes in RNA ensembles, and ensemble-based mechanisms have been proposed to explain disease phenotypes 16,18,20. In retinoblastoma, SNPs in the 5′ UTR of the RBL1 mRNA, which encodes the tumour suppressor retinoblastoma-like protein 1, can collapse the ensemble of secondary structures formed by the UTR into a single secondary structure, resulting in reduced expression of the tumour suppressor 18. Structural dynamics at microRNA sites that are recognized by microRNA-processing factors are proposed to be important for their maturation 16,150. A SNP in the human microRNA miR-125a that interferes with its maturation is highly associated with breast cancer 151. This SNP is proposed to inhibit maturation efficiency and lead to breast cancer by increasing the number of non-native secondary-structure conformations and simultaneously decreasing the population of the native, or ground state, conformation, which is thought to be the conformation most amenable to processing (see the figure, part a) 16. Other mutations in miR-125a affect its maturation in a manner correlated with their predicted effect on the RNA ensemble 16. Thus, ensemble descriptions of microRNAs may be more predictive of their maturation efficiency than single structures, and may be required to fully model the effects of disease-causing mutations 16.
Pathogenic mutations can also affect RNA junction topology and the resulting inter-helical dynamics. For example, the highly conserved topology of the tRNA four-way junction skews the orientation of helices towards tertiary conformations 73 (see the figure, part b). Mammalian mitochondrial tRNAs such as tRNA Ser(UCN) fold into near-canonical tRNA 3D structures despite having shortened inter-helical loops, which disrupt tertiary interactions that are conserved in many tRNAs. Using a coarse-grained computational model that predicts RNA global dynamics, the topological constraints encoded by the shorter inter-helical loops in tRNA Ser(UCN) were shown to decrease the free energy of folding by decreasing the energy cost of redistribution, thus compensating for the loss of tertiary interactions 152. A pathogenic insertion mutation that changes the junction topology of mammalian mitochondrial tRNA Ser(UCN) decreases the concentration of the tRNA in vivo and is linked to hearing loss and epilepsy. The mutation is predicted to broaden the unfolded ensemble and thereby destabilize tertiary folding 152 (see the figure, part b). These computational predictions were verified experimentally 152. Because tRNA processing requires a properly folded conformation, the mutation was proposed to increase the susceptibility of tRNA Ser(UCN) to a pathway that degrades unfolded and misprocessed tRNAs, thereby decreasing its cellular concentration 152.

RNA structural dynamics are likely to be important for other, less characterized processes. For example, long non-coding RNAs (lncRNAs) likely undergo conformational changes when acting as scaffolds for assembling proteins, DNA and RNA molecules 37-39 (Fig. 1g). In cells, RNA is often found in phase-separated granules, where it dynamically interacts with other RNAs and proteins 40-42. RNA conformational changes are likely important for nucleating such phase transitions and in defining the properties of the resulting granules (Fig. 1h).
Dynamic ensembles of RNA structures
For many RNAs, understanding their function requires understanding how cellular cues and modifiers change their conformation. What has become clear in the past two decades, and follows from first principles of statistical mechanics 15,24, is that cellular modifiers change the abundance of two or more pre-existing conformations in the ensemble 43-46. Furthermore, although it is commonly and reasonably assumed that the formation of RNA-ligand and RNA-protein complexes generally results in more rigid RNA structures, RNAs in both types of complexes are also more accurately described as dynamic ensembles 46-48. Thus, cellular RNA modifiers do not change RNA from one structure to another; rather, they change the RNA ensemble from one distribution to another by changing the relative populations
www.nature.com/nrm
landscape is available following the application of various experimental techniques (Supplementary Box 1).
The free energy landscape is rugged, punctuated by local energetic minima that correspond to a subset of highly populated conformations (Fig. 2a). These energetic minima are separated by variable energetic barriers that reflect different rates of interconversion. In practice, it is only possible to observe a subset of dominant conformations in an ensemble with populations that fall within the detection limits of experimental techniques (Supplementary Box 1). Despite this limited view, the landscape of even a simple RNA such as TAR can be highly complex. A single native secondary structure dominates (~80%) the ensemble of free TAR in solution, but the 3D structure jitters on the picosecond-to-microsecond timescale, thereby forming various conformations, in which the two helices of TAR are either stacked and relatively rigid (population of ~40%) or bent and flexible (population of ~40%) 54 (Fig. 2a). Also present are low-abundance non-native secondary structures (populations of ~10%, ~0.1% and ~0.001%) that form on the microsecond-to-millisecond timescale and that differ with respect to base pairing in and around the non-canonical bulge and apical loop 52. At even lower abundance (population of ~0.00001%) are partially or fully unstructured conformations. As we discuss below, cellular modifiers and other perturbations change the relative population of these RNA conformations rather than 'create' new conformations: all conformations are present on the landscape, but their abundance can vary greatly.

Fig. 2 | a | … 53). The different conformations of TAR include a co-axially stacked conformation, a bent flexible conformation, three non-native secondary structures, including one with a base triple that represents a protein-bound conformation, and finally the unfolded conformation. The relative energetic stabilities ($G^0_i$) of each conformation $i$ are represented by the depth of the free energy minima and the corresponding abundance of each conformer is shown as the fractional population over the entire ensemble. TAR activates transcription elongation of the HIV-1 genome by forming a complex with the viral protein Tat and host factors including positive transcription elongation factor b (P-TEFb), which is part of the super elongation complex (SEC). Binding to partner molecules, changes in the cellular environment such as in the concentration of magnesium ions (Mg²⁺) and/or of crowding agents or mutations can remodel the free energy landscape and redistribute the ensemble of conformations, thereby altering RNA activity. The structural features of the TAR native structure include the lower helix (green), bulge (yellow) and upper helix and apical loop (blue). The dashed orange line represents tertiary interactions. b | Illustration of key aspects of RNA activity that can be modelled using RNA ensembles but not by static RNA structures. (Left) The strength of interactions between RNAs and ligands depends on the population of the bound conformation in the free ensemble, with a lower population corresponding to higher energy cost of redistribution ($\Delta G^0_{\mathrm{redist}}$) and therefore lower binding affinity, and vice versa. The free RNA ensemble allowed by the long linker is broader and thus less likely to sample the bound conformation, resulting in weaker binding affinity (top). By contrast, a short linker results in a narrow ensemble centred around the bound conformation, which will result in tighter binding affinity (bottom). The free energy landscapes illustrate the extent of overlap between the free and bound ensembles for both examples. (Middle) Similarly, the degree of RNA activity correlates with the population of the active conformer in the ensemble. A small population of the active RNA conformation in the ligand-bound RNA ensemble will elicit low-level biological activity (top), whereas a large population of the active RNA conformation in the ligand-bound RNA ensemble will elicit high-level biological activity (bottom). The free energy landscapes illustrate the stability of the RNA in the active conformation for both examples. (Right) Selective pressure that favours dynamic ensembles (rather than a single conformation) can give rise to unique conservation patterns, which depend on the nature of the dynamics. In the example of sequence conservation, bases that form Watson-Crick pairs (paired red circles) in the native secondary structure form mismatches (paired red and yellow circles) in an alternative non-native secondary structure. The relative stabilities of these pairings determine the population of each secondary structure, and, thus, a mutation can affect this equilibrium by differently affecting the two structures. If the relative population of structures in the ensemble is important for function, there will be evolutionary pressure to maintain the relative stability of structures, rather than solely the stability of the native secondary structure. In the example of topological conservation, secondary-structure elements such as the length of junction linkers are important determinants of inter-helical dynamics. Thus, evolutionary pressure to maintain inter-helical dynamics can result in sequence-independent conservation of secondary structure.
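The approximate populations quoted in the text for free TAR can be turned into relative free energies by inverting the Boltzmann relation. The sketch below does this under stated assumptions: the temperature (298.15 K) is our choice because the text does not give one, the state labels are ours, and the rounded percentages are copied from the text, so the resulting energies are order-of-magnitude illustrations and not measured values.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def relative_free_energies(populations, temperature=298.15):
    """Free energy of each state (kcal/mol) relative to the most
    populated state, from dG_i = -RT * ln(p_i / p_ref)."""
    rt = R_KCAL * temperature
    p_ref = max(populations.values())
    return {
        name: -rt * math.log(p / p_ref)
        for name, p in populations.items()
    }


# Approximate fractional populations from the text (labels are ours).
tar_populations = {
    "native secondary structure": 0.80,   # ~80%
    "non-native 1": 0.10,                 # ~10%
    "non-native 2": 0.001,                # ~0.1%
    "non-native 3": 0.00001,              # ~0.001%
    "unstructured": 0.0000001,            # ~0.00001%
}

dg = relative_free_energies(tar_populations)
for name, value in dg.items():
    print(f"{name}: {value:.2f} kcal/mol above the native state")
```

Because free energy scales with the logarithm of population, each tenfold drop in population adds the same increment of RT ln 10 to the free energy, so even the very rare states sit only a few kcal/mol above the dominant one.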

Cellular RNA conformational changes
The TAR example also provides a perspective on how RNA ensembles are redistributed and the biological consequences of this redistribution. Like many RNAs, TAR binds a protein, the viral transactivator Tat, which skews the TAR ensemble towards coaxial conformations that are stabilized by a base triple 55 (Fig. 2a). This conformational change facilitates proper assembly of an RNP complex that activates transcription elongation of viral genes. In addition, divalent metal ions such as magnesium (Mg²⁺) skew the TAR ensemble towards coaxial conformations 54 that lack the base triple by neutralizing charge repulsion in and around the bulge (Fig. 2a). As we discuss below, other cellular factors can also redistribute RNA ensembles; when not properly accounted for, these factors can cause differences between RNA ensembles measured in vitro and in vivo.
Mutations, which are widely used to study RNA and also occur naturally, are another important mechanism of redistributing RNA ensembles. A single point mutation can be sufficient to stabilize a non-native TAR secondary structure that differs substantially from the major conformation sampled in the wild-type sequence, thereby changing its population from ~0.1% to >99% 52,56 (Fig. 2a). Likewise, increasing or decreasing the length of the TAR bulge can broaden or narrow the range of interhelical dynamics sampled by the dominant native state 57 . Mutation-induced changes in the RNA ensemble provide an avenue for studying the importance of structural dynamics in vivo, and are increasingly being linked to disease (Box 1).

Predictive value of ensembles
The dynamic ensemble description of RNA is important not just because it reflects the true structural nature of the molecule, but also because it makes it possible to describe and predict key steps in RNA-mediated processes that involve conformational change and that cannot be accurately modelled based on static RNA structures (Fig. 2b).
Energy is required to redistribute an RNA ensemble (Supplementary Box 2). This energy cost of redistribution ($\Delta G^0_{\mathrm{redist}}$) 58 has to be paid for by the formation of favourable intramolecular contacts, such as tertiary interactions during folding, or intermolecular interactions when an RNA binds a partner molecule. The energy cost will be large for large changes in the ensemble and will be zero when the ensemble does not change. For example, if binding a molecule leads to stabilization of a single RNA conformation, the population of that conformation in the free (unbound) ensemble will dictate the free energy cost (Fig. 2b and Supplementary Box 2). As a result, knowledge of $\Delta G^0_{\mathrm{redist}}$ in addition to the energetics of binding ($\Delta G^0_{\mathrm{bind}}$) is ultimately required to determine the overall strength of any interaction, be it the binding affinity of a protein or ligand to its target RNA or the stability of a folded conformation (Fig. 2b and Supplementary Box 2). Importantly, whereas the energy cost cannot be inferred from static RNA structures, it can be determined from the original and redistributed ensembles (Supplementary Box 2).
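For the single-conformation case described above, the dependence of the cost on the free-ensemble population can be illustrated with a simple conformational-selection approximation. This is our assumption for the sketch, not the formalism of Supplementary Box 2 (which is not reproduced here): the cost is taken as -RT ln(p), where p is the population of the binding-competent conformation in the free ensemble, and the apparent dissociation constant is the intrinsic one divided by p. Function names, the temperature and the example numbers are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def redistribution_cost(p_competent, temperature=298.15):
    """Free energy (kcal/mol) needed to collapse the free ensemble onto
    one conformation that has fractional population p_competent."""
    if not 0.0 < p_competent <= 1.0:
        raise ValueError("population must be in the interval (0, 1]")
    return -R_KCAL * temperature * math.log(p_competent)


def apparent_kd(intrinsic_kd, p_competent):
    """Apparent dissociation constant when only the competent
    conformation binds with the given intrinsic dissociation constant."""
    if not 0.0 < p_competent <= 1.0:
        raise ValueError("population must be in the interval (0, 1]")
    return intrinsic_kd / p_competent


# Hypothetical comparison: a well-populated versus a rare bound state.
for p in (0.5, 0.001):
    cost = redistribution_cost(p)
    kd = apparent_kd(1e-9, p)  # intrinsic Kd of 1 nM is illustrative
    print(f"p = {p}: cost = {cost:.2f} kcal/mol, apparent Kd = {kd:.1e} M")
```

The two functions are consistent with each other: RT ln of the apparent dissociation constant equals RT ln of the intrinsic one plus the redistribution cost, so a rarer binding-competent conformation weakens the observed affinity even when the contacts in the complex are identical.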
The strength of the response to a given cellular modifier will often depend on the relative abundance of 'active' versus 'inactive' RNA conformations and/or on their kinetic rates of interconversion in the redistributed ensemble (Fig. 2b). Again, static structures do not provide this information. Finally, as we discuss below, static structures may not fully capture evolutionary conservation patterns that maintain the stabilities of multiple RNA conformations in the ensemble (Fig. 2b). For example, co-variation could exist between nucleotides that do not form base pairs in the energetically most favourable, native secondary structure because they form base pairs in alternative, higher-energy, non-native conformations that are functionally important 19 . This can be important when functionally annotating the transcriptome or when trying to understand the deleterious consequences of RNA mutations.
Organizing principles of RNA ensembles
Dynamic ensembles of biomolecules contain hundreds of thousands of conformations that interconvert on timescales spanning 12 orders of magnitude. Organizing principles are emerging that simplify the description and determination of RNA dynamic ensembles, while also providing a unified framework for understanding their regulatory functions and their dependence on sequence versus secondary structure. These principles also have important implications for the evolvability of RNA (Fig. 3).

Structural motifs and their modularity
One of the major challenges in advancing our understanding of the connections between the RNA ensemble and RNA function is that determining ensembles by experimental or computational methods is difficult and time-consuming even for simple RNAs such as TAR (Supplementary Box 1). Recently, an RNA reconstitution model was proposed that may facilitate the reconstitution of ensembles by interrogating their component motifs 59. RNA structures consist of a limited number of secondary and tertiary motif 'building blocks',

Native secondary structure
The lowest-energy and therefore most populated secondary structure adopted by a particular RNA molecule.

Non-native secondary structures
Alternative secondary structures of RNA that are of higher energy than the native structure, but still form in solution with non-negligible probability.

Apical loop
A single-stranded RNA loop of variable length at the end of a helical region.

Coaxial conformations
Conformations in which two helices stack on each other across inter-helical junctions.

Base triple
A structural element in which three nucleotides are hydrogen bonded to one another.

Inter-helical dynamics
Conformational changes at junctions between two helices that lead to changes in the bend and twist angles between the two helices, thus greatly affecting the global conformation of the RNA.
which generally autonomously fold into similar structures in different RNAs 60. For example, building blocks for tertiary structure such as tetraloops and their receptors fold into similar structures independently of the sequence or structural context outside the motifs. This principle of RNA structure modularity has been extended to thermodynamics through the concept of ensemble modularity 33,59. Ensemble modularity posits that the free energy landscape of a given RNA motif is independent of the context outside the motif and that its conformational ensemble is dictated by its internal properties, which are influenced by additional geometric constraints and preferences from the remainder of the molecule (Fig. 3). In other words, the intrinsic energetic preference for, or probability of, forming a given conformation does not change with context even though additional intermolecular interactions could exist that redistribute the ensemble. In the RNA reconstitution model 33, the free energy landscapes of constituent motifs are added to one another 61 to reconstitute the thermodynamic ensemble of an RNA assembly 59. This suggests that insights into the dynamic behaviour of RNA motifs in complex structural contexts such as RNPs or lncRNAs can be obtained from studies of isolated motifs. The reconstitution model also reduces the number of motifs whose dynamics need to be characterized because one ensemble can be used to model a motif in different contexts 33 (Fig. 3). Thus, in the future, it may be feasible to reconstitute ensembles for a large variety of RNAs from an 'atlas' of motif ensembles 59. As discussed below, recent advances in high-throughput technologies may provide an avenue for compiling such an atlas.
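The idea of adding motif free energy landscapes can be illustrated in its simplest limit. The sketch below assumes that motifs are fully independent, so the population of a combined state is the product of the motif-level populations (equivalently, the motif free energies add). It deliberately ignores the geometric constraints and additional interactions from the rest of the molecule that the text says can redistribute the ensemble. The function name, the motif names and their populations are hypothetical.

```python
import itertools
import math


def combine_motif_ensembles(*motifs):
    """Combine independent motif ensembles into one assembly ensemble.

    Each motif is a dict mapping a state name to its fractional
    population. The result maps a tuple of state names (one per motif)
    to the product of the corresponding populations.
    """
    combined = {}
    for states in itertools.product(*(m.items() for m in motifs)):
        names = tuple(name for name, _ in states)
        combined[names] = math.prod(p for _, p in states)
    return combined


# Hypothetical motif ensembles (illustrative populations only).
bulge = {"stacked": 0.5, "bent": 0.5}
apical_loop = {"closed": 0.9, "open": 0.1}

assembly = combine_motif_ensembles(bulge, apical_loop)
for state, p in assembly.items():
    print(state, round(p, 3))
```

Under this independence assumption, characterizing N motifs with a handful of states each is enough to enumerate every combined state of the assembly, which is the practical appeal of an atlas of motif ensembles.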

The timescales of motions
The decomposition of 3D structures into structural motifs has greatly aided structure-function studies. Analogously, decomposition of ensembles into motions that occur on different timescales may help elucidate ensemble-organizing principles that would otherwise be buried within the complexity of structural dynamics. A classification according to timescale is appealing not only because the kinetic rates of conformational change can be important determinants of resulting activity, but also because motions on different timescales often have distinct dependencies on sequence versus secondary structure. Different spectroscopic methods used to characterize RNA dynamics also tend to be sensitive to different timescale ranges 15 (Supplementary Box 1).
Free energy landscapes of proteins are hierarchically organized into different 'tiers', which feature an increasing number of conformations that interconvert on faster timescales 24. The free energy landscape of RNA also appears to be hierarchically organized 15. The motions that have been characterized to date using various methods (Supplementary Box 1) can be classified into three tiers, which represent three different timescale ranges.
At the top tier, tier 0, a small number of conformations differ substantially in terms of the presence or absence of secondary or tertiary structures. Interconversion between these conformations requires the breaking of several base pairs and, consequently, occurs on slow timescales ranging from milliseconds to several hours (Fig. 3). Transitions between tier 0 conformations can sequester secondary structures from or expose them to the cellular machinery, make or break tertiary contacts that are required for tertiary folding into functional structures, or anneal or melt intramolecular or intermolecular duplexes 43. Protein chaperones 50 are often required to accelerate tier 0 dynamics to biological timescales (that is, faster than milliseconds) 62.
Each tier 0 conformation can be subdivided into tier 1 conformations that feature more subtle differences in base pairing and secondary structure in and around non-canonical motifs 52,[63][64][65] . Interconversion between these conformations typically requires the breaking of a single base pair and usually occurs on the faster timescale of microseconds to milliseconds, without assistance from protein chaperones 52,63-65 (Fig. 3). Relative to tier 0, many more iso-energetic conformations are likely to be found in tier 1 because they feature finer variations around the parent tier 0 conformation. Tier 1 dynamics can function as fast switches 63,66 or help break down slow tier 0 conformational transitions into multiple kinetically labile steps 67,68 .
Each tier 1 conformation can be subdivided into an even larger number of tier 2 conformational states, which interconvert on the picosecond to nanosecond timescale and include more continuous variations in sugar pucker, base orientation, backbone angles and the global orientation and translation of helices 53 (Fig. 3). Tier 2 dynamics enable RNA structures to be readily moulded into specific conformations, for example those that optimize interactions with proteins, ligands, DNA and other RNAs [69][70][71] .

Evolutionary conservation
Dynamics that involve changes in base pairing (tiers 0 and 1) are strongly dependent on sequence. Functionally important base-pair dynamics have been inferred in riboswitches based on evolutionary co-variations of sequence that support multiple RNA secondary structures 18,19 . Conservation at the level of the dynamic ensemble could also explain the lack of co-variation in the UTRs of various mRNAs despite a high degree of evolutionary sequence conservation 18 (Fig. 2b).
In contrast to tiers 0 and 1, the faster tier 2 motions depend more on the secondary structure of RNA than on its sequence 59,72 . For example, global motions of helical domains depend more on the length and asymmetry (or topology) of inter-helical junctions than on the sequence of the junctions 59,72 . The existence of functionally important tier 2 dynamics therefore implies that there are selective pressures to conserve RNA secondary structures independently of sequence, as indeed is the case for some tRNAs 73 . The topology of different junctions can also co-vary to ensure proper RNA tertiary assembly, and this can in principle result in co-variation of junction topology 59 . The topological requirements for certain RNA activities, such as the high cleavage efficiencies of certain microRNAs 74 , may in part reflect the fine-tuning of such dynamics, which optimize interactions with processing proteins.
In the future, it may be possible to infer functionally important dynamics at different tiers based on the conservation of sequence versus secondary structure and topology. For example, the conservation of inter-helical junctions in lncRNAs such as COOLAIR may be owing to inter-helical motions that help lncRNAs with scaffolding functions to spatially organize protein, DNA and other RNA molecules that bind them 37-39 (Fig. 1g).
Together, ensemble modularity and the hierarchical nature of RNA free energy landscapes make RNA an evolutionarily highly malleable substrate. Mutations within individual motifs that affect distinct dynamics tiers can be tested and selected for while minimally disrupting other motional modes or motifs and, thus, the core functionality of the RNA (Fig. 3).

Ensemble predictions in vitro
We now examine how the description of RNA as ensembles is being put into practice and improves the modelling and prediction of biochemical properties such as the binding affinity of small molecules, the stability of tertiary assembly, the robustness of RNA regulatory functions and the fidelity (accuracy) of RNA-dependent processes. In this section we focus on studies performed in vitro, and in subsequent sections we discuss extensions of the ensemble description to predict aspects of RNA folding and activity in vivo. Because these studies are in their infancy, we focus on examples that employ experimentally verified RNA ensembles, which are still generally more reliable than computationally predicted ensembles 25,75,76 .

Prediction of ligand binding
A growing number of RNAs have been linked to diseases, which has spurred great interest in developing small-molecule inhibitors that target the 3D structure of RNA rather than conventionally targeting the RNA sequence 3,4 . Changes in RNA structure upon small-molecule binding present significant challenges to structure-based drug discovery using, for example, computational docking 77 , but they also present opportunities, as stabilizing inactive RNA conformations with small molecules is a proven concept and the functional mechanism of many antibiotics that target the ribosome 21 . For example, the binding affinity of 38 small-molecule TAR binders was calculated with reasonable accuracy by computationally docking small molecules to an NMR-derived dynamic ensemble of TAR containing 20 conformations (Fig. 4a). The predictions deteriorated considerably when docking to static NMR or X-ray structures. This ensemble-based virtual screen identified a compound that binds TAR with high selectivity and that inhibits HIV replication in a cell line with a half-maximal inhibitory concentration of ≈23 μM (ref. 78 ). In a retrospective study, ensemble-based docking was able to distinguish 26 hits from 102,307 non-hits 79 . By contrast, when docking to static structures or lower-quality ensembles, the predictions were of lower quality and exhibited large variability (Fig. 4a). The computational-docking prediction of TAR ensemble redistribution following small-molecule binding was in reasonable agreement with experimentally determined structures of TAR-small-molecule complexes 79 . Further improvements in virtual screening may enable the rational identification of compounds that specifically bind and stabilize inactive conformations in the RNA ensemble.

Sugar pucker
A conformational description of the ribose sugar ring in nucleic acids. The sugar pucker tends to be predominantly C3′-endo for the helical A-form RNA conformation.

Computational docking
The use of computational algorithms to predict the lowest-energy conformation for the binding of a small molecule to a receptor molecule (RNA or protein).
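One way to see why docking to an ensemble can outperform docking to a single structure is to combine per-conformer binding energies with conformer populations. The sketch below is an assumption-laden illustration and not the scoring scheme of the cited studies: it uses the standard conformational-selection relation in which the apparent association constant is the population-weighted sum of the per-conformer constants, and all populations and docking energies are hypothetical.

```python
import math

RT = 0.593  # kcal/mol at roughly 298 K

def ensemble_binding_free_energy(populations, docking_energies):
    """Apparent binding free energy for a ligand docked to an ensemble.

    Uses K_app = sum_i p_i * K_i with K_i = exp(-dG_i / RT), where p_i is
    the ligand-free population of conformer i and dG_i is the (hypothetical)
    binding free energy obtained by docking to that conformer.
    """
    if abs(sum(populations) - 1.0) > 1e-6:
        raise ValueError("populations must sum to 1")
    k_app = sum(p * math.exp(-dg / RT)
                for p, dg in zip(populations, docking_energies))
    return -RT * math.log(k_app)

# Hypothetical three-conformer ensemble: the dominant conformer binds
# weakly, whereas two minor conformers bind more tightly.
populations = [0.90, 0.09, 0.01]
docking_energies = [-4.0, -7.0, -9.0]  # kcal/mol, illustrative only

dg_ensemble = ensemble_binding_free_energy(populations, docking_energies)
dg_static = docking_energies[0]  # docking to the dominant conformer only

print(f"ensemble: {dg_ensemble:.2f} kcal/mol, static: {dg_static:.2f} kcal/mol")
```

In this toy case the static estimate misses the contribution of the minor, tighter-binding conformers, whereas the ensemble estimate accounts for both their affinity and the cost of populating them.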

Predicting tertiary assembly
Modelling tertiary interactions requires consideration of the nucleotides involved in the tertiary contact and the relative alignment of the tertiary receptors, which is often dictated by helix-junction-helix (HJH) motifs such as bulges, internal loops and higher-order junctions that adjoin RNA helical stems 72 .

Fig. 4 | Ensemble-based modelling of RNA activity in vitro. a | The RNA-binding affinity of a library of unique small molecules (represented by different symbols) containing both hits (red) and non-hits (grey) can be calculated computationally by virtually docking small molecules to dynamic RNA ensembles. The affinity predictions deteriorate considerably when static structures are used for the screening 78,79 . b | The tectoRNA host-guest system is a heterodimer of two structured RNAs (the host and the guest) connected by two tetraloop-tetraloop receptor tertiary contacts. The energetics of tertiary assembly of the tectoRNA host-guest system can vary considerably depending on the relative alignment of the tertiary receptors, which is dictated by helix-junction-helix (HJH) motifs such as mismatches, bulges and internal loops. The energetics of tertiary assembly is much better modelled by considering the guest HJH motifs as dynamic ensembles compared with static structures. c | G·T and G·U mismatches form wobble conformations that differ from the Watson-Crick geometry. On this basis, polymerases and ribosomes can discriminate against mispairing and reduce the error frequency during DNA replication, transcription and translation. However, the ensembles of G·T and G·U mismatches also include low-abundance, short-lived Watson-Crick-like conformations that are stabilized by rare tautomeric (blue) and anionic (green) bases. The population and lifetime of these rare species were integrated into a kinetic model, which could predict the probability of dG·dT misincorporation over a wide range of conditions and for modified mutagenic bases 63,64 . Part c is adapted from ref. 63 , Springer Nature Limited.

Ensemble redistribution
Changes in the abundance of two or more conformations in the RNA ensemble, which are induced by a cellular cue such as the binding of a protein or ligand or by post-transcriptional modifications.

Tertiary receptors
Regions of RNA molecules that are involved in tertiary, long-range interactions.
The tectoRNA host-guest system is a tertiary-interactions modelling system, which includes a heterodimer composed of two structured RNAs connected by two intermolecular tetraloop-tetraloop receptor tertiary contacts (Fig. 4b). The 'guest' RNA contains variable HJH motifs that alter their alignment to enable the formation of tertiary contacts with different energetic penalties for tertiary assembly. Ensemble models of HJH motifs were successfully used to predict the energetics of tertiary assembly for thousands of tectoRNA variants encompassing different HJH motifs, measured with the quantitative high-throughput method RNA-MaP 59 (Supplementary Box 1). A dynamic ensemble description of RNA helices and HJH motifs was more predictive of tertiary assembly energetics compared with single structures 59,81 (Fig. 4b).
By providing ensemble information in a high-throughput manner (Supplementary Box 1), RNA-MaP brings within reach the ability to determine experimentally verified ensembles of a vast number of RNA motifs. As stated above, such an atlas can be used with a reconstitution model to streamline and accelerate the determination of ensembles for more complex RNAs and to provide rich data that can guide the development of next-generation nucleic acid force fields for computer modelling 75,76 .

Robustness of RNA regulatory functions
Ensembles are helping to model the robustness of RNA-based regulation. For example, a three-state ensemble description of the adenine-sensing translation riboswitch from the pathogenic bacterium Vibrio vulnificus helped explain how it maintains robust switching efficiency over a wide range of biologically relevant temperatures 82 . The riboswitch is present in the 5′ UTR of the add gene and activates translation through a conformational change that is stabilized by binding to adenine. NMR studies found that, before ligand binding, the apo (ligand-free) ensemble of this riboswitch consists of two major conformations, only one of which is binding competent. The relative populations of these conformations are sensitive to temperature in a way that counteracts the temperature-dependent ligand-binding affinity, thereby leading to robust switching of the riboswitch and translation regulation 82 . Specifically, at low temperature, the population of the binding-competent conformation is small, which counteracts high-affinity binding, and vice versa at high temperatures. Computational simulations showed how the observed temperature-dependent pre-equilibrium could maintain more robust switching efficiency than a simple single-state binding model.
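The compensation described above follows from a simple three-state scheme (binding-incompetent apo, binding-competent apo, ligand-bound), in which the apparent dissociation constant is the intrinsic constant divided by the fraction of apo RNA that is binding competent. The sketch below illustrates this with hypothetical numbers chosen only to make the compensation visible; they are not the measured values for the add riboswitch, and the function names are assumptions.

```python
def apparent_kd(intrinsic_kd, competent_fraction):
    """Apparent Kd when only a fraction of the apo ensemble can bind ligand.

    With K_pre = [competent]/[incompetent], the competent fraction is
    K_pre / (1 + K_pre) and Kd_app = Kd * (1 + 1/K_pre) = Kd / fraction.
    """
    if not 0.0 < competent_fraction <= 1.0:
        raise ValueError("competent_fraction must be in (0, 1]")
    return intrinsic_kd / competent_fraction

def fraction_bound(ligand, kd):
    """Equilibrium fraction of RNA bound at a given free ligand concentration."""
    return ligand / (ligand + kd)

# Hypothetical parameters (concentrations in micromolar, illustrative only):
# tight intrinsic binding but few competent molecules at low temperature,
# weaker intrinsic binding but more competent molecules at high temperature.
conditions = {
    "low T": {"intrinsic_kd": 0.1, "competent_fraction": 0.1},
    "high T": {"intrinsic_kd": 0.6, "competent_fraction": 0.6},
}
ligand = 1.0  # micromolar

three_state = {name: fraction_bound(ligand, apparent_kd(**params))
               for name, params in conditions.items()}
single_state = {name: fraction_bound(ligand, params["intrinsic_kd"])
                for name, params in conditions.items()}

print("three-state :", three_state)
print("single-state:", single_state)
```

With these illustrative values the three-state model gives the same bound fraction at both temperatures, whereas the single-state model, which ignores the pre-equilibrium, changes noticeably with temperature.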

Predicting fidelity
The mechanisms that ensure high fidelity of DNA replication, transcription and translation highlight the biological importance of very low-abundance conformations. NMR 30,31 and X-ray 41,42 studies show that G-T and G-U wobble mismatches in nucleic acid duplexes exist in vitro as ensembles, which include Watson-Crick-like conformations that form through tautomerization or ionization of the bases (Fig. 4c). By mimicking the Watson-Crick geometry, these low-abundance and short-lived Watson-Crick-like conformations (populations as small as 0.001% and lifetimes as short as microseconds), which feature sub-angstrom differences in structure relative to the dominant wobble conformation, are proposed to evade fidelity checkpoints and contribute to the introduction of errors during replication, transcription and translation. An ensemble-based kinetic model (Fig. 4c) could predict the frequency of G-T misincorporation during replication across a wide range of pH conditions and types of polymerases, as well as the misincorporation frequencies of modified mutagenic bases 63 . Mutagenic modifications can bias the ensemble of DNA duplexes towards the tautomeric or anionic Watson-Crick-like conformations 63 . Likewise, the tRNA modification uridine 5-oxyacetic acid (cmo5U), which is found at the tRNA wobble position, redistributes the G-cmo5U ensemble in the minihelix formed by pairing the mRNA codon and tRNA anticodon towards anionic Watson-Crick-like conformations, thereby expanding the translational decoding capacity to include this G-cmo5U mismatch 83 .
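To put the quoted populations on an energy scale, a population can be converted into a free energy difference relative to the dominant state. The sketch below assumes a simple two-state equilibrium at 298 K, which is an idealization; the function names are illustrative and this is not the kinetic model of the cited work.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)

def free_energy_from_population(minor_population, temperature=298.15):
    """Free energy (kcal/mol) of a minor state relative to the major state,
    assuming a two-state equilibrium."""
    if not 0.0 < minor_population < 1.0:
        raise ValueError("population must be between 0 and 1")
    ratio = minor_population / (1.0 - minor_population)
    return -R * temperature * math.log(ratio)

def population_from_free_energy(delta_g, temperature=298.15):
    """Inverse mapping: minor-state population for a given free energy gap."""
    k = math.exp(-delta_g / (R * temperature))
    return k / (1.0 + k)

# The 0.001% population quoted in the text, expressed as a fraction.
dg_rare = free_energy_from_population(1e-5)
print(f"{dg_rare:.1f} kcal/mol above the dominant wobble state")
```

Under these assumptions a 0.001% population corresponds to a state roughly 6.8 kcal/mol above the dominant conformation, which gives a sense of how a modification that lowers this gap by even a few kcal/mol could shift the population by orders of magnitude.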

Effects of the cellular environment
It is currently not feasible to determine high-resolution ensembles for RNA within the complex cellular environment. Therefore, to extend the ensemble description to cells, we need to determine what aspects unique to the cellular environment affect the RNA ensemble and/or its redistribution, and to devise ways to describe and reconstitute these contributions in vitro (Fig. 5).

Cellular constituents
The cellular environment is heterogeneous and consists of many microenvironments that differ in pH, molecular crowding and composition of metals and co-solutes such as osmolytes; these constituents can also vary between cell types and growth conditions. In vitro studies have examined how these different constituents affect the RNA ensemble and its redistribution, and have provided guidelines for optimizing in vitro conditions to mimic the in vivo environment.
Metal ions. Every RNA is surrounded by an ion atmosphere, which profoundly affects its folding and interactions 84-86 . In particular, divalent metals such as Mg2+ are important for the proper folding and function of RNAs 87-89 . The nature of the ion atmosphere and how it affects RNA behaviour may differ in cells compared with typical in vitro conditions. For example, in cells, the free Mg2+ concentration is ~1 mM, but additional chelated Mg2+ ions exist that raise the total concentration to ~50 mM. Using amino-acid chelated Mg2+ to model cellular chelated Mg2+ , a recent study found that weakly chelated Mg2+ biases the ensemble towards compactly folded conformations to a similar degree as free Mg2+ does, thus decreasing RNA degradation and increasing ribozyme activity 90 .

Molecular crowding.
Cells contain 20-40% (w/v) macromolecules that can exclude volume, interact with RNA and alter the solution properties. The effect of molecular crowding on RNA ensembles has been empirically examined using reagents that mimic different aspects of crowding in vitro (reviewed in ref. 91 ).

Nucleic acid force fields
The physical models used to computationally predict nucleic acid dynamics in molecular dynamics simulations.

Tautomerization
The interconversion between two molecules with the same molecular formula but different connectivity.

Crowding agents with high molecular weight have been suggested to bias RNA ensembles towards tertiary folding through simple excluded-volume effects 92,93 , and this simple excluded-volume effect has been proposed to explain the enhanced catalytic activities of ribozymes 94-96 and enhanced binding affinities of ligands to riboswitches in crowded conditions 97 . Molecular crowding can stabilize tertiary folding of the Azoarcus group I ribozyme and compensate for loss of activity owing to destabilizing mutations, which possibly explains differences in the activity of these mutants when measured in vitro versus in vivo 98 . The cellular environment can therefore alter the energy cost of redistribution relative to in vitro conditions, where crowding is not taken into consideration.

Cellular solutes.
Small cellular co-solutes such as osmolytes, including methylamines, amino acids, sugars and alcohols, appear to have different effects on the RNA ensemble compared to crowding agents with high molecular weight, and the effects can vary significantly in different Mg2+ concentrations. Nine osmolytes were found to destabilize the RNA secondary structure to varying extents and to either stabilize or destabilize the RNA tertiary structure 99,100 . The stabilization of the tertiary structure was attenuated or even reversed in the presence of high Mg2+ concentration 99,100 . Osmolytes appear to have such complex effects because they form favourable interactions with exposed nucleobases and also form unfavourable interactions with the exposed ribose-phosphate backbone 99 . Other small cellular solutes such as metabolites have been found within RNA ligand-binding sites and shown to decrease the apparent binding affinity to cognate ligands 101 . This is an important example of how the cellular environment can affect RNA activity through mechanisms that do not necessarily involve ensemble redistribution.

Linchpin breakage allows the invasion of the mRNA part of the RNA molecule and causes the riboswitch to refold and form a terminator helix, which terminates transcription. b | As mRNA is being transcribed by an RNA polymerase (RNAP), its free energy landscape changes. The elongating transcript may fold into an initial ensemble, which redistributes following the synthesis of additional nucleotides. c | The RNA post-transcriptional modification N6-methyladenosine (m6A) reduces the energetic cost (ΔG 0 redist ) of conformational changes, thereby enabling the opening of the RNA duplex structure and promoting the binding of heterogeneous nuclear ribonucleoprotein C (HNRNPC) to mRNAs at single-stranded RNA regions 124 . Apo, ligand free; F−, fluoride ion. Part a is adapted from ref. 66 , Springer Nature Limited.

Liquid-liquid phase separation
An important cellular phenomenon is the organization of specific RNAs, proteins and other molecules into phase-separated granules 102,103 . These biological condensates are thought to regulate gene expression by compartmentalizing and concentrating specific molecules and machineries to change reaction specificities or kinetics, or by direct subcellular relocalization 102,104,105 . RNA-containing biological condensates, or RNA granules, which include, among others, processing bodies, stress granules, nucleoli, paraspeckles and germ granules, form through complex multivalent RNA-RNA interactions and RNA interactions with proteins, often through intrinsically disordered regions of proteins 105,106 .
We expect RNA ensembles to help determine the strength and dynamics of RNA-RNA and RNA-protein multivalent interactions, and, therefore, whether a given RNA undergoes liquid-liquid phase separation. Dynamic RNA-protein interactions are also proposed to contribute to the properties of the droplets, such as their fluidity 41,42 . Indeed, studies show that the identity of droplet populations is at least in part defined by the RNA secondary structure, as two functionally disparate mRNAs that bind the same protein but form different secondary structures were shown to form distinct droplets that do not mix in vitro and in vivo 107 . By contrast, RNAs that form non-specific interactions can prevent phase separation in vitro and in vivo 108 .
In turn, the biological condensates create unique environments that may affect the RNA ensemble and its redistribution energies. For example, the activity of the hammerhead ribozyme increased 70-fold in phase-separated droplets formed by crowding agents of high molecular weight in vitro, likely due to stabilization of the folded RNA conformation 40 . Furthermore, conditions that favour droplet formation can increase the dynamics of protein-bound RNA, measured using single-molecule fluorescence microscopy 41,42 .
Examining how RNA ensembles influence the formation of granules and in turn are affected by them will benefit from reconstituting in vitro at least certain aspects of RNA granules 107-109 . The formation in vitro of RNA foci that are common in many neurological diseases was shown to be inhibited by the same RNA intercalating molecule that inhibits the formation of these foci in cells and patient primary tissues 109 . Although having the ability to reconstitute droplets in vitro is an important step towards their in-depth characterization, new techniques will likely have to be developed to shed light on the behaviour of RNA and its ensemble within these environments.

Co-transcriptional folding
Within cells, many RNAs fold as they are being synthesized in a process termed 'co-transcriptional folding'. Many regulatory processes that are coupled to co-transcriptional folding require descriptions of ensembles and free energy landscapes, which can change over time as the transcript is elongated. The time available to redistribute the ensemble during co-transcriptional folding is crucial for predicting the structural landscape and therefore the functional output of such processes.
Riboswitches are highly structured regulatory RNA elements typically found in 5′ UTRs, which regulate gene expression in response to cellular modifiers such as metabolites 8,10 . Riboswitch folding illustrates the importance of co-transcriptional folding to RNA function. Many riboswitches are activated in vivo at metabolite concentrations that exceed the ligand-binding affinities measured in vitro for pre-folded RNAs. For example, the thiamine pyrophosphate binding riboswitch binds thiamine pyrophosphate in vitro with an apparent dissociation constant (Kd) of 50 nM (ref. 8 ), whereas the thiamine pyrophosphate concentration that causes the riboswitch to terminate transcription is in the micromolar range 28 . For riboswitches that regulate gene expression at the transcriptional level, co-transcriptional mRNA folding defines a time window during which ligand binding can redistribute the riboswitch ensemble, leading to the formation of conformations that either promote or terminate transcription of the rest of the mRNA. Because ligand-binding kinetics can be slow relative to this time window 110 , binding does not always reach thermodynamic equilibrium and riboswitch activation requires ligand concentrations that exceed the dissociation constant 111 .
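The kinetic argument above can be made concrete with a one-step binding model. The sketch below assumes pseudo-first-order kinetics (ligand in large excess over RNA) and a fixed decision window; the 50 nM dissociation constant is the in vitro value quoted above, whereas the association rate constant and the 2 s window are hypothetical values chosen for illustration.

```python
import math

def fraction_bound(ligand, kon, koff, time):
    """Fraction of RNA bound after `time` seconds, starting from unbound RNA.

    Assumes one-step binding with ligand in large excess
    (pseudo-first-order kinetics).
    ligand in M, kon in M^-1 s^-1, koff in s^-1.
    """
    k_obs = kon * ligand + koff
    equilibrium = kon * ligand / k_obs
    return equilibrium * (1.0 - math.exp(-k_obs * time))

KD = 50e-9        # 50 nM, the in vitro value quoted in the text
KON = 1e5         # hypothetical association rate constant (M^-1 s^-1)
KOFF = KON * KD   # dissociation rate constant implied by KD and KON
WINDOW = 2.0      # hypothetical transcriptional time window (s)

at_kd = fraction_bound(KD, KON, KOFF, WINDOW)
at_10_micromolar = fraction_bound(10e-6, KON, KOFF, WINDOW)
at_kd_equilibrium = fraction_bound(KD, KON, KOFF, 1e6)

print(f"[L] = Kd, within window   : {at_kd:.3f}")
print(f"[L] = 10 uM, within window: {at_10_micromolar:.3f}")
print(f"[L] = Kd, at equilibrium  : {at_kd_equilibrium:.3f}")
```

With these assumed rates, a ligand concentration equal to the dissociation constant would half-saturate the RNA at equilibrium but binds only about 1% of molecules within the window, whereas a micromolar concentration binds most of them; this is one way to rationalize why activation can require concentrations well above the dissociation constant.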
Another striking difference between riboswitch activity in vitro and in vivo is that although ligands activate riboswitches during co-transcriptional mRNA folding, at equilibrium in physiological concentrations of Mg2+ , the riboswitches often fold into identical conformations in the presence or absence of ligand 66,112-114 . How, then, do these riboswitches switch between the active and inactive conformations? Studies increasingly indicate that metastable conformations in the ensemble that form during co-transcriptional folding are stabilized or destabilized in a ligand-dependent manner and thus bias riboswitch folding towards active or inactive conformations 66,112-115 . For example, the sensing aptamer domain of the Bacillus cereus fluoride riboswitch folds into nearly identical structures in the presence or absence of its cognate ligand fluoride 66 . However, NMR studies show that the dynamics of the ligand-free (apo) and ligand-bound ensembles differ: the apo ensemble uniquely includes a low-abundance (population of ~1%) and short-lived (lifetime of ~3 ms) conformational state, which is proposed to increase the vulnerability of the riboswitch to mRNA-strand invasion, thereby biasing riboswitch folding towards supporting transcription termination 66 (Fig. 5a). Ligand binding decreases the population of this conformational state, thereby protecting the riboswitch from strand invasion and redirecting folding towards supporting transcription. A kinetic mechanism that integrates such a two-state ensemble predicted the dependence of riboswitch activity on transcription rates 66 .
Unique to co-transcriptional folding are the changes to the free energy landscape and RNA ensembles that can occur during transcription elongation, as new sequence elements are transcribed (Fig. 5b). The time available to redistribute the ensemble will vary depending on the rate of transcription and the absence or presence of transcription pausing sites 111,116 . Insights into the changing RNA ensemble can be obtained from studying the ensemble behaviour of variable-length transcripts. For example, for both a 2ʹ-deoxyguanosine-sensing transcription riboswitch from Mesoplasma florum 115 and a guanine-sensing riboswitch from Bacillus subtilis 112 , the relevant active conformations are favoured for a particular range of nascent-mRNA length in the absence of ligand; when the length of the nascent transcript exceeds the threshold, the riboswitch folds into the alternative, inactive conformation. During co-transcriptional mRNA folding, these length-dependent metastable active conformations are sufficiently long-lived to avoid transcription termination. Ligand binding prevents the formation of these metastable states and stabilizes the inactive conformations (Fig. 5b). Because the conformational changes and ligand binding are all under kinetic control, transcription rates and pause sites are crucial for riboswitch regulation 112,113 . Other methods for reconstituting co-transcriptional folding now provide a basis for further exploring this fascinating and highly complex process 114,117-119 .

Post-transcriptional modifications
Many RNAs in cells, including mRNAs, microRNAs and lncRNAs, are post-transcriptionally modified through different chemical modifications 120 . Post-transcriptional modifications such as N6-methyladenosine (m6A) can affect the cellular function of RNA by recruiting specific m6A-reader proteins or by blocking key mRNA interactions during translation, and there is also evidence that they can modulate RNA activity by redistributing RNA ensembles 121-126 . For example, by destabilizing RNA helices, m6A can increase the binding affinity of an RBP to its RNA target 127,128 (Fig. 5c). Conversely, m6A can strongly inhibit RNA-protein interactions by biasing the ensemble of a non-canonical A-G mismatch of some box C/D small-nucleolar RNAs away from the conformation required for their proper folding 129 . By contrast, N1-methyladenosine (m1A) facilitates proper folding of certain tRNAs 130 by potently destabilizing base pairs in non-native conformations 130 ; similar mechanisms could explain how m1A promotes mRNA translation 131 .

Understanding RNA activity in vivo
Although we are still far from determining RNA ensembles in vivo and using them to better understand and predict cellular RNA activity, transcriptome-wide studies of RNA structure and of RNA-protein binding are helping us to understand RNA behaviour in vivo. Different perspectives are emerging from these studies that appear to depend on the method used or the system studied, thereby highlighting important challenges that need to be addressed before we can describe RNA ensembles in vivo.

Transcriptome ensembles
Transcriptome-wide structure probing experiments, which measure the reactivity of nucleotides to various chemical reagents, are beginning to illuminate how the cellular environment affects RNA folding. The reactivity of nucleotides provides low-resolution information on RNA structure, for example on whether a nucleotide is paired or not. Because the reactivity of each nucleotide is averaged over all RNA conformations present during the reaction time, the reactivities carry information regarding the RNA ensemble within cells, although many studies interpret the data assuming a single RNA conformation.
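The averaging described above can be sketched with a toy model. The code below assumes, as a deliberate simplification, that an unpaired nucleotide has reactivity 1 and a paired nucleotide has reactivity 0, and that structures are given in dot-bracket notation; the structures and populations are hypothetical and real reactivities depend on structure in ways that are not fully understood.

```python
def averaged_reactivity(structures, populations, paired=0.0, unpaired=1.0):
    """Population-weighted reactivity profile for an ensemble.

    structures: equal-length dot-bracket strings ('.' = unpaired).
    populations: fractional population of each structure (sums to 1).
    paired/unpaired: assumed reactivities of paired and unpaired nucleotides.
    """
    length = len(structures[0])
    if any(len(s) != length for s in structures):
        raise ValueError("all structures must have the same length")
    profile = [0.0] * length
    for structure, population in zip(structures, populations):
        for i, symbol in enumerate(structure):
            profile[i] += population * (unpaired if symbol == "." else paired)
    return profile

# Hypothetical two-state ensemble: a hairpin (70%) and an unfolded state (30%).
structures = ["(((....)))", ".........."]
populations = [0.7, 0.3]

profile = averaged_reactivity(structures, populations)
print([round(value, 2) for value in profile])
```

In this toy ensemble the stem nucleotides show an intermediate reactivity of 0.3, which a single-structure interpretation would have to explain as partially reactive base pairs rather than as a mixture of folded and unfolded molecules.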
Early structure-probing studies indicated that the reactivity of cellular RNAs, particularly mRNAs, to chemical probes is higher in cells compared with in vitro experiments, which typically remove any bound proteins and also include denaturing and refolding, indicating that mRNAs are less structured in vivo than in vitro 125,132,133 . Recent improvements in structure probing, which focus on modelling RNA structures of individual transcripts, suggest that the structures of mRNAs in vivo are transiently destabilized by active translation 134,135 . Other than translation-dependent unfolding, mRNA reactivity was found to be generally similar in vitro and in vivo, with lowly translated mRNAs being minimally perturbed, suggesting that RNA is similarly structured in vivo as it is in vitro 134 . Another recent study identified some differences in RNA structure in vitro and in vivo at different cellular compartments, which were linked to cellular processes including RBP binding, transcription, translation and RNA decay 136 .
Similarly to studying the occurrence of structural dynamics on different timescales, structure-probing data can also provide different views of the RNA ensemble, which depend on the experimental technique used to produce them. For example, dimethyl sulfate profiling of mammalian RNA found that although thousands of RNA G-quadruplex structures (RG4) in G-rich sequences form stably in vitro, they are almost entirely unfolded in vivo 137 . Cellular factors such as helicases and RBPs were proposed to either destabilize or to tightly regulate the formation of RG4 (ref. 137 ). Although these data strongly suggest that in vivo the RG4 ensemble is very biased towards unfolded conformations, they do not preclude the possibility that RG4 form transiently and to a degree that is undetectable by dimethyl sulfate profiling. Indeed, a recent study using a crosslinking technique to obtain global snapshots of RNA structure suggests that RG4 form transiently 138 .
Interpreting chemical probing data in terms of dynamic ensembles is challenging. In addition to low sensitivity to rare transient structures, the quantity and complexity of the data are low relative to the number of parameters needed to define an ensemble, and the nature of the dependence of chemical reactivity on structure is not entirely understood 139 . Nevertheless, methodologies are being developed to interpret chemical probing data in terms of secondary structure ensembles 139-142 , using strategies similar to those developed to generate NMR 3D ensembles 143 (Supplementary Box 1). This in turn provides insights into how the cellular environment redistributes ensembles compared with the ensembles determined in vitro. Chemical probing data were used to compare the ensemble of the human ACTB mRNA (encoding β-actin) in vitro and in cells 144 .
Based on differences in SHAPE (selective 2′-hydroxyl acylation analysed by primer extension) reactivities 144 , a region that harbours a protein-binding site was proposed to form distinct ensembles in vitro and in vivo. In the in vivo ensemble, the dominant structure exposes the protein-binding site, whereas it is more occluded in the structure dominating the in vitro ensemble. Interestingly, the dominant in vitro structure was a minor population in the in vivo ensemble, suggesting that the same underlying ensemble is redistributed under cellular conditions (Fig. 6a).
Methods such as 'mutate and map' 65,146 (Supplementary Box 1), which increase the information content of chemical probing data by examining how point mutations affect reactivity, hold great promise in determining transcriptome-wide RNA ensembles in vivo, although technological advances will be required to maximize their potential.

RNA-binding molecules
The complex cellular environment could in principle affect the RNA-binding preferences of RBPs compared with the in vitro environment; inversely, the binding of RBPs in vivo could alter RNA behaviour compared with the behaviour in vitro. Many RBPs bind specific RNA motifs with high affinity in vitro. By contrast, some crosslinking experiments in vivo indicate that most RNA motifs are not bound by their respective RBPs, raising the question of why the sites are differently bound in cells 147 . One possible reason for this different binding of sites is differences in the formation of RNA secondary structures around the motif, which can occlude binding sites 32 (Fig. 6b). Better agreement between in vitro and in vivo binding was obtained when longer RNA transcripts of ~100 nucleotides were used in vitro, which allow for the formation of local secondary structures. These results suggest that similar ensemble-redistribution energetic penalties are required to access the binding motif in vitro and in vivo when the relevant structural context is included and that RNA structure is the major determinant of protein binding as opposed to other possible factors such as cellular localization or competitive binding 32 .

a | In vitro and in vivo ensembles described using selective 2′-hydroxyl acylation analysed by primer extension (SHAPE) data for a region of the human ACTB mRNA that contains two binding sites for zipcode binding protein 1 (ZBP1; also known as Insulin-like growth factor 2 mRNA-binding protein 1). Circle areas are proportional to the population of the conformation they represent. The data demonstrate a redistribution of the ensemble away from the dominant structure in vitro (green), in which a ZBP1 binding site is occluded, towards another structure in vivo (purple), in which the ZBP1 binding site is exposed. b | RNA secondary structures can determine whether or not a recognition motif for an RNA-binding protein will be bound and active in vivo. For example, the splicing factor RNA binding fox-1 homologue 2 (RBFOX2) enhances splicing by preferentially binding its target RNAs in less structured regions. c | A new method identifies compounds that bind their target RNAs specifically within the cellular context by self-assembling into multivalent compounds using click chemistry 148 . This has been applied to inhibit muscleblind-like 1 protein (MBNL1) binding to expanded CCUG repeats in the myotonic dystrophy type 2 mRNA. When the compound self-assembles along the repeats, the molecules link to form a single polymer (yellow lines zig-zag to indicate covalent binding). When this occurs, MBNL1 is not sequestered by the repeats and functions normally, thereby eliminating disease symptoms. Part a is adapted from ref. 144 , CC-BY-4.0 (https://creativecommons.org/licenses/by/4.0/).
By contrast, a study comparing the RNA-binding preferences of the human RBPs Pumilio homologue 1 and Pumilio homologue 2 in vitro with their transcriptome binding preferences in vivo found that RNA secondary structures can limit the accessibility of binding sites in vitro but have negligible effects in vivo 148 . Thus, the cellular environment (possibly through the binding of RBPs or helicases) seems to bias the RNA ensemble towards the unfolded conformation. Using data for binding of these proteins to thousands of RNAs, a thermodynamic model was developed that could accurately predict transcriptome-wide in vivo crosslinking data 148 .
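The effect of motif occlusion on binding can be illustrated with a minimal two-state sketch. This is not the published model of ref. 148; it is a textbook conformational-selection calculation that assumes the RNA interconverts between a closed (motif occluded) and an open (motif exposed) conformation, that the protein binds only the open conformation, and that the protein is in excess over the RNA. The function names, the temperature of 37 °C and all numerical values are hypothetical and chosen only for illustration.

```python
import math

# Assumption: gas constant in kcal/(mol*K) times 310.15 K (37 degrees C).
RT_KCAL_PER_MOL = 0.001987 * 310.15


def open_fraction(dg_open_kcal: float) -> float:
    """Fraction of free RNA with the motif exposed in a two-state ensemble.

    dg_open_kcal is the free-energy cost of going from the closed to the
    open conformation (positive values mean the motif is mostly occluded).
    """
    k_open = math.exp(-dg_open_kcal / RT_KCAL_PER_MOL)  # [open]/[closed]
    return k_open / (1.0 + k_open)


def apparent_kd(kd_intrinsic_nm: float, dg_open_kcal: float) -> float:
    """Apparent Kd when the protein binds only the open conformation."""
    return kd_intrinsic_nm / open_fraction(dg_open_kcal)


def fraction_bound(protein_nm: float, kd_intrinsic_nm: float,
                   dg_open_kcal: float) -> float:
    """Fraction of RNA bound, assuming free protein is close to total protein."""
    kd_app = apparent_kd(kd_intrinsic_nm, dg_open_kcal)
    return protein_nm / (protein_nm + kd_app)


if __name__ == "__main__":
    # Hypothetical scan: the same motif placed in increasingly stable structures.
    for dg in (-2.0, 0.0, 2.0, 4.0):
        print(f"dG_open = {dg:+.1f} kcal/mol -> "
              f"fraction bound = {fraction_bound(50.0, 10.0, dg):.3f}")
```

In this sketch the energetic penalty for redistributing the ensemble towards the open conformation enters only through the open fraction, so a cellular factor that unfolds the RNA would appear as a lower opening penalty and an apparent Kd that approaches the intrinsic value.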
The cellular environment can also affect the binding of in vitro-identified therapeutic small molecules to their intended RNA targets 4 . A new method selects for compounds that bind their target RNAs specifically within the cellular context using self-assembling small-molecule inhibitors that bind to sequence repeats in an RNA target and are assembled into multivalent compounds using click chemistry 149 (Fig. 6c). Performing this reaction in vitro and in vivo on RNA CCUG repeats, which cause myotonic dystrophy type 2, resulted in the same strong binding of a multivalent compound to the targeted repeat. Thus, in this case, the RNA structure is likely unchanged between the cellular environment and in vitro conditions, and retains the ability to bind and assemble an inhibitor even though the repeats are much longer in vivo than those tested in vitro. This similarity in binding further demonstrates ensemble modularity, because increasing the number of repeat motifs in a single RNA does not change the binding properties of the individual motifs.

Conclusion and future perspectives
The description of RNA in terms of dynamic ensembles is changing our understanding of RNA-regulated processes in vivo. Determining ensembles remains far more difficult than solving structures, and doing so in vivo is a daunting challenge. Meeting this challenge will require an integrated approach through close collaborations between experimentalists and computational scientists to facilitate the continued development of force fields 75,76 for RNA ensemble modelling in the cellular environment to a level that allows predictions to be made and tested experimentally. We propose four principles of ensemble redistribution to help guide these future efforts.
• Principle #1. Cellular factors redistribute the ensemble of full-length RNAs by changing the relative populations of pre-existing conformations, which correspond to local energetic minima along the free energy landscape. Cellular factors do not reshape the free energy landscape by creating new energetic minima. This limits the ways in which cellular factors can act on RNA to alter its behaviour within cells.
• Principle #2. Although the unique cellular environment can redistribute RNA ensembles compared to in vitro conditions, the physiological ensemble can be reconstituted in vitro using proper RNA samples (length, absence or presence of modifications, and so forth) and including relevant cellular components within the buffer. Reconstituting the cellular environment in vitro provides a means to test for and determine physiologically relevant ensembles at high resolution, which complements approaches that determine RNA ensembles using lower-resolution measurements performed in vivo.
• Principle #3. RNAs that function during co-transcriptional mRNA folding require a time-dependent ensemble description, in which new transient conformations can be created that carry out unique functions. Such conformations may not be observed in studies of the full-length RNA ensemble and can only be characterized by analysing the evolution of the ensemble during co-transcriptional folding or through studies of RNA transcripts of variable lengths.
• Principle #4. Conservation of RNA functional dynamics can result in unique evolutionary conservation patterns. Thus, searching for evolutionary conservation of RNA ensembles, in addition to structure, may aid the functional annotation of non-coding RNAs and help to discriminate functional RNAs from transcriptional noise.

As applications of RNA ensembles in drug discovery and synthetic biology already appear on the horizon, there is little doubt that higher-level understanding of RNA ensembles will lead to other, unexpected discoveries.
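
Principle #1 can be made concrete with a short numerical sketch. It assumes that each pre-existing conformation has a free energy and that populations follow Boltzmann weights; a cellular factor is represented only as a shift in the free energy of an existing conformation, so populations change but no new conformation appears. The conformation names, free energies and the size of the shift are hypothetical values chosen for illustration, not measurements from the studies discussed above.

```python
import math

# Assumption: gas constant in kcal/(mol*K) times 310.15 K (37 degrees C).
RT_KCAL_PER_MOL = 0.001987 * 310.15


def populations(free_energies: dict[str, float]) -> dict[str, float]:
    """Boltzmann populations for conformations with free energies in kcal/mol."""
    g_min = min(free_energies.values())  # subtract the minimum for numerical stability
    weights = {name: math.exp(-(g - g_min) / RT_KCAL_PER_MOL)
               for name, g in free_energies.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def redistribute(free_energies: dict[str, float],
                 shifts: dict[str, float]) -> dict[str, float]:
    """Populations after a factor shifts the free energies of existing states.

    Only conformations already present in free_energies are considered;
    shifts for unknown names are ignored, so no new minimum is created.
    """
    shifted = {name: g + shifts.get(name, 0.0)
               for name, g in free_energies.items()}
    return populations(shifted)


if __name__ == "__main__":
    # Hypothetical three-state ensemble (free energies in kcal/mol).
    in_vitro = {"site_occluded": 0.0, "site_exposed": 1.0, "other": 2.0}
    # Hypothetical cellular factor that stabilizes the exposed state by 2 kcal/mol.
    in_cell = redistribute(in_vitro, {"site_exposed": -2.0})
    for name, pop in populations(in_vitro).items():
        print(f"{name}: in vitro {pop:.2f}, in cell {in_cell[name]:.2f}")
```

The same set of conformations appears before and after the shift; only their relative populations differ, which is the sense in which the ensemble is redistributed rather than reshaped.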

Click chemistry
Simple and robust chemical reactions commonly used to covalently join specific substrates.