Journal of Molecular Biology
Volume 381, Issue 4, 12 September 2008, Pages 1055-1067
Journal home page for Journal of Molecular Biology

Simulating RNA Folding Kinetics on Approximated Energy Landscapes

https://doi.org/10.1016/j.jmb.2008.02.007Get rights and content

Abstract

We present a general computational approach to simulate RNA folding kinetics that can be used to extract population kinetics, folding rates and the formation of particular substructures that might be intermediates in the folding process. Simulating RNA folding kinetics can provide unique insight into RNA whose functions are dictated by folding kinetics and not always by nucleotide sequence or the structure of the lowest free-energy state. The method first builds an approximate map (or model) of the folding energy landscape from which the population kinetics are analyzed by solving the master equation on the map. We present results obtained using an analysis technique, map-based Monte Carlo simulation, which stochastically extracts folding pathways from the map. Our method compares favorably with other computational methods that begin with a comprehensive free-energy landscape, illustrating that the smaller, approximate map captures the major features of the complete energy landscape. As a result, our method scales to larger RNAs. For example, here we validate kinetics of RNA of more than 200 nucleotides. Our method accurately computes the kinetics-based functional rates of wild-type and mutant ColE1 RNAII and MS2 phage RNAs showing excellent agreement with experiment.

Introduction

Ribonucleic acid performs diverse and important functions in the cell. Messenger RNAs, transfer RNAs and ribosomal RNAs function as integral components of the protein synthesis machinery whose workings can be understood at nearly atomic detail,1 while other RNAs are involved in the mRNA maturation as part of the spliceosome.2 Recent work has highlighted the ability of noncoding and plant microRNAs to base-pair with complementary regions of mRNAs that target them for degradation.3 Another class of folded RNA domains, termed riboswitches, are found in the 5′ untranslated regions of mRNAs and function as molecular switches by binding a specific metabolite that changes the structure of the mRNA; this, in turn, controls transcription termination/anti-termination or translation initiation.4 Detailed structural and physicochemical studies of metabolite-sensing riboswitches have led to the hypothesis that the function of these domains may be kinetically controlled, with regulation of the flavin mononucleotide riboswitch, in particular, controlled by the relative rates of flavin mononucleotide binding and RNA transcription under the prevailing intracellular concentrations of metabolite.5 Thus, this ligand-controlled conformational switch may operate under kinetic control, with the relative stabilities of the two, often mutually exclusive, conformational states (with and without ligand) perhaps less important.

Earlier work on the control of Escherichia coli ColE1 plasmid copy number and translation initiation in bacterial and bacteriophage mRNAs hinted strongly that the kinetics of folding might be capable of dictating biological function. For example, accelerating the refolding rate of RNA II can increase the ColE1 plasmid copy number.6, 7 In the control of translation initiation, mutations of IS10 transposase mRNA that slowed the folding kinetics of structure formation increased the rate of ribosome binding resulting in a higher expression of IS10 transposase.8 For the bacteriophage MS2 maturation protein, the mRNA is efficiently translated only when the region around the Shine–Dalgarno sequence, which base-pairs with the 16S ribosomal RNA, is found in an unpaired or “open” conformational state. Since it is closed (paired) in the native state, this can only happen before folding finishes. The longer the RNA remains in the open “metastable” state, the higher the gene expression rate. Recently coined RNA thermometers associated with the heat-shock response in α- and γ-proteobacteria are thought to function in the same way, by controlling translation initiation via a temperature-dependent conformational change around the Shine–Dalgarno sequence.9, 10 These few examples highlight the importance of developing a robust computational method that can be used both to study the global properties of RNA folding and to provide more detailed insights of the kinetics of folding RNA substructures as well as the rates of interconversion between these substructures as a function of temperature.

A number of computational approaches have been developed to investigate the kinetics of RNA folding. For instance, folding pathways have been identified using both Monte Carlo (MC)-based algorithms11, 12, 13 and genetic algorithms.14, 15, 16 Other approaches utilize dynamic programming to calculate the partition funtion Q = sexp(−ΔG(s)/kT) over all secondary structures s to gain insight into RNA folding kinetics.17 The ViennaRNA package implements McCaskill's algorithm in addition to an expanded set of nearest-neighbor free energies to account for noncanonical pairings.18 Ding and Lawrence extended this algorithm to generate statistical samplings of RNA structures based on the partition function.19 Some methods incorporate computation of the RNA folding energy landscape. Chen and Dill used matrices to compute the partition function over all possible structures in order to approximate the complete folding landscape.20 Wuchty modified Zuker's algorithm to generate all secondary structures within some given free energy of the native structure,21 and Clote developed an algorithm to generate all secondary structures below a threshold with respect to the Nussinov–Jacobson energy model.22 Flamm11 and Wolfinger et al.23, 24 extended this algorithm to identify local minima within some energy threshold of the native state and connect them via energy barriers, with the resulting energy barrier tree representing the energy landscape. In order to calculate the energy barrier, these authors used a flooding algorithm that is exponential in the size of the RNA; it is therefore impractical for large RNAs. Waldispühl and Clote created an algorithm to generate all saturated secondary structures having k base pairs.25 Statistical mechanical methods have also been used to study RNA folding kinetics. For example, the master equation has been used to compute the population kinetics of the folding landscape, with a matrix of differential equations used to calculate the probability of a transition between different conformations.26, 27, 28 Once solved, the dominant modes of the solution describe the general folding kinetics.29 In addition, one can also approximate the folding kinetics by performing a statistical analysis on an ensemble of MC folding pathways.30, 31, 32

In this work, we present a novel suite of computational tools that can be used to approximate the folding energy landscape and extract both global properties and detailed features of the folding process. The key advantage of our approach over the other computational techniques discussed above is that it is fast and efficient while providing both macroscopic and microscopic properties of the folding kinetics, e.g., population kinetics and low-resolution folding details. Our method first builds a map, or model, of the RNA folding energy landscape. We then adapt standard energy landscape analysis tools to analyze the energy landscape represented by our map; the solution of the master equation on our map (map-based master equation or MME) allows us to investigate properties of the ensemble during the time course of folding. We also present another technique, termed a map-based Monte Carlo (MMC) simulation, to extract microscopic folding pathways stochastically from our maps. These tools allow us to investigate the formation rates of transition states and therefore provide a comprehensive picture of the rates at which folding intermediates are formed and lost, as well as the final equilibrium distribution of folded conformers. A new statistical sampling method ensures that our method scales well and is applicable to large RNAs containing hundreds of nucleotides. Finally, we validate our method against traditional MC simulation methods that require a complete energy landscape, as well as against existing experimental data in two systems.

Section snippets

Summary of the method

Our work provides computational tools to approximate the RNA folding energy landscape and extract global properties and detailed features of the folding process. The key advantage of our approach over other computational techniques is that it is fast and efficient while bridging the gap between high-level folding events and low-level folding details. Our method builds an approximate representation (called a map) of the RNA's folding energy landscape and then uses specialized analysis techniques

Conclusions

We present a new model and new map-based analysis tools that can be used to study RNA folding kinetics and provide specific insights into both local and global features of the process. These new tools enable us to study larger RNAs than before, up to hundreds of nucleotides. The method is validated against known experimental data for two classic cases in detail. We anticipate that our method will be a valuable tool for discovering such relationships for other RNAs that have not yet been

General computational methods

Our method first constructs a map that approximates the energy landscape and then uses a number of map-based tools to analyze the approximated energy landscape. In our current implementation, we are able to generate three types of maps based on complete BPE, SPE and PBS. While a BPE map describes the complete energy landscape, it is not feasible for large RNA, e.g., more than approximately 40 nucleotides. SPE maps are a subset of BPE maps and are 1 or 2 orders of magnitude smaller but are still

Acknowledgements

We acknowledge grants from the NSF (EIA-0103742, ACR-0081510, ACR-0113971, CCR-0113974, ACI-0326350, CRI-0551685 to N.M.A.), the NIH (AI040187 to D.P.G.), the Department of Energy and the Hewlett-Packard Foundation (to N.M.A.). Ms. Thomas was supported in part by an NSF Graduate Research Fellowship, a PEO scholarship, a Department of Education Graduate Fellowship (GAANN), and an IBM TJ Watson PhD Fellowship. Ms. Tapia was supported in part by an NIH Molecular Biophysics Training Grant (T32

References (40)

  • A.P. Gultyaev et al.

    The influence of a metastable structure in plasmid primer RNA on antisense RNA binding kinetics

    Nucleic Acids Res.

    (1995)
  • P. Klaff et al.

    RNA structure and the regulation of gene expression

    Plant Mol. Biol.

    (1996)
  • C.K. Ma et al.

    Control of translation by mRNA secondary structure: the importance of the kinetics of structure formation

    Mol. Microbiol.

    (1994)
  • F. Narberhaus et al.

    RNA thermometers

    FEMS Microbiol. Rev.

    (2006)
  • T. Waldminghaus et al.

    FourU: a novel type of RNA thermometer in Salmonella

    Mol. Microbiol.

    (2007)
  • C. Flamm et al.

    RNA Folding at Elementary Step Resolution

    RNA

    (2000)
  • P.G. Higgs

    RNA secondary structure: physical and computational aspects

    Q. Rev. Biophys.

    (2000)
  • A. Xayaphoummine et al.

    Prediction and statistics of pseudoknots in RNA structures using exactly clustered stochastic simulations

    Proc. Natl Acad. Sci. USA

    (2003)
  • B.A. Shapiro et al.

    The massively parallel genetic algorithm for RNA folding: MIMD implementation and population variation

    Bioinformatics

    (2001)
  • J.S. McCaskill

    The equilibrium partition function and base pair binding probabilities for RNA secondary structure

    Biopolymers

    (1990)
  • Cited by (50)

    • Cotranscriptional folding of RNA pseudoknots with different rates

      2021, Chemical Physics Letters
      Citation Excerpt :

      Denesyuk and Thirumalai used coarse-grained models to study RNA folding [14], which can not deal with complicated RNA. Several computational methods based on coarse-grained kinetic model was developed [15,16]. This method can effectively reduce the number of conformations.

    • Efficient computation of co-transcriptional RNA-ligand interaction dynamics

      2018, Methods
      Citation Excerpt :

      Since barriers relies on an exhaustive enumeration of low energy structures, the computational effort grows exponentially with sequence size, which limits the length of RNAs that can be studied to a little over 100 nt. Recently, a number of heuristic approaches have been reported that attempt to raise this limit based on flooding techniques [34,24] or sampling of local minima [27,28,20,21]. RNA-ligand interactions are crucial for the functioning of cis-acting ribo-regulators, but are outside the conventional RNA secondary structure model.

    • Predicting RNA secondary structures from sequence and probing data

      2016, Methods
      Citation Excerpt :

      Since barriers relies on an exhaustive enumeration of low energy structures, it is in practice applicable only to RNAs of less than 100 nt. Recently, a number of heuristic approaches have been reported that attempt to raise this limit based on flooding techniques [134,71] or sampling of local minima [113,114,56]. Experimental technologies to elucidate RNA structure by means of chemical and enzymatic probing were established long before the first computational approaches toward RNA structure prediction have become available [105].

    View all citing articles on Scopus

    This work was completed while he was a Ph.D. student in the Department of Computer Science at Texas A&M University.

    View full text