
On quantitative effects of RNA shape abstraction

  • Original Paper
  • Published in: Theory in Biosciences

Abstract

Over the last few decades, much effort has been devoted to developing approaches for identifying good predictions of RNA secondary structure. This is because most computational prediction methods based on free energy minimization compute a number of suboptimal foldings, and the native folding has to be identified among all these possible secondary structures. Using the abstract shapes approach introduced by Giegerich et al. (Nucleic Acids Res 32(16):4843–4851, 2004), each class of similar secondary structures is represented by one shape, and the native structures can be found among the top shape representatives. In this article, we derive results answering enumeration problems for abstract shapes and secondary structures of RNA. We compute precise asymptotics for the number of different shape representations of size n and for the number of different shapes that arise when abstracting from secondary structures of size n from a combinatorial point of view. A more realistic model taking primary structures into account remains an open challenge. We give some arguments as to why the present techniques cannot be applied in this case.


Notes

  1. Later, we will speak of the minimal length of hairpin loops being 1, but so far we do not have the right vocabulary.

  2. Allowing pseudoknots makes secondary structure prediction \(\mathcal{NP}\)-complete, which is probably the reason for their exclusion.

  3. It would be easy to change the definition to allow only loops of length at least 3. However, when turning to enumeration and the corresponding methods from singularity analysis, such a change would imply polynomials of higher degree and the need to compute their roots. Thus, to keep the mathematics behind the model manageable, this modification was probably forgone. Nevertheless, for covariance models, where these reasons do not apply, loops of length 0 are sometimes allowed in the consensus.

  4. In accordance with observations independently made by R. Giegerich at about the same time (personal communication).

  5. According to the informal description of level 1 shapes given in Janssen et al. (2008), it is not clear whether the (one and only, but always existing) unpaired region in a hairpin must be recorded at this shape abstraction level. Here, we decided to follow the definition used by the RNAshapes tool, which is available at http://bibiserv.techfak.uni-bielefeld.de/rnashapes/welcome.html. This tool assumes that hairpin loops are not recorded.

  6. Note that it does not matter whether a hairpin is represented only by a pair of corresponding square brackets or by a pair of corresponding square brackets with an underscore in between, as there must always exist an unpaired region of length at least one in any hairpin.

  7. Unambiguity is necessary, as we will later use these grammars to construct generating functions counting the numbers of type i shapes, 1 ≤ i ≤ 5. If there is more than one leftmost derivation for a type i shape sh, 1 ≤ i ≤ 5, then sh is counted more than once by the corresponding generating function.

  8. Note that in this article, we will not recall the fundamental definitions and methods regarding generating functions. An introduction to generating functions and some of their uses in discrete mathematics can be found for example in Flajolet and Sedgewick (2009) and Wilf (1994). Several good examples for generating functions can be found in Comtet (1974). Furthermore, for an introduction to some advanced methods that have to be used for more difficult problems, see for example Greene and Knuth (1990).

  9. In this paper, we use \([z^n]S(z)\) to denote the coefficient of \(z^n\) in the expansion of \(S(z)\) around \(z = 0\).

  10. In the considered version of Darboux’s theorem as given in Knuth and Wilf (1989), the variable m is used to choose the number of terms for the computed asymptotic. In fact, by choosing m = 0, the resulting asymptotic consists of the leading term only.

  11. Within our grammar, the rule \(B\rightarrow\varepsilon\) generates from a sentential form ...[B]... such a pair of brackets and therefore has to be weighted by a factor z.
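The bracket notation for shapes discussed in notes 5 and 6 can be illustrated with a minimal Python sketch that maps a dot-bracket secondary structure to its most abstract (type 5) shape. The function name `shape_level5` is hypothetical, and the reading of the type 5 definition used here (all unpaired regions are dropped, and helices separated only by bulges or interior loops are merged into one bracket pair) is an assumption, since the formal definitions are not part of this excerpt. It is not the RNAshapes implementation.

```python
def shape_level5(dot_bracket):
    """Map a dot-bracket string to a type 5 shape string (assumed definition)."""
    # Build the partner table of all base pairs.
    partner = {}
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced closing bracket")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced opening bracket")

    # Ignore unpaired positions: rank the paired positions among themselves.
    paired = sorted(partner)
    rank = {pos: k for k, pos in enumerate(paired)}

    def merges_with_parent(open_pos):
        # A pair is merged into its enclosing pair if, once unpaired positions
        # are ignored, it lies directly inside that pair on both sides.
        k = rank[open_pos]
        if k == 0:
            return False
        prev = paired[k - 1]
        return (partner[prev] > prev
                and rank[partner[prev]] == rank[partner[open_pos]] + 1)

    out = []
    for pos in paired:
        is_open = partner[pos] > pos
        open_pos = pos if is_open else partner[pos]
        if not merges_with_parent(open_pos):
            out.append("[" if is_open else "]")
    return "".join(out)
```

For instance, `shape_level5("((..((...))..((...))..))")` yields `[[][]]`: the outer helix and the two hairpins each contribute one bracket pair. A structure without any base pair maps to the empty string in this sketch; how the paper or the tool writes that case is not shown in this excerpt.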

References

  • Abrahams JP, van den Berg M, van Batenburg E, Pleij CW (1990) Prediction of RNA secondary structure, including pseudoknotting, by computer simulation. Nucleic Acids Res 18(10):3035–3044


  • Comtet L (1974) Advanced combinatorics; the art of finite and infinite expansions. Reidel, Dordrecht

  • Chomsky N, Schützenberger MP (1963) The algebraic theory of context-free languages. In: Braffort P, Hirschberg D (eds) Computer programming and formal systems. North-Holland, Amsterdam, pp 118–161


  • Ding Y, Chan C, Lawrence CE (2004) Sfold web server for statistical folding and rational design of nucleic acids. Nucleic Acids Res 32:W135–W141


  • Ding Y, Lawrence CE (2003) A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res 31(24):7280–7301


  • Dam E, Pleij K, Draper D (1992) Structural and functional aspects of RNA pseudoknots. Biochemistry 31:11665–11676


  • Flajolet P, Sedgewick R (2009) Analytic combinatorics. Cambridge University Press, London

  • Greene DH, Knuth DE (1990) Mathematics for the analysis of algorithms, 3rd edn. Birkhäuser, Boston

  • Giegerich R, Voß B, Rehmsmeier M (2004) Abstract shapes of RNA. Nucleic Acids Res 32(16):4843–4851


  • Gutell RR, Woese CR (1990) Higher order structural elements in ribosomal RNAs: pseudo-knots and the use of noncanonical pairs. Proc Natl Acad Sci USA 87:663–667


  • Harrison MA (1978) Introduction to formal language theory. Addison-Wesley, Reading

  • Hopcroft JE, Motwani R, Ullman JD (2001) Introduction to automata theory, languages, and computation, 2nd edn. Addison-Wesley, Reading

  • Janssen S, Reeder J, Giegerich R (2008) Shape based indexing for faster search of RNA family databases. BMC Bioinformatics 9(1):131


  • Knuth DE, Wilf HS (1989) A short proof of Darboux’s lemma. Appl Math Lett 2:139–140


  • Lorenz WA, Ponty Y, Clote P (2008) Asymptotics of RNA shapes. J Comput Biol 15(1):31–63


  • Nebel ME (2004) Investigation of the Bernoulli-model of RNA secondary structures. Bull Math Biol 66:925–964


  • Nussinov R, Jacobson AB (1980) Fast algorithms for predicting the secondary structure of single-stranded RNA. Proc Natl Acad Sci USA 77(11):6309–6313


  • Nussinov R, Pieczenik G, Griggs JR, Kleitman DJ (1978) Algorithms for loop matchings. SIAM J Appl Math 35:68–82


  • Pleij CW, Bosch L (1989) RNA pseudoknots: structure, detection, and prediction. Methods Enzymol 180:289–303


  • Pleij CW (1994) RNA pseudoknots. Curr Opin Struct Biol 4:337–344


  • Reeder J, Giegerich R (2005) Consensus shapes: an alternative to the Sankoff algorithm for RNA consensus structure prediction. Bioinformatics 21(17):3516–3523


  • Sankoff D, Kruskal JB, Mainville S, Cedergren RJ (1983) Fast algorithms to determine RNA secondary structures containing multiple loops. In: Time warps, string edits, and macromolecules: the theory and practice of sequence comparison, chap 3. Addison-Wesley, Reading, pp 93–120

  • Scheid A, Nebel ME (2008) On abstract shapes of RNA. Technical report, Technische Universität Kaiserslautern

  • Steffen P, Voß B, Rehmsmeier M, Reeder J, Giegerich R (2006a) RNAshapes 2.1.1 manual

  • Steffen P, Voß B, Rehmsmeier M, Reeder J, Giegerich R (2006b) RNAshapes: an integrated RNA analysis package based on abstract shapes. Bioinformatics 22(4):500–503


  • Viennot G, Vauchaussade de Chaumont M (1985) Enumeration of RNA secondary structures by complexity. Math Med Biol Lect Notes Biomath 57:360–365


  • Voß B, Giegerich R, Rehmsmeier M (2006) Complete probabilistic analysis of RNA shapes. BMC Biol 4(5)

  • Waterman MS (1978) Secondary structure of single-stranded nucleic acids. Adv Math Suppl Stud 1:167–212


  • Wuchty S, Fontana W, Hofacker I, Schuster P (1999) Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers 49:145–165


  • Wilf HS (1994) Generatingfunctionology, 2nd edn. Academic Press, London

  • Zuker M, Stiegler P (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res 9:133–148


  • Zuker M, Sankoff D (1984) RNA secondary structures and their prediction. Bull Math Biol 46:591–621


  • Zuker M (1989) On finding all suboptimal foldings of an RNA molecule. Science 244:48–52


  • Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31(13):3406–3415


Acknowledgments

The authors wish to thank the two anonymous reviewers for their careful and helpful remarks and suggestions made for a previous version of this article.

Author information


Corresponding author

Correspondence to Anika Scheid.

Appendix


During our investigations, we computed precise asymptotics for the size of the folding space F(s) for different models of secondary structures. The models differ with respect to structural restrictions (minimal length of hairpin loops, isolated base pairs) and the complementarity assumed (Watson–Crick pairings only, or wobble GU pairs allowed), and they assume either a uniform distribution of the bases or a skewed one (\(p_A = p_U = 2/10\), \(p_C = p_G = 3/10\)), according to the experiments performed in Giegerich et al. (2004) and Voß et al. (2006).

Even though they were of no use to our investigations related to abstract shapes, due to the problems reported when analyzing shape spaces, we expect those results to be of use in the future and therefore decided to present them in this appendix without proof. A few of those results may already be found in the literature (see, e.g., Nebel 2004), but no such complete presentation exists.

Theorem 6.1

Consider a uniform distribution of the bases A, C, G and U or the skewed distribution \(p_A = p_U = 2/10\), \(p_C = p_G = 3/10\), with Watson–Crick pairings only or with wobble GU pairs allowed. For each possible combination of a minimum hairpin loop length minLhairpin ∈ {1, 3} and a minimum helix length minLladder ∈ {1, 2}, the asymptotic expected folding space sizes card(F(s)) for a random primary structure s of size \(n, n \rightarrow \infty,\) are those given in Table 3, shown in roman for the uniform distribution and in italics for the skewed distribution.

Table 3 Asymptotics for the expected sizes of the folding space for a random primary structure s of size n assuming a uniform distribution of the bases A, C, G, U (results in roman) or the skewed distribution \(p_A = p_U = 2/10\), \(p_C = p_G = 3/10\) (results in italics), a minimum hairpin length minLhairpin ∈ {1, 3} and a minimum ladder length minLladder ∈ {1, 2}
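To make the quantity in Theorem 6.1 concrete, the following Python sketch computes the exact expected folding space size for small n rather than its asymptotics. It rests on several assumptions that are not taken from the article: the bases are drawn independently, F(s) is read as the set of all secondary structures whose base pairs are compatible with s, and only minLladder = 1 is handled. Under these assumptions, linearity of expectation gives the expected size as the sum over all structures of p raised to the number of base pairs, where p is the probability that two random bases can pair. The function names are hypothetical, and the sketch does not reproduce the values of Table 3.

```python
from fractions import Fraction


def pair_probability(base_probs, allow_gu):
    """Probability that two independently drawn bases can form a pair."""
    pairs = [("A", "U"), ("C", "G")]
    if allow_gu:
        pairs.append(("G", "U"))  # wobble pair
    # Factor 2 accounts for both orientations, e.g. A-U and U-A.
    return sum(2 * base_probs[x] * base_probs[y] for x, y in pairs)


def expected_folding_space(n, pair_prob, min_hairpin=1):
    """Expected number of structures compatible with a random sequence of size n.

    Assumes a minimum helix length of 1 (isolated base pairs allowed).
    """
    p = Fraction(pair_prob)
    f = [Fraction(1)] * (n + 1)  # f[0] = 1: the empty structure
    for length in range(1, n + 1):
        total = f[length - 1]  # last position is unpaired
        # Last position pairs with position k (1-based); the enclosed segment
        # must contain at least min_hairpin positions.
        for k in range(1, length - min_hairpin):
            total += p * f[k - 1] * f[length - k - 1]
        f[length] = total
    return f[n]


uniform = {b: Fraction(1, 4) for b in "ACGU"}
skewed = {"A": Fraction(2, 10), "U": Fraction(2, 10),
          "C": Fraction(3, 10), "G": Fraction(3, 10)}
```

With `pair_prob = 1` the recurrence counts all secondary structures of size n. A call such as `expected_folding_space(30, pair_probability(skewed, allow_gu=True), min_hairpin=3)` gives an exact rational value for one of the model variants with minLladder = 1.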


Nebel, M.E., Scheid, A. On quantitative effects of RNA shape abstraction. Theory Biosci. 128, 211–225 (2009). https://doi.org/10.1007/s12064-009-0074-z


  • DOI: https://doi.org/10.1007/s12064-009-0074-z
