On quantitative effects of RNA shape abstraction

Nebel, Markus E.; Scheid, Anika

doi:10.1007/s12064-009-0074-z


Original Paper
Published: 15 September 2009

Volume 128, pages 211–225, (2009)

Theory in Biosciences

Markus E. Nebel¹ &
Anika Scheid¹


Abstract

Over the last few decades, much effort has gone into developing approaches for identifying good predictions of RNA secondary structure. This is because most computational prediction methods based on free energy minimization compute a number of suboptimal foldings, and the native folding must then be identified among all these possible secondary structures. In the abstract shapes approach introduced by Giegerich et al. (Nucleic Acids Res 32(16):4843–4851, 2004), each class of similar secondary structures is represented by one shape, and the native structures can be found among the top shape representatives. In this article, we derive results that answer several enumeration problems for abstract shapes and secondary structures of RNA. From a combinatorial point of view, we compute precise asymptotics for the number of different shape representations of size n and for the number of different shapes that arise when abstracting from secondary structures of size n. A more realistic model that takes primary structures into account remains an open challenge; we give some arguments why the present techniques cannot be applied in this case.
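To make the idea of one shape per class of similar structures concrete, the following Python sketch maps a pseudoknot-free secondary structure in dot-bracket notation to a shape in the spirit of the most abstract shape level (level 5) of Giegerich et al. (2004). It assumes that on this level all unpaired bases are ignored and that a base pair enclosing exactly one other pair (stacking, bulge or interior loop) is merged with it, so every helix is written as a single pair of square brackets; the open chain is written as `_`. The names `level5_shape` and `group_by_shape` are hypothetical and are not part of RNAshapes; this is an illustration under those assumptions, not the tool's implementation.

```python
from collections import defaultdict


def level5_shape(dot_bracket: str) -> str:
    """Abstract a pseudoknot-free dot-bracket structure to a level-5-style shape.

    Assumption: unpaired bases are dropped, and a pair that directly encloses
    exactly one pair is merged with it, so each helix becomes one "[...]".
    """
    # Each node is the list of pairs directly enclosed by a base pair;
    # the root list collects the pairs in the exterior loop.
    root: list = []
    stack = [root]
    for ch in dot_bracket:
        if ch == "(":
            node: list = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: unexpected ')'")
            stack.pop()
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure: missing ')'")

    def render(node: list) -> str:
        # A chain of single-child pairs belongs to one helix: skip to its end.
        while len(node) == 1:
            node = node[0]
        return "[" + "".join(render(child) for child in node) + "]"

    # Assumption: a structure without any base pair is written as "_".
    return "".join(render(child) for child in root) or "_"


def group_by_shape(structures: list[str]) -> dict[str, list[str]]:
    """Partition structures into classes sharing the same level-5-style shape."""
    classes: dict[str, list[str]] = defaultdict(list)
    for s in structures:
        classes[level5_shape(s)].append(s)
    return dict(classes)


if __name__ == "__main__":
    for s in ["((((...))..((...))))", "((..((...))..))", "..((...)).((...))..", "....."]:
        print(s, "->", level5_shape(s))
```

For example, a multiloop closing two hairpins, `((((...))..((...))))`, becomes `[[][]]`, while `((..((...))..))` collapses to `[]` because the interior loop is ignored on this level.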


Notes

Later, we will say that the minimal length of hairpin loops is 1, but at this point we do not yet have the necessary vocabulary.
Allowing pseudoknots makes secondary structure prediction \(\mathcal{NP}\)-complete, which is probably the reason why they are excluded.
It would be easy to change the definition so that only loops of length at least 3 are allowed. However, when turning to enumeration and the corresponding methods of singularity analysis, such a change would lead to polynomials of higher degree whose roots must be computed. Thus, to keep the mathematics behind the model manageable, this modification was probably forgone. Nevertheless, for covariance models, where these reasons do not apply, loops of length 0 are sometimes allowed in the consensus.
This agrees with observations made independently by R. Giegerich at about the same time (personal communication).
From the informal description of level 1 shapes given in Janssen et al. (2008), it is not clear whether the unpaired region of a hairpin (there is exactly one, and it always exists) must be recorded at this abstraction level. Here, we follow the definition used by the RNAshapes tool, which is available at http://bibiserv.techfak.uni-bielefeld.de/rnashapes/welcome.html. This tool does not record hairpin loops.
Note that it does not matter whether a hairpin is represented only by a pair of corresponding square brackets or by a pair of corresponding square brackets with an underscore in between, since every hairpin must contain an unpaired region of length at least one.
Unambiguity is necessary because we will later use these grammars to construct generating functions counting the numbers of type i shapes, 1 ≤ i ≤ 5. If a type i shape sh, 1 ≤ i ≤ 5, has more than one leftmost derivation, then sh is counted more than once by the corresponding generating function.
Note that in this article we do not recall the fundamental definitions and methods regarding generating functions. An introduction to generating functions and some of their uses in discrete mathematics can be found, for example, in Flajolet and Sedgewick (2009) and Wilf (1994). Several good examples of generating functions can be found in Comtet (1974). For an introduction to some advanced methods needed for more difficult problems, see, for example, Greene and Knuth (1990).
In this paper, we use \([z^n]S(z)\) to denote the coefficient of \(z^n\) in the expansion of \(S(z)\) around \(z = 0\).
In the version of Darboux's theorem given in Knuth and Wilf (1989), the variable m determines the number of terms of the computed asymptotic expansion; choosing m = 0 yields the leading term only.
Within our grammar, the rule \(B\rightarrow\varepsilon\) generates such a pair of brackets from a sentential form ...[B]... and therefore has to be weighted by a factor z.
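The trade-off described in the note on minimal loop lengths can be illustrated numerically. The following Python sketch assumes the plain combinatorial model (pseudoknot-free structures, any two bases may pair, isolated pairs allowed) in which a structure of size n either ends with an unpaired base or ends with a base pair whose interior has size at least h, the minimal hairpin-loop length. Translated into generating functions, this decomposition gives \(z^2 S(z)^2 - q_h(z) S(z) + 1 = 0\) with \(q_h(z) = 1 - z + z^2 + \dots + z^{h+1}\). The discriminant factors as \((q_h(z) - 2z)(q_h(z) + 2z)\), and the second factor is positive for \(z > 0\), so the dominant singularity \(\rho_h\) is the smallest positive root of \(q_h(z) - 2z\), a polynomial of degree \(h + 1\): quadratic for h = 1, with \(\rho_1 = (3 - \sqrt{5})/2 \approx 0.382\), but quartic for h = 3. The coefficients \([z^n]S(z)\) then grow like \(\rho_h^{-n}\) up to a subexponential factor. The names `count_structures` and `dominant_singularity` are hypothetical, and bisection is only a simple stand-in for exact root finding.

```python
def count_structures(n_max: int, min_hairpin: int) -> list[int]:
    """s[n] = number of pseudoknot-free secondary structures of size n in which
    every hairpin loop has at least `min_hairpin` unpaired bases.

    Assumes the purely combinatorial model: any two bases may pair and
    isolated base pairs are allowed.
    """
    s = [1] * (n_max + 1)  # for n <= min_hairpin + 1 only the open chain exists
    for n in range(min_hairpin + 2, n_max + 1):
        total = s[n - 1]  # last base unpaired
        # last base paired with base k+1: a prefix of size k and an
        # interior of size n-k-2 that must be at least min_hairpin
        for k in range(n - min_hairpin - 1):
            total += s[k] * s[n - k - 2]
        s[n] = total
    return s


def dominant_singularity(min_hairpin: int, tol: float = 1e-13) -> float:
    """Root of q_h(z) - 2z on (0, 1/2), found by bisection, where
    q_h(z) = 1 - z + z^2 + ... + z^(h+1) and h = min_hairpin."""
    def f(z: float) -> float:
        q = 1 - z + sum(z ** k for k in range(2, min_hairpin + 2))
        return q - 2 * z

    lo, hi = 0.0, 0.5  # f(0) = 1 > 0 and f(1/2) = -(1/2)^(h+1) < 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    n = 300
    for h in (1, 3):
        s = count_structures(n, h)
        rho = dominant_singularity(h)
        print(f"h={h}: s[0..9]={s[:10]}, 1/rho={1 / rho:.6f}, s[n]/s[n-1]={s[n] / s[n - 1]:.6f}")
```

For h = 1 the counts start 1, 1, 1, 2, 4, 8, 17, 37, and the ratio of consecutive counts slowly approaches \(1/\rho_1 = (3 + \sqrt{5})/2 \approx 2.618\).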

References

Abrahams JP, van den Berg M, van Batenburg E, Pleij CW (1990) Prediction of RNA secondary structure, including pseudoknotting, by computer simulation. Nucleic Acids Res 18(10):3035–3044
Comtet L (1974) Advanced combinatorics: the art of finite and infinite expansions. Reidel, Dordrecht
Chomsky N, Schützenberger MP (1963) The algebraic theory of context-free languages. In: Braffort P, Hirschberg D (eds) Computer programming and formal systems. North-Holland, Amsterdam, pp 118–161
Ding Y, Chan C, Lawrence CE (2004) Sfold web server for statistical folding and rational design of nucleic acids. Nucleic Acids Res 32:W135–W141
Ding Y, Lawrence CE (2003) A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res 31(24):7280–7301
Dam E, Pleij K, Draper D (1992) Structural and functional aspects of RNA pseudoknots. Biochemistry 31:11665–11676
Flajolet P, Sedgewick R (2009) Analytic combinatorics. Cambridge University Press, London
Greene DH, Knuth DE (1990) Mathematics for the analysis of algorithms, 3rd edn. Birkhäuser, Boston
Giegerich R, Voß B, Rehmsmeier M (2004) Abstract shapes of RNA. Nucleic Acids Res 32(16):4843–4851
Gutell RR, Woese CR (1990) Higher order structural elements in ribosomal RNAs: pseudo-knots and the use of noncanonical pairs. Proc Natl Acad Sci USA 87:663–667
Harrison MA (1978) Introduction to formal language theory. Addison-Wesley, Reading
Hopcroft JE, Motwani R, Ullman JD (2001) Introduction to automata theory, languages, and computation, 2nd edn. Addison-Wesley, Reading
Janssen S, Reeder J, Giegerich R (2008) Shape based indexing for faster search of RNA family databases. BMC Bioinformatics 9(1):131
Knuth DE, Wilf HS (1989) A short proof of Darboux’s lemma. Appl Math Lett 2:139–140
Lorenz WA, Ponty Y, Clote P (2008) Asymptotics of RNA shapes. J Comput Biol 15(1):31–63
Nebel ME (2004) Investigation of the Bernoulli-model of RNA secondary structures. Bull Math Biol 66:925–964
Nussinov R, Jacobson AB (1980) Fast algorithms for predicting the secondary structure of single-stranded RNA. Proc Natl Acad Sci USA 77(11):6309–6313
Nussinov R, Pieczenik G, Griggs JR, Kleitman DJ (1978) Algorithms for loop matchings. SIAM J Appl Math 35:68–82
Pleij CW, Bosch L (1989) RNA pseudoknots: structure, detection, and prediction. Methods Enzymol 180:289–303
Pleij CW (1994) RNA pseudoknots. Curr Opin Struct Biol 4:337–344
Reeder J, Giegerich R (2005) Consensus shapes: an alternative to the Sankoff algorithm for RNA consensus structure prediction. Bioinformatics 21(17):3516–3523
Sankoff D, Kruskal JB, Mainville S, Cedergren RJ (1983) Fast algorithms to determine RNA secondary structures containing multiple loops. In: Time warps, string edits, and macromolecules: the theory and practice of sequence comparison, chap 3. Addison-Wesley, Reading, pp 93–120
Scheid A, Nebel ME (2008) On abstract shapes of RNA. Technical report, Technische Universität Kaiserslautern
Steffen P, Voß B, Rehmsmeier M, Reeder J, Giegerich R (2006a) RNAshapes 2.1.1 manual
Steffen P, Voß B, Rehmsmeier M, Reeder J, Giegerich R (2006b) RNAshapes: an integrated RNA analysis package based on abstract shapes. Bioinformatics 22(4):500–503
Viennot G, Vauchaussade de Chaumont M (1985) Enumeration of RNA secondary structures by complexity. Math Med Biol Lect Notes Biomath 57:360–365
Voß B, Giegerich R, Rehmsmeier M (2006) Complete probabilistic analysis of RNA shapes. BMC Biol 4(5)
Waterman MS (1978) Secondary structure of single-stranded nucleic acids. Adv Math Suppl Stud 1:167–212
Wuchty S, Fontana W, Hofacker I, Schuster P (1999) Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers 49:145–165
Wilf HS (1994) Generatingfunctionology, 2nd edn. Academic Press, London
Zuker M, Stiegler P (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res 9:133–148
Zuker M, Sankoff D (1984) RNA secondary structures and their prediction. Bull Math Biol 46:591–621
Zuker M (1989) On finding all suboptimal foldings of an RNA molecule. Science 244:48–52
Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31(13):3406–3415


Acknowledgments

The authors wish to thank the two anonymous reviewers for their careful and helpful remarks and suggestions on a previous version of this article.

Author information

Authors and Affiliations

Fachbereich Informatik, Technische Universität Kaiserslautern, Gottlieb-Daimler-Straße 48, 67663, Kaiserslautern, Germany
Markus E. Nebel & Anika Scheid


Corresponding author

Correspondence to Anika Scheid.

Appendix

During our investigations, we computed precise asymptotics for the size of the folding space F(s) for different models of secondary structures. The models differ with respect to structural restrictions (minimal length of hairpin loops, isolated base pairs) and the assumed complementarity rules (Watson–Crick pairs only, or wobble GU pairs allowed), and they assume either a uniform distribution of the bases or a skewed one (\(p_A = p_U = 2/10\), \(p_C = p_G = 3/10\)), in accordance with the experiments performed in Giegerich et al. (2004) and Voß et al. (2006).

Although these results turned out to be of no use for our investigations of abstract shapes, owing to the problems reported when analyzing shape spaces, we expect them to be useful in the future and therefore present them in this appendix without proof. A few of them can already be found in the literature (see, e.g., Nebel 2004), but a complete presentation does not exist.

Theorem 6.1

Considering a uniform distribution of the bases A, C, G and U or the skewed distribution \(p_A = p_U = 2/10\), \(p_C = p_G = 3/10\), Watson–Crick pairs only or additionally wobble GU pairs, and each possible combination of a minimum hairpin-loop length \(\mathrm{minL}_{\mathrm{hairpin}} \in \{1, 3\}\) and a minimum helix length \(\mathrm{minL}_{\mathrm{ladder}} \in \{1, 2\}\), the asymptotic expected folding space sizes card(F(s)) for a random primary structure s of size \(n, n \rightarrow \infty,\) are those given in Table 3, shown in roman for the uniform and in italics for the skewed distribution.

Table 3 Asymptotics for the expected sizes of the folding space for a random primary structure s of size n, assuming a uniform distribution of the bases A, C, G, U (results in roman) or the skewed distribution \(p_A = p_U = 2/10\), \(p_C = p_G = 3/10\) (results in italics), a minimum hairpin-loop length \(\mathrm{minL}_{\mathrm{hairpin}} \in \{1, 3\}\) and a minimum ladder length \(\mathrm{minL}_{\mathrm{ladder}} \in \{1, 2\}\)
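The expected folding space sizes of Theorem 6.1 rest on a simple observation: since the bases of s are drawn independently and the base pairs of one structure occupy disjoint positions, the expected value of card(F(s)) equals the sum, over all admissible structures T of size n, of \(p^{|T|}\), where \(|T|\) is the number of base pairs of T and p is the probability that two random bases can pair. For Watson–Crick pairs, \(p = 2p_Ap_U + 2p_Cp_G\), which is 1/4 for uniform bases and 0.26 for the skewed distribution; allowing GU pairs adds \(2p_Gp_U\), giving 3/8 and 0.38, respectively. The Python sketch below evaluates this sum with the same decomposition as plain structure counting, weighting each pair by p. It is an assumption-laden illustration that covers only \(\mathrm{minL}_{\mathrm{ladder}} = 1\) (isolated pairs allowed); excluding isolated pairs needs a refined decomposition that is not shown. The function names are hypothetical, and the printed ratios are only finite-n estimates of the exponential growth rate, not the entries of Table 3.

```python
def complementarity_probability(p: dict[str, float], wobble: bool) -> float:
    """Probability that two independently drawn bases can form a pair.

    Counts A-U/U-A and C-G/G-C; with wobble=True also G-U/U-G.
    """
    prob = 2 * p["A"] * p["U"] + 2 * p["C"] * p["G"]
    if wobble:
        prob += 2 * p["G"] * p["U"]
    return prob


def expected_folding_space(n_max: int, min_hairpin: int, pair_prob: float) -> list[float]:
    """e[n] = sum over structures T of size n of pair_prob ** (#pairs in T).

    Under i.i.d. bases this equals E[card(F(s))] for |s| = n, assuming
    isolated base pairs are allowed (minimum ladder length 1).
    """
    e = [1.0] * (n_max + 1)  # too short for any pair: only the open chain
    for n in range(min_hairpin + 2, n_max + 1):
        total = e[n - 1]  # last base unpaired
        # last base paired with base k+1; interior size n-k-2 >= min_hairpin
        for k in range(n - min_hairpin - 1):
            total += pair_prob * e[k] * e[n - k - 2]
        e[n] = total
    return e


if __name__ == "__main__":
    uniform = {b: 0.25 for b in "ACGU"}
    skewed = {"A": 0.2, "U": 0.2, "C": 0.3, "G": 0.3}
    n = 300
    for name, dist in (("uniform", uniform), ("skewed", skewed)):
        for wobble in (False, True):
            p = complementarity_probability(dist, wobble)
            for h in (1, 3):
                e = expected_folding_space(n, h, p)
                # finite-n estimate of the exponential growth rate
                label = "WC+GU" if wobble else "WC"
                print(f"{name:8s} {label:6s} h={h} p={p:.3f} ratio={e[n] / e[n - 1]:.4f}")
```

With `pair_prob = 1.0` the recurrence reduces to plain counting of secondary structures, which is a convenient sanity check.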



About this article

Cite this article

Nebel, M.E., Scheid, A. On quantitative effects of RNA shape abstraction. Theory Biosci. 128, 211–225 (2009). https://doi.org/10.1007/s12064-009-0074-z


Received: 13 March 2009
Accepted: 07 August 2009
Published: 15 September 2009
Issue Date: November 2009
DOI: https://doi.org/10.1007/s12064-009-0074-z
