Probabilistic Arithmetic Automata and Their Application to Pattern Matching Statistics

  • Conference paper
Combinatorial Pattern Matching (CPM 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5029)

Abstract

We present probabilistic arithmetic automata (PAAs), which can be used to model chains of operations whose operands depend on chance. We provide two different algorithms to exactly calculate the distribution of the results obtained by such probabilistic calculations. Although we introduce PAAs and the corresponding algorithms in a generic manner, our main concern is their application to pattern matching statistics, i.e., we study the distribution of the number of occurrences of a pattern under a given text model. Such calculations play an important role in computational biology, as they give access to the significance of pattern occurrences. To assess the practicability of our method, we apply it to the Prosite database of amino acid motifs and to the Jaspar database of transcription factor binding sites. Regarding the latter, we additionally show that our framework allows binding affinities predicted from a physical model to be taken into account.
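
To make the idea of an exact occurrence-count distribution concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a single fixed pattern, an i.i.d. text model, and that overlapping occurrences are counted. It pairs the state of a pattern-matching automaton with the number of matches seen so far and propagates probabilities one text position at a time. The names `build_automaton` and `occurrence_distribution` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def build_automaton(pattern, alphabet):
    """Build a string-matching DFA (hypothetical helper, naive construction).

    State q means "the last q characters read equal pattern[:q]".
    delta[q][c] is the length of the longest prefix of the pattern
    that is a suffix of pattern[:q] + c.
    """
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for c in alphabet:
            s = pattern[:q] + c
            k = min(m, len(s))
            while k > 0 and not s.endswith(pattern[:k]):
                k -= 1
            row[c] = k
        delta.append(row)
    return delta


def occurrence_distribution(pattern, char_probs, n):
    """Exact distribution of the number of (overlapping) occurrences of
    `pattern` in a random text of length n.

    Assumption: characters are drawn independently with the
    probabilities given in `char_probs` (an i.i.d. text model).
    """
    m = len(pattern)
    delta = build_automaton(pattern, list(char_probs))
    # Joint distribution over (automaton state, occurrences so far).
    dist = {(0, 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (q, count), p in dist.items():
            for c, pc in char_probs.items():
                q2 = delta[q][c]
                # Reaching the final state means one more occurrence ends here.
                hit = 1 if q2 == m else 0
                nxt[(q2, count + hit)] += p * pc
        dist = nxt
    # Marginalise out the automaton state.
    result = defaultdict(float)
    for (_, count), p in dist.items():
        result[count] += p
    return dict(result)


if __name__ == "__main__":
    d = occurrence_distribution("AA", {"A": 0.5, "B": 0.5}, 3)
    for count in sorted(d):
        print(count, d[count])
```

For the pattern `AA` over a uniform two-letter alphabet and text length 3, enumerating all eight texts by hand gives probabilities 5/8, 2/8 and 1/8 for zero, one and two occurrences, which is a convenient check on the sketch. The running time grows with the text length, the number of automaton states, the alphabet size and the largest attainable count; richer text models or pattern sets would need a different automaton and transition probabilities.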

Editor information

Paolo Ferragina, Gad M. Landau

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Marschall, T., Rahmann, S. (2008). Probabilistic Arithmetic Automata and Their Application to Pattern Matching Statistics. In: Ferragina, P., Landau, G.M. (eds) Combinatorial Pattern Matching. CPM 2008. Lecture Notes in Computer Science, vol 5029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69068-9_11

  • DOI: https://doi.org/10.1007/978-3-540-69068-9_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69066-5

  • Online ISBN: 978-3-540-69068-9

  • eBook Packages: Computer Science, Computer Science (R0)
