Abstract
We present probabilistic arithmetic automata (PAAs), which can be used to model chains of operations whose operands depend on chance. We provide two different algorithms to exactly calculate the distribution of the results obtained by such probabilistic calculations. Although we introduce PAAs and the corresponding algorithms in a generic manner, our main concern is their application to pattern matching statistics, i.e., we study the distribution of the number of occurrences of a pattern under a given text model. Such calculations play an important role in computational biology, as they allow the significance of pattern occurrences to be assessed. To assess the practicability of our method, we apply it to the Prosite database of amino acid motifs and to the Jaspar database of transcription factor binding sites. Regarding the latter, we additionally show that our framework makes it possible to take binding affinities predicted from a physical model into account.
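To make the idea of an exact occurrence-count distribution concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a single plain string pattern, an i.i.d. text model, and overlapping occurrences counted separately. The function names `build_kmp_automaton` and `occurrence_distribution` are hypothetical and chosen only for this illustration. The sketch tracks the joint distribution of (automaton state, count so far) while reading a random text of length n, where each visit to the accepting state adds 1 to the count.

```python
from collections import defaultdict


def build_kmp_automaton(pattern, alphabet):
    """Build a deterministic matching automaton for one pattern.

    State q means: the longest pattern prefix that is a suffix of the
    text read so far has length q. State len(pattern) is accepting.
    """
    m = len(pattern)
    # fail[q] = length of the longest proper border of pattern[:q]
    fail = [0] * (m + 1)
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i + 1] = k

    delta = []
    for q in range(m + 1):
        row = {}
        for c in alphabet:
            if q < m and pattern[q] == c:
                row[c] = q + 1          # extend the current match
            elif q == 0:
                row[c] = 0              # nothing matched, stay at start
            else:
                row[c] = delta[fail[q]][c]  # fall back to a shorter border
        delta.append(row)
    return delta


def occurrence_distribution(pattern, char_probs, n):
    """Exact distribution of the number of (overlapping) occurrences of
    `pattern` in a random text of length n with i.i.d. characters.

    char_probs maps each character to its probability (assumed to sum to 1).
    Returns a dict: occurrence count -> probability.
    """
    m = len(pattern)
    delta = build_kmp_automaton(pattern, list(char_probs))
    # Joint distribution over (automaton state, accumulated count).
    dist = {(0, 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (q, count), p in dist.items():
            for c, pc in char_probs.items():
                q2 = delta[q][c]
                # Emission: 1 on entering the accepting state, else 0.
                # Operation: addition to the running count.
                nxt[(q2, count + (1 if q2 == m else 0))] += p * pc
        dist = nxt
    # Marginalize out the automaton state.
    result = defaultdict(float)
    for (q, count), p in dist.items():
        result[count] += p
    return dict(result)


if __name__ == "__main__":
    uniform = {"a": 0.5, "b": 0.5}
    print(occurrence_distribution("aa", uniform, 3))
```

For the pattern `aa` over a uniform two-letter alphabet and text length 3, only the text `aaa` contains two (overlapping) occurrences, and `aab` and `baa` contain one each, so the result should assign probability 1/8 to count 2 and 2/8 to count 1. The running time of this sketch grows with the text length, the number of automaton states, the alphabet size, and the number of distinct counts reached.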
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Marschall, T., Rahmann, S. (2008). Probabilistic Arithmetic Automata and Their Application to Pattern Matching Statistics. In: Ferragina, P., Landau, G.M. (eds) Combinatorial Pattern Matching. CPM 2008. Lecture Notes in Computer Science, vol 5029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69068-9_11
DOI: https://doi.org/10.1007/978-3-540-69068-9_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69066-5
Online ISBN: 978-3-540-69068-9
eBook Packages: Computer Science