Two algorithms for approximate string matching in static texts

  • Conference paper
Mathematical Foundations of Computer Science 1991 (MFCS 1991)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 520)

Abstract

We consider the problem of finding all approximate occurrences P′ of a pattern string P in a text string T such that the edit distance between P and P′ is ≤ k. We concentrate on a scheme in which T is first preprocessed so that subsequent searches with different patterns P are fast. Two preprocessing methods and the corresponding search algorithms are described. The first is based on suffix automata and applies to edit distances with general edit operation costs. The second is designed specifically for the unit-cost edit distance and is based on q-gram lists. In both cases the preprocessing needs O(|T|) time and space. The search algorithms run in time O(|P||T|) or O(k|T|) in the worst case and in time O(|P|) in the best case.
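The abstract does not spell out either algorithm, so the following Python sketch only illustrates the problem setting and is not the authors' method. `approx_end_positions` is the classical dynamic-programming baseline for unit-cost edit distance, which runs in O(|P||T|) time without any preprocessing of T. `build_qgram_index` shows one plausible shape for a q-gram list built in a single pass over T. Both function names and the dictionary layout are assumptions made for illustration.

```python
from collections import defaultdict


def approx_end_positions(pattern: str, text: str, k: int) -> list[int]:
    """Return every 1-based position j in text at which some substring
    ending at j is within unit-cost edit distance k of pattern."""
    m = len(pattern)
    # col[i] holds the smallest edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # entry for row i-1 in the previous column
        col[0] = 0          # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            cur = min(
                prev_diag + (pattern[i - 1] != c),  # match or substitution
                col[i] + 1,                         # extra text character
                col[i - 1] + 1,                     # missing pattern character
            )
            prev_diag = col[i]
            col[i] = cur
        if col[m] <= k:
            ends.append(j)
    return ends


def build_qgram_index(text: str, q: int) -> dict[str, list[int]]:
    """Map each q-gram of text to the sorted list of its 0-based start
    positions (a hypothetical layout for a q-gram list)."""
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return dict(index)


if __name__ == "__main__":
    print(approx_end_positions("abc", "xxabcxx", 1))
    print(build_qgram_index("abab", 2))
```

With k = 0 the first function reports only exact occurrences. The index is built once per text and can then be reused for many patterns, which is the preprocessing scheme the abstract describes; how the paper's search procedure consults the q-gram lists is not stated in the abstract and is not modelled here.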

(Extended Abstract)

Research supported by the Academy of Finland and by the Alexander von Humboldt Foundation (Germany). The work of the second author was in part carried out when visiting Institut fuer Informatik, University of Freiburg, Germany.

Editor information

Andrzej Tarlecki

Copyright information

© 1991 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jokinen, P., Ukkonen, E. (1991). Two algorithms for approximate string matching in static texts. In: Tarlecki, A. (eds) Mathematical Foundations of Computer Science 1991. MFCS 1991. Lecture Notes in Computer Science, vol 520. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-54345-7_67

  • DOI: https://doi.org/10.1007/3-540-54345-7_67

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-54345-9

  • Online ISBN: 978-3-540-47579-8

  • eBook Packages: Springer Book Archive
