One-Gapped q-Gram Filters for Levenshtein Distance

  • Conference paper
Combinatorial Pattern Matching (CPM 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2373)

Abstract

We have recently shown that q-gram filters based on gapped q-grams instead of the usual contiguous q-grams can provide orders of magnitude faster and/or more efficient filtering for the Hamming distance. In this paper, we extend the results to the Levenshtein distance, which is more problematic for gapped q-grams because an insertion or deletion in a gap affects a q-gram while a replacement does not. To keep this effect under control, we concentrate on gapped q-grams with just one gap. We demonstrate with experiments that the resulting filters provide a significant improvement over the contiguous q-gram filters. We also develop new techniques for dealing with complex q-gram filters.
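The difference between a replacement and an insertion inside a gap can be illustrated with a small Python sketch. This is not code from the paper. The function name `gapped_qgrams`, the shape notation (`#` for a sampled position, `-` for a gap position), and the example strings are assumptions made for illustration only. The sketch looks at the one q-gram whose gap covers the edited position.

```python
def gapped_qgrams(text, shape):
    """Return all gapped q-grams of `text` for `shape`.

    `shape` is a hypothetical notation: '#' marks a sampled position,
    '-' marks a gap position that is skipped.
    """
    offsets = [i for i, c in enumerate(shape) if c == "#"]
    span = len(shape)
    return ["".join(text[start + o] for o in offsets)
            for start in range(len(text) - span + 1)]


SHAPE = "##-#"  # a one-gapped 3-gram spanning 4 characters

original = "ACGTACGT"
# Replacement at index 2, which is the gap of the q-gram starting at 0.
replaced = original[:2] + "X" + original[3:]
# Insertion at index 2: everything after the gap shifts by one.
inserted = original[:2] + "X" + original[2:]

first_original = gapped_qgrams(original, SHAPE)[0]
first_replaced = gapped_qgrams(replaced, SHAPE)[0]
first_inserted = gapped_qgrams(inserted, SHAPE)[0]

print(first_original, first_replaced, first_inserted)  # ACT ACT ACG
```

In this example the replacement leaves the q-gram starting at position 0 intact, while the insertion shifts the character sampled after the gap and so destroys the match. Each additional gap would give insertions and deletions another place to cause this shift, which is consistent with the abstract's choice to restrict attention to shapes with a single gap.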

Supported by the DFG ‘Initiative Bioinformatik’ grant BIZ 4/1-1.

Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT).

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Burkhardt, S., Kärkkäinen, J. (2002). One-Gapped q-Gram Filters for Levenshtein Distance. In: Apostolico, A., Takeda, M. (eds) Combinatorial Pattern Matching. CPM 2002. Lecture Notes in Computer Science, vol 2373. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45452-7_19

  • DOI: https://doi.org/10.1007/3-540-45452-7_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43862-5

  • Online ISBN: 978-3-540-45452-6

  • eBook Packages: Springer Book Archive
