Succinct backward-DAWG-matching

Published: 23 February 2009

Abstract

We consider single and multiple string matching in small space and optimal average time. Our algorithm combines compressed self-indexes with the Backward-DAWG-Matching (BDM) algorithm. We consider several implementation techniques with different trade-offs among space, time, and implementation complexity. The experimental results show that our approach requires much less space than BDM while being much faster and easier to implement. We also show that some of our techniques can boost the search speed of compressed self-indexes, using a small amount of additional space.
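
To make the combination described above concrete, the following is a minimal sketch of BDM-style window scanning in which the factor automaton is replaced by backward search on an FM-index of the pattern. It is an illustration under stated assumptions, not the authors' implementation: the choice of an FM-index as the self-index, the naive uncompressed construction, and the names `build_fm_index` and `bdm_search` are all hypothetical, and the text is assumed not to contain the sentinel character.

```python
SENTINEL = "\0"  # assumed smaller than every pattern character and absent from the text


def build_fm_index(pattern):
    """Naive FM-index of pattern + sentinel (quadratic construction, for illustration only)."""
    s = pattern + SENTINEL
    n = len(s)
    sa = sorted(range(n), key=lambda i: s[i:])  # suffix array
    bwt = [s[i - 1] for i in sa]                # s[-1] is the sentinel when i == 0
    # C[c] = number of characters in s that are strictly smaller than c
    C, total = {}, 0
    for c in sorted(set(s)):
        C[c] = total
        total += s.count(c)
    # occ[c][k] = number of occurrences of c in bwt[:k]
    occ = {c: [0] * (n + 1) for c in C}
    for k, ch in enumerate(bwt):
        for c in C:
            occ[c][k + 1] = occ[c][k] + (1 if ch == c else 0)
    row0 = sa.index(0)  # row of the suffix that starts at pattern position 0
    return C, occ, n, row0


def bdm_search(text, pattern):
    """Return all start positions of pattern in text using backward window scanning."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    C, occ, size, row0 = build_fm_index(pattern)
    matches = []
    pos = 0  # current window is text[pos:pos + m]
    while pos <= n - m:
        sp, ep = 0, size  # half-open row range of the empty string
        j = m             # characters of the window not yet read
        shift = m         # safe shift if no prefix of the pattern is seen
        while j > 0:
            c = text[pos + j - 1]
            if c == SENTINEL or c not in C:
                break
            # One backward-search step: prepend c to the suffix read so far.
            sp, ep = C[c] + occ[c][sp], C[c] + occ[c][ep]
            if sp >= ep:
                break  # the suffix read so far is not a factor of the pattern
            j -= 1
            # text[pos + j:pos + m] is a factor; it is a prefix iff row0 is in range.
            if sp <= row0 < ep:
                if j == 0:
                    matches.append(pos)
                else:
                    shift = j
        pos += shift
    return matches


print(bdm_search("abracadabra", "abra"))  # [0, 7]
```

Reading the window right to left prepends characters to the string recognized so far, which is the direction in which FM-index backward search extends a match, so each window character costs one backward-search step. A practical version would replace the dictionary-of-lists `occ` table with a compressed rank structure; the table here is kept uncompressed only for readability.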

Published in

ACM Journal of Experimental Algorithmics, Volume 13, 2009
482 pages
ISSN: 1084-6654
EISSN: 1084-6654
DOI: 10.1145/1412228

Copyright © 2009 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 23 February 2009
• Accepted: 1 October 2008
• Revised: 1 December 2007
• Received: 1 September 2007

Qualifiers

• research-article
• Research
• Refereed
