Abstract
We consider single and multiple string matching in small space and optimal average time. Our algorithm combines compressed self-indexes with the Backward-DAWG-Matching (BDM) algorithm. We consider several implementation techniques with different trade-offs between space, time, and implementation complexity. The experimental results show that our approach requires much less space than BDM, while being much faster and easier to implement. We also show that some of our techniques can boost the search speed of compressed self-indexes, using a small amount of additional space.
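For readers unfamiliar with BDM, the following is a minimal sketch of the backward factor-matching idea for a single pattern. It is not the data structure described in the abstract: a plain Python set of all pattern substrings stands in for the DAWG (or a compressed self-index), which costs quadratic space and is used here only to keep the example short. The function name `bdm_search` and its variables are illustrative assumptions.

```python
def bdm_search(pattern, text):
    """Return the start positions of all occurrences of pattern in text.

    Illustrative sketch only: a set of substrings replaces the factor
    automaton (DAWG) or compressed self-index that a real implementation
    would use to recognise factors of the pattern.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Stand-in for the factor automaton: every substring of the pattern.
    factors = {pattern[i:j] for i in range(m) for j in range(i + 1, m + 1)}
    # Every non-empty prefix of the pattern, used to compute safe shifts.
    prefixes = {pattern[:k] for k in range(1, m + 1)}

    hits = []
    pos = 0  # start of the current text window of length m
    while pos <= n - m:
        j = m      # number of window characters not yet read
        last = 0   # longest proper pattern prefix found as a window suffix
        while j > 0:
            # Extend the window suffix by one character to the left.
            suffix = text[pos + j - 1:pos + m]
            if suffix not in factors:
                break  # no occurrence can contain this suffix
            j -= 1
            if suffix in prefixes:
                if j > 0:
                    last = m - j      # remember the prefix length
                else:
                    hits.append(pos)  # the whole window equals the pattern
        # Shift so that the longest recognised prefix aligns with the
        # start of the next window; the shift is always at least 1.
        pos += m - last
    return hits
```

Scanning the window backwards means a mismatch is usually detected after reading only a few characters, and the window can then skip ahead by up to m positions. That skipping is where the good average-case behaviour of BDM-style algorithms comes from.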
Index Terms
- Succinct backward-DAWG-matching