Abstract
Searching for all occurrences of a given set of patterns in a text is a fundamental problem in computer science, with applications in many fields such as computational biology and intrusion detection systems.
Over the last two decades, a general trend has emerged of exploiting the power of the word-RAM model to speed up classical string matching algorithms. This study introduces a filter-based exact multiple string matching algorithm that uses Intel’s SSE (Streaming SIMD Extensions) technology to search for long strings. Our experimental results under various conditions show that the proposed algorithm outperforms other solutions that are known to be among the fastest in practice.
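To make the term "filter-based" concrete, the following is a minimal scalar sketch of the general filter-then-verify idea for multiple exact matching. It is not the algorithm proposed in the paper and it does not use SSE instructions. The names `fingerprint` and `filter_search`, the fingerprint length `Q`, the table size, and the hash multiplier are all assumptions chosen for illustration. The sketch assumes every pattern is at least `Q` bytes long.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define Q 4                          /* fingerprint length in bytes (assumed) */
#define TABLE_BITS 12                /* log2 of the filter table size (assumed) */
#define TABLE_SIZE (1u << TABLE_BITS)

/* Hash the Q bytes starting at s into a TABLE_BITS-bit fingerprint. */
static unsigned fingerprint(const unsigned char *s)
{
    uint32_t h = 0;
    for (int k = 0; k < Q; k++)
        h = h * 131u + s[k];         /* simple polynomial hash (illustrative) */
    return (unsigned)(h & (TABLE_SIZE - 1u));
}

/* Count all occurrences of all patterns in text[0..n).
 * Hypothetical helper: every pattern must be a NUL-terminated
 * string of length >= Q. */
size_t filter_search(const char *text, size_t n,
                     const char *const *pats, size_t npats)
{
    unsigned char candidate[TABLE_SIZE] = {0};
    size_t count = 0;

    /* Preprocessing: mark the fingerprint of each pattern's first Q bytes. */
    for (size_t p = 0; p < npats; p++)
        candidate[fingerprint((const unsigned char *)pats[p])] = 1;

    /* Search: the filter discards most text positions cheaply;
     * only positions whose fingerprint is marked are verified. */
    for (size_t i = 0; i + Q <= n; i++) {
        if (!candidate[fingerprint((const unsigned char *)text + i)])
            continue;
        /* Verification: compare each pattern in full at this position. */
        for (size_t p = 0; p < npats; p++) {
            size_t m = strlen(pats[p]);
            if (i + m <= n && memcmp(text + i, pats[p], m) == 0)
                count++;
        }
    }
    return count;
}
```

The filter can report false positives (hash collisions) but never false negatives, so the verification step keeps the search exact. A SIMD variant would aim to compute or compare several fingerprints per instruction, which this sketch does not attempt.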
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Faro, S., Külekci, M.O. (2012). Fast Multiple String Matching Using Streaming SIMD Extensions Technology. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds) String Processing and Information Retrieval. SPIRE 2012. Lecture Notes in Computer Science, vol 7608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34109-0_23
DOI: https://doi.org/10.1007/978-3-642-34109-0_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34108-3
Online ISBN: 978-3-642-34109-0
eBook Packages: Computer Science, Computer Science (R0)