Skip to main content

Fast Multiple String Matching Using Streaming SIMD Extensions Technology

  • Conference paper
String Processing and Information Retrieval (SPIRE 2012)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7608))

Included in the following conference series:

Abstract

Searching for all occurrences of a given set of patterns in a text is a fundamental problem in computer science with applications in many fields, like computational biology and intrusion detection systems.

In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed-up the performances of classical string matching algorithms. This study introduces a filter based exact multiple string matching algorithm, which benefits from Intel’s SSE (streaming SIMD extensions) technology for searching long strings. Our experimental results on various conditions show that the proposed algorithm outperforms other solutions, which are known to be among the fastest in practice.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aho, A.V., Corasick, M.J.: Efficient string matching: an aid to bibliographic search. Commun. ACM 18(6), 333–340 (1975)

    Article  MathSciNet  MATH  Google Scholar 

  2. Baeza-Yates, R., Gonnet, G.H.: A new approach to text searching. Communications of the ACM 35(10), 74–82 (1992)

    Article  Google Scholar 

  3. Ben-Kiki, O., Bille, P., Breslauer, D., Gasieniec, L., Grossi, R., Weimann, O.: Optimal packed string matching. In: IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2011), vol. 13, pp. 423–432 (2011)

    Google Scholar 

  4. Cantone, D., Faro, S., Giaquinta, E.: A Compact Representation of Nondeterministic (Suffix) Automata for the Bit-Parallel Approach. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 288–298. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  5. Cantone, D., Faro, S., Giaquinta, E.: On the bit-parallel simulation of the nondeterministic aho-corasick and suffix automata for a set of patterns. J. Discrete Algorithms 11, 25–36 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  6. Crochemore, M., Rytter, W.: Text algorithms. Oxford University Press (1994)

    Google Scholar 

  7. Faro, S., Lecroq, T.: Efficient variants of the backward-oracle-matching algorithm. Int. J. Found. Comput. Sci. 20(6), 967–984 (2009)

    Article  MathSciNet  MATH  Google Scholar 

  8. Faro, S., Lecroq, T.: The exact string matching problem: a comprehensive experimental evaluation. Arxiv preprint arXiv:1012.2547 (2010)

    Google Scholar 

  9. Faro, S., Lecroq, T.: Smart: a string matching algorithm research tool. Univ. of Catania and Univ. of Rouen (2011), http://www.dmi.unict.it/~faro/smart/

  10. Faro, S., Lecroq, T.: The exact online string matching problem: a review of the most recent results. ACM Computing Surveys (to appear)

    Google Scholar 

  11. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(1), 323–350 (1977)

    Article  MathSciNet  MATH  Google Scholar 

  12. Külekci, M.O.: Filter based fast matching of long patterns by using SIMD instructions. In: Proc. of the Prague Stringology Conference, pp. 118–128 (2009)

    Google Scholar 

  13. Külekci, M.O.: Blim: A new bit-parallel pattern matching algorithm overcoming computer word size limitation. Mathematics in Comp. Science 3(4), 407–420 (2010)

    Article  MATH  Google Scholar 

  14. Navarro, G., Raffinot, M.: A bit-parallel approach to suffix automata: Fast extended string matching. In: Comb. Pattern Matching, pp. 14–33 (1998)

    Google Scholar 

  15. Navarro, G., Raffinot, M.: Fast and flexible string matching by combining bit-parallelism and suffix automata. ACM J. Experimental Algorithmics 5, 4 (2000)

    Article  MathSciNet  Google Scholar 

  16. Navarro, G., Raffinot, M.: Flexible pattern matching in strings - practical on-line search algorithms for texts and biological sequences. Cambridge Univ. Press (2002)

    Google Scholar 

  17. Navarro, G., Fredriksson, K.: Average complexity of exact and approximate multiple string matching. Theor. Comput. Sci. 321(2-3), 283–290 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  18. Rivals, E., Salmela, L., Kiiskinen, P., Kalsi, P., Tarhio, J.: mpscan: Fast Localisation of Multiple Reads in Genomes. In: Salzberg, S.L., Warnow, T. (eds.) WABI 2009. LNCS, vol. 5724, pp. 246–260. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  19. Wu, S., Manber, U.: Agrep – a fast approximate pattern-matching tool. In: Proc. of USENIX Winter 1992 Technical Conference, pp. 153–162 (1992)

    Google Scholar 

  20. Wu, S., Manber, U.: A fast algorithm for multi-pattern searching. Report TR-94-17, Dep. of Computer Science, University of Arizona, Tucson, AZ (1994)

    Google Scholar 

  21. Wu, S., Manber, U.: Fast text searching: allowing errors. Commun. ACM 35(10), 83–91 (1992)

    Article  Google Scholar 

  22. Gog, S., Karhu, K., Kärkkäinen, J., Mäkinen, V., Välimäki, N.: Multi-pattern matching with bidirectional indexes. In: Gudmundsson, J., Mestre, J., Viglas, T. (eds.) COCOON 2012. LNCS, vol. 7434, pp. 384–395. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  23. Salmela, L., Tarhio, J., Kyotojoki, J.: Multi–pattern string matching with q–grams. ACM J. Experimental Algorithmics 11 (2006)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Faro, S., Külekci, M.O. (2012). Fast Multiple String Matching Using Streaming SIMD Extensions Technology. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds) String Processing and Information Retrieval. SPIRE 2012. Lecture Notes in Computer Science, vol 7608. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34109-0_23

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-34109-0_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34108-3

  • Online ISBN: 978-3-642-34109-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics