String Matching on the Internet

  • Conference paper
Combinatorial and Algorithmic Aspects of Networking (CAAN 2004)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 3405)

Abstract

We consider a variant of the “string searching in database” problem where the string database arrives on a data stream, and processing the data is at a premium but querying is not a runtime bottleneck. Specifically, the strings to be searched (let’s call them the documents) have to be processed online very efficiently, meaning the documents have to be added to some string searching data structure one by one in time proportional to their length. Of course, we desire this data structure to be small, i.e. at most linear space, and hopefully to exhibit a tradeoff between storage/processing cost and accuracy. Given a query string, the data structure must return whether that string is contained in a document (the presence query), and must also be able to return a list of the documents which contain the query (the attribution query). We may require that the query be large enough and that only portions of it may match (pattern matching). In practice, it is acceptable that the data structure return a superset of the answer, as long as no document from the answer is missing and there are only few false positives; either the false positives can be filtered (by actual verification if the document texts are available in a repository), or a small number of false positives is acceptable for the application (e.g. network forensics, see below).
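
As an illustration only, the presence and attribution queries can be sketched with a single Bloom filter over fixed-length substrings. The abstract does not say how the authors build their structure, so everything here is an assumption made for the example: the names `BloomFilter` and `StreamIndex`, the substring length `q`, the filter size, and the choice to record each substring both alone and tagged with its document identifier.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter; sizes are illustrative defaults, not tuned values."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive num_hashes bit positions from one digest (double hashing).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))


class StreamIndex:
    """Hypothetical index: documents arrive one by one and are never stored."""

    def __init__(self, q=4):
        self.q = q                 # assumed minimum query length, in bytes
        self.filter = BloomFilter()
        self.doc_ids = []          # needed to enumerate attribution candidates

    def add_document(self, doc_id: str, text: str) -> None:
        # One pass over the document; each position costs O(q) hashing work.
        self.doc_ids.append(doc_id)
        data, tag = text.encode(), doc_id.encode()
        for i in range(len(data) - self.q + 1):
            gram = data[i:i + self.q]
            self.filter.add(gram)                    # for presence queries
            self.filter.add(tag + b"\x00" + gram)    # for attribution queries

    def _grams(self, query: str):
        data = query.encode()
        if len(data) < self.q:
            raise ValueError("query shorter than q")
        return [data[i:i + self.q] for i in range(len(data) - self.q + 1)]

    def present(self, query: str) -> bool:
        # True whenever some document contains the query; may also be a
        # false positive.
        return all(g in self.filter for g in self._grams(query))

    def attribute(self, query: str):
        # Superset of the documents containing the query.
        grams = self._grams(query)
        return [d for d in self.doc_ids
                if all(d.encode() + b"\x00" + g in self.filter for g in grams)]


index = StreamIndex(q=4)
index.add_document("doc-a", "the quick brown fox")
index.add_document("doc-b", "a lazy dog sleeps")
print(index.present("quick brown"), index.attribute("lazy dog"))
```

In this sketch a document that contains the query is never missed, because every length-`q` substring of the query was inserted when the document was processed. False positives can arise from Bloom filter collisions or from a document that contains all of the query's substrings without containing them contiguously, which corresponds to the superset behaviour the abstract accepts. For fixed `q`, processing time is proportional to document length, and the filter size trades storage against accuracy.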

This research is supported by NSF CyberTrust Grant 0430444, “Fornet: Design and Implementation of a Network Forensics System”.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brönnimann, H., Memon, N., Shanmugasundaram, K. (2005). String Matching on the Internet. In: López-Ortiz, A., Hamel, A.M. (eds) Combinatorial and Algorithmic Aspects of Networking. CAAN 2004. Lecture Notes in Computer Science, vol 3405. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527954_8

  • DOI: https://doi.org/10.1007/11527954_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27873-3

  • Online ISBN: 978-3-540-31860-6

  • eBook Packages: Computer Science, Computer Science (R0)