ABSTRACT
Regular expression matching is a crucial task in several networking applications. Current implementations are based on one of two types of finite state machines. Non-deterministic finite automata (NFAs) have minimal storage demand but high memory bandwidth requirements. Deterministic finite automata (DFAs) exhibit low and deterministic memory bandwidth requirements at the cost of increased memory space. It has already been shown that wildcards and repetitions of large character classes can render both DFAs and NFAs impractical. Additionally, recent security-oriented rule-sets include patterns with advanced features, namely back-references, which add to the expressive power of traditional regular expressions and therefore cannot be supported by classical finite automata.
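The memory cost of DFAs mentioned above can be seen on a toy case. The sketch below is an illustration only and is not taken from the paper: it runs the standard subset construction on a hand-built NFA for the pattern `.*a.{n}` over the two-symbol alphabet {a, b} and counts the reachable DFA states. The function name `count_dfa_states` and the choice of pattern are assumptions made for this example.

```python
def count_dfa_states(n):
    """Count reachable DFA states for the pattern .*a.{n} over {a, b}.

    Hypothetical NFA used for illustration (n + 2 states):
      state 0      : start, loops on every symbol, also moves to 1 on 'a'
      states 1..n  : advance to the next state on every symbol
      state n + 1  : accepting, no outgoing transitions
    """
    alphabet = "ab"

    def step(states, ch):
        # Compute the set of NFA states reached from `states` on symbol `ch`.
        nxt = set()
        for s in states:
            if s == 0:
                nxt.add(0)
                if ch == "a":
                    nxt.add(1)
            elif s <= n:
                nxt.add(s + 1)
        return frozenset(nxt)

    # Subset construction: explore every reachable set of NFA states.
    start = frozenset({0})
    seen = {start}
    work = [start]
    while work:
        cur = work.pop()
        for ch in alphabet:
            nxt = step(cur, ch)
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return len(seen)


for n in (1, 3, 8):
    print(n, "NFA states:", n + 2, "DFA states:", count_dfa_states(n))
```

In this toy case the NFA grows linearly with the repetition count while the DFA doubles with each increment, because every DFA state has to remember which of the last n + 1 symbols were `a`.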
In this work, we propose and evaluate an extended finite automaton designed to address these shortcomings. First, the automaton provides an alternative approach to handling character repetitions that limits memory space and bandwidth requirements. Second, it supports back-references without the need for back-tracking in the input string. In our discussion of this proposal, we address practical implementation issues and evaluate the automaton on real-world rule-sets. To our knowledge, this is the first high-speed automaton that can accommodate all the Perl-compatible regular expressions present in the Snort network intrusion detection system.
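The abstract does not spell out how repetitions are handled, so the following is only a minimal sketch of one common idea, assumed here for illustration and not presented as the authors' construction: instead of expanding a counted repetition into states, keep a small queue of counters, one per partially matched instance. The function `match_ends` and the pattern `.*a.{n}` are hypothetical choices for the example.

```python
from collections import deque


def match_ends(text, n):
    """Return the offsets at which a match of .*a.{n} ends in `text`.

    Each 'a' opens one instance that completes exactly n symbols later.
    Instances complete in the order they were opened, so a FIFO queue of
    completion offsets is enough; no per-count automaton state is needed.
    """
    pending = deque()  # completion offsets of open instances, oldest first
    ends = []
    for i, ch in enumerate(text):
        if ch == "a":
            pending.append(i + n)
        # Only the oldest instance can complete at the current offset.
        if pending and pending[0] == i:
            pending.popleft()
            ends.append(i)
    return ends


print(match_ends("xaybz", 2))
print(match_ends("aab", 1))
```

The input is scanned once, left to right, and the queue never holds more than n + 1 entries, in contrast with the exponential number of DFA states the same pattern can require.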
Index Terms
- Extending finite automata to efficiently match Perl-compatible regular expressions