• DocumentCode
    76122
  • Title

    Towards Fast and Optimal Grouping of Regular Expressions via DFA Size Estimation

  • Author

    Tingwen Liu ; Liu, Alex X. ; Jinqiao Shi ; Yong Sun ; Li Guo

  • Author_Institution
    Inst. of Inf. Eng., Beijing, China
  • Volume
    32
  • Issue
    10
  • fYear
    2014
  • fDate
    Oct. 2014
  • Firstpage
    1797
  • Lastpage
    1809
  • Abstract
    Regular Expression (RegEx) matching, as a core operation in many network and security applications, is typically performed on Deterministic Finite Automata (DFA) to process packets at wire speed; however, DFA size is often exponential in the number of RegExes. RegEx grouping is the practical way to address DFA state explosion. Prior RegEx grouping algorithms are extremely slow and memory intensive. In this paper, we first propose DFAestimator, an algorithm that can quickly estimate DFA size for a given RegEx set without building the actual DFA. Second, we propose RegexGrouper, a RegEx grouping algorithm based on DFA size estimation. In terms of speed and memory consumption, our work is orders of magnitude more efficient than prior art because DFA size estimation is much faster and memory efficient than DFA construction. In terms of the resulting size sum of DFAs, our work is significantly more effective than prior art because we use a much finer grained quantification of the degree of interaction between two RegExes. For example, to divide the RegEx set of the L7-filter system into 7 groups, prior art uses 279.3 minutes and the resulting 7 DFAs have a total of 29047 states, whereas RegexGrouper uses 3.2 minutes and the resulting 7 DFAs have a total of 15578 states.
  • Keywords
    deterministic automata; finite automata; pattern matching; DFA size estimation; DFA state explosion; DFAestimator algorithm; L7-filter system; RegEx grouping; RegEx matching; deterministic finite automata; regular expression; Art; Automata; Estimation; Explosions; Memory management; Partitioning algorithms; Upper bound; Deep packet inspection; intrusion prevention systems; regular expression matching;
  • fLanguage
    English
  • Journal_Title
    Selected Areas in Communications, IEEE Journal on
  • Publisher
    ieee
  • ISSN
    0733-8716
  • Type

    jour

  • DOI
    10.1109/JSAC.2014.2358839
  • Filename
    6902788