DocumentCode
76122
Title
Towards Fast and Optimal Grouping of Regular Expressions via DFA Size Estimation
Author
Tingwen Liu ; Liu, Alex X. ; Jinqiao Shi ; Yong Sun ; Li Guo
Author_Institution
Inst. of Inf. Eng., Beijing, China
Volume
32
Issue
10
fYear
2014
fDate
Oct. 2014
Firstpage
1797
Lastpage
1809
Abstract
Regular Expression (RegEx) matching, as a core operation in many network and security applications, is typically performed on Deterministic Finite Automata (DFA) to process packets at wire speed; however, DFA size is often exponential in the number of RegExes. RegEx grouping is the practical way to address DFA state explosion. Prior RegEx grouping algorithms are extremely slow and memory intensive. In this paper, we first propose DFAestimator, an algorithm that can quickly estimate DFA size for a given RegEx set without building the actual DFA. Second, we propose RegexGrouper, a RegEx grouping algorithm based on DFA size estimation. In terms of speed and memory consumption, our work is orders of magnitude more efficient than prior art because DFA size estimation is much faster and memory efficient than DFA construction. In terms of the resulting size sum of DFAs, our work is significantly more effective than prior art because we use a much finer grained quantification of the degree of interaction between two RegExes. For example, to divide the RegEx set of the L7-filter system into 7 groups, prior art uses 279.3 minutes and the resulting 7 DFAs have a total of 29047 states, whereas RegexGrouper uses 3.2 minutes and the resulting 7 DFAs have a total of 15578 states.
Keywords
deterministic automata; finite automata; pattern matching; DFA size estimation; DFA state explosion; DFAestimator algorithm; L7-filter system; RegEx grouping; RegEx matching; deterministic finite automata; regular expression; Art; Automata; Estimation; Explosions; Memory management; Partitioning algorithms; Upper bound; Deep packet inspection; intrusion prevention systems; regular expression matching;
fLanguage
English
Journal_Title
Selected Areas in Communications, IEEE Journal on
Publisher
ieee
ISSN
0733-8716
Type
jour
DOI
10.1109/JSAC.2014.2358839
Filename
6902788
Link To Document