Title :
Tools for Very Fast Regular Expression Matching
Author :
Pasetto, Davide ; Petrini, Fabrizio ; Agarwal, Virat
Author_Institution :
IBM Comput. Sci. Center, Ireland
fDate :
3/1/2010
Abstract :
Regular expressions, or regex, are a common choice for defining configurable rules for data parsing because of their expressiveness in detecting recurrent patterns and information. For many data-intensive applications, regex matching is the first line of defense in online data filtering. Unfortunately, few solutions can keep up with increasing data rates and with the complexity of rule sets containing hundreds of expressions. DotStar addresses this problem by providing a complete algorithmic solution and a software tool chain that compiles large sets of user-provided regex, first into a sequence of intermediate representations and then into an automaton that searches for matches in a single pass without backtracking. The entire tool chain supports the extended POSIX standard syntax for regex.
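The abstract does not describe how DotStar builds its automaton. To illustrate the property it claims (many expressions compiled into one automaton that reads each input character exactly once, with no backtracking), here is a minimal sketch that uses a swapped-in textbook technique: NFA simulation whose state sets are turned lazily into cached DFA states (subset construction). It is not DotStar's algorithm. The tiny pattern dialect (literal characters, `.`, and postfix `*`), the class `MultiPatternMatcher`, and its methods are hypothetical, chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical mini-dialect, far smaller than extended POSIX syntax:
# literal characters, '.' for any character, and a postfix '*'.
Atom = Tuple[str, bool]   # (character or '.', is_starred)
State = Tuple[int, int]   # (pattern id, number of atoms already matched)


def parse(pattern: str) -> List[Atom]:
    """Split a pattern into atoms; '*' marks the preceding atom as repeatable."""
    atoms: List[Atom] = []
    for ch in pattern:
        if ch == "*":
            if not atoms or atoms[-1][1]:
                raise ValueError(f"misplaced '*' in {pattern!r}")
            atoms[-1] = (atoms[-1][0], True)
        else:
            atoms.append((ch, False))
    return atoms


class MultiPatternMatcher:
    """Compile a set of patterns into one automaton and scan input once.

    Sets of NFA states act as DFA states; each DFA transition is computed
    on first use and cached, so no input character is ever re-read.
    """

    def __init__(self, patterns: List[str]) -> None:
        self.atoms = [parse(p) for p in patterns]
        start: Set[State] = set()
        for pid in range(len(self.atoms)):
            start |= self._closure({(pid, 0)})
        self.start: FrozenSet[State] = frozenset(start)
        self.cache: Dict[Tuple[FrozenSet[State], str],
                         Tuple[FrozenSet[State], Tuple[int, ...]]] = {}

    def _closure(self, states: Set[State]) -> Set[State]:
        """Follow epsilon moves: a starred atom may be skipped entirely."""
        result, stack = set(states), list(states)
        while stack:
            pid, i = stack.pop()
            atoms = self.atoms[pid]
            if i < len(atoms) and atoms[i][1] and (pid, i + 1) not in result:
                result.add((pid, i + 1))
                stack.append((pid, i + 1))
        return result

    def _step(self, dstate: FrozenSet[State], ch: str):
        """Return (next DFA state, ids of patterns that end at this character)."""
        key = (dstate, ch)
        if key not in self.cache:
            moved: Set[State] = set()
            for pid, i in dstate:
                atoms = self.atoms[pid]
                if i < len(atoms) and atoms[i][0] in (".", ch):
                    # A starred atom loops on itself; a plain atom advances.
                    moved.add((pid, i) if atoms[i][1] else (pid, i + 1))
            reached = self._closure(moved)
            accepted = tuple(sorted(pid for pid, i in reached
                                    if i == len(self.atoms[pid])))
            # Re-adding the start states makes the search unanchored:
            # a match may begin at any offset without rescanning.
            self.cache[key] = (frozenset(reached | self.start), accepted)
        return self.cache[key]

    def scan(self, text: str) -> List[Tuple[int, int]]:
        """Return (pattern id, end offset) for each non-empty match."""
        matches: List[Tuple[int, int]] = []
        dstate = self.start
        for pos, ch in enumerate(text):
            dstate, accepted = self._step(dstate, ch)
            matches.extend((pid, pos + 1) for pid in accepted)
        return matches


if __name__ == "__main__":
    m = MultiPatternMatcher(["ab*c", "x.*y"])
    print(m.scan("zabbcxqqy"))  # [(0, 5), (1, 9)]
```

Because every DFA state is a set of positions across all patterns, adding expressions enlarges the state sets rather than adding extra passes over the data. The trade-off is that the number of distinct DFA states can grow quickly for `.*`-heavy rule sets, which is why this sketch builds states lazily instead of all at once.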
Keywords :
data handling; pattern matching; search problems; software packages; DotStar; data parsing; extended POSIX standard syntax; fast regular expression matching; innovative algorithmic; match searching; online data filtering; recurrent information detection; recurrent pattern detection; regex matching; sets; software tool chain; user provided regex; Application software; Automata; Filtering; Matched filters; Software algorithms; Software standards; Software tools; Expression matching; Multicore processors; Regular expressions