Title :
Resource-efficient regular expression matching architecture for text analytics
Author_Institution :
IBM Res. - Zurich, Zurich, Switzerland
Abstract :
Text analytics systems, such as IBM's SystemT software, rely on regular expressions (regexs) and dictionaries for transforming unstructured data into a structured format. Unlike network intrusion detection systems, text analytics systems compute and report precisely where the specific and sensitive information starts and ends in a text document. Therefore, advanced regex matching functions, such as start-offset reporting, capturing groups, and leftmost match computation, are heavily used in text analytics systems. We present a novel regex matching architecture that supports such functions in a resource-efficient way. The resource efficiency is achieved by 1) eliminating state replication, 2) avoiding expensive offset comparison operations in leftmost match computation, and 3) minimizing the number of offset registers. Experiments on regex sets from text analytics and network intrusion detection domains, using an Altera Stratix IV FPGA, show that the proposed architecture achieves a more than threefold reduction of the logic resources used and a more than 1.25-fold increase of the clock frequency with respect to a recently proposed architecture that supports identical features.
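For readers unfamiliar with the three matching functions named in the abstract, the following software-only sketch shows what start-offset reporting, capturing groups, and leftmost match computation return for a small input. It uses Python's standard re module and is not the hardware architecture presented in the paper; the pattern, the sample text, and the helper name leftmost_match are assumptions chosen purely for illustration.

```python
import re

# Hypothetical pattern and text, chosen only for illustration.
PHONE = re.compile(r"(\d{3})-(\d{4})")
TEXT = "Call 555-1234 or 555-9876."


def leftmost_match(pattern, text):
    """Return offsets of the leftmost match and of each capturing group."""
    # search() scans from the left and returns the first (leftmost) match.
    m = pattern.search(text)
    if m is None:
        return None
    return {
        # (start, end) offsets of the whole match: start-offset reporting.
        "span": m.span(),
        # (start, end) offsets of every capturing group, in order.
        "groups": [m.span(i) for i in range(1, pattern.groups + 1)],
    }


result = leftmost_match(PHONE, TEXT)
print(result)
```

Although the text contains two candidate matches, only the leftmost one is reported, together with the offsets of both of its capturing groups.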
Keywords :
data structures; dictionaries; field programmable gate arrays; text analysis; Altera Stratix IV FPGA; IBM SystemT software; advanced regex matching functions; capturing groups; clock frequency; leftmost match computation; logic resources; network intrusion detection domains; offset registers; resource efficiency; resource-efficient regular expression matching architecture; sensitive information; start-offset reporting; structured format; text analytics; text document; unstructured data; Clocks; Computer architecture; Hardware; Information retrieval; Redundancy; Registers; Vectors;
Conference_Title :
Application-specific Systems, Architectures and Processors (ASAP), 2014 IEEE 25th International Conference on
Conference_Location :
Zurich
DOI :
10.1109/ASAP.2014.6868623