DocumentCode :
2780235
Title :
Memory-efficient regular expression matching for Chinese network content audit
Author :
Zhu, Zezhi ; Lin, Ping ; Chen, Luying ; Zhang, Kun
Author_Institution :
Sch. of Inf. & Commun. Eng., Beijing Univ. of Posts & Telecommun., Beijing, China
fYear :
2009
fDate :
6-8 Nov. 2009
Firstpage :
144
Lastpage :
148
Abstract :
When matching Chinese keywords for network content audit, one of the biggest problems is interference from "noise characters", which makes the traditional approach of matching explicit string patterns infeasible. Regular expression matching can solve this problem, but DFA-based approaches to regular expression matching run into excessive memory usage. In this paper, we address the problems encountered when applying regular expressions to Chinese network content audit. We propose regular expression rewriting techniques and a grouping principle that resolve the excessive memory usage of the DFA-based approach. Our solution makes it practical to apply regular expressions to Chinese network content audit.
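The noise-character problem described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' rewriting technique or grouping principle, which the abstract does not detail. The helper name `noise_tolerant_pattern`, the `max_noise` bound, the sample keyword, and the definition of noise as any character outside the CJK Unified Ideographs range U+4E00 to U+9FFF are all assumptions made for illustration.

```python
import re

def noise_tolerant_pattern(keyword: str, max_noise: int = 3) -> re.Pattern:
    """Build a regex that matches `keyword` even when up to `max_noise`
    noise characters sit between consecutive keyword characters.

    Assumption: "noise" means any character outside U+4E00..U+9FFF.
    """
    gap = r"[^\u4e00-\u9fff]{0,%d}" % max_noise
    # Join the escaped keyword characters with the bounded noise gap.
    return re.compile(gap.join(re.escape(ch) for ch in keyword))

# Hypothetical keyword and sample payloads.
keyword = "关键词"
pattern = noise_tolerant_pattern(keyword)

text_clean = "这是关键词匹配"
text_noisy = "这是关*键 - 词匹配"

# Explicit string matching misses the obfuscated keyword.
found_by_substring = keyword in text_noisy
# The regular expression tolerates the inserted noise characters.
found_by_regex = pattern.search(text_noisy) is not None
```

Bounded gaps of this kind are a commonly cited source of state growth when several such patterns are compiled into a single DFA, which is presumably the memory problem the paper targets. Python's `re` module is a backtracking engine, so the sketch shows only the matching semantics, not the memory behaviour.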
Keywords :
content management; finite automata; pattern matching; Chinese network content audit; DFA; deterministic finite automaton; excessive memory usage problem; grouping principle; keyword matching; memory-efficient regular expression matching; regular expression rewriting techniques; Automata; Character generation; Doped fiber amplifiers; Encoding; Interference; Natural languages; Pattern matching; Payloads; Chinese keyword matching; Regular expression; Network content audit;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Network Infrastructure and Digital Content, 2009. IC-NIDC 2009. IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-4898-2
Electronic_ISBN :
978-1-4244-4900-6
Type :
conf
DOI :
10.1109/ICNIDC.2009.5360785
Filename :
5360785