Title :
TFD: A multi-pattern matching algorithm for large-scale URL filtering
Author :
Zhenlong Yuan ; Baohua Yang ; Xiaoqi Ren ; Yibo Xue
Author_Institution :
Dept. of Autom., Tsinghua Univ., Beijing, China
Abstract :
During the past decade, URL filtering systems have been widely applied to prevent people from browsing undesirable or malicious websites. However, the key method of URL filtering, such as URL blacklist filter, is more challenging due to the limited performance of existing multi-pattern matching algorithms. In this paper, we propose a multi-pattern matching algorithm named TFD for large-scale and high-speed URL filtering. TFD employs Two-phase hash, Finite state machine and Double-array storage to eliminate the performance bottleneck of blacklist filter. Experimental results show that TFD achieves better performance than existing work in terms of matching speed, preprocessing time and memory usage. Specially, on large-scale URL pattern sets (over 10 million URLs), with single thread, TFD´s matching speed reaches over 100Mbps on a general x86 platform.
Keywords :
Web sites; finite state machines; information filtering; large-scale systems; pattern matching; TFD matching speed; double-array storage; finite state machine; high-speed URL filtering; large-scale URL filtering system; large-scale URL pattern sets; malicious websites; matching speed; memory usage; multipattern matching algorithm; preprocessing time; single thread; still blacklist filter performance; two-phase hash; Arrays; Automata; Heuristic algorithms; Indexes; Memory management; Pattern matching; Uniform resource locators; Blacklist; Large-scale; Matching speed; Multi-pattern matching; URL filtering;
Conference_Titel :
Computing, Networking and Communications (ICNC), 2013 International Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
978-1-4673-5287-1
Electronic_ISBN :
978-1-4673-5286-4
DOI :
10.1109/ICCNC.2013.6504109