DocumentCode :
1992156
Title :
TFD: A multi-pattern matching algorithm for large-scale URL filtering
Author :
Zhenlong Yuan ; Baohua Yang ; Xiaoqi Ren ; Yibo Xue
Author_Institution :
Dept. of Autom., Tsinghua Univ., Beijing, China
fYear :
2013
fDate :
28-31 Jan. 2013
Firstpage :
359
Lastpage :
363
Abstract :
During the past decade, URL filtering systems have been widely applied to prevent people from browsing undesirable or malicious websites. However, the key method of URL filtering, such as URL blacklist filter, is more challenging due to the limited performance of existing multi-pattern matching algorithms. In this paper, we propose a multi-pattern matching algorithm named TFD for large-scale and high-speed URL filtering. TFD employs Two-phase hash, Finite state machine and Double-array storage to eliminate the performance bottleneck of blacklist filter. Experimental results show that TFD achieves better performance than existing work in terms of matching speed, preprocessing time and memory usage. Specially, on large-scale URL pattern sets (over 10 million URLs), with single thread, TFD´s matching speed reaches over 100Mbps on a general x86 platform.
Keywords :
Web sites; finite state machines; information filtering; large-scale systems; pattern matching; TFD matching speed; double-array storage; finite state machine; high-speed URL filtering; large-scale URL filtering system; large-scale URL pattern sets; malicious websites; matching speed; memory usage; multipattern matching algorithm; preprocessing time; single thread; still blacklist filter performance; two-phase hash; Arrays; Automata; Heuristic algorithms; Indexes; Memory management; Pattern matching; Uniform resource locators; Blacklist; Large-scale; Matching speed; Multi-pattern matching; URL filtering;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Computing, Networking and Communications (ICNC), 2013 International Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
978-1-4673-5287-1
Electronic_ISBN :
978-1-4673-5286-4
Type :
conf
DOI :
10.1109/ICCNC.2013.6504109
Filename :
6504109
Link To Document :
بازگشت