DocumentCode :
2919119
Title :
An Efficient Algorithm for Content Security Filtering Based on Double-Byte
Author :
Zhao, Yanping ; Lu, Wei
Author_Institution :
Beijing Inst. of Technol., Beijing
fYear :
2007
fDate :
23-24 May 2007
Firstpage :
300
Lastpage :
307
Abstract :
Security monitoring of the vast volume of Internet content faces a time-efficiency problem. To improve efficiency, we studied and compared several typical multi-pattern searching algorithms, such as the Aho-Corasick (AC) and Wu-Manber (WM) algorithms, in both English and Chinese environments. Test results show that the classic multi-pattern matching algorithms are less efficient in the Chinese environment than in the English one. We analyze the factors that cause this: the set of Chinese characters is much larger than the 26 letters of English; English letters repeat frequently within a text, whereas Chinese characters do not; and Chinese keywords are much shorter than English ones. Based on these factors, this paper presents a novel fast multi-pattern matching algorithm, the Byte-Coding (BC) algorithm, and a fast semantic content filtering algorithm based on simple semantic characteristics. By assigning weights of different sizes to the keywords, we can improve both the accuracy and the speed of the filtering system. We thoroughly compare the filtering speed of our algorithm with that of the conventional ones. The results show that in multi-pattern mode it is at least ten times faster than the traditional AC and WM algorithms and scales better as the number of patterns increases; in simple semantic mode with frequency calculations, the algorithm remains suitable and much faster. The algorithm can also be applied to multi-language environments and can serve as a core module in rapid parallel or distributed monitoring systems.
Keywords :
Internet; character recognition; natural languages; pattern matching; security of data; Chinese characters; Internet content; core module; distributed monitoring system; double-byte-coding algorithm; fast semantic content security filtering; multipattern matching algorithm; parallel monitoring system; security monitoring; Algorithm design and analysis; Filtering algorithms; Information filtering; Information filters; Information security; Internet; Monitoring; National security; Pattern matching; Testing; Byte-Coding algorithm; Internet content security; multi-pattern matching; text filtering;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligence and Security Informatics, 2007 IEEE
Conference_Location :
New Brunswick, NJ
Electronic_ISBN :
1-4244-1329-X
Type :
conf
DOI :
10.1109/ISI.2007.379489
Filename :
4258715