Title :
A Memory-Efficient Multi-pattern Matching Algorithm Based on the Bitmap
Author :
Zhang, Weizhe ; Zhang, Yuanjing ; Zhang, HongLi ; Gu, Xuemai ; Cheng, Albert M K
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
Abstract :
The Aho-Corasick algorithm is an efficient multiple pattern matching algorithm for large scale pattern sets. However, it consumes too much memory. A new efficient space optimization algorithm (AC-Bitmap) is proposed, which is based on the data structure of the bitmap. It divides all the states in the automata into two groups by their depths in the dictionary tree of all the patterns, and reduces the deeper group´s memory consumption which is retrieved less in matching. For the latter group, it also makes use of the bitmap to improve its retrieval time efficiency. Experiments indicate that the AC-Bitmap algorithm significantly reduces the memory consumption, which still keeps the time efficiency compared to the AC algorithm for random texts and literatures.
Keywords :
data structures; optimisation; pattern matching; text analysis; trees (mathematics); AC-Bitmap; Aho-Corasick algorithm; data structure; large scale pattern sets; memory consumption; memory-efficient multipattern matching algorithm; random texts; space optimization algorithm; Automata; Computer science; Data structures; Dictionaries; Information retrieval; Internet; Intrusion detection; Large-scale systems; Pattern matching; Space technology; AC-Bitmap algorithm; Aho-Corasick algorithm; multipattern matching; string matching;
Conference_Titel :
Internet Computing for Science and Engineering (ICICSE), 2009 Fourth International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-6754-9
DOI :
10.1109/ICICSE.2009.37