DocumentCode
1880997
Title
Speeding up pattern matching by optimal partial string extraction
Author
Tan Jianlong ; Liu Xia ; Liu Yanbing ; Liu Ping
Author_Institution
Inst. of Comput. Technol., Chinese Acad. of Sci., Beijing, China
fYear
2011
fDate
10-15 April 2011
Firstpage
1030
Lastpage
1035
Abstract
String matching plays a key role in web content monitoring systems. Suffix matching algorithms have good time efficiency, and thus are widely used. These algorithms require that all patterns in a set have the same length. When the patterns cannot satisfy this requirement, the leftmost characters, m being the length of the shortest pattern, are extracted to construct the data structure. We call such m-character strings partial strings. However, a simple extraction from the left does not address the impact of partial string locations on search speed. We propose a novel method to extract the partial strings from each pattern which maximizes search speed. More specifically, with this method we can compute all the corresponding searching time cost by theoretical derivation, and choose the location which yields an approximately minimal search time. We evaluate our method on two rule sets: Snort and ClamAV. Experiments show that in most cases, our method achieves the fastest searching speed in all possible locations of partial string extraction, and is about 5%-20% faster than the alternative methods.
Keywords
Internet; feature extraction; string matching; ClamAV; Internet; Snort; Web content monitoring system; optimal partial string extraction; pattern matching; searching time cost; string matching; suffix matching algorithm; Algorithm design and analysis; Automata; Bills of materials; Data mining; Data structures; Pattern matching; Training; SBOM algorithm; Wu Manber algorithm; multiple string matching; pattern matching; string matching;
fLanguage
English
Publisher
ieee
Conference_Titel
Computer Communications Workshops (INFOCOM WKSHPS), 2011 IEEE Conference on
Conference_Location
Shanghai
Print_ISBN
978-1-4577-0249-5
Electronic_ISBN
978-1-4577-0248-8
Type
conf
DOI
10.1109/INFCOMW.2011.5928778
Filename
5928778
Link To Document