Title :
Fast string matching with space-efficient word graphs
Author :
Yata, Susumu ; Morita, Kazuhiro ; Fuketa, Masao ; Aoe, Jun-Ichi
Author_Institution :
Inst. of Technol. & Sci., Univ. of Tokushima, Tokushima
Abstract :
String matching is one of the fundamentals in various text-processing applications such as text mining and content filtering systems. This paper describes a fast string matching algorithm using a compact pattern matching machine DAWG. A directed acyclic word graph (DAWG) is traditionally implemented with a 2-dimensional linked list or matrix. However, DAWGs with these structures have drawbacks, the lookup time of the linked list based one is slow and the space requirement of the matrix based one is large. Therefore, this paper proposes a novel DAWG based on a compacted double-array, which overcomes the drawbacks of traditional ones. Experimental results show that the novel DAWG is more efficient than traditional ones.
Keywords :
data mining; directed graphs; pattern matching; text analysis; compact pattern matching machine; compacted double-array; content filtering systems; directed acyclic word graph; fast string matching; space-efficient word graphs; text mining; text-processing applications; Application specific integrated circuits; Dictionaries; Electronic mail; Field programmable gate arrays; Filtering; Handheld computers; Matched filters; Pattern matching; Space technology; Text mining;
Conference_Titel :
Innovations in Information Technology, 2008. IIT 2008. International Conference on
Conference_Location :
Al Ain
Print_ISBN :
978-1-4244-3396-4
Electronic_ISBN :
978-1-4244-3397-1
DOI :
10.1109/INNOVATIONS.2008.4781726