Title :
Design of Fast Multiple String Searching Based on Improved Prefix Tree
Author :
Cheng, Yu ; Zhang, Tao
Author_Institution :
Dept. of Biomed. Eng., Tsinghua Univ., Beijing, China
Abstract :
Multi-string matching is one of the most important components in data mining task. New applications in many technology fields require high performance string matching algorithms. This paper first presents a new string searching approach based on a data structure called prefix tree. The innovative algorithm eliminates the functional overlap of the table HASH and Prefix Function. Then we make a little improvement on the prefix tree and present a second algorithm that is faster and more space-saving. It is demonstrated analytically that the two algorithms inherit the optimality and are very competitive in practice. On tests of both real life and synthetic data, our algorithms are also efficient and especially effective for various string pattern and large alphabet sets.
Keywords :
data mining; data structures; string matching; data mining; data structure; fast multiple string searching design; improved prefix tree function; multistring matching algorithm; synthetic data; table hash function; Algorithm design and analysis; Application software; Automation; Biomedical engineering; Computer science; Data mining; Life testing; Pattern matching; Payloads; Tree data structures; Multi-string matching; prefix tree; string pattern;
Conference_Titel :
Knowledge Discovery and Data Mining, 2010. WKDD '10. Third International Conference on
Conference_Location :
Phuket
Print_ISBN :
978-1-4244-5397-9
Electronic_ISBN :
978-1-4244-5398-6
DOI :
10.1109/WKDD.2010.138