Title :
A New String Search Tree with Multiple Alignment
Author :
Kim, Sung-Hwan ; Kim, Kwanghwi ; Cho, Hwan-Gue
Author_Institution :
Dept. of Comput. Eng., Pusan Nat. Univ., Busan, South Korea
Abstract :
String similarity search is a fundamental problem in computer science and can be applied to various fields of information processing. Construction of an efficient indexing structure is an important challenge to improve search performance. In this paper, we propose a new effective tree structure for string similarity search under Levenshtein distance. We present a bottom-up approach to construct the search tree, the internal nodes of which indicate a multiple-sequence alignment of its child leaf nodes, while a leaf node representing a string element. Defining an operation to merge strings and an algorithm for computing distance to merged strings, we present an efficient search strategy with our proposed tree structure. We also demonstrate the search performance with an experimental evaluation, and verify that our method outperforms the sequential search by at most a factor of 15 in terms of the search time.
Keywords :
indexing; merging; string matching; tree searching; trees (mathematics); Levenshtein distance; bottom-up approach; child leaf nodes; computer science; indexing structure; information processing; internal nodes; multiple-sequence alignment; search performance; search strategy; sequential search; string element; string merging; string search tree; string similarity search; tree structure; Equations; Indexing; Linear programming; Merging; Nearest neighbor searches; Search problems; Vectors; Approximate Dictionary; Edit Distance; Multiple Sequence Alignment; String Similarity Search;
Conference_Titel :
Computer and Information Technology (CIT), 2012 IEEE 12th International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4673-4873-7
DOI :
10.1109/CIT.2012.103