Title :
A New Indexing Method for Approximate Search in Text Databases
Author :
Shi, Fei ; Mefford, Cread
Author_Institution :
Comput. Sci. Dept., Suffolk Univ., Boston, MA
Abstract :
We present an index structure to support the approximate keyword search in text databases. In an approximate keyword search query, the user presents a query word Q and a tolerance value k (kges0), and wishes to find all documents in the database that contain the query word Q or any other word in the vocabulary that matches Q approximately (We say that two words match each other approximately if the edit distance between them does not exceed the tolerance value k. In a typical text database application, a user chooses k=1, 2, 3, or 4). Our index structure is built on the underlying vocabulary of the text database. The new technique has two principal components - a new data structure called the V-tree and its partition methods for clustering words in the vocabulary into subgroups. We have implemented our index structure and conducted experiments on real-world data. Our experiments show that even for very large text database, the construction of our index and a search for keywords that match the query word approximately can be done quickly. Our implementation makes it clear that the V-tree data structure can be easily integrated into existing access structures built on the database such as the inverted index file
Keywords :
database indexing; information retrieval; query processing; search problems; text analysis; tree data structures; vocabulary; V-tree partition method; approximate keyword search; data structure; index structure; inverted index file; query word; text database; vocabulary; Data structures; Database systems; Dictionaries; Indexes; Indexing; Information retrieval; Keyword search; Proteins; Sequences; Vocabulary;
Conference_Titel :
Computer and Information Technology, 2005. CIT 2005. The Fifth International Conference on
Conference_Location :
Shanghai
Print_ISBN :
0-7695-2432-X
DOI :
10.1109/CIT.2005.23