مرکز منطقه ای اطلاع رساني علوم و فناوري - A New Indexing Method for Approximate Search in Text Databases

DocumentCode :

2826132

Title :

A New Indexing Method for Approximate Search in Text Databases

Author :

Shi, Fei ; Mefford, Cread

Author_Institution :

Comput. Sci. Dept., Suffolk Univ., Boston, MA

fYear :

2005

fDate :

21-23 Sept. 2005

Firstpage :

Lastpage :

Abstract :

We present an index structure to support the approximate keyword search in text databases. In an approximate keyword search query, the user presents a query word Q and a tolerance value k (kges0), and wishes to find all documents in the database that contain the query word Q or any other word in the vocabulary that matches Q approximately (We say that two words match each other approximately if the edit distance between them does not exceed the tolerance value k. In a typical text database application, a user chooses k=1, 2, 3, or 4). Our index structure is built on the underlying vocabulary of the text database. The new technique has two principal components - a new data structure called the V-tree and its partition methods for clustering words in the vocabulary into subgroups. We have implemented our index structure and conducted experiments on real-world data. Our experiments show that even for very large text database, the construction of our index and a search for keywords that match the query word approximately can be done quickly. Our implementation makes it clear that the V-tree data structure can be easily integrated into existing access structures built on the database such as the inverted index file

Keywords :

database indexing; information retrieval; query processing; search problems; text analysis; tree data structures; vocabulary; V-tree partition method; approximate keyword search; data structure; index structure; inverted index file; query word; text database; vocabulary; Data structures; Database systems; Dictionaries; Indexes; Indexing; Information retrieval; Keyword search; Proteins; Sequences; Vocabulary;

fLanguage :

English

Publisher :

ieee

Conference_Titel :

Computer and Information Technology, 2005. CIT 2005. The Fifth International Conference on

Conference_Location :

Shanghai

Print_ISBN :

0-7695-2432-X

Type :

conf

DOI :

10.1109/CIT.2005.23

Filename :

1562630

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2826132