• DocumentCode
    2826132
  • Title

    A New Indexing Method for Approximate Search in Text Databases

  • Author

    Shi, Fei ; Mefford, Cread

  • Author_Institution
    Comput. Sci. Dept., Suffolk Univ., Boston, MA
  • fYear
    2005
  • fDate
    21-23 Sept. 2005
  • Firstpage
    70
  • Lastpage
    76
  • Abstract
    We present an index structure to support the approximate keyword search in text databases. In an approximate keyword search query, the user presents a query word Q and a tolerance value k (kges0), and wishes to find all documents in the database that contain the query word Q or any other word in the vocabulary that matches Q approximately (We say that two words match each other approximately if the edit distance between them does not exceed the tolerance value k. In a typical text database application, a user chooses k=1, 2, 3, or 4). Our index structure is built on the underlying vocabulary of the text database. The new technique has two principal components - a new data structure called the V-tree and its partition methods for clustering words in the vocabulary into subgroups. We have implemented our index structure and conducted experiments on real-world data. Our experiments show that even for very large text database, the construction of our index and a search for keywords that match the query word approximately can be done quickly. Our implementation makes it clear that the V-tree data structure can be easily integrated into existing access structures built on the database such as the inverted index file
  • Keywords
    database indexing; information retrieval; query processing; search problems; text analysis; tree data structures; vocabulary; V-tree partition method; approximate keyword search; data structure; index structure; inverted index file; query word; text database; vocabulary; Data structures; Database systems; Dictionaries; Indexes; Indexing; Information retrieval; Keyword search; Proteins; Sequences; Vocabulary;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Computer and Information Technology, 2005. CIT 2005. The Fifth International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    0-7695-2432-X
  • Type

    conf

  • DOI
    10.1109/CIT.2005.23
  • Filename
    1562630