• DocumentCode
    705014
  • Title

    Approximating Document Frequency for Self-Index based Top-k Document Retrieval

  • Author

    Suzuki, Tokinori ; Fujii, Atsushi

  • Author_Institution
    Dept. of Comput. Sci., Tokyo Inst. of Technol., Tokyo, Japan
  • fYear
    2015
  • fDate
    24-27 March 2015
  • Firstpage
    541
  • Lastpage
    546
  • Abstract
    Top-k document retrieval, which returns the documents most relevant to a query, is an essential task for many applications. One promising index framework combines an FM-index and a wavelet tree to support efficient top-k document retrieval. The index, however, has difficulty handling document frequency (DF) at search time because the indexed terms are all substrings of the document collection. Previous work either exhaustively searches all parts of the index, where most of the documents are not relevant, to calculate DF, or stores precalculated DF values in a large amount of additional space. In this paper, we propose two methods to approximate the DF of a query term by exploiting the information obtained while traversing the index structures. Experimental results showed that our methods achieved almost the same effectiveness as exhaustive search while remaining efficient: their search time is about half that of exhaustive search.
  • Keywords
    approximation theory; document handling; query processing; trees (mathematics); wavelet transforms; FM-index; document collection; document frequency approximation; exhaustive search; index frameworks; index structures; query term; search efficiency; search time; self-index based top-k document retrieval; wavelet tree; Accuracy; Approximation methods; Arrays; Correlation; Indexes; Mathematical model; Resource description framework; FM-index; approximate search; wavelet tree;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Information Networking and Applications Workshops (WAINA), 2015 IEEE 29th International Conference on
  • Conference_Location
    Gwangju
  • Print_ISBN
    978-1-4799-1774-7
  • Type

    conf

  • DOI
    10.1109/WAINA.2015.68
  • Filename
    7096233