• DocumentCode
    610060
  • Title

    Faster Compressed Top-k Document Retrieval

  • Author

    Wing-Kai Hon ; Thankachan, S.V. ; Shah, Rohan ; Vitter, Jeffrey Scott

  • Author_Institution
    Nat. Tsing Hua Univ., Hsinchu, Taiwan
  • fYear
    2013
  • fDate
    20-22 March 2013
  • Firstpage
    341
  • Lastpage
    350
  • Abstract
    Let D = {d1, d2,...dD} be a given collection of D string documents of total length n, our task is to index D, such that whenever a pattern P (of length p) and an integer k come as a query, those k documents in which P appears the most number of times can be listed efficiently. In this paper, we propose a compressed index taking 2|CSA| + D logn/D + O(D) + o(n) bits of space, which answers a query with O(tsa log k logϵ n) per document report time. This improves the O(tsa log k log1+ϵ n) per document report time of the previously best-known index with (asymptotically) the same space requirements [Belazzougui and Navarro, SPIRE 2011]. Here, |CSA| represents the size (in bits) of the compressed suffix array (CSA) of the text obtained by concatenating all documents in V, and tsa is the time for decoding a suffix array value using the CSA.
  • Keywords
    concatenated codes; data compression; decoding; query processing; CSA; D string document collection; compressed suffix array index; decoding; document concatenation; faster compressed top-k document retrieval; query processing; Abstracts; Arrays; Data compression; Decoding; Educational institutions; Indexes; Query processing; Compressed Index; Document Retrieval; Top-k;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Compression Conference (DCC), 2013
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    978-1-4673-6037-1
  • Type

    conf

  • DOI
    10.1109/DCC.2013.42
  • Filename
    6543070