• DocumentCode
    2734930
  • Title

    Opportunistic data structures with applications

  • Author

    Ferragina, Paolo ; Manzini, Giovanni

  • Author_Institution
    Dipt. di Inf., Pisa Univ., Italy
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    390
  • Lastpage
    398
  • Abstract
    We address the issue of compressing and indexing data. We devise a data structure whose space occupancy is a function of the entropy of the underlying data set. We call the data structure opportunistic since its space occupancy is decreased when the input is compressible, and this space reduction is achieved at no significant slowdown in the query performance. More precisely, its space occupancy is optimal in an information-content sense because a text T[1,u] is stored using O(H_k(T)) + o(1) bits per input symbol in the worst case, where H_k(T) is the kth order empirical entropy of T (the bound holds for any fixed k). Given an arbitrary string P[1,p], the opportunistic data structure supports searching for the occ occurrences of P in T in O(p + occ log^ε u) time (for any fixed ε > 0). If data are incompressible we achieve the best space bound currently known (Grossi and Vitter, 2000); on compressible data our solution improves the succinct suffix array of (Grossi and Vitter, 2000) and the classical suffix tree and suffix array data structures either in space or in query time or both. We also study our opportunistic data structure in a dynamic setting and devise a variant achieving effective search and update time bounds. Finally, we show how to plug our opportunistic data structure into the Glimpse tool (Manber and Wu, 1994). The result is an indexing tool which achieves sublinear space and sublinear query time complexity.
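    The space bound above is stated in terms of the kth order empirical entropy H_k(T). As an illustration only, the following Python sketch computes that quantity under the common definition in which each symbol is conditioned on the k symbols preceding it. The function names `h0` and `empirical_entropy` are hypothetical and do not come from the paper; this is not the authors' data structure, only a way to evaluate the entropy term on small inputs.

```python
import math
from collections import Counter, defaultdict


def h0(counts, total):
    """Zeroth-order empirical entropy (bits per symbol) of a symbol multiset."""
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def empirical_entropy(text, k):
    """kth order empirical entropy H_k(text), in bits per input symbol.

    Assumed definition: group every symbol by the k symbols that precede it,
    take the zeroth-order entropy of each group, and average the groups
    weighted by their size, normalising by len(text).
    """
    n = len(text)
    if n == 0:
        return 0.0
    followers = defaultdict(Counter)
    for i in range(k, n):
        context = text[i - k:i]      # the k symbols before position i
        followers[context][text[i]] += 1
    total_bits = 0.0
    for counts in followers.values():
        m = sum(counts.values())
        total_bits += m * h0(counts, m)
    return total_bits / n
```

    For example, `empirical_entropy("abab", 0)` is 1 bit per symbol, while `empirical_entropy("abab", 1)` is 0, since each symbol is fully determined by its predecessor. This is the sense in which a compressible input lowers the O(H_k(T)) + o(1) space bound.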
  • Keywords
    computational complexity; data compression; data structures; database indexing; database theory; Glimpse tool; data indexing; data set; entropy; opportunistic data structures; query performance; search; sublinear query time complexity; sublinear space complexity; succinct suffix array; suffix array data structures; suffix tree data structures; Computer science; Costs; Data engineering; Data structures; Entropy; Fault tolerance; Indexing; Plugs; Postal services; Tree data structures
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on
  • Conference_Location
    Redondo Beach, CA
  • ISSN
    0272-5428
  • Print_ISBN
    0-7695-0850-2
  • Type

    conf

  • DOI
    10.1109/SFCS.2000.892127
  • Filename
    892127