• DocumentCode
    2052028
  • Title

    Wavelet Trees: From Theory to Practice

  • Author

    Grossi, Roberto ; Vitter, Jeffrey Scott ; Xu, Bojian

  • Author_Institution
    Dipt. di Inf., Univ. di Pisa, Pisa, Italy
  • fYear
    2011
  • fDate
    21-24 June 2011
  • Firstpage
    210
  • Lastpage
    221
  • Abstract
    The wavelet tree data structure is a space-efficient technique for rank and select queries that generalizes from binary characters to an arbitrary multicharacter alphabet. It has become a key tool in modern full-text indexing and data compression because of its capabilities in compressing, indexing, and searching. We present a comparative study of its practical performance regarding a wide range of options on the dimensions of different coding schemes and tree shapes. Our results are both theoretical and experimental: (1) We show that the run-length δ coding size of wavelet trees achieves the 0-order empirical entropy size of the original string with leading constant 1, when the string´s 0-order empirical entropy is asymptotically less than the logarithm of the alphabet size. This result complements the previous works that are dedicated to analyzing run-length γ-encoded wavelet trees. It also reveals the scenarios when run-length δ encoding becomes practical. (2) We introduce a full generic package of wavelet trees for a wide range of options on the dimensions of coding schemes and tree shapes. Our experimental study reveals the practical performance of the various modifications.
  • Keywords
    data compression; tree data structures; wavelet transforms; alphabet size; data compression; multicharacter alphabet; space efficient technique; tree shapes; wavelet tree data structure; Arrays; Bismuth; Encoding; Entropy; Indexing; Shape; Vegetation; data compression; full-text indexing; pattern matching; wavelet tree;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Compression, Communications and Processing (CCP), 2011 First International Conference on
  • Conference_Location
    Palinuro
  • Print_ISBN
    978-1-4577-1458-0
  • Electronic_ISBN
    978-0-7695-4528-8
  • Type

    conf

  • DOI
    10.1109/CCP.2011.16
  • Filename
    6061128