• DocumentCode
    2857802
  • Title

    Compressing multisets using tries

  • Author

    Gripon, Vincent ; Rabbat, Michael ; Skachek, Vitaly ; Gross, Warren J.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., McGill Univ., Montreal, QC, Canada
  • fYear
    2012
  • fDate
    3-7 Sept. 2012
  • Firstpage
    642
  • Lastpage
    646
  • Abstract
    We consider the problem of efficient and lossless representation of a multiset of m words drawn with repetition from a set of size 2^n. One expects that encoding the (unordered) multiset should lead to significant savings in rate as compared to encoding an (ordered) sequence with the same words, since information about the order of words in the sequence corresponds to a permutation. We propose and analyze a practical multiset encoder/decoder based on the trie data structure. The act of encoding requires O(m(n + log m)) operations, and decoding requires O(mn) operations. Of particular interest is the case where cardinality of the multiset scales as m = (1/c) 2^n for some c > 1, as n → ∞. Under this scaling, and when the words in the multiset are drawn independently and uniformly, we show that the proposed encoding leads to an arbitrary improvement in rate over encoding an ordered sequence with the same words. Moreover, the expected length of the proposed codes in this setting is asymptotically within a constant factor of 5/3 of the lower bound.
  • Keywords
    computational complexity; data compression; data structures; set theory; O(m(n + log m)) operation; O(mn) operation; constant factor; multiset cardinality; multiset compression; multiset decoder; multiset encoder; multiset encoding; multiset lossless representation; trie data structure; Channel coding; Complexity theory; Conferences; Decoding; Entropy; Manganese;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Theory Workshop (ITW), 2012 IEEE
  • Conference_Location
    Lausanne
  • Print_ISBN
    978-1-4673-0224-1
  • Electronic_ISBN
    978-1-4673-0222-7
  • Type

    conf

  • DOI
    10.1109/ITW.2012.6404756
  • Filename
    6404756
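
The abstract's claim that dropping word order saves rate can be illustrated with a simple counting argument. The sketch below is an illustration only, not the trie encoder from the paper: it compares the m*n bits of an ordered sequence with log2 of the number of size-m multisets over 2^n words, which is the minimum length of a fixed-length code that can represent every such multiset. This counting bound is an assumption chosen for illustration and is not necessarily the lower bound used in the paper; the function names are hypothetical.

```python
from math import comb, log2


def ordered_bits(m: int, n: int) -> int:
    """Bits needed to store m words of n bits each as an ordered sequence."""
    return m * n


def multiset_counting_bound_bits(m: int, n: int) -> float:
    """log2 of the number of size-m multisets over an alphabet of 2**n words.

    The count is the "stars and bars" binomial C(2**n + m - 1, m).
    Illustrative counting bound only, not the paper's analysis.
    """
    return log2(comb(2**n + m - 1, m))


if __name__ == "__main__":
    n, c = 10, 2          # word length in bits, and scaling constant c > 1
    m = 2**n // c         # multiset size under the scaling m = (1/c) 2^n
    print("ordered sequence:", ordered_bits(m, n), "bits")
    print("counting bound:  ", multiset_counting_bound_bits(m, n), "bits")
```

Running the script for a few values of n with c fixed gives a rough sense of how the gap between the two quantities grows under the scaling discussed in the abstract.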