DocumentCode
2857802
Title
Compressing multisets using tries
Author
Gripon, Vincent ; Rabbat, Michael ; Skachek, Vitaly ; Gross, Warren J.
Author_Institution
Dept. of Electr. & Comput. Eng., McGill Univ., Montreal, QC, Canada
fYear
2012
fDate
3-7 Sept. 2012
Firstpage
642
Lastpage
646
Abstract
We consider the problem of efficient and lossless representation of a multiset of m words drawn with repetition from a set of size 2^n. One expects that encoding the (unordered) multiset should lead to significant savings in rate as compared to encoding an (ordered) sequence with the same words, since information about the order of words in the sequence corresponds to a permutation. We propose and analyze a practical multiset encoder/decoder based on the trie data structure. Encoding requires O(m(n + log m)) operations, and decoding requires O(mn) operations. Of particular interest is the case where the cardinality of the multiset scales as m = (1/c) 2^n for some c > 1, as n → ∞. Under this scaling, and when the words in the multiset are drawn independently and uniformly, we show that the proposed encoding leads to an arbitrary improvement in rate over encoding an ordered sequence with the same words. Moreover, the expected length of the proposed codes in this setting is asymptotically within a constant factor of 5/3 of the lower bound.
Keywords
computational complexity; data compression; data structures; set theory; O(m(n + log m)) operation; O(mn) operation; constant factor; multiset cardinality; multiset compression; multiset decoder; multiset encoder; multiset encoding; multiset lossless representation; trie data structure; Channel coding; Complexity theory; Conferences; Decoding; Entropy; Manganese;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Theory Workshop (ITW), 2012 IEEE
Conference_Location
Lausanne
Print_ISBN
978-1-4673-0224-1
Electronic_ISBN
978-1-4673-0222-7
Type
conf
DOI
10.1109/ITW.2012.6404756
Filename
6404756