  • DocumentCode
    10927
  • Title
    Universal Source Coding for Monotonic and Fast Decaying Monotonic Distributions
  • Author
    Shamir, Gil I.
  • Author_Institution
    Google, Inc., Pittsburgh, PA, USA
  • Volume
    59
  • Issue
    11
  • fYear
    2013
  • fDate
    Nov. 2013
  • Firstpage
    7194
  • Lastpage
    7211
  • Abstract
    We study universal compression of sequences generated by monotonic distributions. We show that for a monotonic distribution over an alphabet of size k, each probability parameter costs essentially 0.5 log(n/k^3) bits, where n is the coded sequence length, as long as k = o(n^{1/3}). Otherwise, for k = O(n), the total average sequence redundancy is O(n^{1/3+ε}) bits overall. We then show that there exists a sub-class of monotonic distributions over infinite alphabets for which redundancy of O(n^{1/3+ε}) bits overall is still achievable. This class contains fast decaying distributions, including many distributions over the integers such as the family of Zipf distributions and geometric distributions. For some slower decays, including other distributions over the integers, redundancy of o(n) bits overall is achievable. A method to compute specific redundancy rates for such distributions is derived. The results are specifically true for finite entropy monotonic distributions. Finally, we study individual sequence redundancy behavior assuming a sequence is governed by a monotonic distribution. We show that for sequences whose empirical distributions are monotonic, individual redundancy bounds even tighter than those in the average case can be obtained. The relation of universal compression with monotonic distributions to universal compression of patterns of sequences is demonstrated.
  • Keywords
    entropy; probability; source coding; Zipf distributions; coded sequence length; fast decaying monotonic distribution; finite entropy monotonic distribution; geometric distributions; individual sequence redundancy behavior; infinite alphabet; probability parameter; redundancy bounds; sequence pattern; specific redundancy rates; total average sequence redundancy; universal compression; universal sequence compression; universal source coding; Decoding; Encoding; Entropy; Quantization (signal); Redundancy; Standards; Upper bound; Average redundancy; individual redundancy; large alphabets; monotonic distributions; patterns; universal compression
  • fLanguage
    English
  • Journal_Title
    Information Theory, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9448
  • Type
    jour
  • DOI
    10.1109/TIT.2013.2281695
  • Filename
    6600940