Author_Institution :
Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
Abstract :
We study asymptotically the redundancy of Huffman (and other) codes. It has been known since the inception of the Huffman (1952) code that in the worst case its redundancy, defined as the excess of the code length over the optimal (ideal) code length, is not more than one. However, to the best of our knowledge no precise asymptotic results have been reported in the literature thus far. We consider here a memoryless binary source generating a sequence of length n distributed as binomial(n, p), with p being the probability of emitting a 0. Based on the results of Stubley (1994), we prove that for p < 1/2 the average redundancy R̄_n^H of the Huffman code becomes, as n → ∞,

$$
\bar{R}_n^H =
\begin{cases}
\dfrac{3}{2} - \dfrac{1}{\ln 2} + o(1) = 0.057304\ldots, & \alpha \text{ irrational},\\[2ex]
\dfrac{3}{2} - \dfrac{1}{M}\left(\langle \beta M n\rangle - \dfrac{1}{2}\right) - \dfrac{1}{M\left(1 - 2^{-1/M}\right)}\, 2^{-\langle n\beta M\rangle/M} + O(\rho^n), & \alpha = N/M \text{ rational},
\end{cases}
$$

where α = log₂((1-p)/p), β = -log₂(1-p), ρ < 1, M and N are integers such that gcd(N, M) = 1, and ⟨x⟩ = x - ⌊x⌋ is the fractional part of x. The appearance of the fractal-like function ⟨βMn⟩ explains the erratic behavior of the Huffman redundancy and its "resistance" to a precise analysis. As a side result, we prove that the average redundancy of the Shannon block code is, as n → ∞,

$$
\bar{R}_n^S =
\begin{cases}
\dfrac{1}{2} + o(1), & \alpha \text{ irrational},\\[2ex]
\dfrac{1}{2} - \dfrac{1}{M}\left(\langle M n \beta\rangle - \dfrac{1}{2}\right) + O(\rho^n), & \alpha = N/M \text{ rational},
\end{cases}
$$

where ρ < 1. Finally, we derive the redundancy of the Golomb (1966) code (for the geometric distribution), which can be viewed as a special case of the Huffman and Shannon codes; Golomb's code redundancy has only an oscillating component (i.e., there is no convergent mode). These findings are obtained by analytic methods such as the theory of sequences distributed modulo 1 and Fourier series.
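To illustrate the rational case, here is a minimal numerical sketch (not taken from the paper, and all function names and parameter choices are illustrative assumptions). For p = 1/3 we have α = log₂((1-p)/p) = 1, hence N = M = 1, and the rational-case formula reduces to 2 − ⟨nβ⟩ − 2·2^{−⟨nβ⟩}. The snippet builds the exact Huffman code over all 2^n blocks, computes its average redundancy E[L] − n·h(p), and prints it next to the asymptotic expression with the O(ρⁿ) term dropped.

```python
# Sketch: exact Huffman block-code redundancy vs. the rational-case asymptotic
# formula quoted in the abstract.  The choice p = 1/3 (so alpha = 1, N = M = 1)
# and the small block lengths n are illustrative assumptions, not the paper's.

import heapq
import math
from itertools import count


def exact_huffman_redundancy(n: int, p: float) -> float:
    """Average redundancy E[L] - n*h(p) of the Huffman code on all 2^n blocks."""
    # A block's probability depends only on its number of 0s, k.
    probs = []
    for k in range(n + 1):
        pk = (p ** k) * ((1 - p) ** (n - k))
        probs.extend([pk] * math.comb(n, k))          # one leaf per block
    tie = count()                                     # tie-breaker for equal probabilities
    heap = [(q, next(tie), 0.0) for q in probs]       # (prob, tie, accumulated length)
    heapq.heapify(heap)
    while len(heap) > 1:
        q1, _, c1 = heapq.heappop(heap)
        q2, _, c2 = heapq.heappop(heap)
        # Each merge adds one bit to every leaf below it, i.e. q1 + q2 to E[L].
        heapq.heappush(heap, (q1 + q2, next(tie), c1 + c2 + q1 + q2))
    avg_len = heap[0][2]
    entropy = n * (-p * math.log2(p) - (1 - p) * math.log2(1 - p))
    return avg_len - entropy


def asymptotic_redundancy(n: int, p: float, M: int) -> float:
    """Rational-case formula from the abstract, with the O(rho^n) term dropped."""
    beta = -math.log2(1 - p)
    frac = (M * n * beta) % 1.0                       # fractional part <M n beta>
    return 1.5 - (frac - 0.5) / M - 2 ** (-frac / M) / (M * (1 - 2 ** (-1 / M)))


if __name__ == "__main__":
    p = 1 / 3                                         # alpha = log2((1-p)/p) = 1 => M = 1
    for n in range(4, 13):
        print(n,
              round(exact_huffman_redundancy(n, p), 5),
              round(asymptotic_redundancy(n, p, 1), 5))
```

Printing both columns over a range of n exhibits the oscillation driven by ⟨nβ⟩ rather than convergence to a single constant, which is the qualitative behavior the rational case describes.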
Keywords :
Fourier series; Huffman codes; Huffman redundancy; binary sequences; binomial distribution; block codes; memoryless systems; memoryless binary source; redundancy; asymptotic average redundancy; Golomb code; Shannon block code; Shannon code; code length; distribution theory; fractal-like function; fractals; geometric distribution; lossless coding; oscillating behavior; probability; sequence length; computer science; entropy; information theory; postal services; source coding