Author :
Milidiú, Ruy Luiz ; Pessoa, Artur Alves ; Laber, Eduardo Sany
Author_Institution :
Dept. de Inf., PUC, Rio de Janeiro, Brazil
Abstract :
The minimum redundancy prefix code problem is to determine, for a given list W = [w_1, ..., w_n] of n positive symbol weights, a list L = [l_1, ..., l_n] of n corresponding integer codeword lengths such that Σ_{i=1}^{n} 2^{-l_i} ≤ 1 and Σ_{i=1}^{n} w_i·l_i is minimized. With the optimal list of codeword lengths, an optimal canonical code can be easily obtained. If W is already sorted, then this optimal code can also be represented by the list M = [m_1, ..., m_H], where m_l, for l = 1, ..., H, denotes the number of codewords with length l and H is the length of the longest codeword. Moreover, H is proved to be O(min{log(1/p_1), n}), where p_1 is the smallest symbol probability, given by w_1/Σ_{i=1}^{n} w_i. The E-LazyHuff algorithm uses a lazy approach to calculate optimal codes in O(n log(n/H)) time, requiring only O(H) additional space. In addition, the input weights are not destroyed during the code calculation. We propose a new technique, which we call homogenization, that can be used to improve the time efficiency of algorithms for constructing optimal prefix codes. Next, we introduce the Best LazyHuff algorithm (B-LazyHuff) as an application of this technique. B-LazyHuff is an O(n)-time variation of the E-LazyHuff algorithm. It also requires O(H) additional space and does not destroy the input data.
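The step from optimal codeword lengths to an optimal canonical code, mentioned in the abstract, can be illustrated with a small sketch. This is not the paper's E-LazyHuff or B-LazyHuff algorithm; it only shows the standard canonical assignment for a given non-decreasing length list L satisfying the Kraft inequality (the function name and bit-string representation are illustrative):

```python
def canonical_code(lengths):
    """Build a canonical prefix code from a non-decreasing list of
    integer codeword lengths satisfying the Kraft inequality
    sum(2**-l for l in lengths) <= 1. Returns one bit string per symbol."""
    assert sum(2 ** -l for l in lengths) <= 1, "Kraft inequality violated"
    codes = []
    code = 0
    prev_len = lengths[0]
    for l in lengths:
        code <<= (l - prev_len)          # widen the running code to the new length
        codes.append(format(code, f"0{l}b"))  # emit it as an l-bit string
        code += 1                        # next codeword is the successor
        prev_len = l
    return codes

# For example, lengths [1, 2, 3, 3] yield the prefix-free code
# ["0", "10", "110", "111"].
```

Because codewords of equal length are consecutive integers, the code is fully determined by the list M = [m_1, ..., m_H] of length counts, which is what makes the O(H)-space representation in the abstract possible.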
Keywords :
codes; computational complexity; redundancy; Best LazyHuff algorithm; E-LazyHuff algorithm; code calculation; homogenization; input weights; integer codeword lengths; longest codeword; minimum redundancy prefix code problem; optimal canonical code; optimal prefix codes; positive symbol weights; space-economical algorithm; symbol probability; time efficiency; Binary trees; Data compression; Reactive power;