Title :
Practical constructions of L-restricted alphabetic prefix codes
Author :
Laber, Eduardo Sany ; Milidiú, Ruy Luiz ; Pessoa, Artur Alves
Author_Institution :
Dept. de Inf., PUC, Rio de Janeiro, Brazil
Abstract :
Information retrieval systems use various search techniques such as B-trees, inverted files and suffix arrays to provide quick response. Many of these techniques rely on string comparison operations. If a record field is coded using Huffman codes (D.A. Huffman, 1952) in order to save storage space, the field must be decoded before performing any comparison. On the other hand, if the field is alphabetically coded, then the comparison can be directly applied to the sequence of codewords, which is faster. This approach also saves storage space, in comparison with the case where no data compression is applied. Experiments with alphabetically coded texts indexed with suffix arrays were reported by E.S. Moura et al. (1997). We consider the construction of L-restricted ABPC (alphabetic binary prefix code) which satisfies l i⩽L for i=1,...,n. Optimal L-restricted ABPC can be constructed in O(nLlogn) time, using O(nL) space (L.L Larmore and T.M. Przytycka, 1994). Nevertheless, due to its space requirements, this method turns out to be prohibitive for larger values of n. We suggest a simple approach to construct suboptimal L-restricted ABPC. Our approach is divided into three phases. In the first phase, we verify if an optimal ABPC is also an optimal L-restricted ABPC. In the second one, we obtain a L-restricted prefix code (not necessarily alphabetical) and in the third phase we turn this code into an alphabetical one. We denote this approach by 3-phase algorithm . The codes generated through this algorithm are called 3-phase codes. We analyze the time and space complexities and compare the average length of the 3-phase code against the Shannon Entropy. We also compare the average length of the Huffman code against the average length of an optimal L-restricted ABPC
Keywords :
Huffman codes; binary codes; computational complexity; entropy; string matching; 3-phase code; 3-phase codes; B-trees; Huffman codes; L-restricted ABPC; L-restricted alphabetic prefix codes; L-restricted prefix code; Shannon Entropy; alphabetic binary prefix code; alphabetically coded texts; average length; data compression; information retrieval systems; inverted files; optimal L-restricted ABPC; record field; search techniques; space complexities; space requirements; storage space; string comparison operations; suboptimal L-restricted ABPC; suffix arrays; Binary codes; Data compression; Decoding; Entropy; Information retrieval;
Conference_Titel :
String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware
Conference_Location :
Cancun
Print_ISBN :
0-7695-0268-7
DOI :
10.1109/SPIRE.1999.796585