Practical constructions of L-restricted alphabetic prefix codes

Author

Laber, Eduardo Sany ; Milidiú, Ruy Luiz ; Pessoa, Artur Alves

Author_Institution

Dept. de Inf., PUC, Rio de Janeiro, Brazil

fYear

1999

fDate

1999

Firstpage

115

Lastpage

119

Abstract

Information retrieval systems use various search techniques such as B-trees, inverted files and suffix arrays to provide quick response. Many of these techniques rely on string comparison operations. If a record field is coded using Huffman codes (D.A. Huffman, 1952) in order to save storage space, the field must be decoded before performing any comparison. On the other hand, if the field is alphabetically coded, then the comparison can be directly applied to the sequence of codewords, which is faster. This approach also saves storage space, in comparison with the case where no data compression is applied. Experiments with alphabetically coded texts indexed with suffix arrays were reported by E.S. Moura et al. (1997). We consider the construction of L-restricted ABPC (alphabetic binary prefix code) which satisfies l _i⩽L for i=1,...,n. Optimal L-restricted ABPC can be constructed in O(nLlogn) time, using O(nL) space (L.L Larmore and T.M. Przytycka, 1994). Nevertheless, due to its space requirements, this method turns out to be prohibitive for larger values of n. We suggest a simple approach to construct suboptimal L-restricted ABPC. Our approach is divided into three phases. In the first phase, we verify if an optimal ABPC is also an optimal L-restricted ABPC. In the second one, we obtain a L-restricted prefix code (not necessarily alphabetical) and in the third phase we turn this code into an alphabetical one. We denote this approach by 3-phase algorithm . The codes generated through this algorithm are called 3-phase codes. We analyze the time and space complexities and compare the average length of the 3-phase code against the Shannon Entropy. We also compare the average length of the Huffman code against the average length of an optimal L-restricted ABPC

Keywords

Huffman codes; binary codes; computational complexity; entropy; string matching; 3-phase code; 3-phase codes; B-trees; Huffman codes; L-restricted ABPC; L-restricted alphabetic prefix codes; L-restricted prefix code; Shannon Entropy; alphabetic binary prefix code; alphabetically coded texts; average length; data compression; information retrieval systems; inverted files; optimal L-restricted ABPC; record field; search techniques; space complexities; space requirements; storage space; string comparison operations; suboptimal L-restricted ABPC; suffix arrays; Binary codes; Data compression; Decoding; Entropy; Information retrieval;

fLanguage

English

Publisher

ieee

Conference_Titel

String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware

Conference_Location

Cancun

Print_ISBN

0-7695-0268-7

Type

conf

DOI

10.1109/SPIRE.1999.796585

Filename

796585