• DocumentCode
    3204945
  • Title

    Practical constructions of L-restricted alphabetic prefix codes

  • Author

    Laber, Eduardo Sany ; Milidiú, Ruy Luiz ; Pessoa, Artur Alves

  • Author_Institution
    Dept. de Inf., PUC, Rio de Janeiro, Brazil
  • fYear
    1999
  • fDate
    1999
  • Firstpage
    115
  • Lastpage
    119
  • Abstract
    Information retrieval systems use various search techniques such as B-trees, inverted files and suffix arrays to provide quick response. Many of these techniques rely on string comparison operations. If a record field is coded using Huffman codes (D.A. Huffman, 1952) in order to save storage space, the field must be decoded before performing any comparison. On the other hand, if the field is alphabetically coded, then the comparison can be directly applied to the sequence of codewords, which is faster. This approach also saves storage space, in comparison with the case where no data compression is applied. Experiments with alphabetically coded texts indexed with suffix arrays were reported by E.S. Moura et al. (1997). We consider the construction of L-restricted ABPC (alphabetic binary prefix code) which satisfies l i⩽L for i=1,...,n. Optimal L-restricted ABPC can be constructed in O(nLlogn) time, using O(nL) space (L.L Larmore and T.M. Przytycka, 1994). Nevertheless, due to its space requirements, this method turns out to be prohibitive for larger values of n. We suggest a simple approach to construct suboptimal L-restricted ABPC. Our approach is divided into three phases. In the first phase, we verify if an optimal ABPC is also an optimal L-restricted ABPC. In the second one, we obtain a L-restricted prefix code (not necessarily alphabetical) and in the third phase we turn this code into an alphabetical one. We denote this approach by 3-phase algorithm . The codes generated through this algorithm are called 3-phase codes. We analyze the time and space complexities and compare the average length of the 3-phase code against the Shannon Entropy. We also compare the average length of the Huffman code against the average length of an optimal L-restricted ABPC
  • Keywords
    Huffman codes; binary codes; computational complexity; entropy; string matching; 3-phase code; 3-phase codes; B-trees; Huffman codes; L-restricted ABPC; L-restricted alphabetic prefix codes; L-restricted prefix code; Shannon Entropy; alphabetic binary prefix code; alphabetically coded texts; average length; data compression; information retrieval systems; inverted files; optimal L-restricted ABPC; record field; search techniques; space complexities; space requirements; storage space; string comparison operations; suboptimal L-restricted ABPC; suffix arrays; Binary codes; Data compression; Decoding; Entropy; Information retrieval;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware
  • Conference_Location
    Cancun
  • Print_ISBN
    0-7695-0268-7
  • Type

    conf

  • DOI
    10.1109/SPIRE.1999.796585
  • Filename
    796585