DocumentCode :
3061580
Title :
Design consideration for multi-lingual cascading text compressors
Author :
Chi, Chi-Hung ; Zhang, Yan
Author_Institution :
Sch. of Comput., Nat. Univ. of Singapore, Singapore
fYear :
1999
fDate :
29-31 Mar 1999
Firstpage :
520
Abstract :
Summary form only given. We study the cascading of LZ variants to Huffman coding for multilingual documents. Two models are proposed: the static model and the adaptive (dynamic) model. The static model uses the dictionary generated by the LZW algorithm in Chinese dictionary-based Huffman compression to achieve better performance. The dynamic model is an extension of the static cascading model. As phrases are inserted into the dictionary, their frequency counts are updated so that a dynamic Huffman tree with variable-length output tokens is obtained. We propose a new method to capture the "LZW dictionary" by picking up the dictionary entries during decompression. The general idea is to add delimiters during the decompression process so that the decompressed files are segmented into phrases that reflect how the LZW compressor uses its dictionary phrases to encode the source. The adaptive cascading model can be thought of as an extension of Chinese LZW compression. Since the size of the header is an important performance bottleneck in the static cascading model, we propose the adaptive cascading model to address this issue. The LZW compressor now outputs not a fixed-length token but a variable-length Huffman code from the Huffman tree. Such a compressor is expected to achieve very good compression performance. In our adaptive cascading model, we choose LZW instead of LZSS because the LZW algorithm preserves more information than the LZSS algorithm does. This characteristic is found to be very useful in helping Chinese compressors attain better performance.
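The delimiter idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a plain byte-oriented LZW with an unbounded dictionary, and the function names (`lzw_compress`, `lzw_phrases`, `huffman_code_lengths`) are hypothetical. The decoder returns one phrase per LZW code, which is equivalent to inserting a delimiter after each decoded phrase, and the phrase frequencies then drive an ordinary static Huffman construction. The adaptive model and any Chinese-specific handling are not covered.

```python
import heapq
from collections import Counter


def lzw_compress(data: bytes) -> list[int]:
    """Textbook byte-oriented LZW; the dictionary grows without bound."""
    table = {bytes([i]): i for i in range(256)}
    codes = []
    w = b""
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc
        else:
            codes.append(table[w])
            table[wc] = len(table)
            w = bytes([b])
    if w:
        codes.append(table[w])
    return codes


def lzw_phrases(codes: list[int]) -> list[bytes]:
    """Decode LZW codes, keeping one phrase per code.

    Joining the result with a delimiter gives the segmented output the
    abstract describes; joining with b"" gives the original data.
    """
    table = {i: bytes([i]) for i in range(256)}
    phrases = []
    prev = None
    for code in codes:
        if code in table:
            entry = table[code]
        elif prev is not None and code == len(table):
            # The code refers to the entry the encoder has only just created.
            entry = prev + prev[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        phrases.append(entry)
        if prev is not None:
            table[len(table)] = prev + entry[:1]
        prev = entry
    return phrases


def huffman_code_lengths(freqs: dict) -> dict:
    """Huffman code length per symbol, built from symbol frequencies."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # The integer in each tuple breaks ties so symbol lists are never compared.
    heap = [(f, i, [sym]) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # every merge adds one bit to these symbols
        heapq.heappush(heap, (f1 + f2, tie, syms1 + syms2))
        tie += 1
    return lengths


text = b"TOBEORNOTTOBEORTOBEORNOT"
phrases = lzw_phrases(lzw_compress(text))
segmented = b"|".join(phrases)  # "|" is an arbitrary delimiter for display
lengths = huffman_code_lengths(Counter(phrases))
```

In this sketch, frequently used phrases receive shorter Huffman codes than rare ones, which is the effect the cascading models aim for when they replace fixed-length LZW tokens with variable-length codes.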
Keywords :
Huffman codes; adaptive codes; character sets; source coding; text analysis; tree data structures; variable length codes; Chinese dictionary; LZW dictionary; adaptive cascading model; cascading; decompression; dynamic Huffman tree; multilingual documents; multilingual text compressors; performance; static model; variable length Huffman code; Compressors; Dictionaries; Frequency; Huffman coding
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression Conference, 1999. Proceedings. DCC '99
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-7695-0096-X
Type :
conf
DOI :
10.1109/DCC.1999.785677
Filename :
785677