DocumentCode :
2399099
Title :
Extending Huffman coding for multilingual text compression
Author :
Chi, Chi-Hung ; Kan, Chi-Kwun ; Cheng, Kwok-Shing ; Wong, Ling
Author_Institution :
Dept. of Comput. Sci., Chinese Univ. of Hong Kong, Shatin, Hong Kong
fYear :
1995
fDate :
28-30 Mar 1995
Firstpage :
437
Abstract :
Summary form only given. We propose two new algorithms that improve the data compression ratios for multilingual text documents. Both are based on a 16-bit or 32-bit sampling character set and on the unique features of languages with a large number of distinct characters. We choose the Chinese language, using 16-bit character sampling, as the representative language in our study. The first approach, called static Chinese Huffman coding, introduces the concept of a single Chinese character in the Huffman tree. Experimental results showed an improvement in compression ratio. The second approach, called dictionary-based Chinese Huffman coding, includes the concept of Chinese words in the Huffman coding.
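The idea behind the first approach can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that treating each Chinese character as one Huffman symbol can be approximated by iterating over Python `str` characters, and that the byte-oriented baseline can be approximated by iterating over the UTF-16 bytes of the same text. The helper names `huffman_code_lengths` and `encoded_bits` and the sample string are hypothetical, and the cost of storing the code table is ignored.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {s: 0}) for s, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encoded_bits(symbols):
    """Total payload size in bits when symbols are Huffman coded."""
    symbols = list(symbols)
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


text = "中文压缩中文编码中文"  # hypothetical sample, 10 characters

# Baseline (assumption): one Huffman symbol per byte of the 16-bit encoding.
byte_level = encoded_bits(text.encode("utf-16-be"))

# Character-level (assumption): one Huffman symbol per whole character.
char_level = encoded_bits(text)

print(byte_level, char_level)
```

On this toy string the character-level code needs fewer payload bits than the byte-level one, because each character is coded once instead of as two separately coded bytes. A real comparison would also have to count the larger code table that a character-level alphabet requires.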
Keywords :
Huffman codes; data compression; encoding; image sampling; word processing; 16 bit; 32 bit; Chinese language; Huffman tree; algorithms; data compression ratios; dictionary-based Chinese Huffman coding; experimental results; multilingual text compression; multilingual text documents; sampling character set; single Chinese character; static Chinese Huffman coding; Compression algorithms; Computer science; Data compression; Dictionaries; Huffman coding; Natural languages; Sampling methods;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression Conference, 1995. DCC '95. Proceedings
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-8186-7012-6
Type :
conf
DOI :
10.1109/DCC.1995.515547
Filename :
515547