DocumentCode
2399099
Title
Extending Huffman coding for multilingual text compression
Author
Chi, Chi-Hung ; Kan, Chi-Kwun ; Cheng, Kwok-Shing ; Wong, Ling
Author_Institution
Dept. of Comput. Sci., Chinese Univ. of Hong Kong, Shatin, Hong Kong
fYear
1995
fDate
28-30 Mar 1995
Firstpage
437
Abstract
Summary form only given. We propose two new algorithms that build on 16-bit or 32-bit character sampling and on the distinctive features of languages with a large number of distinct characters to improve the data compression ratios of multilingual text documents. We chose Chinese, sampled as 16-bit characters, as the representative language in our study. The first approach, called static Chinese Huffman coding, introduces the concept of a single Chinese character in the Huffman tree. Experimental results showed an improvement in compression ratio. The second approach, called dictionary-based Chinese Huffman coding, incorporates the concept of Chinese words into Huffman coding.
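The abstract gives no implementation details, so the following is only a minimal sketch of why 16-bit character sampling can help. It assumes UTF-16 (big-endian) as the 16-bit sampling and an order-0 Huffman code built from the text itself. It compares a byte-level Huffman tree with one whose leaves are whole 16-bit characters. The function names (`huffman_code_lengths`, `huffman_bits`, `compare`) are hypothetical, and the sketch is not the authors' static Chinese Huffman coding.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built from freqs."""
    if len(freqs) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freqs)): 1}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(tie), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def huffman_bits(symbols):
    """Encoded payload size in bits (the cost of the code table is ignored)."""
    freqs = Counter(symbols)
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[s] * lengths[s] for s in freqs)


def compare(text):
    """Return (raw bits, byte-level Huffman bits, 16-bit-character Huffman bits)."""
    raw16 = text.encode("utf-16-be")  # assumption: UTF-16 as the 16-bit sampling
    byte_bits = huffman_bits(raw16)  # iterating bytes yields ints 0..255
    # Pair bytes into 16-bit units so each BMP character becomes one leaf.
    # Characters outside the BMP would be split into two surrogate units.
    units = [raw16[i:i + 2] for i in range(0, len(raw16), 2)]
    char_bits = huffman_bits(units)
    return len(raw16) * 8, byte_bits, char_bits
```

For example, `compare("中丰中丰")` returns `(64, 12, 4)`. The byte-level tree must spend bits on the shared high byte 0x4E of both characters, while the character-level tree assigns one short code per character. A static scheme for thousands of Chinese characters would also have to account for storing or agreeing on the code table, which this sketch leaves out.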
Keywords
Huffman codes; data compression; encoding; image sampling; word processing; 16 bit; 32 bit; Chinese language; Huffman tree; algorithms; data compression ratios; dictionary-based Chinese Huffman coding; experimental results; multilingual text compression; multilingual text documents; sampling character set; single Chinese character; static Chinese Huffman coding; Compression algorithms; Computer science; Data compression; Dictionaries; Huffman coding; Natural languages; Sampling methods;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Compression Conference, 1995. DCC '95. Proceedings
Conference_Location
Snowbird, UT
ISSN
1068-0314
Print_ISBN
0-8186-7012-6
Type
conf
DOI
10.1109/DCC.1995.515547
Filename
515547