• DocumentCode
    2399099
  • Title
    Extending Huffman coding for multilingual text compression
  • Author
    Chi, Chi-Hung ; Kan, Chi-Kwun ; Cheng, Kwok-Shing ; Wong, Ling
  • Author_Institution
    Dept. of Comput. Sci., Chinese Univ. of Hong Kong, Shatin, Hong Kong
  • fYear
    1995
  • fDate
    28-30 Mar 1995
  • Firstpage
    437
  • Abstract
    Summary form only given. We propose two new algorithms that improve data compression ratios for multilingual text documents. Both are based on a 16-bit or 32-bit sampling character set and on the unique features of languages with a large number of distinct characters. We choose the Chinese language, with 16-bit character sampling, as the representative language in our study. The first approach, called static Chinese Huffman coding, introduces the single Chinese character as a unit in the Huffman tree. Experimental results showed an improvement in compression ratio. The second approach, called dictionary-based Chinese Huffman coding, incorporates the concept of Chinese words into the Huffman coding.
  • Keywords
    Huffman codes; data compression; encoding; image sampling; word processing; 16 bit; 32 bit; Chinese language; Huffman tree; algorithms; data compression ratios; dictionary-based Chinese Huffman coding; experimental results; multilingual text compression; multilingual text documents; sampling character set; single Chinese character; static Chinese Huffman coding; Compression algorithms; Computer science; Data compression; Dictionaries; Huffman coding; Natural languages; Sampling methods;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference, 1995. DCC '95. Proceedings
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    0-8186-7012-6
  • Type
    conf
  • DOI
    10.1109/DCC.1995.515547
  • Filename
    515547
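
The static approach summarized in the abstract treats each 16-bit character, rather than each byte, as one Huffman symbol. The sketch below is only an illustration of that idea, not the authors' implementation: it uses Python code points and a UTF-16 big-endian byte stream as stand-ins for the 16-bit character set, the sample string is hypothetical, the helper names `huffman_code_lengths` and `encoded_bits` are invented here, and the cost of storing the code table is ignored.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): 1}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(f, next(tiebreak), {s: 0}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf in them one level deeper.
        merged = {s: depth + 1 for s, depth in d1.items()}
        merged.update({s: depth + 1 for s, depth in d2.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encoded_bits(symbols):
    """Total payload size in bits when symbols are Huffman coded."""
    symbols = list(symbols)
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


# Hypothetical sample: eight Chinese characters, four of them distinct.
text = "\u4e2d\u6587\u4e2d\u6587\u538b\u7f29\u4e2d\u6587"

# Byte-oriented Huffman coding: every 16-bit character is split into two symbols.
byte_bits = encoded_bits(text.encode("utf-16-be"))

# Character-oriented Huffman coding: one symbol per 16-bit character.
char_bits = encoded_bits(text)

print("byte symbols:", byte_bits, "bits")
print("character symbols:", char_bits, "bits")
```

On this toy input the character-level code needs fewer payload bits than the byte-level code, which is the kind of gain the abstract attributes to placing whole characters in the Huffman tree. A real comparison would also have to count the larger code table that a character alphabet requires.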