  • DocumentCode
    2416489
  • Title
    Compressing Chinese text files using an adaptive Huffman coding scheme and a static dictionary of character pairs
  • Author
    Ong, Ghim Hwee ; Chong, Wing Teck
  • Author_Institution
    Dept. of Inf. Syst. & Comput. Sci., Nat. Univ. of Singapore, Singapore
  • Volume
    2
  • fYear
    1993
  • fDate
    6-11 Sep 1993
  • Firstpage
    808
  • Abstract
    The compression method for Chinese text files proposed in this paper is based on a single-pass data compression technique, adaptive Huffman coding. All Chinese text files to be compressed are modeled as containing not only ASCII characters, Chinese ideographic characters and punctuation marks, but also commonly used Chinese character pairs. A static dictionary holds about 3000 of the most frequently occurring character pairs found in general Chinese texts. These pairs define the extension to the standard source alphabet in ideogram-based adaptive Huffman coding. The compression ratio and CPU execution time of the proposed method are evaluated against those of the adaptive byte-oriented Huffman coding scheme, the adaptive ideogram-based Huffman coding scheme, and the adaptive LZW method. The experimental results show that the proposed method, based on adaptive Huffman coding with an extended source alphabet, yields better compression on Chinese text files.
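    The extended source alphabet described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-entry dictionary, the function name `tokenize`, and the greedy left-to-right matching rule are all assumptions made for illustration, and the adaptive Huffman coder that would consume the resulting symbols is omitted.

```python
# Hypothetical three-entry stand-in for the static dictionary of about 3000
# frequently occurring character pairs mentioned in the abstract.
PAIR_DICTIONARY = frozenset({"我们", "中国", "可以"})


def tokenize(text, pairs=PAIR_DICTIONARY):
    """Split text into source symbols for an extended alphabet.

    A symbol is a dictionary pair when the next two characters match one,
    otherwise a single character (ASCII, ideograph, or punctuation).
    Greedy left-to-right matching is an assumption of this sketch.
    """
    symbols = []
    i = 0
    while i < len(text):
        candidate = text[i:i + 2]
        if len(candidate) == 2 and candidate in pairs:
            symbols.append(candidate)  # one symbol covers two characters
            i += 2
        else:
            symbols.append(text[i])    # fall back to a single character
            i += 1
    return symbols


sample = "我们可以去中国 OK"
symbols = tokenize(sample)
print(len(sample), "characters ->", len(symbols), "symbols")
```

    Each emitted symbol would then be passed to the adaptive Huffman coder, so a matched pair costs one codeword instead of two.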
  • Keywords
    Huffman codes; adaptive codes; character sets; computational complexity; data compression; word processing; CPU execution time; Chinese character pairs; Chinese text files; adaptive Huffman coding; compression ratio; extended source alphabet; single pass data compression; static dictionary; Arithmetic; Computer science; Context modeling; Data compression; Dictionaries; Encoding; Frequency; Huffman coding; Information systems; Natural languages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of IEEE Singapore International Conference on Networks, 1993. International Conference on Information Engineering '93. 'Communications and Networks for the Year 2000'
  • Print_ISBN
    0-7803-1445-X
  • Type
    conf
  • DOI
    10.1109/SICON.1993.515699
  • Filename
    515699