Title :
An efficient data compression scheme based on semi-adaptive Huffman coding for moderately large Chinese text files
Author :
Ong, Ghim Hwee ; Huang, Shell Ying
Author_Institution :
Dept. of Inf. Syst. & Comput. Sci., Nat. Univ. of Singapore, Singapore
Abstract :
This paper presents a data compression scheme for Chinese text files. Because the frequency distribution of Chinese ideograms is skewed, Huffman coding is adopted. By storing the Huffman tree in the coding table and representing that tree as a Zaks sequence, the algorithm significantly improves the compression results. The proposed method is evaluated by comparing its performance with that of three well-known compression algorithms and of an algorithm specially designed to compress the coding table. The algorithm should also be applicable to other ideogram-based or oriental language texts, and it has the potential to reduce the dictionary size in a bigram- or trigram-based semi-adaptive compression scheme for English texts.
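The idea of storing the Huffman tree as a Zaks sequence can be illustrated with a minimal sketch. It assumes the standard Zaks encoding of a binary tree (a preorder walk that emits 1 for each internal node and 0 for each leaf); the abstract does not give the paper's exact coding-table layout, so the function names `build_huffman`, `zaks_encode`, and `zaks_decode` and the tuple-based tree representation are illustrative assumptions, not the authors' implementation.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman(text):
    """Build a Huffman tree; a leaf is a symbol, an internal node is (left, right)."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(freq, next(tie), sym) for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (a, b)))
    return heap[0][2]


def zaks_encode(tree):
    """Preorder walk: '1' for an internal node, '0' for a leaf.

    Returns the bit string and the leaf symbols in left-to-right order.
    """
    bits, leaves = [], []

    def walk(node):
        if isinstance(node, tuple):
            bits.append("1")
            walk(node[0])
            walk(node[1])
        else:
            bits.append("0")
            leaves.append(node)

    walk(tree)
    return "".join(bits), leaves


def zaks_decode(bits, leaves):
    """Rebuild the tree from the bit string and the ordered leaf symbols."""
    bit_iter, leaf_iter = iter(bits), iter(leaves)

    def build():
        if next(bit_iter) == "1":
            left = build()
            right = build()
            return (left, right)
        return next(leaf_iter)

    return build()


if __name__ == "__main__":
    sample = "的的的的一一是不"
    tree = build_huffman(sample)
    bits, leaves = zaks_encode(tree)
    # A Huffman tree with k leaves has k - 1 internal nodes, so the shape costs 2k - 1 bits.
    print(len(leaves), "symbols,", len(bits), "shape bits")
    assert zaks_decode(bits, leaves) == tree
```

Under this assumption the tree shape costs 2k - 1 bits for k distinct symbols, and the decoder recovers every codeword from the shape plus the ordered symbol list, so explicit per-symbol codes need not be stored. The recursion is kept for brevity; a very deep tree would call for an explicit stack.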
Keywords :
Huffman codes; adaptive codes; data compression; Chinese ideograms; Chinese text files; Huffman tree; Zaks sequence; binary tree coding; data compression scheme; ideogram-based texts; oriental language texts; semi-adaptive Huffman coding; Algorithm design and analysis; Compression algorithms; Computer science; Data compression; Dictionaries; Encoding; Frequency; Huffman coding; Information systems; Natural languages
Conference_Title :
Proceedings of IEEE Singapore International Conference on Networks, 1995. Theme: Electrotechnology 2000: Communications and Networks. [in conjunction with the] International Conference on Information Engineering
Print_ISBN :
0-7803-2579-6
DOI :
10.1109/SICON.1995.526073