DocumentCode :
2549382
Title :
A Quasi Word-Based Compression Method of English Text Using Byte-Oriented Coding Scheme
Author :
Wei-Ling Chang ; Xiao-chun Yun ; Bin-Xing Fang ; Shu-peng Wang ; Shu-hao Li
Author_Institution :
Res. Centre of Comput. Network & Inf. Security Technol., Harbin Inst. of Technol., Harbin
fYear :
2008
fDate :
20-22 July 2008
Firstpage :
558
Lastpage :
563
Abstract :
In this paper we present ERecode, a universal compression algorithm for English text. The proposed scheme highlights the importance of pre-processing English text, and uses one- or two-byte code values to recode the 511 most commonly used English words, symbol sequences and ASCII codes according to their occurrence frequency. When used as a pre-processing tool for English text ahead of popular compression utilities, ERecode improves their compression ratio by between 0.89% and 19.65%. The proposed method is also applicable to text files in other languages.
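The abstract does not specify ERecode's code layout, so the following is only a minimal sketch of the general idea (frequency-ranked, byte-oriented recoding of common words as a pre-processing pass), not the authors' format. Assumptions, all hypothetical: the input is 7-bit ASCII; only whole alphabetic words are ranked (ERecode also ranks symbol sequences and ASCII codes); bytes 0x80 to 0xF7 are one-byte codes for the 120 most frequent words; and bytes 0xF8 to 0xFF are lead bytes of two-byte codes for lower-ranked words. The names `build_dictionary`, `encode` and `decode` are illustrative.

```python
import re
from collections import Counter

# Hypothetical code layout (not taken from the paper):
ONE_BYTE_FIRST = 0x80                        # one-byte codes: 0x80..0xF7
LEAD_FIRST = 0xF8                            # two-byte codes: lead 0xF8..0xFF + any byte
ONE_BYTE_SLOTS = LEAD_FIRST - ONE_BYTE_FIRST  # 120 one-byte codes
TWO_BYTE_SLOTS = (0x100 - LEAD_FIRST) * 256   # 2048 two-byte codes

# Splits text into maximal runs of ASCII letters and runs of everything else,
# so every character of the input belongs to exactly one token.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z]+")
WORD_RE = re.compile(r"[A-Za-z]{2,}")


def build_dictionary(corpus: str, max_entries: int = 511) -> list[str]:
    """Rank multi-letter words by frequency, most frequent first."""
    if max_entries > ONE_BYTE_SLOTS + TWO_BYTE_SLOTS:
        raise ValueError("dictionary larger than the code space")
    counts = Counter(WORD_RE.findall(corpus))
    return [word for word, _ in counts.most_common(max_entries)]


def encode(text: str, dictionary: list[str]) -> bytes:
    """Replace dictionary words with 1- or 2-byte codes; copy all else as ASCII."""
    rank = {word: i for i, word in enumerate(dictionary)}
    out = bytearray()
    for token in TOKEN_RE.findall(text):
        r = rank.get(token)
        if r is None:
            out += token.encode("ascii")  # raises on non-ASCII input (sketch limit)
        elif r < ONE_BYTE_SLOTS:
            out.append(ONE_BYTE_FIRST + r)  # frequent word: one byte
        else:
            r -= ONE_BYTE_SLOTS
            out.append(LEAD_FIRST + r // 256)  # less frequent word: two bytes
            out.append(r % 256)
    return bytes(out)


def decode(data: bytes, dictionary: list[str]) -> str:
    """Invert encode(): bytes below 0x80 are literal ASCII characters."""
    parts = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < ONE_BYTE_FIRST:
            parts.append(chr(b))
            i += 1
        elif b < LEAD_FIRST:
            parts.append(dictionary[b - ONE_BYTE_FIRST])
            i += 1
        else:
            r = ONE_BYTE_SLOTS + (b - LEAD_FIRST) * 256 + data[i + 1]
            parts.append(dictionary[r])
            i += 2
    return "".join(parts)


if __name__ == "__main__":
    d = build_dictionary("the cat and the dog and the bird")
    packed = encode("the cat and the dog", d)
    print(len(packed), decode(packed, d))  # 9 bytes instead of 19 characters
```

Decoding stays unambiguous because code bytes (0x80 and above) never overlap literal 7-bit ASCII bytes. As in the abstract's setting, the recoded output would then be handed to a general-purpose compressor rather than stored as is.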
Keywords :
data compression; natural language processing; ERecode; English text; byte-oriented coding scheme; quasi word-based compression; Compression algorithms; Computer network management; Data structures; Dictionaries; Entropy; Frequency; Huffman coding; Information management; Information security; Probability; byte-oriented; coding; compression; word-based;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Web-Age Information Management, 2008. WAIM '08. The Ninth International Conference on
Conference_Location :
Zhangjiajie, Hunan
Print_ISBN :
978-0-7695-3185-4
Electronic_ISBN :
978-0-7695-3185-4
Type :
conf
DOI :
10.1109/WAIM.2008.89
Filename :
4597066