DocumentCode
2549382
Title
A Quasi Word-Based Compression Method of English Text Using Byte-Oriented Coding Scheme
Author
Wei-Ling Chang ; Xiao-chun Yun ; Bin-Xing Fang ; Shu-peng Wang ; Shu-hao Li
Author_Institution
Res. Centre of Comput. Network & Inf. Security Technol., Harbin Inst. of Technol., Harbin
fYear
2008
fDate
20-22 July 2008
Firstpage
558
Lastpage
563
Abstract
In this paper we present ERecode, a universal compression algorithm for English text. The proposed scheme highlights the importance of pre-processing English text, and uses one- or two-byte code values to recode the 511 most commonly used English words, sequences of symbols and ASCII codes according to their frequency of occurrence. Used as a pre-processing step ahead of popular compression utilities, ERecode improves their compression ratio by between 0.89% and 19.65%. The proposed method is also applicable to text files in other languages.
Keywords
data compression; natural language processing; ERecode; English text; byte-oriented coding scheme; quasi word-based compression; Compression algorithms; Computer network management; Data structures; Dictionaries; Entropy; Frequency; Huffman coding; Information management; Information security; Probability; byte-oriented; coding; compression; word-based
fLanguage
English
Publisher
IEEE
Conference_Titel
Web-Age Information Management, 2008. WAIM '08. The Ninth International Conference on
Conference_Location
Zhangjiajie Hunan
Print_ISBN
978-0-7695-3185-4
Electronic_ISBN
978-0-7695-3185-4
Type
conf
DOI
10.1109/WAIM.2008.89
Filename
4597066