• DocumentCode
    2549382
  • Title
    A Quasi Word-Based Compression Method of English Text Using Byte-Oriented Coding Scheme
  • Author
    Wei-Ling Chang ; Xiao-chun Yun ; Bin-Xing Fang ; Shu-peng Wang ; Shu-hao Li
  • Author_Institution
    Res. Centre of Comput. Network & Inf. Security Technol., Harbin Inst. of Technol., Harbin
  • fYear
    2008
  • fDate
    20-22 July 2008
  • Firstpage
    558
  • Lastpage
    563
  • Abstract
    In this paper we present ERecode, a universal compression algorithm for English text. The proposed scheme highlights the importance of pre-processing for English text, and uses one- or two-byte code values to recode the 511 most commonly used English words, symbol sequences, and ASCII codes according to their occurrence frequency. Used as a pre-processing step for English text ahead of popular compression utilities, ERecode improves their compression ratio by 0.89% to 19.65%. The proposed method is also applicable to text files in other languages.
  • Keywords
    data compression; natural language processing; ERecode; English text; byte-oriented coding scheme; quasi word-based compression; Compression algorithms; Computer network management; Data structures; Dictionaries; Entropy; Frequency; Huffman coding; Information management; Information security; Probability; byte-oriented; coding; compression; word-based
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web-Age Information Management, 2008. WAIM '08. The Ninth International Conference on
  • Conference_Location
    Zhangjiajie Hunan
  • Print_ISBN
    978-0-7695-3185-4
  • Electronic_ISBN
    978-0-7695-3185-4
  • Type
    conf
  • DOI
    10.1109/WAIM.2008.89
  • Filename
    4597066
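
The abstract describes recoding frequent English words as one- or two-byte values before handing the text to a general-purpose compressor. The sketch below illustrates that general idea only; it is not the ERecode algorithm, whose code assignment the abstract does not specify. Everything here is an assumption made for illustration: the names `build_table`, `encode`, and `decode`, the use of byte values 0x80 to 0xFE as one-byte codes, the 0xFF escape prefix for two-byte codes, the resulting table size of 383 entries (not the paper's 511), and the restriction to 7-bit ASCII input.

```python
import re
from collections import Counter

# Hypothetical layout: 7-bit ASCII text leaves byte values 0x80-0xFF unused.
TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z]")  # a word, or one non-letter character
ONE_BYTE_BASE = 0x80                        # first one-byte code value
ESCAPE = 0xFF                               # prefix marking a two-byte code
ONE_BYTE_SLOTS = ESCAPE - ONE_BYTE_BASE     # 127 one-byte codes (0x80-0xFE)
TWO_BYTE_SLOTS = 256                        # ESCAPE followed by any byte


def build_table(sample: str) -> list[str]:
    """Rank words of length >= 3 by frequency; the most frequent get the shortest codes."""
    words = [t for t in TOKEN.findall(sample) if len(t) >= 3]
    ranked = Counter(words).most_common(ONE_BYTE_SLOTS + TWO_BYTE_SLOTS)
    return [word for word, _ in ranked]


def encode(text: str, table: list[str]) -> bytes:
    """Replace table words with codes; copy everything else through as ASCII."""
    index = {word: i for i, word in enumerate(table)}
    out = bytearray()
    for tok in TOKEN.findall(text):
        i = index.get(tok)
        if i is None:
            out += tok.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
        elif i < ONE_BYTE_SLOTS:
            out.append(ONE_BYTE_BASE + i)
        else:
            out += bytes((ESCAPE, i - ONE_BYTE_SLOTS))
    return bytes(out)


def decode(data: bytes, table: list[str]) -> str:
    """Invert encode(): expand codes back to words, pass ASCII bytes through."""
    out = []
    it = iter(data)
    for b in it:
        if b < ONE_BYTE_BASE:
            out.append(chr(b))
        elif b < ESCAPE:
            out.append(table[b - ONE_BYTE_BASE])
        else:
            out.append(table[ONE_BYTE_SLOTS + next(it)])
    return "".join(out)
```

To use it as a pre-processing step in the sense of the abstract, one would pass the output of `encode` to a general-purpose compressor (for example `zlib.compress`) and compare the result with compressing the raw text directly. Any gain depends on the corpus and the compressor, and the figures reported in the abstract should not be expected from this sketch.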