  • DocumentCode
    394074
  • Title
    Dictionary-based fast transform for text compression
  • Author
    Sun, Weifeng ; Zhang, Nan ; Mukherjee, Amar
  • Author_Institution
    Sch. of Electr. Eng. & Comput. Sci., Univ. of Central Florida, Orlando, FL, USA
  • fYear
    2003
  • fDate
    28-30 April 2003
  • Firstpage
    176
  • Lastpage
    182
  • Abstract
    In this paper we present StarNT, a fast, dictionary-based lossless text transform algorithm. With a static generic dictionary, StarNT achieves a better compression ratio than almost all other recent efforts based on BWT and PPM. The algorithm uses a ternary search tree to expedite transform encoding. Experimental results show that the average compression time improves by orders of magnitude compared with our previous algorithm, LIPT, and that the additional time overhead the transform introduces to the backend compressor is unnoticeable. Based on StarNT, we propose StarZip, a domain-specific lossless text compression utility. Using domain-specific static dictionaries embedded in the system, StarZip achieves an average improvement in compression performance (in terms of bits per character, BPC) of 13% over bzip2-9, 19% over gzip-9, and 10% over PPMD.
  • Keywords
    data compression; dictionaries; text analysis; tree searching; LIPT; StarNT; backend compressor; compression performance; dictionary-based fast lossless text transform algorithm; dictionary-based fast transform; domain-specific lossless text compression utility; domain-specific static dictionaries; static generic dictionary; superior compression ratio; ternary search tree; text compression; transform encoding; Compression algorithms; Computer science; Data compression; Dictionaries; Electronic mail; Encoding; Explosions; Internet; Sun; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Information Technology: Coding and Computing [Computers and Communications], 2003. Proceedings. ITCC 2003. International Conference on
  • Print_ISBN
    0-7695-1916-4
  • Type
    conf
  • DOI
    10.1109/ITCC.2003.1197522
  • Filename
    1197522
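
The abstract says that transform encoding is sped up with a ternary search tree over the static dictionary. The record gives no implementation details, so the following is only a minimal sketch of that data structure, not the authors' code. The names `TernarySearchTree` and `transform`, the single-letter codewords, and the whitespace tokenization are all assumptions made for illustration; StarNT's actual codeword scheme is not described here.

```python
from typing import Optional


class _Node:
    """One character position in the ternary search tree."""
    __slots__ = ("ch", "lo", "eq", "hi", "code")

    def __init__(self, ch: str) -> None:
        self.ch = ch
        self.lo: Optional["_Node"] = None   # words whose char sorts before ch
        self.eq: Optional["_Node"] = None   # next char of words matching ch
        self.hi: Optional["_Node"] = None   # words whose char sorts after ch
        self.code: Optional[str] = None     # set only where a dictionary word ends


class TernarySearchTree:
    """Maps dictionary words to (hypothetical) codewords."""

    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def insert(self, word: str, code: str) -> None:
        if not word:
            raise ValueError("cannot insert an empty word")
        self.root = self._insert(self.root, word, 0, code)

    def _insert(self, node: Optional[_Node], word: str, i: int, code: str) -> _Node:
        ch = word[i]
        if node is None:
            node = _Node(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i, code)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i, code)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1, code)
        else:
            node.code = code
        return node

    def lookup(self, word: str) -> Optional[str]:
        """Return the codeword for `word`, or None if it is not in the dictionary."""
        if not word:
            return None
        node, i = self.root, 0
        while node is not None:
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(word):
                node, i = node.eq, i + 1
            else:
                return node.code
        return None


def transform(text: str, tree: TernarySearchTree) -> str:
    """Replace each space-separated dictionary word by its codeword.

    Words that are not in the dictionary are passed through unchanged.
    """
    return " ".join(tree.lookup(w) or w for w in text.split(" "))
```

With a toy dictionary mapping "the", "of", and "compression" to the made-up codewords "a", "b", and "c", `transform` rewrites the dictionary words and leaves the rest of the text as it is. Each lookup costs a handful of character comparisons per input character, and no hashing of the whole word is needed.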