• DocumentCode
    2708830
  • Title

    A new approach of DCA by using BWT

  • Author

    Zhao, Bo ; Iwata, Ken-Ichi ; Itoh, Shuichi ; Kato, Toshihiko

  • Author_Institution
    Univ. of Electro-Commun., Chofu, Japan
  • fYear
    2005
  • fDate
    29-31 March 2005
  • Firstpage
    495
  • Abstract
    Summary form only given. M. Crochemore et al. introduced a two-pass lossless data compression scheme called data compression using antidictionaries (DCA) (see Proc. IEEE, vol.88, no.11, p.1756-68, 2000). DCA finds some words (minimal forbidden words) that never appear in the text to be compressed, and chooses a subset of minimal forbidden words as an antidictionary (AD). We develop improved DCAs in two ways. (1) We propose an improved DCA using, instead of the AD, predictable dictionaries made up of the words whose last symbols are uniquely determined by their prefixes. The good properties of the original DCA hold in this extension. (2) We present an algorithm to construct the AD (or predictable dictionary) by using a suffix array of the text. We can reduce the amount of memory necessary to construct an AD by using the suffix array. Moreover, by applying a suffix array instead of a suffix tree, we can exploit the relationship between the suffix array and the BWT (Burrows-Wheeler transform), giving us the chance of choosing between DCA and the Burrows-Wheeler compression algorithm (BWCA).
  • Keywords
    data compression; dictionaries; text analysis; transform coding; transforms; Burrows-Wheeler compression algorithm; Burrows-Wheeler transform; data compression using antidictionaries; minimal forbidden words; predictable dictionaries; suffix array; suffix tree; Compression algorithms; Data compression; Decoding; Dictionaries; Encoding; Parallel processing; Search engines;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Compression Conference, 2005. Proceedings. DCC 2005
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-2309-9
  • Type

    conf

  • DOI
    10.1109/DCC.2005.8
  • Filename
    1402252