Title :
Parsing strategies for BWT compression
Author :
Isal, R. Yugo Kartono ; Moffat, Alistair
Author_Institution :
Dept. of Comput. Sci. & Software Eng., Melbourne Univ., Vic., Australia
Abstract :
Block-sorting is an innovative compression mechanism introduced by Burrows and Wheeler (1994), and has been the subject of considerable scrutiny in the years since it first became public. Block-sorting compression is usually described as involving three steps: permuting the input one block at a time through the use of the Burrows-Wheeler transform (BWT); applying a move-to-front (MTF) transform to each of the permuted blocks; and then entropy coding the output with a Huffman or arithmetic coder. In this paper we prepend a fourth transformation to this sequence: parsing. In the BWT implementations that have been considered to date the unit of transmission has been taken to be the ASCII character. But there is no particular reason why this should be so, and a range of other strategies can be used to construct the sequence of symbols that is fed into the BWT process. We consider some of the issues associated with making this change, and show that in some situations the introduction of a simple parsing stage allows improved compression to be obtained compared to an otherwise equivalent character-based BWT implementation. We also describe an MTF-like ranking transformation that caters better to large-alphabet situations than does the strict MTF rule used in conventional BWT implementations
Keywords :
data compression; entropy codes; grammars; sorting; transforms; BWT compression; Burrows-Wheeler transform; MTF transform; MTF-like ranking transformation; compression mechanism; large-alphabet situations; move-to-front transform; parsing; parsing strategies; Arithmetic; Australia Council; Computer science; Data compression; Decoding; Dictionaries; Entropy coding; Software engineering; Sorting; World Wide Web;
Conference_Titel :
Data Compression Conference, 2001. Proceedings. DCC 2001.
Conference_Location :
Snowbird, UT
Print_ISBN :
0-7695-1031-0
DOI :
10.1109/DCC.2001.917174