Abstract :
Lossless text compression based on the Burrows-Wheeler transform (BWT) has become popular. Compression-time issues-MTF coding or the avoidance thereof (Wirth 2000), encoding the MTF values, sorting fast (Seward 2000)-have seen considerable investigation. Decompression-time issues remain underinvestigated. For some applications, decompression-time behaviour is a critical issue. For example, boot-time decompression of the Linux kernel requires decompression in limited memory. On-the-fly disk compressors employing the BWT demand fast decompression. This paper presents a number of ways to implement the inverse B-W transform, giving a useful range of speed-memory tradeoffs. We show that, at one extreme, it is possible to invert the BWT, slowly, using just one byte per block-byte. At the other extreme, we argue that the maximum speed of BWT decompression is unavoidably constrained by the rate at which cache misses are serviced
Keywords :
data compression; inverse problems; text analysis; transforms; Burrows-Wheeler transform; cache misses; decompression-time; inverse B-W transform; lossless text compression; space-time tradeoffs; speed-memory tradeoffs; Compressors; Concatenated codes; Disk recording; Encoding; Fasteners; Frequency synthesizers; Kernel; Linux; Sorting;