Title :
Text compression using recency rank with context and relation to context sorting, block sorting and PPM*
Author :
Sadakane, Kunihiko
Author_Institution :
Dept. of Inf. Sci., Tokyo Univ., Japan
Abstract :
A block sorting compression scheme was developed and its relation to a statistical scheme was studied, but a theoretical analysis of its performance has not been studied fully. Context sorting is a compression scheme based on context similarity and it is regarded as an on-line version of block sorting and it is asymptotically optimal. However, the compression speed is slower and the real performance is not better. We propose a compression scheme using recency rank code with context (RRC), which is based on context similarity. The proposed method encodes characters to recency ranks according to their contexts. It can be implemented using suffix tree and the recency rank code is realized by move-to-front transformation of edges in the suffix tree. It is faster than context sorting and is also asymptotically optimal. The performance is improved by changing models according to the length of the context and by combining some characters into a code. However, it is still inferior to block sorting in both performance and speed. We investigate the reason for the bad performance and we also prove the asymptotical optimality of a variation of block sorting and derive the relation among the RRC, context sorting, block sorting and PPM* clear
Keywords :
data compression; encoding; trees (mathematics); word processing; PPM*; asymptotical optimality; block sorting; compression speed; context similarity; context sorting; move-to-front edge transformation; performance; prediction by partial matching; recency rank code; statistical scheme; suffix tree; text compression; Binary trees; Context modeling; Data compression; Dictionaries; Information analysis; Information science; Performance analysis; Sorting; Statistical analysis;
Conference_Titel :
Compression and Complexity of Sequences 1997. Proceedings
Conference_Location :
Salerno
Print_ISBN :
0-8186-8132-2
DOI :
10.1109/SEQUEN.1997.666925