Title :
Simple Linear-Time Off-Line Text Compression by Longest-First Substitution
Author :
Nakamura, Ryosuke ; Bannai, Hideo ; Inenaga, Shunsuke ; Takeda, Masayuki
Author_Institution :
Dept. of Inf., Kyushu Univ., Fukuoka
Abstract :
We consider grammar based text compression with longest-first substitution, where non-overlapping occurrences of a longest repeating substring of the input text are replaced by a new non-terminal symbol. We present a new text compression algorithm by simplifying the algorithm presented in S. Inenaga et al., (2003). We give a new formulation of the correctness proof introducing the sparse lazy suffix tree data structure. We also present another type of longest-first substitution strategy that allows better compression. We show results of preliminary experiments comparing grammar sizes of the two versions of the longest-first strategy and the most frequent strategy
Keywords :
data compression; grammars; tree data structures; correctness proof; grammar based text compression; linear-time offline text compression; longest repeating substring; longest-first substitution; nonoverlapping occurrences; nonterminal symbol; sparse lazy suffix tree data structure; Compression algorithms; Computer Society; Computer science; Data compression; Informatics; Production; Shape; Tree data structures;
Conference_Titel :
Data Compression Conference, 2007. DCC '07
Conference_Location :
Snowbird, UT
Print_ISBN :
0-7695-2791-4
DOI :
10.1109/DCC.2007.70