DocumentCode :
2298268
Title :
Simple Linear-Time Off-Line Text Compression by Longest-First Substitution
Author :
Nakamura, Ryosuke ; Bannai, Hideo ; Inenaga, Shunsuke ; Takeda, Masayuki
Author_Institution :
Dept. of Inf., Kyushu Univ., Fukuoka
fYear :
2007
fDate :
27-29 March 2007
Firstpage :
123
Lastpage :
132
Abstract :
We consider grammar based text compression with longest-first substitution, where non-overlapping occurrences of a longest repeating substring of the input text are replaced by a new non-terminal symbol. We present a new text compression algorithm by simplifying the algorithm presented in S. Inenaga et al., (2003). We give a new formulation of the correctness proof introducing the sparse lazy suffix tree data structure. We also present another type of longest-first substitution strategy that allows better compression. We show results of preliminary experiments comparing grammar sizes of the two versions of the longest-first strategy and the most frequent strategy
Keywords :
data compression; grammars; tree data structures; correctness proof; grammar based text compression; linear-time offline text compression; longest repeating substring; longest-first substitution; nonoverlapping occurrences; nonterminal symbol; sparse lazy suffix tree data structure; Compression algorithms; Computer Society; Computer science; Data compression; Informatics; Production; Shape; Tree data structures;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Compression Conference, 2007. DCC '07
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-7695-2791-4
Type :
conf
DOI :
10.1109/DCC.2007.70
Filename :
4148751
Link To Document :
بازگشت