Title :
Variable-to-Fixed-Length Encoding for Large Texts Using Re-Pair Algorithm with Shared Dictionaries
Author :
Sekine, Keisuke ; Sasakawa, H. ; Yoshida, Sigeru ; Kida, T.
Author_Institution :
Grad. Sch. of Inf. Sci. & Technol., Hokkaido Univ., Sapporo, Japan
Abstract :
The Re-Pair algorithm proposed by Larsson and Moffat in 1999 is a simple grammar-based compression method that achieves an extremely high compression ratio. However, Re-Pair is an offline and very space consuming algorithm. Thus, to apply it to a very large text, we need to divide the text into smaller blocks. Consequently, if we share a part of the dictionary among all blocks, we expect that the compression speed and ratio of the algorithm will improve. In this paper, we implemented our method with exploiting variable-to-fixed-length codes, and empirically show how the compression speed and ratio of the method vary by adjusting three parameters: block size, dictionary size, and size of shared dictionary. Finally, we discuss the tendencies of compression speed and ratio with respect to the three parameters.
Keywords :
data compression; grammars; variable length codes; block size; compression ratio; compression speed; dictionary size; grammar-based compression method; large-texts; re-pair algorithm; shared dictionaries; shared dictionary size; variable-to-fixed-length encoding; Data compression; Dictionaries; Educational institutions; Encoding; Grammar; Information science; Workstations; Re-Pair algorithm; VF code; grammar-based compression;
Conference_Titel :
Data Compression Conference (DCC), 2013
Conference_Location :
Snowbird, UT
Print_ISBN :
978-1-4673-6037-1
DOI :
10.1109/DCC.2013.97