DocumentCode :
610118
Title :
Variable-to-Fixed-Length Encoding for Large Texts Using Re-Pair Algorithm with Shared Dictionaries
Author :
Sekine, Keisuke ; Sasakawa, H. ; Yoshida, Sigeru ; Kida, T.
Author_Institution :
Grad. Sch. of Inf. Sci. & Technol., Hokkaido Univ., Sapporo, Japan
fYear :
2013
fDate :
20-22 March 2013
Firstpage :
518
Lastpage :
518
Abstract :
The Re-Pair algorithm proposed by Larsson and Moffat in 1999 is a simple grammar-based compression method that achieves an extremely high compression ratio. However, Re-Pair is an offline and very space consuming algorithm. Thus, to apply it to a very large text, we need to divide the text into smaller blocks. Consequently, if we share a part of the dictionary among all blocks, we expect that the compression speed and ratio of the algorithm will improve. In this paper, we implemented our method with exploiting variable-to-fixed-length codes, and empirically show how the compression speed and ratio of the method vary by adjusting three parameters: block size, dictionary size, and size of shared dictionary. Finally, we discuss the tendencies of compression speed and ratio with respect to the three parameters.
Keywords :
data compression; grammars; variable length codes; block size; compression ratio; compression speed; dictionary size; grammar-based compression method; large-texts; re-pair algorithm; shared dictionaries; shared dictionary size; variable-to-fixed-length encoding; Data compression; Dictionaries; Educational institutions; Encoding; Grammar; Information science; Workstations; Re-Pair algorithm; VF code; grammar-based compression;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Compression Conference (DCC), 2013
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
978-1-4673-6037-1
Type :
conf
DOI :
10.1109/DCC.2013.97
Filename :
6543128
Link To Document :
بازگشت