DocumentCode :
3049507
Title :
Compressed pattern matching for SEQUITUR
Author :
Mitarai, Shuichi ; Hirao, Masahiro ; Matsumoto, Tetsuya ; Shinohara, Ayurni ; Takeda, Masayuki ; Arikawa, Setsuo
Author_Institution :
Dept. of Inf., Kyushu Univ., Fukuoka, Japan
fYear :
2001
fDate :
2001
Firstpage :
469
Lastpage :
478
Abstract :
SEQUITUR due to Nevill-Manning and Witten (see Journal of Artificial Intelligence Research, vol.7, p.67-82, 1997) is a powerful program to infer a phrase hierarchy from the input text, that also provides extremely effective compression of large quantities of semi-structured text. In this paper, we address the problem of searching in SEQUITUR compressed text directly. We show a compressed pattern matching algorithm that finds a pattern in compressed text without explicit decompression. We show that our algorithm is approximately 1.27 times faster than a decompression followed by an ordinal search
Keywords :
data compression; pattern matching; text analysis; SEQUITUR; compressed pattern matching; compressed pattern matching algorithm; decompression; input text; ordinal search; phrase hierarchy; program; semi-structured text compression; Data compression; Informatics; Pattern matching; Size measurement; Software libraries;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Compression Conference, 2001. Proceedings. DCC 2001.
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-7695-1031-0
Type :
conf
DOI :
10.1109/DCC.2001.917178
Filename :
917178
Link To Document :
بازگشت