DocumentCode
2548300
Title
Bit-parallel approach to approximate string matching in compressed texts
Author
Matsumoto, Tetsuya ; Kida, Takuya ; Takeda, Masayuki ; Shinohara, Ayumi ; Arikawa, Setsuo
Author_Institution
Dept. of Inf., Kyushu Univ., Fukuoka, Japan
fYear
2000
fDate
2000
Firstpage
221
Lastpage
228
Abstract
Addresses the problem of approximate string matching on compressed text. We consider this problem for a text string described in terms of a collage system, which is a formal system proposed by T. Kida et al. (1999) that captures various dictionary-based compression methods. We present an algorithm that exploits bit-parallelism, assuming that our problem fits in a single machine word, e.g. (m-k)(k+1)⩽L, where m is the pattern length, k is the number of allowed errors and L is the length, in bits, of the machine word. For a class of simple collage systems, the algorithm runs in O(k2(||𝒟||+|𝒮|)+km) time using O(k2||𝒟||) space, where ||𝒟|| is the size of dictionary 𝒟 and |𝒮| is the number of tokens in the sequence 𝒮. The LZ78 (Lempel-Ziv, 1978) and the LZW (Lempel-Ziv-Welch, 1984) compression methods are covered by this class. Since we can regard n=||𝒟||+|𝒮| as the compressed length, the time and space complexities are O(k2n+km) and O(k 2n), respectively. For general k and m, they become O(k3 mn/L+km) and O(k3mn/L). Thus, our algorithm is competitive to the algorithm proposed by J. Kärkkäinen, et al. (2000), which runs in O(km) time using O(kmn) space, when k=O(√L)
Keywords
computational complexity; data compression; string matching; Ziv-Lempel compression methods; allowed errors; approximate string matching; bit-parallelism; collage system; compressed length; compressed texts; dictionary size; dictionary-based compression methods; machine word; pattern length; space complexity; time complexity; token sequence; Automata; Dictionaries; Informatics; Pattern matching;
fLanguage
English
Publisher
ieee
Conference_Titel
String Processing and Information Retrieval, 2000. SPIRE 2000. Proceedings. Seventh International Symposium on
Conference_Location
A Curuna
Print_ISBN
0-7695-0746-8
Type
conf
DOI
10.1109/SPIRE.2000.878198
Filename
878198
Link To Document