Title :
Approximate Searching on Compressed Text
Author :
Pérez, Carlos Avendaño ; Uribe, Claudia Feregrino
Author_Institution :
INAOE, Puebla, Mexico
Abstract :
The approximate searching problem on compressed text tries to find all the matches of a pattern in a compressed text, without decompressing it and considering that the match of the pattern with the text can have a limited number of differences. This problem has diverse applications in information retrieval, computational biology and signal processing, among others. One of the best solutions to this problem is to execute a multipattern search of a set of pieces of the pattern, followed by a local decompression and a direct verification in the decompressed areas. In this work an improvement to this solution concerning verification is presented, where instead of executing a decompression process and searching for the pattern, bit-parallel automata are constructed that recognize the pattern. In this way, we perform the entire searching process without decompressing the text and obtain competitive times, compared to decompressing text and searching it with the best existing algorithms.
Keywords :
data compression; search problems; string matching; text analysis; approximate searching; bit-parallel automata; compressed text; computational biology; direct verification; information retrieval; local decompression; multipattern search; pattern matching; signal processing; Automata; Biomedical signal processing; Computational biology; Concurrent computing; Dynamic programming; Information retrieval; Pattern matching; Pattern recognition; Signal processing algorithms;
Conference_Titel :
Electronics, Communications and Computers, 2005. CONIELECOMP 2005. Proceedings. 15th International Conference on
Print_ISBN :
0-7695-2283-1
DOI :
10.1109/CONIEL.2005.23