Title :
Approximate pattern matching using the Burrows-Wheeler transform
Author :
Zhang, Nan ; Mukherjee, Amar ; Adjeroh, Don ; Bell, Tim
Author_Institution :
Dept. of Comput. Sci., Central Florida Univ., Orlando, FL, USA
Abstract :
Summary form only given. The approximate pattern matching on the text transformed by the Burrows-Wheeler transform (BWT) was considered. This is an important first step towards developing a compressed pattern matching algorithm for the BWT based compression system. Algorithms are proposed to solve the K-mismatch problem. Tests were performed on different pattern lengths using 133 selected files from the Canterbury, Calgary, and TREC corpus. The results on the K-mismatch pattern matching show that the running time and storage are superior to the fast suffix tree approach. Thus, once the index arrays are created, for repeated pattern search operations and for long patterns, the proposed algorithms perform significantly better than the agrep and ngrep. Using DFA verification, the search time is almost constant. The amortized cost is lower for multiple patterns search operations.
Keywords :
data compression; deterministic automata; finite automata; pattern matching; search problems; text analysis; transform coding; BWT; Burrows-Wheeler transform; DFA; compressed pattern matching; compression system; deterministic finite automata; index arrays; k-mismatch problem; multiple pattern search operations; pattern search; running time; search time; storage; suffix tree approach; Arithmetic; Automata; Computer science; Decoding; Doped fiber amplifiers; Filtering algorithms; Matched filters; Pattern matching; Testing; USA Councils;
Conference_Titel :
Data Compression Conference, 2003. Proceedings. DCC 2003
Print_ISBN :
0-7695-1896-6
DOI :
10.1109/DCC.2003.1194077