DocumentCode
3384816
Title
Approximate pattern matching using the Burrows-Wheeler transform
Author
Zhang, Nan ; Mukherjee, Amar ; Adjeroh, Don ; Bell, Tim
Author_Institution
Dept. of Comput. Sci., Central Florida Univ., Orlando, FL, USA
fYear
2003
fDate
25-27 March 2003
Firstpage
458
Abstract
Summary form only given. The approximate pattern matching on the text transformed by the Burrows-Wheeler transform (BWT) was considered. This is an important first step towards developing a compressed pattern matching algorithm for the BWT based compression system. Algorithms are proposed to solve the K-mismatch problem. Tests were performed on different pattern lengths using 133 selected files from the Canterbury, Calgary, and TREC corpus. The results on the K-mismatch pattern matching show that the running time and storage are superior to the fast suffix tree approach. Thus, once the index arrays are created, for repeated pattern search operations and for long patterns, the proposed algorithms perform significantly better than the agrep and ngrep. Using DFA verification, the search time is almost constant. The amortized cost is lower for multiple patterns search operations.
Keywords
data compression; deterministic automata; finite automata; pattern matching; search problems; text analysis; transform coding; BWT; Burrows-Wheeler transform; DFA; compressed pattern matching; compression system; deterministic finite automata; index arrays; k-mismatch problem; multiple pattern search operations; pattern search; running time; search time; storage; suffix tree approach; Arithmetic; Automata; Computer science; Decoding; Doped fiber amplifiers; Filtering algorithms; Matched filters; Pattern matching; Testing; USA Councils;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Compression Conference, 2003. Proceedings. DCC 2003
ISSN
1068-0314
Print_ISBN
0-7695-1896-6
Type
conf
DOI
10.1109/DCC.2003.1194077
Filename
1194077
Link To Document