• DocumentCode
    3384816
  • Title

    Approximate pattern matching using the Burrows-Wheeler transform

  • Author

    Zhang, Nan ; Mukherjee, Amar ; Adjeroh, Don ; Bell, Tim

  • Author_Institution
    Dept. of Comput. Sci., Central Florida Univ., Orlando, FL, USA
  • fYear
    2003
  • fDate
    25-27 March 2003
  • Firstpage
    458
  • Abstract
    Summary form only given. The approximate pattern matching on the text transformed by the Burrows-Wheeler transform (BWT) was considered. This is an important first step towards developing a compressed pattern matching algorithm for the BWT based compression system. Algorithms are proposed to solve the K-mismatch problem. Tests were performed on different pattern lengths using 133 selected files from the Canterbury, Calgary, and TREC corpus. The results on the K-mismatch pattern matching show that the running time and storage are superior to the fast suffix tree approach. Thus, once the index arrays are created, for repeated pattern search operations and for long patterns, the proposed algorithms perform significantly better than the agrep and ngrep. Using DFA verification, the search time is almost constant. The amortized cost is lower for multiple patterns search operations.
  • Keywords
    data compression; deterministic automata; finite automata; pattern matching; search problems; text analysis; transform coding; BWT; Burrows-Wheeler transform; DFA; compressed pattern matching; compression system; deterministic finite automata; index arrays; k-mismatch problem; multiple pattern search operations; pattern search; running time; search time; storage; suffix tree approach; Arithmetic; Automata; Computer science; Decoding; Doped fiber amplifiers; Filtering algorithms; Matched filters; Pattern matching; Testing; USA Councils;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Compression Conference, 2003. Proceedings. DCC 2003
  • ISSN
    1068-0314
  • Print_ISBN
    0-7695-1896-6
  • Type

    conf

  • DOI
    10.1109/DCC.2003.1194077
  • Filename
    1194077