• DocumentCode
    2548300
  • Title

    Bit-parallel approach to approximate string matching in compressed texts

  • Author

    Matsumoto, Tetsuya ; Kida, Takuya ; Takeda, Masayuki ; Shinohara, Ayumi ; Arikawa, Setsuo

  • Author_Institution
    Dept. of Inf., Kyushu Univ., Fukuoka, Japan
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    221
  • Lastpage
    228
  • Abstract
    Addresses the problem of approximate string matching on compressed text. We consider this problem for a text string described in terms of a collage system, which is a formal system proposed by T. Kida et al. (1999) that captures various dictionary-based compression methods. We present an algorithm that exploits bit-parallelism, assuming that our problem fits in a single machine word, e.g. (m-k)(k+1)⩽L, where m is the pattern length, k is the number of allowed errors and L is the length, in bits, of the machine word. For a class of simple collage systems, the algorithm runs in O(k2(||𝒟||+|𝒮|)+km) time using O(k2||𝒟||) space, where ||𝒟|| is the size of dictionary 𝒟 and |𝒮| is the number of tokens in the sequence 𝒮. The LZ78 (Lempel-Ziv, 1978) and the LZW (Lempel-Ziv-Welch, 1984) compression methods are covered by this class. Since we can regard n=||𝒟||+|𝒮| as the compressed length, the time and space complexities are O(k2n+km) and O(k 2n), respectively. For general k and m, they become O(k3 mn/L+km) and O(k3mn/L). Thus, our algorithm is competitive to the algorithm proposed by J. Kärkkäinen, et al. (2000), which runs in O(km) time using O(kmn) space, when k=O(√L)
  • Keywords
    computational complexity; data compression; string matching; Ziv-Lempel compression methods; allowed errors; approximate string matching; bit-parallelism; collage system; compressed length; compressed texts; dictionary size; dictionary-based compression methods; machine word; pattern length; space complexity; time complexity; token sequence; Automata; Dictionaries; Informatics; Pattern matching;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    String Processing and Information Retrieval, 2000. SPIRE 2000. Proceedings. Seventh International Symposium on
  • Conference_Location
    A Curuna
  • Print_ISBN
    0-7695-0746-8
  • Type

    conf

  • DOI
    10.1109/SPIRE.2000.878198
  • Filename
    878198