Author_Institution :
Sch. of Comput. Sci., Central Florida Univ., Orlando, FL, USA
Abstract :
Summary form only given. A novel multiple-pattern matching algorithm for LZW compressed files using the Aho-Corasick algorithm (Aho, A.N. and Corasick, M.J., 1975) is reported. Since the LZW trie can be reconstructed without explicit decompression (Amir, A. et al., J. System Sciences, vol.52, p.299-307, 1996), a state transition table is then built from the LZW trie and the AC automaton and is used in our algorithm for fast multiple-pattern searching. The algorithm takes O(mt+n+r) time with O(mt) extra space, where n is the size of the compressed file, m is the total size of all patterns, t is the size of the LZW trie and r is the number of occurrences of the patterns. Extensive experiments have been conducted to test the performance of the algorithm and to compare it with other well-known compressed pattern matching algorithms, particularly the similar algorithm developed by M. Kida et al. (see J. Discrete Algorithm, vol.1, no.1, 2000). The results show that the proposed algorithm is practically the fastest among all approaches when the number of patterns is not very large. Therefore, our algorithm is preferable for general string matching applications. The proposed algorithm is efficient for large files and it is particularly efficient when being applied in an archival search, if the archives are compressed with a common LZW trie.