Author/Authors :
Bruce W. Watson (author), Richard E. Watson (author)
Abstract :
This paper presents a Boyer–Moore-type algorithm for regular expression pattern matching, answering an open problem posed by Aho in 1980 (Pattern Matching in Strings, Academic Press, New York, 1980, p. 342). The new algorithm handles patterns specified by regular expressions and is therefore a generalization of the Boyer–Moore and Commentz-Walter algorithms.
Like the Boyer–Moore and Commentz-Walter algorithms, the new algorithm makes use of shift functions which can be precomputed and tabulated. The precomputation algorithms are derived, and it is shown that the required shift functions can be precomputed from Commentz-Walter's d1 and d2 shift functions.
In certain cases, the Boyer–Moore (respectively Commentz-Walter) algorithm has greatly outperformed the Knuth–Morris–Pratt (respectively Aho–Corasick) algorithm (as discussed by Watson in his Ph.D. Thesis, Eindhoven University of Technology, September 1995, and in: N. Ziviani, R. Baeza-Yates, K. Guimaraes (Eds.), Proc. Third South American Workshop on String Processing, International Informatics Series, vol. 4, Carleton University Press, Recife, Brazil, 1996, pp. 280–294). In testing, the algorithm presented in this paper also frequently outperforms the regular expression generalization of the Aho–Corasick algorithm.
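Illustrative note: the idea of a precomputed, tabulated shift function can be seen in the simplest single-keyword setting. The sketch below uses Horspool's simplification of the Boyer–Moore algorithm, not the regular expression algorithm of this paper and not Commentz-Walter's d1 and d2 functions. The names `build_shift_table` and `horspool_search` are hypothetical and chosen only for this example.

```python
def build_shift_table(pattern):
    """Precompute the Horspool shift table for a single keyword.

    For each character in pattern[:-1], store the distance from its
    rightmost occurrence to the last position of the pattern.
    Characters not in the table get the default shift len(pattern).
    """
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0:
        return []
    shift = build_shift_table(pattern)
    matches = []
    pos = 0
    while pos + m <= n:
        # Compare the current window against the keyword.
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift according to the text character aligned with the
        # last pattern position; unknown characters allow a full shift.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

For example, `horspool_search("abracadabra", "abra")` finds the keyword at positions 0 and 7, skipping several text positions by using the tabulated shifts instead of advancing one character at a time.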
Keywords :
String pattern matching, Regular expressions, Boyer–Moore algorithms, Commentz-Walter algorithms, Algorithm generalizations