Title :
Performance and architectural issues for string matching
Author :
Isenman, Mernll E. ; Shasha, Dennis E.
Author_Institution :
AT&T Bell Lab., Holmdel, NJ, USA
fDate :
2/1/1990 12:00:00 AM
Abstract :
The authors introduce special heuristics to the Knuth-Morris-Pratt algorithm to reduce the time and space required to perform the string matching. They compare their hardware-based approach to the software approaches embodied in the Unix system grep and fgrep commands. Simulation results show that the hardware approach can provide a 25-500-fold performance improvement, depending on the complexity of the query, and that it is fast enough, even in the presence of variable-length `don´t cares´ to keep up with a 20-million character/second disk. The approach compares favorably to other hardware designs in speed and space. The proposed hardware implementation requires 10 kB of one cycle static memory, 28 single-character comparators, four 16-b adders, and control logic for four finite-state machines with a term-matcher controller. After that, additional hardware produces negligible performance improvements for queries with up to 80 terms, about half of which have variable-length `don´t cares´
Keywords :
database management systems; finite automata; information retrieval; Knuth-Morris-Pratt algorithm; Unix system; architectural issues; control logic; finite-state machines; hardware implementation; hardware-based approach; performance issues; software approaches; string matching; term-matcher controller; Automata; Hardware; Information retrieval; Law; Legal factors; Logic; Medical simulation; Pattern matching; Software libraries; Testing;
Journal_Title :
Computers, IEEE Transactions on