DocumentCode :
843587
Title :
q-gram matching using tree models
Author :
Fogla, Prahlad ; Lee, Wenke
Author_Institution :
Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA, USA
Volume :
18
Issue :
4
fYear :
2006
fDate :
4/1/2006 12:00:00 AM
Firstpage :
433
Lastpage :
447
Abstract :
q-gram matching is used for approximate substring matching problems in a wide range of application areas, including intrusion detection. In this paper, we present a tree-based model to perform fast, linear-time q-gram matching. All q-grams present in the text are stored in a tree structure similar to a trie. We use a tree redundancy pruning algorithm to reduce the size of the tree without losing any information. We also use suffix links for fast q-gram search during query matching. We compare our work with the Rabin-Karp-based hash-table technique, which is commonly used for multiple q-gram search. We present results of experiments on system call sequence data used for intrusion detection.
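As a rough illustration of the idea in the abstract, the sketch below stores every q-gram of a text in a plain nested-dict trie and counts the query q-grams that are missing from it. It is an assumption-laden toy, not the paper's algorithm: the function names are hypothetical, and the redundancy pruning and suffix links described above are omitted, so each lookup restarts at the root and costs O(q) per window rather than the linear time the paper reports.

```python
def build_qgram_trie(text, q):
    """Insert every length-q window of `text` into a nested-dict trie."""
    root = {}
    for i in range(len(text) - q + 1):
        node = root
        for symbol in text[i:i + q]:
            # Create the child on first sight, then descend into it.
            node = node.setdefault(symbol, {})
    return root


def contains_qgram(root, gram):
    """Return True if `gram` (assumed to have length q) is a path in the trie."""
    node = root
    for symbol in gram:
        node = node.get(symbol)
        if node is None:
            return False
    return True


def count_mismatches(root, query, q):
    """Count the q-grams of `query` that do not occur in the trie."""
    return sum(
        1
        for i in range(len(query) - q + 1)
        if not contains_qgram(root, query[i:i + q])
    )


if __name__ == "__main__":
    trie = build_qgram_trie("abcabd", 3)
    print(contains_qgram(trie, "abc"))
    print(count_mismatches(trie, "abcabx", 3))
```

Because the functions only slice and iterate, the same sketch accepts a list of system call identifiers in place of a string, which is closer to the intrusion detection setting the abstract mentions.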
Keywords :
computational complexity; data mining; query processing; security of data; string matching; tree data structures; tree searching; Rabin-Karp-based hash-table technique; fast linear time q-gram matching; intrusion detection; multiple q-gram search; pattern matching; query matching; substring matching problem; suffix tree; tree data structure; tree redundancy pruning algorithm; tree-based model; trie structure; word processing; Computational biology; Computer Society; Detectors; Information retrieval; Intrusion detection; Pattern matching; Runtime; Sequences; Signal processing algorithms; Tree data structures; Intrusion detection; pattern matching; q-gram matching; search problems; string matching; suffix tree; tree data structure; trees; word processing
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2006.1599383
Filename :
1599383