Title :
Pincer-search: an efficient algorithm for discovering the maximum frequent set
Author :
Lin, Dao-l ; Kedem, Zvi M.
Author_Institution :
Redback Networks Inc., San Jose, CA, USA
Abstract :
Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up, breadth-first search direction. The computation starts from frequent 1-itemsets (the minimum length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform well when all maximal frequent itemsets are short. However, performance drastically deteriorates when some of the maximal frequent itemsets are long. We present a new algorithm which combines both the bottom-up and the top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure, the maximum frequent candidate set. It is used to prune early candidates that would be normally encountered in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicit examination of every frequent itemset. We evaluate the performance of the algorithm using well-known synthetic benchmark databases, real-life census, and stock market databases
Keywords :
data mining; pattern recognition; software performance evaluation; tree data structures; tree searching; very large databases; Pincer search; association rules; benchmark databases; bottom-up search; breadth-first search; census database; data mining; data structure; episode discovery; knowledge discovery; large database; maximum frequent set discovery; minimal keys; performance evaluation; stock market databases; strong rules; Association rules; Data mining; Data structures; Databases; Itemsets; Stock markets;
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2002.1000342