Title :
TFP: an efficient algorithm for mining top-k frequent closed itemsets
Author :
Wang, Jianyong ; Han, Jiawei ; Lu, Ying ; Tzvetkov, Petre
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fDate :
5/1/2005 12:00:00 AM
Abstract :
Frequent itemset mining has been studied extensively in literature. Most previous studies require the specification of a min_support threshold and aim at mining a complete set of frequent itemsets satisfying min_support. However, in practice, it is difficult for users to provide an appropriate min_support threshold. In addition, a complete set of frequent itemsets is much less compact than a set of frequent closed itemsets. In this paper, we propose an alternative mining task: mining top-k frequent closed itemsets of length no less than min_l, where k is the desired number of frequent closed itemsets to be mined, and min_l is the minimal length of each itemset. An efficient algorithm, called TFP, is developed for mining such itemsets without mins_support. Starting at min_support = 0 and by making use of the length constraint and the properties of top-k frequent closed itemsets, min_support can be raised effectively and FP-Tree can be pruned dynamically both during and after the construction of the tree using our two proposed methods: the closed node count and descendant_sum. Moreover, mining is further speeded up by employing a top-down and bottom-up combined FP-Tree traversing strategy, a set of search space pruning methods, a fast 2-level hash-indexed result tree, and a novel closed itemset verification scheme. Our extensive performance study shows that TFP has high performance and linear scalability in terms of the database size.
Keywords :
data mining; tree data structures; tree searching; FP-Tree traversing strategy; TFP algorithm; closed itemset verification scheme; hash-indexed result tree; search space pruning method; top-k frequent closed itemset mining; Association rules; Data mining; Data structures; Explosions; Itemsets; Scalability; Transaction databases; Index Terms- Data mining; association rules; frequent itemset; mining methods and algorithms.;
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2005.81