Title :
Efficient algorithms for mining closed itemsets and their lattice structure
Author :
Zaki, Mohammed J. ; Hsiao, Ching-Jui
Author_Institution :
Dept. of Comput. Sci., Rensselaer Polytech. Inst., Troy, NY, USA
fDate :
4/1/2005 12:00:00 AM
Abstract :
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any "nonclosed" sets found during computation. We also present CHARM-L, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a state-of-the-art algorithm that outperforms previous methods. Further, CHARM-L explicitly generates the frequent closed itemset lattice.
Keywords :
data mining; tree searching; very large databases; association rules; closed itemset lattice; closed itemsets; data mining; frequent itemsets; hash approach; search tree; Association rules; Data mining; Data visualization; Degradation; Frequency; Itemsets; Lattices; Multidimensional systems; Transaction databases; Visual databases; Index Terms- Closed itemsets; association rules; closed itemset lattice; data mining.; frequent itemsets;
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2005.60