Title :
Fast and memory efficient mining of frequent closed itemsets
Author :
Lucchese, Claudio ; Orlando, Salvatore ; Perego, Raffaele
Author_Institution :
Dipt. di Informatica, Universita Ca´´Foscari di Venezia, Italy
Abstract :
This paper presents a new scalable algorithm for discovering closed frequent itemsets, a lossless and condensed representation of all the frequent itemsets that can be mined from a transactional database. Our algorithm exploits a divide-and-conquer approach and a bitwise vertical representation of the database and adopts a particular visit and partitioning strategy of the search space based on an original theoretical framework, which formalizes the problem of closed itemsets mining in detail. The algorithm adopts several optimizations aimed to save both space and time in computing itemset closures and their supports. In particular, since one of the main problems in this type of algorithms is the multiple generation of the same closed itemset, we propose a new effective and memory-efficient pruning technique, which, unlike other previous proposals, does not require the whole set of closed patterns mined so far to be kept in the main memory. This technique also permits each visited partition of the search space to be mined independently in any order and, thus, also in parallel. The tests conducted on many publicly available data sets show that our algorithm is scalable and outperforms other state-of-the-art algorithms like CLOSET+ and FP-CLOSE, in some cases by more than one order of magnitude. More importantly, the performance improvements become more and more significant as the support threshold is decreased.
Keywords :
data mining; database management systems; divide and conquer methods; optimisation; search problems; association rule; bitwise vertical representation; closed frequent itemset discovery; condensed representation; data mining; divide-and-conquer approach; memory-efficient pruning technique; optimization; search space; transactional database; Association rules; Clustering algorithms; Data mining; Itemsets; Lattices; Partitioning algorithms; Pattern analysis; Proposals; Testing; Transaction databases; Index Terms- Data mining; association rules; closed itemsets; condensed representations; frequent itemsets; high-performance algorithms.;
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2006.10