Regional Information Center for Science and Technology - Mining frequent closed itemsets for large data

Abstract:

Mining frequent closed itemsets is an effective way to analyse frequent patterns and, from them, to generate association rules. Several algorithms have been proposed for generating frequent closed itemsets, including CLOSE, A-CLOSE, CLOSET, CHARM and CLOSET+. However, it is still hard for these algorithms to deal with dense and very large data. In this paper, we analyze the search space of frequent closed itemsets and propose a new decomposition algorithm, called PFC, for mining them. PFC dynamically generates non-overlapping partitions of the search space and mines the frequent closed itemsets in each partition. Each partition is independent and shares only the source data with the other partitions. It is therefore possible to implement PFC with multiple threads or other parallel methods, and to prune the search space of frequent closed itemsets efficiently. In this study, PFC is implemented in Java. We compare PFC with an author's C++ version of CLOSET+ on some large UCI repository datasets and on the worst case. The preliminary experimental results show that PFC performs well on dense and very large data.
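
To make the idea of independent, non-overlapping partitions concrete, the following is a minimal brute-force sketch in Python. It is not the PFC algorithm, whose details the abstract does not give. It assumes one simple partitioning scheme (each partition holds the itemsets whose smallest item is a given item), and the names `support`, `closure`, `mine_partition` and `mine_closed` are hypothetical. An itemset is treated as closed when it equals the intersection of all transactions that contain it.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    """Intersection of all transactions containing `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    return frozenset.intersection(*covering) if covering else frozenset(itemset)


def mine_partition(first_item, items, transactions, min_support):
    """Frequent closed itemsets whose smallest item is `first_item`.

    Assumed partitioning scheme (not taken from the paper): partitions are
    keyed by smallest item, so no itemset belongs to two partitions.
    """
    later = [i for i in items if i > first_item]
    found = {}
    for size in range(len(later) + 1):
        for rest in combinations(later, size):
            candidate = frozenset((first_item,) + rest)
            sup = support(candidate, transactions)
            # Keep the candidate only if it is frequent and equals its closure.
            if sup >= min_support and closure(candidate, transactions) == candidate:
                found[candidate] = sup
    return found


def mine_closed(transactions, min_support):
    """Mine every partition in its own task; tasks share only the input data."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    result = {}
    with ThreadPoolExecutor() as pool:
        parts = pool.map(
            lambda i: mine_partition(i, items, transactions, min_support), items
        )
        for part in parts:
            result.update(part)
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c"}]
    for itemset, sup in mine_closed(data, min_support=2).items():
        print(sorted(itemset), sup)
```

On the four sample transactions with a minimum support of 2, the sketch keeps {a, b} (support 3) and {c} (support 2). The itemsets {a} and {b} are frequent but not closed, because every transaction containing one also contains the other. The thread pool only illustrates that the partitions need no coordination; the sketch enumerates candidates exhaustively and makes no attempt at the pruning the abstract describes.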