DocumentCode :
3630042
Title :
A New Parallel Algorithm for the Frequent Itemset Mining Problem
Author :
Mitica Craus
Author_Institution :
Dept. of Comput. Sci. & Eng., Gh. Asachi Tech. Univ., Iasi, Romania
fYear :
2008
Firstpage :
165
Lastpage :
170
Abstract :
A new parallel algorithm for finding the frequent itemsets in databases is presented. It differs fundamentally of well known Apriori algorithm, where at the beginning of every step, the dimension of the new frequent itemsets increases by 1 . In our algorithm the frequent itemsets are determined by progressively enlarging the interval which the individual items appertain, i.e. if at the k-th step the new candidates are from [i,i+k] intervals, i=1,2,…,n-k, at the next step, k+1, the new candidates will belong to [i, i+k+1] intervals, i=1,2,...,n-k-1. The frequent individual items are identified by  their index. The basic idea is that the new frequent itemsets with individual items from the interval [i,j],  simultaneously contain the items i and j. The frequent itemsets are built by sharing the work between n processors. Hereby, the processor P[i] computes, step by step, the sets F[i,j] of the frequent itemsets with individual items from the intervals [i,j], j=i,...,n. In order to compute the set F[i,j], the processing unit P[i] uses F[i,j-1] obtained in the previous step and F[i+1,j] received from the processor P[i+1]. The main advantage of our parallel  algorithm is that it uses a communication pattern known before algorithm start, which allows mapping communication to hardware.  Another major advantage is that the set of the transactions can be distributed to processors prior to beginning. This is possible because a processor P[i] has to compute F[i,j], j=i,...,n and therefore only the transactions containing the frequent item i are needed.
Keywords :
"Parallel algorithms","Itemsets","Data mining","Association rules","Transaction databases","Distributed computing","Computer science","Data engineering","Distributed databases","Hardware"
Publisher :
ieee
Conference_Titel :
Parallel and Distributed Computing, 2008. ISPDC ´08. International Symposium on
Print_ISBN :
978-0-7695-3472-5
Type :
conf
DOI :
10.1109/ISPDC.2008.45
Filename :
4724243
Link To Document :
بازگشت