DocumentCode :
2513871
Title :
Balanced parallel FP-Growth with MapReduce
Author :
Zhou, Le ; Zhong, Zhiyong ; Chang, Jin ; Li, Junjie ; Huang, Joshua Zhexue ; Feng, Shengzhong
Author_Institution :
Center for High Performance Comput., Chinese Acad. of Sci., Shenzhen, China
fYear :
2010
fDate :
28-30 Nov. 2010
Firstpage :
243
Lastpage :
246
Abstract :
Frequent itemset mining (FIM) plays an essential role in mining associations, correlations, and many other important data mining tasks. Unfortunately, as datasets grow larger by the day, most FIM algorithms in the literature become ineffective because of either excessive resource requirements or excessive communication cost. In this paper, we propose BPFP, a balanced parallel FP-Growth algorithm based on the PFP algorithm [1], which parallelizes FP-Growth using the MapReduce approach. BPFP adds a load-balancing feature to PFP, which improves parallelization and thereby performance. In our empirical study, BPFP outperformed PFP, which uses a simple grouping strategy.
Keywords :
data mining; distributed processing; BPFP; FIM algorithms; MapReduce; balanced parallel FP-growth algorithm; frequent itemset mining; Algorithm design and analysis; Clustering algorithms; Data mining; Estimation; Itemsets; Partitioning algorithms; Algorithms; Distributed computing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Computing and Telecommunications (YC-ICT), 2010 IEEE Youth Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-8883-4
Type :
conf
DOI :
10.1109/YCICT.2010.5713090
Filename :
5713090