DocumentCode :
1958074
Title :
A distributed frequent itemset mining algorithm based on Spark
Author :
Feng Gui ; Yunlong Ma ; Feng Zhang ; Min Liu ; Fei Li ; Weiming Shen ; Hua Bai
Author_Institution :
Sch. of Electron. & Inf. Eng., Tongji Univ., Shanghai, China
fYear :
2015
fDate :
6-8 May 2015
Firstpage :
271
Lastpage :
275
Abstract :
Frequent itemset mining is an important step in association rule mining. Traditional frequent itemset mining algorithms have certain limitations. For example, the Apriori algorithm has to scan the input data repeatedly, which leads to high I/O load and low performance, and the FP-Growth algorithm is limited by the capacity of the computer's main memory because it needs to build an FP-tree and mine frequent itemsets from that FP-tree in memory. With the arrival of the Big Data era, these limitations become more prominent when mining large-scale data. In this paper, DPBM, a distributed matrix-based pruning algorithm based on Spark, is proposed for frequent itemset mining. DPBM greatly reduces the number of candidate itemsets by introducing a novel pruning technique for the matrix-based frequent itemset mining algorithm, an improved Apriori algorithm that needs to scan the input data only once. In addition, implementing DPBM on Spark, a recent fast distributed computing environment, greatly reduces the memory usage of each computer node. The experimental results show that DPBM performs better than MapReduce-based algorithms for frequent itemset mining in terms of speed and scalability.
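As a rough illustration of the matrix-based idea the abstract mentions, the following single-machine sketch scans the transactions once to build a boolean item-by-transaction matrix (stored here as integer bitmasks) and then counts the support of larger itemsets by AND-ing rows instead of rescanning the data. It is not the DPBM algorithm: the record does not describe DPBM's pruning technique or its Spark implementation, so the sketch uses only the standard Apriori subset check for pruning, and the names `build_matrix` and `frequent_itemsets` are hypothetical.

```python
from itertools import combinations


def build_matrix(transactions):
    """Scan the input once; map each item to a bitmask of the transactions containing it."""
    rows = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            rows[item] = rows.get(item, 0) | (1 << tid)
    return rows


def support(mask):
    """Number of transactions whose bit is set in the mask."""
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets meeting min_support.

    Illustrative only: pruning is the plain Apriori subset check,
    not the pruning technique proposed in the paper.
    """
    rows = build_matrix(transactions)
    # Level 1: frequent single items and their transaction bitmasks.
    current = {
        frozenset([item]): mask
        for item, mask in rows.items()
        if support(mask) >= min_support
    }
    result = {}
    while current:
        result.update({itemset: support(mask) for itemset, mask in current.items()})
        next_level = {}
        for a, b in combinations(list(current), 2):
            candidate = a | b
            # Only join itemsets that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every subset one item smaller must be frequent.
            if any(candidate - {x} not in current for x in candidate):
                continue
            # Support of the union is the AND of the two bitmasks.
            mask = current[a] & current[b]
            if support(mask) >= min_support:
                next_level[candidate] = mask
        current = next_level
    return result
```

Calling `frequent_itemsets(transactions, min_support)` with a list of item lists and an absolute support threshold returns a dictionary keyed by `frozenset`. The original data is never read again after `build_matrix`, which is the property the abstract attributes to the matrix-based approach.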
Keywords :
data mining; input-output programs; matrix algebra; trees (mathematics); FP-growth algorithm; FP-tree; I/O load; Spark; apriori algorithm; association rules mining; distributed frequent itemset mining algorithm; distributed matrix-based pruning algorithm; MapReduce; distributed algorithm; frequent itemset mining; matrix-pruning
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Supported Cooperative Work in Design (CSCWD), 2015 IEEE 19th International Conference on
Conference_Location :
Calabria
Print_ISBN :
978-1-4799-2001-3
Type :
conf
DOI :
10.1109/CSCWD.2015.7230970
Filename :
7230970