DocumentCode :
2995283
Title :
PSON: A Parallelized SON Algorithm with MapReduce for Mining Frequent Sets
Author :
Xiao, Tao ; Yuan, Chunfeng ; Huang, Yihua
Author_Institution :
Dept. of Comput. Sci. & Technol., Nanjing Univ., Nanjing, China
fYear :
2011
fDate :
9-11 Dec. 2011
Firstpage :
252
Lastpage :
257
Abstract :
Many algorithms have been proposed in past decades to efficiently mine frequent sets in transaction databases, including the SON algorithm proposed by Savasere, Omiecinski and Navathe. This paper introduces the SON algorithm, explains why it is well suited to parallelization, and illustrates how to adapt it to the MapReduce paradigm. We then propose a parallelized SON algorithm, PSON, and implement it in Hadoop. Our study suggests that PSON can mine frequent itemsets from a very large database with good performance. The experimental results show that, for frequent set mining, the running time increases almost linearly with the size of the dataset and decreases approximately linearly with the number of cluster nodes. Consequently, we conclude that PSON works well for mining frequent sets from massive datasets, with good scalability and speed-up.
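The two-pass structure that makes SON a natural fit for MapReduce can be sketched in a few lines of single-process Python. This is only an illustration of the general SON idea, not the paper's PSON implementation: the function names (`local_frequent`, `count_candidates`, `son`), the brute-force itemset counting, and the `max_size` cap are all assumptions made for brevity. Each chunk stands in for one map task's input split.

```python
from collections import Counter
from itertools import combinations


def local_frequent(chunk, min_frac, max_size):
    """Pass 1 'map': itemsets that are frequent within a single chunk.

    Brute-force counting is used here for brevity; a real system would
    run Apriori or a similar miner on each chunk.
    """
    counts = Counter()
    for txn in chunk:
        items = sorted(set(txn))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    threshold = min_frac * len(chunk)  # support threshold scaled to the chunk
    return {s for s, c in counts.items() if c >= threshold}


def count_candidates(chunk, candidates):
    """Pass 2 'map': count occurrences of each candidate in one chunk."""
    counts = Counter()
    for txn in chunk:
        items = set(txn)
        for cand in candidates:
            if items.issuperset(cand):
                counts[cand] += 1
    return counts


def son(transactions, min_frac, n_chunks=2, max_size=2):
    """Hypothetical driver: returns {itemset: global count} for frequent itemsets."""
    if not transactions:
        return {}
    size = -(-len(transactions) // n_chunks)  # ceiling division
    chunks = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Pass 1 'reduce': the union of locally frequent itemsets forms the candidates.
    candidates = set().union(*(local_frequent(c, min_frac, max_size) for c in chunks))
    # Pass 2 'reduce': sum per-chunk counts, keep the globally frequent itemsets.
    totals = Counter()
    for c in chunks:
        totals.update(count_candidates(c, candidates))
    threshold = min_frac * len(transactions)
    return {s: n for s, n in totals.items() if n >= threshold}
```

The first pass cannot miss a truly frequent itemset: if an itemset falls below the scaled threshold in every chunk, its total count is below the global threshold as well. The second pass only removes false positives, which is why both passes can run independently per chunk.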
Keywords :
data mining; database management systems; distributed processing; pattern clustering; Hadoop; MapReduce; cluster nodes; datasets; frequent set mining problem; parallelized SON algorithm; transaction database; Algorithm design and analysis; Data mining; Distributed databases; Itemsets; Partitioning algorithms; frequent sets mining
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Architectures, Algorithms and Programming (PAAP), 2011 Fourth International Symposium on
Conference_Location :
Tianjin
Print_ISBN :
978-1-4577-1808-3
Type :
conf
DOI :
10.1109/PAAP.2011.38
Filename :
6128512