DocumentCode :
2571688
Title :
Parallel and Distributed Frequent Pattern Mining in Large Databases
Author :
Tanbeer, Syed Khairuzzaman ; Ahmed, Chowdhury Farhan ; Jeong, Byeong-Soo
Author_Institution :
Dept. of Comput. Eng., Kyung Hee Univ., Yongin, South Korea
fYear :
2009
fDate :
25-27 June 2009
Firstpage :
407
Lastpage :
414
Abstract :
Recently, a significant number of parallel and distributed algorithms have been proposed to mine frequent patterns (FP) from large and/or distributed databases. Among them, parallelization of FP-growth algorithms using the FP-tree has proven to be highly efficient. However, FP-tree-based techniques suffer from two major limitations: the need for multiple database scans (i.e., high I/O cost) and high inter-processor communication cost during the mining phase. Therefore, we propose a novel tree structure, called the PP-tree (Parallel Pattern tree), that significantly reduces I/O cost by capturing the database contents with a single scan and facilitates efficient FP-growth mining with reduced inter-processor communication overhead. Our parallel algorithm works independently at each local site and locally generates global frequent patterns, which are merged at the final stage. The experimental results show that parallel and distributed FP mining with the PP-tree outperforms other state-of-the-art algorithms.
Keywords :
data mining; distributed databases; parallel algorithms; tree data structures; very large databases; FP-tree-based technique; I/O cost reduction; distributed algorithm; distributed database; frequent pattern mining; inter-processor communication; large database; parallel algorithm; parallel pattern tree; tree structure; Broadcasting; Concurrent computing; Costs; Data engineering; Data mining; Distributed computing; Distributed databases; Frequency; High performance computing; Tree data structures;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing and Communications, 2009. HPCC '09. 11th IEEE International Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4244-4600-1
Electronic_ISBN :
978-0-7695-3738-2
Type :
conf
DOI :
10.1109/HPCC.2009.37
Filename :
5167021