DocumentCode
2054353
Title
Frequent Itemset Mining on Large-Scale Shared Memory Machines
Author
Zhang, Yan ; Zhang, Fan ; Bakos, Jason
Author_Institution
Dept. of CSE, Univ. of South Carolina, Columbia, SC, USA
fYear
2011
fDate
26-30 Sept. 2011
Firstpage
585
Lastpage
589
Abstract
Frequent Itemset Mining (FIM) is a data mining task that finds frequently occurring subsets in a database of itemsets. FIM is a non-numerical, data-intensive computation that is frequently used in machine learning and computational biology applications. The development of increasingly efficient FIM algorithms is an active field, but exposing and exploiting parallelism is rarely emphasized in the development of new FIM algorithms. In this paper, we explore parallel implementations of two FIM algorithms, Apriori and Eclat, each using three different representations: vertical transaction ID set, vertical bit vector, and diffset. We implemented these algorithms using OpenMP and evaluated their scalability on the Teragrid machine "Blacklight", a 4096-core Intel Nehalem-EX SGI Altix shared-memory system, using from 16 processors (one blade) to 256 processors (16 blades). We found that, while scalability generally depends on the input data, Apriori is scalable only when used with diffset. Eclat, in contrast, is generally scalable but achieves its best scalability with diffset.
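The three representations named in the abstract can be illustrated with a minimal sketch. It is not the authors' OpenMP implementation: the toy database and the helper names `tidset` and `bitvector` are assumptions made up for illustration, and the diffset rule used here is the standard one for a 2-itemset, support(ab) = support(a) - |t(a) - t(b)|.

```python
# Toy database (hypothetical data): transaction id -> items in that transaction.
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "c"},
    3: {"a", "b"},
    4: {"b", "c"},
    5: {"a", "b", "c"},
}


def tidset(item):
    """Vertical transaction id set: ids of the transactions containing `item`."""
    return {tid for tid, items in transactions.items() if item in items}


def bitvector(item):
    """Vertical bit vector: bit (tid - 1) is set when transaction `tid` contains `item`."""
    bits = 0
    for tid, items in transactions.items():
        if item in items:
            bits |= 1 << (tid - 1)
    return bits


# Tidset: support of {a, b} is the size of the intersection of the two tidsets.
support_tidset = len(tidset("a") & tidset("b"))

# Bit vector: support of {a, b} is the population count of the bitwise AND.
support_bits = bin(bitvector("a") & bitvector("b")).count("1")

# Diffset: store only the transactions that contain a but not b,
# then subtract their count from the support of a.
diffset_ab = tidset("a") - tidset("b")
support_diffset = len(tidset("a")) - len(diffset_ab)

print(support_tidset, support_bits, support_diffset)
```

All three computations give the same support for the itemset {a, b}. They differ in how much data has to be stored and combined: for dense data a diffset is often much smaller than the corresponding tidset.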
Keywords
data mining; message passing; shared memory systems; Apriori; Eclat; Intel Nehalem-EX SGI Altix shared-memory machine; OpenMP; Teragrid Blacklight; computational biology application; frequent itemset mining; large-scale shared memory machine; machine learning; nonnumerical data intensive computation; parallel implementation; vertical bit vector; vertical transaction set; Algorithm design and analysis; Blades; Instruction sets; Itemsets; Machine learning algorithms; Scalability; parallel; shared memory
fLanguage
English
Publisher
IEEE
Conference_Titel
Cluster Computing (CLUSTER), 2011 IEEE International Conference on
Conference_Location
Austin, TX
Print_ISBN
978-1-4577-1355-2
Electronic_ISBN
978-0-7695-4516-5
Type
conf
DOI
10.1109/CLUSTER.2011.69
Filename
6061213