  • DocumentCode
    2054353
  • Title

    Frequent Itemset Mining on Large-Scale Shared Memory Machines

  • Author

    Zhang, Yan ; Zhang, Fan ; Bakos, Jason

  • Author_Institution
    Dept. of CSE, Univ. of South Carolina, Columbia, SC, USA
  • fYear
    2011
  • fDate
    26-30 Sept. 2011
  • Firstpage
    585
  • Lastpage
    589
  • Abstract
    Frequent Itemset Mining (FIM) is a data mining task that finds frequently occurring subsets in a database of itemsets. FIM is a non-numerical, data-intensive computation that is frequently used in machine learning and computational biology applications. The development of increasingly efficient FIM algorithms is an active field, but exposing and exploiting parallelism is rarely emphasized in the development of new FIM algorithms. In this paper, we explore parallel implementations of two FIM algorithms, Apriori and Eclat, each using three different representations: vertical transaction ID set, vertical bit vector, and diffset. We implemented these algorithms using OpenMP and evaluated their scalability on Teragrid "Blacklight", a 4096-core Intel Nehalem-EX SGI Altix shared-memory machine, using 16 processors (one blade) to 256 processors (16 blades). We found that, while scalability generally depends on the input data, Apriori is scalable only when used with diffset. Eclat, on the other hand, is generally scalable but achieves its best scalability with diffset.
  • Keywords
    data mining; message passing; shared memory systems; Apriori; Eclat; Intel Nehalem-EX SGI Altix shared-memory machine; OpenMP; Teragrid Blacklight; computational biology application; frequent itemset mining; large-scale shared memory machine; machine learning; nonnumerical data intensive computation; parallel implementation; vertical bit vector; vertical transaction set; Algorithm design and analysis; Blades; Instruction sets; Itemsets; Machine learning algorithms; Scalability; parallel; shared memory
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2011 IEEE International Conference on
  • Conference_Location
    Austin, TX
  • Print_ISBN
    978-1-4577-1355-2
  • Electronic_ISBN
    978-0-7695-4516-5
  • Type

    conf

  • DOI
    10.1109/CLUSTER.2011.69
  • Filename
    6061213