• DocumentCode
    3469723
  • Title
    Distributed mining of maximal frequent itemsets from databases on a cluster of workstations
  • Author
    Chung, Soon M.; Luo, Congnan
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Wright State Univ., Dayton, OH, USA
  • fYear
    2004
  • fDate
    19-22 April 2004
  • Firstpage
    499
  • Lastpage
    507
  • Abstract
    In this paper, we propose a new algorithm, named Distributed Max-Miner (DMM), for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. DMM requires very low communication and synchronization overhead in distributed computing systems. DMM consists of a local mining phase and a global mining phase. During the local mining phase, each node mines its local database to discover the local maximal frequent itemsets, which then form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix-tree data structure is developed to facilitate the storage and counting of global candidate itemsets of different sizes. The global mining phase, which uses the prefix-tree, can work with any local mining algorithm. We implemented DMM on a cluster of workstations and evaluated its performance for various cases. DMM demonstrates better performance than other sequential and parallel algorithms, and its performance scales well even when the databases contain large maximal frequent itemsets (i.e., long patterns).
  • Keywords
    data mining; distributed algorithms; distributed databases; octrees; performance evaluation; tree searching; workstation clusters; DMM; Distributed Max-Miner; cluster of workstations; distributed computing; distributed mining; local database; maximal frequent itemsets; performance evaluation; prefix-tree data structure; top-down search; Association rules; Clustering algorithms; Computer science; Data engineering; Data mining; Distributed computing; Distributed databases; Itemsets; Parallel algorithms; Workstations;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing and the Grid, 2004. CCGrid 2004. IEEE International Symposium on
  • Print_ISBN
    0-7803-8430-X
  • Type
    conf
  • DOI
    10.1109/CCGrid.2004.1336638
  • Filename
    1336638
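
The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' DMM implementation: the brute-force local miner, the plain-set candidate storage (in place of the paper's prefix-tree), and all function names (`mine_local_maximal`, `global_top_down`, `dmm_sketch`) are assumptions made for illustration, and everything runs in a single process rather than on a cluster. It relies on the standard partition argument that an itemset frequent in the whole database at a given relative support must be frequent in at least one partition at the same relative support, so every global maximal frequent itemset is a subset of some local one.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def maximal_only(itemsets):
    """Keep only itemsets that have no proper superset in the collection."""
    return {s for s in itemsets if not any(s < other for other in itemsets)}


def mine_local_maximal(transactions, min_frac):
    """Hypothetical brute-force local miner (small inputs only)."""
    min_count = min_frac * len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = set()
    for size in range(1, len(items) + 1):
        level = {frozenset(c) for c in combinations(items, size)
                 if support(frozenset(c), transactions) >= min_count}
        if not level:
            break  # no frequent itemset of this size, so none larger either
        frequent |= level
    return maximal_only(frequent)


def global_top_down(candidates, transactions, min_frac):
    """Top-down search: an infrequent candidate is replaced by its
    subsets that are one item smaller, until frequent itemsets are found."""
    min_count = min_frac * len(transactions)
    frontier, seen, frequent = set(candidates), set(), set()
    while frontier:
        next_frontier = set()
        for c in frontier - seen:
            seen.add(c)
            if support(c, transactions) >= min_count:
                frequent.add(c)
            elif len(c) > 1:
                next_frontier |= {c - {item} for item in c}
        frontier = next_frontier
    return maximal_only(frequent)


def dmm_sketch(partitions, min_frac):
    """Local phase on each partition, then one global phase.
    A distributed version would count per node and sum the counts;
    here the partitions are simply concatenated (an assumption)."""
    candidates = set()
    for part in partitions:
        candidates |= mine_local_maximal(part, min_frac)
    all_transactions = [t for part in partitions for t in part]
    return global_top_down(maximal_only(candidates), all_transactions, min_frac)


# Two hypothetical "nodes", each holding three transactions.
partitions = [
    [frozenset("abc"), frozenset("abc"), frozenset("ab")],
    [frozenset("bd"), frozenset("bd"), frozenset("ad")],
]
result = dmm_sketch(partitions, 0.5)
print(sorted("".join(sorted(s)) for s in result))
```

In this toy data the local maximal itemsets are {a, b, c} and {b, d}; neither is frequent globally at 50% support, so the top-down search descends to their subsets before settling on the global maximal frequent itemsets.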