• DocumentCode
    3141680
  • Title

    Parallel Clustering Based on Partitions of Local Minimal-Spanning-Trees

  • Author

    Shiau-Rung Tsui ; Wei-Jen Wang ; Shi-Shan Chen ; Lee Shu-Teng Chen ; Chilung Wang

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Central Univ., Jhongli, Taiwan
  • fYear
    2012
  • fDate
    17-20 Dec. 2012
  • Firstpage
    111
  • Lastpage
    118
  • Abstract
    Many traditional clustering algorithms have the scalability problem while dealing with large data sets. One common strategy to handle the problem is to parallelize the algorithms and execute them along with the input data on high-performance computers. However, many graph-based clustering algorithms are hard to be parallelized since they need to calculate the similarity of all-pairs of all data nodes. In this paper, we propose a new parallel clustering algorithm, called the Para-CPLM (Parallel Clustering based on Partitions of Local Minimal-spanning-trees), which is based on three strategies - graph-based clustering, granular computing, and partition-and-merge. The Para-CPLM partitions the data domain into several regions for parallel execution, and then establishes a local minimal spanning tree in each region. After being established, the Para-CPLM combines those local minimal spanning trees and applies a method, namely the GBC method, to determine the best number of clusters. After the first phase of clustering, it repeatedly finds better pairs (edges) of the inter-clusters to reform the merged tree structure, such that the tree becomes closer to a global minimal spanning tree. Consequently, it uses the GBC method again to find the best number of clusters. From our experimental results, the Para-CPLM achieves significantly shorter execution time and better scalability while compared with the sequential GBC method. In addition, the clustering results are almost identical to those produced by the sequential GBC method.
  • Keywords
    parallel algorithms; tree searching; all data nodes; data domain; granular computing; graph based clustering algorithm; high performance computers; local minimal spanning trees; merged tree structure; para CPLM partition; parallel clustering algorithm; partition and merge; scalability; sequential GBC method; Classification algorithms; Clustering algorithms; Computers; Image edge detection; Merging; Parallel processing; Partitioning algorithms; clustering; graph-based clustering; parallel computing;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel Architectures, Algorithms and Programming (PAAP), 2012 Fifth International Symposium on
  • Conference_Location
    Taipei
  • ISSN
    2168-3034
  • Print_ISBN
    978-1-4673-4566-8
  • Type

    conf

  • DOI
    10.1109/PAAP.2012.25
  • Filename
    6424745