• DocumentCode
    2199754
  • Title

    Dynamic Load Balancing in Parallel KD-Tree k-Means

  • Author

    Di Fatta, Giuseppe ; Pettinger, David

  • Author_Institution
    Sch. of Syst. Eng., Univ. of Reading, Reading, UK
  • fYear
    2010
  • fDate
    June 29 2010-July 1 2010
  • Firstpage
    2478
  • Lastpage
    2485
  • Abstract
    One among the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. The amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree). These techniques allow to reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested. Two approaches are based on a static partitioning of the data set and a third solution incorporates a dynamic load balancing policy.
  • Keywords
    data mining; data structures; distributed memory systems; parallel processing; pattern clustering; resource allocation; cluster analysis; data mining methods; data structure; distributed memory systems; dynamic load balancing; geometrical constraints; multidimensional binary search tree; parallel KD-tree k-means; parallel computing environments; parallel processing; processing nodes; Algorithm design and analysis; Clustering algorithms; Construction industry; Distributed databases; Load management; Partitioning algorithms; Pediatrics; Clustering; Dynamic Load Balancing; KD-Trees; Parallel k-Means;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on
  • Conference_Location
    Bradford
  • Print_ISBN
    978-1-4244-7547-6
  • Type

    conf

  • DOI
    10.1109/CIT.2010.424
  • Filename
    5578280