Title :
PGMCLU: A novel parallel grid-based clustering algorithm for multi-density datasets
Author :
Chen Xiaoyun ; Chen Yi ; Qi Xiaoli ; Yue Min ; He Yanshan
Author_Institution :
Sch. of Inf. Sci. & Eng., Lanzhou Univ., Lanzhou, China
Abstract :
Clustering is one of the basic data mining tasks, and clustering massive, high-dimensional datasets is a particularly important problem in cluster analysis. However, some existing clustering algorithms are suitable only for small and medium-sized datasets, and clustering multi-density datasets is also difficult for some clustering methods. To address these issues, this paper presents PGMCLU, a novel parallel grid-based clustering algorithm for multi-density datasets, based on the idea of data parallelism and the merging of local clusters. The algorithm uses a new measure, called grid compactness, which reflects how tightly the data points within a grid are packed. It also introduces the notion of a grid feature, which summarizes the information about a grid, and proposes new approaches to data partitioning, local clustering and merging local clusters. Extensive theoretical analysis and experimental results on both real and synthetic datasets show that the PGMCLU algorithm is effective and scalable, with approximately linear speedup.
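The abstract names the pipeline (data partition, local clustering, merging local clusters) but does not define grid compactness or the grid feature, so the following is only a minimal sketch of that general pipeline under stated assumptions, not PGMCLU itself. Assumptions: the grid feature is taken to be an additive (count, linear sum, sum of squared norms) summary per cell; `compactness` is a hypothetical measure, count / (1 + mean squared distance to the cell centroid); data are partitioned into contiguous strips of cells along the first axis; and local clusters that touch across strip borders are merged with union-find. All function names (`grid_features`, `compactness`, `local_clusters`, `pgrid_cluster`) are hypothetical. The sketch uses a single global threshold, so it does not reproduce the paper's multi-density handling.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def cell_of(point, side):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(math.floor(x / side) for x in point)


def grid_features(points, side):
    """Hypothetical grid feature: (count, linear sum, sum of squared norms) per cell."""
    feats = {}
    for p in points:
        c = cell_of(p, side)
        n, ls, ss = feats.get(c, (0, (0.0,) * len(p), 0.0))
        feats[c] = (n + 1, tuple(a + b for a, b in zip(ls, p)), ss + sum(x * x for x in p))
    return feats


def compactness(feat):
    """Hypothetical compactness: count / (1 + mean squared distance to centroid)."""
    n, ls, ss = feat
    centroid_sq = sum((x / n) ** 2 for x in ls)
    spread = max(ss / n - centroid_sq, 0.0)  # clamp tiny negative rounding error
    return n / (1.0 + spread)


def neighbors(c):
    """Face-adjacent cells (differ by 1 along exactly one axis)."""
    for d in range(len(c)):
        for step in (-1, 1):
            yield c[:d] + (c[d] + step,) + c[d + 1:]


def local_clusters(feats, threshold):
    """Label connected components of compact cells within one partition."""
    dense = {c for c, f in feats.items() if compactness(f) >= threshold}
    labels, next_label = {}, 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            for nb in neighbors(stack.pop()):
                if nb in dense and nb not in labels:
                    labels[nb] = next_label
                    stack.append(nb)
        next_label += 1
    return labels


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def pgrid_cluster(points, side, threshold, strip_cells=4, workers=4):
    # 1. Data partition: contiguous strips of cells along the first axis,
    #    so every cell (and its full feature) belongs to exactly one partition.
    parts = defaultdict(list)
    for p in points:
        parts[cell_of(p, side)[0] // strip_cells].append(p)
    keys = sorted(parts)

    # 2. Local clustering, one task per partition.
    def work(k):
        return local_clusters(grid_features(parts[k], side), threshold)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = dict(zip(keys, pool.map(work, keys)))

    # 3. Merge local clusters whose cells touch across partition borders.
    owner = {}  # cell -> (partition key, local label)
    for k, labels in results.items():
        for c, lab in labels.items():
            owner[c] = (k, lab)
    parent = {g: g for g in owner.values()}
    for c, g in owner.items():
        for nb in neighbors(c):
            h = owner.get(nb)
            if h is not None and h[0] != g[0]:
                parent[find(parent, g)] = find(parent, h)

    # Relabel merged clusters with consecutive integers.
    roots, cell_labels = {}, {}
    for c in sorted(owner):
        cell_labels[c] = roots.setdefault(find(parent, owner[c]), len(roots))
    return cell_labels
```

`pgrid_cluster` returns a mapping from each retained cell to a global cluster label; cells that fail the threshold (sparse noise) are absent. Threads keep the example self-contained, but CPython threads do not speed up this CPU-bound work; a process pool or a distributed runtime would be needed to observe the kind of speedup the paper reports.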
Keywords :
data mining; grid computing; parallel algorithms; pattern clustering; PGMCLU; PGMCLU algorithm; cluster analysis; data parallelism; data partitioning; grid compactness; local cluster merging; multidensity dataset; parallel grid-based clustering algorithm; Algorithm design and analysis; Clustering algorithms; Data engineering; Image analysis; Machine learning algorithms; Merging; Partitioning algorithms; Personal communication networks; Programming profession
Conference_Title :
2009 1st IEEE Symposium on Web Society (SWS '09)
Conference_Location :
Lanzhou
Print_ISBN :
978-1-4244-4157-0
Electronic_ISBN :
978-1-4244-4158-7
DOI :
10.1109/SWS.2009.5271791