Title :
Density-accumulated arbitrary shaped clustering for large data sets
Author_Institution :
Comput. Dept., Guangdong Univ. of Finance, Guangzhou, China
Abstract :
To improve the efficiency of finding arbitrarily shaped clusters in large data sets and to overcome the adverse effect of noise on clustering accuracy, a density-accumulated arbitrary shaped clustering algorithm is proposed. Using the idea of particle coagulation, the algorithm first generates a small-scale subset in a single scan of the large original data set, assigning a weight value to each data point in the subset. Second, noise data are removed from the weighted subset according to the weight distribution of its data points, so that clear cluster structures and shapes are obtained. Finally, arbitrarily shaped clusters are found in the weighted subset using an existing clustering algorithm, such as a hierarchical, density-based, or spectral clustering algorithm, and the cluster structure of the original data set is then represented by that of the weighted subset. The experimental results show that the proposed method achieves high clustering efficiency and accuracy and can effectively suppress noise in the data set.
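The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the coagulation rule, the noise criterion, or the final clusterer, so the sketch assumes a fixed merge radius, a minimum-weight cutoff, and a simple distance-linked grouping standing in for the hierarchical, density-based, or spectral step. The function names and the parameters `radius`, `min_weight`, and `link_dist` are hypothetical.

```python
import math

def condense(points, radius):
    """Stage 1 (assumed rule): in one scan, absorb each point into the first
    representative within `radius`, or make it a new representative."""
    reps, weights = [], []
    for p in points:
        for i, r in enumerate(reps):
            if math.dist(p, r) <= radius:
                weights[i] += 1      # accumulate density as a weight
                break
        else:
            reps.append(p)
            weights.append(1)
    return reps, weights

def drop_noise(reps, weights, min_weight):
    """Stage 2 (assumed criterion): discard low-weight representatives."""
    kept = [(r, w) for r, w in zip(reps, weights) if w >= min_weight]
    return [r for r, _ in kept], [w for _, w in kept]

def link_clusters(reps, link_dist):
    """Stage 3 (stand-in clusterer): label connected components of the graph
    that joins representatives closer than `link_dist`."""
    labels = [-1] * len(reps)
    current = 0
    for i in range(len(reps)):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            j = stack.pop()
            for k in range(len(reps)):
                if labels[k] == -1 and math.dist(reps[j], reps[k]) <= link_dist:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels

def label_points(points, reps, labels, radius):
    """Map clusters back to the original data: each point takes the label of
    its nearest surviving representative, or -1 (noise) if none is in range."""
    out = []
    for p in points:
        best = min(range(len(reps)),
                   key=lambda i: math.dist(p, reps[i]), default=None)
        if best is None or math.dist(p, reps[best]) > radius:
            out.append(-1)
        else:
            out.append(labels[best])
    return out

# Toy data: an elongated group, a compact group, and one isolated point.
points = [(0, 0), (0.1, 0), (0, 0.1), (1, 0), (1.1, 0),
          (10, 10), (10.1, 10), (10, 10.1), (50, 50)]
reps, weights = condense(points, radius=0.5)
reps, weights = drop_noise(reps, weights, min_weight=2)
labels = link_clusters(reps, link_dist=1.5)
point_labels = label_points(points, reps, labels, radius=0.5)
```

In this sketch only the first stage touches every original point during clustering, so the cost of the final clusterer depends on the number of representatives rather than on the size of the data set, which is the efficiency argument the abstract makes.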
Keywords :
learning (artificial intelligence); pattern clustering; arbitrary shaped clustering algorithm; data points; density-accumulated arbitrary shaped clustering; large data sets; noise data; particle coagulation; spectral clustering algorithm; Algorithm design and analysis; Automation; Clustering algorithms; Instrumentation and measurement; Noise; Shape; Time complexity; arbitrary shaped clustering; clustering analysis; density-accumulated; large data set;
Conference_Title :
Instrumentation and Measurement, Sensor Network and Automation (IMSNA), 2013 2nd International Symposium on
Conference_Location :
Toronto, ON
DOI :
10.1109/IMSNA.2013.6743470