DocumentCode :
3126703
Title :
A Fast and Flexible Clustering Algorithm Using Binary Discretization
Author :
Sugiyama, Mahito ; Yamamoto, Akihiro
fYear :
2011
fDate :
11-14 Dec. 2011
Firstpage :
1212
Lastpage :
1217
Abstract :
We present in this paper a new clustering algorithm for multivariate data. This algorithm, called BOOL (Binary coding Oriented clustering), can detect arbitrarily shaped clusters and is noise tolerant. BOOL handles data using a two-step procedure: data points are first discretized and represented as binary words; clusters are then iteratively constructed by agglomerating smaller clusters using this representation. This latter step is carried out with linear complexity by sorting these binary representations, which results in dramatic speedups when compared with other techniques. Experiments show that BOOL is faster than K-means, and about two to three orders of magnitude faster than two state-of-the-art algorithms that can detect non-convex clusters of arbitrary shapes. We also show that BOOL's results are robust to changes in parameters, whereas most algorithms for arbitrarily shaped clusters are known to be overly sensitive to such changes. The key to the robustness of BOOL is the hierarchical structure of clusters that is introduced automatically by increasing the accuracy of the discretization.
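The two-step procedure in the abstract (discretize points into binary codes, then agglomerate neighbouring groups by sorting those codes) can be illustrated with a minimal sketch. This is not the authors' BOOL implementation: the function names `discretize` and `cluster_by_cells`, the min-max normalization, the "cells touching along one axis" merge rule, and the `min_size` noise threshold are all hypothetical choices made for illustration. Python's comparison sort is used for simplicity; the linear complexity claimed in the abstract would presumably come from a radix-style sort of fixed-length binary words.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch of discretize-then-merge clustering; not the authors' BOOL code.

def discretize(points: Sequence[Sequence[float]], level: int) -> List[Tuple[int, ...]]:
    """Map each point to a grid cell with 2**level bins per dimension.

    Each coordinate is min-max normalized to [0, 1]; the bin index is the
    value of its level-bit binary code.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    bins = 2 ** level
    cells = []
    for p in points:
        cell = []
        for d in range(dims):
            span = highs[d] - lows[d]
            u = (p[d] - lows[d]) / span if span > 0 else 0.0
            cell.append(min(int(u * bins), bins - 1))  # the maximum falls in the last bin
        cells.append(tuple(cell))
    return cells


def _find(parent: List[int], i: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster_by_cells(points: Sequence[Sequence[float]], level: int, min_size: int = 1) -> List[int]:
    """Label points by merging occupied cells that touch along one axis.

    Clusters with fewer than min_size points are labeled -1 (noise).
    """
    cells = discretize(points, level)
    occupied = sorted(set(cells))  # sorting the codes groups identical cells together
    index = {c: i for i, c in enumerate(occupied)}
    parent = list(range(len(occupied)))
    dims = len(occupied[0])
    for d in range(dims):
        # Sort so that cells differing only in dimension d become consecutive.
        ordered = sorted(occupied, key=lambda c, d=d: c[:d] + c[d + 1:] + (c[d],))
        for a, b in zip(ordered, ordered[1:]):
            same_rest = a[:d] + a[d + 1:] == b[:d] + b[d + 1:]
            if same_rest and b[d] - a[d] == 1:
                ra, rb = _find(parent, index[a]), _find(parent, index[b])
                if ra != rb:
                    parent[rb] = ra
    roots = [_find(parent, index[c]) for c in cells]
    sizes: Dict[int, int] = {}
    for r in roots:
        sizes[r] = sizes.get(r, 0) + 1
    labels: List[int] = []
    relabel: Dict[int, int] = {}
    for r in roots:
        labels.append(-1 if sizes[r] < min_size else relabel.setdefault(r, len(relabel)))
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (9.5, 10), (10, 9.5), (0, 10)]
    for level in (1, 2):
        print(level, cluster_by_cells(pts, level, min_size=2))
    # 1 [0, 0, 0, 0, 0, 0, 0]
    # 2 [0, 0, 0, 1, 1, 1, -1]
```

Running the sketch at increasing `level` values gives a coarse-to-fine sequence of clusterings: at level 1 every cell touches its neighbour and everything merges into one cluster, while at level 2 the two groups separate and the isolated point becomes noise. This mirrors, in a simplified way, the hierarchy the abstract attributes to increasing the accuracy of the discretization.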
Keywords :
computational complexity; data handling; data mining; learning (artificial intelligence); pattern clustering; BOOL; binary coding oriented clustering; binary discretization; binary representations; binary words; clustering algorithm; data mining; data points; knowledge discovery; linear complexity; machine learning; multivariate data; nonconvex cluster detection; Conferences; Data mining; Indexes; Noise; Binary encoding; Discretization; Hierarchical clustering; Shape-based clustering; Sorting;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1550-4786
Print_ISBN :
978-1-4577-2075-8
Type :
conf
DOI :
10.1109/ICDM.2011.9
Filename :
6137340