DocumentCode :
2222876
Title :
A deterministic method for initializing K-means clustering
Author :
Su, Ting ; Dy, Jennifer
Author_Institution :
Northeastern Univ., Boston, MA, USA
fYear :
2004
fDate :
15-17 Nov. 2004
Firstpage :
784
Lastpage :
786
Abstract :
The performance of K-means clustering depends on the initial guess of the partition. We motivate, theoretically and experimentally, the use of a deterministic divisive hierarchical method for initialization, which we refer to as PCA-Part (principal components analysis partitioning). The criterion that K-means clustering minimizes is the SSE (sum-squared-error) criterion. The first principal direction (the eigenvector corresponding to the largest eigenvalue of the covariance matrix) is the direction that contributes most to the SSE. Hence, the first principal direction is a good candidate direction onto which to project a cluster for splitting. This is the basis of the PCA-Part initialization method. Our experiments reveal that PCA-Part generally leads K-means to generate clusters with SSE values close to the minimum SSE values obtained by one hundred random-start runs. In addition, this deterministic initialization method often leads K-means to faster convergence (fewer iterations) than random methods. We also show theoretically, and confirm experimentally on synthetic data, when PCA-Part may fail.
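The splitting idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, and two details the abstract does not state are assumed here, namely that the cluster with the largest SSE is the one split next, and that the split is made at the cluster mean along the first principal direction.

```python
import numpy as np

def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    return float(((points - points.mean(axis=0)) ** 2).sum())

def first_principal_direction(points):
    """Eigenvector of the covariance matrix with the largest eigenvalue."""
    cov = np.atleast_2d(np.cov(points, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -1]  # eigh returns eigenvalues in ascending order

def pca_part_init(data, k):
    """Return k initial centers by repeated principal-direction splits.

    Assumptions (not stated in the abstract): the cluster with the
    largest SSE is split next, and the split point is the cluster mean.
    """
    clusters = [np.asarray(data, dtype=float)]
    while len(clusters) < k:
        # Choose the cluster that currently contributes the most SSE.
        idx = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        target = clusters.pop(idx)
        if len(target) < 2:
            raise ValueError("not enough distinct points to form k clusters")
        # Project the centered points onto the first principal direction.
        centered = target - target.mean(axis=0)
        proj = centered @ first_principal_direction(target)
        left, right = target[proj <= 0], target[proj > 0]
        if len(left) == 0 or len(right) == 0:
            raise ValueError("cluster cannot be split (identical points)")
        clusters.extend([left, right])
    return np.array([c.mean(axis=0) for c in clusters])
```

The returned centers could then be handed to any K-means implementation that accepts explicit starting centers, for example scikit-learn's `KMeans(n_clusters=k, init=centers, n_init=1)`. Because no random numbers are involved, repeated calls on the same data give the same centers.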
Keywords :
pattern classification; pattern clustering; principal component analysis; unsupervised learning; K-means clustering; deterministic divisive hierarchical method; deterministic initialization; principal components analysis partitioning; sum-squared-error criterion; Artificial intelligence; Clustering algorithms; Convergence; Covariance matrix; Eigenvalues and eigenfunctions; Greedy algorithms; Partitioning algorithms; Pattern analysis; Principal component analysis; Sorting;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Tools with Artificial Intelligence, 2004. ICTAI 2004. 16th IEEE International Conference on
ISSN :
1082-3409
Print_ISBN :
0-7695-2236-X
Type :
conf
DOI :
10.1109/ICTAI.2004.7
Filename :
1374274