DocumentCode :
2771789
Title :
On K-Means Cluster Preservation Using Quantization Schemes
Author :
Turaga, Deepak S. ; Vlachos, Michail ; Verscheure, Olivier
Author_Institution :
IBM T.J. Watson Res. Center, Hawthorne, NY, USA
fYear :
2009
fDate :
6-9 Dec. 2009
Firstpage :
533
Lastpage :
542
Abstract :
This work examines under what conditions compression methodologies can retain the outcome of clustering operations. We focus on the popular k-means clustering algorithm and we demonstrate how a properly constructed compression scheme based on post-clustering quantization is capable of maintaining the global cluster structure. Our analytical derivations indicate that a 1-bit moment preserving quantizer per cluster is sufficient to retain the original data clusters. Merits of the proposed compression technique include: a) reduced storage requirements with clustering guarantees, b) data privacy on the original values, and c) shape preservation for data visualization purposes. We evaluate the quantization scheme on various high-dimensional datasets, including 1-dimensional and 2-dimensional time-series (shape datasets), and demonstrate the cluster preservation property. We also compare with previously proposed simplification techniques in the time-series area and show significant improvements in both the clustering and the shape preservation of the compressed datasets.
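The abstract does not spell out the quantizer itself, so the following is only a minimal sketch of a generic 1-bit moment preserving quantizer (the classic two-level scheme that keeps the mean and variance of a set of values), not the authors' implementation. The function names `moment_preserving_1bit` and `reconstruct` are hypothetical, and applying the quantizer to the values belonging to one cluster at a time is an assumption based on the phrase "per cluster".

```python
import math


def moment_preserving_1bit(values):
    """Quantize numbers to two levels while keeping their mean and variance.

    Returns (bits, low, high): one bit per value plus the two output levels.
    Generic textbook scheme; how the paper groups values is an assumption.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # population variance
    bits = [1 if v >= mean else 0 for v in values]
    q = sum(bits)  # number of values at or above the mean
    if q == 0 or q == n:
        # Degenerate case (all values effectively equal): one level suffices.
        return bits, mean, mean
    std = math.sqrt(var)
    low = mean - std * math.sqrt(q / (n - q))
    high = mean + std * math.sqrt((n - q) / q)
    return bits, low, high


def reconstruct(bits, low, high):
    """Map each stored bit back to its output level."""
    return [high if b else low for b in bits]


# Hypothetical values standing in for the members of one cluster.
cluster_values = [1.0, 2.0, 3.0, 10.0]
bits, low, high = moment_preserving_1bit(cluster_values)
approx = reconstruct(bits, low, high)
```

Under these assumptions, each value is stored as a single bit plus two shared levels, and the reconstructed values have the same mean and variance as the originals.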
Keywords :
data compression; data mining; pattern clustering; time series; compression methodology; global cluster structure; k-means cluster preservation; post-clustering quantization; time-series; Artificial intelligence; Clustering algorithms; Costs; Data analysis; Data mining; Laboratories; Partitioning algorithms; Quantization; Shape; USA Councils; clustering preservation; moment preserving quantization; privacy preservation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
Conference_Location :
Miami, FL
ISSN :
1550-4786
Print_ISBN :
978-1-4244-5242-2
Electronic_ISBN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2009.12
Filename :
5360279