DocumentCode :
399777
Title :
Model-based clustering with soft balancing
Author :
Zhong, Shi ; Ghosh, Joydeep
Author_Institution :
Dept. of Comput. Sci. & Eng., Florida Atlantic Univ., Boca Raton, FL, USA
fYear :
2003
fDate :
19-22 Nov. 2003
Firstpage :
459
Lastpage :
466
Abstract :
Balanced clustering algorithms can be useful in a variety of applications and have recently attracted increasing research interest. Most recent work, however, addressed only hard balancing by constraining each cluster to have equal or a certain minimum number of data objects. We provide a soft balancing strategy built upon a soft mixture-of-models clustering framework. This strategy constrains the sum of posterior probabilities of object membership for each cluster to be equal and thus balances the expected number of data objects in each cluster. We first derive soft model-based clustering from an information-theoretic viewpoint and then show that the proposed balanced clustering can be parameterized by a temperature parameter that controls the softness of clustering as well as that of balancing. As the temperature decreases, the resulting partitioning becomes more and more balanced. In the limit, when temperature becomes zero, the balancing becomes hard and the actual partitioning becomes perfectly balanced. The effectiveness of the proposed soft balanced clustering algorithm is demonstrated on both synthetic and real text data.
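The balancing constraint described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes the per-object, per-cluster log-likelihoods are already available, and it swaps in a generic iterative proportional fitting (Sinkhorn-style) loop to make each cluster's posterior sum approach n/k. The function name `balanced_posteriors` and its parameters `loglik`, `temperature`, and `iters` are hypothetical.

```python
import math

def balanced_posteriors(loglik, temperature=1.0, iters=200):
    """Soft assignments whose column sums are pushed toward n/k.

    loglik[i][j] is an assumed log-likelihood of object i under cluster j.
    This is an illustrative iterative-proportional-fitting sketch,
    not the procedure from the paper.
    """
    n, k = len(loglik), len(loglik[0])
    # Temperature-scaled scores; subtracting the row maximum avoids overflow.
    base = []
    for row in loglik:
        m = max(row)
        base.append([math.exp((v - m) / temperature) for v in row])
    weights = [1.0] * k      # per-cluster balancing weights (hypothetical)
    target = n / k           # desired expected number of objects per cluster
    post = []
    for _ in range(iters):
        # Row step: each object's posteriors sum to 1.
        post = []
        for row in base:
            scaled = [w * b for w, b in zip(weights, row)]
            total = sum(scaled)
            post.append([v / total for v in scaled])
        # Column step: rescale cluster weights toward equal posterior sums.
        col = [sum(p[j] for p in post) for j in range(k)]
        weights = [w * target / c for w, c in zip(weights, col)]
    return post

# Four objects favour cluster 0 and two favour cluster 1.
loglik = [[-0.1, -3.0], [-0.2, -2.5], [-0.3, -2.0],
          [-0.5, -1.5], [-3.0, -0.1], [-2.5, -0.2]]
post = balanced_posteriors(loglik, temperature=1.0)
col_sums = [sum(p[j] for p in post) for j in range(2)]
```

In this sketch, each row of `post` remains a probability distribution while `col_sums` approaches n/k per cluster, which mirrors the "expected number of data objects" constraint. Lowering `temperature` would sharpen the rows toward hard assignments, at the cost of slower convergence of this particular loop.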
Keywords :
computational complexity; data mining; maximum likelihood estimation; pattern clustering; probability; cluster constraints; data objects; graph partitioning; hard balancing; information-theoretic viewpoint; object membership; posterior probabilities; soft balanced clustering algorithms; soft model-based clustering; temperature parameter; Application software; Clustering algorithms; Clustering methods; Computer science; Data mining; Databases; Indexing; Maximum likelihood estimation; Partitioning algorithms; Temperature control;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
Type :
conf
DOI :
10.1109/ICDM.2003.1250953
Filename :
1250953