DocumentCode :
2370295
Title :
Information theoretic clustering of sparse cooccurrence data
Author :
Dhillon, Inderjit S. ; Guan, Yuqiang
Author_Institution :
Dept. of Comput. Sci., Texas Univ., Austin, TX, USA
fYear :
2003
fDate :
19-22 Nov. 2003
Firstpage :
517
Lastpage :
520
Abstract :
A novel approach to clustering cooccurrence data poses it as an optimization problem in information theory that minimizes the resulting loss in mutual information. A divisive clustering algorithm that monotonically reduces this loss function was recently proposed. We show that sparse high-dimensional data presents special challenges that can result in the algorithm getting stuck at poor local minima. We propose two solutions to this problem: (a) a "prior" to overcome infinite relative entropy values, as in the supervised Naive Bayes algorithm, and (b) local search to escape local minima. Finally, we combine these solutions to get a robust algorithm that is computationally efficient. We present experimental results to show that the proposed method is effective in clustering document collections and outperforms previous information-theoretic clustering approaches.
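To illustrate the infinite relative entropy problem mentioned in the abstract, here is a minimal sketch, not the authors' implementation. It assumes a document and a cluster are each represented by a word distribution, and that the "prior" amounts to mixing the cluster distribution with a uniform one. The abstract does not state the form of the prior, so the function `smooth`, the weight `alpha`, and the toy distributions are all hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in bits; returns math.inf if q is zero where p is not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken to be 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log2(pi / qi)
    return total

def smooth(q, alpha):
    """Mix q with a uniform distribution (assumed form of the prior).

    alpha is a hypothetical prior weight; the result still sums to 1.
    """
    n = len(q)
    return [(qi + alpha / n) / (1.0 + alpha) for qi in q]

# Toy word distributions over a 4-word vocabulary (illustrative values only).
doc = [0.5, 0.5, 0.0, 0.0]
cluster = [0.0, 0.4, 0.3, 0.3]  # sparse: the cluster never saw word 0

raw = kl_divergence(doc, cluster)                           # infinite
smoothed = kl_divergence(doc, smooth(cluster, alpha=0.1))   # finite
print(raw, smoothed)
```

With sparse data, many document-to-cluster comparisons of this kind are infinite, so a document can never be moved to such a cluster. Smoothing makes every comparison finite, so those moves become possible.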
Keywords :
Bayes methods; information theory; learning (artificial intelligence); optimisation; pattern clustering; divisive clustering algorithm; document clustering; information theory; local minima; sparse high-dimensional cooccurrence data; supervised Naive Bayes algorithm; Character generation; Clustering algorithms; Entropy; Information theory; Loss measurement; Mutual information; Probability distribution; Random variables; Robustness; Unsupervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
Type :
conf
DOI :
10.1109/ICDM.2003.1250966
Filename :
1250966