DocumentCode :
2369763
Title :
Combining multiple weak clusterings
Author :
Topchy, Alexander ; Jain, Anil K. ; Punch, William
Author_Institution :
Dept. of Comput. Sci., Michigan State Univ., East Lansing, MI, USA
fYear :
2003
fDate :
19-22 Nov. 2003
Firstpage :
331
Lastpage :
338
Abstract :
A data set can be clustered in many ways depending on the clustering algorithm employed, parameter settings used and other factors. Can multiple clusterings be combined so that the final partitioning of data provides better clustering? The answer depends on the quality of clusterings to be combined as well as the properties of the fusion method. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. As a result, we show that the consensus function is related to the classical intra-class variance criterion using the generalized mutual information definition. Second, we show the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. We analyze the combination accuracy as a function of parameters controlling the power and resolution of component partitions as well as the learning dynamics vs. the number of clusterings involved. Finally, some empirical studies compare the effectiveness of several consensus functions.
Keywords :
data mining; learning (artificial intelligence); pattern clustering; statistical analysis; categorical clustering problem; component partition; consensus function; data projection; data set; fusion method property; intra-class variance criterion; learning dynamics; multiple weak clustering algorithm; mutual information definition; parameter setting; random data split; Classification algorithms; Clustering algorithms; Computer science; Data mining; Fusion power generation; Mutual information; Partitioning algorithms; Robustness; Taxonomy; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
Print_ISBN :
0-7695-1978-4
Type :
conf
DOI :
10.1109/ICDM.2003.1250937
Filename :
1250937