DocumentCode :
3104754
Title :
Meta Clustering
Author :
Caruana, Rich ; Elhawary, Mohamed ; Nguyen, Nam ; Smith, Casey
Author_Institution :
Cornell Univ., Ithaca, NY
fYear :
2006
fDate :
18-22 Dec. 2006
Firstpage :
107
Lastpage :
118
Abstract :
Clustering is ill-defined. Unlike supervised learning where labels lead to crisp performance criteria such as accuracy and squared error, clustering quality depends on how the clusters will be used. Devising clustering criteria that capture what users need is difficult. Most clustering algorithms search for optimal clusterings based on a pre-specified clustering criterion. Our approach differs. We search for many alternate clusterings of the data, and then allow users to select the clustering(s) that best fit their needs. Meta clustering first finds a variety of clusterings and then clusters this diverse set of clusterings so that users must only examine a small number of qualitatively different clusterings. We present methods for automatically generating a diverse set of alternate clusterings, as well as methods for grouping clusterings into meta clusters. We evaluate meta clustering on four test problems and two case studies. Surprisingly, clusterings that would be of most interest to users often are not very compact clusterings.
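The two-stage idea in the abstract (generate many alternate clusterings, then cluster the clusterings) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not name the generation method, the distance between clusterings, or the grouping algorithm. The sketch assumes k-means under random feature weights for diversity, one minus the Rand index as the distance between two clusterings, and average-link agglomerative grouping. All function names (`kmeans`, `rand_dissimilarity`, `meta_cluster`, `demo`) and the toy data are hypothetical.

```python
import random
from itertools import combinations


def kmeans(points, k, weights, rng, iters=20):
    """Plain Lloyd's k-means on feature-weighted points; returns one label per point."""
    scaled = [[w * x for w, x in zip(weights, p)] for p in points]
    centers = rng.sample(scaled, k)
    labels = [0] * len(scaled)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in scaled
        ]
        # Move each non-empty center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(scaled, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_dissimilarity(a, b):
    """1 - Rand index: fraction of point pairs on which two clusterings disagree."""
    pairs = list(combinations(range(len(a)), 2))
    disagree = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return disagree / len(pairs)


def meta_cluster(clusterings, n_meta):
    """Average-link agglomerative grouping of clusterings into n_meta meta clusters."""
    idx = range(len(clusterings))
    dist = {
        (i, j): rand_dissimilarity(clusterings[i], clusterings[j])
        for i, j in combinations(idx, 2)
    }

    def link(g, h):
        return sum(dist[min(i, j), max(i, j)] for i in g for j in h) / (len(g) * len(h))

    groups = [[i] for i in idx]
    while len(groups) > n_meta:
        # Merge the two groups with the smallest average pairwise distance.
        g, h = min(
            combinations(range(len(groups)), 2),
            key=lambda gh: link(groups[gh[0]], groups[gh[1]]),
        )
        merged = groups.pop(h)  # h > g, so index g stays valid
        groups[g].extend(merged)
    return groups


def demo(seed=0, n_clusterings=12):
    """Toy run: four blobs on a square, which admit more than one sensible 2-way split."""
    rng = random.Random(seed)
    corners = [(0, 0), (0, 10), (10, 0), (10, 10)]
    points = [[cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)] for cx, cy in corners for _ in range(5)]
    clusterings = []
    for _ in range(n_clusterings):
        # Assumption: skewed random feature weights as the source of diversity.
        weights = [rng.random() ** 3 for _ in range(2)]
        clusterings.append(kmeans(points, 2, weights, rng))
    return clusterings, meta_cluster(clusterings, n_meta=3)
```

Calling `demo()` returns the base clusterings and a list of meta clusters, each a list of indices into the base clusterings. A user would then inspect one representative per meta cluster instead of every clustering.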
Keywords :
pattern clustering; clustering criteria; clustering quality; meta clustering; optimal clustering; Cardiac disease; Clustering algorithms; Databases; History; Partitioning algorithms; Predictive models; Space exploration; Supervised learning; Testing; Unsupervised learning
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2006. ICDM '06. Sixth International Conference on
Conference_Location :
Hong Kong
ISSN :
1550-4786
Print_ISBN :
0-7695-2701-7
Type :
conf
DOI :
10.1109/ICDM.2006.103
Filename :
4053039