DocumentCode :
1537381
Title :
Chameleon: hierarchical clustering using dynamic modeling
Author :
Karypis, George ; Han, Eui-Hong ; Kumar, Vipin
Author_Institution :
Dept. of Comput. Sci., Minnesota Univ., Minneapolis, MN, USA
Volume :
32
Issue :
8
fYear :
1999
fDate :
8/1/1999 12:00:00 AM
Firstpage :
68
Lastpage :
75
Abstract :
Clustering is a discovery process in data mining. It groups a set of data items in a way that maximizes the similarity within clusters and minimizes the similarity between different clusters. Many advanced algorithms have difficulty dealing with highly variable clusters that do not follow a preconceived model. By basing its selections on both interconnectivity and closeness, the Chameleon algorithm yields accurate results for these highly variable clusters. Existing algorithms use a static model of the clusters and do not use information about the nature of individual clusters as they are merged. Furthermore, one set of schemes (the CURE algorithm and related schemes) ignores the information about the aggregate interconnectivity of items in two clusters. Another set of schemes (the Rock algorithm, group averaging method, and related schemes) ignores information about the closeness of two clusters as defined by the similarity of the closest items across two clusters. By considering either interconnectivity or closeness only, these algorithms can select and merge the wrong pair of clusters. Chameleon's key feature is that it accounts for both interconnectivity and closeness in identifying the most similar pair of clusters. Chameleon finds the clusters in the data set by using a two-phase algorithm. During the first phase, Chameleon uses a graph partitioning algorithm to cluster the data items into several relatively small subclusters. During the second phase, it uses an algorithm to find the genuine clusters by repeatedly combining these subclusters.
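The two-phase procedure described in the abstract can be illustrated with a toy Python sketch. It is not the authors' implementation, and several details are assumptions taken loosely from the full paper rather than from this abstract: the data items become a k-nearest-neighbour graph with similarity weights 1 / (1 + distance); phase 1 repeatedly bisects the largest subcluster until every subcluster has fewer than `min_size` vertices; phase 2 merges the adjacent pair that maximises relative interconnectivity (RI) times relative closeness (RC) raised to a power `alpha`, where both quantities are measured against each subcluster's internal min-cut bisection. The brute-force `min_bisection` is a swapped-in stand-in for the real graph partitioner (the paper uses hMETIS) and is exponential, so it only suits tiny inputs. The names `knn_graph`, `phase1`, `phase2`, `merge_score`, `min_size`, and the default `alpha=2.0` are hypothetical choices for illustration.

```python
import itertools
import math

EPS = 1e-9  # guards against division by zero when a subcluster has no internal cut edges


def knn_graph(points, k):
    """Sparse k-nearest-neighbour graph; edge (i, j) with i < j has weight 1 / (1 + distance)."""
    w = {}
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in nearest[:k]:
            w[(min(i, j), max(i, j))] = 1.0 / (1.0 + d)
    return w


def cross_edges(w, a, b):
    """Weights of the edges joining vertex set a to vertex set b."""
    return [wt for (i, j), wt in w.items()
            if (i in a and j in b) or (i in b and j in a)]


def min_bisection(w, cluster):
    """Balanced bisection with minimum cut weight, found by brute force.

    Hypothetical stand-in for a real partitioner such as hMETIS; tiny inputs only.
    Returns (half_a, half_b, cut_edge_weights).
    """
    best = None
    for half in itertools.combinations(sorted(cluster), len(cluster) // 2):
        a = set(half)
        b = cluster - a
        cut = cross_edges(w, a, b)
        if best is None or sum(cut) < sum(best[2]):
            best = (a, b, cut)
    return best


def phase1(w, n, min_size):
    """Phase 1: bisect the largest subcluster until all have fewer than min_size vertices."""
    assert min_size >= 2, "min_size below 2 would never terminate"
    parts = [set(range(n))]
    while max(len(p) for p in parts) >= min_size:
        parts.sort(key=len)
        a, b, _ = min_bisection(w, parts.pop())
        parts += [a, b]
    return parts


def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0


def merge_score(w, ci, cj, alpha):
    """Relative interconnectivity times relative closeness ** alpha (0 if not adjacent)."""
    cross = cross_edges(w, ci, cj)
    if not cross:
        return 0.0
    ec_i = min_bisection(w, ci)[2]  # internal edge cut of each subcluster
    ec_j = min_bisection(w, cj)[2]
    ri = sum(cross) / max((sum(ec_i) + sum(ec_j)) / 2, EPS)
    ni, nj = len(ci), len(cj)
    internal = (ni * mean(ec_i) + nj * mean(ec_j)) / (ni + nj)  # size-weighted mean cut weight
    rc = mean(cross) / max(internal, EPS)
    return ri * rc ** alpha


def phase2(w, parts, target, alpha=2.0):
    """Phase 2: repeatedly merge the highest-scoring adjacent pair of subclusters."""
    clusters = [set(p) for p in parts]
    while len(clusters) > target:
        best = None
        for i, j in itertools.combinations(range(len(clusters)), 2):
            s = merge_score(w, clusters[i], clusters[j], alpha)
            if s > 0 and (best is None or s > best[0]):
                best = (s, i, j)
        if best is None:
            break  # no remaining pair is connected in the k-NN graph
        _, i, j = best
        merged = clusters.pop(j)  # i < j, so index i is still valid after the pop
        clusters[i] |= merged
    return clusters


if __name__ == "__main__":
    # Two separated 3x2 grids: indices 0-5 form one group, 6-11 the other.
    points = [(x + dx, y) for dx in (0, 10) for x in range(3) for y in range(2)]
    w = knn_graph(points, k=3)
    clusters = phase2(w, phase1(w, len(points), min_size=4), target=2)
    print(sorted(sorted(c) for c in clusters))  # [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
```

In this toy run, phase 1 first separates the two grids (a cut of weight zero) and then splits each grid into two three-point subclusters; phase 2 only merges pairs that share k-NN edges, so it recovers the two grids. Multiplying RI by a power of RC is what lets the merge decision depend on both interconnectivity and closeness rather than on either one alone.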
Keywords :
data analysis; data mining; graph theory; pattern clustering; CURE algorithm; Chameleon algorithm; Rock algorithm; advanced algorithms; aggregate interconnectivity; closeness; closest items; data item clustering; data mining; data set; discovery process; dynamic modeling; graph partitioning algorithm; hierarchical clustering; highly variable clusters; most similar pair; subclusters; two-phase algorithm; Aggregates; Clustering algorithms; Data analysis; Data mining; Earthquakes; Extraterrestrial measurements; Proteins; Seismology; Shape;
fLanguage :
English
Journal_Title :
Computer
Publisher :
IEEE
ISSN :
0018-9162
Type :
jour
DOI :
10.1109/2.781637
Filename :
781637