DocumentCode :
3090191
Title :
A min-max distance based external cluster validity index: MMI
Author :
Alok, Abhay Kumar ; Saha, Simanto ; Ekbal, Asif
Author_Institution :
Comput. Sci. Eng., Indian Inst. of Technol., Patna, Patna, India
fYear :
2012
fDate :
4-7 Dec. 2012
Firstpage :
354
Lastpage :
359
Abstract :
Evaluating a given clustering result is a very difficult problem in the real world. Cluster validity indices are developed for this purpose. Two types of cluster validity indices are available: external and internal. External cluster validity indices utilize some supervised information, whereas internal cluster validity indices utilize the intrinsic structure of the data. In this paper, a new external cluster validity index, MMI, is implemented based on the max-min distance among data points and prior information about the structure of the data. A new probabilistic approach is implemented to find the correct correspondence between the true and the obtained clustering. The genetic K-means algorithm (GAK-means) and single linkage are used as the underlying clustering techniques, with the number of clusters varied over a range, and the MMI index is then used to determine the appropriate number of clusters. Results of the proposed index for identifying the appropriate number of clusters are shown for five artificial and two real-life data sets. The performance of MMI is compared with that of two existing external cluster validity indices, the adjusted Rand index (ARI) and the Rand index (RI). MMI works well for both two-class and multi-class data sets.
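The abstract does not give the MMI formula, so it is not reproduced here. As a rough illustration of how an external index can be used to pick the number of clusters, the sketch below computes the two baseline indices named in the abstract, RI and ARI, by pair counting. The function names and the toy label vectors are assumptions made for illustration only; they are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(true_labels, pred_labels):
    """Fraction of point pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def adjusted_rand_index(true_labels, pred_labels):
    """Rand index corrected for chance, computed from the contingency table."""
    n = len(true_labels)
    cells = Counter(zip(true_labels, pred_labels))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions are a single cluster).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy data (hypothetical): true classes and candidate partitions for K = 2, 3.
true_labels = [0, 0, 0, 1, 1, 1]
candidates = {
    2: [1, 1, 1, 0, 0, 0],  # same grouping as the truth, labels permuted
    3: [0, 0, 1, 1, 2, 2],  # over-segmented partition
}

# Score each candidate K and keep the one with the highest ARI.
scores = {k: adjusted_rand_index(true_labels, p) for k, p in candidates.items()}
best_k = max(scores, key=scores.get)
```

In this toy run the K = 2 partition matches the true classes up to a relabelling, so it receives the highest score and is selected. In the setting the abstract describes, the candidate partitions would come from GAK-means or single linkage, and MMI would take the place of ARI as the scoring function.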
Keywords :
data structures; genetic algorithms; minimax techniques; pattern clustering; ARI; GAK-means; MMI index; adjusted rand index; cluster validity indices; data points; data structure; genetic K-means algorithm; min-max distance based external cluster validity index; probabilistic approach; real-life data sets; single linkage clustering techniques; supervised information; underlying clustering techniques; underlying partitioning techniques; Clustering algorithms; Couplings; Distributed databases; Euclidean distance; Genetics; Indexes; Partitioning algorithms; Cluster validity; External cluster validity index; Genetic K-means Clustering algorithm; single linkage clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Hybrid Intelligent Systems (HIS), 2012 12th International Conference on
Conference_Location :
Pune
Print_ISBN :
978-1-4673-5114-0
Type :
conf
DOI :
10.1109/HIS.2012.6421360
Filename :
6421360