DocumentCode :
579767
Title :
A Comparison of External Clustering Evaluation Indices in the Context of Imbalanced Data Sets
Author :
De Souto, Marcilio C P ; Coelho, André L V ; Faceli, Katti ; Sakata, Tiemi C. ; Bonadia, Viviane ; Costa, Ivan G.
Author_Institution :
Centro de Inf., Univ. Fed. de Pernambuco, Recife, Brazil
fYear :
2012
fDate :
20-25 Oct. 2012
Firstpage :
49
Lastpage :
54
Abstract :
In highly imbalanced data sets, almost all instances are labeled as one class, whereas far fewer are labeled as the other classes. In this paper, we present an empirical comparison of seven clustering evaluation indices when used to assess partitions generated from highly imbalanced data sets. Some of the indices are based on set matching (F-measure), information theory (normalized mutual information and adjusted mutual information), and counting pairs of objects (Rand and adjusted Rand indices). We also investigate the BCubed metric, which draws on the concepts of precision and recall as well as pair counting. Furthermore, to avoid the class size imbalance effect, we propose a modification of the Rand index, referred to as the normalized class size Rand (NCR) index. Apart from NCR, our experiments indicate that none of the other analyzed indices deals properly with the problem of class size imbalance.
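The class size imbalance effect mentioned in the abstract can be illustrated with the plain Rand index, which is the fraction of object pairs on which two labelings agree. The following is only a minimal sketch under stated assumptions: the toy label vectors and the helper name `rand_index` are hypothetical and do not come from the paper, and the NCR index is not shown because the abstract does not define it.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Plain Rand index: fraction of object pairs treated consistently.

    A pair counts as an agreement when both labelings put the two objects
    together, or both put them apart.
    """
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_class = labels_true[i] == labels_true[j]
        same_cluster = labels_pred[i] == labels_pred[j]
        agree += int(same_class == same_cluster)
        total += 1
    return agree / total


# Hypothetical toy data (not from the paper): 100 objects, two classes.
imbalanced_true = [0] * 95 + [1] * 5
balanced_true = [0] * 50 + [1] * 50

# A trivial partition that puts every object into a single cluster.
single_cluster = [0] * 100

ri_imbalanced = rand_index(imbalanced_true, single_cluster)  # 4475 / 4950, about 0.904
ri_balanced = rand_index(balanced_true, single_cluster)      # 2450 / 4950, about 0.495

print(f"Rand index, imbalanced classes: {ri_imbalanced:.3f}")
print(f"Rand index, balanced classes:   {ri_balanced:.3f}")
```

In this toy setting, the same uninformative partition scores about 0.904 on the imbalanced labels and about 0.495 on the balanced ones, because most pairs fall inside the dominant class and are counted as agreements.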
Keywords :
data handling; information theory; pattern clustering; set theory; unsupervised learning; BCubed metric; F-measure; NCR; Rand indices; adjusted Rand indices; adjusted mutual information; external clustering evaluation indices; imbalanced data sets; information theory; normalized class size Rand index; normalized mutual information; objects counting; partition assessment; set matching; unsupervised learning; Clustering algorithms; Context; Electronic mail; Frequency modulation; Indexes; Mutual information; Partitioning algorithms; Clustering Algorithms; External Evaluation Indices; Imbalanced Data Sets;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Neural Networks (SBRN), 2012 Brazilian Symposium on
Conference_Location :
Curitiba
ISSN :
1522-4899
Print_ISBN :
978-1-4673-2641-4
Type :
conf
DOI :
10.1109/SBRN.2012.25
Filename :
6374823