Title :
The unbalancing effect of hubs on K-medoids clustering in high-dimensional spaces
Author :
Dominik Schnitzer;Arthur Flexer
Author_Institution :
Austrian Research Institute for Artificial Intelligence, Freyung 6/6, Vienna, Austria
fDate :
7/1/2015 12:00:00 AM
Abstract :
Unbalanced cluster solutions are characterized by very different cluster sizes, with some clusters being very large while others contain almost no data. We demonstrate that this phenomenon is connected to 'hubness', a recently discovered general problem of machine learning in high-dimensional data spaces. Hub objects have a small distance to an exceptionally large number of data points, while anti-hubs are far from all other data points. In an empirical study of K-medoids clustering, we show that hubness gives rise to very unbalanced cluster sizes, resulting in impaired internal and external evaluation indices. We compare three methods that reduce hubness in the distance spaces and show that the evaluation indices improve as the clusters become more balanced. This is done using artificial and real data sets from diverse domains.
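The notion of hubs and anti-hubs described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and measures hubness as the skewness of the k-occurrence distribution (how often each point appears among the k nearest neighbours of other points), which is a common choice in the hubness literature; the abstract itself does not state which measure the authors use. The function names `k_occurrence`, `skewness`, and `hubness` are hypothetical and chosen for illustration only.

```python
import math
import random


def k_occurrence(points, k):
    """Count how often each point appears among the k nearest
    neighbours of the other points (assumption: Euclidean distance)."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Distances from point i to every other point; ties are broken by index.
        dists = [(math.dist(points[i], points[j]), j) for j in range(n) if j != i]
        dists.sort()
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment; returns 0.0 for a constant sequence."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def hubness(points, k=5):
    """Skewness of the k-occurrence counts. Hubs have large counts,
    anti-hubs have counts near zero, so strong hubness shows up as
    a strongly right-skewed distribution."""
    return skewness(k_occurrence(points, k))


if __name__ == "__main__":
    # Illustrative synthetic data only: i.i.d. Gaussian points in a low-
    # and a high-dimensional space (not the data sets used in the paper).
    rng = random.Random(0)
    for dim in (2, 50):
        data = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(300)]
        print(dim, round(hubness(data, k=5), 2))
```

On data of this kind one would typically expect the skewness to be larger in the higher-dimensional case, although the exact values depend on the sample, the dimensionality, and the choice of k.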
Keywords :
"Tin","Biology"
Conference_Titel :
Neural Networks (IJCNN), 2015 International Joint Conference on
Electronic_ISBN :
2161-4407
DOI :
10.1109/IJCNN.2015.7280303