DocumentCode
659580
Title
Distributed Pivot Clustering with arbitrary distance functions
Author
Branting, L. Karl
fYear
2013
fDate
6-9 Oct. 2013
Firstpage
21
Lastpage
27
Abstract
This paper describes an algorithm, Distributed Pivot Clustering (DPC), that differs from prior distributed clustering algorithms in that it requires neither an inexpensive approximation of the actual distance function nor that pairs of elements in the same cluster share at least one exact feature value. Instead, DPC requires only that the distance function satisfy the triangle inequality and be of sufficiently high granularity to permit the data to be partitioned into canopies of optimal size based on distance to reference elements, or pivots. An empirical evaluation demonstrated that DPC can lead to accurate distributed hierarchical agglomerative clustering provided that the triangle inequality and granularity requirements are met.
Keywords
distributed algorithms; pattern clustering; DPC algorithm; arbitrary distance functions; distributed hierarchical agglomerative clustering; distributed pivot clustering algorithms; empirical evaluation; feature value; granularity requirements; triangle inequality; Accuracy; Approximation algorithms; Approximation methods; Clustering algorithms; Histograms; Indexes; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Title
Big Data, 2013 IEEE International Conference on
Conference_Location
Silicon Valley, CA
Type
conf
DOI
10.1109/BigData.2013.6691729
Filename
6691729