• DocumentCode
    659580
  • Title
    Distributed Pivot Clustering with arbitrary distance functions
  • Author
    Branting, L. Karl
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    21
  • Lastpage
    27
  • Abstract
    This paper describes an algorithm, Distributed Pivot Clustering (DPC), that differs from prior distributed clustering algorithms in that it requires neither an inexpensive approximation of the actual distance function nor that pairs of elements in the same cluster share at least one exact feature value. Instead, DPC requires only that the distance function satisfy the triangle inequality and be of sufficiently high granularity to permit the data to be partitioned into canopies of optimal size based on distance to reference elements, or pivots. An empirical evaluation demonstrated that DPC can lead to accurate distributed hierarchical agglomerative clustering, provided that the triangle inequality and granularity requirements are met.
  • Keywords
    distributed algorithms; pattern clustering; DPC algorithm; arbitrary distance functions; distributed hierarchical agglomerative clustering; distributed pivot clustering algorithms; empirical evaluation; feature value; granularity requirements; triangle inequality; Accuracy; Approximation algorithms; Approximation methods; Clustering algorithms; Histograms; Indexes; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691729
  • Filename
    6691729
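
The abstract's idea of partitioning data into canopies by distance to a pivot can be illustrated with a minimal sketch. This is not the paper's DPC algorithm, whose details are not given in this record. It only shows why the triangle inequality matters: if d(x, y) <= t, then the distances of x and y to any pivot differ by at most t, so overlapping bands of pivot distance cannot separate them. The function name pivot_canopies and the parameters width and threshold are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def pivot_canopies(items, pivot, dist, width, threshold):
    """Group items into overlapping canopies by distance to one pivot.

    Hypothetical illustration, not the published DPC procedure.
    Canopy j holds every item whose pivot distance d satisfies
    j * width <= d < (j + 1) * width + threshold.
    Assumes dist obeys the triangle inequality and width > 0.
    """
    canopies = defaultdict(list)
    for item in items:
        d = dist(item, pivot)
        k = int(d // width)          # home band of this item
        canopies[k].append(item)
        # Copy the item into lower bands whose extended range still covers d.
        j = k - 1
        while j >= 0 and d < (j + 1) * width + threshold:
            canopies[j].append(item)
            j -= 1
    return dict(canopies)


def abs_diff(a, b):
    """A simple metric on numbers; it satisfies the triangle inequality."""
    return abs(a - b)


points = [0, 1, 4, 5, 9, 10, 14]
canopies = pivot_canopies(points, pivot=0, dist=abs_diff, width=5, threshold=1)
for band in sorted(canopies):
    print(band, canopies[band])
```

Under these assumptions, any two items within threshold of each other share at least one canopy, so each canopy could be clustered independently (for example, on separate workers) without missing a close pair. Band width controls canopy size, which is where the granularity of the distance function comes in: a distance that takes only a few distinct values cannot be split into many bands.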