• DocumentCode
    3164785
  • Title

    Active Density-Based Clustering

  • Author

    Mai, Son T. ; Xiao He ; Hubig, Nina ; Plant, Claudia ; Bohm, Christian

  • Author_Institution
    Univ. of Munich, Munich, Germany
  • fYear
    2013
  • fDate
    7-10 Dec. 2013
  • Firstpage
    508
  • Lastpage
    517
  • Abstract
    The density-based clustering algorithm DBSCAN is a fundamental technique for data clustering with many attractive properties and applications. However, DBSCAN requires specifying all pair wise (dis)similarities among objects that can be non-trivial to obtain in many applications. To tackle this problem, in this paper, we propose a novel active density-based clustering algorithm, named Act-DBSCAN, which works under a restricted number of used pair wise similarities. Act-DBSCAN exploits the pair wise lower-bounding (LB) similarities to initialize the cluster structure. Then, it adaptively selects the most informative pair wise LB similarities to update with the real ones in order to reconstruct the result until the budget limitation is reached. The goal is to approximate as much as possible the true clustering result with each update. Our Act-DBSCAN framework is built upon a proposed probabilistic model to score the impact of the update of each pair wise LB similarity on the change of the intermediate clustering structure. Deriving from this scoring system and the monotonicity and reduction property of our active clustering process, we propose the two efficient algorithms to iteratively select and update pair wise similarities and cluster structure. Experiments on real datasets show that Act-DBSCAN acquires good clustering results with only a few pair wise similarities, and requires only a small fraction of all pair wise similarities to reach the DBSCAN results. Act-DBSCAN also outperforms other related techniques such as active spectral clustering.
  • Keywords
    pattern clustering; probability; Act-DBSCAN framework; active density-based clustering algorithm; active spectral clustering; cluster structure; data clustering; informative pairwise LB similarities; informative pairwise LB similarity selection; intermediate clustering structure; monotonicity property; pairwise lower-bounding similarities; reduction property; scoring system; true clustering; Clustering algorithms; Estimation; Kernel; Merging; Noise; Probabilistic logic; Training; Active clustering; Active learning; Density-based clustering;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Mining (ICDM), 2013 IEEE 13th International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1550-4786
  • Type

    conf

  • DOI
    10.1109/ICDM.2013.39
  • Filename
    6729535