• DocumentCode
    1678691
  • Title
    Active Learning for Semi-Supervised K-Means Clustering
  • Author
    Vu, Viet-Vu ; Labroche, Nicolas ; Bouchon-Meunier, Bernadette
  • Author_Institution
    LIP6, Univ. Pierre et Marie Curie-Paris 6, Paris, France
  • Volume
    1
  • fYear
    2010
  • Firstpage
    12
  • Lastpage
    15
  • Abstract
    The K-Means algorithm is one of the most widely used clustering algorithms for Knowledge Discovery in Data Mining. Seed-based K-Means integrates a small set of labeled data (called seeds) into the K-Means algorithm to improve its performance and overcome its sensitivity to the initial centers. These centers are, most of the time, generated at random or assumed to be available for each cluster. This paper introduces a new, efficient algorithm for active seed selection that relies on a Min-Max approach favoring coverage of the whole dataset. Experiments conducted on artificial and real datasets show that, with our active seed selection algorithm, each cluster contains at least one seed after a very small number of queries, which helps reduce the number of iterations until convergence, a crucial property in many KDD applications.
  • Keywords
    data mining; learning (artificial intelligence); pattern clustering; active learning; data mining; knowledge discovery algorithm; min-max approach; seed based k-means clustering; semi-supervised k-means clustering; Clustering algorithms; Complexity theory; Convergence; Data mining; Equations; Nearest neighbor searches; Semi-supervised clustering; active learning; seed
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Tools with Artificial Intelligence (ICTAI), 2010 22nd IEEE International Conference on
  • Conference_Location
    Arras
  • ISSN
    1082-3409
  • Print_ISBN
    978-1-4244-8817-9
  • Type
    conf
  • DOI
    10.1109/ICTAI.2010.11
  • Filename
    5670014
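
The abstract describes seed selection by a Min-Max approach that favors coverage of the whole dataset, but it does not give the algorithm itself. As a rough illustration only, the sketch below uses a generic farthest-first (max-min distance) traversal to pick points to query, then averages the labeled seeds of each cluster into initial centers. This is an assumed stand-in, not the authors' method; the function names `farthest_first_queries` and `seeds_to_centers`, and the idea of an oracle returning one label per query, are hypothetical.

```python
import math
import random


def farthest_first_queries(points, n_queries, seed=0):
    """Return indices of points to query, chosen by farthest-first traversal.

    Assumption: a generic max-min heuristic used only to illustrate
    coverage-oriented seed selection; not the algorithm from the paper.
    """
    if not points or n_queries <= 0:
        return []
    rng = random.Random(seed)
    chosen = [rng.randrange(len(points))]  # arbitrary first query
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < min(n_queries, len(points)):
        # Pick the point whose nearest chosen point is farthest away.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        if nearest[nxt] == 0.0:
            break  # every remaining point duplicates a chosen one
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


def seeds_to_centers(points, seed_labels):
    """Average the labeled seeds of each cluster to get initial centers.

    seed_labels maps a point index to the label a (hypothetical) oracle
    returned for that point.
    """
    groups = {}
    for idx, label in seed_labels.items():
        groups.setdefault(label, []).append(points[idx])
    return {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in groups.items()
    }
```

On well-separated data, a traversal of this kind tends to reach every cluster within a few queries, which is the behavior the abstract reports for its own selection scheme; the resulting centers could then initialize a standard K-Means run.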