• DocumentCode
    579771
  • Title
    A Semi-supervised Approach to Estimate the Number of Clusters per Class
  • Author
    Sestaro, Davidson M.; Covões, Thiago F.; Hruschka, Eduardo R.
  • Author_Institution
    Catholic Univ. of Santos (UniSantos), Santos, Brazil
  • fYear
    2012
  • fDate
    20-25 Oct. 2012
  • Firstpage
    73
  • Lastpage
    78
  • Abstract
    The disparity between the amounts of unlabeled and labeled data available in many applications has made semi-supervised learning an active research topic. Most studies on semi-supervised clustering assume that the number of classes is equal to the number of clusters. This paper introduces a semi-supervised clustering algorithm, named Multiple Clusters per Class k-means (MCCK), which estimates the number of clusters per class via pairwise constraints generated from class labels. Experiments with eight datasets indicate that the algorithm outperforms three traditional algorithms for semi-supervised clustering, especially when the one-cluster-per-class assumption does not hold. Finally, the learned structure can offer a valuable description of the data in several applications. For instance, it can aid the identification of disease subtypes in medical diagnosis problems.
  • Keywords
    diseases; learning (artificial intelligence); medical diagnostic computing; pattern clustering; MCCK; class labels; cluster per class estimation; disease subtype identification; medical diagnosis problems; multiple cluster per class k-means algorithm; one-cluster-per-class assumption; pairwise constraints; semisupervised clustering algorithm; semisupervised learning approach; structure learning; unlabeled data; Algorithm design and analysis; Breast cancer; Clustering algorithms; Ionosphere; Partitioning algorithms; Prototypes; constrained clustering; semi-supervised learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (SBRN), 2012 Brazilian Symposium on
  • Conference_Location
    Curitiba
  • ISSN
    1522-4899
  • Print_ISBN
    978-1-4673-2641-4
  • Type
    conf
  • DOI
    10.1109/SBRN.2012.31
  • Filename
    6374827