DocumentCode
579771
Title
A Semi-supervised Approach to Estimate the Number of Clusters per Class
Author
Sestaro, Davidson M. ; Covões, Thiago F. ; Hruschka, Eduardo R.
Author_Institution
Catholic Univ. of Santos (UniSantos) at Santos, Santos, Brazil
fYear
2012
fDate
20-25 Oct. 2012
Firstpage
73
Lastpage
78
Abstract
The disparity between the amounts of unlabeled and labeled data available in several applications has made semi-supervised learning an active research topic. Most studies on semi-supervised clustering assume that the number of classes is equal to the number of clusters. This paper introduces a semi-supervised clustering algorithm, named Multiple Clusters per Class k-means (MCCK), which estimates the number of clusters per class via pairwise constraints generated from class labels. Experiments with eight datasets indicate that the algorithm outperforms three traditional algorithms for semi-supervised clustering, especially when the one-cluster-per-class assumption does not hold. Finally, the learned structure can offer a valuable description of the data in several applications. For instance, it can aid the identification of disease subtypes in medical diagnosis problems.
Keywords
diseases; learning (artificial intelligence); medical diagnostic computing; pattern clustering; MCCK; class labels; cluster per class estimation; disease subtype identification; medical diagnosis problems; multiple cluster per class k-means algorithm; one-cluster-per-class assumption; pairwise constraints; semisupervised clustering algorithm; semisupervised learning approach; structure learning; unlabeled data; Algorithm design and analysis; Breast cancer; Clustering algorithms; Ionosphere; Partitioning algorithms; Prototypes; constrained clustering; semi-supervised learning
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks (SBRN), 2012 Brazilian Symposium on
Conference_Location
Curitiba
ISSN
1522-4899
Print_ISBN
978-1-4673-2641-4
Type
conf
DOI
10.1109/SBRN.2012.31
Filename
6374827