Title :
A Semi-supervised Approach to Estimate the Number of Clusters per Class
Author :
Sestaro, Davidson M. ; Covões, Thiago F. ; Hruschka, Eduardo R.
Author_Institution :
Catholic Univ. of Santos (UniSantos) at Santos, Santos, Brazil
Abstract :
The disparity between the amounts of unlabeled and labeled data available in several applications has made semi-supervised learning an active research topic. Most studies on semi-supervised clustering assume that the number of classes is equal to the number of clusters. This paper introduces a semi-supervised clustering algorithm, named Multiple Clusters per Class k-means (MCCK), which estimates the number of clusters per class via pairwise constraints generated from class labels. Experiments with eight datasets indicate that the algorithm outperforms three traditional algorithms for semi-supervised clustering, especially when the one-cluster-per-class assumption does not hold. Finally, the learned structure can offer a valuable description of the data in several applications. For instance, it can aid the identification of subtypes of diseases in medical diagnosis problems.
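The abstract mentions pairwise constraints generated from class labels but does not say how MCCK builds them. The sketch below shows one common construction, offered only as an illustration and not as the authors' procedure: labeled objects from different classes form cannot-link pairs, while objects sharing a class are recorded merely as same-class pairs, since a class may span several clusters. The function name `constraints_from_labels` and the use of `None` for unlabeled objects are assumptions made for this example.

```python
from itertools import combinations


def constraints_from_labels(labels):
    """Derive pairwise constraints from partially labeled data.

    labels[i] is a class label, or None when object i is unlabeled
    (a convention assumed for this sketch).
    Returns (same_class, cannot_link) as lists of index pairs (i, j), i < j.
    """
    # Keep only the labeled objects, remembering their original indices.
    labeled = [(i, y) for i, y in enumerate(labels) if y is not None]

    same_class, cannot_link = [], []
    for (i, yi), (j, yj) in combinations(labeled, 2):
        if yi == yj:
            # Same class: may or may not share a cluster when a class
            # is allowed to span multiple clusters.
            same_class.append((i, j))
        else:
            # Different classes: the two objects should not share a cluster.
            cannot_link.append((i, j))
    return same_class, cannot_link


# Hypothetical toy input: six objects, two of them unlabeled.
labels = ["a", None, "b", "a", None, "b"]
same_class, cannot_link = constraints_from_labels(labels)
```

Unlabeled objects contribute no constraints, so the number of pairs grows quadratically only in the number of labeled objects.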
Keywords :
diseases; learning (artificial intelligence); medical diagnostic computing; pattern clustering; MCCK; class labels; cluster per class estimation; disease subtype identification; medical diagnosis problems; multiple cluster per class k-means algorithm; one-cluster-per-class assumption; pair wise constraints; semisupervised clustering algorithm; semisupervised learning approach; structure learning; unlabeled data; Algorithm design and analysis; Breast cancer; Clustering algorithms; Ionosphere; Partitioning algorithms; Prototypes; constrained clustering; semi-supervised learning;
Conference_Title :
Neural Networks (SBRN), 2012 Brazilian Symposium on
Conference_Location :
Curitiba
Print_ISBN :
978-1-4673-2641-4
DOI :
10.1109/SBRN.2012.31