Title :
Semi-supervised clustering with soft labels
Author :
Cynthia Marea Nebu;Sumy Joseph
Author_Institution :
Amal Jyothi College of Engineering, Kerala, India
Abstract :
This paper presents a semi-supervised learning algorithm for clustering text documents. The proposed algorithm clusters multi-dimensional documents using the k-means algorithm. It first reduces the dimensionality of the text so that the clustering algorithm can perform well in a low-dimensional feature space. It also removes irrelevant, redundant, and noisy features from the corpus that might otherwise mislead the underlying algorithm. The proposed method employs the pLSA algorithm to generate soft labels from this reduced feature subset; these soft labels, together with the class labels, guide the k-means algorithm. Experiments on the Reuters-21578 dataset show that the proposed method outperforms many previous unsupervised clustering algorithms.
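The guidance step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how soft labels and class labels enter k-means, so this example assumes one plausible scheme, in which initial centroids are soft-label-weighted means and documents with a known class stay pinned to that cluster. The function name soft_seeded_kmeans, its parameters, and the toy data are all hypothetical; the soft labels are supplied by hand here, standing in for pLSA topic posteriors computed on the reduced feature set.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_mean(points: List[Vector], weights: List[float]) -> Vector:
    """Weighted mean of points; falls back to uniform weights if all are zero."""
    total = sum(weights)
    if total == 0:
        weights = [1.0] * len(points)
        total = float(len(points))
    dim = len(points[0])
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(dim)]


def soft_seeded_kmeans(
    docs: List[Vector],
    soft_labels: List[List[float]],      # assumed: per-document cluster posteriors
    class_labels: List[Optional[int]],   # cluster index if known, else None
    k: int,
    iters: int = 20,
) -> Tuple[List[int], List[Vector]]:
    # Assumption: seed each centroid with a soft-label-weighted mean.
    # Labeled documents contribute a one-hot weight for their class.
    centroids = []
    for c in range(k):
        weights = []
        for i in range(len(docs)):
            if class_labels[i] is not None:
                weights.append(1.0 if class_labels[i] == c else 0.0)
            else:
                weights.append(soft_labels[i][c])
        centroids.append(weighted_mean(docs, weights))

    assign: Optional[List[int]] = None
    for _ in range(iters):
        new_assign = []
        for doc, label in zip(docs, class_labels):
            if label is not None:
                # Assumption: labeled documents never change cluster.
                new_assign.append(label)
            else:
                dists = [sq_dist(doc, centroids[c]) for c in range(k)]
                new_assign.append(dists.index(min(dists)))
        if new_assign == assign:
            break  # assignments are stable
        assign = new_assign
        # Standard k-means update: centroid = mean of assigned documents.
        for c in range(k):
            members = [d for d, a in zip(docs, assign) if a == c]
            if members:
                centroids[c] = weighted_mean(members, [1.0] * len(members))
    return assign or [], centroids


# Toy data: two well-separated groups in a 2-D "reduced" feature space.
docs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0],
        [10.0, 11.0], [0.5, 0.5], [9.5, 10.5]]
soft_labels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
               [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
class_labels = [0, None, 1, None, None, None]  # only two documents are labeled

assign, centroids = soft_seeded_kmeans(docs, soft_labels, class_labels, k=2)
print(assign)
```

In this sketch the soft labels influence only the initialization, while the class labels act as hard constraints throughout; other designs, such as weighting every centroid update by the soft labels, would fit the abstract's description equally well.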
Keywords :
"Clustering algorithms","Feature extraction","Semisupervised learning","Support vector machines","Noise measurement","Semantics","Algorithm design and analysis"
Conference_Title :
Control Communication & Computing India (ICCC), 2015 International Conference on
DOI :
10.1109/ICCC.2015.7432969