DocumentCode :
3760821
Title :
Semi-supervised clustering with soft labels
Author :
Cynthia Marea Nebu;Sumy Joseph
Author_Institution :
Amal Jyothi College of Engineering, Kerala, India
fYear :
2015
Firstpage :
612
Lastpage :
616
Abstract :
This paper presents a semi-supervised learning algorithm for clustering text documents. The proposed algorithm clusters multi-dimensional documents using the k-means algorithm. It first reduces the dimensionality of the text so that the clustering algorithm can perform well in a low-dimensional feature space. It also removes irrelevant, redundant, and noisy features from the corpus, which might otherwise mislead the underlying algorithm. The proposed method employs the pLSA algorithm to generate soft labels from this reduced feature subset, and these soft labels, together with the class labels, guide the k-means algorithm. Experiments were conducted on the Reuters-21578 dataset, and the results show that the proposed method outperforms many previous clustering algorithms that operate without supervision.
Keywords :
"Clustering algorithms","Feature extraction","Semisupervised learning","Support vector machines","Noise measurement","Semantics","Algorithm design and analysis"
Publisher :
IEEE
Conference_Title :
Control Communication & Computing India (ICCC), 2015 International Conference on
Type :
conf
DOI :
10.1109/ICCC.2015.7432969
Filename :
7432969