DocumentCode :
3189854
Title :
Semi-supervised Clustering Using Bayesian Regularization
Author :
Xu, Zuobing ; Akella, Ram ; Ching, Mike ; Tang, Renjie
Author_Institution :
Univ. of California, Santa Cruz
fYear :
2007
fDate :
28-31 Oct. 2007
Firstpage :
361
Lastpage :
366
Abstract :
Text clustering is most commonly treated as a fully automated task without user supervision. However, we can improve clustering performance using supervision in the form of pairwise (must-link and cannot-link) constraints. This paper introduces a rigorous Bayesian framework for semi-supervised clustering which incorporates human supervision in the form of pairwise constraints in both the expectation step and the maximization step of the EM algorithm. During the expectation step, we model the pairwise constraints as random variables, which enables us to capture the uncertainty in constraints in a principled manner. During the maximization step, we treat the constraint documents as prior information, and use Bayesian regularization to adjust the probability mass of the model distribution so that it emphasizes words occurring in constraint documents. Bayesian conjugate prior modeling makes the maximization step more efficient than the gradient search methods used in traditional distance learning. Experimental results on several text datasets demonstrate significant advantages over existing algorithms.
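The abstract does not give the update equations, so the following is only a minimal sketch of the general idea behind a conjugate-prior maximization step, not the authors' algorithm. It assumes a multinomial word model per cluster with a Dirichlet prior whose pseudo-counts come from the words in constraint documents. The function name `m_step_word_probs` and the parameters `prior_weight` and `base` are hypothetical, introduced here for illustration.

```python
from collections import Counter


def m_step_word_probs(soft_counts, constraint_docs, vocab,
                      prior_weight=1.0, base=1.0):
    """Posterior-mean word distribution for one cluster (illustrative only).

    soft_counts:     dict word -> expected count from the E-step
    constraint_docs: list of token lists treated as prior information
    vocab:           list of all words in the vocabulary
    prior_weight:    assumed strength of the constraint-document prior
    base:            assumed symmetric smoothing pseudo-count
    """
    # Words in constraint documents become Dirichlet pseudo-counts.
    pseudo = Counter()
    for doc in constraint_docs:
        pseudo.update(doc)
    alpha = {w: base + prior_weight * pseudo[w] for w in vocab}

    # Conjugacy gives a closed-form update: normalise counts plus pseudo-counts.
    # No gradient search is needed.
    unnorm = {w: soft_counts.get(w, 0.0) + alpha[w] for w in vocab}
    total = sum(unnorm.values())
    return {w: unnorm[w] / total for w in vocab}


vocab = ["a", "b", "c"]
soft_counts = {"a": 2.0, "b": 1.0}
without_prior = m_step_word_probs(soft_counts, [], vocab)
with_prior = m_step_word_probs(soft_counts, [["c", "c"]], vocab)
```

In this sketch, the word "c" never occurs in the E-step counts, yet it receives more probability mass in `with_prior` than in `without_prior` because it appears in a constraint document. The update is a single normalisation, which is the kind of efficiency a conjugate prior offers over iterative gradient search.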
Keywords :
Bayes methods; data mining; expectation-maximisation algorithm; pattern clustering; random processes; statistical distributions; text analysis; Bayesian conjugate prior modeling; Bayesian regularization; EM algorithm; cannot-link constraint; constraint documents; data mining; model distribution; must-link constraint; pairwise constraints; probability mass; random variables; semisupervised clustering; text clustering; Bayesian methods; Clustering algorithms; Computer aided instruction; Conferences; Data mining; Humans; Partitioning algorithms; Random variables; Text mining; USA Councils;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops, 2007. ICDM Workshops 2007. Seventh IEEE International Conference on
Conference_Location :
Omaha, NE
Print_ISBN :
978-0-7695-3019-2
Electronic_ISBN :
978-0-7695-3033-8
Type :
conf
DOI :
10.1109/ICDMW.2007.60
Filename :
4476692