DocumentCode :
3105596
Title :
Latent Dirichlet Co-Clustering
Author :
Shafiei, M. Mahdi ; Milios, Evangelos E.
Author_Institution :
Fac. of Comput. Sci., Dalhousie Univ., Halifax, NS
fYear :
2006
fDate :
18-22 Dec. 2006
Firstpage :
542
Lastpage :
551
Abstract :
We present a generative model for simultaneously clustering documents and terms. Our model is a four-level hierarchical Bayesian model in which each document is modeled as a random mixture of document topics, where each document topic is a distribution over some segments of the text. Each of these segments can in turn be modeled as a mixture of word topics, where each word topic is a distribution over words. We present efficient approximate inference techniques based on the Markov chain Monte Carlo method and a moment-matching algorithm for empirical Bayes parameter estimation. We report results in document modeling and in document and term clustering, comparing against other topic models and against clustering and co-clustering algorithms, including latent Dirichlet allocation (LDA), model-based overlapping clustering (MOC), model-based overlapping co-clustering (MOCC), and information-theoretic co-clustering (ITCC).
Keywords :
Bayes methods; Markov processes; Monte Carlo methods; pattern clustering; text analysis; Markov Chain Monte Carlo method; approximate inference techniques; clustering documents; document modeling; document topics random mixture; empirical Bayes parameter estimation; four-level hierarchical Bayesian model; information-theoretic coclustering; latent Dirichlet allocation; latent Dirichlet co-clustering; model-based overlapping coclustering; moment-matching algorithm; text segments; Bayesian methods; Clustering algorithms; Computer science; Data mining; Frequency; Inference algorithms; Large scale integration; Linear discriminant analysis; Natural languages; Parameter estimation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2006. ICDM '06. Sixth International Conference on
Conference_Location :
Hong Kong
ISSN :
1550-4786
Print_ISBN :
0-7695-2701-7
Type :
conf
DOI :
10.1109/ICDM.2006.94
Filename :
4053080