DocumentCode :
2772194
Title :
Dirichlet Mixture Allocation for Multiclass Document Collections Modeling
Author :
Bian, Wei ; Tao, Dacheng
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore
fYear :
2009
fDate :
6-9 Dec. 2009
Firstpage :
711
Lastpage :
715
Abstract :
Latent Dirichlet allocation (LDA), a topic model, is an effective tool for the statistical analysis of large document collections. In LDA, each document is modeled as a mixture of topics, and the topic proportions are generated from a unimodal Dirichlet prior. When a collection of documents is drawn from multiple classes, this unimodal prior is insufficient for data fitting. To solve this problem, we exploit a multimodal Dirichlet mixture prior and propose Dirichlet mixture allocation (DMA). We report experiments on the popular TDT2 Corpus demonstrating that DMA models a collection of documents more precisely than LDA when the documents are obtained from multiple classes.
Keywords :
statistical analysis; text analysis; Dirichlet mixture allocation; TDT2 Corpus; data fitting; latent Dirichlet allocation; multiclass document collections modeling; multimodal Dirichlet mixture prior; text modeling; unimodal Dirichlet distribution prior; Bayesian methods; Data engineering; Data mining; Image retrieval; Indexing; Inference algorithms; Information retrieval; Linear discriminant analysis; Vocabulary; Dirichlet mixture; multiclass; topic model
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
Conference_Location :
Miami, FL
ISSN :
1550-4786
Print_ISBN :
978-1-4244-5242-2
Electronic_ISBN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2009.102
Filename :
5360299