DocumentCode :
3124127
Title :
Learning Dirichlet Processes from Partially Observed Groups
Author :
Dubey, Avinava ; Bhattacharya, Indrajit ; Das, Mrinal ; Faruquie, Tanveer ; Bhattacharyya, Chiranjib
fYear :
2011
fDate :
11-14 Dec. 2011
Firstpage :
141
Lastpage :
150
Abstract :
Motivated by the task of vernacular news analysis using known news topics from national newspapers, we study the task of topic analysis, where, given source datasets with observed topics, data items from a target dataset must be assigned either to observed source topics or to new ones. Using Hierarchical Dirichlet Processes to address this task imposes unnecessary and often inappropriate generative assumptions on the observed source topics. In this paper, we explore Dirichlet Processes with partially observed groups (POG-DP). POG-DP avoids modeling the given source topics. Instead, it directly models the conditional distribution of the target data as a mixture of a Dirichlet Process and the posterior distribution of a Hierarchical Dirichlet Process with known groups and topics. This introduces coupling between the selection probabilities of all topics within a source, leading to effective identification of source topics. We further improve on this with a Combinatorial Dirichlet Process with partially observed groups (POG-CDP), which captures finer-grained coupling between related topics by choosing intersections between sources. We evaluate our models in three different real-world applications. Using extensive experimentation, we compare against several baselines to show that our models perform significantly better in all three applications.
Keywords :
information resources; probability; combinatorial Dirichlet process; combinatorial Dirichlet process with partially observed groups; national news-papers; news topics; selection probabilities; topic analysis; vernacular news analysis; Companies; Couplings; Data models; Equations; Hidden Markov models; Inference algorithms; Mathematical model; Dirichlet Process; grouped data; partial observations; topic analysis
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1550-4786
Print_ISBN :
978-1-4577-2075-8
Type :
conf
DOI :
10.1109/ICDM.2011.85
Filename :
6137218