DocumentCode :
3126143
Title :
Tracking and Connecting Topics via Incremental Hierarchical Dirichlet Processes
Author :
Gao, Zekai J. ; Song, Yangqiu ; Liu, Shixia ; Wang, Haixun ; Wei, Hao ; Chen, Yang ; Cui, Weiwei
Author_Institution :
Microsoft Research Asia, Beijing, China
fYear :
2011
fDate :
11-14 Dec. 2011
Firstpage :
1056
Lastpage :
1061
Abstract :
Much research has been devoted to topic detection from text, but one major challenge has not been addressed: revealing the rich relationships that exist among the detected topics. Finding such relationships is important since many applications are interested in how topics come into being, how they develop, grow, disintegrate, and finally disappear. In this paper, we present a novel method that reveals the connections between topics discovered from the text data. Specifically, our method focuses on how one topic splits into multiple topics, and how multiple topics merge into one topic. We adopt the hierarchical Dirichlet process (HDP) model, and propose an incremental Gibbs sampling algorithm to incrementally derive and refine the labels of clusters. We then characterize the splitting and merging patterns among clusters based on how labels change. We propose a global analysis process that focuses on cluster splitting and merging, and a finer granularity analysis process that helps users to better understand the content of the clusters and the evolution patterns. We also develop a visualization process to present the results.
Keywords :
data visualisation; merging; pattern clustering; sampling methods; text analysis; evolution pattern clustering; global analysis process; granularity analysis process; incremental Gibbs sampling algorithm; incremental hierarchical Dirichlet process; merging pattern; splitting pattern; text data; topic detection; Business; Clustering algorithms; Data models; Merging; Predictive models; Semantics; Time measurement; Clustering; Hierarchical Dirichlet processes; Incremental Gibbs Sampling; Mixture models
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1550-4786
Print_ISBN :
978-1-4577-2075-8
Type :
conf
DOI :
10.1109/ICDM.2011.148
Filename :
6137314