DocumentCode :
1388310
Title :
Topic Mining over Asynchronous Text Sequences
Author :
Wang, Xiang ; Jin, Xiaoming ; Chen, Meng-En ; Zhang, Kai ; Shen, Dou
Author_Institution :
Sch. of Software, Tsinghua Univ., Beijing, China
Volume :
24
Issue :
1
fYear :
2012
Firstpage :
156
Lastpage :
169
Abstract :
Time stamped texts, or text sequences, are ubiquitous in real-world applications. Multiple text sequences are often related to each other by sharing common topics. The correlation among these sequences provides more meaningful and comprehensive clues for topic mining than those from each individual sequence. However, it is nontrivial to explore the correlation with the existence of asynchronism among multiple sequences, i.e., documents from different sequences about the same topic may have different time stamps. In this paper, we formally address this problem and put forward a novel algorithm based on the generative topic model. Our algorithm consists of two alternate steps: the first step extracts common topics from multiple sequences based on the adjusted time stamps provided by the second step; the second step adjusts the time stamps of the documents according to the time distribution of the topics discovered by the first step. We perform these two steps alternately and after iterations a monotonic convergence of our objective function can be guaranteed. The effectiveness and advantage of our approach were justified through extensive empirical studies on two real data sets consisting of six research paper repositories and two news article feeds, respectively.
Keywords :
data mining; text analysis; asynchronous text sequences; common topic extraction; document time stamp; generative topic model; monotonic convergence; sequence correlation; time stamped text; topic discovery; topic mining; topic sharing; topic time distribution; Data mining; Frequency synchronization; Probabilistic logic; Random variables; Semantics; Sequential analysis; Synchronization; Text mining; Temporal text mining; asynchronous sequences; topic model
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2010.229
Filename :
5645618