DocumentCode :
967713
Title :
Using Incremental PLSI for Threshold-Resilient Online Event Analysis
Author :
Chou, Tzu-Chuan ; Chen, Meng Chang
Author_Institution :
Acad. Sinica Taiwan, Nankang
Volume :
20
Issue :
3
fYear :
2008
fDate :
3/1/2008
Firstpage :
289
Lastpage :
299
Abstract :
The goal of online event analysis is to detect events and track their associated documents in real time from a continuous stream of documents generated by multiple information sources. Unlike traditional text categorization methods, event analysis approaches consider the temporal relations among documents. However, such methods suffer from the threshold-dependency problem: they perform well only for a narrow range of thresholds. In addition, if the contents of a document stream change, the optimal threshold (that is, the threshold that yields the best performance) often changes as well. In this paper, we propose a threshold-resilient online algorithm, called the incremental probabilistic latent semantic indexing (IPLSI) algorithm, which alleviates the threshold-dependency problem and simultaneously maintains the continuity of the latent semantics to better capture the story line development of events. The IPLSI algorithm is theoretically sound and empirically efficient and effective for event analysis. A performance evaluation on the Topic Detection and Tracking (TDT-4) corpus shows that the algorithm reduces the cost of event analysis by as much as 15 to 20 percent and increases the acceptable threshold range by 200 to 300 percent over the baseline.
Keywords :
document handling; electronic publishing; statistical analysis; associated documents; document stream; incremental PLSI; incremental probabilistic latent semantic indexing; multiple information sources; threshold-dependency problem; threshold-resilient online event analysis; Algorithm design and analysis; Change detection algorithms; Clustering algorithms; Event detection; Indexing; Information analysis; Performance analysis; Performance evaluation; Text analysis; Text categorization; Clustering; Knowledge life cycles; Probabilistic algorithms; Web mining;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2007.190702
Filename :
4378375