Title :
A Dynamic Graph Model for Analyzing Streaming News Documents
Author :
Hohman, Elizabeth Leeds ; Marchette, David J.
Author_Institution :
Naval Surface Warfare Center, Dahlgren, VA
Date :
March 1, 2007 - April 5, 2007
Abstract :
In this paper we consider the problem of analyzing streaming documents, in particular streaming news stories. The system is designed to extract statistics from the document, incorporate these into a graph-based model, and discard the document to reduce storage requirements. The model is defined in terms of a changing lexicon and sub-lexicons at each node in the graph, with the nodes of the graph representing topics. An approximation to the TFIDF term weighting is introduced. We illustrate the methodology on a dataset of news articles, and discuss the dynamic nature of the model.
Keywords :
document handling; graph theory; TFIDF term weighting; analyzing streaming news documents; dynamic graph model; lexicon; Computational intelligence; Data mining; Electronic mail; Feeds; Frequency; Keyboards; Statistics; Text categorization; Text processing; Traffic control
Conference_Title :
Computational Intelligence and Data Mining, 2007. CIDM 2007. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0705-2
DOI :
10.1109/CIDM.2007.368911