Title :
Sailing the corpus sea: Visual exploration of news stories
Author :
Ilija Subašić;Bettina Berendt;Daniel Trümper
Author_Institution :
K.U. Leuven, Belgium
Abstract :
Rich information spaces like blogs or news are full of “stories”: sets of statements that evolve over time, made in fast-growing streams of documents. Even readers who follow a specific source every day or subscribe to a selection of feeds may easily lose track; in addition, it is difficult to reconstruct a story that already lies in the past. In this paper, we present the STORIES methods and tool for (a) learning an abstracted story representation from a collection of time-indexed documents; (b) visualizing it in a way that encourages users to interact and explore in order to discover temporal “story stages” depending on their interests; (c) supporting the search for documents and facts that pertain to the user-constructed story stages; (d) discovering the most important facts in the corpora; and (e) navigating in document space along multiple meaningful dimensions of document similarity and relatedness. This combination gives users more control, letting them progress from “surfing” the Web to “sailing” selected corpora of it, both semantically in story space and between the underlying documents. An evaluation demonstrates that machine learning and interaction lead to representations that serve to retrieve coherent and relevant document subsets and that help users learn facts about the story.
Keywords :
"Navigation","Visualization","Text mining","USA Councils","Google","Semantics"
Conference_Title :
Intelligent Systems and Informatics (SISY), 2011 IEEE 9th International Symposium on
Print_ISBN :
978-1-4577-1975-2
DOI :
10.1109/SISY.2011.6034370