Title :
Serendip: Topic model-driven visual exploration of text corpora
Author :
Alexander, Eric ; Kohlmann, Joe ; Valenza, Robin ; Witmore, Michael ; Gleicher, Michael
Author_Institution :
Dept. of Comput. Sci., Univ. of Wisconsin-Madison, Madison, WI, USA
Abstract :
Exploration and discovery in a large text corpus requires investigation at multiple levels of abstraction, from a zoomed-out view of the entire corpus down to close-ups of individual passages and words. At each of these levels, there is a wealth of information that can inform inquiry - from statistical models, to metadata, to the researcher's own knowledge and expertise. Joining all this information together can be a challenge, and there are issues of scale to be combatted along the way. In this paper, we describe an approach to text analysis that addresses these challenges of scale and multiple information sources, using probabilistic topic models to structure exploration through multiple levels of inquiry in a way that fosters serendipitous discovery. In implementing this approach into a tool called Serendip, we incorporate topic model data and metadata into a highly reorderable matrix to expose corpus level trends; extend encodings of tagged text to illustrate probabilistic information at a passage level; and introduce a technique for visualizing individual word rankings, along with interaction techniques and new statistical methods to create links between different levels and information types. We describe example uses from both the humanities and visualization research that illustrate the benefits of our approach.
Keywords :
data visualisation; matrix algebra; meta data; probability; statistical analysis; text analysis; interaction technique; metadata; multiple information source; probabilistic information; probabilistic topic model; reorderable matrix; serendip; statistical method; statistical model; structure exploration; text corpora; topic model data; topic model-driven visual exploration; visualization research; word ranking; Adaptation models; Data models; Data visualization; Market research; Measurement; Probabilistic logic; Vectors; Text visualization; topic modeling
Conference_Title :
Visual Analytics Science and Technology (VAST), 2014 IEEE Conference on
Conference_Location :
Paris
DOI :
10.1109/VAST.2014.7042493