DocumentCode :
1504784
Title :
Exemplar-based Visualization of Large Document Corpus (InfoVis2009-1115)
Author :
Chen, Yanhua ; Wang, Lijun ; Dong, Ming ; Hua, Jing
Author_Institution :
Dept. of Comput. Sci., Wayne State Univ., Detroit, MI, USA
Volume :
15
Issue :
6
fYear :
2009
Firstpage :
1161
Lastpage :
1168
Abstract :
With the rapid growth of the World Wide Web and electronic information services, text corpora are becoming available online at an incredible rate. By displaying text data in a logical layout (e.g., color graphs), text visualization offers a direct way to observe the documents and to understand the relationships among them. In this paper, we propose a novel technique, Exemplar-based Visualization (EV), to visualize an extremely large text corpus. Capitalizing on recent advances in matrix approximation and decomposition, EV presents a probabilistic multidimensional projection model in the low-rank text subspace with a sound objective function. The probabilistic proportion of each document over the topics is obtained through iterative optimization and embedded into a low-dimensional space using parameter embedding. By selecting representative exemplars, we obtain a compact approximation of the data, which makes the visualization highly efficient and flexible. In addition, the selected exemplars neatly summarize the entire data set and greatly reduce cognitive overload in the visualization, making large text corpora easier to interpret. Empirically, we demonstrate the superior performance of EV through extensive experiments on publicly available text data sets.
Keywords :
biology computing; data visualisation; iterative methods; optimisation; exemplar-based visualization; iterative optimization; large document corpus; matrix approximation; parameter embedding; text corpus; text visualization; Computer science; Data visualization; Drugs; Indexing; Large-scale systems; Matrix decomposition; Multidimensional systems; Principal component analysis; Text mining; Web sites; Exemplar; large-scale document visualization; multidimensional projection
fLanguage :
English
Journal_Title :
Visualization and Computer Graphics, IEEE Transactions on
Publisher :
IEEE
ISSN :
1077-2626
Type :
jour
DOI :
10.1109/TVCG.2009.140
Filename :
5290725