Title :
Relative N-gram signatures: Document visualization at the level of character N-grams
Author :
Jankowska, M.; Keselj, V.; Milios, Evangelos
Abstract :
The Common N-Gram (CNG) classifier is a text classification algorithm based on the comparison of frequencies of character n-grams (strings of characters of length n) that are the most common in the considered documents and classes of documents. We present a text analytic visualization system that employs the CNG approach for text classification and uses the differences in frequency values of common n-grams in order to visually compare documents at the sub-word level. The visualization method provides both an insight into n-gram characteristics of documents or classes of documents and a visual interpretation of the workings of the CNG classifier.
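The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`profile`, `relative_differences`, `cng_distance`), the default n-gram length, and the profile size are hypothetical choices made for the example. The abstract does not state the dissimilarity formula; the sketch assumes the squared relative-difference measure commonly associated with CNG, in which an n-gram absent from a profile has frequency 0.

```python
from collections import Counter


def profile(text, n=3, size=5):
    """Return the `size` most frequent character n-grams of `text`,
    mapped to their normalized frequencies (hypothetical helper)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).most_common(size)}


def relative_differences(p1, p2):
    """Signed relative frequency difference for every n-gram that
    appears in either profile. An n-gram missing from a profile is
    treated as having frequency 0 (assumption)."""
    diffs = {}
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        # Difference divided by the mean of the two frequencies;
        # the mean is never 0 because g occurs in at least one profile.
        diffs[g] = (f1 - f2) / ((f1 + f2) / 2)
    return diffs


def cng_distance(p1, p2):
    """Sum of squared relative differences (assumed CNG dissimilarity)."""
    return sum(d * d for d in relative_differences(p1, p2).values())


def classify(doc_profile, class_profiles):
    """Pick the class whose profile is least dissimilar to the document."""
    return min(class_profiles,
               key=lambda label: cng_distance(doc_profile, class_profiles[label]))
```

Under these assumptions, each per-n-gram value lies between -2 and 2, and its sign shows which of the two profiles uses that n-gram more often. Values of this kind are what a visualization could map to color to compare documents at the sub-word level; the exact mapping used by the system is not given in this record.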
Keywords :
data visualisation; pattern classification; text analysis; CNG classifier; character n-grams; common n-grams classifier; document visualization; relative n-gram signatures; text analytic visualization system; text classification algorithm; Color; Context; Data visualization; Electronic mail; Frequency measurement; Visual analytics; text classification; visual text analysis
Conference_Title :
Visual Analytics Science and Technology (VAST), 2012 IEEE Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
978-1-4673-4752-5
DOI :
10.1109/VAST.2012.6400484