Title :
Visualizing stylistic variation
Author :
Karlgren, Jussi ; Straszheim, Troy
Author_Institution :
Courant Inst. of Math. Sci., New York Univ., NY, USA
Abstract :
Texts vary not only by topic, but by style; indeed, often the variation between texts 'about the same thing' can be just as noticeable as the variation between texts 'about different things'. Some facets of this variation are quite easy to detect, and quite predictable when applied to categorization of texts by genre, functional style, or, tentatively, quality. Making use of such variation in a retrieval context is quite straightforward in principle; our work consists of an implementation of a visualization tool for document databases. The issues addressed include: (1) choice of stylistic items to investigate; (2) composition of dimensions of variation; and (3) judicious naming of dimensions for presentation. We use principal components analysis to combine our quite large number of stylistic items into the two most significant dimensions of variation and plot the document space under consideration onto a plane. This space can be used as a first or last filter in an information retrieval task. The composition of the most significant dimensions is naturally corpus dependent, as is the naming of them: our work is tested on Internet and TREC data.
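Illustration (not part of the original abstract): the reduction step described above, many stylistic items combined into two principal components for plotting, might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The four stylistic items in style_features are hypothetical stand-ins, since the abstract does not list the items actually used, and the eigenvectors are found by plain power iteration with deflation to keep the example dependency-free.

```python
import math
import re

def style_features(text):
    """Hypothetical stylistic items; the paper's actual items are not given in the abstract."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n,                    # mean word length
        len(words) / max(len(sentences), 1),               # mean sentence length in words
        len(set(words)) / n,                               # type/token ratio
        sum(w in {"i", "you", "we"} for w in words) / n,   # personal pronoun rate
    ]

def standardize(rows):
    """Center each column and scale it to unit variance (constant columns are left at zero)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def covariance(rows):
    """Population covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]

def top_eigenpair(m, iters=500):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix by power iteration."""
    d = len(m)
    v = [1.0 + 0.1 * i for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:          # matrix has no variance left in any direction
            return 0.0, v
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v

def pca_2d(rows):
    """Project feature rows onto their two most significant principal components."""
    z = standardize(rows)
    c = covariance(z)
    d = len(c)
    l1, v1 = top_eigenpair(c)
    # Deflation: remove the first component so the next iteration finds the second.
    deflated = [[c[i][j] - l1 * v1[i] * v1[j] for j in range(d)] for i in range(d)]
    l2, v2 = top_eigenpair(deflated)
    coords = [(sum(a * b for a, b in zip(r, v1)),
               sum(a * b for a, b in zip(r, v2))) for r in z]
    return coords, (l1, l2)

# Usage: each document becomes one (x, y) point in the plane.
docs = [
    "We met. You said hi. I left.",
    "The committee subsequently promulgated comprehensive regulatory guidelines.",
    "Prices rose sharply in March. Analysts expect further increases this year.",
    "I think you know what we want, and we want it now!",
]
coords, variances = pca_2d([style_features(t) for t in docs])
```

The resulting coordinates could be passed to any plotting tool. Naming the two axes, which the abstract lists as a separate issue, would still require inspecting which items load most heavily on each component for the corpus at hand.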
Keywords :
Internet; data visualisation; information retrieval; word processing; TREC data; corpus dependent; document databases; document space; functional style; information retrieval task; judicious naming; principal components analysis; retrieval context; stylistic items; stylistic variation visualization; text categorization; visualization tool; Content based retrieval; Humans; Information technology; Principal component analysis; Testing; Visual databases; Visualization; Writing
Conference_Title :
System Sciences, 1997, Proceedings of the Thirtieth Hawaii International Conference on
Conference_Location :
Wailea, HI
Print_ISBN :
0-8186-7743-0
DOI :
10.1109/HICSS.1997.665488