Title :
Scalable Recursive Top-Down Hierarchical Clustering Approach with Implicit Model Selection for Textual Data Sets
Author :
Muhr, Markus ; Sabol, Vedran ; Granitzer, Michael
Author_Institution :
Knowledge Relationship Discovery Know-Center Graz, Graz, Austria
fDate :
Aug. 30 2010-Sept. 3 2010
Abstract :
Automatic generation of taxonomies can be useful for a wide range of applications. In our application scenario, a topical hierarchy should be constructed reasonably fast from a large document collection to aid browsing of the data set. The hierarchy should also be used by the InfoSky projection algorithm to create an information landscape visualization suitable for explorative navigation of the data. We developed an algorithm that applies a scalable, recursive, top-down clustering approach to generate a dynamic concept hierarchy. The algorithm recursively applies a workflow consisting of preprocessing, clustering, cluster labeling, and projection into 2D space. Besides presenting and discussing the benefits of combining hierarchy browsing with visual exploration, we also investigate the clustering results achieved on a real-world data set.
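The recursive workflow named in the abstract (preprocessing, clustering, cluster labeling, then recursing into each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain bag-of-words model, swaps in a simple two-way k-means split with cosine similarity in place of the growing k-means and implicit model selection listed in the keywords, and omits the 2D projection step entirely. All function names and parameters (`vectorize`, `two_means`, `build_hierarchy`, `min_size`, `max_depth`) are hypothetical.

```python
from collections import Counter
import math


def vectorize(text):
    """Preprocessing (assumed): lowercase, split on whitespace, count terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Unnormalized centroid: summed term counts (scale does not affect cosine)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def two_means(vectors, iterations=10):
    """Split vectors into two index groups with a basic 2-means loop.

    Seeds are the first vector and the vector least similar to it.
    """
    seed_a = vectors[0]
    seed_b = min(vectors, key=lambda v: cosine(seed_a, v))
    centers = [seed_a, seed_b]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for i, v in enumerate(vectors):
            nearest = 0 if cosine(v, centers[0]) >= cosine(v, centers[1]) else 1
            groups[nearest].append(i)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller treats the node as a leaf
        centers = [centroid([vectors[i] for i in g]) for g in groups]
    return groups


def label(vectors, top=2):
    """Cluster labeling (assumed): most frequent terms, ties broken alphabetically."""
    counts = centroid(vectors)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top]]


def build_hierarchy(docs, min_size=2, depth=0, max_depth=5):
    """Recursively apply preprocess -> cluster -> label to build a topic tree."""
    vectors = [vectorize(d) for d in docs]
    node = {"label": label(vectors), "docs": docs, "children": []}
    if len(docs) <= min_size or depth >= max_depth:
        return node
    groups = two_means(vectors)
    if not groups[0] or not groups[1]:
        return node
    for g in groups:
        child_docs = [docs[i] for i in g]
        node["children"].append(
            build_hierarchy(child_docs, min_size, depth + 1, max_depth)
        )
    return node


docs = [
    "apple banana fruit",
    "banana fruit salad",
    "car engine wheel",
    "engine wheel road",
]
tree = build_hierarchy(docs)
for child in tree["children"]:
    print(child["label"], child["docs"])
```

The fixed two-way split and the `min_size` stopping rule are placeholders chosen for brevity; an approach with implicit model selection would instead let the number of clusters at each level be determined by the data.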
Keywords :
pattern clustering; text analysis; InfoSky projection algorithm; automatic generation; document collection; information landscape visualization; scalable recursive top down hierarchical clustering approach; textual data set; visual exploration; Clustering algorithms; Encyclopedias; Heuristic algorithms; Internet; Labeling; Projection algorithms; growing k-means; information landscape; model selection; topic hierarchy; vector space model;
Conference_Title :
Database and Expert Systems Applications (DEXA), 2010 Workshop on
Conference_Location :
Bilbao
Print_ISBN :
978-1-4244-8049-4
DOI :
10.1109/DEXA.2010.25