Title :
A Self-Organizing Map Based Approach for Document Clustering and Visualization
Author :
Yen, Gary G. ; Wu, Zheng
Author_Institution :
Oklahoma State Univ., Stillwater
Abstract :
In this paper, the clustering and visualization capabilities of the self-organizing map (SOM), specifically tailored for the analysis of textual data, are reviewed and further developed. A novel clustering and visualization approach is proposed for the task of textual data mining. The proposed approach first transforms the document space into a multi-dimensional vector space by means of citation patterns. An intuitive and effective projection method, the ranked centroid projection (RCP), is then applied in conjunction with a dynamic SOM model, the growing hierarchical self-organizing map (GHSOM), which automatically produces document maps at various levels of detail. The RCP serves both as a data analysis tool and as a direct interface to the data. We also extend the RCP to address the problem of incremental clustering of dynamic document collections. In a set of simulations, the proposed approach is applied to a synthetic data set and to two real-world scientific document collections to demonstrate its applicability.
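The two steps named in the abstract (citation-based vectorization, then projection of each document onto the map) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the binary "cites / does not cite" encoding, the linear rank weighting (nearest neuron weight k, k-th nearest weight 1), and the helper names `citation_vectors` and `ranked_centroid_projection` are all hypothetical. SOM/GHSOM training is omitted; the prototype vectors and grid positions are assumed to come from an already trained map layer.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def citation_vectors(doc_refs: Dict[str, Set[str]], vocab: Sequence[str]) -> Dict[str, List[float]]:
    # Assumed encoding: one binary component per cited work in `vocab`,
    # 1.0 if the document cites it, 0.0 otherwise.
    return {doc: [1.0 if ref in refs else 0.0 for ref in vocab] for doc, refs in doc_refs.items()}


def ranked_centroid_projection(
    x: Sequence[float],
    prototypes: Sequence[Sequence[float]],
    positions: Sequence[Tuple[float, float]],
    k: int,
) -> Tuple[float, float]:
    # Project document vector `x` to a continuous (row, col) point on the map
    # as a rank-weighted centroid of the grid positions of its k nearest neurons.
    if not 1 <= k <= len(prototypes):
        raise ValueError("k must be between 1 and the number of neurons")
    # Rank neurons by Euclidean distance to x; the index breaks ties deterministically.
    ranked = sorted(range(len(prototypes)), key=lambda i: (math.dist(x, prototypes[i]), i))[:k]
    # Hypothetical linear rank weighting: nearest neuron gets weight k, the k-th gets 1.
    weights = [k - r for r in range(k)]
    total = sum(weights)
    row = sum(w * positions[i][0] for w, i in zip(weights, ranked)) / total
    col = sum(w * positions[i][1] for w, i in zip(weights, ranked)) / total
    return (row, col)


if __name__ == "__main__":
    vecs = citation_vectors({"A": {"r1", "r3"}, "B": {"r2"}}, ["r1", "r2", "r3"])
    print(vecs)
    # Toy 2x2 map: prototype vectors and their (row, col) grid positions.
    protos = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    grid = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    print(ranked_centroid_projection([0.2, 0.0], protos, grid, k=2))
```

With k=2 the toy document lands between neurons (0, 0) and (0, 1), closer to the former; with k=1 the projection reduces to the ordinary best-matching-unit position, so the ranked centroid can be read as a smoothed alternative to plain winner-take-all mapping.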
Keywords :
data analysis; data mining; data visualisation; document handling; pattern clustering; self-organising feature maps; citation patterns; document clustering; document visualization; growing hierarchical self-organizing map; multi-dimensional vector space; ranked centroid projection; scientific document collections; textual data analysis; textual data mining; Clustering algorithms; Clustering methods; Data analysis; Data engineering; Data mining; Data visualization; Displays; Neurons; Pattern recognition; Prototypes;
Conference_Title :
Neural Networks, 2006. IJCNN '06. International Joint Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
0-7803-9490-9
DOI :
10.1109/IJCNN.2006.247324