DocumentCode
2774382
Title
A Self-Organizing Map Based Approach for Document Clustering and Visualization
Author
Yen, Gary G. ; Wu, Zheng
Author_Institution
Oklahoma State Univ., Stillwater
fYear
2006
fDate
0-0 0
Firstpage
3279
Lastpage
3286
Abstract
In this paper, the clustering and visualization capabilities of the self-organizing map (SOM), specifically tailored for the analysis of textual data, are reviewed and further developed. A novel clustering and visualization approach is proposed for textual data mining. The proposed approach first transforms the document space into a multi-dimensional vector space by means of citation patterns. An intuitive and effective projection method, the ranked centroid projection (RCP), is then applied in conjunction with a dynamic SOM model, the growing hierarchical self-organizing map (GHSOM), which automatically produces document maps at various levels of detail. The RCP serves both as a data analysis tool and as a direct interface to the data. We also extend the RCP to address the incremental clustering of dynamic document collections. In a set of simulations, the proposed approach is applied to a synthetic data set and two real-world scientific document collections to demonstrate its applicability.
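The abstract does not say how citation patterns become vectors or how the maps are trained, so what follows is only a minimal sketch under stated assumptions. Each document is encoded as a binary vector over the union of the references it cites (an assumed encoding), and a classic fixed-size Kohonen SOM with a Gaussian neighborhood is trained on those vectors. This is not the paper's GHSOM, which grows its maps and hierarchy, and it is not the RCP. The names `citation_vectors`, `train_som`, and `best_matching_unit` are hypothetical helpers introduced for illustration.

```python
import math
import random


def citation_vectors(doc_refs):
    """Assumed encoding: one binary feature per distinct cited reference."""
    vocab = sorted({r for refs in doc_refs.values() for r in refs})
    index = {r: i for i, r in enumerate(vocab)}
    vectors = {}
    for doc, refs in doc_refs.items():
        v = [0.0] * len(vocab)
        for r in refs:
            v[index[r]] = 1.0  # document cites reference r
        vectors[doc] = v
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Grid node whose prototype vector is closest to x."""
    return min(weights, key=lambda node: sq_dist(weights[node], x))


def train_som(data, rows, cols, epochs=200, lr0=0.5, sigma0=None, seed=0):
    """Classic online Kohonen SOM on a fixed rows x cols rectangular grid."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Random initial prototype for every grid node (i, j).
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(rows) for j in range(cols)}
    total = epochs * len(data)
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for k in order:
            x = data[k]
            frac = t / total
            lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighborhood, never zero
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighborhood measured on the grid, not in input space.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for m in range(dim):
                    w[m] += lr * h * (x[m] - w[m])
            t += 1
    return weights


if __name__ == "__main__":
    docs = {"A": ["r1", "r2"], "B": ["r1", "r2", "r3"],
            "C": ["r7", "r8"], "D": ["r8", "r9"]}
    vocab, vecs = citation_vectors(docs)
    som = train_som([vecs[d] for d in sorted(docs)], rows=2, cols=2)
    for d in sorted(docs):
        print(d, best_matching_unit(som, vecs[d]))
```

In this toy setup, documents that share many cited references should land on the same or neighboring grid nodes, which is the intuition behind clustering by citation patterns. A growing model such as the GHSOM would additionally add rows, columns, or child maps where quantization error stays high, which this fixed-grid sketch deliberately omits.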
Keywords
data analysis; data mining; data visualisation; document handling; pattern clustering; self-organising feature maps; citation patterns; document clustering; document visualization; growing hierarchical self-organizing map; multi-dimensional vector space; ranked centroid projection; scientific document collections; textual data analysis; textual data mining; Clustering algorithms; Clustering methods; Data analysis; Data engineering; Data mining; Data visualization; Displays; Neurons; Pattern recognition; Prototypes;
fLanguage
English
Publisher
ieee
Conference_Titel
Neural Networks, 2006. IJCNN '06. International Joint Conference on
Conference_Location
Vancouver, BC
Print_ISBN
0-7803-9490-9
Type
conf
DOI
10.1109/IJCNN.2006.247324
Filename
1716546