DocumentCode :
2430052
Title :
Using NLP to efficiently visualize text collections with SOMs
Author :
Henderson, James ; Merlo, Paola ; Petroff, Ivan ; Schneider, Gerold
Author_Institution :
Geneva Univ., Switzerland
fYear :
2002
fDate :
2-6 Sept. 2002
Firstpage :
210
Lastpage :
214
Abstract :
Self-Organizing Maps (SOMs) are a good method to cluster and visualize large collections of text documents, but they are computationally expensive. In this paper, we investigate ways to use natural language parsing of the texts to remove unimportant terms from the usual bag-of-words representation, to improve efficiency. We find that reducing the document representation to just the heads of noun and verb phrases does indeed reduce the heavy computational cost without degrading the quality of the map, while more severe reductions which focus on subject and object noun phrases degrade map quality.
Keywords :
data mining; self-organising feature maps; text analysis; bag-of-words representation; document representation; natural language parsing; self-organizing maps; text documents; Clustering algorithms; Computational efficiency; Computer displays; Degradation; Encoding; Information retrieval; Natural languages; Self organizing feature maps; Sparse matrices; Visualization
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database and Expert Systems Applications, 2002. Proceedings. 13th International Workshop on
ISSN :
1529-4188
Print_ISBN :
0-7695-1668-8
Type :
conf
DOI :
10.1109/DEXA.2002.1045900
Filename :
1045900