Title :
Using NLP to efficiently visualize text collections with SOMs
Author :
Henderson, James ; Merlo, Paola ; Petroff, Ivan ; Schneider, Gerold
Author_Institution :
Geneva Univ., Switzerland
Abstract :
Self-Organizing Maps (SOMs) are a good method for clustering and visualizing large collections of text documents, but they are computationally expensive. In this paper, we investigate ways to improve efficiency by using natural language parsing of the texts to remove unimportant terms from the usual bag-of-words representation. We find that reducing the document representation to just the heads of noun and verb phrases does indeed reduce the heavy computational cost without degrading the quality of the map, while more severe reductions that focus on subject and object noun phrases degrade map quality.
Keywords :
data mining; self-organising feature maps; text analysis; bag-of-words representation; document representation; natural language parsing; self-organizing maps; text documents; Clustering algorithms; Computational efficiency; Computer displays; Degradation; Encoding; Information retrieval; Natural languages; Self organizing feature maps; Sparse matrices; Visualization
Conference_Title :
Database and Expert Systems Applications, 2002. Proceedings. 13th International Workshop on
Print_ISBN :
0-7695-1668-8
DOI :
10.1109/DEXA.2002.1045900
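Example :
The term reduction described in the abstract (keeping only the heads of noun and verb phrases instead of the full bag of words) can be illustrated with a minimal sketch. This is not the authors' implementation: the `(phrase_type, tokens, head)` triples, the function names `full_bag` and `head_bag`, and the toy document are all assumptions made for illustration, and a real system would obtain the phrases and heads from a natural language parser.

```python
from collections import Counter

# Hypothetical parser output: one (phrase_type, tokens, head) triple per phrase.
# The labels and heads below are hand-annotated, not produced by a real parser.
doc = [
    ("NP", ["The", "large", "map"], "map"),
    ("VP", ["clusters"], "clusters"),
    ("NP", ["the", "text", "documents"], "documents"),
    ("PP", ["by", "topic"], "by"),
]


def full_bag(phrases):
    """Usual bag-of-words: count every token in every phrase."""
    return Counter(tok.lower() for _, tokens, _ in phrases for tok in tokens)


def head_bag(phrases, keep=("NP", "VP")):
    """Reduced representation: count only the heads of the kept phrase types."""
    return Counter(head.lower() for ptype, _, head in phrases if ptype in keep)


if __name__ == "__main__":
    # Fewer distinct terms means shorter document vectors as input to the SOM.
    print("full vocabulary size:", len(full_bag(doc)))
    print("head-only vocabulary size:", len(head_bag(doc)))
```

In this sketch the smaller vocabulary stands in for the efficiency gain: each distinct term is one dimension of the document vector that the SOM has to process.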