Title :
Self-organizing maps of massive document collections
Author_Institution :
Neural Networks Res. Centre, Helsinki Univ. of Technol., Espoo, Finland
Abstract :
Huge document collections can be organized according to textual similarities by the self-organizing map (SOM) algorithm, when statistical representations of the textual contents are used as the feature vectors of the documents. In a practical experiment we mapped 6,840,568 patent abstracts onto a 1,002,240-node SOM. For the feature vectors we selected 500-dimensional random projections of the weighted word histograms.
Keywords :
full-text databases; self-organising feature maps; statistical analysis; very large databases; SOM algorithm; feature vectors; massive document collections; self-organizing map; statistical representations; textual similarities; weighted word histogram random projections; Abstracts; Arithmetic; Data analysis; Databases; Displays; Histograms; Information retrieval; Neural networks; Scalability; Self organizing feature maps
Conference_Title :
Neural Networks, 2000. IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on
Conference_Location :
Como
Print_ISBN :
0-7695-0619-4
DOI :
10.1109/IJCNN.2000.857865
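
Illustration :
The abstract names two steps: random projection of weighted word histograms into a lower-dimensional feature vector, and mapping that vector onto a SOM node. The following is a minimal sketch of those two steps, not the authors' implementation. The function names, the Gaussian projection matrix with unit-length columns, the Euclidean winner search, and the toy sizes (1000 words, 50 dimensions, 12 nodes in place of the 500 dimensions and 1,002,240 nodes reported above) are all assumptions made for illustration, and the data is randomly generated.

```python
import numpy as np

def random_projection_matrix(vocab_size, target_dim, rng):
    # Assumed construction: Gaussian entries, each column (one per word)
    # scaled to unit length.
    R = rng.standard_normal((target_dim, vocab_size))
    R /= np.linalg.norm(R, axis=0, keepdims=True)
    return R

def project(histogram, weights, R):
    # Weight each word count, then map the histogram to target_dim dimensions.
    return R @ (weights * histogram)

def best_matching_unit(x, codebook):
    # Index of the SOM node whose model vector is closest to x (Euclidean).
    return int(np.argmin(np.linalg.norm(codebook - x, axis=1)))

rng = np.random.default_rng(0)
vocab_size, target_dim, n_nodes = 1000, 50, 12  # toy sizes, not the paper's

R = random_projection_matrix(vocab_size, target_dim, rng)
histogram = rng.integers(0, 3, size=vocab_size).astype(float)  # synthetic word counts
weights = rng.random(vocab_size)                               # synthetic word weights
x = project(histogram, weights, R)

codebook = rng.standard_normal((n_nodes, target_dim))  # untrained model vectors
bmu = best_matching_unit(x, codebook)
```

In this sketch the codebook is random rather than trained, so `bmu` only demonstrates the winner search; SOM training would repeatedly move the winner and its map neighbours toward each projected document vector.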