DocumentCode :
3483407
Title :
A SOM-based document clustering using phrases
Author :
Bakus, J. ; Hussin, M.F. ; Kamel, M.
Author_Institution :
Dept. of Syst. Design Eng., Waterloo Univ., Ont., Canada
Volume :
5
fYear :
2002
fDate :
18-22 Nov. 2002
Firstpage :
2212
Abstract :
Most existing techniques for document clustering rely on a "bag of words" document representation, in which each word in the document is treated as a separate feature and word order is ignored. We investigate the use of phrases rather than words as document features for document clustering. We present a phrase grammar extraction technique and use the extracted phrases as the features in a self-organizing map (SOM) based document clustering algorithm. We present clustering results on the REUTERS corpus and show an improvement in clustering performance under both the entropy and F-measure evaluation measures.
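The abstract reports results with entropy and F-measure but does not define them. As a hedged sketch, assuming the definitions commonly used in the document-clustering literature (cluster-size-weighted entropy of class labels, here with base-2 logarithms, and a class-size-weighted best-match F-measure), both measures can be computed from a list of true class labels (for example REUTERS topic labels) and a list of assigned cluster ids, such as SOM map units. The function names clustering_entropy and clustering_f_measure are hypothetical, and the paper's exact formulation may differ.

```python
import math
from collections import Counter


def clustering_entropy(labels, clusters):
    """Cluster-size-weighted entropy of class labels (lower is better).

    Assumption: standard definition with base-2 logarithms,
    E = sum_j (n_j / n) * E_j, where E_j = -sum_i p_ij * log2(p_ij).
    """
    n = len(labels)
    total = 0.0
    for c in set(clusters):
        # Class labels of the documents assigned to cluster c.
        members = [lab for lab, k in zip(labels, clusters) if k == c]
        n_j = len(members)
        e_j = -sum((m / n_j) * math.log2(m / n_j) for m in Counter(members).values())
        total += (n_j / n) * e_j
    return total


def clustering_f_measure(labels, clusters):
    """Overall F-measure (higher is better).

    Assumption: each class i is matched to the cluster j that maximizes
    F(i, j) = 2PR / (P + R), with P = n_ij / n_j and R = n_ij / n_i,
    and the maxima are averaged with weights n_i / n.
    """
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))  # n_ij counts
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue  # F(i, j) is 0 when class and cluster share no documents
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total
```

For instance, with labels ["earn", "earn", "acq", "acq"], a perfect partition [0, 0, 1, 1] gives entropy 0 and F-measure 1, while [0, 0, 0, 1] mixes two classes in the first cluster, so entropy rises above 0 and the F-measure falls below 1.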
Keywords :
document handling; natural languages; pattern clustering; self-organising feature maps; F-measure evaluation measures; REUTERS corpus; SOM-based document clustering; bag of words; document features; document representation; phrase grammar extraction technique; self-organizing map; Automatic control; Clustering algorithms; Computer science; Data mining; Entropy; Information retrieval; Internet; Machine learning; Merging; Organizing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Neural Information Processing, 2002. ICONIP '02. Proceedings of the 9th International Conference on
Print_ISBN :
981-04-7524-1
Type :
conf
DOI :
10.1109/ICONIP.2002.1201886
Filename :
1201886