DocumentCode
3483407
Title
A SOM-based document clustering using phrases
Author
Bakus, J. ; Hussin, M.F. ; Kamel, M.
Author_Institution
Dept. of Syst. Design Eng., Waterloo Univ., Ont., Canada
Volume
5
fYear
2002
fDate
18-22 Nov. 2002
Firstpage
2212
Abstract
Most existing techniques for document clustering rely on a "bag of words" document representation. Each word in the document is treated as a separate feature, and word order is ignored. We investigate the use of phrases rather than words as document features for document clustering. We present a phrase grammar extraction technique and use the extracted phrases as the features in a self-organizing map based document clustering algorithm. We present clustering results on the REUTERS corpus and show an improvement in clustering performance under both the entropy and F-measure evaluation measures.
Keywords
document handling; natural languages; pattern clustering; self-organising feature maps; F-measure evaluation measures; REUTERS corpus; SOM-based document clustering; bag of words; document features; document representation; phrase grammar extraction technique; self-organizing map; Automatic control; Clustering algorithms; Computer science; Data mining; Entropy; Information retrieval; Internet; Machine learning; Merging; Organizing
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Information Processing, 2002. ICONIP '02. Proceedings of the 9th International Conference on
Print_ISBN
981-04-7524-1
Type
conf
DOI
10.1109/ICONIP.2002.1201886
Filename
1201886