DocumentCode :
538059
Title :
Evaluation of clustering algorithms for Polish Word Sense Disambiguation
Author :
Broda, Bartosz ; Mazur, Wojciech
Author_Institution :
Inst. of Inf., Wroclaw Univ. of Technol., Wrocław, Poland
fYear :
2010
fDate :
18-20 Oct. 2010
Firstpage :
25
Lastpage :
32
Abstract :
Word Sense Disambiguation in text remains a difficult problem, as the best supervised methods require laborious and costly manual preparation of training data. This work therefore evaluates a few selected clustering algorithms on the task of Word Sense Disambiguation for Polish. We tested six clustering algorithms (K-Means, K-Medoids, hierarchical agglomerative clustering, hierarchical divisive clustering, Growing Hierarchical Self-Organising Maps, and graph-partitioning-based clustering) and five weighting schemes. For the agglomerative and divisive algorithms, 13 criterion functions were tested. The results are noteworthy because the best clustering algorithms come close, in terms of cluster purity, to the precision of a supervised clustering algorithm on the same dataset using the same features.
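Cluster purity, the measure the abstract compares against supervised precision, can be computed as sketched below. This is a minimal sketch that assumes the standard definition: each cluster is credited with the count of its most frequent gold sense, and these counts are summed and divided by the total number of instances. The function name cluster_purity and the sense labels in the usage example are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, sense_labels):
    """Standard cluster purity (assumed definition, not the paper's code).

    cluster_ids[i] is the cluster assigned to occurrence i and
    sense_labels[i] is its gold-standard sense. Each cluster is credited
    with the size of its majority sense; the credited counts are summed
    and divided by the number of occurrences.
    """
    if len(cluster_ids) != len(sense_labels):
        raise ValueError("cluster_ids and sense_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one occurrence is required")

    # Count gold senses inside each cluster.
    senses_per_cluster = defaultdict(Counter)
    for cluster, sense in zip(cluster_ids, sense_labels):
        senses_per_cluster[cluster][sense] += 1

    # Sum the majority-sense count of every cluster.
    majority_total = sum(max(counts.values()) for counts in senses_per_cluster.values())
    return majority_total / len(cluster_ids)


if __name__ == "__main__":
    # Hypothetical labels for five occurrences of the Polish word "zamek".
    clusters = [0, 0, 0, 1, 1]
    senses = ["zamek_castle", "zamek_castle", "zamek_lock", "zamek_lock", "zamek_lock"]
    print(cluster_purity(clusters, senses))
```

In this usage example, cluster 0 holds two "castle" occurrences and one "lock" occurrence, and cluster 1 holds two "lock" occurrences, so purity is (2 + 2) / 5 = 0.8. Note that purity rewards many small clusters (singleton clusters always score 1.0), so it is usually read together with the number of clusters produced.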
Keywords :
natural language processing; pattern clustering; text analysis; Polish; clustering algorithms; word sense disambiguation; Algorithm design and analysis; Clustering algorithms; Context; Feature extraction; Mutual information; Neurons; Partitioning algorithms
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Science and Information Technology (IMCSIT), Proceedings of the 2010 International Multiconference on
Conference_Location :
Wisla
ISSN :
2157-5525
Print_ISBN :
978-1-4244-6432-6
Type :
conf
DOI :
10.1109/IMCSIT.2010.5679861
Filename :
5679861