DocumentCode :
2346702
Title :
Sense-based clustering of Polish nouns in the extraction of semantic relatedness
Author :
Broda, Bartosz ; Piasecki, Maciej ; Szpakowicz, Stanislaw
Author_Institution :
Inst. of Appl. Inf., Wroclaw Univ. of Technol., Wroclaw
fYear :
2008
fDate :
20-22 Oct. 2008
Firstpage :
83
Lastpage :
89
Abstract :
The construction of a wordnet from scratch requires intelligent software support. An accurate measure of semantic relatedness can be used to extract groups of semantically close words from a corpus. Such groups help a lexicographer make decisions about synset membership and synset placement in the network. We have adapted to Polish the well-known algorithm of Clustering by Committee, and tested it on the largest Polish corpus available. The evaluation by way of a plWordNet-based synonymy test used Polish WordNet, a resource still under development. The results are consistent with a few benchmarks, but not encouraging enough yet to make a wordnet writer's support tool immediately useful.
Keywords :
natural language processing; software engineering; Polish WordNet; Polish nouns; intelligent software support; lexicographer; plWordNet-based synonymy test; semantic relatedness extraction; sense-based clustering; synset membership; synset placement; wordnet; Benchmark testing; Clustering algorithms; Computer science; Data mining; Helium; Informatics; Information technology; Large-scale systems; Mutual information; Software algorithms
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Information Technology, 2008. IMCSIT 2008. International Multiconference on
Conference_Location :
Wisla
Print_ISBN :
978-83-60810-14-9
Type :
conf
DOI :
10.1109/IMCSIT.2008.4747222
Filename :
4747222