Title of article :
Automatic Construction of Persian ICT WordNet using Princeton WordNet
Author/Authors :
Mansoorizadeh, M. Computer Department - Faculty of Engineering - Bu-Ali Sina University - Hamedan, Iran, Nassiri, M. Computer Department - Faculty of Engineering - Bu-Ali Sina University - Hamedan, Iran, Ahmadi Tameh, A. Computer Department - Faculty of Engineering - Bu-Ali Sina University - Hamedan, Iran
Abstract :
WordNet is a large lexical database of the English language in which nouns, verbs, adjectives, and adverbs are
grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked
by both semantic and lexical relations. WordNet is used mainly for word sense disambiguation,
information retrieval, and text translation. In this paper, we propose several automatic methods to extract
Information and Communication Technology (ICT)-related data from Princeton WordNet. We then add these
extracted data to our Persian WordNet. Automated methods reduce the need for human intervention
and accelerate the development of our bilingual ICT WordNet.
In our first proposed method, starting from a small seed set of ICT words, we use the definition of each synset to
decide whether that synset is ICT-related. The second method extracts the synsets that are in a semantic relation
with the ICT synsets. We also use two similarity measures, namely LCS and S3M, to measure the similarity
between a synset definition in WordNet and the definition of any word in the Microsoft dictionary. Our last method is to
verify the coordinate terms of ICT synsets. The results show that our proposed methods can extract
ICT data from Princeton WordNet with good accuracy.
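
As a rough illustration of two of the ideas above, the following Python sketch shows a seed-word check on a definition and an LCS-based similarity between two definitions. It is not the authors' implementation. Several points are assumptions made only for this example: LCS is read as the longest common subsequence over word tokens, the score is normalized by the longer definition, and the names tokenize, lcs_length, lcs_similarity, and mentions_seed are hypothetical. The abstract does not state the actual tokenization, normalization, or decision thresholds.

```python
import re


def tokenize(text):
    """Lowercase a definition and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(def_a, def_b):
    """LCS length divided by the longer token count (assumed normalization)."""
    a, b = tokenize(def_a), tokenize(def_b)
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def mentions_seed(definition, seed_words):
    """True if any token of the definition is in the ICT seed set."""
    return any(tok in seed_words for tok in tokenize(definition))
```

In a pipeline of this kind, a synset whose definition passes mentions_seed, or whose lcs_similarity with some dictionary definition exceeds a chosen threshold, would be marked as an ICT candidate. The threshold would have to be tuned on labeled data.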
Keywords :
Information and Communication Technology, Part of Speech, Semantic Relation, WordNet synset
Journal title :
Astroparticle Physics