Building up Lexical Sample Dataset for Turkish Word Sense Disambiguation

Author

Bahar İlgen;Eşref Adalı;A. Cüneyd Tantuğ

Author_Institution

Computer Engineering Department, Istanbul Kü

fYear

2012

fDate

7/1/2012 12:00:00 AM

Firstpage

Lastpage

Abstract

Word Sense Disambiguation (WSD) has become an even more important research area in recent years with the widespread use of Natural Language Processing (NLP) applications. The WSD task has two variants: the “Lexical Sample” and “All Words” approaches. The Lexical Sample approach disambiguates the occurrences of a small, previously selected sample of target words, while the All Words approach disambiguates every word in a piece of text. In this work, a Lexical Sample Dataset for Turkish has been prepared. As a first step, highly ambiguous Turkish words were selected. Text samples were then collected for the chosen words, and five taggers annotated the word senses. This paper summarizes the step-by-step process of building a Lexical Sample Dataset for Turkish and presents the results of some experiments on it.

Keywords

"Dictionaries","Accuracy","Natural language processing","Humans","Buildings","Educational institutions","Reliability"

Publisher

IEEE

Conference_Titel

Innovations in Intelligent Systems and Applications (INISTA), 2012 International Symposium on

Print_ISBN

978-1-4673-1446-6

Type

conf

DOI

10.1109/INISTA.2012.6247026

Filename

6247026

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3647688