A Suffix Based Part-of-Speech Tagger for Turkish

Author

Dincer, Taner ; Karaoglan, Bahar ; Kisla, Tarik

Author_Institution

Mugla Univ., Mugla

fYear

2008

fDate

7-9 April 2008

Firstpage

680

Lastpage

685

Abstract

In this paper, we present a stochastic part-of-speech tagger for Turkish. The tagger was developed primarily for information retrieval, but it can also serve as a lightweight PoS tagger for other purposes. It uses a well-established hidden Markov model of the language with a closed lexicon consisting of word endings of a fixed number of letters. We evaluated seven word-ending lengths against 30 training corpus sizes. The best-case accuracy obtained is 90.2%, with endings of 5 characters. The main contribution of this paper is a way of constructing a closed vocabulary for part-of-speech tagging that can be useful for highly inflected languages such as Turkish, Finnish, Hungarian, Estonian, and Czech.
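
The closed-lexicon idea in the abstract can be illustrated with a minimal sketch: each word is replaced by its last k letters, and emission probabilities for a hidden Markov model are estimated over those endings instead of over full word forms. This is not the authors' implementation, which the abstract does not describe in detail. The function names, the tag labels, and the relative-frequency estimate are assumptions made only for illustration.

```python
from collections import Counter


def word_ending(word: str, k: int = 5) -> str:
    """Return the last k letters of a word (the whole word if it is shorter)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return word[-k:]


def count_endings(tagged_corpus, k: int = 5) -> Counter:
    """Count (ending, tag) pairs over an iterable of (word, tag) tuples."""
    counts = Counter()
    for word, tag in tagged_corpus:
        counts[(word_ending(word, k), tag)] += 1
    return counts


def emission_probs(counts: Counter) -> dict:
    """Estimate P(ending | tag) by relative frequency (no smoothing)."""
    tag_totals = Counter()
    for (_, tag), n in counts.items():
        tag_totals[tag] += n
    return {(ending, tag): n / tag_totals[tag]
            for (ending, tag), n in counts.items()}


# Hypothetical toy corpus; the tag names are illustrative, not the paper's tag set.
corpus = [("evlerinden", "Noun"), ("kitaplar", "Noun"), ("geldi", "Verb")]
probs = emission_probs(count_endings(corpus, k=5))
```

Because many inflected forms share the same ending, the number of distinct lexicon entries is bounded by the observed endings of length k rather than by the open set of full word forms, which is what makes the vocabulary closed in this sketch.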

Keywords

hidden Markov models; information retrieval; natural languages; vocabulary; Turkish language; hidden Markov model; suffix based stochastic part-of-speech tagger; Indexing; Information technology; Speech; Statistics; Stochastic processes; Tagging; Agglutinative languages; Closed vocabulary; Part-Of-Speech Tagging

fLanguage

English

Publisher

IEEE

Conference_Titel

Information Technology: New Generations, 2008. ITNG 2008. Fifth International Conference on

Conference_Location

Las Vegas, NV

Print_ISBN

0-7695-3099-0

Type

conf

DOI

10.1109/ITNG.2008.103

Filename

4492560