• DocumentCode
    3288607
  • Title

    A Suffix Based Part-of-Speech Tagger for Turkish

  • Author

    Dincer, Taner ; Karaoglan, Bahar ; Kisla, Tarik

  • Author_Institution
    Mugla Univ., Mugla
  • fYear
    2008
  • fDate
    7-9 April 2008
  • Firstpage
    680
  • Lastpage
    685
  • Abstract
    In this paper, we present a stochastic part-of-speech tagger for Turkish. The tagger is developed primarily for information retrieval purposes, but it can also serve as a lightweight PoS tagger for other purposes. The tagger uses a well-established hidden Markov model of the language with a closed lexicon that consists of a fixed number of letters from the word endings. We considered seven different word-ending lengths against 30 training corpus sizes. The best-case accuracy obtained is 90.2% with 5 characters. The main contribution of this paper is a way of constructing a closed vocabulary for part-of-speech tagging that can be useful for highly inflected languages such as Turkish, Finnish, Hungarian, Estonian, and Czech.
  • Keywords
    hidden Markov models; information retrieval; natural languages; vocabulary; Turkish language; suffix based stochastic part-of-speech tagger; indexing; information technology; speech; statistics; stochastic processes; tagging; agglutinative languages; closed vocabulary; part-of-speech tagging
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology: New Generations, 2008. ITNG 2008. Fifth International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    0-7695-3099-0
  • Type

    conf

  • DOI
    10.1109/ITNG.2008.103
  • Filename
    4492560
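
The abstract describes a closed lexicon built from a fixed number of letters taken from word endings. The following is a minimal sketch of that idea only, not the authors' implementation: the function names, the toy tag labels, and the choice to keep words shorter than k whole are all assumptions made for illustration. The hidden Markov model itself is not shown.

```python
from typing import Dict, Iterable, Tuple


def word_ending(word: str, k: int = 5) -> str:
    """Return the last k characters of a word.

    Assumption: a word shorter than k is kept whole.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return word[-k:]


def build_ending_lexicon(
    tagged_words: Iterable[Tuple[str, str]], k: int = 5
) -> Dict[str, Dict[str, int]]:
    """Count how often each tag occurs with each word ending.

    The keys form the closed vocabulary; the counts could later be
    normalised into emission probabilities for an HMM tagger.
    """
    lexicon: Dict[str, Dict[str, int]] = {}
    for word, tag in tagged_words:
        counts = lexicon.setdefault(word_ending(word, k), {})
        counts[tag] = counts.get(tag, 0) + 1
    return lexicon


# Toy data with hypothetical tag labels, not taken from the paper.
sample = [("evlerinden", "Noun"), ("ellerinden", "Noun"), ("geldi", "Verb")]
lexicon = build_ending_lexicon(sample, k=5)
```

Here the two inflected nouns share the ending "inden", so they collapse into a single lexicon entry. This is the sense in which the vocabulary stays closed as the number of distinct surface forms grows.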