• DocumentCode
    131296
  • Title
    Improving Persian POS tagging using the maximum entropy model
  • Author
    Kardan, Ahmad ; Imani, Maryam Bahojb

  • Author_Institution
    Dept. of Comput. Eng. & Inf. Technol., Amirkabir Univ. of Technol., Tehran, Iran
  • fYear
    2014
  • fDate
    4-6 Feb. 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Part-of-speech (POS) tagging is a fundamental step in many speech and text processing applications. It assigns each word in an input sentence a category according to its contextual and grammatical properties. Beyond the general difficulties of POS tagging, such as disambiguating multi-category words and handling unknown words, Persian, unlike English, is a free-order language with characteristics of its own. These challenges can greatly affect the quality of the tagging process. Efficient POS taggers have been developed for some languages, especially English, but little research has been done on Persian. To address these issues and achieve high tagging accuracy, we choose features that capture the important characteristics of words in a sentence and use maximum entropy as the machine learning classifier. Experimental results show that the proposed Persian POS tagging system outperforms other state-of-the-art Persian taggers.
  • Keywords
    learning (artificial intelligence); maximum entropy methods; natural language processing; pattern classification; speech processing; text analysis; English language; Persian POS tagging improvement; Persian language; contextual properties; free-order language; grammatical properties; input sentences; machine learning classifier; maximum entropy model; multicategory word disambiguation; part-of-speech tagging; part-of-speech tagging process quality; speech processing; text processing; unknown word characteristics; word assignment; word categories; Accuracy; Entropy; Feature extraction; Hidden Markov models; Speech; Speech processing; Tagging; Maximum Entropy; Natural Language Processing; Part of Speech Tagging; Persian Part of Speech Tagging
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems (ICIS), 2014 Iranian Conference on
  • Conference_Location
    Bam
  • Print_ISBN
    978-1-4799-3350-1
  • Type
    conf
  • DOI
    10.1109/IranianCIS.2014.6802567
  • Filename
    6802567