• DocumentCode
    2348540
  • Title

    Affix-augmented stem-based language model for Persian

  • Author

    Faili, Heshaam ; Ravanbakhsh, Hadi

  • Author_Institution
    Dept. ECE, Univ. of Tehran, Tehran, Iran
  • fYear
    2010
  • fDate
    21-23 Aug. 2010
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Language modeling, which assigns a probability to a sequence of words, is used in many NLP applications such as machine translation, POS tagging, speech recognition, and information retrieval. The task becomes challenging for highly inflectional languages. In this paper we investigate standard statistical language models for Persian, an inflectional language. We propose two variations of morphological language models that rely on a morphological analyzer to manipulate the dataset before modeling. We then discuss the shortcomings of these models and introduce a novel approach that exploits the structure of the language and produces more accurate results. Experimental results are encouraging, especially when n-gram models are used with a small training dataset.
  • Keywords
    natural language processing; statistical analysis; Persian language; affix-augmented stem-based language; inflectional language; language modeling; morphological analyzer; morphological language models; statistical language models; Computational modeling; Data models; Mathematical model; Probability; Speech recognition; Training; Vocabulary; Persian; Tracking; language model; morphological; n-gram
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-6896-6
  • Type

    conf

  • DOI
    10.1109/NLPKE.2010.5587823
  • Filename
    5587823