• DocumentCode
    234363
  • Title

    A new method to construct a statistical model for Arabic language

  • Author

    Sadiqui, Ali ; Zinedine, Ahmed

  • Author_Institution
    Fac. of Sci. Dhar El Mahrez, Sidi Mohamed Ben Abdellah Univ., Atlas, Morocco
  • fYear
    2014
  • fDate
    20-22 Oct. 2014
  • Firstpage
    296
  • Lastpage
    299
  • Abstract
    Language models are a key component of modern automatic language processing systems. In this study we present a new approach to building a statistical language model of Arabic for non-vocalized texts. The approach overcomes the morphological complexity of Arabic and addresses the limitations of existing morphological analyzers. The classic approach adopted by most morphological analyzers takes each word out of its context and therefore generates several segmentation options. Our solution uses trellises both to retain the segmentation possibilities generated by the morphological analyzer and to build the language model. To realize this solution, we used the following tools: AraMorph, Lattice-Tool from the SRILM toolkit, and AT & WSF. The language model was estimated from a corpus of 100 K words and tested on a corpus of 7 K words. The results and their analysis are presented in this paper.
  • Keywords
    computational linguistics; natural language processing; statistical analysis; text analysis; Arabic language processing; language model; morphological analyzer; nonvocalized text; statistical model; Analytical models; Complexity theory; Context; Decision support systems; Arabic Language Model; Automatic Arabic Language processing; Non-vocalized text; Statistical Model
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Technology (CIST), 2014 Third IEEE International Colloquium in
  • Conference_Location
    Tetouan
  • Print_ISBN
    978-1-4799-5978-5
  • Type

    conf

  • DOI
    10.1109/CIST.2014.7016635
  • Filename
    7016635