  • DocumentCode
    721391
  • Title
    Parsing modern standard Arabic using Treebank resources
  • Author
    Al-Emran, Mostafa; Zaza, Sarween; Shaalan, Khaled
  • Author_Institution
    Al Buraimi Univ. Coll., Al Buraimi, Oman
  • fYear
    2015
  • fDate
    17-19 May 2015
  • Firstpage
    80
  • Lastpage
    83
  • Abstract
    A Treebank is a linguistic resource composed of a large collection of syntactically analyzed sentences whose annotations have been manually produced and verified. Statistical Natural Language Processing (NLP) approaches have successfully used these annotations to develop basic NLP tasks such as tokenization, diacritization, part-of-speech tagging, and parsing. In this paper, we address the problem of exploiting Treebank resources for the statistical parsing of Modern Standard Arabic (MSA) sentences. Statistical parsing is important for NLP tasks that take parsed text as input, such as Information Retrieval and Machine Translation. We conducted an experiment on the Penn Arabic Treebank (PATB); the parsing performance obtained in terms of Precision, Recall, and F-measure was 82.4%, 86.6%, and 84.4%, respectively.
  • Keywords
    computational linguistics; grammars; natural language processing; statistical analysis; MSA sentence statistical parsing; NLP tasks; PATB; Penn Arabic Treebank; Treebank resources; diacritization; information retrieval; linguistic resource; machine translation; manually annotated sentences; modern standard Arabic parsing; part-of-speech tagging; statistical natural language processing; tokenization; verified syntactically analyzed sentences; Gold; Natural language processing; Pragmatics; Standards; Syntactics; Training; Arabic; Statistical Parsing; Treebank;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information and Communication Technology Research (ICTRC), 2015 International Conference on
  • Conference_Location
    Abu Dhabi
  • Type
    conf
  • DOI
    10.1109/ICTRC.2015.7156426
  • Filename
    7156426
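The abstract reports Precision, Recall, and F-measure but the record does not say how they were computed. The sketch below assumes the standard PARSEVAL labeled-bracket scheme (as used by tools such as evalb) and the balanced F-measure (F1, the harmonic mean of precision and recall). The function names and the (label, start, end) constituent encoding are hypothetical and chosen for illustration only. Real evalb applies further conventions, such as ignoring punctuation, that this sketch omits. As a consistency check, F1 computed from the reported P = 82.4 and R = 86.6 does round to the reported 84.4.

```python
from collections import Counter


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def corpus_bracket_scores(pairs):
    """Hypothetical PARSEVAL-style labeled-bracket scoring over a corpus.

    pairs: iterable of (gold, test) per sentence, where each of gold and test
    is a list of (label, start, end) constituents. Counts are pooled across
    sentences (micro-average) before computing precision and recall.
    """
    matched = gold_total = test_total = 0
    for gold, test in pairs:
        g, t = Counter(gold), Counter(test)
        # Multiset intersection so duplicate brackets are matched at most once.
        matched += sum((g & t).values())
        gold_total += sum(g.values())
        test_total += sum(t.values())
    precision = matched / test_total if test_total else 0.0
    recall = matched / gold_total if gold_total else 0.0
    return precision, recall, f_measure(precision, recall)


# Check against the figures reported in the abstract (in percent).
print(round(f_measure(82.4, 86.6), 1))  # 84.4

# Toy sentence: the parser mislabels one of four constituents.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
test = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
print(corpus_bracket_scores([(gold, test)]))  # (0.75, 0.75, 0.75)
```

Pooling counts across sentences rather than averaging per-sentence scores follows the usual corpus-level convention; whether the authors did the same is not stated in this record.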