• DocumentCode
    172461
  • Title
    Automatic Arabic term extraction from special domain corpora
  • Author
    Al-Thubaity, Abdul Mohsen ; Khan, Mahrukh ; Alotaibi, Saad ; Alonazi, Badriyya
  • Author_Institution
    King Abdulaziz City for Sci. & Technol., Riyadh, Saudi Arabia
  • fYear
    2014
  • fDate
    20-22 Oct. 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    The availability of machine-readable Arabic special domain text in digital libraries, websites of Arabic university publications, and refereed journals enables numerous studies and applications. One such application is automatic term extraction from special domain corpora. The extracted terms can serve as a foundation for other applications and research, such as special domain dictionary building, terminology resource creation, and special domain ontology construction. Our literature survey shows a lack of such studies for Arabic special domain text; moreover, the few studies that we identified use complex and computationally expensive methods. In this study, we use two basic methods to automatically extract terms from Arabic special domain corpora. Our methods are based on two simple heuristics: the most frequent words and n-grams in special domain corpora are typically terms, and terms are typically bounded by functional words. We applied our methods to a corpus of applied Arabic linguistics. Our results are comparable to those of other Arabic term extraction studies: accuracy was 87% when only terms strictly pertaining to the field of applied Arabic linguistics were considered, and 93.7% when related terms were included.
  • Keywords
    dictionaries; linguistics; natural language processing; ontologies (artificial intelligence); Arabic special domain corpora; Arabic university publications; Web sites; applied Arabic linguistics; automatic Arabic term extraction; automatic term extraction; digital libraries; frequent words; functional words; heuristics; machine-readable Arabic special domain text; n-grams; special domain dictionary building; special domain ontology construction; terminology resource creation; Accuracy; Buildings; Data mining; Dictionaries; Pragmatics; Semantics; Arabic term extraction; special domain corpora; term frequency-inverse document frequency; terminology resources
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2014 International Conference on
  • Conference_Location
    Kuching
  • Type
    conf
  • DOI
    10.1109/IALP.2014.6973468
  • Filename
    6973468
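
The two heuristics named in the abstract (frequent words and n-grams are typically terms, and terms are typically bounded by functional words) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `candidate_terms`, its parameters, the whitespace tokenization, and the English toy tokens are all assumptions made for illustration. A real Arabic pipeline would need an Arabic functional-word list and suitable tokenization.

```python
from collections import Counter


def candidate_terms(tokens, function_words, max_n=3, min_freq=2):
    """Return (n-gram, count) pairs for n-grams that contain no functional words.

    Hypothetical helper: the token stream is split into chunks at every
    functional word, so each counted n-gram lies between functional-word
    boundaries. N-grams seen fewer than min_freq times are dropped.
    """
    counts = Counter()
    chunk = []
    # The trailing None is a sentinel that flushes the last chunk.
    for tok in list(tokens) + [None]:
        if tok is None or tok in function_words:
            # Count every n-gram (1..max_n) inside the finished chunk.
            for n in range(1, max_n + 1):
                for i in range(len(chunk) - n + 1):
                    counts[" ".join(chunk[i:i + n])] += 1
            chunk = []
        else:
            chunk.append(tok)
    # Keep only frequent candidates, most frequent first.
    return [(term, c) for term, c in counts.most_common() if c >= min_freq]


if __name__ == "__main__":
    # Toy English example; the functional-word set is an assumption.
    text = ("the language acquisition of the child and "
            "the language acquisition in the classroom")
    for term, count in candidate_terms(text.split(), {"the", "of", "and", "in"}):
        print(term, count)
```

The sketch counts every n-gram inside a chunk. A stricter reading of the second heuristic would count only whole chunks, that is, sequences with a functional word on both sides.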