• DocumentCode
    2258784
  • Title
    A hybrid approach for Arabic multi-word term extraction
  • Author
    Bounhas, Ibrahim ; Slimani, Yahya
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Tunis, Tunis, Tunisia
  • fYear
    2009
  • fDate
    24-27 Sept. 2009
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Building a domain model from a specialized corpus requires identifying candidate terms and the semantic relations between them. Once constructed, this model can be used for many information retrieval tasks. Multi-word terms are of great importance in this process. On the one hand, they constitute domain-relevant candidate terms. On the other hand, the syntactic relations that link their constituents can be used to infer semantic relations between terms. In this paper we propose to extract multi-word terms from Arabic specialized corpora. The proposed approach uses linguistic rules based on morphological features and POS (Part Of Speech) tags to parse documents and retrieve candidate terms. Statistical measures are used to deal with ambiguities generated by the linguistic tools and to rank candidate terms according to their relevance. We present experiments on a corpus from the environment domain. We report high-quality results that confirm the targets set for the precision metric.
  • Keywords
    information retrieval; natural language processing; Arabic multiword term extraction; Arabic specialized corpora; domain model; information retrieval; linguistic rules; multiword terms; part of speech tags; semantic relation; syntactic relation; Bellows; Books; Buildings; Computer science; Data mining; Information retrieval; Ontologies; Speech; Terminology; Arabic language processing; morpho-syntactic parsing; multi-word terms; terminology extraction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2009. NLP-KE 2009. International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-4538-7
  • Electronic_ISBN
    978-1-4244-4540-0
  • Type
    conf
  • DOI
    10.1109/NLPKE.2009.5313728
  • Filename
    5313728