• DocumentCode
    3281046
  • Title

    Arabic collocations extraction using Gate

  • Author

    Zaidi, Soraya ; Laskri, M.T. ; Abdelali, Ahmed

  • Author_Institution
    Dept of Comput. Sci., Badji Mokhtar Univ., Annaba, Algeria
  • fYear
    2010
  • fDate
    3-5 Oct. 2010
  • Firstpage
    473
  • Lastpage
    475
  • Abstract
    Information extraction (IE) from corpora is the analysis of texts in order to extract structured information such as Named Entities (NE), which may be names of persons, organizations, addresses, dates, locations, etc. ... GATE is a software toolkit written in Java, in development since 1995, and widely used worldwide by many communities (scientists, companies, teachers, students) for natural language processing. We experimented with GATE for extracting terms by writing new JAPE (Java Annotation Pattern Engine) rules and applying them to a tagged corpus developed at Leeds University. These terms will be used to build text-based ontologies. In our case, the ontology will be incorporated into a search engine to expand Web queries in the specified domain.
  • Keywords
    Java; natural language processing; ontologies (artificial intelligence); query processing; search engines; software tools; text analysis; Arabic collocation; GATE; Jape rules; Java annotation pattern engine; information extraction; named entity; ontology; search engine; software toolkit; tagged corpus; texts analysis; Computer architecture; Data mining; Grammar; Logic gates; Ontologies; Software; Transducers; Collocation extraction; JAPE; NLP; Textual engineering
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine and Web Intelligence (ICMWI), 2010 International Conference on
  • Conference_Location
    Algiers
  • Print_ISBN
    978-1-4244-8608-3
  • Type

    conf

  • DOI
    10.1109/ICMWI.2010.5648038
  • Filename
    5648038