• DocumentCode
    2717917
  • Title

    An interactive system for extracting Arabic lexicon from Arabic newspaper text

  • Author

    Ben Halima, Mohamed ; Alimi, Adel M.

  • Author_Institution
    High Sch. of Nat. Eng. of Sfax, Sfax
  • fYear
    2008
  • fDate
    16-18 Dec. 2008
  • Firstpage
    678
  • Lastpage
    682
  • Abstract
    We describe how to build a large, comprehensive, integrated Arabic lexicon by automatically parsing newspaper text. We have built a parser system that reads Arabic newspaper articles, isolates their tokens, and finds the part of speech and the features of each token. To achieve this goal, we designed a set of algorithms, generated several sets of rules, and developed a set of techniques together with the components that carry them out. As each sentence is processed, new words and features are added to the lexicon, so that it grows continuously as the system runs. To test the system, we used 75 articles (7,108 words) from the ASSAHAFA newspaper. The system consists of several modules: the tokenizer module, which isolates the tokens; the type finder system, which finds the part of speech of each token; the proper noun phrase parser module, which marks the proper nouns and discovers some information about them; and the feature finder module, which finds the features of the words.
  • Keywords
    feature extraction; grammars; interactive systems; natural language processing; text analysis; ASSAHAFA newspaper; Arabic lexicon extraction; Arabic newspaper text; automatic parsing; feature finder module; interactive system; noun phrase parser module; tokenizer module; Algorithm design and analysis; Buildings; Data mining; Educational institutions; Interactive systems; Natural language processing; Natural languages; Speech; Spine; System testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovations in Information Technology, 2008. IIT 2008. International Conference on
  • Conference_Location
    Al Ain
  • Print_ISBN
    978-1-4244-3396-4
  • Electronic_ISBN
    978-1-4244-3397-1
  • Type

    conf

  • DOI
    10.1109/INNOVATIONS.2008.4781719
  • Filename
    4781719