DocumentCode
2717917
Title
An Interactive System for Extracting Arabic Lexicon from Arabic Newspaper Text
Author
Ben Halima, Mohamed ; Alimi, Adel M.
Author_Institution
High Sch. of Nat. Eng. of Sfax, Sfax
fYear
2008
fDate
16-18 Dec. 2008
Firstpage
678
Lastpage
682
Abstract
We describe how to build a large, comprehensive, integrated Arabic lexicon by automatically parsing newspaper text. We have built a parser system that reads Arabic newspaper articles, isolates their tokens, and determines the part of speech and the features of each token. To achieve this goal, we designed a set of algorithms, generated several sets of rules, and developed a set of techniques together with the components that carry them out. As each sentence is processed, new words and features are added to the lexicon, so the lexicon grows continuously as the system runs. To test the system, we used 75 articles (7,108 words) from the ASSAHAFA newspaper. The system consists of several modules: the tokenizer module, which isolates the tokens; the type finder system, which finds the part of speech of each token; the proper noun phrase parser module, which marks proper nouns and discovers some information about them; and the feature finder module, which finds the features of the words.
Keywords
feature extraction; grammars; interactive systems; natural language processing; text analysis; ASSAHAFA newspaper; Arabic lexicon extraction; Arabic newspaper text; automatic parsing; feature finder module; interactive system; noun phrase parser module; tokenizer module; Algorithm design and analysis; Buildings; Data mining; Educational institutions; Interactive systems; Natural language processing; Natural languages; Speech; Spine; System testing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Innovations in Information Technology, 2008. IIT 2008. International Conference on
Conference_Location
Al Ain
Print_ISBN
978-1-4244-3396-4
Electronic_ISBN
978-1-4244-3397-1
Type
conf
DOI
10.1109/INNOVATIONS.2008.4781719
Filename
4781719