DocumentCode :
2717917
Title :
An interactive system for extracting Arabic lexicon from Arabic newspaper text
Author :
Ben Halima, Mohamed ; Alimi, Adel M.
Author_Institution :
High Sch. of Nat. Eng. of Sfax, Sfax
fYear :
2008
fDate :
16-18 Dec. 2008
Firstpage :
678
Lastpage :
682
Abstract :
We describe how to build a large, comprehensive, integrated Arabic lexicon by automatically parsing newspaper text. We have built a parser system that reads Arabic newspaper articles, isolates their tokens, and finds the part of speech and the features of each token. To achieve this goal, we designed a set of algorithms, generated several sets of rules, and developed a set of techniques together with the components that carry them out. As each sentence is processed, new words and features are added to the lexicon, so the lexicon grows continuously while the system runs. To test the system, we used 75 articles (7,108 words) from the ASSAHAFA newspaper. The system consists of several modules: the tokenizer module, which isolates the tokens; the type finder module, which finds the part of speech of each token; the proper noun phrase parser module, which marks proper nouns and extracts some information about them; and the feature finder module, which finds the features of the words.
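The module pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their rules, so every rule set here (the preposition list, the definite-article cue "ال", the trigger words for proper nouns, the ta marbuta "ة" as a feminine cue) is a hypothetical placeholder, and the interactive validation step is omitted. It only shows how tokenizer, type finder, proper noun marker, and feature finder can feed a lexicon that grows as sentences are processed.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule sets (assumptions, not taken from the paper).
DEFINITE_ARTICLE = "ال"                          # "al-": crude cue for a noun
PREPOSITIONS = {"في", "من", "إلى", "على", "عن"}    # closed-class particles
PROPER_NOUN_TRIGGERS = {"السيد", "الرئيس"}        # "Mr.", "President": next token is a proper noun
FEMININE_SUFFIX = "ة"                            # ta marbuta: crude cue for feminine gender


@dataclass
class Entry:
    pos: str
    features: dict = field(default_factory=dict)


def tokenize(sentence: str) -> list[str]:
    # Tokenizer (sketch): split on whitespace and punctuation,
    # including the Arabic comma and question mark.
    return [t for t in re.split(r"[\s،؟.,!?:;]+", sentence) if t]


def find_type(token: str) -> str:
    # Type finder (sketch): guess the part of speech from closed lists and prefixes.
    if token in PREPOSITIONS:
        return "particle"
    if token.startswith(DEFINITE_ARTICLE):
        return "noun"
    return "unknown"


def mark_proper_nouns(tokens: list[str], types: list[str]) -> list[str]:
    # Proper noun marker (sketch): a token that follows a trigger word is a proper noun.
    for i in range(1, len(tokens)):
        if tokens[i - 1] in PROPER_NOUN_TRIGGERS:
            types[i] = "proper_noun"
    return types


def find_features(token: str, pos: str) -> dict:
    # Feature finder (sketch): assign gender to nouns from the final letter only.
    if pos in ("noun", "proper_noun"):
        return {"gender": "feminine" if token.endswith(FEMININE_SUFFIX) else "masculine"}
    return {}


def process_sentence(sentence: str, lexicon: dict) -> int:
    # Run the pipeline on one sentence, add unseen words to the lexicon,
    # and return how many new entries were added.
    tokens = tokenize(sentence)
    types = mark_proper_nouns(tokens, [find_type(t) for t in tokens])
    added = 0
    for token, pos in zip(tokens, types):
        if token not in lexicon:
            lexicon[token] = Entry(pos, find_features(token, pos))
            added += 1
    return added


if __name__ == "__main__":
    lexicon: dict = {}
    print(process_sentence("قال السيد أحمد في المدينة", lexicon))  # 5 new entries
    print(process_sentence("قال السيد أحمد في المدينة", lexicon))  # 0: words already known
```

Processing the same sentence twice adds nothing the second time, which mirrors the abstract's point that the lexicon only grows with previously unseen words; a realistic system would replace these heuristics with morphological analysis and user confirmation.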
Keywords :
feature extraction; grammars; interactive systems; natural language processing; text analysis; ASSAHAFA newspaper; Arabic lexicon extraction; Arabic newspaper text; automatic parsing; feature finder module; interactive system; noun phrase parser module; tokenizer module; Algorithm design and analysis; Buildings; Data mining; Educational institutions; Interactive systems; Natural language processing; Natural languages; Speech; Spine; System testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
2008 International Conference on Innovations in Information Technology (IIT 2008)
Conference_Location :
Al Ain
Print_ISBN :
978-1-4244-3396-4
Electronic_ISBN :
978-1-4244-3397-1
Type :
conf
DOI :
10.1109/INNOVATIONS.2008.4781719
Filename :
4781719