DocumentCode
1930802
Title
Automatic extraction of Arabic multi-word terms
Author
Al Khatib, K. ; Badarneh, Amer
Author_Institution
Dept. of Comput. Sci., Jordan Univ. of Sci. & Technol., Irbid, Jordan
fYear
2010
fDate
18-20 Oct. 2010
Firstpage
411
Lastpage
418
Abstract
Whereas a wide range of methods has been applied to English multi-word term (MWT) extraction, relatively few studies have addressed Arabic MWT extraction. In this paper, we present an efficient approach for the automatic extraction of Arabic MWTs. The approach relies on two main filtering steps: a linguistic filter, in which a simple part-of-speech (POS) tagger is used to extract candidate MWTs matching given syntactic patterns, and a statistical filter, in which two statistical measures (the log-likelihood ratio and C-value) are used to rank the candidate MWTs. Several types of variation (e.g., inflectional variants) are taken into account to improve the quality of the extracted MWTs. Experiments on an environment-domain corpus gave promising results in both coverage and precision of MWT extraction.
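The two filtering steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives neither the syntactic patterns, the tagset, nor how the two scores are combined, so `PATTERNS`, the coarse tags `NOUN`/`ADJ`, and all function names here are hypothetical. The log-likelihood ratio is Dunning's G² statistic over a 2x2 bigram contingency table, and C-value follows the usual definition: log2 of the term length times its frequency, discounted by the mean frequency of longer candidates that contain it.

```python
import math
from collections import Counter

# Hypothetical syntactic patterns over coarse POS tags; the paper's actual
# pattern set and tagset are not given in the abstract.
PATTERNS = {("NOUN", "NOUN"), ("NOUN", "ADJ"), ("NOUN", "NOUN", "ADJ")}


def extract_candidates(tagged, patterns=PATTERNS):
    """Linguistic filter: count word n-grams whose tag sequence matches a pattern.

    `tagged` is a list of (word, tag) pairs produced by any POS tagger.
    """
    counts = Counter()
    for n in {len(p) for p in patterns}:
        for i in range(len(tagged) - n + 1):
            window = tagged[i:i + n]
            if tuple(tag for _, tag in window) in patterns:
                counts[tuple(word for word, _ in window)] += 1
    return counts


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table of a word pair."""
    n = k11 + k12 + k21 + k22
    table = ((k11, k12), (k21, k22))
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # cells with 0 observations contribute 0
                expected = rows[i] * cols[j] / n
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def bigram_llr(tokens):
    """Score every adjacent word pair in a token sequence by G^2."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])   # w as the left element of a bigram
    second = Counter(tokens[1:])   # w as the right element of a bigram
    n = len(tokens) - 1            # total number of bigram positions
    scores = {}
    for (w1, w2), k11 in bigrams.items():
        k12 = first[w1] - k11      # w1 followed by something other than w2
        k21 = second[w2] - k11     # w2 preceded by something other than w1
        k22 = n - k11 - k12 - k21
        scores[(w1, w2)] = log_likelihood_ratio(k11, k12, k21, k22)
    return scores


def is_nested(shorter, longer):
    """True if `shorter` occurs as a contiguous sub-sequence of a longer term."""
    m = len(shorter)
    return m < len(longer) and any(
        longer[i:i + m] == shorter for i in range(len(longer) - m + 1)
    )


def c_value(freqs):
    """C-value for candidate terms (tuples of >= 2 words) mapped to frequencies."""
    scores = {}
    for term, f in freqs.items():
        containers = [b for b in freqs if is_nested(term, b)]
        weight = math.log2(len(term))
        if containers:
            # Discount by the mean frequency of longer candidates containing it.
            f -= sum(freqs[b] for b in containers) / len(containers)
        scores[term] = weight * f
    return scores
```

In a pipeline, `extract_candidates` would run on the tagged corpus first, and the candidates would then be ranked by `c_value` (any length) or by looking up `bigram_llr` scores for two-word candidates. Handling inflectional variants, which the abstract mentions, would require normalizing words (e.g., to lemmas) before counting; that step is not shown here.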
Keywords
feature extraction; information filtering; natural language processing; statistical analysis; MWT; POS; Arabic multiword terms; automatic extraction; linguistic filter; log likelihood ratio; multiword terms; part of speech; statistical filter; syntactic patterns; Computer science; Information technology; Syntactics;
fLanguage
English
Publisher
ieee
Conference_Title
Computer Science and Information Technology (IMCSIT), Proceedings of the 2010 International Multiconference on
Conference_Location
Wisla
ISSN
2157-5525
Print_ISBN
978-1-4244-6432-6
Type
conf
DOI
10.1109/IMCSIT.2010.5679929
Filename
5679929