Title :
Exploring term proximity statistic for Arabic information retrieval
Author :
El Mahdaouy, Abdelkader ; Gaussier, Eric ; El Alaoui, Said Ouatik
Author_Institution :
FSDM - LIM, Univ. USMBA, Fez, Morocco
Abstract :
Term proximity statistics, which reward documents in which the matched query terms occur close to one another, have proved effective for document retrieval. However, this line of research remains unexplored for Arabic information retrieval (IR), despite the non-diacritical text and the rich morphology of the Arabic language, which complicate the retrieval process. In this paper, we propose to boost Arabic information retrieval performance by using proximity information. Our aim is to evaluate proximity features for the Arabic language in order to go beyond bag-of-words and to overcome the problems related to text preprocessing. We investigate several state-of-the-art proximity models, including the Cross-Term model (CRTER), the Markov Random Field model (MRF), the divergence from randomness (DFR) multinomial model, and the Positional Language Model (PLM). For preprocessing, the Khoja and light stemming algorithms are used. Experiments are performed on the Arabic TREC-2001/2002 collection using the Terrier IR platform. The results show significant improvements from using proximity-based models for Arabic IR.
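The general idea of rewarding documents whose matched query terms occur close together can be illustrated with a minimal sketch. This is not an implementation of CRTER, MRF, the DFR multinomial model, or PLM, and it is not taken from the paper; the function names and the 1/d^2 weighting are assumptions chosen only for illustration, and the toy inputs are English tokens with no stemming applied.

```python
from itertools import combinations


def term_positions(tokens, query_terms):
    """Map each matched query term to the positions where it occurs."""
    positions = {}
    for i, tok in enumerate(tokens):
        if tok in query_terms:
            positions.setdefault(tok, []).append(i)
    return positions


def min_pair_distance(pos_a, pos_b):
    """Smallest absolute distance between occurrences of two distinct terms."""
    return min(abs(a - b) for a in pos_a for b in pos_b)


def proximity_bonus(tokens, query_terms):
    """Toy proximity score (illustrative assumption, not a published model):
    sum of 1 / d^2 over pairs of matched query terms, where d is the
    minimum distance between the two terms in the document."""
    positions = term_positions(tokens, set(query_terms))
    bonus = 0.0
    for t1, t2 in combinations(sorted(positions), 2):
        d = min_pair_distance(positions[t1], positions[t2])
        bonus += 1.0 / (d * d)
    return bonus


# Hypothetical toy documents: the same two query terms, near and far apart.
close_doc = "arabic information retrieval with proximity".split()
far_doc = "arabic text is hard and retrieval needs care".split()
query = ["arabic", "retrieval"]

close_score = proximity_bonus(close_doc, query)
far_score = proximity_bonus(far_doc, query)
```

In this sketch the document with the query terms two positions apart receives a larger bonus than the one with them five positions apart; in practice such a bonus would be combined with a bag-of-words relevance score rather than used alone.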
Keywords :
Markov processes; natural language processing; query processing; random processes; text analysis; Arabic IR; Arabic information retrieval performance; Arabic language; CRTER; DFR multinomial model; Khoja algorithms; MRF; Markov random field model; PLM; bag-of-words; cross-term model; divergence from randomness multinomial model; document retrieval performance; light stemming algorithms; matched query terms; nondiacritical text; positional language model; proximity features; proximity information; proximity models; retrieval process; rewarding documents; term proximity statistic; text preprocessing; Computational modeling; Electronic mail; Indexing; Information retrieval; Kernel; Markov random fields; Probabilistic logic;
Conference_Title :
Information Science and Technology (CIST), 2014 Third IEEE International Colloquium in
Conference_Location :
Tetouan
Print_ISBN :
978-1-4799-5978-5
DOI :
10.1109/CIST.2014.7016631