A web statistics based conflation approach to improve Arabic text retrieval

Author

Ahmed, Farag ; Nürnberger, Andreas

Author_Institution

Data & Knowledge Eng. Group, Otto-von-Guericke-Univ. of Magdeburg, Magdeburg, Germany

fYear

2011

fDate

18-21 Sept. 2011

Firstpage

3

Lastpage

9

Abstract

We present a language-independent approach to conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method enhances the pure n-gram model, grouping related words by means of a revised string-similarity measure. To detect and eliminate terms that this process creates but that are most likely not relevant to the query ("noisy terms"), we propose an approach based on mutual information scores computed from web co-occurrence statistics. An evaluation of this approach is also presented.
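
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the revised string-similarity measure, so a plain Dice coefficient over character bigrams stands in for it, and the pointwise mutual information function assumes that co-occurrence counts (for example, web hit counts) have already been obtained. All function names, parameters, and the threshold value are hypothetical.

```python
import math


def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient over character n-gram sets.

    Assumption: a stand-in for the paper's revised measure,
    which the abstract does not specify.
    """
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def conflate(query, vocabulary, sim_threshold=0.5):
    """Group vocabulary words whose similarity to the query meets a threshold.

    The threshold 0.5 is an illustrative value, not one from the paper.
    """
    return [
        w for w in vocabulary
        if w != query and dice_similarity(query, w) >= sim_threshold
    ]


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information (base 2) from raw counts.

    count_xy: documents containing both terms
    count_x, count_y: documents containing each term
    total: assumed size of the document collection
    """
    if min(count_xy, count_x, count_y) == 0:
        return float("-inf")
    return math.log2((count_xy * total) / (count_x * count_y))
```

In such a sketch, candidates returned by `conflate` would then be scored with `pmi` against the other query terms, and candidates with low scores would be treated as noisy terms and dropped. How the counts are gathered and where the cut-off lies are left open here, since the abstract does not state them.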

Keywords

information retrieval; learning (artificial intelligence); natural language processing; statistical analysis; text analysis; Arabic text retrieval; Web statistics; conflation approach; language independent approach; mutual information scores; pure n-gram model; string-similarity measure; unsupervised method; Computational modeling; Computer science; Dictionaries; Morphology; Mutual information; Noise measurement;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Science and Information Systems (FedCSIS), 2011 Federated Conference on

Conference_Location

Szczecin

Print_ISBN

978-1-4577-0041-5

Electronic_ISBN

978-83-60810-35-4

Type

conf

Filename

6078293