Title :
A web statistics based conflation approach to improve Arabic text retrieval
Author :
Ahmed, Farag ; Nürnberger, Andreas
Author_Institution :
Data & Knowledge Eng. Group, Otto-von-Guericke-Univ. of Magdeburg, Magdeburg, Germany
Abstract :
We present a language-independent approach to conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method enhances the pure n-gram model, which is used to group related words by means of a revised string-similarity measure. To detect and eliminate terms that this process generates but that are most likely not relevant to the query ("noisy terms"), we propose an approach based on mutual information scores computed from web co-occurrence statistics. An evaluation of the approach is also presented.
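The two stages named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's revised similarity measure or how web counts are obtained, so everything here is an assumption: plain Dice similarity over character bigrams stands in for the revised measure, pointwise mutual information is computed from counts the caller supplies, and the function names, thresholds, and sample counts are hypothetical.

```python
import math


def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_similarity(a, b, n=2):
    """Plain Dice coefficient over character n-gram sets.

    Assumption: a stand-in for the paper's revised measure,
    which the abstract does not define.
    """
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information (base 2) from raw counts.

    The counts are assumed to be page or document counts supplied
    by the caller; no web service is queried here.
    """
    if min(count_xy, count_x, count_y) <= 0 or total <= 0:
        return float("-inf")
    return math.log2((count_xy * total) / (count_x * count_y))


def conflate(query, vocabulary, sim_threshold=0.5):
    """Stage 1: collect words whose n-gram similarity to the query is high."""
    return [w for w in vocabulary
            if w != query and dice_similarity(query, w) >= sim_threshold]


def filter_noisy(query, candidates, counts, pair_counts, total,
                 pmi_threshold=0.0):
    """Stage 2: keep candidates whose PMI with the query exceeds a threshold."""
    kept = []
    for word in candidates:
        joint = pair_counts.get((query, word), 0)
        score = pmi(joint, counts.get(query, 0), counts.get(word, 0), total)
        if score > pmi_threshold:
            kept.append(word)
    return kept


# Hypothetical data: "retrial" is similar as a string but rarely
# co-occurs with the query, so it is treated as a noisy term.
vocabulary = ["retrieve", "retrieval", "retrieved", "retrial", "index"]
counts = {"retrieve": 100, "retrieval": 100, "retrieved": 100, "retrial": 100}
pair_counts = {("retrieve", "retrieval"): 50,
               ("retrieve", "retrieved"): 40,
               ("retrieve", "retrial"): 1}
candidates = conflate("retrieve", vocabulary)
related = filter_noisy("retrieve", candidates, counts, pair_counts, total=1000)
```

In this sketch the similarity stage deliberately over-generates, and the co-occurrence stage prunes candidates that look related as strings but do not appear together with the query term. Both thresholds are illustrative and would need tuning on real data.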
Keywords :
information retrieval; learning (artificial intelligence); natural language processing; statistical analysis; text analysis; Arabic text retrieval; Web statistics; conflation approach; language independent approach; mutual information scores; pure n-gram model; string-similarity measure; unsupervised method; Computational modeling; Computer science; Dictionaries; Morphology; Mutual information; Noise measurement
Conference_Title :
Computer Science and Information Systems (FedCSIS), 2011 Federated Conference on
Conference_Location :
Szczecin
Print_ISBN :
978-1-4577-0041-5
Electronic_ISBN :
978-83-60810-35-4