DocumentCode
555974
Title
A web statistics based conflation approach to improve Arabic text retrieval
Author
Ahmed, Farag ; Nürnberger, Andreas
Author_Institution
Data & Knowledge Eng. Group, Otto-von-Guericke-Univ. of Magdeburg, Magdeburg, Germany
fYear
2011
fDate
18-21 Sept. 2011
Firstpage
3
Lastpage
9
Abstract
We present a language-independent approach to conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model, which groups related words using a revised string-similarity measure. To detect and eliminate terms that this process creates but that are most likely not relevant to the query ("noisy terms"), we propose an approach based on mutual information scores computed from web co-occurrence statistics. An evaluation of this approach is also presented.
Keywords
information retrieval; learning (artificial intelligence); natural language processing; statistical analysis; text analysis; Arabic text retrieval; Web statistics; conflation approach; language independent approach; mutual information scores; pure n-gram model; string-similarity measure; unsupervised method; Computational modeling; Computer science; Dictionaries; Morphology; Mutual information; Noise measurement;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Information Systems (FedCSIS), 2011 Federated Conference on
Conference_Location
Szczecin
Print_ISBN
978-1-4577-0041-5
Electronic_ISBN
978-83-60810-35-4
Type
conf
Filename
6078293