Title :
A combined statistical query term disambiguation in cross-language information retrieval
Author :
Sadat, Fatiha ; Maeda, Akira ; Yoshikawa, Masatoshi ; Uemura, Shunsuke
Author_Institution :
Graduate Sch. of Inf. Sci., Nara Inst. of Sci. & Technol. (NAIST), Japan
Abstract :
The diversity of information sources and the explosive growth of the Internet worldwide are compelling evidence of the need for information retrieval that can cross language boundaries. In dictionary-based Cross-Language Information Retrieval (CLIR), ambiguity arising from failures in query translation is one of the major causes of large drops in effectiveness relative to monolingual retrieval. In this paper, we focus on query translation and disambiguation in order to improve retrieval effectiveness and to dramatically reduce the errors that such an approach normally makes. We propose a combined statistical disambiguation method, applied both before and after translation, to avoid selecting wrong target translations. We tested the proposed disambiguation method by applying it to French-English information retrieval. Evaluations on the TREC data collection showed that the proposed disambiguation method is highly effective.
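The abstract does not specify which statistics the method uses. The following is therefore only a minimal sketch of a generic post-translation step (choosing among dictionary candidates by target-side co-occurrence), a common technique in dictionary-based CLIR and not necessarily the authors' combined before-and-after method. It assumes a hypothetical bilingual dictionary `bilingual_dict` and a hypothetical association table `assoc` (for example, co-occurrence counts from an English corpus). All function names and the toy data are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical symmetric association table between English target terms,
# e.g. co-occurrence counts estimated from a target-language corpus.
Assoc = Dict[Tuple[str, str], float]


def assoc_score(a: str, b: str, assoc: Assoc) -> float:
    """Look up the association between two target terms in either order."""
    return assoc.get((a, b), assoc.get((b, a), 0.0))


def candidate_score(cand: str, term: str, query: List[str],
                    bilingual_dict: Dict[str, List[str]], assoc: Assoc) -> float:
    """Sum, over the other query terms, of the best association between
    `cand` and any translation candidate of that other term."""
    total = 0.0
    for other in query:
        if other == term:
            continue
        others = bilingual_dict.get(other, [])
        total += max((assoc_score(cand, o, assoc) for o in others), default=0.0)
    return total


def select_translations(query: List[str],
                        bilingual_dict: Dict[str, List[str]],
                        assoc: Assoc) -> Dict[str, str]:
    """Pick one translation per source term; ties keep dictionary order."""
    chosen: Dict[str, str] = {}
    for term in query:
        candidates = bilingual_dict.get(term, [])
        if not candidates:
            continue  # untranslatable term: left out of the target query
        chosen[term] = max(
            candidates,
            key=lambda c: candidate_score(c, term, query, bilingual_dict, assoc),
        )
    return chosen


if __name__ == "__main__":
    # Toy data (invented for illustration): "avocat" is ambiguous
    # between "avocado" and "lawyer"; the context term "tribunal" helps.
    d = {"avocat": ["avocado", "lawyer"], "tribunal": ["court", "tribunal"]}
    a = {("lawyer", "court"): 12.0, ("avocado", "court"): 0.5,
         ("lawyer", "tribunal"): 3.0}
    print(select_translations(["avocat", "tribunal"], d, a))
```

With the toy data, the script prints `{'avocat': 'lawyer', 'tribunal': 'court'}`. A disambiguation step on the source side before translation, which the abstract also mentions, is not modeled here.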
Keywords :
computational linguistics; language translation; natural languages; query processing; statistical analysis; French-English information retrieval; Internet; TREC data collection; combined statistical query term disambiguation; cross-language information retrieval; dictionary-based method; query disambiguation; query translation; Availability; Computer science; Dictionaries; Electronic mail; Explosives; Informatics; Information retrieval; Information science; Internet; Testing;
Conference_Title :
Database and Expert Systems Applications, 2002. Proceedings. 13th International Workshop on
Print_ISBN :
0-7695-1668-8
DOI :
10.1109/DEXA.2002.1045907