DocumentCode :
3317926
Title :
Using the Web corpus to translate the queries in cross-lingual information retrieval
Author :
Zhang, Junlin ; Sun, Le ; Min, Jinming
Author_Institution :
Open Syst. & Chinese Inf. Process. Center, Chinese Acad. of Sci., Beijing, China
fYear :
2005
fDate :
30 Oct.-1 Nov. 2005
Firstpage :
493
Lastpage :
498
Abstract :
Accurate cross-language information retrieval (CLIR) requires that query terms be translated correctly. In this paper, we propose a new method for Web-corpus-based query translation that consists of two steps: (1) translation candidate extraction and (2) translation selection. In translation candidate extraction, we submit the source-language query to a search engine to retrieve Web text that contains target-language material. The candidate translations are expected to appear in both the title and the query-biased summary of the retrieved documents. We then take the substrings shared by different title pairs (or title-summary pairs) to pin down the possible translations. In translation selection, we choose the translation(s) from the candidates with a ranking function that combines substring frequency, inverse translation frequency, and a top-result-preferred factor. Experimental results indicate that the top-3 inclusion rate of translations is 75.57% and that the method is also effective in a CLIR task.
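The two steps in the abstract can be illustrated with a minimal sketch. The abstract does not give the ranking formula, so everything below is an assumption made for illustration: the function names, the use of the longest common substring per title pair, the IDF-like form of inverse translation frequency, and the 1/(1 + rank) form of the top-result preference are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def extract_candidates(query, texts, min_len=2):
    """Count the longest common substring of every pair of result texts."""
    # Assumption: blank out the source-language query so it cannot match itself.
    cleaned = [t.replace(query, " ") for t in texts]
    counts = Counter()
    for a, b in combinations(cleaned, 2):
        m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
            0, len(a), 0, len(b))
        sub = a[m.a:m.a + m.size].strip()
        if len(sub) >= min_len:
            counts[sub] += 1
    return counts


def rank_candidates(counts, texts, queries_per_candidate, n_queries):
    """Order candidates by an assumed product of the three factors."""
    scores = {}
    for cand, freq in counts.items():
        # Inverse translation frequency (assumed IDF-like form): a string that
        # shows up as a candidate for many queries is less likely a translation.
        itf = math.log(1 + n_queries / queries_per_candidate.get(cand, 1))
        # Top-result preference (assumed form): earlier results weigh more.
        first = min((i for i, t in enumerate(texts) if cand in t),
                    default=len(texts))
        scores[cand] = freq * itf / (1 + first)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up search-result titles for a Chinese query (illustrative data only).
query = "信息检索"
titles = [
    "信息检索 Information Retrieval 简介",
    "Information Retrieval (信息检索) 课程",
    "什么是信息检索 - Information Retrieval 入门",
]
counts = extract_candidates(query, titles)
ranked = rank_candidates(counts, titles, queries_per_candidate={}, n_queries=100)
```

Here every title pair shares the same English string, so it is the only candidate and is ranked first. With real search results the pairwise intersections would also produce noise such as site names, which is what the frequency, inverse translation frequency, and top-result factors are meant to suppress.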
Keywords :
Internet; language translation; linguistics; natural languages; query formulation; search engines; string matching; Web corpus; cross-lingual information retrieval; inverse translation frequency; query translation candidate extraction; query translation selection; query-biased summary; search engine; substring frequency; title-summary pairs; Data mining; Diversity reception; Frequency; Information processing; Information retrieval; Natural languages; Open systems; Search engines; Web pages
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
Print_ISBN :
0-7803-9361-9
Type :
conf
DOI :
10.1109/NLPKE.2005.1598787
Filename :
1598787