DocumentCode :
3121178
Title :
Learning translation models from the Web
Author :
Nie, Jian-Yun ; Chen, Jiang
Author_Institution :
Dept. d'Inf. et de Recherche Oper., Montreal Univ., Que., Canada
Volume :
4
fYear :
2002
fDate :
4-5 Nov. 2002
Firstpage :
1999
Abstract :
Query translation is the key problem in cross-language information retrieval. It can be performed by exploiting a large set of parallel texts. We describe a mining system that automatically discovers parallel pages on the Web. This system exploits existing search engines and the common characteristics in the organization of Web pages. Several large text corpora have been constructed using this system. Our experiments show that query translation using the obtained corpora can be as good as translation by high-quality machine translation systems. This study shows the feasibility of automatically building a query translation system for all the active languages on the Web.
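The abstract says only that the miner exploits search engines and common characteristics in how Web pages are organized; it does not spell out the matching rules. As a minimal sketch, assuming that parallel pages on a site differ only by a language marker in the URL path, candidate English-French pairs could be proposed as below. The marker lists, the `{LANG}` placeholder, the function names, and the length-ratio threshold are all hypothetical illustrations, not details taken from the paper.

```python
import re
from typing import Dict, List, Tuple
from urllib.parse import urlsplit

# Hypothetical URL language markers (an assumption, not the paper's list).
LANG_MARKERS: Dict[str, List[str]] = {
    "en": ["en", "eng", "english", "e"],
    "fr": ["fr", "fra", "french", "f"],
}
PLACEHOLDER = "{LANG}"


def url_key(url: str, lang: str) -> str:
    """Host plus path, with whole path tokens that are markers of `lang` replaced."""
    parts = urlsplit(url)
    # Split on separators but keep them, so the path can be rebuilt exactly.
    tokens = re.split(r"([/_.\-])", parts.path)
    markers = LANG_MARKERS[lang]
    path = "".join(PLACEHOLDER if tok.lower() in markers else tok for tok in tokens)
    return parts.netloc.lower() + path


def pair_by_url(src_urls: List[str], tgt_urls: List[str],
                src_lang: str = "en", tgt_lang: str = "fr") -> List[Tuple[str, str]]:
    """Propose (source, target) URL pairs whose normalized keys coincide."""
    index: Dict[str, List[str]] = {}
    for url in tgt_urls:
        index.setdefault(url_key(url, tgt_lang), []).append(url)
    pairs: List[Tuple[str, str]] = []
    for url in src_urls:
        key = url_key(url, src_lang)
        if PLACEHOLDER not in key:
            continue  # no language marker: cannot be paired by URL alone
        for match in index.get(key, []):
            pairs.append((url, match))
    return pairs


def length_ratio_ok(src_len: int, tgt_len: int, max_ratio: float = 2.0) -> bool:
    """Hypothetical sanity filter: translations should have comparable lengths."""
    if min(src_len, tgt_len) == 0:
        return False
    return max(src_len, tgt_len) / min(src_len, tgt_len) <= max_ratio


if __name__ == "__main__":
    en = ["http://www.example.ca/docs/en/intro.html",
          "http://www.example.ca/index_e.html",
          "http://www.example.ca/about.html"]
    fr = ["http://www.example.ca/docs/fr/intro.html",
          "http://www.example.ca/index_f.html",
          "http://www.example.ca/contact_fr.html"]
    print(pair_by_url(en, fr))
```

URL matching is cheap but only proposes candidates; a content-based check such as the length filter above would still be needed to discard false pairs before the pages are used to train a translation model.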
Keywords :
Web sites; data mining; dynamic programming; language translation; learning (artificial intelligence); probability; query processing; search engines; cross-language information retrieval; mining system; parallel Web pages; parallel texts; query translation; search engines; translation models; Buildings; Costs; Cybernetics; Dictionaries; Information retrieval; Machine learning; Natural languages; Search engines; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN :
0-7803-7508-4
Type :
conf
DOI :
10.1109/ICMLC.2002.1175387
Filename :
1175387