Title :
A CLIR-oriented OOV translation mining method from bilingual webpages
Author :
Liu, Lan ; Ge, Yun-dong ; Yan, Zhen-xiang ; Yao, Jian-min
Author_Institution :
Inf. Suzhou Key Lab. on Inf. Process., Soochow Univ., Suzhou, China
Abstract :
Translating unknown (out-of-vocabulary) terms is a major bottleneck for cross-language IR. This paper proposes a solution to three subproblems: detecting relevant webpages, extracting translations with correct boundaries, and ranking candidate translations. Topic word translations are used to expand the source query and collect bilingual search engine snippets. An improved Frequency Change Measurement method then extracts valid candidates from these noisy, small bilingual corpora. Finally, frequency-distance, surface-pattern, and phonetic features are used to select the correct translation from the candidates. Experimental results show that the method performs well on unknown term translation mining.
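The frequency-distance feature mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the paper's actual formula, so everything here is an assumption: the function name `rank_candidates`, the inverse-character-distance weighting, and the use of only the first occurrence of the source term in each snippet are all hypothetical choices made for illustration, not the authors' method.

```python
from collections import defaultdict


def rank_candidates(snippets, source_term, candidates):
    """Rank candidates by a hypothetical frequency-distance score.

    Each co-occurrence of a candidate with the source term adds
    1 / distance to the candidate's score, so candidates that appear
    often and close to the source term rank higher.
    """
    scores = defaultdict(float)
    for snippet in snippets:
        # Assumption: only the first occurrence of the source term is used.
        src_pos = snippet.find(source_term)
        if src_pos < 0:
            continue
        src_end = src_pos + len(source_term)
        for cand in candidates:
            start = snippet.find(cand)
            while start >= 0:
                # Character gap between the two strings, whichever comes first.
                if start >= src_pos:
                    gap = start - src_end
                else:
                    gap = src_pos - (start + len(cand))
                # Clamp to 1 so adjacent or overlapping strings do not divide by zero.
                scores[cand] += 1.0 / max(gap, 1)
                start = snippet.find(cand, start + 1)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A score of this kind cannot fix candidate boundaries on its own: a substring of the correct translation co-occurs at least as often as the full string, which is one reason the abstract treats boundary-aware extraction as a separate step before ranking.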
Keywords :
Internet; data mining; natural language processing; CLIR oriented OOV translation mining method; bilingual search engine snippets; bilingual webpages; frequency change measurement method; translation extraction; translation ranking; webpage detection; word translations; Cybernetics; Data mining; Frequency measurement; Machine learning; Noise; Pattern matching; Search engines; Cross-language IR; Search engine; Web mining;
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2011 International Conference on
Conference_Location :
Guilin
Print_ISBN :
978-1-4577-0305-8
DOI :
10.1109/ICMLC.2011.6016958