Regional Information Center for Science and Technology - A Cross-Lingual Word Kernel SVM for SMT Training Corpus Selection

DocumentCode :

495784

Title :

A Cross-Lingual Word Kernel SVM for SMT Training Corpus Selection

Author :

Han, Xiwu

Author_Institution :

Sch. of Comput. Sci. & Technol., Heilongjiang Univ., Harbin, China

Volume :

fYear :

2009

fDate :

March 31 - April 2, 2009

Firstpage :

626

Lastpage :

630

Abstract :

Instead of collecting ever more parallel training corpora, this paper aims to improve SMT performance by exploiting the full potential of existing parallel corpora. Inspired by the mechanism of string subsequence and word sequence kernels, we first propose a cross-lingual word kernel (CWK) SVM to classify SMT training corpora as literal or free translation, and then use the classified data to train SMT models. One experiment indicates that a larger training corpus does not always lead to higher decoding performance when the incremental data are not literal translations. Another experiment shows that properly enlarging the contribution of literal translations can improve SMT performance significantly.

Keywords :

computational linguistics; language translation; support vector machines; SMT training corpus; cross-lingual word kernel SVM; decoding; free translation; literal translation; statistical machine translation; string subsequence; word sequence kernels; Computer science; Decoding; Frequency estimation; Humans; Kernel; Probability; Support vector machine classification; Support vector machines; Surface-mount technology; Training data; Cross-lingual; SMT; Word Kernel SVM;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer Science and Information Engineering, 2009 WRI World Congress on

Conference_Location :

Los Angeles, CA

Print_ISBN :

978-0-7695-3507-4

Type :

conf

DOI :

10.1109/CSIE.2009.278

Filename :

5171414

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=495784