DocumentCode :
3545549
Title :
Exploiting Non-Parallel Corpora for Statistical Machine Translation
Author :
Cuong Hoang ; Le Anh Cuong ; Nguyen Phuong Thai ; Ho Tu Bao
Author_Institution :
Univ. of Eng. & Technol., Hanoi, Vietnam
fYear :
2012
fDate :
Feb. 27 2012-March 1 2012
Firstpage :
1
Lastpage :
6
Abstract :
Constructing a corpus of parallel sentence pairs is an important task in building a Statistical Machine Translation system, and it strongly affects the translation quality the system can achieve. The more parallel sentence pairs are used to train the system, the better its translation quality. Comparable non-parallel corpora have become important resources for alleviating the scarcity of parallel corpora. The problem is how to extract parallel sentence pairs automatically yet accurately from comparable non-parallel corpora, which are usually very "noisy". This paper presents how a reinforcement-learning scheme can be applied together with our newly proposed algorithm for detecting parallel sentence pairs. We show that, starting from an initial set of parallel sentences in one domain, the proposed model can extract a large number of new parallel sentence pairs from non-parallel corpora in different domains, while gradually increasing the system's translation ability.
Keywords :
language translation; learning (artificial intelligence); nonparallel corpora; parallel sentence pair detection; reinforcement learning scheme; statistical machine translation system; Electronic publishing; Encyclopedias; Error analysis; Internet; Length measurement; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2012 IEEE RIVF International Conference on
Conference_Location :
Ho Chi Minh City
Print_ISBN :
978-1-4673-0307-1
Type :
conf
DOI :
10.1109/rivf.2012.6169833
Filename :
6169833