DocumentCode
170325
Title
Improved Chinese-Japanese phrase-based MT quality using an extended quasi-parallel corpus
Author
Hao Wang ; Wei Yang ; Lepage, Y.
Author_Institution
Sch. of Comput. Eng. & Technol., Shanghai Univ., Shanghai, China
fYear
2014
fDate
16-18 May 2014
Firstpage
6
Lastpage
10
Abstract
State-of-the-art phrase-based machine translation (MT) systems usually require large parallel corpora for training. Both the quality and the quantity of the training data directly affect the performance of such systems. When open-source bilingual corpora are lacking for a particular language pair, the translation scores reported for that pair are lower. This is the case for Chinese-Japanese. In this paper, we propose to extend an initial parallel corpus with quasi-parallel sentences instead of adding new parallel sentences. The extension is obtained using monolingual analogical associations. Our experiments show that using such quasi-parallel corpora improves the performance of Chinese-Japanese translation systems.
Keywords
language translation; natural language processing; Chinese-Japanese phrase-based MT quality; Chinese-Japanese translation systems; extended quasiparallel corpus; monolingual analogical associations; open-source bilingual corpora; phrase-based machine translation systems; quasiparallel sentences; Computational linguistics; Educational institutions; Hidden Markov models; Mathematical model; Natural language processing; Training; Training data; analogy; machine translation; paraphrasing; quasi-parallel data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Progress in Informatics and Computing (PIC), 2014 International Conference on
Conference_Location
Shanghai
Print_ISBN
978-1-4799-2033-4
Type
conf
DOI
10.1109/PIC.2014.6972285
Filename
6972285