Title :
Translation table compression under End-Tagged Dense Code
Author :
Valencia, Tito ; Cerdeira, Lorena O. ; Iglesias, Eva L. ; Rodríguez, Francisco J.
Author_Institution :
Dept. of Comput. Sci., Univ. of Vigo, Ourense, Spain
Abstract :
In recent years, the quality of Phrase-Based Statistical Machine Translation has increased dramatically, partly due to the significant growth in the amount of available parallel corpora. In terms of space, however, this advantage becomes a disadvantage, because a larger parallel corpus implies an exponential increase in the size of the translation tables. Some existing solutions reduce the size of the translation tables by limiting the length of the sentences incorporated into them. This reduces the space required, but at the expense of increasing the likelihood of poorer translations for long sentences. In this paper, we propose compressing phrase-based translation tables by using End-Tagged Dense Code to encode the phrases in the source and target languages. This technique reduces the size of the translation tables and therefore makes it possible to include longer sentences.
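The abstract names End-Tagged Dense Code (ETDC) without describing it. As a minimal sketch, assuming word-level tokens split on whitespace and a vocabulary ranked by descending frequency (both assumptions, not details taken from this paper), the Python below shows the standard ETDC codeword assignment and how a phrase can be stored as a concatenation of word codewords. All function names (etdc_encode, etdc_decode, build_codebook, encode_phrase, decode_phrase) are hypothetical and are not the authors' implementation.

```python
from collections import Counter


def etdc_encode(rank: int) -> bytes:
    """Encode a 0-based frequency rank as an ETDC codeword.

    Each byte carries 7 payload bits; the high bit (0x80) is set only on
    the final byte, which tags the end of the codeword.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    out = [0x80 + (rank % 128)]      # tagged final byte
    x = rank // 128
    while x > 0:
        x -= 1
        out.append(x % 128)          # untagged continuation byte
        x //= 128
    return bytes(reversed(out))      # continuation bytes come first


def etdc_decode(stream: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one codeword starting at pos; return (rank, next_pos)."""
    v = 0
    while stream[pos] < 0x80:        # continuation bytes (high bit clear)
        v = v * 128 + stream[pos] + 1
        pos += 1
    rank = v * 128 + (stream[pos] - 0x80)
    return rank, pos + 1


def build_codebook(phrases):
    """Hypothetical: rank words by frequency (ties broken alphabetically)."""
    counts = Counter(w for p in phrases for w in p.split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    word_to_rank = {w: r for r, w in enumerate(ranked)}
    return word_to_rank, ranked


def encode_phrase(phrase: str, word_to_rank) -> bytes:
    # Codewords are self-delimiting, so no separator is needed.
    return b"".join(etdc_encode(word_to_rank[w]) for w in phrase.split())


def decode_phrase(data: bytes, ranked) -> str:
    words, pos = [], 0
    while pos < len(data):
        r, pos = etdc_decode(data, pos)
        words.append(ranked[r])
    return " ".join(words)


if __name__ == "__main__":
    phrases = ["the house", "the green house", "house of the"]
    word_to_rank, ranked = build_codebook(phrases)
    code = encode_phrase("the green house", word_to_rank)
    print(len(code), decode_phrase(code, ranked))
```

The 128 most frequent words receive one-byte codewords, the next 128² receive two bytes, and so on. Because only the last byte of each codeword has its high bit set, a phrase can be stored as the plain concatenation of its word codewords and still be decoded unambiguously.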
Keywords :
language translation; statistical analysis; end-tagged dense code; phrase-based statistical machine translation quality; source language; target language; translation table compression; translation table size reduction; Computer science; Decoding; Electronic mail; Encoding; Humans; Natural languages; Vocabulary;
Conference_Title :
Universal Communication Symposium (IUCS), 2010 4th International
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-7821-7
DOI :
10.1109/IUCS.2010.5666012