DocumentCode
1601904
Title
Translation table compression under End-Tagged Dense Code
Author
Valencia, Tito ; Cerdeira, Lorena O. ; Iglesias, Eva L. ; Rodríguez, Francisco J.
Author_Institution
Dept. of Comput. Sci., Univ. of Vigo, Ourense, Spain
fYear
2010
Firstpage
306
Lastpage
311
Abstract
In recent years, the quality of phrase-based statistical machine translation has increased dramatically, partly because of the significant growth in available parallel corpora. In terms of space, however, this advantage becomes a disadvantage: a larger parallel corpus implies an exponential increase in the size of the translation tables. Existing solutions reduce the size of the translation tables by limiting the length of the sentences incorporated into them. This saves space, but at the risk of worsening the translation of long sentences. In this paper, we propose compressing the phrase-based translation tables by using End-Tagged Dense Code to encode the phrases in the source and target languages. This technique reduces the size of the translation tables and therefore makes it possible to add longer sentences.
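For readers unfamiliar with End-Tagged Dense Code, the following is a minimal Python sketch of the byte-oriented code as it is commonly described: words are ranked by frequency, and each codeword is a byte sequence whose last byte has its high bit set while all earlier bytes have it clear. The abstract does not specify how the authors lay out the translation tables, so the function names (`etdc_encode`, `build_vocab`, `encode_phrase`, and so on) and the whitespace tokenization are assumptions made only for illustration.

```python
from collections import Counter


def etdc_encode(rank):
    """Return the ETDC codeword (bytes) for a 0-based frequency rank."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    out = [128 + rank % 128]       # last byte: high bit set marks the end
    rank = rank // 128 - 1
    while rank >= 0:
        out.append(rank % 128)     # earlier bytes: high bit clear
        rank = rank // 128 - 1
    return bytes(reversed(out))


def etdc_decode(code):
    """Invert etdc_encode for a single complete codeword."""
    rank = 0
    for b in code[:-1]:
        rank = (rank + b + 1) * 128
    return rank + code[-1] - 128


def build_vocab(phrases):
    """Hypothetical helper: words sorted by decreasing frequency."""
    counts = Counter(w for p in phrases for w in p.split())
    return sorted(counts, key=lambda w: (-counts[w], w))


def encode_phrase(phrase, word_rank):
    """Concatenate the codewords of the words in a phrase."""
    return b"".join(etdc_encode(word_rank[w]) for w in phrase.split())


def decode_phrase(data, vocab):
    """Split the byte stream at end-tagged bytes and map ranks to words."""
    words, start = [], 0
    for i, b in enumerate(data):
        if b >= 128:               # high bit set: the codeword ends here
            words.append(vocab[etdc_decode(data[start:i + 1])])
            start = i + 1
    return " ".join(words)
```

Under this scheme the 128 most frequent words take one byte each and the next 16,384 take two, so a phrase stored as codewords is typically much shorter than its text form, and it can be decoded without any separator between words.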
Keywords
language translation; statistical analysis; end-tagged dense code; phrase-based statistical machine translation quality; source language; target language; translation table compression; translation table size reduction; Computer science; Decoding; Electronic mail; Encoding; Humans; Natural languages; Vocabulary;
fLanguage
English
Publisher
IEEE
Conference_Titel
Universal Communication Symposium (IUCS), 2010 4th International
Conference_Location
Beijing
Print_ISBN
978-1-4244-7821-7
Type
conf
DOI
10.1109/IUCS.2010.5666012
Filename
5666012