• DocumentCode
    1601904
  • Title
    Translation table compression under End-Tagged Dense Code
  • Author
    Valencia, Tito; Cerdeira, Lorena O.; Iglesias, Eva L.; Rodríguez, Francisco J.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Vigo, Ourense, Spain
  • fYear
    2010
  • Firstpage
    306
  • Lastpage
    311
  • Abstract
    In recent years, the quality of Phrase-Based Statistical Machine Translation has increased dramatically, partly owing to the significant growth in available parallel corpora. In terms of space, however, this advantage becomes a disadvantage, because larger parallel corpora imply an exponential increase in the size of the translation tables. Some existing solutions reduce the size of the translation tables by limiting the length of the sentences incorporated into them. This reduces the space required, but at the cost of a greater risk of degrading the translation of long sentences. In this paper, we propose compressing phrase-based translation tables by using End-Tagged Dense Code to encode the phrases in the source and target languages. This technique reduces the size of the translation tables and therefore makes it possible to add longer sentences.
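    End-Tagged Dense Code assigns shorter byte codewords to more frequent symbols and marks the last byte of every codeword by setting its high bit, so codewords can be concatenated and still decoded without separators. The Python sketch below illustrates that general scheme only; it is not the authors' implementation. It assumes, as a hypothetical setup, that the vocabulary is the set of words occurring in the table's phrases, ranked by decreasing frequency, and that each phrase is stored as the concatenation of its words' codewords. The helper names (`etdc_encode`, `etdc_decode`, `build_vocabulary`, `compress_phrase`, `decompress_phrase`) are illustrative.

```python
from collections import Counter


def etdc_encode(rank: int) -> bytes:
    """Encode a 0-based frequency rank as an End-Tagged Dense Code codeword.

    Non-final bytes take values 0..127; the final byte has its high bit set
    (128..255). Ranks 0..127 get 1 byte, the next 128**2 ranks get 2 bytes, etc.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    out = [128 + rank % 128]  # tagged (final) byte
    rank //= 128
    while rank > 0:
        rank -= 1                  # dense: no byte pattern is wasted
        out.insert(0, rank % 128)  # untagged (continuation) byte
        rank //= 128
    return bytes(out)


def etdc_decode(data: bytes) -> list[int]:
    """Decode a concatenation of ETDC codewords back into ranks."""
    ranks, q = [], 0
    for b in data:
        if b < 128:                # continuation byte: accumulate
            q = q * 128 + b + 1
        else:                      # tagged byte ends the codeword
            ranks.append(q * 128 + (b - 128))
            q = 0
    return ranks


def build_vocabulary(phrases):
    """Hypothetical: rank words by decreasing frequency (ties alphabetical)."""
    counts = Counter(w for p in phrases for w in p.split())
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: r for r, w in enumerate(ordered)}, ordered


def compress_phrase(phrase, vocab):
    """Store a phrase as the concatenated codewords of its words."""
    return b"".join(etdc_encode(vocab[w]) for w in phrase.split())


def decompress_phrase(data, words):
    """Recover the phrase from its codewords using the rank-to-word list."""
    return " ".join(words[r] for r in etdc_decode(data))
```

    For example, with the phrases `["the house", "the green house", "a house"]`, `house` is the most frequent word and receives rank 0, so each word here gets a one-byte codeword and `compress_phrase("the green house", vocab)` yields three bytes. Because only the last byte of a codeword is tagged, a decoder can locate codeword boundaries directly in the compressed stream.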
  • Keywords
    language translation; statistical analysis; end-tagged dense code; phrase-based statistical machine translation quality; source language; target language; translation table compression; translation table size reduction; Computer science; Decoding; Electronic mail; Encoding; Humans; Natural languages; Vocabulary;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Universal Communication Symposium (IUCS), 2010 4th International
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-7821-7
  • Type
    conf
  • DOI
    10.1109/IUCS.2010.5666012
  • Filename
    5666012