  • DocumentCode
    170325
  • Title
    Improved Chinese-Japanese phrase-based MT quality using an extended quasi-parallel corpus
  • Author
    Hao Wang ; Wei Yang ; Lepage, Y.
  • Author_Institution
    Sch. of Comput. Eng. & Technol., Shanghai Univ., Shanghai, China
  • fYear
    2014
  • fDate
    16-18 May 2014
  • Firstpage
    6
  • Lastpage
    10
  • Abstract
    State-of-the-art phrase-based machine translation (MT) systems usually require large parallel corpora for training. The quality and quantity of the training data directly influence the performance of such systems. A lack of open-source bilingual corpora for a given language pair results in lower reported translation scores for that pair. This is the case for Chinese-Japanese. In this paper, we propose extending an initial parallel corpus with quasi-parallel sentences instead of adding new parallel sentences. The extension is obtained using monolingual analogical associations. Our experiments show that using such quasi-parallel corpora improves the performance of Chinese-Japanese translation systems.
  • Keywords
    language translation; natural language processing; Chinese-Japanese phrase-based MT quality; Chinese-Japanese translation systems; extended quasiparallel corpus; monolingual analogical associations; open-source bilingual corpora; phrase-based machine translation systems; quasiparallel sentences; Computational linguistics; Educational institutions; Hidden Markov models; Mathematical model; Natural language processing; Training; Training data; analogy; machine translation; paraphrasing; quasi-parallel data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Progress in Informatics and Computing (PIC), 2014 International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4799-2033-4
  • Type
    conf
  • DOI
    10.1109/PIC.2014.6972285
  • Filename
    6972285