• DocumentCode
    3309165
  • Title

    Sentence Splitting for Vietnamese-English Machine Translation

  • Author

    Hung, Bui Thanh ; Minh, Nguyen Le ; Shimazu, Akira

  • Author_Institution
    Grad. Sch. of Inf. Sci., Japan Adv. Inst. of Sci. & Technol., Ishikawa, Japan
  • fYear
    2012
  • fDate
    17-19 Aug. 2012
  • Firstpage
    156
  • Lastpage
    160
  • Abstract
    Translation quality is often disappointing when a phrase-based machine translation system deals with long sentences. Because of the syntactic structure discrepancy between the two languages, the translation output will not preserve the same word order as the source. When a sentence is long, it should be partitioned into several clauses, and word reordering in the translation should be done within clauses, not between clauses. In this paper, a rule-based technique is proposed to split long Vietnamese sentences based on linguistic information. We use splitting boundaries for translating sentences with two types of constraints: wall and zone. This method is useful for preserving word order and improving translation quality. We describe experiments on translation from Vietnamese to English, showing an improvement in BLEU and NIST scores.
  • Keywords
    knowledge based systems; language translation; BLEU score improvement; NIST score improvement; Vietnamese-English machine translation; linguistic information; phrase based machine translation system; rule-based technique; sentence splitting; sentence translation; splitting boundaries; syntactic structure discrepancy; translation quality improvement; word order preservation; word reordering; Barium; Computational modeling; Context; Decoding; NIST; Pragmatics; Training; phrase-based machine translation; rule-based sentence splitting; wall and zone constraints;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge and Systems Engineering (KSE), 2012 Fourth International Conference on
  • Conference_Location
    Danang
  • Print_ISBN
    978-1-4673-2171-6
  • Type

    conf

  • DOI
    10.1109/KSE.2012.28
  • Filename
    6299413