  • DocumentCode
    1051873
  • Title
    HMM Word and Phrase Alignment for Statistical Machine Translation
  • Author
    Deng, Yonggang; Byrne, William
  • Author_Institution
    IBM, Yorktown Heights
  • Volume
    16
  • Issue
    3
  • fYear
    2008
  • fDate
    3/1/2008
  • Firstpage
    494
  • Lastpage
    507
  • Abstract
    Estimation and alignment procedures for word and phrase alignment hidden Markov models (HMMs) are developed for the alignment of parallel text. The development of these models is motivated by an analysis of the desirable features of IBM Model 4, one of the original and most effective models for word alignment. These models are formulated to capture the desirable aspects of Model 4 in an HMM alignment formalism. Alignment behavior is analyzed and compared to human-generated reference alignments, and the ability of these models to capture different types of alignment phenomena is evaluated. In analyzing alignment performance, Chinese-English word alignments are shown to be comparable to those of IBM Model 4 even when models are trained over large parallel texts. In translation performance, phrase-based statistical machine translation systems based on these HMM alignments can equal and exceed systems based on Model 4 alignments, and this is shown in Arabic-English and Chinese-English translation. These alignment models can also be used to generate posterior statistics over collections of parallel text, and this is used to refine and extend phrase translation tables with a resulting improvement in translation quality.
  • Keywords
    hidden Markov models; language translation; natural language processing; text analysis; word processing; Arabic-English translation; Chinese-English word alignments; Model 4 alignments; parallel text; phrase alignment; statistical machine translation; word alignment
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Audio, Speech, and Language Processing
  • Publisher
    IEEE
  • ISSN
    1558-7916
  • Type
    jour
  • DOI
    10.1109/TASL.2008.916056
  • Filename
    4443885
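The abstract builds on the HMM alignment formalism for parallel text. As a rough illustration of that formalism only, the following is a minimal sketch, assuming a generic first-order (Vogel-style) HMM word aligner. In this sketch the hidden state for each target word is the position of the source word it aligns to, transitions depend only on the jump width, the initial distribution is uniform, and there is no NULL state. It is not the word and phrase alignment model developed in the paper. The function `viterbi_align`, its parameters `t_prob`, `jump_prob` and `floor`, and the toy probabilities are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sketch of a first-order (Vogel-style) HMM word aligner.
# Assumptions: no NULL state, uniform initial distribution, and jump
# probabilities that depend only on the jump width. This is NOT the
# word and phrase alignment model developed in the paper.


def viterbi_align(
    src: List[str],
    tgt: List[str],
    t_prob: Dict[Tuple[str, str], float],
    jump_prob: Dict[int, float],
    floor: float = 1e-9,
) -> List[int]:
    """Return, for each target position j, the source position a_j aligned to it."""
    if not src or not tgt:
        return []
    n_src = len(src)

    def log_t(f: str, e: str) -> float:
        # Lexical translation probability t(f | e), floored to avoid log(0).
        return math.log(max(t_prob.get((f, e), 0.0), floor))

    def log_jump(i_prev: int, i: int) -> float:
        # Transition probability depends only on the jump width i - i_prev.
        return math.log(max(jump_prob.get(i - i_prev, 0.0), floor))

    # delta[i]: best log score of any alignment of tgt[:j + 1] whose last state is i.
    delta = [math.log(1.0 / n_src) + log_t(tgt[0], src[i]) for i in range(n_src)]
    backptrs: List[List[int]] = []
    for j in range(1, len(tgt)):
        new_delta: List[float] = []
        bp: List[int] = []
        for i in range(n_src):
            # Best predecessor state for state i at target position j.
            prev = max(range(n_src), key=lambda k: delta[k] + log_jump(k, i))
            new_delta.append(delta[prev] + log_jump(prev, i) + log_t(tgt[j], src[i]))
            bp.append(prev)
        delta = new_delta
        backptrs.append(bp)

    # Trace back from the best final state to recover a_1 .. a_J.
    i = max(range(n_src), key=lambda k: delta[k])
    path = [i]
    for bp in reversed(backptrs):
        i = bp[i]
        path.append(i)
    return path[::-1]


if __name__ == "__main__":
    # Toy, made-up parameters: French source, English target.
    src = ["la", "maison", "bleue"]
    tgt = ["the", "blue", "house"]
    t_prob = {("the", "la"): 0.9, ("house", "maison"): 0.9, ("blue", "bleue"): 0.9}
    jump_prob = {-2: 0.1, -1: 0.15, 0: 0.1, 1: 0.5, 2: 0.15}
    print(viterbi_align(src, tgt, t_prob, jump_prob))  # → [0, 2, 1]
```

In this toy case every unlisted word pair falls to the floor probability, so the lexical terms dominate and the non-monotone alignment of "blue" and "house" is recovered despite the jump model favoring forward steps of one. With realistic parameters the jump distribution carries more weight in resolving ambiguous words.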