DocumentCode :
1954577
Title :
Sentence Similarity-Based Source Context Modelling in PBSMT
Author :
Haque, Rejwanul ; Naskar, Sudip Kumar ; Way, Andy ; Costa-jussà, Marta R. ; Banchs, Rafael E.
Author_Institution :
Sch. of Comput., Dublin City Univ., Dublin, Ireland
fYear :
2010
fDate :
28-30 Dec. 2010
Firstpage :
257
Lastpage :
260
Abstract :
Target phrase selection, a crucial component of the state-of-the-art phrase-based statistical machine translation (PBSMT) model, plays a key role in generating accurate translation hypotheses. Inspired by context-rich word-sense disambiguation techniques, machine translation (MT) researchers have successfully integrated various types of source-language context into the PBSMT model to improve target phrase selection. Among the various types of lexical and syntactic features, lexical syntactic descriptions in the form of supertags, which preserve long-range word-to-word dependencies in a sentence, have proven to be effective. These rich contextual features can disambiguate a source phrase on the basis of the local syntactic behaviour of that phrase. In addition to local contextual information, global contextual information such as the grammatical structure of a sentence, sentence length and n-gram word sequences could provide further important information to enhance this phrase-sense disambiguation. In this work, we explore various sentence-similarity features, computed by measuring the similarity between a source sentence to be translated and the source side of the bilingual training sentences, and integrate them directly into the PBSMT model. We performed experiments on an English-to-Chinese translation task, applying sentence-similarity features both individually and in combination with supertag-based features. We evaluate the performance of our approach and report a statistically significant relative improvement of 5.25% in BLEU score when adding a sentence-similarity feature together with a supertag-based feature.
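As a rough illustration of the kind of sentence-level similarity the abstract describes (n-gram word sequences and sentence length), the sketch below scores a source sentence against the source side of one training sentence. The function names, whitespace tokenization, n-gram orders 1 to 4, clipped n-gram precision, and equal-weight averaging are all assumptions made for illustration; the abstract does not specify how the authors actually compute their features.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the order-n n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(src, train, max_n=4):
    """Hypothetical feature: mean clipped n-gram precision of src against train.

    For each order n, count the n-grams of src that also occur in train
    (clipped by their count in train) and divide by the number of n-grams
    in src. Orders for which src has no n-grams are skipped.
    """
    scores = []
    for n in range(1, max_n + 1):
        src_ng = ngrams(src, n)
        total = sum(src_ng.values())
        if total == 0:
            continue
        train_ng = ngrams(train, n)
        matched = sum(min(count, train_ng[g]) for g, count in src_ng.items())
        scores.append(matched / total)
    return sum(scores) / len(scores) if scores else 0.0


def length_similarity(src, train):
    """Hypothetical feature: shorter length divided by longer length, in [0, 1]."""
    if not src or not train:
        return 0.0
    return min(len(src), len(train)) / max(len(src), len(train))


def sentence_similarity(src, train):
    """Hypothetical combined score: equal-weight mean of the two features above."""
    return 0.5 * ngram_overlap(src, train) + 0.5 * length_similarity(src, train)


if __name__ == "__main__":
    src = "the cat sat on the mat".split()
    train = "the cat sat on a mat".split()
    print(round(ngram_overlap(src, train), 4))        # 0.5667 (= 17/30)
    print(round(sentence_similarity(src, train), 4))  # 0.7833 (= 47/60)
```

One plausible way to use such a score in PBSMT would be to attach it to each phrase pair (for example, the maximum similarity between the input sentence and the training sentences the pair was extracted from) and add it as one more weighted feature in the log-linear model; that aggregation step is likewise an assumption, not a detail taken from the paper.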
Keywords :
language translation; natural language processing; statistical analysis; English-to-Chinese translation task; PBSMT; bilingual training sentences; context-rich word-sense disambiguation techniques; grammatical structure; lexical syntactic descriptions; long-range word-to-word dependencies; n-gram word sequences; phrase-based statistical machine translation model; sentence length; sentence similarity-based source context modelling; sentence-similarity features; super tag-based features; target phrase selection; translation hypotheses; Computational modeling; Context; Context modeling; Feature extraction; Grammar; Syntactics; Training; sentence similarity; source context information; statistical machine translation;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
Type :
conf
DOI :
10.1109/IALP.2010.45
Filename :
5681568