Regional Information Center for Science and Technology - Sentence Similarity-Based Source Context Modelling in PBSMT

DocumentCode :

1954577

Title :

Sentence Similarity-Based Source Context Modelling in PBSMT

Author :

Haque, Rejwanul ; Naskar, Sudip Kumar ; Way, Andy ; Costa-jussà, Marta R. ; Banchs, Rafael E.

Author_Institution :

School of Computing, Dublin City University, Dublin, Ireland

fYear :

2010

fDate :

28-30 Dec. 2010

Firstpage :

257

Lastpage :

260

Abstract :

Target phrase selection, a crucial component of the state-of-the-art phrase-based statistical machine translation (PBSMT) model, plays a key role in generating accurate translation hypotheses. Inspired by context-rich word-sense disambiguation techniques, machine translation (MT) researchers have successfully integrated various types of source-language context into the PBSMT model to improve target phrase selection. Among the lexical and syntactic features used, lexical syntactic descriptions in the form of supertags, which preserve long-range word-to-word dependencies in a sentence, have proven effective. These rich contextual features can disambiguate a source phrase on the basis of its local syntactic behaviour. In addition to local contextual information, global contextual information such as the grammatical structure of a sentence, sentence length and n-gram word sequences could provide further important information to enhance this phrase-sense disambiguation. In this work, we explore various sentence-similarity features that measure the similarity between a source sentence to be translated and the source side of the bilingual training sentences, and we integrate them directly into the PBSMT model. We performed experiments on an English-to-Chinese translation task, applying sentence-similarity features both individually and in combination with supertag-based features. We evaluate the performance of our approach and report a statistically significant relative improvement of 5.25% in BLEU score when a sentence-similarity feature is added together with a supertag-based feature.
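One plausible reading of the abstract is that each phrase pair is scored by how similar the input sentence is to the training source sentences the pair was extracted from, and that this score enters the PBSMT log-linear model as an extra feature. The record does not state the paper's exact similarity measures, so the following is only a minimal sketch under stated assumptions: the n-gram Dice overlap, the length ratio, the averaging over training sentences, the plain weighted sum, and every function name (`ngram_dice`, `length_similarity`, `phrase_pair_feature`, `log_linear_score`) are hypothetical illustrations, not the authors' method.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_dice(a, b, max_n=4):
    """Assumed measure: mean Dice coefficient of n-gram multisets for n = 1..max_n.

    Orders for which neither sentence has any n-grams are skipped.
    """
    scores = []
    for n in range(1, max_n + 1):
        ca, cb = ngrams(a, n), ngrams(b, n)
        total = sum(ca.values()) + sum(cb.values())
        if total == 0:
            continue
        overlap = sum((ca & cb).values())  # Counter '&' keeps the minimum counts
        scores.append(2 * overlap / total)
    return sum(scores) / len(scores) if scores else 0.0


def length_similarity(a, b):
    """Assumed measure: ratio of the shorter to the longer sentence length."""
    if not a and not b:
        return 1.0
    return min(len(a), len(b)) / max(len(a), len(b))


def phrase_pair_feature(input_sent, training_sources, sim=ngram_dice):
    """Hypothetical feature: average similarity between the input sentence and
    the training source sentences a phrase pair was extracted from."""
    if not training_sources:
        return 0.0
    return sum(sim(input_sent, s) for s in training_sources) / len(training_sources)


def log_linear_score(features, weights):
    """Weighted sum of feature values, as in a generic log-linear model."""
    return sum(weights[name] * value for name, value in features.items())


if __name__ == "__main__":
    src = "the bank approved the loan".split()
    train = ["the bank approved a loan".split(),
             "we sat on the river bank".split()]
    sim_feature = phrase_pair_feature(src, train)
    # Hypothetical weights; a real system would tune them on held-out data.
    print(log_linear_score({"tm": -1.2, "sim": sim_feature},
                           {"tm": 1.0, "sim": 0.5}))
```

Averaging over all training sentences that produced a phrase pair is just one simple aggregation choice; taking the maximum similarity, or swapping `ngram_dice` for `length_similarity` via the `sim` argument, would be equally plausible variants.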

Keywords :

language translation; natural language processing; statistical analysis; English-to-Chinese translation task; PBSMT; bilingual training sentences; context-rich word-sense disambiguation techniques; grammatical structure; lexical syntactic descriptions; long-range word-to-word dependencies; n-gram word sequences; phrase-based statistical machine translation model; sentence length; sentence similarity-based source context modelling; sentence-similarity features; super tag-based features; target phrase selection; translation hypotheses; Computational modeling; Context; Context modeling; Feature extraction; Grammar; Syntactics; Training; sentence similarity; source context information; statistical machine translation;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2010 International Conference on Asian Language Processing (IALP)

Conference_Location :

Harbin

Print_ISBN :

978-1-4244-9063-9

Type :

conf

DOI :

10.1109/IALP.2010.45

Filename :

5681568

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1954577