DocumentCode :
1135674
Title :
Improving Structural Statistical Machine Translation for Sign Language With Small Corpus Using Thematic Role Templates as Translation Memory
Author :
Su, Hung-Yu ; Wu, Chung-Hsien
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
Volume :
17
Issue :
7
fYear :
2009
Firstpage :
1305
Lastpage :
1315
Abstract :
This paper presents a structural statistical machine translation (SSMT) model to address the data sparseness problem that arises from the necessarily small corpus available for translating Chinese into Taiwanese Sign Language (TSL). A parallel bilingual corpus was developed, and linguistic information from the Sinica Treebank was adopted for Chinese sentence analysis. A synchronous context-free grammar (SCFG) was adopted to convert a Chinese structure into the corresponding TSL structure and then to extract a translation memory that comprises the thematic relations between the grammar rules of both structures. In structural translation, the statistical MT (SMT) approach was used to align the thematic roles in the grammar rules, and the translation memory provided the reference templates for TSL structure translation. Finally, agreement information for TSL verbs was labeled to enrich the expressiveness of the translated TSL sequence. Several experiments were conducted to evaluate the translation performance and the communication effectiveness for the deaf. The evaluation results demonstrate that the proposed approach outperforms a baseline statistical MT system using the same small corpus, especially for the translation of long sentences.
Keywords :
computational linguistics; context-free grammars; language translation; natural language processing; statistical analysis; Chinese sentence analysis; Sinica Treebank; Taiwanese sign language; data sparseness problem; linguistic information; parallel bilingual corpus; small corpus; structural statistical machine translation model; synchronous context free grammar; thematic role template; translation memory; Auditory system; Avatars; Communication effectiveness; Data mining; Deafness; Employment; Handicapped aids; Information analysis; Natural languages; Surface-mount technology; Sign language; small corpus; structural statistical machine translation (SSMT); thematic relation;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2009.2016234
Filename :
5165114