DocumentCode :
3166369
Title :
New methods and evaluation experiments on translating TED talks in the IWSLT benchmark
Author :
Axelrod, Amittai ; He, Xiaodong ; Deng, Li ; Acero, Alex ; Hwang, Mei-Yuh
Author_Institution :
Microsoft Res., Redmond, WA, USA
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
4945
Lastpage :
4948
Abstract :
The IWSLT benchmark task is an annual evaluation campaign on spoken language translation held by the International Workshop on Spoken Language Translation (IWSLT). The task is to translate TED talks (www.ted.com). This task presents two unique challenges. First, the topic switches sharply from talk to talk, and each talk contains only tens to hundreds of utterances, so the translation system needs to adapt to the current topic quickly and dynamically. Second, unlike other machine translation benchmark tasks, only a very small relevant parallel corpus (transcripts of TED talks) is available, so the translation models must be estimated accurately from limited data. In this paper, we present our recent progress and two new methods for the IWSLT TED talk translation task from Chinese into English. To address the first problem, we use unsupervised topic modeling to select additional topic-dependent parallel data from a corpus that is, as a whole, irrelevant to the task. These additional data slices can then be used to build an unsupervised topic-adapted machine translation system. For the second problem, we develop a discriminative training method to estimate the translation models more accurately. Our experimental results show that both methods improve translation quality over a state-of-the-art baseline.
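The topic-based data selection described in the abstract can be illustrated with a minimal sketch. It assumes that every sentence pair in the general corpus, and the current talk, already has a topic mixture from some unsupervised topic model (e.g. LDA). The Hellinger distance, the threshold `max_distance`, and the function names are hypothetical choices for illustration, not the authors' method, which this record does not describe.

```python
import math
from typing import List, Sequence, Tuple


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def select_topic_slice(
    talk_topics: Sequence[float],
    corpus: List[Tuple[str, str, Sequence[float]]],  # (source, target, topic mixture) triples
    max_distance: float,  # hypothetical closeness threshold
) -> List[Tuple[str, str]]:
    """Keep the sentence pairs whose topic mixture is close to the current talk's mixture."""
    selected = []
    for src, tgt, topics in corpus:
        if hellinger(talk_topics, topics) <= max_distance:
            selected.append((src, tgt))
    return selected


talk = [0.7, 0.2, 0.1]
corpus = [("zh1", "en1", [0.6, 0.3, 0.1]), ("zh2", "en2", [0.05, 0.05, 0.9])]
print(select_topic_slice(talk, corpus, 0.3))  # [('zh1', 'en1')]
```

The selected slice would then serve as extra topic-adapted training data for the current talk; ranking by distance and keeping the top k pairs is an equally plausible variant of the same idea.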
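One widely used discriminative objective in machine translation is the expected sentence-level BLEU over an n-best list. The sketch below computes this objective and its gradient. Several choices here are assumptions made for illustration, since the record does not specify the paper's objective: the objective is applied to log-linear feature weights rather than to the translation-model parameters themselves, and the scale `gamma` and all function names are hypothetical. With posterior $p_h \propto \exp(\gamma\, w \cdot f_h)$, the gradient is $\gamma \sum_h p_h (\mathrm{BLEU}_h - \mathbb{E}[\mathrm{BLEU}]) f_h$.

```python
import math
from typing import List, Sequence, Tuple


def expected_bleu_and_grad(
    features: List[Sequence[float]],  # one feature vector per n-best hypothesis
    bleu: Sequence[float],            # sentence-level BLEU of each hypothesis
    weights: Sequence[float],         # log-linear model weights
    gamma: float = 1.0,               # hypothetical scale that sharpens or flattens the posterior
) -> Tuple[float, List[float]]:
    """Return expected BLEU under the model posterior and its gradient w.r.t. the weights."""
    scores = [gamma * sum(w * f for w, f in zip(weights, fv)) for fv in features]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract the max for numerical stability
    z = sum(exps)
    post = [e / z for e in exps]
    exp_bleu = sum(p * b for p, b in zip(post, bleu))
    grad = [0.0] * len(weights)
    for p, b, fv in zip(post, bleu, features):
        for k, f in enumerate(fv):
            grad[k] += gamma * p * (b - exp_bleu) * f
    return exp_bleu, grad


feats = [[1.0, 0.0], [0.0, 1.0]]
e, g = expected_bleu_and_grad(feats, [0.2, 0.6], [0.0, 0.0])
print(round(e, 6), [round(x, 6) for x in g])  # 0.4 [-0.1, 0.1]
```

A training loop would call this repeatedly over the n-best lists of a development set and move `weights` along `grad` (gradient ascent), shifting probability mass toward higher-BLEU hypotheses.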
Keywords :
language translation; natural language processing; speech processing; IWSLT TED talk translation task; IWSLT benchmark task; International Workshop on Spoken Language Translation; data slices; discriminative training method; machine translation benchmark tasks; spoken language translation system; topic-dependent parallel data; unsupervised topic modeling; unsupervised topic-adapted machine translation system; Adaptation models; Benchmark testing; Data models; Estimation; Hidden Markov models; Training; IWSLT; discriminative training; spoken language translation; topic adaptation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6289029
Filename :
6289029