DocumentCode :
2789454
Title :
Automatic disfluency removal for improving spoken language translation
Author :
Wang, Wen ; Tur, Gokhan ; Zheng, Jing ; Ayan, Necip Fazil
Author_Institution :
Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA
fYear :
2010
fDate :
14-19 March 2010
Firstpage :
5214
Lastpage :
5217
Abstract :
Statistical machine translation (SMT) systems for spoken languages suffer from conversational speech phenomena, in particular, the presence of speech disfluencies. We examine the impact of disfluencies from broadcast conversation data on our hierarchical phrase-based SMT system and implement automatic disfluency removal approaches for cleansing the MT input. We evaluate the efficacy of proposed approaches and investigate the impact of disfluency removal on SMT performance across different disfluency types. We show that for translating Mandarin broadcast conversational transcripts into English, our automatic disfluency removal approaches could produce significant improvement in BLEU and TER.
Keywords :
language translation; speech processing; automatic disfluency removal; conversational speech phenomena; speech disfluencies; spoken language translation; statistical machine translation systems; Automatic speech recognition; Decoding; Error analysis; Hidden Markov models; Natural languages; Radio broadcasting; Speech analysis; Surface-mount technology; System testing; TV broadcasting; automatic disfluency detection; broadcast conversation; statistical machine translation
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
ISSN :
1520-6149
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2010.5494999
Filename :
5494999