DocumentCode
2789454
Title
Automatic disfluency removal for improving spoken language translation
Author
Wang, Wen ; Tur, Gokhan ; Zheng, Jing ; Ayan, Necip Fazil
Author_Institution
Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA
fYear
2010
fDate
14-19 March 2010
Firstpage
5214
Lastpage
5217
Abstract
Statistical machine translation (SMT) systems for spoken languages suffer from conversational speech phenomena, in particular the presence of speech disfluencies. We examine the impact of disfluencies from broadcast conversation data on our hierarchical phrase-based SMT system and implement automatic disfluency removal approaches for cleansing the MT input. We evaluate the efficacy of the proposed approaches and investigate the impact of disfluency removal on SMT performance across different disfluency types. We show that for translating Mandarin broadcast conversational transcripts into English, our automatic disfluency removal approaches could produce significant improvement in BLEU and TER.
Keywords
language translation; speech processing; automatic disfluency removal; conversational speech phenomena; speech disfluencies; spoken language translation; statistical machine translation systems; Automatic speech recognition; Decoding; Error analysis; Hidden Markov models; Natural languages; Radio broadcasting; Speech analysis; Surface-mount technology; System testing; TV broadcasting; automatic disfluency detection; broadcast conversation; statistical machine translation
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location
Dallas, TX
ISSN
1520-6149
Print_ISBN
978-1-4244-4295-9
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2010.5494999
Filename
5494999