DocumentCode :
3167963
Title :
Optimization in speech-centric information processing: Criteria and techniques
Author :
He, Xiaodong ; Deng, Li
Author_Institution :
Microsoft Res., Redmond, WA, USA
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
5241
Lastpage :
5244
Abstract :
Automatic speech recognition (ASR) is an enabling technology for a wide range of information processing applications, including speech translation, voice search (i.e., information retrieval with speech input), and conversational understanding. In these speech-centric applications, the output of ASR, as “noisy” text, is fed into down-stream processing systems to accomplish the designated tasks of translation, information retrieval, or natural language understanding. In conventional applications, the ASR model as a sub-system is usually trained without considering the down-stream systems. This often leads to sub-optimal end-to-end performance. In this paper, we propose a unifying end-to-end optimization framework in which the model parameters in all sub-systems, including ASR, are learned by Extended Baum-Welch (EBW) algorithms by optimizing criteria directly tied to the end-to-end performance measure. We demonstrate the effectiveness of the proposed approach on a speech translation task using the spoken language translation benchmark test of IWSLT. Our experimental results show that the proposed method leads to significant improvement in translation quality over conventional techniques based on separate modular sub-system design. We also analyze the EBW-based optimization algorithms employed in our work and discuss their relationship with other popular optimization techniques.
Keywords :
information retrieval; language translation; natural language processing; optimisation; speech recognition; ASR model; EBW-based optimization algorithms; IWSLT; automatic speech recognition; conversational understanding; down-stream processing systems; end-to-end optimization; end-to-end performance measure; extended Baum-Welch algorithms; natural language understanding; noisy text; speech translation; speech translation task; speech-centric information processing; spoken language translation benchmark test; suboptimal end-to-end performance; voice search; Computational modeling; Information processing; Linear programming; Optimization; Speech; Training; EBW algorithm; end-to-end optimization criteria
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6289102
Filename :
6289102