Regional Information Center for Science and Technology - Optimization in speech-centric information processing: Criteria and techniques

DocumentCode :

3167963

Title :

Optimization in speech-centric information processing: Criteria and techniques

Author :

He, Xiaodong ; Deng, Li

Author_Institution :

Microsoft Res., Redmond, WA, USA

fYear :

2012

fDate :

25-30 March 2012

Firstpage :

5241

Lastpage :

5244

Abstract :

Automatic speech recognition (ASR) is an enabling technology for a wide range of information processing applications, including speech translation, voice search (i.e., information retrieval with speech input), and conversational understanding. In these speech-centric applications, the output of ASR, as “noisy” text, is fed into down-stream processing systems that accomplish the designated task, such as translation, information retrieval, or natural language understanding. In conventional applications, the ASR model is usually trained as a sub-system without considering the down-stream systems, which often leads to sub-optimal end-to-end performance. In this paper, we propose a unifying end-to-end optimization framework in which the model parameters of all sub-systems, including ASR, are learned with Extended Baum-Welch (EBW) algorithms by optimizing criteria directly tied to the end-to-end performance measure. We demonstrate the effectiveness of the proposed approach on a speech translation task using the IWSLT spoken language translation benchmark test. Our experimental results show that the proposed method significantly improves translation quality over conventional techniques based on separate, modular sub-system design. We also analyze the EBW-based optimization algorithms employed in our work and discuss their relationship with other popular optimization techniques.
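The abstract names Extended Baum-Welch (EBW) as the optimizer but gives no update rule. As a rough illustration only, the sketch below implements the generic EBW growth-transform step for a discrete probability distribution as it is commonly stated in the EBW literature, theta_i <- theta_i (dF/dtheta_i + D) / sum_j theta_j (dF/dtheta_j + D), where D is a smoothing constant chosen large enough to keep every factor positive. This is not the authors' implementation: the objective (a linear "expected utility" over three candidate outputs), the utility values, and the choice D = 1.0 are hypothetical, and EBW updates for continuous (e.g. Gaussian) parameters take a different form.

```python
from typing import Callable, List


def ebw_update(theta: List[float], grad: List[float], D: float) -> List[float]:
    """One EBW growth-transform step for a discrete distribution `theta`.

    theta_i <- theta_i * (grad_i + D) / sum_j theta_j * (grad_j + D)
    `grad` holds dF/dtheta_i at the current theta; D must make every
    grad_i + D strictly positive so the result stays a valid distribution.
    """
    if any(g + D <= 0 for g in grad):
        raise ValueError("D too small: every grad_i + D must be positive")
    weighted = [t * (g + D) for t, g in zip(theta, grad)]
    z = sum(weighted)  # normalizer keeps the update on the probability simplex
    return [w / z for w in weighted]


def ebw_optimize(
    theta: List[float],
    grad_fn: Callable[[List[float]], List[float]],
    D: float,
    iters: int,
) -> List[float]:
    """Apply `iters` EBW steps, recomputing the gradient at each step."""
    for _ in range(iters):
        theta = ebw_update(theta, grad_fn(theta), D)
    return theta


if __name__ == "__main__":
    # Hypothetical end-to-end utilities of three candidate outputs
    # (e.g. translation-quality scores); F(theta) = sum_i u_i * theta_i.
    u = [0.2, 0.9, 0.5]
    theta0 = [1 / 3, 1 / 3, 1 / 3]
    theta = ebw_optimize(theta0, lambda th: u, D=1.0, iters=50)
    print([round(t, 4) for t in theta])  # mass moves toward the highest-utility candidate
```

For this linear objective one can check that each step raises F by Var(u)/(F + D), where the variance is taken under the current theta, so F never decreases. A larger D gives smaller, more conservative steps, which is the usual trade-off when tuning the EBW smoothing constant.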

Keywords :

information retrieval; language translation; natural language processing; optimisation; speech recognition; ASR model; EBW-based optimization algorithms; IWSLT; automatic speech recognition; conversational understanding; down-stream processing systems; end-to-end optimization; end-to-end performance measure; extended Baum-Welch algorithms; information retrieval; natural language understanding; noisy text; speech translation; speech translation task; speech-centric information processing; spoken language translation benchmark test; suboptimal end-to-end performance; voice search; Computational modeling; Information processing; Linear programming; Optimization; Speech; Speech recognition; Training; EBW algorithm; Speech translation; end-to-end optimization criteria; speech recognition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location :

Kyoto

ISSN :

1520-6149

Print_ISBN :

978-1-4673-0045-2

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2012.6289102

Filename :

6289102

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3167963