Regional Information Center for Science and Technology - Articulatory trajectories for large-vocabulary speech recognition

DocumentCode :

1688718

Title :

Articulatory trajectories for large-vocabulary speech recognition

Author :

Mitra, Vikramjit ; Wen Wang ; Stolcke, Andreas ; Hosung Nam ; Richey, Colleen ; Jiahong Yuan ; Liberman, Mark

Author_Institution :

Speech Technol. & Res. Lab., SRI Int., Menlo Park, CA, USA

Year :

2013

Firstpage :

7145

Lastpage :

7149

Abstract :

Studies have demonstrated that articulatory information can model speech variability effectively and can potentially help to improve speech recognition performance. Most of the studies involving articulatory information have focused on effectively estimating it from speech, and few studies have actually used such features for speech recognition. Speech recognition studies using articulatory information have been mostly confined to digit or medium-vocabulary speech recognition, and efforts to incorporate it into large-vocabulary systems have been limited. We present a neural network model to estimate articulatory trajectories from speech signals, where the model was trained using synthetic speech signals generated by Haskins Laboratories' task-dynamic model of speech production. The trained model was applied to natural speech, and the estimated articulatory trajectories obtained from the model were used in conjunction with standard cepstral features to train acoustic models for large-vocabulary recognition systems. Two different large-vocabulary English datasets were used in the experiments reported here. Results indicate that employing articulatory information improves speech recognition performance not only under clean conditions but also under noisy background conditions. Perceptually motivated robust features were also explored in this study, and the best performance was obtained when systems based on articulatory, standard cepstral and perceptually motivated features were all combined.

Keywords :

cepstral analysis; neural nets; speech recognition; telecommunication computing; vocabulary; English datasets; Haskins Laboratories; acoustic models; articulatory information; articulatory trajectories; digit vocabulary speech recognition; large-vocabulary speech recognition; medium vocabulary speech recognition; natural speech; neural network; noisy background conditions; robust features; speech production; standard cepstral features; synthetic speech signals; task-dynamic model; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Speech recognition; Training; Trajectory; articulatory trajectories; artificial neural networks; large vocabulary speech recognition; vocal tract variables;

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location :

Vancouver, BC

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2013.6639049

Filename :

6639049

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1688718