Regional Information Center for Science and Technology - Improving robustness of deep neural networks via spectral masking for automatic speech recognition

DocumentCode :

672366

Title :

Improving robustness of deep neural networks via spectral masking for automatic speech recognition

Author :

Bo Li ; Khe Chai Sim

Author_Institution :

Sch. of Comput., Nat. Univ. of Singapore, Singapore, Singapore

fYear :

2013

fDate :

8-12 Dec. 2013

Firstpage :

279

Lastpage :

284

Abstract :

The performance of human listeners degrades rather slowly compared to machines in noisy environments. This has been attributed to the ability to perform auditory scene analysis, which separates the speech prior to recognition. In this work, we investigate two mask estimation approaches, namely state-dependent and deep neural network (DNN) based estimation, to separate speech from noise and improve the noise robustness of DNN acoustic models. The second approach has been experimentally shown to outperform the first. Because of the stereo-data-based training and ill-defined masks for speech with channel distortions, neither method generalizes well to unseen conditions, and both fail to beat the performance of the multi-style trained baseline system. However, the model trained on masked features is strongly complementary to the baseline model. The simple average of the two systems' posteriors yields word error rates of 4.4% on Aurora2 and 12.3% on Aurora4.
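
The posterior averaging mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes that both systems emit frame-level posteriors over the same set of HMM states, so the two matrices can be averaged element-wise. The function name `average_posteriors` and the toy values are hypothetical and are not taken from the paper.

```python
import numpy as np


def average_posteriors(post_a, post_b):
    """Element-wise simple average of two (frames x states) posterior matrices.

    Assumption: both systems share the same state inventory and frame alignment.
    """
    post_a = np.asarray(post_a, dtype=float)
    post_b = np.asarray(post_b, dtype=float)
    if post_a.shape != post_b.shape:
        raise ValueError("posterior matrices must have the same shape")
    return 0.5 * (post_a + post_b)


# Toy example (invented numbers): 2 frames, 3 states.
baseline = np.array([[0.7, 0.2, 0.1],
                     [0.3, 0.4, 0.3]])  # stands in for the multi-style baseline
masked = np.array([[0.5, 0.4, 0.1],
                   [0.1, 0.8, 0.1]])    # stands in for the masked-feature model

combined = average_posteriors(baseline, masked)
```

Because each input row sums to one, each row of the average also sums to one, so the combined matrix can be passed to a decoder in place of either system's posteriors.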

Keywords :

feature extraction; hidden Markov models; neural nets; source separation; spectral analysis; speech intelligibility; speech recognition; Aurora2; Aurora4; DNN acoustic model noise robustness; DNN based estimation; auditory scene analysis; automatic speech recognition; channel distortion; deep neural networks; human listener performance degradation; hybrid DNN-HMM based ASR system; hybrid DNN-hidden Markov model based ASR system; mask estimation approach; masked features; multistyle trained baseline system; noisy environments; spectral masking; speech separation; state dependent estimation; stereo data based training; word error rate; Estimation; Hidden Markov models; Interpolation; Noise; Noise measurement; Speech; Speech recognition; Deep Neural Network; Noise Robustness; Spectral Masking;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on

Conference_Location :

Olomouc

Type :

conf

DOI :

10.1109/ASRU.2013.6707743

Filename :

6707743

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=672366