Title :
Ideal ratio mask estimation using deep neural networks for robust speech recognition
Author :
Narayanan, Arun ; Wang, DeLiang
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
We propose a feature enhancement algorithm to improve robust automatic speech recognition (ASR). The algorithm estimates a smoothed ideal ratio mask (IRM) in the Mel frequency domain using deep neural networks and a set of time-frequency unit level features that has previously been used to estimate the ideal binary mask. The estimated IRM is used to filter out noise from a noisy Mel spectrogram before performing cepstral feature extraction for ASR. On the noisy subset of the Aurora-4 robust ASR corpus, the proposed enhancement obtains a relative improvement of over 38% in terms of word error rates using ASR models trained in clean conditions, and an improvement of over 14% when the models are trained using the multi-condition training data. In terms of instantaneous SNR estimation performance, the proposed system obtains a mean absolute error of less than 4 dB in most frequency channels.
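The masking step described in the abstract can be illustrated with a minimal sketch. It assumes one common definition of the ratio mask (speech energy divided by speech-plus-noise energy in each time-frequency unit) and assumes that speech and noise energies add in the Mel domain; the paper's exact mask definition, its smoothing, and its DNN estimator are not reproduced here. The function names, array shapes, and random test data are hypothetical and chosen only for illustration. The mask here is computed from known speech and noise (an oracle), whereas the proposed system estimates it from the noisy signal.

```python
import numpy as np


def ideal_ratio_mask(speech_energy, noise_energy, eps=1e-12):
    """Oracle ratio mask per time-frequency unit: S / (S + N), in [0, 1].

    Assumed definition for illustration; not taken from the paper.
    """
    return speech_energy / (speech_energy + noise_energy + eps)


def apply_mask(noisy_energy, mask):
    """Filter a noisy Mel spectrogram by element-wise multiplication."""
    return noisy_energy * mask


def mask_to_snr_db(mask, eps=1e-12):
    """Convert a ratio mask to instantaneous SNR in dB: 10*log10(m / (1 - m))."""
    mask = np.clip(mask, eps, 1.0 - eps)
    return 10.0 * np.log10(mask / (1.0 - mask))


# Hypothetical toy data: 5 frames x 26 Mel channels of positive energies.
rng = np.random.default_rng(0)
speech = rng.uniform(0.1, 1.0, size=(5, 26))
noise = rng.uniform(0.1, 1.0, size=(5, 26))
noisy = speech + noise  # assumes energies are additive

mask = ideal_ratio_mask(speech, noise)
enhanced = apply_mask(noisy, mask)   # approximately recovers the speech energy
snr_db = mask_to_snr_db(mask)        # instantaneous SNR implied by the mask
```

Under these assumptions the oracle mask recovers the clean Mel energies exactly, up to the small `eps` term. With an estimated mask, cepstral features would be extracted from `enhanced` instead of `noisy`, and the error of the implied instantaneous SNR could be measured per frequency channel in dB.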
Keywords :
feature extraction; frequency-domain analysis; neural nets; speech enhancement; speech recognition; ASR model; Aurora-4 robust ASR corpus; IRM estimation; Mel frequency domain; cepstral feature extraction; deep neural networks; feature enhancement algorithm; ideal binary mask estimation; ideal ratio mask estimation; instantaneous SNR estimation performance; mean absolute error; multicondition training data; noisy Mel spectrogram; robust ASR; robust automatic speech recognition; robust speech recognition; time-frequency unit level features; word error rates; Estimation; Feature extraction; Robustness; Signal to noise ratio; Speech; Speech recognition; Aurora-4; Computational Auditory Scene Analysis; instantaneous SNR; noise robust ASR;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
DOI :
10.1109/ICASSP.2013.6639038