Title :
ASR-driven top-down binary mask estimation using spectral priors
Author :
Hartmann, William ; Fosler-Lussier, Eric
Author_Institution :
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Abstract :
Typical mask estimation algorithms use low-level features to estimate the interfering noise or instantaneous SNR. We propose a simple top-down approach to mask estimation. The estimated mask is based on a specific hypothesis of the underlying speech without using information about the interference or the instantaneous SNR. In this pilot study, we observe a 9% reduction in word error rate over a baseline recognition system on the Aurora4 corpus, though much greater gains could theoretically be achieved through improvements to the model selection process. We also present SNR improvement results showing our method performs as well as a standard MMSE-based method, demonstrating that speech recognition can aid speech enhancement. Thus, the relationship between recognition and enhancement need not be one-way: linguistic information can play a significant role in speech enhancement.
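The abstract contrasts bottom-up mask estimation, which estimates the noise or instantaneous SNR, with a top-down estimate derived from a hypothesis of the underlying speech. The sketch below is an illustration under stated assumptions, not the authors' algorithm. It assumes spectral powers arranged as frames × frequency channels, strictly positive, with noisy power approximated as speech power plus noise power. The function names, the 3 dB tolerance, and the comparison rule in `prior_based_mask` are all hypothetical. `ideal_binary_mask` is the standard ideal binary mask (1 where the local SNR exceeds a local criterion). `prior_based_mask` keeps a time-frequency unit when the observed noisy power exceeds the power predicted by a speech prior by no more than a tolerance, so it uses no noise or SNR estimate.

```python
import math
from typing import List

# frames x frequency channels; all powers assumed strictly positive
Spectrogram = List[List[float]]


def ideal_binary_mask(speech: Spectrogram, noise: Spectrogram, lc_db: float = 0.0) -> List[List[int]]:
    """Ideal binary mask: 1 where the local SNR in dB exceeds the local criterion lc_db.

    Requires oracle access to the separate speech and noise powers.
    """
    return [
        [1 if 10.0 * math.log10(s / n) > lc_db else 0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech, noise)
    ]


def prior_based_mask(noisy: Spectrogram, prior_speech: Spectrogram, tol_db: float = 3.0) -> List[List[int]]:
    """Hypothetical top-down estimate (an assumption, not the paper's exact rule).

    prior_speech is the power predicted by a recognition hypothesis, e.g. a
    state-dependent spectral mean. A unit is kept (1) when the observed noisy
    power exceeds the predicted speech power by at most tol_db. A larger excess
    is attributed to interference. No noise or SNR estimate is used.
    """
    return [
        [1 if 10.0 * math.log10(y / p) <= tol_db else 0 for y, p in zip(y_row, p_row)]
        for y_row, p_row in zip(noisy, prior_speech)
    ]


if __name__ == "__main__":
    speech = [[4.0, 1.0]]
    noise = [[1.0, 4.0]]
    # Assumed additive mixing in the power domain (cross terms ignored).
    noisy = [[s + n for s, n in zip(sr, nr)] for sr, nr in zip(speech, noise)]
    print(ideal_binary_mask(speech, noise))  # [[1, 0]]
    print(prior_based_mask(noisy, speech))   # [[1, 0]]
```

If the prior matches the true speech power exactly, a 3 dB tolerance keeps a unit only when the noise power is at most about equal to the speech power (10^0.3 − 1 ≈ 1). In that case the two masks roughly coincide at a 0 dB local criterion. With an imperfect prior, mask quality depends on choosing the right speech hypothesis, which corresponds to the model selection process the abstract identifies as the main room for improvement.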
Keywords :
least mean squares methods; speech enhancement; speech recognition; ASR-driven top-down binary mask estimation; Aurora4 corpus; MMSE-based method; SNR; automatic speech recognition; baseline recognition system; interfering noise estimation; linguistic information; low-level features; spectral priors; speech enhancement; Estimation; Hidden Markov models; Signal to noise ratio; Speech; Speech enhancement; Speech recognition; ideal binary mask; mask estimation; robust automatic speech recognition;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4673-0045-2
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2012.6288964