DocumentCode
3165052
Title
ASR-driven top-down binary mask estimation using spectral priors
Author
Hartmann, William ; Fosler-Lussier, Eric
Author_Institution
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
fYear
2012
fDate
25-30 March 2012
Firstpage
4685
Lastpage
4688
Abstract
Typical mask estimation algorithms use low-level features to estimate the interfering noise or instantaneous SNR. We propose a simple top-down approach to mask estimation. The estimated mask is based on a specific hypothesis of the underlying speech without using information about the interference or the instantaneous SNR. In this pilot study, we observe a 9% reduction in word error over a baseline recognition system on the Aurora4 corpus, though much greater gains could theoretically be achieved through improvements to the model selection process. We also present SNR improvement results showing our method performs as well as a standard MMSE-based method, demonstrating that speech recognition can aid speech enhancement. Thus, the relationship between recognition and enhancement need not be one way: linguistic information can play a significant role in speech enhancement.
Keywords
least mean squares methods; speech enhancement; speech recognition; ASR-driven top-down binary mask estimation; Aurora4 corpus; MMSE-based method; SNR; automatic speech recognition; baseline recognition system; interfering noise estimation; linguistic information; low-level features; spectral priors; speech enhancement; Estimation; Hidden Markov models; Signal to noise ratio; Speech; Speech enhancement; Speech recognition; ideal binary mask; mask estimation; robust automatic speech recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location
Kyoto
ISSN
1520-6149
Print_ISBN
978-1-4673-0045-2
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2012.6288964
Filename
6288964