DocumentCode :
3161929
Title :
A novel approach to soft-mask estimation and Log-Spectral enhancement for robust speech recognition
Author :
Van Hout, Julien ; Alwan, Abeer
Author_Institution :
Electr. Eng. Dept., Univ. of California, Los Angeles, CA, USA
fYear :
2012
fDate :
25-30 March 2012
Firstpage :
4105
Lastpage :
4108
Abstract :
This paper describes a technique for enhancing the Mel-filtered log spectra of noisy speech, with application to noise robust speech recognition. We first compute an SNR-based soft-decision mask in the Mel-spectral domain as an indicator of speech presence. Then, we exploit the known time-frequency correlation of speech by treating this mask as an image, and performing median filtering and blurring to remove the outliers and to smooth the decision regions. This mask constitutes a set of multiplicative coefficients (ranging in [0,1]) that are used to discard the unreliable parts of the Mel-filtered log-spectrum of noisy speech. Finally, we apply Log-Spectral Flooring [1] on the liftered spectra of both clean and noisy speech so as to match their respective dynamic ranges and to emphasize the information in the spectral peaks. The noisy MFCCs computed on these modified log-spectra show an increased similarity with their corresponding clean MFCCs. Evaluation on the Aurora-2 corpus shows that the proposed approach competes with state-of-the-art front-ends, like ETSI-AFE, MVA or PNCC.
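The mask-and-multiply steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the sigmoid mapping from SNR to [0, 1], the `slope` parameter, the 3x3 window, the box blur, the edge padding, or the availability of a per-bin noise power estimate, so all of these are assumptions, and every function name is hypothetical. Log-Spectral Flooring [1] is omitted because the abstract gives no details of it.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def soft_mask(noisy_mel_power, noise_mel_power, slope=1.0, eps=1e-10):
    """Map a per-bin SNR estimate (dB) to [0, 1]. The sigmoid is an assumed mapping."""
    speech_power = np.maximum(noisy_mel_power - noise_mel_power, eps)
    snr_db = 10.0 * np.log10(speech_power / np.maximum(noise_mel_power, eps))
    snr_db = np.clip(snr_db, -60.0, 60.0)  # keep the exponential well-behaved
    return 1.0 / (1.0 + np.exp(-slope * snr_db))


def _windows(image, size):
    """Return every size x size neighbourhood of a 2-D array (edges replicated)."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    return sliding_window_view(padded, (size, size))


def median_filter(image, size=3):
    """Treat the mask as an image and remove isolated outliers."""
    return np.median(_windows(image, size), axis=(-2, -1))


def box_blur(image, size=3):
    """Smooth the decision regions with a mean filter (assumed blur kernel)."""
    return _windows(image, size).mean(axis=(-2, -1))


def enhance_log_mel(noisy_mel_power, noise_mel_power, eps=1e-10):
    """Weight the noisy Mel log-spectrum by the smoothed soft mask.

    Both inputs are (mel_bands, frames) arrays of Mel-filtered power.
    Returns the masked log-spectrum and the mask itself.
    """
    mask = soft_mask(noisy_mel_power, noise_mel_power)
    mask = box_blur(median_filter(mask))
    log_mel = np.log(np.maximum(noisy_mel_power, eps))
    return mask * log_mel, mask
```

Calling `enhance_log_mel(noisy, noise)` on two arrays of the same shape returns the weighted log-spectrum and the smoothed mask. Median filtering before blurring means an isolated high mask value is removed rather than smeared into its neighbours.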
Keywords :
estimation theory; masks; median filters; speech recognition; time-frequency analysis; Aurora-2 corpus; Mel-filtered log spectra; Mel-spectral domain; clean speech; liftered spectra; log-spectral enhancement; log-spectral flooring; median blurring; median filtering; multiplicative coefficients; noisy speech; robust speech recognition; soft-decision mask; soft-mask estimation; time-frequency correlation; Estimation; Hidden Markov models; Noise; Noise measurement; Speech; Speech enhancement; Speech recognition; Feature Extraction; Mask Estimation; Median Filtering; Speech Enhancement; Speech Recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on
Conference_Location :
Kyoto
ISSN :
1520-6149
Print_ISBN :
978-1-4673-0045-2
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2012.6288821
Filename :
6288821