DocumentCode :
1688241
Title :
Spectro-temporal features for noise-robust speech recognition using power-law nonlinearity and power-bias subtraction
Author :
Chang, Shuo-Yiin ; Meyer, Bernd T. ; Morgan, Nigel
Author_Institution :
Int. Comput. Sci. Inst., Berkeley, CA, USA
fYear :
2013
Firstpage :
7063
Lastpage :
7067
Abstract :
Previous work has demonstrated that spectro-temporal Gabor features reduce word error rates for automatic speech recognition under noisy conditions. However, Gabor features computed from mel spectra are easily corrupted by noise or channel distortion. We use the algorithm for power normalized cepstral coefficients (PNCCs) to generate a more robust spectro-temporal representation. We refer to this representation as the power normalized spectrum (PNS), and to the corresponding output processed by Gabor filters and MLP nonlinear weighting as PNS-Gabor. We show that the proposed features outperform the state-of-the-art noise-robust features ETSI-AFE and PNCC on both Aurora2 and a noisy version of the Wall Street Journal (WSJ) corpus. A comparison of the individual processing steps of mel spectra and PNS shows that power-bias subtraction is the aspect of PNS-Gabor features that contributes most to the improvement over Mel-Gabor features. The results also indicate that Gabor processing compensates for the limitation of PNCC on channels with frequency-shift characteristics. Overall, PNS-Gabor features decrease the word error rate by 32% relative to MFCC and by 13% relative to PNCC on Aurora2. On noisy WSJ, they decrease the word error rate by 30.9% relative to MFCC and by 24.7% relative to PNCC.
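The processing chain named in the abstract (power-bias subtraction and power-law compression of a power spectrum, followed by spectro-temporal Gabor filtering) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and rests on several assumptions: the 1/15 exponent is borrowed from the published PNCC algorithm, the bias is a fixed, hypothetical parameter rather than PNCC's adaptively estimated value, only the real part of a single Gabor filter is computed, and the MLP weighting stage is omitted. All function names (`power_bias_subtraction`, `gabor_kernel`, `pns_gabor_sketch`, etc.) are hypothetical.

```python
import math

# Assumed compression exponent: PNCC replaces the logarithm used for MFCCs
# with a power-law nonlinearity of exponent 1/15.
POWER_EXPONENT = 1.0 / 15.0


def power_bias_subtraction(channel_power, bias, floor=1e-6):
    """Remove a constant power bias from one channel's frame powers.

    Simplification (hypothetical): the bias is passed in directly instead of
    being estimated adaptively; results are floored to stay positive.
    """
    return [max(p - bias, floor) for p in channel_power]


def power_law(channel_power, exponent=POWER_EXPONENT):
    """Compress powers with p ** exponent instead of log(p)."""
    return [p ** exponent for p in channel_power]


def hann(i, n):
    """Hann window value at index i of an n-point window (no zero endpoints)."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * (i + 1) / (n + 1))


def gabor_kernel(size_f, size_t, omega_f, omega_t):
    """Real part of a 2-D Gabor filter: Hann envelope times a cosine carrier.

    omega_f / omega_t are spectral / temporal modulation frequencies in
    radians per channel / per frame. The mean is subtracted so the filter
    has no DC response.
    """
    cf, ct = (size_f - 1) / 2.0, (size_t - 1) / 2.0
    kernel = [[hann(k, size_f) * hann(n, size_t)
               * math.cos(omega_f * (k - cf) + omega_t * (n - ct))
               for n in range(size_t)] for k in range(size_f)]
    mean = sum(map(sum, kernel)) / (size_f * size_t)
    return [[v - mean for v in row] for row in kernel]


def filter_2d(spec, kernel):
    """Valid-mode 2-D correlation of a (channels x frames) spectrogram."""
    kf, kt = len(kernel), len(kernel[0])
    out = []
    for f in range(len(spec) - kf + 1):
        row = []
        for t in range(len(spec[0]) - kt + 1):
            row.append(sum(kernel[i][j] * spec[f + i][t + j]
                           for i in range(kf) for j in range(kt)))
        out.append(row)
    return out


def pns_gabor_sketch(power_spec, bias, kernel):
    """Per-channel bias subtraction and power-law, then one Gabor filter."""
    compressed = [power_law(power_bias_subtraction(ch, bias))
                  for ch in power_spec]
    return filter_2d(compressed, kernel)


if __name__ == "__main__":
    # Toy 10-channel x 20-frame power spectrogram (synthetic values).
    spec = [[1.0 + 0.5 * math.sin(0.3 * t + 0.2 * f) for t in range(20)]
            for f in range(10)]
    k = gabor_kernel(5, 7, omega_f=0.5, omega_t=0.4)
    feat = pns_gabor_sketch(spec, bias=0.1, kernel=k)
    print(len(feat), len(feat[0]))  # 6 14
```

Removing the kernel mean is a design choice that makes a flat (stationary, spectrally uniform) input produce zero output, so the filter responds only to spectro-temporal modulation. A full front end would use a bank of such filters with different modulation frequencies.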
Keywords :
Gabor filters; cepstral analysis; speech recognition; ETSI-AFE; Gabor filters; Gabor processing; MFCC; MLP nonlinear weighting; PNCC; PNS-Gabor; WSJ; WSJ corpus; Wall Street Journal; automatic speech recognition; frequency-shift characteristic; noise-robust speech recognition; power normalized cepstral coefficient; power normalized spectrum; power-bias subtraction; power-law nonlinearity; spectro-temporal feature; word error rate; Filter banks; Gabor filters; Mel frequency cepstral coefficient; Noise; Noise measurement; Robustness; Speech recognition; large vocabulary speech recognition; robust speech recognition; spectro-temporal features;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639032
Filename :
6639032