DocumentCode :
394272
Title :
Perceptually non-uniform spectral compression for noisy speech recognition
Author :
Chu, K.K. ; Leung, S.H. ; Yip, C.S.
Author_Institution :
Dept. of Electron. Eng., City Univ. of Hong Kong, China
Volume :
1
fYear :
2003
fDate :
6-10 April 2003
Abstract :
Loudness is a function of sound pressure level. The power law that approximates the loudness function has an exponent that depends on the bandwidth of the sound signal: it decreases from about 0.3 for a narrow-band tone to 0.23 for broadband uniform-exciting noise. Exploiting this psychoacoustic property of hearing, this paper proposes a new feature extraction method for robust speech recognition for FFT-based methods. In this method, larger energy compression is applied to the broadband-like high-frequency bands of each frame's power spectrum, instead of the fixed compression applied to all frequency bands in root cepstral analysis or perceptually based linear prediction (PLP). In addition, sound segments or frames with broadband characteristics, such as fricatives, are also given larger compression. The frame energy is used as the index that determines the degree of compression. With this non-uniform spectral compression scheme, a significant improvement in recognition accuracy is obtained under white noise, especially at very low SNR.
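The band-dependent compression described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the number of bands, and the linear fall of the exponent from 0.3 in the lowest band to 0.23 in the highest are assumptions made only for illustration, and the frame-energy-dependent part of the method is not modelled.

```python
import numpy as np


def band_exponents(n_bands, low_band_exp=0.3, high_band_exp=0.23):
    """Hypothetical exponent schedule (an assumption, not from the paper):
    the exponent falls linearly from low_band_exp in the lowest band
    to high_band_exp in the highest band."""
    return np.linspace(low_band_exp, high_band_exp, n_bands)


def compress_frame(band_power, exponents):
    """Raise each band's power to its own exponent.
    A smaller exponent means stronger compression."""
    band_power = np.asarray(band_power, dtype=float)
    return np.power(band_power, exponents)


# Example: one frame with a flat power spectrum over 5 bands.
exps = band_exponents(5)
frame = np.full(5, 100.0)
compressed = compress_frame(frame, exps)
```

For this flat frame, the lowest band maps to 100 ** 0.3 (about 3.98) and the highest to 100 ** 0.23 (about 2.88), so the high-frequency bands are compressed more strongly than the low-frequency ones. A fixed-exponent scheme would instead pass the same scalar exponent for every band.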
Keywords :
acoustic intensity; data compression; fast Fourier transforms; feature extraction; hearing; loudness; spectral analysis; speech coding; speech recognition; white noise; FFT-based methods; SNR; broadband uniform-exciting noise; broadband-like high frequency bands; energy compression; feature extraction method; fricatives; hearing psychoacoustics; loudness function approximation; narrow band tone; noisy speech recognition; perceptually nonuniform spectral compression; power law; power spectrum; recognition accuracy; robust speech recognition; sound frames; sound pressure level; sound segments; sound signal bandwidth; white noise environment; Acoustic noise; Auditory system; Bandwidth; Cepstral analysis; Feature extraction; Frequency; Narrowband; Noise robustness; Psychoacoustics; Speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-7663-3
Type :
conf
DOI :
10.1109/ICASSP.2003.1198803
Filename :
1198803