Title :
Feature extraction based on perceptually non-uniform spectral compression for speech recognition
Author :
Chu, K.K. ; Leung, S.H.
Author_Institution :
Dept. of Electron. Eng., City Univ. of Hong Kong, China
Abstract :
The power law of hearing, which approximates the loudness function, has an exponent that decreases from about 0.3 for a narrowband tone to 0.23 for broadband uniform-exciting noise. Exploiting this psychoacoustic property, this paper proposes a new feature extraction method for robust speech recognition. Instead of applying a fixed compression to all frequency bands, as in root cepstral analysis or PLP analysis, the method applies larger energy compression to the broadband-like high-frequency bands of each frame's power spectrum. In addition, sound segments with broadband characteristics are given larger compression as well, with frame energy used as the measuring index. The scatter of the feature vectors and the class discrimination for phonemes are compared against those of traditional feature extraction techniques. The features derived from the new scheme show smaller variation and better class discrimination than the traditional features. Recognition accuracy also improves significantly in white-noise environments, especially at very low SNR.
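The frequency-dependent part of the compression described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `band_exponents` and `compress_frame`, the linear ramp of the exponent across bands, and the flooring constant are all assumptions made for illustration. Only the two endpoint exponents (about 0.3 and 0.23) come from the abstract. The frame-energy-dependent compression is left out because the abstract does not say how frame energy maps to an exponent.

```python
import numpy as np


def band_exponents(n_bands, gamma_narrow=0.3, gamma_broad=0.23):
    """Return one power-law exponent per frequency band.

    Assumption: the exponent falls linearly from gamma_narrow at the
    lowest band to gamma_broad at the highest band. The abstract gives
    only the two endpoint values, not the shape of the transition.
    """
    return np.linspace(gamma_narrow, gamma_broad, n_bands)


def compress_frame(band_power, exponents, floor=1e-10):
    """Apply per-band power-law compression to one frame.

    band_power: non-negative band energies of a single frame.
    exponents:  array of the same length, one exponent per band.
    floor:      hypothetical lower bound that keeps inputs positive.
    """
    band_power = np.maximum(np.asarray(band_power, dtype=float), floor)
    return band_power ** exponents


# Toy frame with equal energy in every band, so the effect of the
# band-dependent exponent is easy to see.
frame = np.full(8, 100.0)
gammas = band_exponents(8)
compressed = compress_frame(frame, gammas)
```

With equal input energy in every band, the compressed values fall as the band index rises, because a smaller exponent means stronger compression. A fixed-exponent scheme such as root cepstral analysis corresponds to passing the same exponent for every band.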
Keywords :
acoustic noise; data compression; feature extraction; hearing; loudness; speech coding; speech recognition; white noise; PLP analysis; SNR; broadband uniform-exciting noise; broadband-like high frequency bands; class discrimination; energy compression; feature class discrimination; feature extraction techniques; feature variation; feature vectors; fixed compression; frame energy measuring index; hearing power law; hearing psychoacoustics; loudness function approximation; narrow band tone; perceptually nonuniform spectral compression; phonemes; power spectrum; recognition accuracy; robust speech recognition; root cepstral analysis; sound segment broadband characteristics; sound segment compression; white noise environment; Auditory system; Cepstral analysis; Energy measurement; Frequency; Narrowband; Noise robustness; Psychoacoustics
Conference_Title :
Circuits and Systems, 2003. ISCAS '03. Proceedings of the 2003 International Symposium on
Print_ISBN :
0-7803-7761-3
DOI :
10.1109/ISCAS.2003.1205122