Title :
A model of dynamic auditory perception and its application to robust word recognition
Author :
Strope, Brian ; Alwan, Abeer
Author_Institution :
Dept. of Electr. Eng., California Univ., Los Angeles, CA, USA
fDate :
9/1/1997
Abstract :
This paper describes two mechanisms that augment the common automatic speech recognition (ASR) front end and provide adaptation and isolation of local spectral peaks. A dynamic model consisting of a linear filterbank with a novel additive logarithmic adaptation stage after each filter output is proposed. An extensive series of perceptual forward masking experiments, together with previously reported forward masking data, determines the model's dynamic parameters. Once parameterized, the simple exponential dynamic mechanism predicts the nature of forward masking data from several studies across wide-ranging frequencies, input levels, and probe delay times. An initial evaluation of the dynamic model together with a local peak isolation mechanism as a front end for dynamic time warp (DTW) and hidden Markov model (HMM) word recognition systems shows an improvement in robustness to background noise when compared to Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), and relative spectra (RASTA) based front ends.
Keywords :
band-pass filters; filtering theory; hearing; hidden Markov models; noise; parameter estimation; prediction theory; spectral analysis; speech processing; speech recognition; HMM word recognition systems; Mel-frequency cepstral coefficients; additive logarithmic adaptation; automatic speech recognition front end; background noise; dynamic auditory perception; dynamic parameters; exponential dynamic mechanism; filter output; forward masking data; hidden Markov model; input levels; linear filterbank; linear prediction cepstral coefficients; local spectral peaks; local spectral peaks isolation; perceptual forward masking experiments; probe delay times; relative spectra; robust word recognition; Automatic speech recognition; Cepstral analysis; Delay; Filter bank; Frequency; Hidden Markov models; Noise robustness; Nonlinear filters; Predictive models; Probes;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on