Title :
Modeling auditory perception to improve robust speech recognition
Author :
Strope, Brian ; Alwan, Abeer
Author_Institution :
Dept. of Electrical Engineering, University of California, Los Angeles, CA, USA
Abstract :
While non-stationary stochastic techniques have led to substantial improvements in vocabulary size and speaker independence, most automatic speech recognition (ASR) systems remain overly sensitive to the acoustic environment, precluding robust widespread applications. Our approach to this problem has been to model fundamental aspects of auditory perception, which are typically neglected in common ASR front ends, to derive a more robust and phonetically relevant parameterization of speech. Short-term adaptation and recovery and a sensitivity to local spectral peaks, together with an explicit parameterization of the position and motion of those peaks, reduce the error rate of a word recognition task by as much as a factor of 4. Current work also investigates the perceptual significance of pitch-rate amplitude-modulation cues in noise.
Keywords :
amplitude modulation; hearing; spectral analysis; speech recognition; ASR front ends; acoustic environment; auditory perception modelling; automatic speech recognition; error rate reduction; local spectral peaks sensitivity; noise; phonetically relevant parameterization; pitch-rate amplitude-modulation cues; robust speech recognition; short-term adaptation; short-term recovery; word recognition task; Discrete cosine transforms; Filters; Frequency estimation; Hidden Markov models; Robustness; Signal processing; Spectrogram; Vocabulary
Conference_Title :
Conference Record of the Thirty-First Asilomar Conference on Signals, Systems & Computers, 1997
Conference_Location :
Pacific Grove, CA, USA
Print_ISBN :
0-8186-8316-3
DOI :
10.1109/ACSSC.1997.679067