Auditory pathway model and its VLSI implementation for robust speech recognition in real-world noisy environment

Author

Lee, Soo-Young ; Kim, Chang-Min ; Won, Young-Gul ; Park, Hyung-Min

Author_Institution

Dept. of Electr. Eng. & Comput. Sci., Korea Adv. Inst. of Sci. & Technol., South Korea

Volume

2

fYear

2003

fDate

14-17 Dec. 2003

Firstpage

1728

Abstract

A robust speech recognition system is reported, based on mathematical models of the auditory pathway and their VLSI implementations. The auditory model consists of three components: nonlinear feature extraction in the cochlea, binaural processing in the superior olivary complex, and top-down attention through a backward path. Feature extraction is based on a cochlear filter bank and time-frequency masking, with the masking modeled as lateral inhibition in both the time and frequency domains. Unlike popular binaural processing models based on a simple interaural time delay and interaural intensity difference, our model incorporates hundreds of time delays to handle noisy, reverberant signals. Top-down (TD) attention arises from the familiarity and/or importance of a sound, and a simple but efficient TD attention model has been developed based on the error backpropagation algorithm. These auditory models are computationally intensive, so special-purpose hardware has been developed for real-time applications. Experimental results demonstrate much better recognition performance in real-world noisy environments.
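The abstract describes time-frequency masking modeled as lateral inhibition across both frequency and time. The Python sketch below illustrates one way such a mask could act on cochlear filter-bank outputs. It assumes a non-negative energy matrix (channels by frames), inhibition from the two adjacent channels and from a few exponentially weighted past frames, and half-wave rectification. The function name and the parameters freq_weight, time_weight, time_decay and n_past are hypothetical choices for illustration, not the authors' published kernel.

```python
import numpy as np


def lateral_inhibition_mask(energy, freq_weight=0.25, time_weight=0.3,
                            time_decay=0.5, n_past=3):
    """Hypothetical time-frequency lateral inhibition on filter-bank energies.

    energy: non-negative array of shape (n_channels, n_frames).
    All weights are illustrative assumptions, not values from the paper.
    """
    energy = np.asarray(energy, dtype=float)

    # Frequency-domain inhibition: each channel is suppressed by its two
    # neighbours (edge channels reuse their own value as the missing neighbour).
    padded = np.pad(energy, ((1, 1), (0, 0)), mode="edge")
    neighbours = padded[:-2, :] + padded[2:, :]
    freq_inhibited = energy - freq_weight * neighbours

    # Time-domain inhibition: each frame is suppressed by the previous n_past
    # frames, with exponentially decaying weights (a forward-masking-like effect).
    inhibition = np.zeros_like(energy)
    for k in range(1, n_past + 1):
        shifted = np.zeros_like(energy)
        shifted[:, k:] = energy[:, :-k]  # shifted[:, t] = energy[:, t - k]
        inhibition += (time_decay ** (k - 1)) * shifted

    # Half-wave rectification keeps only the components that survive inhibition.
    return np.maximum(freq_inhibited - time_weight * inhibition, 0.0)
```

With a single frame such as [0.1, 1.0, 0.1, 0.1], the peak channel survives while its weaker neighbours are suppressed, which is the contrast-enhancing effect lateral inhibition is meant to provide. For a steady, flat input the output decays over successive frames, so onsets are emphasized relative to sustained energy.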
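The abstract contrasts binaural models built on a single interaural time delay with one that uses hundreds of delays. The exact binaural formulation is not given in this record, so the sketch below substitutes a generic Jeffress-style delay-line correlator: it scores every integer lag in a window of several hundred candidates instead of keeping only one ITD estimate. The function name delay_line_correlation and the max_lag parameter are hypothetical; lags are in samples, and a positive lag means the left-ear signal is delayed relative to the right.

```python
import numpy as np


def delay_line_correlation(left, right, max_lag):
    """Score every integer interaural lag in [-max_lag, max_lag] (in samples).

    A positive lag pairs left[t + lag] with right[t], i.e. it tests the
    hypothesis that the left-ear signal arrives `lag` samples after the right.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = len(left)
    if len(right) != n:
        raise ValueError("left and right must have the same length")
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(left)")

    lags = np.arange(-max_lag, max_lag + 1)
    scores = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            a, b = left[lag:], right[:n - lag]
        else:
            a, b = left[:n + lag], right[-lag:]
        # One "coincidence detector" per delay: mean product of overlapping samples.
        scores[i] = np.dot(a, b) / len(a)
    return lags, scores
```

With max_lag=200 the correlator has 401 delay taps; at an assumed 16 kHz sampling rate that spans 12.5 ms, well beyond the sub-millisecond range of direct-path ITDs. When a reflection reaches the ear at a different lag, as in a reverberant room, the full score vector keeps a second peak that a single-ITD summary (the argmax alone) would discard.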
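The abstract states that the TD attention model is built on error backpropagation. One common reading of that idea, assumed here rather than taken from the paper, is to freeze a trained classifier's weights, set the desired output to the attended class, and backpropagate the output error all the way to the input layer so that the input itself is adjusted toward what the network expects. The sketch uses a one-hidden-layer sigmoid network with squared error; the functions forward, top_down_attention and squared_error and the lr and steps parameters are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Hidden activations and outputs of a one-hidden-layer sigmoid network."""
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    return h, y


def squared_error(x, W1, b1, W2, b2, target):
    _, y = forward(x, W1, b1, W2, b2)
    return 0.5 * float(np.sum((y - np.asarray(target, dtype=float)) ** 2))


def top_down_attention(x, W1, b1, W2, b2, target, lr=0.05, steps=300):
    """Adapt the input (weights frozen) so the output moves toward `target`.

    Plain gradient descent on E = 0.5 * ||y - target||^2 with respect to x.
    """
    x_att = np.array(x, dtype=float)  # copy: the original input is left intact
    target = np.asarray(target, dtype=float)
    for _ in range(steps):
        h, y = forward(x_att, W1, b1, W2, b2)
        delta_out = (y - target) * y * (1.0 - y)        # dE/d(output pre-activation)
        delta_hid = (W2.T @ delta_out) * h * (1.0 - h)  # dE/d(hidden pre-activation)
        x_att -= lr * (W1.T @ delta_hid)                # dE/dx; W1, W2 never change
    return x_att
```

The adapted input x_att can be read as the network's expectation of the attended sound. How far it has moved from the original x is one possible, assumed measure of how plausible the attended class was; a small step size is used so that each update reliably lowers the error.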

Keywords

VLSI; backpropagation; delays; feature extraction; speech recognition; auditory pathway model; binaural processing; cochlear filter bank; error backpropagation algorithm; frequency domain; interaural intensity difference; interaural time delay; mathematical models; noisy reverberated signals; nonlinear feature extraction; real world noisy environment; robust speech recognition; time domain; time frequency masking; top down attention; very large scale integration; Acoustic noise; Feature extraction; Filter bank; Frequency domain analysis; Mathematical model; Robustness; Speech recognition; Time frequency analysis; Very large scale integration; Working environment noise;

fLanguage

English

Publisher

IEEE

Conference_Title

Neural Networks and Signal Processing, 2003. Proceedings of the 2003 International Conference on

Conference_Location

Nanjing

Print_ISBN

0-7803-7702-8

Type

conf

DOI

10.1109/ICNNSP.2003.1281219

Filename

1281219