DocumentCode
2739480
Title
Auditory pathway model and its VLSI implementation for robust speech recognition in real-world noisy environment
Author
Lee, Soo-Young ; Kim, Chang-Min ; Won, Young-Gul ; Park, Hyung-Min
Author_Institution
Dept. of Electr. Eng. & Comput. Sci., Korea Adv. Inst. of Sci. & Technol., South Korea
Volume
2
fYear
2003
fDate
14-17 Dec. 2003
Firstpage
1728
Abstract
A robust speech recognition system is reported, based on mathematical models of the auditory pathway and their VLSI implementations. The developed auditory model consists of three components: nonlinear feature extraction at the cochlea, binaural processing at the superior olivary complex, and top-down attention through a backward path. The feature extraction is based on a cochlear filter bank and time-frequency masking, which is modeled with lateral inhibition in both the time and frequency domains. Unlike popular binaural processing models based on simple interaural time delay and interaural intensity difference, our model incorporates hundreds of time delays for noisy, reverberated signals. The top-down (TD) attention arises from the familiarity and/or importance of the sound, and a simple but efficient TD attention model has been developed based on the error backpropagation algorithm. These auditory models require intensive computation, and special hardware has been developed for real-time applications. Experimental results demonstrate much better recognition performance in real-world noisy environments.
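The time-frequency masking step modeled with lateral inhibition can be illustrated with a generic lateral-inhibition operator. This is a minimal Python sketch under stated assumptions, not the authors' model. The function name `lateral_inhibition`, the rectangular neighbourhood (`f_radius` channels by `t_radius` frames), the single inhibition weight `alpha`, and the half-wave rectification are all hypothetical choices. Each cell of a filter-bank energy map (channels by frames) is reduced by a fraction of the mean energy of its neighbours in adjacent channels and frames. This sharpens isolated time-frequency peaks and suppresses weaker components surrounding them.

```python
from typing import List

Matrix = List[List[float]]


def lateral_inhibition(tf: Matrix, f_radius: int = 1, t_radius: int = 1,
                       alpha: float = 0.5) -> Matrix:
    """Hypothetical lateral inhibition over a time-frequency energy map.

    tf[c][t] is the non-negative energy of filter-bank channel c at frame t.
    Each cell is inhibited by alpha times the mean of its in-bounds neighbours
    within +/- f_radius channels and +/- t_radius frames; negative results
    are clipped to zero (half-wave rectification).
    """
    n_ch = len(tf)
    n_fr = len(tf[0]) if n_ch else 0
    out = [[0.0] * n_fr for _ in range(n_ch)]
    for c in range(n_ch):
        for t in range(n_fr):
            total, count = 0.0, 0
            # Accumulate neighbouring energies in frequency (dc) and time (dt).
            for dc in range(-f_radius, f_radius + 1):
                for dt in range(-t_radius, t_radius + 1):
                    if dc == 0 and dt == 0:
                        continue  # the centre cell does not inhibit itself
                    cc, tt = c + dc, t + dt
                    if 0 <= cc < n_ch and 0 <= tt < n_fr:
                        total += tf[cc][tt]
                        count += 1
            mean = total / count if count else 0.0
            out[c][t] = max(tf[c][t] - alpha * mean, 0.0)
    return out


if __name__ == "__main__":
    # A single strong time-frequency peak surrounded by weak background energy.
    peak_map = [[1.0, 1.0, 1.0],
                [1.0, 9.0, 1.0],
                [1.0, 1.0, 1.0]]
    for row in lateral_inhibition(peak_map):
        print(row)
```

With these settings the centre cell keeps 8.5 of its energy, and every surrounding cell is driven to zero because the nearby peak inhibits it more strongly than its own energy. A uniform map, by contrast, is simply scaled by `1 - alpha`. A real auditory front end would likely use unequal weights in time and frequency, tuned to the filter bank.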
Keywords
VLSI; backpropagation; delays; feature extraction; speech recognition; auditory pathway model; binaural processing; cochlear filter bank; error backpropagation algorithm; frequency domain; interaural intensity difference; interaural time delay; mathematical models; noisy reverberated signals; nonlinear feature extraction; real world noisy environment; robust speech recognition; time domain; time frequency masking; top down attention; very large scale integration; Acoustic noise; Feature extraction; Filter bank; Frequency domain analysis; Mathematical model; Robustness; Speech recognition; Time frequency analysis; Very large scale integration; Working environment noise
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks and Signal Processing, 2003. Proceedings of the 2003 International Conference on
Conference_Location
Nanjing
Print_ISBN
0-7803-7702-8
Type
conf
DOI
10.1109/ICNNSP.2003.1281219
Filename
1281219