Title :
Learning-Based Auditory Encoding for Robust Speech Recognition
Author :
Chiu, Yu-Hsiang Bosco ; Raj, Bhiksha ; Stern, Richard M.
Author_Institution :
Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
fDate :
3/1/2012 12:00:00 AM
Abstract :
This paper describes an approach to the optimization of the nonlinear component of a physiologically motivated feature extraction system for automatic speech recognition. Most computational models of the peripheral auditory system include a sigmoidal nonlinear function that relates the log of signal intensity to output level, which we represent by a set of frequency-dependent logistic functions. The parameters of these rate-level functions are estimated to maximize the a posteriori probability of the correct class in training data. The performance of this approach was verified by the results of a series of experiments conducted with the CMU Sphinx-III speech recognition system on the DARPA Resource Management and Wall Street Journal databases, and on the AURORA 2 database. In general, it was shown that feature extraction that incorporates the learned rate-level nonlinearity, combined with a complementary loudness compensation function, results in better recognition accuracy in the presence of background noise than traditional MFCC feature extraction without the optimized nonlinearity when the system is trained on clean speech and tested in noise. We also describe the use of a lattice structure that constrains the training process, enabling training with much more complicated acoustic models.
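As a rough illustration of the frequency-dependent logistic rate-level functions mentioned in the abstract, the sketch below applies a per-channel sigmoid to log spectral intensities. The function and parameter names (slope, threshold) and the example values are assumptions chosen for illustration only; in the paper these parameters are learned discriminatively to maximize the a posteriori probability of the correct class, not fixed by hand.

```python
import numpy as np

def rate_level_nonlinearity(log_intensity, slope, threshold):
    """Apply a frequency-dependent logistic (sigmoidal) rate-level function.

    log_intensity : array of shape (frames, channels), log spectral intensities
    slope         : per-channel slope parameters, shape (channels,)
    threshold     : per-channel threshold parameters, shape (channels,)
    """
    # Each frequency channel is compressed by its own logistic function of
    # log intensity, mimicking an auditory rate-level curve.
    return 1.0 / (1.0 + np.exp(-slope * (log_intensity - threshold)))

# Hypothetical usage: 40 mel-like channels with random log energies standing
# in for real filterbank features.
rng = np.random.default_rng(0)
log_energy = rng.normal(loc=10.0, scale=3.0, size=(100, 40))
slope = np.full(40, 0.5)        # hypothetical per-channel slopes
threshold = np.full(40, 10.0)   # hypothetical per-channel thresholds
compressed = rate_level_nonlinearity(log_energy, slope, threshold)
```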
Keywords :
encoding; feature extraction; learning (artificial intelligence); probability; speech coding; speech recognition; AURORA 2 database; CMU Sphinx-III speech recognition system; DARPA resource management; MFCC feature extraction; a posteriori probability; automatic speech recognition; learning-based auditory encoding; peripheral auditory system; physiologically motivated feature extraction system; robust speech recognition; Computational modeling; Feature extraction; Hidden Markov models; Noise; Speech; Speech recognition; Training; Auditory model; discriminative training; feature extraction; robust automatic speech recognition;
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2011.2168209