Title :
Learning-based auditory encoding for robust speech recognition
Author :
Chiu, Yu-Hsiang Bosco ; Raj, Bhiksha ; Stern, Richard M.
Author_Institution :
Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
This paper describes ways of speeding up the optimization process for learning physiologically-motivated components of a feature computation module directly from data. During training, word lattices generated by the speech decoder are combined with conjugate gradient descent to train the parameters of logistic functions so as to maximize the a posteriori probability of the correct class in the training data. These functions represent the rate-level nonlinearities found in most mammalian auditory systems. Experiments conducted using the CMU SPHINX-III system on the DARPA Resource Management and Wall Street Journal tasks show that using discriminative training to estimate the shape of the rate-level nonlinearity provides better recognition accuracy in the presence of background noise than traditional procedures that do not employ learning. More importantly, the use of conjugate gradient descent optimization, together with a word lattice that reduces the number of hypotheses considered, greatly increases training speed, which makes training with much more complicated models feasible.
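The core idea in the abstract, fitting the parameters of a logistic rate-level nonlinearity so that the posterior probability of the correct class is maximized, can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-dimensional data in `DATA`, the fixed Gaussian class models (`MEANS`, `SIGMA`), the finite-difference gradient, and the Polak-Ribiere variant of nonlinear conjugate gradient with a backtracking line search. The sketch uses no word lattices and no speech decoder.

```python
import math

def rate_level(x, w, b):
    """Logistic nonlinearity mapping an input level x to a rate in (0, 1)."""
    z = w * x + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # stable form for negative z
    return e / (1.0 + e)

# Hypothetical toy data: (input level, class label).
DATA = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (-0.5, 0),
        (0.5, 1), (1.0, 1), (1.5, 1), (2.0, 1)]
MEANS = (0.2, 0.8)  # assumed fixed class means in the transformed domain
SIGMA = 0.2         # assumed shared standard deviation

def neg_log_posterior(params):
    """Negative sum of log P(correct class | transformed feature)."""
    w, b = params
    total = 0.0
    for x, label in DATA:
        y = rate_level(x, w, b)
        scores = [-(y - m) ** 2 / (2.0 * SIGMA ** 2) for m in MEANS]
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total -= scores[label] - log_norm
    return total

def gradient(f, p, eps=1e-6):
    """Central finite-difference gradient of f at p."""
    grad = []
    for i in range(len(p)):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2.0 * eps))
    return grad

def conjugate_gradient(f, p0, iters=50):
    """Nonlinear conjugate gradient (Polak-Ribiere) with backtracking."""
    p = list(p0)
    g = gradient(f, p)
    d = [-gi for gi in g]
    for _ in range(iters):
        # Restart with steepest descent if d is not a descent direction.
        if sum(gi * di for gi, di in zip(g, d)) >= 0.0:
            d = [-gi for gi in g]
        step, f0 = 1.0, f(p)
        # Halve the step until the objective strictly decreases.
        while step > 1e-10 and f([pi + step * di for pi, di in zip(p, d)]) >= f0:
            step *= 0.5
        if step <= 1e-10:
            break  # no improving step found
        p = [pi + step * di for pi, di in zip(p, d)]
        g_new = gradient(f, p)
        denom = sum(gi * gi for gi in g)
        if denom == 0.0:
            break
        beta = max(0.0, sum(gn * (gn - go) for gn, go in zip(g_new, g)) / denom)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return p

initial = [0.1, 0.0]  # nearly flat nonlinearity as a starting point
trained = conjugate_gradient(neg_log_posterior, initial)
```

Comparing `neg_log_posterior(initial)` with `neg_log_posterior(trained)` shows the objective decreasing as the slope `w` of the logistic function steepens to separate the two toy classes. In a real recognizer the posterior would come from the acoustic and language models over competing hypotheses, which is where a word lattice would limit the set of hypotheses considered.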
Keywords :
acoustic noise; acoustic signal processing; conjugate gradient methods; hearing; speech coding; speech recognition; background noise; conjugate gradient; conjugate gradient descent optimization; discriminative training; feature computation module; learning-based auditory encoding; logistic functions; mammalian auditory systems; physiologically-motivated components; posteriori probability; rate-level nonlinearities; robust speech recognition; speech decoder; word lattices; Auditory system; Decoding; Encoding; Lattices; Logistics; Management training; Resource management; Robustness; Speech recognition; Training data; auditory models; automatic speech recognition; data analysis
Conference_Title :
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4244-4295-9
Electronic_ISBN :
1520-6149
DOI :
10.1109/ICASSP.2010.5495666