Title :
Speech recognition with prediction-adaptation-correction recurrent neural networks
Author :
Yu Zhang ; Dong Yu ; Michael L. Seltzer ; Jasha Droppo
Author_Institution :
CSAIL, MIT, Cambridge, MA, USA
Abstract :
We propose the prediction-adaptation-correction RNN (PAC-RNN), in which a correction DNN estimates the state posterior probability based on both the current frame and the prediction made on the past frames by a prediction DNN. The result from the correction DNN, which serves as the main DNN, is fed back to the prediction DNN to make better predictions for the future frames. The PAC-RNN can be viewed in two ways: given the new information in the current frame, the main DNN corrects the prediction made by the prediction DNN; alternatively, the main DNN's behavior is adapted based on the prediction DNN's prediction. Experiments on the TIMIT phone recognition task indicate that the PAC-RNN outperforms DNN, RNN, and LSTM baselines by 2.4%, 2.1%, and 1.9% absolute phone accuracy, respectively. We found that incorporating the prediction objective and including the recurrent loop are both important to the performance of the PAC-RNN.
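The loop described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation. The abstract does not specify the layer sizes, the nonlinearities, what exactly is fed back, or the training objectives, so everything here is an assumption: each DNN is reduced to one sigmoid hidden layer with a softmax output, the fed-back "result" is taken to be the correction DNN's hidden activation, and the weights are random and untrained. The names `pac_rnn_forward`, `n_hidden`, and the helper functions are hypothetical.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def affine(weights, vec):
    """Matrix-vector product; bias terms are omitted to keep the sketch short."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def sigmoid_layer(weights, vec):
    return [1.0 / (1.0 + math.exp(-a)) for a in affine(weights, vec)]


def softmax_layer(weights, vec):
    acts = affine(weights, vec)
    peak = max(acts)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in acts]
    total = sum(exps)
    return [e / total for e in exps]


def pac_rnn_forward(frames, n_states, n_hidden=8, seed=0):
    """Run the assumed PAC-RNN loop over a list of feature frames.

    Returns (posteriors, predictions), one distribution over states per frame.
    """
    rng = random.Random(seed)
    n_feat = len(frames[0])
    # Correction (main) DNN: input is the current frame plus the last prediction.
    w_cor_hid = make_matrix(n_hidden, n_feat + n_states, rng)
    w_cor_out = make_matrix(n_states, n_hidden, rng)
    # Prediction DNN: input is the current frame plus the fed-back result
    # (ASSUMPTION: the correction DNN's hidden activation).
    w_pred_hid = make_matrix(n_hidden, n_feat + n_hidden, rng)
    w_pred_out = make_matrix(n_states, n_hidden, rng)

    # Before the first frame there is no prediction; start from a uniform one.
    prediction = [1.0 / n_states] * n_states
    posteriors, predictions = [], []
    for frame in frames:
        # Correction step: current frame + prediction made on past frames.
        cor_hidden = sigmoid_layer(w_cor_hid, frame + prediction)
        posteriors.append(softmax_layer(w_cor_out, cor_hidden))
        # Prediction step: the correction result feeds the next prediction.
        pred_hidden = sigmoid_layer(w_pred_hid, frame + cor_hidden)
        prediction = softmax_layer(w_pred_out, pred_hidden)
        predictions.append(prediction)
    return posteriors, predictions
```

Calling `pac_rnn_forward(frames, n_states=4)` on a list of equal-length feature vectors returns one state posterior and one prediction per frame. The recurrence is carried only through `prediction`, which is the recurrent loop the abstract refers to.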
Keywords :
prediction theory; probability; recurrent neural nets; speech recognition; LSTM; PAC-RNN; TIMIT phone recognition; correction DNN; deep neural network; phone accuracy improvement; prediction DNN; prediction-adaptation-correction recurrent neural networks; state posterior probability; Accuracy; Hidden Markov models; Recurrent neural networks; Speech; Training; DNN; Prediction-Adaptation-Correction RNN; RNN
Conference_Title :
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178923