Title :
Improving long short-term memory networks using maxout units for large vocabulary speech recognition
Author :
Xiangang Li ; Xihong Wu
Author_Institution :
Key Lab. of Machine Perception (Minist. of Educ.), Peking Univ., Beijing, China
Abstract :
Long short-term memory (LSTM) recurrent neural networks have been shown to give state-of-the-art performance on many speech recognition tasks. To achieve a further performance improvement, this paper proposes integrating maxout units into the LSTM cells, given that those units have brought significant improvements to deep feed-forward neural networks. A novel architecture is constructed by replacing the input activation units (generally tanh) in the LSTM networks with maxout units. LSTM network training was implemented on multi-GPU devices with truncated backpropagation through time (BPTT), and the proposed designs were empirically evaluated on a large-vocabulary Mandarin conversational telephone speech recognition task. The experimental results support the claim that the performance of LSTM-based acoustic models can be further improved using maxout units.
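To make the core idea concrete, below is a minimal NumPy sketch of one step of an LSTM cell whose input activation (normally tanh) is replaced by an elementwise maxout over k linear pieces, as the abstract describes. The parameter names, layout, and the choice k = 2 are illustrative assumptions, not the authors' exact formulation.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def maxout_lstm_step(x, h_prev, c_prev, params):
        """One LSTM step with a maxout block input replacing tanh.

        Hypothetical parameter layout (a sketch, not the paper's code):
          params["W_i"], params["W_f"], params["W_o"]: (hidden, input+hidden)
          params["b_i"], params["b_f"], params["b_o"]: (hidden,)
          params["W_g"]: (k, hidden, input+hidden) -- k linear pieces
          params["b_g"]: (k, hidden)
        """
        z = np.concatenate([x, h_prev])                  # joint input [x; h_prev]
        i = sigmoid(params["W_i"] @ z + params["b_i"])   # input gate
        f = sigmoid(params["W_f"] @ z + params["b_f"])   # forget gate
        o = sigmoid(params["W_o"] @ z + params["b_o"])   # output gate
        # Maxout block input: k affine candidates, elementwise max
        # (this is where the conventional tanh is replaced)
        pieces = params["W_g"] @ z + params["b_g"]       # shape (k, hidden)
        g = pieces.max(axis=0)                           # shape (hidden,)
        c = f * c_prev + i * g                           # cell state update
        h = o * np.tanh(c)                               # hidden output
        return h, c

The sketch keeps tanh on the cell output and replaces only the input activation, matching the abstract's description; whether the paper also modifies other activations is not stated there.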
Keywords :
neural nets; speech recognition; vocabulary; LSTM recurrent neural networks; Mandarin conversational telephone speech recognition task; feed-forward neural networks; large-vocabulary speech recognition; long short-term memory networks; maxout units; Acoustics; Computer architecture; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; acoustic modeling; deep neural network; large vocabulary speech recognition; long short-term memory; maxout
Conference_Titel :
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conference_Location :
South Brisbane, QLD, Australia
DOI :
10.1109/ICASSP.2015.7178842