DocumentCode
3430126
Title
Improving long short-term memory networks using maxout units for large vocabulary speech recognition
Author
Xiangang Li ; Xihong Wu
Author_Institution
Key Lab. of Machine Perception (Minist. of Educ.), Peking Univ., Beijing, China
fYear
2015
fDate
19-24 April 2015
Firstpage
4600
Lastpage
4604
Abstract
Long short-term memory (LSTM) recurrent neural networks have been shown to give state-of-the-art performance on many speech recognition tasks. To improve performance further, this paper proposes integrating maxout units into the LSTM cells, given that those units have brought significant improvements to deep feed-forward neural networks. A novel architecture is constructed by replacing the input activation units (generally tanh) in the LSTM networks with maxout units. We implemented the LSTM network training on multi-GPU devices with truncated backpropagation through time (BPTT), and empirically evaluated the proposed designs on a large vocabulary Mandarin conversational telephone speech recognition task. The experimental results support our claim that the performance of LSTM-based acoustic models can be further improved using maxout units.
Keywords
neural nets; speech recognition; vocabulary; LSTM recurrent neural networks; Mandarin conversational telephone speech recognition task; feed forward neural networks; large vocabulary speech recognition; long short term memory networks; maxout units; Acoustics; Computer architecture; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; acoustic modeling; deep neural network; long short-term memory; maxout
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location
South Brisbane, QLD
Type
conf
DOI
10.1109/ICASSP.2015.7178842
Filename
7178842