DocumentCode :
1686308
Title :
Feature combination and stacking of recurrent and non-recurrent neural networks for LVCSR
Author :
Plahl, Christian ; Kozielski, Michal ; Schlüter, Ralf ; Ney, Hermann
Author_Institution :
Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
fYear :
2013
Firstpage :
6714
Lastpage :
6718
Abstract :
This paper investigates the combination of different short-term features and the combination of recurrent and non-recurrent neural networks (NNs) on a Spanish speech recognition task. Several methods exist for combining different feature sets, such as concatenation or linear discriminant analysis (LDA). Even though all these techniques achieve reasonable improvements, feature combination by multi-layer perceptrons (MLPs) outperforms all known approaches. We develop the concept of MLP-based feature combination further using recurrent neural networks (RNNs). The phoneme posterior estimates derived from an RNN lead to a significant improvement over the MLP results, achieving a 5% relative improvement in word error rate (WER) with far fewer parameters. Moreover, we improve system performance further by combining an MLP and an RNN in a hierarchical framework, in which the MLP benefits from the preprocessing of the RNN. All NNs are trained on phonemes; nevertheless, the same concepts could be applied using context-dependent states. In addition to the improvements in recognition performance with respect to WER, NN-based feature combination methods reduce both the training and the testing complexity. Overall, the systems are based on a single set of acoustic models, together with the training of different NNs.
Keywords :
acoustic signal processing; error statistics; multilayer perceptrons; natural language processing; recurrent neural nets; speech recognition; LVCSR; MLP based feature combination; RNN; Spanish speech recognition task; WER; acoustic models; context-dependent states; multilayer perceptrons; phoneme posterior estimates; recognition performance; recurrent neural networks; system performance; testing complexity; word error rate; Acoustics; Artificial neural networks; Hidden Markov models; Recurrent neural networks; Speech; Speech recognition; Training; feature combination; long-short-term-memory; multi-layer perceptron; recurrent neural networks; speech recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6638961
Filename :
6638961