DocumentCode
730667
Title
Integrating Gaussian mixtures into deep neural networks: Softmax layer with hidden variables
Author
Tuske, Zoltan ; Tahir, Muhammad Ali ; Schluter, Ralf ; Ney, Hermann
Author_Institution
Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
fYear
2015
fDate
19-24 April 2015
Firstpage
4285
Lastpage
4289
Abstract
In the hybrid approach, neural network output directly serves as hidden Markov model (HMM) state posterior probability estimates. In contrast to this, in the tandem approach neural network output is used as input features to improve classic Gaussian mixture model (GMM) based emission probability estimates. This paper shows that GMM can be easily integrated into the deep neural network framework. By exploiting its equivalence with the log-linear mixture model (LMM), GMM can be transformed to a large softmax layer followed by a summation pooling layer. Theoretical and experimental results indicate that the jointly trained and optimally chosen GMM and bottleneck tandem features cannot perform worse than a hybrid model. Thus, the question “hybrid vs. tandem” simplifies to optimizing the output layer of a neural network. Speech recognition experiments are carried out on a broadcast news and conversations task using up to 12 feed-forward hidden layers with sigmoid and rectified linear unit activation functions. The evaluation of the LMM layer shows recognition gains over the classic softmax output.
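The construction named in the abstract, a large softmax layer followed by a summation pooling layer, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `lmm_output_layer`, the argument names, and the assumption that each (state, component) pair has a plain linear logit are all hypothetical choices made for illustration. The record does not give the paper's exact parameterization.

```python
import numpy as np


def lmm_output_layer(x, W, b, num_states, num_components):
    """Sketch of a log-linear mixture output layer (hypothetical names).

    x: (batch, dim) inputs, e.g. bottleneck features
    W: (dim, num_states * num_components) weights, columns grouped by state
    b: (num_states * num_components,) biases
    Returns state posteriors of shape (batch, num_states).
    """
    # One logit per (state, mixture component) pair.
    logits = x @ W + b
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    # Large softmax: normalize jointly over all (state, component) pairs.
    joint = np.exp(logits)
    joint = joint / joint.sum(axis=1, keepdims=True)
    # Summation pooling: marginalize out the hidden component index per state.
    joint = joint.reshape(-1, num_states, num_components)
    return joint.sum(axis=2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))       # 4 frames, 8-dimensional features
    W = rng.normal(size=(8, 3 * 2))   # 3 states, 2 components per state
    b = rng.normal(size=3 * 2)
    posteriors = lmm_output_layer(x, W, b, num_states=3, num_components=2)
    print(posteriors.shape)
    print(posteriors.sum(axis=1))
```

With `num_components=1` the pooling step has nothing to sum over and the sketch reduces to an ordinary softmax output layer, which matches the abstract's framing of "hybrid vs. tandem" as a choice of output layer.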
Keywords
Gaussian processes; broadcasting; hidden Markov models; mixture models; neural nets; optimisation; speech recognition; GMM; Gaussian mixture model; HMM; LMM; broadcast news; conversations task; deep neural networks; feedforward hidden layers; hidden Markov model; hidden variables; log-linear mixture model; optimization; posterior probability estimates; recognition gains; rectified linear unit activation functions; sigmoid; softmax layer; speech recognition; summation pooling layer; Acoustics; Approximation methods; Artificial neural networks; Hidden Markov models; Joints; Training; ASR; DNN; GMM; LMM; Log-linear; bottleneck; hybrid; mixture model; neural network; tandem;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location
South Brisbane, QLD
Type
conf
DOI
10.1109/ICASSP.2015.7178779
Filename
7178779