Integrating Gaussian mixtures into deep neural networks: Softmax layer with hidden variables

Author

Tüske, Zoltán ; Tahir, Muhammad Ali ; Schlüter, Ralf ; Ney, Hermann

Author_Institution

Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany

fYear

2015

fDate

19-24 April 2015

Firstpage

4285

Lastpage

4289

Abstract

In the hybrid approach, the neural network output directly serves as the hidden Markov model (HMM) state posterior probability estimate. In the tandem approach, by contrast, the neural network output is used as input features to improve classic Gaussian mixture model (GMM) based emission probability estimates. This paper shows that a GMM can be easily integrated into the deep neural network framework. By exploiting its equivalence with the log-linear mixture model (LMM), a GMM can be transformed into a large softmax layer followed by a summation pooling layer. Theoretical and experimental results indicate that jointly trained and optimally chosen GMM and bottleneck tandem features cannot perform worse than a hybrid model. Thus, the question of “hybrid vs. tandem” reduces to optimizing the output layer of a neural network. Speech recognition experiments are carried out on a broadcast news and conversations task using up to 12 feed-forward hidden layers with sigmoid and rectified linear unit activation functions. The evaluation of the LMM layer shows recognition gains over the classic softmax output.
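
The "large softmax layer followed by a summation pooling layer" described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single covariance matrix shared by all Gaussians, which is an assumption made here so that the quadratic term cancels and the GMM posterior becomes exactly log-linear; the abstract does not state how covariances are handled. All function and variable names (`gmm_to_lmm`, `lmm_posteriors`, `gmm_posteriors`, `priors`, `weights`, `means`, `cov`) are hypothetical.

```python
import numpy as np

def gmm_to_lmm(priors, weights, means, cov):
    """Map shared-covariance GMM parameters to softmax weights and biases.

    priors: (S,) class priors; weights: (S, I) mixture weights;
    means: (S, I, D) component means; cov: (D, D) shared covariance (assumption).
    """
    prec = np.linalg.inv(cov)
    S, I, D = means.shape
    # Linear term: Sigma^{-1} mu for every (class, component) pair.
    W = (means @ prec).reshape(S * I, D)
    # Bias: log prior + log mixture weight - 0.5 * mu^T Sigma^{-1} mu.
    quad = np.einsum('sid,de,sie->si', means, prec, means)
    b = (np.log(priors)[:, None] + np.log(weights) - 0.5 * quad).reshape(S * I)
    return W, b

def lmm_posteriors(x, W, b, n_classes, n_components):
    """Softmax over all (class, component) units, then sum-pool per class."""
    z = W @ x + b
    z = z - z.max()                      # numerical stability
    p = np.exp(z)
    p /= p.sum()
    return p.reshape(n_classes, n_components).sum(axis=1)

def gmm_posteriors(x, priors, weights, means, cov):
    """Reference class posteriors computed directly from the GMM via Bayes' rule."""
    prec = np.linalg.inv(cov)
    diff = x - means                     # (S, I, D)
    log_gauss = -0.5 * np.einsum('sid,de,sie->si', diff, prec, diff)
    log_joint = np.log(priors)[:, None] + np.log(weights) + log_gauss
    joint = np.exp(log_joint - log_joint.max())
    return joint.sum(axis=1) / joint.sum()

# Toy check with random parameters: 3 classes, 2 components each, 4-dim input.
rng = np.random.default_rng(0)
S, I, D = 3, 2, 4
priors = rng.dirichlet(np.ones(S))
weights = rng.dirichlet(np.ones(I), size=S)
means = rng.normal(size=(S, I, D))
A = rng.normal(size=(D, D))
cov = A @ A.T + D * np.eye(D)            # symmetric positive definite
x = rng.normal(size=D)

W, b = gmm_to_lmm(priors, weights, means, cov)
p_lmm = lmm_posteriors(x, W, b, S, I)
p_gmm = gmm_posteriors(x, priors, weights, means, cov)
```

Under the shared-covariance assumption the two posterior vectors agree up to floating-point error, which is the sense in which the GMM can be read as a softmax over all (class, component) units followed by a per-class sum. In a network, `x` would be the bottleneck feature vector and `W`, `b` would be trained jointly with the layers below.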

Keywords

Gaussian processes; broadcasting; hidden Markov models; mixture models; neural nets; optimisation; speech recognition; GMM; Gaussian mixture model; HMM; LMM; broadcast news; conversations task; deep neural networks; feedforward hidden layers; hidden Markov model; hidden variables; log-linear mixture model; optimization; posterior probability estimates; recognition gains; rectified linear unit activation functions; sigmoid; softmax layer; speech recognition; summation pooling layer; Acoustics; Approximation methods; Artificial neural networks; Hidden Markov models; Joints; Training; ASR; DNN; GMM; LMM; Log-linear; bottleneck; hybrid; mixture model; neural network; tandem;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178779

Filename

7178779