• DocumentCode
    672379
  • Title
    Porting concepts from DNNs back to GMMs
  • Author
    Demuynck, Kris; Triefenbach, Fabian
  • Author_Institution
    ELIS/MultimediaLab, Ghent Univ., Ghent, Belgium
  • fYear
    2013
  • fDate
    8-12 Dec. 2013
  • Firstpage
    356
  • Lastpage
    361
  • Abstract
    Deep neural networks (DNNs) have been shown to outperform Gaussian mixture models (GMMs) on a variety of speech recognition benchmarks. In this paper we analyze the differences between the DNN and GMM modeling techniques and port the best ideas from DNN-based modeling to a GMM-based system. By going both deep (multiple layers) and wide (multiple parallel sub-models), and by sharing model parameters, we are able to close the gap between the two modeling techniques on the TIMIT database. Since the 'deep' GMMs retain the maximum-likelihood trained Gaussians as their first layer, advanced techniques such as speaker adaptation and model-based noise robustness can be readily incorporated. Despite their similarities, the DNNs and the deep GMMs still show sufficient complementarity to allow effective system combination.
  • Keywords
    Gaussian processes; maximum likelihood estimation; mixture models; neural nets; speech recognition; DNN-based modeling; GMM-based system; Gaussian mixture models; TIMIT database; deep GMM; deep neural networks; maximum-likelihood trained Gaussians; porting concepts; speech recognition benchmarks; Acoustics; Adaptation models; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; DNN; GMM; deep structures
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
  • Conference_Location
    Olomouc
  • Type
    conf
  • DOI
    10.1109/ASRU.2013.6707756
  • Filename
    6707756