Title :
Porting concepts from DNNs back to GMMs
Author :
Demuynck, Kris; Triefenbach, Fabian
Author_Institution :
ELIS/MultimediaLab, Ghent Univ., Ghent, Belgium
Abstract :
Deep neural networks (DNNs) have been shown to outperform Gaussian mixture models (GMMs) on a variety of speech recognition benchmarks. In this paper we analyze the differences between the DNN and GMM modeling techniques and port the best ideas from DNN-based modeling to a GMM-based system. By going both deep (multiple layers) and wide (multiple parallel sub-models), and by sharing model parameters, we are able to close the gap between the two modeling techniques on the TIMIT database. Since the 'deep' GMMs retain the maximum-likelihood trained Gaussians as the first layer, advanced techniques such as speaker adaptation and model-based noise robustness can be readily incorporated. Despite their similarities, the DNNs and the deep GMMs still show a sufficient amount of complementarity to allow effective system combination.
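To make the deep-and-wide GMM idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: the layer sizes, diagonal covariances, uniform mixture weights, the use of component posteriors as inter-layer features, the log-domain averaging of parallel streams, and all function names are assumptions for illustration only. A first layer of maximum-likelihood trained Gaussians produces per-component posteriors; a second GMM layer re-models those posteriors ("deep"); and several parallel second-layer models are combined ("wide").

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)

def gmm_log_posteriors(x, means, log_vars, log_weights):
    """Log component posteriors p(k | x_t) of a diagonal-covariance GMM.

    x:           (T, D) input frames
    means:       (K, D) component means
    log_vars:    (K, D) log variances
    log_weights: (K,)   log mixture weights
    returns:     (T, K) log posteriors
    """
    # log N(x_t | mu_k, diag(var_k)) for every frame/component pair
    diff = x[:, None, :] - means[None, :, :]                 # (T, K, D)
    ll = -0.5 * (np.log(2 * np.pi) + log_vars[None]
                 + diff ** 2 / np.exp(log_vars)[None]).sum(axis=-1)
    joint = ll + log_weights[None, :]                        # (T, K)
    return joint - logsumexp(joint, axis=1, keepdims=True)

# Toy data: 100 frames of 13-dimensional features (stand-in for MFCCs)
T, D = 100, 13
x = rng.normal(size=(T, D))

# Layer 1: maximum-likelihood trained Gaussians (random stand-ins here),
# kept intact so speaker adaptation / noise compensation can still be
# applied to them directly.
K1 = 32
layer1 = (rng.normal(size=(K1, D)), np.zeros((K1, D)), np.full(K1, -np.log(K1)))

# "Deep": layer-1 posteriors become the input features of layer 2.
h1 = gmm_log_posteriors(x, *layer1)                          # (T, K1)

K2 = 16
layer2 = (rng.normal(size=(K2, K1)), np.zeros((K2, K1)), np.full(K2, -np.log(K2)))
h2 = gmm_log_posteriors(h1, *layer2)                         # (T, K2)

# "Wide": several parallel sub-models over the same layer-1 features;
# their posteriors are averaged (computed in log space for stability).
wide = [(rng.normal(size=(K2, K1)), np.zeros((K2, K1)), np.full(K2, -np.log(K2)))
        for _ in range(3)]
h2_wide = logsumexp(np.stack([gmm_log_posteriors(h1, *m) for m in wide]),
                    axis=0) - np.log(len(wide))

print(h2.shape, h2_wide.shape)                               # (100, 16) (100, 16)
```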
Keywords :
Gaussian processes; maximum likelihood estimation; mixture models; neural nets; speech recognition; DNN-based modeling; GMM-based system; Gaussian mixture models; TIMIT database; deep GMM; deep neural networks; maximum-likelihood trained Gaussians; porting concepts; speech recognition benchmarks; Acoustics; Adaptation models; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; DNN; GMM; deep structures
Conference_Titel :
2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
Conference_Location :
Olomouc, Czech Republic
DOI :
10.1109/ASRU.2013.6707756