• DocumentCode
    672379
  • Title
    Porting concepts from DNNs back to GMMs
  • Author
    Demuynck, Kris; Triefenbach, Fabian
  • Author_Institution
    ELIS/MultimediaLab, Ghent Univ., Ghent, Belgium
  • fYear
    2013
  • fDate
    8-12 Dec. 2013
  • Firstpage
    356
  • Lastpage
    361
  • Abstract
    Deep neural networks (DNNs) have been shown to outperform Gaussian mixture models (GMMs) on a variety of speech recognition benchmarks. In this paper we analyze the differences between the DNN and GMM modeling techniques and port the best ideas from DNN-based modeling to a GMM-based system. By going both deep (multiple layers) and wide (multiple parallel sub-models), and by sharing model parameters, we are able to close the gap between the two modeling techniques on the TIMIT database. Since the 'deep' GMMs retain the maximum-likelihood trained Gaussians as their first layer, advanced techniques such as speaker adaptation and model-based noise robustness can be readily incorporated. Despite their similarities, the DNNs and the deep GMMs still show sufficient complementarity to allow effective system combination.
  • Keywords
    Gaussian processes; maximum likelihood estimation; mixture models; neural nets; speech recognition; DNN-based modeling; GMM-based system; Gaussian mixture models; TIMIT database; deep GMM; deep neural networks; maximum-likelihood trained Gaussians; porting concepts; speech recognition benchmarks; Acoustics; Adaptation models; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; DNN; GMM; deep structures
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
  • Conference_Location
    Olomouc
  • Type
    conf
  • DOI
    10.1109/ASRU.2013.6707756
  • Filename
    6707756