  • DocumentCode
    3744829
  • Title
    Time delay deep neural network-based universal background models for speaker recognition
  • Author
    David Snyder;Daniel Garcia-Romero;Daniel Povey
  • Author_Institution
    Center for Language and Speech Processing & Human Language Technology Center of Excellence, The Johns Hopkins University, Baltimore, MD 21218, USA
  • fYear
    2015
  • Firstpage
    92
  • Lastpage
    97
  • Abstract
    Recently, deep neural networks (DNNs) have been incorporated into i-vector-based speaker recognition systems, where they have significantly improved state-of-the-art performance. In these systems, a DNN is used to collect sufficient statistics for i-vector extraction. In this study, the DNN is a recently developed time delay deep neural network (TDNN) that has achieved promising results in large-vocabulary continuous speech recognition (LVCSR) tasks. We believe that the TDNN-based system achieves the best reported results on SRE10; it obtains a 50% relative improvement in equal error rate (EER) over our GMM baseline. For some applications, the computational cost of a DNN is too high. We therefore also investigate a lightweight alternative in which a supervised GMM is derived from the TDNN posteriors. This method maintains the speed of the traditional unsupervised GMM but achieves a 20% relative improvement in EER.
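The abstract's two central ideas, collecting sufficient statistics from frame posteriors and deriving a supervised GMM from those posteriors, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `accumulate_stats` and `supervised_gmm_means` are hypothetical, the posteriors are assumed to be already computed (by a TDNN or by a GMM), and only component means are estimated here, with weights and covariances left out.

```python
def accumulate_stats(posteriors, features):
    """Accumulate zeroth- and first-order statistics for one utterance.

    posteriors: T frames, each a list of C class posteriors (assumed given).
    features:   T frames, each a list of D acoustic feature values.
    """
    num_classes = len(posteriors[0])
    dim = len(features[0])
    zeroth = [0.0] * num_classes                        # soft frame counts per class
    first = [[0.0] * dim for _ in range(num_classes)]   # posterior-weighted feature sums
    for gamma_t, x_t in zip(posteriors, features):
        for c, gamma in enumerate(gamma_t):
            zeroth[c] += gamma
            for d, x in enumerate(x_t):
                first[c][d] += gamma * x
    return zeroth, first


def supervised_gmm_means(zeroth, first):
    """Estimate one mean per class as the posterior-weighted feature average."""
    means = []
    for n, f_c in zip(zeroth, first):
        # A class that received no posterior mass keeps a zero mean in this sketch.
        means.append([f / n for f in f_c] if n > 0 else list(f_c))
    return means


# Toy example: 3 frames, 2 classes, 2-dimensional features.
posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
features = [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]]
zeroth, first = accumulate_stats(posteriors, features)
means = supervised_gmm_means(zeroth, first)
```

In this sketch the same accumulation serves both purposes: the statistics can feed i-vector extraction directly, or the posterior-weighted averages can parameterize a GMM whose components correspond to the network's output classes, so that the network is no longer needed when statistics are collected later.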
  • Keywords
    "Speaker recognition","Feature extraction","Delay effects","Training","Neural networks","Computational modeling","Acoustics"
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on
  • Type
    conf
  • DOI
    10.1109/ASRU.2015.7404779
  • Filename
    7404779