• DocumentCode
    672328
  • Title
    Speaker adaptation of neural network acoustic models using i-vectors
  • Author
    Saon, George ; Soltau, Hagen ; Nahamoo, David ; Picheny, Michael
  • Author_Institution
    IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
  • fYear
    2013
  • fDate
    8-12 Dec. 2013
  • Firstpage
    55
  • Lastpage
    59
  • Abstract
    We propose to adapt deep neural network (DNN) acoustic models to a target speaker by supplying speaker identity vectors (i-vectors) as input features to the network in parallel with the regular acoustic features for ASR. For both training and test, the i-vector for a given speaker is concatenated to every frame belonging to that speaker and changes across different speakers. Experimental results on a 300-hour Switchboard corpus show that DNNs trained on speaker-independent features and i-vectors achieve a 10% relative improvement in word error rate (WER) over networks trained on speaker-independent features only. These networks are comparable in performance to DNNs trained on speaker-adapted features (with VTLN and FMLLR), with the advantage that only one decoding pass is needed. Furthermore, networks trained on speaker-adapted features and i-vectors achieve a 5-6% relative improvement in WER after Hessian-free sequence training over networks trained on speaker-adapted features only.
  • Keywords
    learning (artificial intelligence); neural nets; speech recognition; ASR; DNN; FMLLR; Switchboard 300 hours corpus; VTLN; WER; acoustic features; deep neural network acoustic models; hessian-free sequence training; i-vectors; speaker adaptation; speaker independent features; speaker-adapted features; word error rate; Acoustics; Feature extraction; Hidden Markov models; Neural networks; Training; Training data; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
  • Conference_Location
    Olomouc
  • Type
    conf
  • DOI
    10.1109/ASRU.2013.6707705
  • Filename
    6707705
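
The input construction described in the abstract (one i-vector per speaker, concatenated to every acoustic frame belonging to that speaker) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `append_ivectors`, the speaker labels, and the toy dimensions are all assumptions made for the example, and extracting the i-vectors themselves is outside its scope.

```python
from typing import Dict, List, Sequence

Frame = List[float]


def append_ivectors(
    frames: Sequence[Frame],
    frame_speakers: Sequence[str],
    ivectors: Dict[str, Frame],
) -> List[Frame]:
    """Concatenate each acoustic frame with the i-vector of its speaker.

    Hypothetical helper: the i-vector is constant across all frames of one
    speaker and differs between speakers, as the abstract describes.
    """
    if len(frames) != len(frame_speakers):
        raise ValueError("need exactly one speaker label per frame")
    return [
        list(frame) + list(ivectors[speaker])
        for frame, speaker in zip(frames, frame_speakers)
    ]


# Toy data (assumed): 2-dimensional frames and 3-dimensional i-vectors.
frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
frame_speakers = ["spk_a", "spk_a", "spk_b"]
ivectors = {"spk_a": [1.0, 0.0, -1.0], "spk_b": [0.5, 0.5, 0.5]}

# Each augmented frame has 2 + 3 = 5 dimensions and would serve as one
# input vector to the network.
augmented = append_ivectors(frames, frame_speakers, ivectors)
```

The same augmentation would be applied at both training and test time, so the network always sees the speaker's i-vector alongside the acoustic features.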