• DocumentCode
    134191
  • Title
    Speaker adaptation of hybrid NN/HMM model for speech recognition based on singular value decomposition
  • Author
    Shaofei Xue; Hui Jiang; Lirong Dai
  • Author_Institution
    Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2014
  • fDate
    12-14 Sept. 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    Several speaker adaptation methods have recently been proposed for deep neural networks (DNNs) in large vocabulary continuous speech recognition (LVCSR) tasks. However, only a few of these methods tune the weight matrices of trained DNNs to optimize system performance, because doing so is very prone to over-fitting, especially when some class labels are missing from the adaptation data. In this paper, we propose a new speaker adaptation method for the hybrid NN/HMM speech recognition model based on singular value decomposition (SVD). We apply SVD to the weight matrices of trained DNNs and then tune the resulting diagonal matrices on the adaptation data. This addresses the over-fitting problem, since modifying only the singular values changes the weight matrices only slightly. We evaluate the proposed adaptation method on two standard speech recognition tasks, namely TIMIT phone recognition and large vocabulary speech recognition on the Switchboard task. Experimental results show that it is effective for adapting large DNN models using only a small amount of adaptation data. For example, the Switchboard results show that the proposed SVD-based adaptation method can achieve a 3-6% relative error reduction using only a few dozen adaptation utterances per speaker.
  • Keywords
    hidden Markov models; neural nets; singular value decomposition; speech recognition; SVD-based adaptation method; TIMIT phone recognition; adaptation data; adaptation utterances; deep neural network; diagonal matrices; hybrid NN-HMM speech recognition model; large DNN models; large vocabulary continuous speech recognition tasks; over-fitting problem; speaker adaptation methods; Switchboard task; trained DNN; weight matrices; Adaptation models; Matrix decomposition; Neural networks; Signal processing; Speech; Deep Neural Network (DNN); Hybrid DNN/HMM; Speaker Adaptation; singular value decomposition (SVD)
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2014.6936583
  • Filename
    6936583
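
The adaptation idea summarized in the abstract (factor a trained weight matrix with SVD, then tune only the diagonal matrix of singular values) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single linear layer, the mean-squared-error loss, the plain gradient-descent update, and the names `svd_factorize`, `layer_loss`, and `adapt_singular_values` are all assumptions made for illustration. The paper itself adapts a hybrid NN/HMM model on speaker adaptation data.

```python
import numpy as np


def svd_factorize(W):
    """Factor a trained weight matrix as W = U @ diag(s) @ Vt."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt


def layer_loss(U, s, Vt, X, Y):
    """Toy loss (an assumption): mean squared error of X @ W.T against Y."""
    W = (U * s) @ Vt  # U * s scales the columns of U, i.e. U @ diag(s)
    return 0.5 * np.mean(np.sum((X @ W.T - Y) ** 2, axis=1))


def adapt_singular_values(U, s, Vt, X, Y, lr=0.1, steps=200):
    """Tune only the singular values; U and Vt stay fixed.

    Hypothetical update rule: plain gradient descent on layer_loss.
    """
    s = s.copy()
    n = X.shape[0]
    for _ in range(steps):
        W = (U * s) @ Vt
        err = X @ W.T - Y          # shape (n, out)
        grad_W = err.T @ X / n     # dL/dW, shape (out, in)
        # dL/ds_k = sum_ij U[i, k] * grad_W[i, j] * Vt[k, j]
        grad_s = np.einsum("ik,ij,kj->k", U, grad_W, Vt)
        s -= lr * grad_s
    return s
```

For a layer with weight matrix of shape (out, in), this sketch adapts only min(out, in) values per layer instead of out × in weights, which is what keeps the change to the weight matrix small when adaptation data is scarce.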