• DocumentCode
    134192
  • Title
    Speaker adaptive bottleneck features extraction for LVCSR based on discriminative learning of speaker codes
  • Author
    Changqing Kong ; Shaofei Xue ; Jianqing Gao ; Wu Guo ; Lirong Dai ; Hui Jiang
  • Author_Institution
    Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2014
  • fDate
    12-14 Sept. 2014
  • Firstpage
    83
  • Lastpage
    87
  • Abstract
    Recently, several fast speaker adaptation methods based on so-called speaker codes (SC) have been proposed for the hybrid DNN-HMM speech recognition model [1, 2, 3]. In these methods, either the target speaker's features are modified to match the given speaker-independent models, or the speaker-independent models are transformed towards one particular speaker, based on discriminative learning of speaker codes. Previous research has shown that these SC-based adaptation methods are very effective at adapting large DNN models using only a small amount of adaptation data. In this work, we explore combining a direct speaker adaptation technique in model space based on speaker codes (mSA-SC) with bottleneck features, where mSA-SC is used to extract speaker adaptive bottleneck features. We evaluate the proposed speaker adaptive bottleneck feature extraction method on two speech recognition tasks, namely the PSC Mandarin task and the large-scale 320-hr Switchboard task. Experimental results verify that the method is well suited to very large scale tasks. For example, the Switchboard results show that it achieves a 9% relative reduction in word error rate under an unsupervised speaker adaptation scheme.
  • Keywords
    learning (artificial intelligence); natural language processing; speaker recognition; speech coding; DNN model; LVCSR; PSC Mandarin task; Switchboard task; discriminative learning; extraction instrument; fast speaker adaptation method; hybrid DNN-HMM speech recognition model; mSA-SC; speaker adaptation technique; speaker adaptive bottleneck features extraction method; speaker codes; speaker-independent model; speech recognition task; target speaker feature; unsupervised speaker adaptation scheme; word error rate; Adaptation models; Feature extraction; Hidden Markov models; Neural networks; Speech recognition; Switches; Training; Bottleneck Features; Deep Neural Network (DNN); Hybrid DNN-HMM; Speaker Adaptation; Speaker Codes
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
  • Conference_Location
    Singapore
  • Type
    conf
  • DOI
    10.1109/ISCSLP.2014.6936584
  • Filename
    6936584