DocumentCode
134192
Title
Speaker adaptive bottleneck features extraction for LVCSR based on discriminative learning of speaker codes
Author
Changqing Kong ; Shaofei Xue ; Jianqing Gao ; Wu Guo ; Lirong Dai ; Hui Jiang
Author_Institution
Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
fYear
2014
fDate
12-14 Sept. 2014
Firstpage
83
Lastpage
87
Abstract
Recently, several fast speaker adaptation methods based on so-called speaker codes (SC) have been proposed for the hybrid DNN-HMM speech recognition model [1, 2, 3]. In these methods, either the target speaker's features are modified to match the given speaker-independent models, or the speaker-independent models are transformed towards one particular speaker, based on discriminative learning of speaker codes. Previous research has shown that these SC-based adaptation methods are very effective at adapting large DNN models using only a small amount of adaptation data. In this work, we explore combining a direct model-space speaker adaptation technique based on speaker codes (mSA-SC) with bottleneck features, where mSA-SC is used to extract speaker-adaptive bottleneck features. We have evaluated the proposed speaker-adaptive bottleneck feature extraction method on two speech recognition tasks, namely the PSC Mandarin task and the large-scale 320-hr Switchboard task. Experimental results show that the method is well suited to very large-scale tasks. For example, on Switchboard it achieves a 9% relative reduction in word error rate in an unsupervised speaker adaptation scheme.
Keywords
learning (artificial intelligence); natural language processing; speaker recognition; speech coding; DNN model; LVCSR; PSC Mandarin task; Switchboard task; discriminative learning; extraction instrument; fast speaker adaptation method; hybrid DNN-HMM speech recognition model; mSA-SC; speaker adaptation technique; speaker adaptive bottleneck features extraction method; speaker codes; speaker-independent model; speech recognition task; target speaker feature; unsupervised speaker adaptation scheme; word error rate; Adaptation models; Feature extraction; Hidden Markov models; Neural networks; Speech recognition; Switches; Training; Bottleneck Features; Deep Neural Network (DNN); Hybrid DNN-HMM; Speaker Adaptation; Speaker Codes;
fLanguage
English
Publisher
ieee
Conference_Titel
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
Conference_Location
Singapore
Type
conf
DOI
10.1109/ISCSLP.2014.6936584
Filename
6936584