Speaker adaptive bottleneck features extraction for LVCSR based on discriminative learning of speaker codes

Author

Changqing Kong ; Shaofei Xue ; Jianqing Gao ; Wu Guo ; Lirong Dai ; Hui Jiang

Author_Institution

Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China

fYear

2014

fDate

12-14 Sept. 2014

Firstpage

83

Lastpage

87

Abstract

Recently, several fast speaker adaptation methods based on so-called speaker codes (SC) have been proposed for the hybrid DNN-HMM speech recognition model [1, 2, 3]. In these methods, either the target speaker's features are modified to match the given speaker-independent models, or the speaker-independent models are transformed towards a particular speaker, based on discriminative learning of speaker codes. Previous research has shown that these SC-based adaptation methods are very effective at adapting large DNN models using only a small amount of adaptation data. In this work, we explore combining direct speaker adaptation in model space based on speaker codes (mSA-SC) with bottleneck features, using mSA-SC to extract speaker-adaptive bottleneck features. We evaluate the proposed speaker-adaptive bottleneck feature extraction method on two speech recognition tasks: a PSC Mandarin task and the large-scale 320-hr Switchboard task. The experimental results show that the method is well suited to very large-scale tasks. For example, on Switchboard it achieves a 9% relative reduction in word error rate under unsupervised speaker adaptation.
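The abstract does not spell out the network structure, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that, in model-space adaptation, a small per-speaker code vector is fed into each layer through its own connection weights alongside the usual layer input, and that the bottleneck features are read off as the linear outputs of a narrow layer. The class name `SpeakerCodeBottleneckNet`, the layer sizes, the code dimension, the random weights, and the choice of a linear bottleneck output are all hypothetical. Training of the weights and of the speaker codes is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SpeakerCodeBottleneckNet:
    """Toy feed-forward net whose layers also receive a speaker code.

    Hypothetical structure for illustration only:
    layer l computes W[l] @ h + B[l] @ speaker_code.
    """

    def __init__(self, layer_sizes, code_dim, bottleneck_index, seed=0):
        rng = random.Random(seed)
        self.bottleneck_index = bottleneck_index
        self.W = []  # speaker-independent weights
        self.B = []  # weights connecting the speaker code to each layer
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
            self.W.append(random_matrix(n_out, n_in, rng))
            self.B.append(random_matrix(n_out, code_dim, rng))

    def bottleneck_features(self, frame, speaker_code):
        """Forward one acoustic frame and return the bottleneck layer output."""
        h = frame
        for idx, (W, B) in enumerate(zip(self.W, self.B)):
            pre = [a + c for a, c in zip(affine(W, h), affine(B, speaker_code))]
            if idx == self.bottleneck_index:
                # Assumption: features are the linear (pre-activation) outputs.
                return pre
            h = [sigmoid(z) for z in pre]
        raise ValueError("bottleneck_index is out of range")


# Hypothetical sizes: 8-dim input frame, 16 hidden units, 4-dim bottleneck.
net = SpeakerCodeBottleneckNet([8, 16, 4, 16], code_dim=3, bottleneck_index=1)
frame = [0.1] * 8
feats_speaker_a = net.bottleneck_features(frame, [0.0, 0.0, 0.0])
feats_speaker_b = net.bottleneck_features(frame, [1.0, -1.0, 0.5])
```

In this sketch the same frame yields different bottleneck features for different speaker codes while the speaker-independent weights stay fixed. Under the stated assumptions, that is what allows adaptation with only a small amount of data: only the low-dimensional code would need to be estimated for a new speaker.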

Keywords

learning (artificial intelligence); natural language processing; speaker recognition; speech coding; DNN model; LVCSR; PSC Mandarin task; Switchboard task; discriminative learning; extraction instrument; fast speaker adaptation method; hybrid DNN-HMM speech recognition model; mSA-SC; speaker adaptation technique; speaker adaptive bottleneck features extraction method; speaker codes; speaker-independent model; speech recognition task; target speaker feature; unsupervised speaker adaptation scheme; word error rate; Adaptation models; Feature extraction; Hidden Markov models; Neural networks; Speech recognition; Switches; Training; Bottleneck Features; Deep Neural Network (DNN); Hybrid DNN-HMM; Speaker Adaptation; Speaker Codes

fLanguage

English

Publisher

IEEE

Conference_Title

Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on

Conference_Location

Singapore

Type

conf

DOI

10.1109/ISCSLP.2014.6936584

Filename

6936584