Title : 
Direct adaptation of hybrid DNN/HMM model for fast speaker adaptation in LVCSR based on speaker code
         
        
            Author : 
Shaofei Xue ; Abdel-Hamid, Ossama ; Hui Jiang ; Lirong Dai
         
        
            Author_Institution : 
Nat. Eng. Lab. of Speech & Language Inf. Process., Univ. of Sci. & Technol. of China, Hefei, China
         
        
        
        
        
        
            Abstract : 
A fast speaker adaptation method based on discriminative speaker codes (SC) was recently proposed for hybrid DNN-HMM models in speech recognition [1]. That method relies on jointly learning a large generic adaptation neural network shared by all speakers together with a small speaker code for each speaker, using the standard back-propagation algorithm. In this paper, we propose an alternative that adapts directly in model space: the speaker codes are connected to the original DNN through a set of new connection weights, which can be estimated very efficiently from all or part of the training data. The proposed method is therefore better suited to large-scale speech recognition tasks, because it eliminates the time-consuming training of a separate adaptation neural network. We evaluate the proposed direct SC-based adaptation method on the large-scale 320-hr Switchboard task. Experimental results show that SC-based rapid adaptation is effective not only for small recognition tasks but also for very large-scale ones. For example, the proposed method yields up to an 8% relative reduction in word error rate on Switchboard using only a very small number of adaptation utterances per speaker (from 10 to a few dozen). Moreover, the extra training time required for adaptation is significantly lower than for the method in [1].
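
The direct connection of speaker codes to the DNN described in the abstract can be illustrated with a minimal forward-pass sketch. The abstract does not give the layer equation, so the form used here (each layer adds a term `B @ s` to its pre-activation, where `s` is the speaker code and `B` the new connection weights) is an assumption, as are the names `forward`, `make_layers`, the sigmoid activation on every layer, and the toy layer sizes. It is not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def make_layers(sizes, code_dim, rng):
    # Hypothetical helper: builds toy layers with original weights W, b and
    # new speaker-code connection weights B (initialised to zero).
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        layers.append({
            "W": random_matrix(n_out, n_in, rng),            # original DNN weights
            "b": [0.0] * n_out,                              # original DNN biases
            "B": [[0.0] * code_dim for _ in range(n_out)],   # new SC weights
        })
    return layers


def forward(x, layers, speaker_code):
    # Assumed layer form: h_out = sigmoid(W h_in + B s + b).
    # The speaker code s feeds every layer through that layer's own B.
    h = x
    for layer in layers:
        z = matvec(layer["W"], h)
        s = matvec(layer["B"], speaker_code)
        h = [sigmoid(zi + si + bi) for zi, si, bi in zip(z, s, layer["b"])]
    return h


rng = random.Random(0)
layers = make_layers([4, 3, 2], code_dim=5, rng=rng)
features = [0.5, -0.2, 0.1, 0.9]
print(forward(features, layers, speaker_code=[1.0] * 5))
```

With every `B` set to zero the network reduces to the unadapted model, whatever the speaker code is. Under this assumed form, adaptation only has to estimate the `B` matrices and one small code per speaker, which is consistent with the abstract's claim that no separate adaptation network needs to be trained.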
         
        
            Keywords : 
backpropagation; neural nets; speaker recognition; speech codecs; LVCSR; adaptation method; adaptation neural networks; direct SC-based adaptation method; direct adaptation; discriminative speaker code; fast speaker adaptation; hybrid DNN-HMM model; speaker code; speaker codes; speech recognition; standard backpropagation algorithm; switchboard task; time 320 hr; word error rate; Adaptation models; Hidden Markov models; Neural networks; Speech; Speech recognition; Training; Training data; Deep Neural Network (DNN); Fast Speaker Adaptation; Hybrid DNN-HMM; Speaker Code;
         
        
        
        
            Conference_Title : 
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
         
        
            Conference_Location : 
Florence
         
        
        
            DOI : 
10.1109/ICASSP.2014.6854824