Title : 
Decision tree based state tying for speech recognition using DNN derived embeddings
         
        
            Author : 
Xiangang Li ; Xihong Wu
         
        
            Author_Institution : 
Key Lab. of Machine Perception (Minist. of Educ.), Peking Univ., Beijing, China
         
        
        
        
        
        
            Abstract : 
Recently, context-dependent (CD) deep neural network (DNN) hidden Markov models (HMMs), or CD-DNN-HMMs, have achieved significant improvements in many automatic speech recognition (ASR) tasks. In the standard CD-DNN-HMM training procedure, a Gaussian mixture model (GMM)-based ASR system must first be built to pre-segment the training data and to define the CD states that serve as the DNN targets. In this paper, we propose a novel decision-tree-based state tying procedure in which state embeddings derived from a DNN are clustered to minimize the sum-of-squared error. As a result, a GMM is no longer required to define the targets for the CD-DNN. In addition, we introduce a CD-DNN-HMM training procedure in which the forward-backward algorithm is used to train a context-independent (CI) DNN-HMM, and the proposed state tying approach is then applied to define the CD-DNN targets. Experiments on a 30-hour Chinese broadcast news speech database show that the proposed DNN-based state tying approach yields performance comparable to that of the GMM-based approach.
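The abstract states only that DNN-derived state embeddings are clustered by a decision tree so as to minimize the sum-of-squared error (SSE). The sketch below illustrates how such SSE-driven splitting could work in general; it is not the authors' implementation. It assumes that each context-dependent state is summarized by hypothetical sufficient statistics of its aligned frame embeddings (frame count n, vector sum Σx, and sum of squared norms Σ‖x‖²), that contexts are plain left/right phones, and that the question set and the `min_gain` and `min_count` thresholds are illustrative values. With these statistics the SSE of a pooled cluster is Σ‖x‖² − ‖Σx‖²/n, so the gain of each candidate question can be evaluated without revisiting individual frames.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class StateStats:
    """Hypothetical sufficient statistics of the frame embeddings aligned to one CD state."""
    left: str          # left-context phone
    right: str         # right-context phone
    count: float       # number of aligned frames (occupancy)
    total: np.ndarray  # sum of the embedding vectors
    sq_total: float    # sum of the squared L2 norms of the embedding vectors


def make_stats(left, right, frames):
    """Accumulate statistics from a (num_frames, dim) array of embeddings."""
    frames = np.asarray(frames, dtype=float)
    return StateStats(left, right, float(len(frames)),
                      frames.sum(axis=0), float((frames ** 2).sum()))


def sse(states):
    """Sum-of-squared error of all frames in `states` around their pooled mean."""
    n = sum(s.count for s in states)
    if n == 0:
        return 0.0
    total = sum(s.total for s in states)
    sq = sum(s.sq_total for s in states)
    return sq - float(total @ total) / n


def best_split(states, questions, min_count=1.0):
    """Return (gain, name, yes, no) for the question with the largest SSE reduction, or None."""
    parent = sse(states)
    best = None
    for name, (position, phones) in questions.items():
        yes = [s for s in states if getattr(s, position) in phones]
        no = [s for s in states if getattr(s, position) not in phones]
        # Skip questions that would leave either child with too little data.
        if sum(s.count for s in yes) < min_count or sum(s.count for s in no) < min_count:
            continue
        gain = parent - sse(yes) - sse(no)
        if best is None or gain > best[0]:
            best = (gain, name, yes, no)
    return best


def grow(states, questions, min_gain, leaves):
    """Greedily split until no question reduces the SSE by at least `min_gain`."""
    split = best_split(states, questions)
    if split is None or split[0] < min_gain:
        leaves.append(states)  # each leaf becomes one tied state (one DNN target)
        return
    _, _, yes, no = split
    grow(yes, questions, min_gain, leaves)
    grow(no, questions, min_gain, leaves)


# Toy example: 2-D embeddings for four triphone states sharing a right context.
states = [
    make_stats("m", "a", [[1.0, 0.0], [1.2, 0.1]]),
    make_stats("n", "a", [[0.9, 0.0], [1.1, -0.1]]),
    make_stats("b", "a", [[0.0, 1.0], [0.1, 1.1]]),
    make_stats("d", "a", [[0.0, 0.9], [-0.1, 1.0]]),
]
questions = {
    "L_nasal": ("left", {"m", "n"}),
    "L_labial": ("left", {"m", "b"}),
}
leaves = []
grow(states, questions, min_gain=0.5, leaves=leaves)
print([[s.left for s in leaf] for leaf in leaves])  # [['m', 'n'], ['b', 'd']]
```

In this toy example the nasal question separates the two embedding clusters (SSE gain of about 4.2), whereas the remaining candidate splits gain only about 0.02 and are rejected, leaving two tied states. A practical system would typically grow one tree per CI phone state and use a much richer question set covering both contexts.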
         
        
            Keywords : 
decision trees; hidden Markov models; learning (artificial intelligence); neural nets; pattern clustering; speech recognition; ASR systems; CD-DNN-HMM training procedure; CI DNN-HMM; Chinese broadcast news speech database; DNN derived embeddings; automatic speech recognition; clustering; context dependent; context independent DNN-HMM training; decision tree; deep neural network; forward backward algorithm; hidden Markov model; state tying procedure; sum-of-squared error minimization; Acoustics; Context; Decision trees; Hidden Markov models; Speech; Speech recognition; Training; DNN embedding; clustering; decision tree based state tying; speech recognition;
         
        
        
        
            Conference_Title : 
Chinese Spoken Language Processing (ISCSLP), 2014 9th International Symposium on
         
        
            Conference_Location : 
Singapore
         
        
        
            DOI : 
10.1109/ISCSLP.2014.6936637