Title :
WFST-based structural classification integrating DNN acoustic features and RNN language features for speech recognition
Author :
Quoc Truong Do ; Nakamura, Satoshi ; Delcroix, Marc ; Hori, Takaaki
Author_Institution :
Nara Inst. of Sci. & Technol., Nara, Japan
Abstract :
This paper proposes a method to train Weighted Finite State Transducer (WFST) based structural classifiers using deep neural network (DNN) acoustic features and recurrent neural network (RNN) language features for speech recognition. Structural classification is an effective approach for achieving highly accurate recognition of structured data, in which the classifier is optimized to maximize the discriminative performance using different kinds of features. A WFST-based classifier, which can integrate acoustic, pronunciation, and language features embedded in a composed WFST, was recently extended to incorporate DNN bottleneck (DNNBN) features. In this paper, we further investigate the integration of an RNN language model (RNNLM) with the WFST classifier. To this end, we introduce a lattice rescoring method using an RNNLM for efficient classifier training. In a lecture transcription task, we reduced the word error rate from 19.2% to 18.6% by optimizing the WFST parameters for the DNNBN acoustic and RNNLM language features.
Keywords :
natural language processing; neural nets; speech recognition; DNN; RNN language features; WFST; acoustic features; data structure; deep neural network; language features; pronunciation features; recurrent neural network; structural classification integrating DNN acoustic features; train weighted finite state transducer; Acoustics; Decoding; Feature extraction; History; Lattices; Training; Lattice rescoring; RNNLM; Structural classification; WFST-DNN;
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location :
South Brisbane, QLD
DOI :
10.1109/ICASSP.2015.7178914