DocumentCode :
1691600
Title :
Large vocabulary continuous speech recognition based on WFST structured classifiers and deep bottleneck features
Author :
Kubo, Yuji ; Hori, Toshikazu ; Nakamura, A.
Author_Institution :
NTT Commun. Sci. Labs., NTT Corp., Keihanna Science City, Japan
fYear :
2013
Firstpage :
7629
Lastpage :
7633
Abstract :
Recently, structured classification approaches have been considered important with a view to achieving unified modeling of the acoustic and linguistic aspects of speech recognizers. With these approaches, a unified representation is achieved by directly optimizing a score function that measures the correspondence between the input and output of the system. Since structured classifiers typically employ a linear function as the score function, extracting expressive features from the input and output of the system is very important. On the other hand, the effectiveness of deep neural networks (DNNs) has been verified by several experiments, and it has been suggested that the outputs of hidden layers in DNNs are essential speech features that purely express phonetic information. In this paper, we propose a method for structured classification with DNN features. The proposed method expands conventional DNN-based acoustic models so that they optimize the weight terms of the arcs in a decoding WFST, which is constructed with the on-the-fly composition method. Since DNN-based features can be considered enhancements in the input representation, the enhancements in the output representation based on the WFST arcs are expected to complement the DNN-based features. The proposed method achieved an 8% relative error reduction even compared with a strong acoustic model based on DNNs.
Keywords :
neural nets; signal classification; speech recognition; DNN-based acoustic models; WFST decoding; WFST structured classifier approach; deep bottleneck features; deep neural networks; large vocabulary continuous speech recognition; linear function; linguistic modeling; on-the-fly composition method; phonetic information; score function; speech recognizers; unified acoustic modeling; weighted finite-state transducer; Acoustics; Cost function; Hidden Markov models; Linear programming; Speech; Speech recognition; Training; Speech recognition; deep neural networks; structured classification; weighted finite-state transducers;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639147
Filename :
6639147