DocumentCode
178606
Title
Refinements of regression-based context-dependent modelling of deep neural networks for automatic speech recognition
Author
Guangsen Wang ; Khe Chai Sim
Author_Institution
Sch. of Comput., Nat. Univ. of Singapore, Singapore, Singapore
fYear
2014
fDate
4-9 May 2014
Firstpage
3022
Lastpage
3026
Abstract
The data sparsity problem of context-dependent (CD) acoustic modelling of deep neural networks (DNNs) in speech recognition is addressed by using the decision tree state clusters as the training targets. As a consequence, the CD states within a cluster cannot be distinguished during decoding. This problem, referred to as the clustering problem, is not explicitly addressed in the current literature. In our previous work, a regression-based CD-DNN framework was proposed to address both the data sparsity and the clustering problems. This paper investigates several refinements of the regression-based CD-DNN, including two more representative state approximation schemes and the incorporation of sequential learning. The two approximations are obtained from statistics learned from the training data. Sequential learning is applied to both the broad phone DNN detectors and the regression NN. The proposed refinements are evaluated on a broadcast news transcription task. For the cross-entropy systems, the two approximations perform consistently better than our previous work. With sequential learning, a consistent performance gain over the corresponding cross-entropy trained systems is also observed for both the baseline CD-DNN and the regression model.
Keywords
decision trees; learning (artificial intelligence); neural nets; regression analysis; speech recognition; CD acoustic modelling; baseline CD-DNN; broad phone DNN detectors; broadcast news transcription task; clustering problem; context-dependent acoustic modelling; cross-entropy trained systems; data sparsity problem; decision tree state clusters; deep neural networks; regression-based CD-DNN framework; representative state approximation schemes; sequential learning; speech recognition; training targets; Approximation methods; Detectors; Hidden Markov models; Mathematical model; Neural networks; Speech recognition; Training; Articulatory Features; Canonical State Modelling; Context Dependent Modelling; Deep Neural Network; Logistic Regression; Sequential Learning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location
Florence
Type
conf
DOI
10.1109/ICASSP.2014.6854155
Filename
6854155