Author :
Lv, Guoyun ; Jiang, Dongmei ; Zhao, Rongchun
Abstract :
Building on the single-stream word-phone Dynamic Bayesian Network (WP-DBN) model and the single-stream word-phone-state DBN (WPS-DBN) model proposed by Guoyun et al. [8], this paper proposes two single-stream DBN models for continuous speech recognition: a word-triphone DBN (WT-DBN) model and a word-triphone-state DBN (WTS-DBN) model. Both use context-dependent triphone models to capture the variations in real continuous speech spectra more accurately. A decision tree-based state-tying clustering method is used to balance model complexity against the available training data. In essence, the WTS-DBN model is a triphone model whose recognition units are triphones, and it simulates a conventional triphone Hidden Markov Model (HMM). Recognition experiments on a continuous speech database show that the WTS-DBN model achieves the highest speech recognition rate. In a clean speech environment, the WTS-DBN model improves the speech recognition rate by 20.53%, 7.52% and 40.77% over the triphone HMM, the WPS-DBN model and the WT-DBN model, respectively.