Title :
Incorporating Knowledge Sources Into a Statistical Acoustic Model for Spoken Language Communication Systems
Author :
Sakti, Sakriani ; Markov, Konstantin ; Nakamura, Satoshi
Author_Institution :
NICT/ATR Spoken Language Commun. Res. Labs., Keihanna Science City
Abstract :
This paper introduces a general framework for incorporating additional sources of knowledge into an HMM-based statistical acoustic model. Since the knowledge sources are often derived from different domains, it may be difficult to formulate a probabilistic function of the model without learning the causal dependencies between the sources. We utilized a Bayesian network framework to solve this problem. The advantages of this graphical model framework are that 1) it allows the probabilistic relationship between information sources to be learned and 2) it facilitates the decomposition of the joint probability density function (PDF) into a linked set of local conditional PDFs. This way, a simplified form of the model can be constructed and reliably estimated using a limited amount of training data. We applied this framework to the problem of incorporating wide-phonetic knowledge information, which often suffers from a sparsity of data and memory constraints. We evaluated how well the proposed method performed on a large-vocabulary continuous speech recognition (LVCSR) task using English speech data that contained two different types of accents. The experimental results revealed that it improved the word accuracy relative to a standard HMM, with or without additional sources of knowledge.
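The decomposition of a joint distribution into local conditional distributions, mentioned in the abstract, can be illustrated with a minimal sketch. This is not the authors' model: the three binary variables Q, X, and K, the network shape (Q -> X and Q -> K), and all probability values are hypothetical, chosen only to show that the product of local conditionals forms a valid joint distribution and needs fewer free parameters than a full joint table.

```python
from itertools import product

# Hypothetical toy network over binary variables (names and numbers are
# illustrative assumptions, not taken from the paper): Q -> X and Q -> K.
parents = {"Q": (), "X": ("Q",), "K": ("Q",)}

# cpt[var][parent_values] = P(var = 1 | parents = parent_values)
cpt = {
    "Q": {(): 0.3},
    "X": {(0,): 0.2, (1,): 0.7},
    "K": {(0,): 0.5, (1,): 0.9},
}


def local_prob(var, value, assignment):
    """Return P(var = value | parents of var) from the local table."""
    key = tuple(assignment[p] for p in parents[var])
    p_one = cpt[var][key]
    return p_one if value == 1 else 1.0 - p_one


def joint_prob(assignment):
    """Joint probability as the product of the local conditionals."""
    prob = 1.0
    for var, value in assignment.items():
        prob *= local_prob(var, value, assignment)
    return prob


def all_assignments():
    """Enumerate every 0/1 assignment to the network variables."""
    names = list(parents)
    for values in product((0, 1), repeat=len(names)):
        yield dict(zip(names, values))


# The factored joint should sum to 1 over all assignments.
total = sum(joint_prob(a) for a in all_assignments())

# One free parameter per table row (binary variables) versus
# 2^n - 1 free parameters for an unfactored joint table.
n_factored = sum(len(table) for table in cpt.values())
n_full = 2 ** len(parents) - 1

print(f"sum of joint = {total:.6f}")
print(f"free parameters: factored = {n_factored}, full table = {n_full}")
```

In this toy case the saving is small, but the gap between the factored and the full parameter count grows quickly as variables are added, which is the general reason a factored model can be estimated from less training data.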
Keywords :
belief networks; hidden Markov models; natural language interfaces; probability; speech recognition; speech-based user interfaces; Bayesian network framework; English speech data; HMM-based statistical acoustic model; graphical model framework; knowledge sources; large-vocabulary continuous speech recognition; probability density function; spoken language communication system; Bayesian methods; Graphical models; Hidden Markov models; Memory management; Natural languages; Performance evaluation; Probability density function; Speech analysis; Speech recognition; Training data; Acoustic modeling; Bayesian network; junction tree; knowledge incorporation; wide-context dependency;
Journal_Title :
Computers, IEEE Transactions on
DOI :
10.1109/TC.2007.1069