Title :
Dynamic Bayesian network based speech recognition with pitch and energy as auxiliary variables
Author :
Stephenson, Todd A. ; Escofet, Jaume ; Magimai-Doss, Mathew ; Bourlard, Hervé
Author_Institution :
Dalle Molle Inst. for Perceptual Artificial Intelligence, Martigny, Switzerland
Abstract :
Pitch and energy are two fundamental features describing speech, and both are important in human speech recognition. However, when incorporated as features in automatic speech recognition (ASR), they usually cause a significant degradation in recognition performance because of the noise inherent in estimating or modeling them. We show experimentally how this can be corrected either by conditioning the emission distributions upon these features or by marginalizing out these features in recognition. Since doing this is not straightforward with standard hidden Markov models (HMMs), this work has been performed in the framework of dynamic Bayesian networks (DBNs), which give more flexibility in defining the topology of the emission distributions and in specifying whether variables should be marginalized out.
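The two options named in the abstract, conditioning the emission distribution on an auxiliary variable versus marginalizing that variable out, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single hidden state q, a one-dimensional acoustic feature x, and an auxiliary variable a (for example pitch) quantized into two hypothetical levels; the names `AUX_PRIOR`, `EMISSION`, `emission_conditioned`, and `emission_marginalized` and all numeric values are made up for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# Hypothetical toy parameters for ONE hidden state q (not taken from the paper).
# The auxiliary variable a is assumed to be quantized into two levels.
AUX_PRIOR = {"low": 0.4, "high": 0.6}                 # P(a | q)
EMISSION = {"low": (0.0, 1.0), "high": (2.0, 1.5)}    # (mean, variance) of p(x | q, a)


def emission_conditioned(x, a):
    """p(x | q, a): the auxiliary value a is observed and conditions the emission."""
    mean, var = EMISSION[a]
    return gaussian_pdf(x, mean, var)


def emission_marginalized(x):
    """p(x | q) = sum over a of P(a | q) * p(x | q, a): the auxiliary value is hidden."""
    return sum(p_a * emission_conditioned(x, a) for a, p_a in AUX_PRIOR.items())
```

In this toy setup, a noisy pitch estimate would only enter the score through `emission_conditioned`; `emission_marginalized` ignores the estimate at recognition time and averages over the auxiliary levels instead.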
Keywords :
belief networks; feature extraction; learning (artificial intelligence); parameter estimation; random noise; speech recognition; HMM; acoustic feature estimation; automatic speech recognition; dynamic Bayesian networks; emission distributions; energy; hidden Markov models; pitch; training data; Acoustic emission; Artificial intelligence; Automatic speech recognition; Bayesian methods; Degradation; Hidden Markov models; Humans; Network topology; Speech enhancement; Speech recognition;
Conference_Title :
Neural Networks for Signal Processing, 2002. Proceedings of the 2002 12th IEEE Workshop on
Print_ISBN :
0-7803-7616-1
DOI :
10.1109/NNSP.2002.1030075