DocumentCode
2200267
Title
Dynamic Bayesian network based speech recognition with pitch and energy as auxiliary variables
Author
Stephenson, Todd A. ; Escofet, Jaume ; Magimai-Doss, Mathew ; Bourlard, Hervé
Author_Institution
Dalle Molle Inst. for Perceptual Artificial Intelligence, Martigny, Switzerland
fYear
2002
fDate
2002
Firstpage
637
Lastpage
646
Abstract
Pitch and energy are two fundamental features of speech and are important in human speech recognition. However, when incorporated as features in automatic speech recognition (ASR), they usually cause a significant degradation in recognition performance because of the noise inherent in estimating or modeling them. We show experimentally how this can be corrected either by conditioning the emission distributions on these features or by marginalizing them out during recognition. Because doing so is not straightforward with standard hidden Markov models (HMMs), this work was performed in the framework of dynamic Bayesian networks (DBNs), which give more flexibility in defining the topology of the emission distributions and in specifying whether variables should be marginalized out.
Keywords
belief networks; feature extraction; learning (artificial intelligence); parameter estimation; random noise; speech recognition; HMM; acoustic feature estimation; automatic speech recognition; dynamic Bayesian networks; emission distributions; energy; hidden Markov models; pitch; training data; Acoustic emission; Artificial intelligence; Automatic speech recognition; Bayesian methods; Degradation; Hidden Markov models; Humans; Network topology; Speech enhancement; Speech recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks for Signal Processing, 2002. Proceedings of the 2002 12th IEEE Workshop on
Print_ISBN
0-7803-7616-1
Type
conf
DOI
10.1109/NNSP.2002.1030075
Filename
1030075