Title :
Utterance verification in continuous speech recognition: decoding and training procedures
Author :
Lleida, Eduardo ; Rose, Richard C.
Author_Institution :
Centro Politecnico Superior, Zaragoza Univ., Spain
fDate :
3/1/2000
Abstract :
This paper introduces a set of acoustic modeling and decoding techniques for utterance verification (UV) in hidden Markov model (HMM) based continuous speech recognition (CSR). Utterance verification in this work implies the ability to determine when portions of a hypothesized word string correspond to incorrectly decoded vocabulary words or to out-of-vocabulary words that may appear in an utterance. This capability is implemented here as a likelihood ratio (LR) based hypothesis testing procedure for verifying individual words in a decoded string. Two UV techniques are presented. The first is a procedure for estimating the parameters of UV models during training according to an optimization criterion that is directly related to the LR measure used in UV. The second is a speech recognition decoding procedure in which the “best” decoded path is defined as the one that optimizes an LR criterion. These techniques were evaluated in terms of their ability to improve UV performance on a speech dialog task over the public switched telephone network. The results of an experimental study presented in the paper show that LR based parameter estimation yields a significant improvement in UV performance for this task. The study also found that the LR based decoding procedure, when used in conjunction with models trained using the LR criterion, can provide as much as an 11% improvement in UV performance compared to existing UV procedures. Finally, the performance of the LR decoder was found to be highly dependent on the use of the LR criterion in training the acoustic models. Several observations are made in the paper concerning the formation of confidence measures for UV and the interaction of these techniques with the statistical language models used in automatic speech recognition (ASR).
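The LR based word verification described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes per-frame log-likelihoods for a word segment are already available under a target model and an alternative model, and it assumes a duration-normalized log-likelihood ratio compared against a fixed threshold. The function names, the normalization, and the default threshold are hypothetical choices made for illustration.

```python
from typing import Sequence


def word_log_likelihood_ratio(
    target_loglik: Sequence[float], alt_loglik: Sequence[float]
) -> float:
    """Duration-normalized log-likelihood ratio for one word segment.

    target_loglik: per-frame log-likelihoods under the target (word) model.
    alt_loglik: per-frame log-likelihoods under the alternative model.
    Both inputs and the per-frame averaging are assumptions of this sketch.
    """
    if len(target_loglik) != len(alt_loglik) or len(target_loglik) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sum the per-frame log ratios, then divide by the number of frames.
    total = sum(t - a for t, a in zip(target_loglik, alt_loglik))
    return total / len(target_loglik)


def verify_word(
    target_loglik: Sequence[float],
    alt_loglik: Sequence[float],
    threshold: float = 0.0,  # hypothetical decision threshold
) -> bool:
    """Accept the hypothesized word if its LR score reaches the threshold."""
    return word_log_likelihood_ratio(target_loglik, alt_loglik) >= threshold
```

Raising the threshold rejects more hypothesized words (fewer false acceptances, more false rejections); lowering it does the opposite. In practice the operating point would be tuned on held-out data.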
Keywords :
Viterbi decoding; hidden Markov models; maximum likelihood decoding; optimisation; parameter estimation; speech recognition; statistical analysis; HMM; Neyman-Pearson hypothesis testing; Viterbi algorithm; acoustic modeling; acoustic models training; best decoded path; confidence measures; continuous speech recognition; decoded string; decoding; experimental study; hidden Markov model; hypothesized word string; incorrectly decoded vocabulary words; likelihood ratio based hypothesis testing; maximum likelihood HMM decoding; out-of-vocabulary words; parameter estimation; performance; public switched telephone network; speech dialog task; statistical language models; training procedure; utterance verification; Acoustic measurements; Automatic speech recognition; Decoding; Hidden Markov models; Parameter estimation; Speech analysis; Speech recognition; Telephony; Testing; Vocabulary;
Journal_Title :
Speech and Audio Processing, IEEE Transactions on