DocumentCode :
394312
Title :
Utterance verification based on statistics of phone-level confidence scores
Author :
Sankar, Ananth ; Wu, Su-Lin
Author_Institution :
Nuance Commun., Menlo Park, CA, USA
Volume :
1
fYear :
2003
fDate :
6-10 April 2003
Abstract :
We present new acoustic confidence scores for utterance verification based on novel combinations of phone-level posterior probability statistics. A common utterance-level acoustic confidence score in the literature is the arithmetic mean, computed over the utterance, of the phone log posterior probabilities. This approach can be problematic when a large part of the utterance is in-grammar (IG) but a small part is out-of-grammar (OOG). For example, a caller says the OOG name "Larry", which is incorrectly recognized as the IG name "Harry". Since most phones were correctly recognized, the mean of the phone posteriors gives a high utterance-level score even though the recognition result should ideally be rejected. We introduce additional statistics, such as the variance and low percentile points of the phone-posterior scores over the utterance, that help capture the deviations within otherwise good recognition matches. We report on experiments combining these statistics. In particular, by normalizing the mean with the standard deviation, we achieved a 10-20% relative improvement in performance on alpha-digit test sets, where OOG utterances are often incorrectly recognized as very similar IG ones.
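The statistics named in the abstract can be illustrated with a minimal sketch. The phone log posterior values, the function names, the nearest-rank percentile rule, and the `penalized_score` combination are all hypothetical choices made for illustration; the abstract does not give the exact normalization used in the paper, so this is not the authors' formula.

```python
import math
import statistics


def utterance_stats(log_posteriors, percentile=10.0):
    """Summarize per-phone log posteriors for one utterance.

    Returns the arithmetic mean, the population standard deviation,
    and a low percentile point (nearest-rank method, an assumption).
    """
    scores = sorted(log_posteriors)
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    # Nearest-rank percentile: smallest score at or above the given rank.
    rank = max(1, math.ceil(percentile / 100.0 * len(scores)))
    low = scores[rank - 1]
    return {"mean": mean, "std": std, "low_percentile": low}


def penalized_score(stats, weight=1.0):
    """Hypothetical combination: lower the mean by a multiple of the spread.

    This is an illustrative stand-in, not the paper's normalization.
    """
    return stats["mean"] - weight * stats["std"]


# Made-up phone log posteriors for two five-phone utterances.
# Both have the same mean, but the second has one poorly matched phone,
# as when an OOG name is recognized as a similar IG name.
uniform_match = [-0.10, -0.12, -0.08, -0.11, -0.09]
one_bad_phone = [-0.02, -0.03, -0.02, -0.40, -0.03]

stats_uniform = utterance_stats(uniform_match)
stats_one_bad = utterance_stats(one_bad_phone)

for name, stats in [("uniform", stats_uniform), ("one bad phone", stats_one_bad)]:
    print(name, stats, "penalized:", penalized_score(stats))
```

In this toy data the mean alone cannot distinguish the two utterances, while the standard deviation and the low percentile point both single out the utterance with one badly matched phone.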
Keywords :
acoustic signal processing; probability; speech recognition; statistical analysis; ASR; IG utterances; OOG utterances; acoustic confidence scores; alpha-digit test sets; arithmetic mean; automatic speech recognition; in-grammar utterance; log posterior probabilities; low percentile; out-of-grammar utterance; phone posteriors mean; phone-level confidence scores; phone-level posterior probability statistics; phone-posterior scores; standard deviation; utterance verification; variance; word recognition; Acoustic measurements; Acoustic testing; Arithmetic; Automatic speech recognition; Probability; Statistics; Viterbi algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-7663-3
Type :
conf
DOI :
10.1109/ICASSP.2003.1198848
Filename :
1198848