DocumentCode
394312
Title
Utterance verification based on statistics of phone-level confidence scores
Author
Sankar, Ananth ; Wu, Su-Lin
Author_Institution
Nuance Commun., Menlo Park, CA, USA
Volume
1
fYear
2003
fDate
6-10 April 2003
Abstract
We present new acoustic confidence scores for utterance verification based on novel combinations of phone-level posterior probability statistics. A common utterance-level acoustic confidence score in the literature is the arithmetic mean, computed over the utterance, of the phone log posterior probabilities. This approach can be problematic when a large part of the utterance is in-grammar (IG) but a small part is out-of-grammar (OOG). For example, a caller says the OOG name "Larry", which is incorrectly recognized as the IG name "Harry". Since most phones were correctly recognized, the mean of the phone posteriors gives a high utterance-level score even though the recognition result should ideally be rejected. We introduce additional statistics, such as the variance and low percentile points of the phone-posterior scores over the utterance, that help capture deviations within otherwise good recognition matches. We report on experiments combining these statistics. In particular, by normalizing the mean with the standard deviation, we achieved a 10-20% relative improvement in performance on alpha-digit test sets, where OOG utterances are often incorrectly recognized as very similar IG ones.
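The statistics named in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the function name `utterance_statistics`, the nearest-rank percentile rule, the 10th-percentile default, and the two made-up score lists. The abstract does not give the exact formula used to normalize the mean by the standard deviation, so the sketch only computes the individual statistics and does not attempt to reproduce the paper's combined score.

```python
import math
import statistics


def utterance_statistics(phone_log_posteriors, low_percentile=10.0):
    """Summarize per-phone log posterior scores for one utterance.

    Hypothetical helper: returns the arithmetic mean, the population
    standard deviation, and a low percentile point (nearest-rank rule).
    """
    scores = sorted(phone_log_posteriors)
    if not scores:
        raise ValueError("need at least one phone score")
    n = len(scores)
    # Nearest-rank percentile: smallest score with at least p% of the data at or below it.
    rank = max(1, math.ceil(low_percentile / 100.0 * n))
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),
        "low_percentile": scores[rank - 1],
    }


# Made-up scores (assumption): every phone matches moderately well.
uniform_match = [-0.3, -0.3, -0.3, -0.3]
# Made-up scores (assumption): three phones match well, one matches badly,
# as in the "Larry" recognized as "Harry" example.
one_bad_phone = [-0.05, -0.05, -0.05, -1.05]

for name, scores in [("uniform", uniform_match), ("one bad phone", one_bad_phone)]:
    print(name, utterance_statistics(scores))
```

Both lists have the same mean, so a mean-only score cannot tell them apart, while the standard deviation and the low percentile point both single out the utterance with one poorly matched phone.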
Keywords
acoustic signal processing; probability; speech recognition; statistical analysis; ASR; IG utterances; OOG utterances; acoustic confidence scores; alpha-digit test sets; arithmetic mean; automatic speech recognition; in-grammar utterance; log posterior probabilities; low percentile; out-of-grammar utterance; phone posteriors mean; phone-level confidence scores; phone-level posterior probability statistics; phone-posterior scores; standard deviation; utterance verification; variance; word recognition; Acoustic measurements; Acoustic testing; Arithmetic; Automatic speech recognition; Probability; Statistics; Viterbi algorithm
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-7663-3
Type
conf
DOI
10.1109/ICASSP.2003.1198848
Filename
1198848