Title :
Context dependent phonetic duration models for decoding conversational speech
Author :
Monkowski, Michael D. ; Picheny, Michael A. ; Rao, P. Srinivasa
Author_Institution :
Human Language Technol. Group, IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
Conversational speech is a particularly difficult task for speech recognition: it exhibits much more variability than dictation, read speech, or isolated commands. Phonetic context was used to predict the durations of phones using a decision tree. These predictions were used to calculate context dependent HMM transition probabilities for the phone models, which were then used to decode telephone conversations from the SwitchBoard corpus. We observed that the duration models do not appreciably improve the word error rate, and that more can be gained by modeling phone durations within words than by adjusting for local average speaking rates. We conclude that local or global variations in speaking rate are not major contributors to the observed high error rates for SwitchBoard.
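The step from a predicted phone duration to HMM transition probabilities can be illustrated with a minimal sketch. It is not the authors' formula: it assumes a left-to-right phone model whose emitting states all share one self-loop probability, so each state's occupancy is geometric and the expected phone duration is n_states / (1 - p). The function names, the three-state default, and the frame-based duration unit are all hypothetical choices made for illustration.

```python
def self_loop_probability(mean_frames: float, n_states: int = 3) -> float:
    """Return a shared self-loop probability p such that a left-to-right
    HMM with n_states emitting states has expected duration mean_frames.

    Assumption (not from the paper): every state has the same self-loop
    probability, so the expected duration is n_states / (1 - p).
    """
    if n_states < 1:
        raise ValueError("n_states must be at least 1")
    # The model cannot be shorter than one frame per state, so a predicted
    # duration at or below that minimum maps to no self-looping at all.
    if mean_frames <= n_states:
        return 0.0
    return 1.0 - n_states / mean_frames


def expected_duration(p: float, n_states: int = 3) -> float:
    """Expected number of frames for the same model, given self-loop p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return n_states / (1.0 - p)


if __name__ == "__main__":
    # A hypothetical predicted duration of 9 frames for some phone in context.
    p = self_loop_probability(9.0)
    print(round(p, 4), round(expected_duration(p), 4))
```

In such a scheme the duration predicted for each phone in its context would set that phone's self-loop probability, and the exit probability of each state would be 1 - p.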
Keywords :
decision theory; decoding; error statistics; hidden Markov models; prediction theory; probability; speech processing; speech recognition; telephony; trees (mathematics); SwitchBoard corpus; context dependent HMM transition probabilities; context dependent phonetic duration models; conversational speech decoding; decision tree; error rates; global variations; local average speaking rates; local variations; phonetic context; phonetic duration prediction; telephone conversations; word error rate; Context modeling; Decision trees; Decoding; Error analysis; Hidden Markov models; Probability; Shape measurement; Speech recognition; Telephony; Topology; Viterbi algorithm
Conference_Title :
1995 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95)
Print_ISBN :
0-7803-2431-5
DOI :
10.1109/ICASSP.1995.479645