DocumentCode
417264
Title
Phone duration modeling for LVCSR
Author
Povey, D.
Author_Institution
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
Volume
1
fYear
2004
fDate
17-21 May 2004
Abstract
Modeling phone durations in a word-specific fashion has previously been shown to improve large-vocabulary continuous speech recognition (LVCSR) performance. We report results on the Switchboard database which confirm that at least small improvements (around 0.2-0.3% absolute) can be obtained. The duration probabilities are applied to time-marked recognition lattices. Features of the system include a novel data-driven method for smoothing discrete distributions, and a form of discrete distribution which allows phone and word lengths to be modeled simultaneously within a consistent probabilistic framework.
Keywords
Gaussian distribution; smoothing methods; speech coding; speech recognition; LVCSR; Switchboard database; data-driven method; discrete distribution smoothing; duration probabilities; phone duration modeling; probabilistic framework; speech recognition performance; time-marked recognition lattices; word lengths; word-specific fashion; Character generation; Chromium; Frequency; Gaussian processes; Hidden Markov models; Lattices; Probability distribution; Smoothing methods; Spatial databases; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-8484-9
Type
conf
DOI
10.1109/ICASSP.2004.1326114
Filename
1326114