• DocumentCode
    417264
  • Title
    Phone duration modeling for LVCSR
  • Author
    Povey, D.
  • Author_Institution
    IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
  • Volume
    1
  • fYear
    2004
  • fDate
    17-21 May 2004
  • Abstract
    Modeling phone durations in a word-specific fashion has previously been shown to lead to improvements in LVCSR recognition performance. We report results on the Switchboard database which confirm that at least small improvements (around 0.2-0.3% absolute) can be obtained. The duration probabilities are applied to time-marked recognition lattices. Features of the system include a novel data-driven method for smoothing discrete distributions, and a form of discrete distribution which allows phone and word lengths to be modeled simultaneously within a consistent probabilistic framework.
  • Keywords
    Gaussian distribution; smoothing methods; speech coding; speech recognition; LVCSR; Switchboard database; data-driven method; discrete distribution smoothing; duration probabilities; phone duration modeling; probabilistic framework; speech recognition performance; time-marked recognition lattices; word lengths; word-specific fashion; Character generation; Chromium; Frequency; Gaussian processes; Hidden Markov models; Lattices; Probability distribution; Smoothing methods; Spatial databases; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-8484-9
  • Type
    conf
  • DOI
    10.1109/ICASSP.2004.1326114
  • Filename
    1326114