• DocumentCode
    2173894
  • Title

    Discriminative duration modeling for speech recognition with segmental conditional random fields

  • Author

    Kao, Justine T. ; Zweig, Geoffrey ; Nguyen, Patrick

  • Author_Institution
    Symbolic Syst. Program, Stanford Univ., Stanford, CA, USA
  • fYear
    2011
  • fDate
    22-27 May 2011
  • Firstpage
    4476
  • Lastpage
    4479
  • Abstract
    This paper describes a new approach to modeling duration for LVCSR using SCARF, a toolkit for speech recognition with segmental conditional random fields. We utilize SCARF's ability to integrate long-span, segment-level features to design and test duration models that help discriminate between correct and incorrect word hypotheses. We show that the duration distributions of correct and incorrect word hypotheses differ. Given a word hypothesis in the lattice and its duration, conditional length probabilities are integrated into the SCARF system as duration features. We evaluate three kinds of duration features on Broadcast News: word, pre- and post-pausal durations, and word span confusions. Adding the duration features to SCARF results in an improvement of up to 0.3% over a state-of-the-art discriminatively trained baseline of 15.3% WER on a Broadcast News task.
  • Keywords
    speech recognition; LVCSR; SCARF system; WER; discriminative duration modeling; post-pausal durations; segmental conditional random fields; speech recognition; Acoustics; Context; Hidden Markov models; Lattices; Mathematical model; Speech; Speech recognition; automatic speech recognition; duration modeling; segmental conditional random fields;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
  • Conference_Location
    Prague
  • ISSN
    1520-6149
  • Print_ISBN
    978-1-4577-0538-0
  • Electronic_ISBN
    1520-6149
  • Type

    conf

  • DOI
    10.1109/ICASSP.2011.5947348
  • Filename
    5947348