• DocumentCode
    394317
  • Title
    A prosody-based approach to end-of-utterance detection that does not require speech recognition
  • Author
    Ferrer, Luciana; Shriberg, Elizabeth; Stolcke, Andreas
  • Author_Institution
    Speech Technology and Research Laboratory, SRI International, Menlo Park, CA, USA
  • Volume
    1
  • fYear
    2003
  • fDate
    6-10 April 2003
  • Abstract
    In previous work we showed that state-of-the-art end-of-utterance detection (as used, for example, in dialog systems) can be improved significantly by making use of prosodic and/or language models that predict utterance endpoints, based on word and alignment output from a speech recognizer. However, using a recognizer in endpointing might not be practical in certain applications. We demonstrate that the improvements due to the prosodic knowledge can be realized largely without alignment information, i.e., without requiring a speech recognizer. A prosodic end-of-utterance detector using only speech/nonspeech detection output is still considerably more accurate and has lower latency than a baseline system based on pause-length thresholding.
  • Keywords
    natural languages; signal detection; speech recognition; alignment output; automatic speech recognition system; dialog systems; end-of-utterance detection; language models; latency; pause-length thresholding; prosodic end-of-utterance detector; prosodic models; prosody-based approach; speech/nonspeech detection output; utterance endpoints prediction; word output; Decision trees; Delay; Detection algorithms; Detectors; Laboratories; Man machine systems; Natural languages; Predictive models; Speech recognition; Yield estimation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7663-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2003.1198854
  • Filename
    1198854