• DocumentCode
    2106046
  • Title

    Automatic identification of filled pauses in spontaneous speech

  • Author

    O'Shaughnessy, D.; Gabrea, Marcel

  • Author_Institution
    INRS-Telecommun., Montreal, Que., Canada
  • Volume
    2
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    620
  • Abstract
    Practical speech recognizers must accept normal conversational voice input (including hesitations). However, most automatic speech recognition work has concentrated on read speech, whose acoustic aspects differ significantly from speech found in actual dialogues. Hesitations, of which the most frequent are filled pauses, are common in natural speech, yet few recognition systems handle such disfluencies with any degree of success. Filled pauses (e.g., “uhh”, “umm”), unlike most silent pauses, resemble phones which form words in continuous speech. The work reported here further develops techniques to allow automatic identification of filled pauses. Such identification, if reliable, would reduce potential confusion in determining an estimated textual output for an utterance. The Switchboard database (of natural telephone conversations) provided data for the study. While most automatic recognition methods rely entirely on spectral envelope (e.g., low-order cepstral coefficients), identifying filled pauses requires using a combination of spectra, fundamental frequency and duration. High precision and a low false alarm rate for filled pauses are feasible without excessive computation.
  • Keywords
    acoustic signal processing; natural languages; speech processing; speech recognition; Switchboard database; acoustical analysis; automatic identification; automatic speech recognition; continuous speech; dialogues; duration; estimated textual output; filled pauses; fundamental frequency; hesitations; low false alarm rate; low-order cepstral coefficients; natural speech; natural telephone conversations; normal conversational voice input; phones; speech recognizers; spontaneous speech; utterance; Automatic speech recognition; Cepstral analysis; Databases; Hidden Markov models; Loudspeakers; Natural languages; Speech analysis; Speech recognition; Stress; Telephony
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Electrical and Computer Engineering, 2000 Canadian Conference on
  • Conference_Location
    Halifax, NS
  • ISSN
    0840-7789
  • Print_ISBN
    0-7803-5957-7
  • Type

    conf

  • DOI
    10.1109/CCECE.2000.849540
  • Filename
    849540