Automatic identification of filled pauses in spontaneous speech

Author

O'Shaughnessy, D.; Gabrea, Marcel

Author_Institution

INRS-Telecommun., Montreal, Que., Canada

Volume

2

fYear

2000

fDate

2000

Firstpage

620

Abstract

Practical speech recognizers must accept normal conversational voice input (including hesitations). However, most automatic speech recognition work has concentrated on read speech, whose acoustic aspects differ significantly from speech found in actual dialogues. Hesitations, of which the most frequent are filled pauses, are common in natural speech, yet few recognition systems handle such disfluencies with any degree of success. Filled pauses (e.g., “uhh”, “umm”), unlike most silent pauses, resemble phones which form words in continuous speech. The work reported here further develops techniques to allow automatic identification of filled pauses. Such identification, if reliable, would reduce potential confusion in determining an estimated textual output for an utterance. The Switchboard database (of natural telephone conversations) provided data for the study. While most automatic recognition methods rely entirely on spectral envelope (e.g., low-order cepstral coefficients), identifying filled pauses requires using a combination of spectra, fundamental frequency and duration. High precision and a low false alarm rate for filled pauses are feasible without excessive computation.

Keywords

acoustic signal processing; natural languages; speech processing; speech recognition; Switchboard database; acoustical analysis; automatic identification; automatic speech recognition; continuous speech; dialogues; duration; estimated textual output; filled pauses; fundamental frequency; hesitations; low false alarm rate; low-order cepstral coefficients; natural speech; natural telephone conversations; normal conversational voice input; phones; speech recognizers; spontaneous speech; utterance; Automatic speech recognition; Cepstral analysis; Databases; Hidden Markov models; Loudspeakers; Natural languages; Speech analysis; Speech recognition; Stress; Telephony

fLanguage

English

Publisher

IEEE

Conference_Titel

Electrical and Computer Engineering, 2000 Canadian Conference on

Conference_Location

Halifax, NS

ISSN

0840-7789

Print_ISBN

0-7803-5957-7

Type

conf

DOI

10.1109/CCECE.2000.849540

Filename

849540