DocumentCode
2106046
Title
Automatic identification of filled pauses in spontaneous speech
Author
O'Shaughnessy, D. ; Gabrea, Marcel
Author_Institution
INRS-Telecommun., Montreal, Que., Canada
Volume
2
fYear
2000
fDate
2000
Firstpage
620
Abstract
Practical speech recognizers must accept normal conversational voice input (including hesitations). However, most automatic speech recognition work has concentrated on read speech, whose acoustic aspects differ significantly from speech found in actual dialogues. Hesitations, of which the most frequent are filled pauses, are common in natural speech, yet few recognition systems handle such disfluencies with any degree of success. Filled pauses (e.g., “uhh”, “umm”), unlike most silent pauses, resemble phones that form words in continuous speech. The work reported here further develops techniques to allow automatic identification of filled pauses. Such identification, if reliable, would reduce potential confusion in determining an estimated textual output for an utterance. The Switchboard database (of natural telephone conversations) provided data for the study. While most automatic recognition methods rely entirely on the spectral envelope (e.g., low-order cepstral coefficients), identifying filled pauses requires using a combination of spectra, fundamental frequency and duration. High precision and a low false alarm rate for filled pauses are feasible without excessive computation.
Keywords
acoustic signal processing; natural languages; speech processing; speech recognition; Switchboard database; acoustical analysis; automatic identification; automatic speech recognition; continuous speech; dialogues; duration; estimated textual output; filled pauses; fundamental frequency; hesitations; low false alarm rate; low-order cepstral coefficients; natural speech; natural telephone conversations; normal conversational voice input; phones; speech recognizers; spontaneous speech; utterance; Automatic speech recognition; Cepstral analysis; Databases; Hidden Markov models; Loudspeakers; Natural languages; Speech analysis; Speech recognition; Stress; Telephony;
fLanguage
English
Publisher
IEEE
Conference_Titel
Electrical and Computer Engineering, 2000 Canadian Conference on
Conference_Location
Halifax, NS
ISSN
0840-7789
Print_ISBN
0-7803-5957-7
Type
conf
DOI
10.1109/CCECE.2000.849540
Filename
849540