DocumentCode :
2106046
Title :
Automatic identification of filled pauses in spontaneous speech
Author :
O'Shaughnessy, D. ; Gabrea, Marcel
Author_Institution :
INRS-Telecommun., Montreal, Que., Canada
Volume :
2
fYear :
2000
fDate :
2000
Firstpage :
620
Abstract :
Practical speech recognizers must accept normal conversational voice input (including hesitations). However, most automatic speech recognition work has concentrated on read speech, whose acoustic aspects differ significantly from speech found in actual dialogues. Hesitations, of which the most frequent are filled pauses, are common in natural speech, yet few recognition systems handle such disfluencies with any degree of success. Filled pauses (e.g., “uhh”, “umm”), unlike most silent pauses, resemble phones which form words in continuous speech. The work reported here further develops techniques to allow automatic identification of filled pauses. Such identification, if reliable, would reduce potential confusion in determining an estimated textual output for an utterance. The Switchboard database (of natural telephone conversations) provided data for the study. While most automatic recognition methods rely entirely on spectral envelope (e.g., low-order cepstral coefficients), identifying filled pauses requires using a combination of spectra, fundamental frequency and duration. High precision and a low false alarm rate for filled pauses are feasible without excessive computation.
Keywords :
acoustic signal processing; natural languages; speech processing; speech recognition; Switchboard database; acoustical analysis; automatic identification; automatic speech recognition; continuous speech; dialogues; duration; estimated textual output; filled pauses; fundamental frequency; hesitations; low false alarm rate; low-order cepstral coefficients; natural speech; natural telephone conversations; normal conversational voice input; phones; speech recognizers; spontaneous speech; utterance; Automatic speech recognition; Cepstral analysis; Databases; Hidden Markov models; Loudspeakers; Natural languages; Speech analysis; Speech recognition; Stress; Telephony
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electrical and Computer Engineering, 2000 Canadian Conference on
Conference_Location :
Halifax, NS
ISSN :
0840-7789
Print_ISBN :
0-7803-5957-7
Type :
conf
DOI :
10.1109/CCECE.2000.849540
Filename :
849540