Title : 
Automatic identification of filled pauses in spontaneous speech
         
        
            Author : 
O'Shaughnessy, D. ; Gabrea, Marcel
         
        
            Author_Institution : 
INRS-Telecommun., Montreal, Que., Canada
         
        
        
        
        
        
            Abstract : 
Practical speech recognizers must accept normal conversational voice input (including hesitations). However, most automatic speech recognition work has concentrated on read speech, whose acoustic aspects differ significantly from speech found in actual dialogues. Hesitations, of which the most frequent are filled pauses, are common in natural speech, yet few recognition systems handle such disfluencies with any degree of success. Filled pauses (e.g., “uhh”, “umm”), unlike most silent pauses, resemble phones which form words in continuous speech. The work reported here further develops techniques to allow automatic identification of filled pauses. Such identification, if reliable, would reduce potential confusion in determining an estimated textual output for an utterance. The Switchboard database (of natural telephone conversations) provided data for the study. While most automatic recognition methods rely entirely on the spectral envelope (e.g., low-order cepstral coefficients), identifying filled pauses requires using a combination of spectra, fundamental frequency and duration. High precision and a low false alarm rate for filled pauses are feasible without excessive computation.
         
        
            Keywords : 
acoustic signal processing; natural languages; speech processing; speech recognition; Switchboard database; acoustical analysis; automatic identification; automatic speech recognition; continuous speech; dialogues; duration; estimated textual output; filled pauses; fundamental frequency; hesitations; low false alarm rate; low-order cepstral coefficients; natural speech; natural telephone conversations; normal conversational voice input; phones; speech recognizers; spontaneous speech; utterance; Automatic speech recognition; Cepstral analysis; Databases; Hidden Markov models; Loudspeakers; Natural languages; Speech analysis; Speech recognition; Stress; Telephony;
         
        
        
        
            Conference_Title : 
Electrical and Computer Engineering, 2000 Canadian Conference on
         
        
            Conference_Location : 
Halifax, NS
         
        
        
            Print_ISBN : 
0-7803-5957-7
         
        
        
            DOI : 
10.1109/CCECE.2000.849540