DocumentCode
3427224
Title
An instantaneous vector representation of delta pitch for speaker-change prediction in conversational dialogue systems
Author
Laskowski, Kornel ; Edlund, Jens ; Heldner, Mattias
Author_Institution
interACT, Carnegie Mellon Univ., Pittsburgh, PA
fYear
2008
fDate
March 31 2008-April 4 2008
Firstpage
5041
Lastpage
5044
Abstract
As spoken dialogue systems become deployed in increasingly complex domains, they face rising demands on the naturalness of interaction. We focus on system responsiveness, aiming to mimic human-like dialogue flow control by predicting speaker changes as observed in real human-human conversations. We derive an instantaneous vector representation of pitch variation and show that it is amenable to standard acoustic modeling techniques. Using a small amount of automatically labeled data, we train models which significantly outperform current state-of-the-art pause-only systems, and replicate to within 1% absolute the performance of our previously published hand-crafted baseline. The new system additionally offers scope for run-time control over the precision or recall of locations at which to speak.
Keywords
acoustic signal processing; signal representation; speaker recognition; acoustic modeling; complex domains; delta pitch variation; human-human conversations; instantaneous vector representation; run-time control; speaker change prediction; spoken dialogue systems; Automatic control; Automatic speech recognition; Communication system control; Control systems; Delay; Humans; Loudspeakers; Oral communication; Predictive models; Runtime; Frequency domain analysis; Signal representation; Speech communication; Speech processing; User interfaces
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location
Las Vegas, NV
ISSN
1520-6149
Print_ISBN
978-1-4244-1483-3
Electronic_ISBN
1520-6149
Type
conf
DOI
10.1109/ICASSP.2008.4518791
Filename
4518791