Title :
Exploiting loudness dynamics in stochastic models of turn-taking
Author_Institution :
Carnegie Mellon Univ., Pittsburgh, PA, USA
Abstract :
Stochastic turn-taking models have traditionally been implemented as N-grams, which condition predictions on recent binary-valued speech/non-speech contours. The current work re-implements this function using feed-forward neural networks, capable of accepting binary- as well as continuous-valued features; performance is shown to asymptotically approach that of the N-gram baseline as model complexity increases. The conditioning context is then extended to leverage loudness contours. Experiments indicate that the additional sensitivity to loudness considerably decreases average cross-entropy rates on unseen data, by 0.03 bits per 100-ms framing interval. This reduction is shown to make loudness-sensitive conversants capable of better predictions, with attention memory requirements at least 5 times smaller and responsiveness latency at least 10 times shorter than those of the loudness-insensitive baseline.
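The modeling setup described in the abstract can be illustrated with a minimal sketch: predict each 100-ms speech/non-speech frame from a window of preceding frames, optionally augmented with a loudness contour, and score the model by cross entropy in bits per frame. This is an illustrative toy on synthetic data, not the paper's actual model or corpus; the context length `K`, the synthetic dialogue process, and the use of a hidden-layer-free (logistic-regression) predictor in place of the paper's feed-forward networks are all assumptions made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic single-speaker activity at 100-ms frames: a "sticky" binary
# process (speech tends to persist), plus a loudness contour that is
# loosely correlated with speech activity. Purely illustrative data.
T = 5000
speech = np.zeros(T, dtype=int)
for t in range(1, T):
    speech[t] = speech[t - 1] if rng.random() < 0.9 else 1 - speech[t - 1]
loudness = speech * (0.5 + 0.5 * rng.random(T)) + 0.05 * rng.random(T)

K = 5  # conditioning context in frames (an assumption, not the paper's value)

def make_xy(use_loudness):
    """Build (context, next-frame) pairs, with or without loudness features."""
    X, y = [], []
    for t in range(K, T):
        feats = list(speech[t - K:t])          # binary speech/non-speech context
        if use_loudness:
            feats += list(loudness[t - K:t])   # continuous loudness context
        X.append(feats)
        y.append(speech[t])
    return np.array(X, dtype=float), np.array(y, dtype=float)

def train_logreg(X, y, epochs=200, lr=0.1):
    """Degenerate feed-forward net (no hidden layer) trained by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output
        g = p - y                               # gradient of log loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def cross_entropy_bits(X, y, w, b):
    """Average cross entropy of the model on (X, y), in bits per frame."""
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    eps = 1e-9
    return -np.mean(y * np.log2(p + eps) + (1 - y) * np.log2(1 - p + eps))

split = (T - K) // 2  # first half trains, second half is "unseen data"
for use_loud in (False, True):
    X, y = make_xy(use_loud)
    w, b = train_logreg(X[:split], y[:split])
    ce = cross_entropy_bits(X[split:], y[split:], w, b)
    print(f"loudness={use_loud}: {ce:.3f} bits/frame")
```

On this toy data the loudness features are largely redundant with the binary contour, so the gap between the two conditions will not mirror the 0.03-bit reduction reported above; the sketch only shows the evaluation protocol (per-frame prediction, held-out cross entropy in bits per framing interval).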
Keywords :
computational complexity; entropy; feedforward neural nets; speech recognition; speech synthesis; stochastic processes; N-gram baseline; attention memory requirements; binary-valued features; binary-valued speech-nonspeech contours; continuous-valued features; cross-entropy; feedforward neural networks; loudness contours; loudness dynamics; loudness-sensitive conversants; model complexity; spoken dialogue systems; stochastic turn-taking models; Artificial neural networks; Computational modeling; Context; Entropy; Speech; Standards; Stochastic processes; Interaction models; neural networks; prosody; spoken dialogue systems;
Conference_Title :
Spoken Language Technology Workshop (SLT), 2012 IEEE
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4673-5125-6
Electronic_ISBN :
978-1-4673-5124-9
DOI :
10.1109/SLT.2012.6424201