• DocumentCode
    3131835
  • Title
    Exploiting loudness dynamics in stochastic models of turn-taking
  • Author
    Laskowski, K.
  • Author_Institution
    Carnegie Mellon Univ., Pittsburgh, PA, USA
  • fYear
    2012
  • fDate
    2-5 Dec. 2012
  • Firstpage
    79
  • Lastpage
    84
  • Abstract
    Stochastic turn-taking models have traditionally been implemented as N-grams, which condition predictions on recent binary-valued speech/non-speech contours. The current work re-implements this function using feed-forward neural networks, capable of accepting binary- as well as continuous-valued features; performance is shown to asymptotically approach that of the N-gram baseline as model complexity increases. The conditioning context is then extended to leverage loudness contours. Experiments indicate that the additional sensitivity to loudness considerably decreases average cross entropy rates on unseen data, by 0.03 bits per framing interval of 100 ms. This reduction is shown to make loudness-sensitive conversants capable of better predictions, with attention memory requirements at least 5 times smaller and responsiveness latency at least 10 times shorter than the loudness-insensitive baseline.
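The abstract's core idea — replacing an N-gram turn-taking model with a feed-forward network that conditions on past speech/non-speech bits (and, optionally, continuous loudness values), scored by average cross entropy in bits per 100 ms frame — can be sketched in pure Python. This is an illustrative toy, not the paper's implementation: the class name, layer sizes, and random weights are all assumptions, and no training is shown.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TurnTakingNet:
    """Toy one-hidden-layer feed-forward predictor (illustrative only).

    Predicts the probability that a speaker vocalizes in the next 100 ms
    frame, given a fixed-length context of past frames. Each context entry
    may be a binary speech/non-speech bit or a continuous loudness value,
    which is what lets this architecture accept both feature types.
    """

    def __init__(self, context_size, hidden_size, seed=0):
        rng = random.Random(seed)  # fixed seed: untrained, reproducible weights
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(context_size)]
                   for _ in range(hidden_size)]
        self.b1 = [0.0] * hidden_size
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
        self.b2 = 0.0

    def predict(self, context):
        # tanh hidden layer followed by a sigmoid output unit
        hidden = [math.tanh(sum(w * x for w, x in zip(row, context)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)

def cross_entropy_bits(net, contexts, targets):
    """Average cross entropy in bits per frame, the metric the abstract uses."""
    total = 0.0
    for ctx, y in zip(contexts, targets):
        p = net.predict(ctx)
        total -= math.log2(p if y == 1 else 1.0 - p)
    return total / len(targets)
```

A loudness-sensitive variant would simply widen each context vector with loudness values alongside the binary bits; the abstract's reported 0.03 bit/frame reduction comes from exactly that kind of extended conditioning context.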
  • Keywords
    computational complexity; entropy; feedforward neural nets; speech recognition; speech synthesis; stochastic processes; N-gram baseline; attention memory requirements; binary-valued features; binary-valued speech-nonspeech contours; continuous-valued features; cross-entropy; feedforward neural networks; loudness contours; loudness dynamics; loudness-sensitive conversants; model complexity; spoken dialogue systems; stochastic turn-taking models; Artificial neural networks; Computational modeling; Context; Entropy; Speech; Standards; Stochastic processes; Interaction models; neural networks; prosody; spoken dialogue systems;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2012 IEEE Spoken Language Technology Workshop (SLT)
  • Conference_Location
    Miami, FL
  • Print_ISBN
    978-1-4673-5125-6
  • Electronic_ISBN
    978-1-4673-5124-9
  • Type
    conf
  • DOI
    10.1109/SLT.2012.6424201
  • Filename
    6424201