• DocumentCode
    794820
  • Title
    Speaking mode variability in multimodal speech production
  • Author
    Vatikiotis-Bateson, Eric; Yehia, Hani C.
  • Author_Institution
    Commun. Dynamics Project, ATR Human Inf. Sci. Labs., Kyoto, Japan
  • Volume
    13
  • Issue
    4
  • fYear
    2002
  • fDate
    7/1/2002
  • Firstpage
    894
  • Lastpage
    899
  • Abstract
    The speech acoustics and the phonetically relevant motion of the face during speech are determined by the time-varying behavior of the vocal tract. A benefit of this linkage is that we are able to estimate face motion from the spectral acoustics during speech production using simple neural networks. Thus far, however, the scope of reliable estimation has been limited to individual sentences; network training degrades sharply when multiple sentences are analyzed together. While there are a number of potential avenues for improving network generalization, this paper investigates the possibility that the experimental recording procedures introduce artificial boundary constraints between sentence-length utterances. Specifically, the same sentence materials were recorded both individually and as part of longer, paragraph-length utterances. The scope of reliable network estimation was found to depend both on the length of the utterance (sentence versus paragraph) and, not surprisingly, on phonetic content: estimation of face motion from speech acoustics remained reliable for larger sentence training sets when the sentences were recorded in continuous paragraph readings, while greater phonetic diversity reduced reliability. (An illustrative sketch of such an acoustics-to-face-motion mapping follows this record.)
  • Keywords
    face recognition; generalisation (artificial intelligence); image motion analysis; neural nets; speech processing; time-varying systems; artificial boundary constraints; face motion estimation; multimodal speech production; multiple sentences; network generalization; neural networks; paragraph length utterances; phonetically relevant face motion; sentence length utterances; speaking mode variability; spectral acoustics; speech acoustics; speech production; time-varying behavior; utterance length; vocal tract; Acoustics; Continuous production; Couplings; Degradation; Facial animation; Humans; Motion estimation; Natural languages; Neural networks; Speech;
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Neural Networks
  • Publisher
    IEEE
  • ISSN
    1045-9227
  • Type
    jour
  • DOI
    10.1109/TNN.2002.1021890
  • Filename
    1021890
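
The abstract describes estimating phonetically relevant face motion from spectral acoustics with simple neural networks, trained over sentence-length versus paragraph-length material. The paper publishes no code; the sketch below is a minimal, hypothetical illustration of that kind of frame-wise acoustics-to-face-motion mapping, using a single-hidden-layer network in NumPy with synthetic placeholder data. The names (acoustic_frames, face_markers, train_mlp), feature dimensions, and training settings are invented for illustration; the authors' actual features, network sizes, and training procedure may differ.

    # Hypothetical sketch: frame-wise mapping from spectral acoustic features to
    # face-marker positions with a one-hidden-layer network (NumPy only).
    # All data and dimensions here are synthetic placeholders, not the paper's.
    import numpy as np

    rng = np.random.default_rng(0)

    N_FRAMES = 2000      # analysis frames in a training set (e.g., several sentences)
    N_ACOUSTIC = 20      # spectral parameters per frame (e.g., LSP-like features)
    N_MARKERS = 12       # flattened positions of a few face markers

    # Placeholder data standing in for time-aligned acoustic and face-motion records.
    acoustic_frames = rng.standard_normal((N_FRAMES, N_ACOUSTIC))
    true_map = rng.standard_normal((N_ACOUSTIC, N_MARKERS)) * 0.5
    face_markers = np.tanh(acoustic_frames @ true_map) \
        + 0.05 * rng.standard_normal((N_FRAMES, N_MARKERS))

    def train_mlp(x, y, hidden=30, lr=0.01, epochs=500):
        """Fit y ~ f(x) with one tanh hidden layer using plain gradient descent."""
        w1 = rng.standard_normal((x.shape[1], hidden)) * 0.1
        b1 = np.zeros(hidden)
        w2 = rng.standard_normal((hidden, y.shape[1])) * 0.1
        b2 = np.zeros(y.shape[1])
        for _ in range(epochs):
            h = np.tanh(x @ w1 + b1)          # hidden activations
            pred = h @ w2 + b2                # estimated marker positions
            err = pred - y                    # frame-wise estimation error
            # Backpropagate the mean-squared error.
            grad_w2 = h.T @ err / len(x)
            grad_b2 = err.mean(axis=0)
            dh = (err @ w2.T) * (1.0 - h ** 2)
            grad_w1 = x.T @ dh / len(x)
            grad_b1 = dh.mean(axis=0)
            w1 -= lr * grad_w1; b1 -= lr * grad_b1
            w2 -= lr * grad_w2; b2 -= lr * grad_b2
        return w1, b1, w2, b2

    def estimate(x, params):
        w1, b1, w2, b2 = params
        return np.tanh(x @ w1 + b1) @ w2 + b2

    # Train on part of the material and test on held-out frames, mimicking the
    # question of how far the estimation generalizes beyond the training set.
    split = int(0.8 * N_FRAMES)
    params = train_mlp(acoustic_frames[:split], face_markers[:split])
    pred = estimate(acoustic_frames[split:], params)
    corr = np.corrcoef(pred.ravel(), face_markers[split:].ravel())[0, 1]
    print(f"held-out correlation between estimated and measured motion: {corr:.3f}")

In the paper's terms, the comparison of interest would come from swapping what goes into the training set, e.g., the same sentences recorded individually versus excised from continuous paragraph readings, and observing how the held-out estimation reliability changes; the network itself is deliberately simple.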