• DocumentCode
    3462172
  • Title

    A multimodal learning interface for word acquisition

  • Author

    Ballard, Dana H. ; Yu, Chen

  • Author_Institution
    Dept. of Comput. Sci., Rochester Univ., NY, USA
  • Volume
    5
  • fYear
    2003
  • fDate
    6-10 April 2003
  • Abstract
    We present a multimodal interface that learns words from natural interactions with users. The system can be trained in an unsupervised mode in which users perform everyday tasks while providing natural language descriptions of their behavior. We collect acoustic signals in concert with user-centric multisensory information from non-speech modalities, such as user's perspective video, gaze positions, head directions and hand movements. A multimodal learning algorithm is developed that firstly spots words from continuous speech and then associates action verbs and object names with their grounded meanings. The central idea is to make use of non-speech contextual information to facilitate word spotting, and utilize temporal correlations of data from different modalities to build hypothesized lexical items. From those items, an EM-based method selects correct word-meaning pairs. Successful learning has been demonstrated in the experiment of the natural task of "stapling papers".
  • Keywords
    gesture recognition; learning systems; natural language interfaces; optimisation; speech processing; speech recognition; speech-based user interfaces; unsupervised learning; video signal processing; EM-based method; acoustic signals; contextual information; gaze positions; hand movements; head directions; hypothesized lexical items; multimodal learning algorithm; multimodal learning interface; multisensory information; natural interactions; natural language descriptions; unsupervised learning; video; word acquisition; word spotting; Computational modeling; Computer science; Computer vision; Hidden Markov models; Humans; Learning systems; Man machine systems; Natural languages; Pattern recognition; Speech recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7663-3
  • Type

    conf

  • DOI
    10.1109/ICASSP.2003.1200088
  • Filename
    1200088