DocumentCode
3462172
Title
A multimodal learning interface for word acquisition
Author
Ballard, Dana H. ; Yu, Chen
Author_Institution
Dept. of Comput. Sci., Rochester Univ., NY, USA
Volume
5
fYear
2003
fDate
6-10 April 2003
Abstract
We present a multimodal interface that learns words from natural interactions with users. The system can be trained in an unsupervised mode in which users perform everyday tasks while providing natural language descriptions of their behavior. We collect acoustic signals in concert with user-centric multisensory information from non-speech modalities, such as the user's perspective video, gaze positions, head directions and hand movements. A multimodal learning algorithm is developed that first spots words from continuous speech and then associates action verbs and object names with their grounded meanings. The central idea is to use non-speech contextual information to facilitate word spotting, and to use temporal correlations of data from different modalities to build hypothesized lexical items. From those items, an EM-based method selects correct word-meaning pairs. Successful learning has been demonstrated in an experiment on the natural task of "stapling papers".
Keywords
gesture recognition; learning systems; natural language interfaces; optimisation; speech processing; speech recognition; speech-based user interfaces; unsupervised learning; video signal processing; EM-based method; acoustic signals; contextual information; gaze positions; hand movements; head directions; hypothesized lexical items; multimodal learning algorithm; multimodal learning interface; multisensory information; natural interactions; natural language descriptions; video; word acquisition; word spotting; Computational modeling; Computer science; Computer vision; Hidden Markov models; Humans; Learning systems; Man machine systems; Natural languages; Pattern recognition; Speech recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-7663-3
Type
conf
DOI
10.1109/ICASSP.2003.1200088
Filename
1200088