DocumentCode :
3462172
Title :
A multimodal learning interface for word acquisition
Author :
Ballard, Dana H. ; Yu, Chen
Author_Institution :
Dept. of Comput. Sci., Rochester Univ., NY, USA
Volume :
5
fYear :
2003
fDate :
6-10 April 2003
Abstract :
We present a multimodal interface that learns words from natural interactions with users. The system can be trained in an unsupervised mode in which users perform everyday tasks while providing natural language descriptions of their behavior. We collect acoustic signals in concert with user-centric multisensory information from non-speech modalities, such as the user's perspective video, gaze positions, head directions and hand movements. A multimodal learning algorithm is developed that first spots words in continuous speech and then associates action verbs and object names with their grounded meanings. The central idea is to use non-speech contextual information to facilitate word spotting, and to use temporal correlations among data from different modalities to build hypothesized lexical items. From those items, an EM-based method selects the correct word-meaning pairs. Successful learning has been demonstrated in an experiment on the natural task of "stapling papers".
Keywords :
gesture recognition; learning systems; natural language interfaces; optimisation; speech processing; speech recognition; speech-based user interfaces; unsupervised learning; video signal processing; EM-based method; acoustic signals; contextual information; gaze positions; hand movements; head directions; hypothesized lexical items; multimodal learning algorithm; multimodal learning interface; multisensory information; natural interactions; natural language descriptions; video; word acquisition; word spotting; Computational modeling; Computer science; Computer vision; Hidden Markov models; Humans; Learning systems; Man machine systems; Natural languages; Pattern recognition; Speech recognition
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
ISSN :
1520-6149
Print_ISBN :
0-7803-7663-3
Type :
conf
DOI :
10.1109/ICASSP.2003.1200088
Filename :
1200088