Title :
Connecting concepts from vision and speech processing
Author :
Wachsmuth, Sven ; Sagerer, Gerhard
Author_Institution :
Fac. of Technol., Bielefeld Univ., Germany
fDate :
1999
Abstract :
This paper addresses the problem of how to establish referential links between interpretations of speech and visual data. To handle erroneous, vague, or incomplete conceptual descriptions, we propose a probabilistic interaction scheme. The modelling of dependencies and the calculation of inferences are realized by using Bayesian networks. This interaction scheme provides a basis for disambiguation and error recovery. We implemented an interaction component in an assembly task environment. A robot constructor can be instructed by speech and pointing gestures to connect primitive component parts of a wooden toy construction kit. The system is evaluated on a test data set which consists of 448 spoken utterances from 16 speakers who name objects on 10 images from different scenes. First results show the effectiveness and robustness of the probabilistic approach.
Keywords :
belief networks; speech processing; Bayesian networks; assembly task environment; disambiguation; error recovery; inferences; pointing gestures; probabilistic interaction scheme; referential links; speech data; test data set; visual data; Artificial intelligence; Bayesian methods; Information resources; Joining processes; Layout; Robotic assembly; Robots; Robustness; System testing
Conference_Titel :
Integration of Speech and Image Understanding, 1999. Proceedings
Conference_Location :
Corfu
Print_ISBN :
0-7695-0471-X
DOI :
10.1109/ISIU.1999.824829