Title :
Learning audio-visual associations using mutual information
Author :
Roy, Deb ; Schiele, Bernt ; Pentland, Alex
Author_Institution :
MIT, Cambridge, MA, USA
Abstract :
This paper addresses the problem of finding useful associations between audio and visual input signals. The proposed approach is based on maximizing the mutual information of audio-visual clusters. This approach segments continuous speech signals and finds visual categories that correspond to the segmented spoken words. Such audio-visual associations may be used to model infant language acquisition and to dynamically personalize speech-based human-computer interfaces for applications including catalog browsing and wearable computing. This paper describes an implemented system for learning shape names from camera and microphone input. We present results of an evaluation of the system in the domain of modeling language learning.
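As a rough illustration of the quantity the abstract says is maximized (not the authors' implementation), the sketch below computes the mutual information between audio-cluster and visual-cluster assignments from a joint co-occurrence count table. The function name and the example counts are hypothetical.

```python
import numpy as np

def mutual_information(joint_counts):
    """Mutual information I(A;V) in nats, from a table of
    audio-cluster x visual-cluster co-occurrence counts."""
    p = joint_counts / joint_counts.sum()      # joint distribution p(a, v)
    pa = p.sum(axis=1, keepdims=True)          # marginal p(a), shape (A, 1)
    pv = p.sum(axis=0, keepdims=True)          # marginal p(v), shape (1, V)
    mask = p > 0                               # skip zero cells to avoid log(0)
    return float((p[mask] * np.log(p[mask] / (pa @ pv)[mask])).sum())

# Hypothetical counts: rows = audio clusters (word-like speech segments),
# columns = visual clusters (shape categories). A near-diagonal table means
# the audio and visual clusterings are strongly associated, so MI is high.
counts = np.array([[30.0, 2.0, 1.0],
                   [3.0, 25.0, 4.0],
                   [1.0, 3.0, 31.0]])
print(mutual_information(counts))
```

A clustering that pairs each spoken word with one shape category drives the table toward diagonal form and raises this score; independent clusterings give a score near zero.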
Keywords :
image segmentation; optimisation; speech-based user interfaces; audio-visual association learning; catalog browsing; continuous speech signals; infant language acquisition; language learning; maximization; mutual information; speech-based human-computer interfaces; visual categories; wearable computing; Cameras; Clothing; Electronic switching systems; Laboratories; Mutual information; Natural languages; Shape; Speech; Streaming media; Wearable computers;
Conference_Title :
Integration of Speech and Image Understanding, 1999. Proceedings
Conference_Location :
Corfu, Greece
Print_ISBN :
0-7695-0471-X
DOI :
10.1109/ISIU.1999.824909