Title :
Towards visually-grounded spoken language acquisition
Author_Institution :
Media Lab., MIT, Cambridge, MA, USA
Abstract :
A characteristic shared by most approaches to natural language understanding and generation is the use of symbolic representations of word and sentence meanings; frames and semantic nets are examples of such representations. Symbolic methods are inappropriate for applications that require natural language semantics to be linked to perception, as is the case in tasks such as scene description or human-robot interaction. This paper presents two implemented systems: one that learns to generate, and one that learns to understand, visually-grounded spoken language. These implementations are part of our ongoing effort to develop a comprehensive model of perceptually-grounded semantics.
Keywords :
learning (artificial intelligence); natural language interfaces; speech recognition; speech synthesis; speech-based user interfaces; human-robot interaction; natural language generation; natural language understanding; perception; perceptually-grounded semantics; scene description; sentence meanings; statistical learning algorithms; symbolic representations; visually-grounded spoken language acquisition; word meanings; character generation; cognitive robotics; grounding; humans; joining processes; laboratories; layout; natural languages; speech synthesis; statistical learning
Conference_Title :
Proceedings of the Fourth IEEE International Conference on Multimodal Interfaces, 2002
Print_ISBN :
0-7695-1834-6
DOI :
10.1109/ICMI.2002.1166977