Title :
Far-Field Multimodal Speech Processing and Conversational Interaction in Smart Spaces
Author :
Potamianos, Gerasimos ; Huang, Jing ; Marcheret, Etienne ; Libal, Vit ; Balchandran, Rajesh ; Epstein, Mark ; Seredi, Ladislav ; Labsky, Martin ; Ures, Lubos ; Black, Matthew ; Lucey, Patrick
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY
Abstract :
Robust speech processing constitutes a crucial component in the development of usable and natural conversational interfaces. In this paper, we are particularly interested in human-computer interaction taking place in "smart" spaces equipped with a number of far-field, unobtrusive microphones and camera sensors. The availability of these sensors allows multi-sensory and multi-modal processing, thus improving the robustness of speech-based perception technologies in a number of scenarios of interest, for example lectures and meetings held inside smart conference rooms, or interaction with domotic devices in smart homes. We overview recent work at IBM Research on developing state-of-the-art speech technology for smart spaces. In particular, we discuss acoustic scene analysis, speech activity detection, speaker diarization, and speech recognition, emphasizing multi-sensory or multi-modal processing. The resulting technology is envisaged to allow far-field conversational interaction in smart spaces based on dialog management and natural language understanding of user requests.
Keywords :
human computer interaction; intelligent sensors; microphones; natural language interfaces; speech recognition; dialog management; far-field multimodal robust speech processing; human-computer interaction; microphone; multimodal processing; multisensory processing; natural conversational interface development; natural language understanding; smart camera sensor; smart space; speaker diarization; speech activity detection; speech-based perception technology; Acoustic devices; Image analysis; Robustness; Smart cameras; Smart homes; Space technology; Speech analysis; Speech processing; Acoustic scene analysis; audio-visual speech recognition; dialog systems; fusion; smart rooms
Conference_Title :
Hands-Free Speech Communication and Microphone Arrays, 2008. HSCMA 2008
Conference_Location :
Trento
Print_ISBN :
978-1-4244-2337-8
Electronic_ISBN :
978-1-4244-2338-5
DOI :
10.1109/HSCMA.2008.4538701