DocumentCode :
157989
Title :
Physical querying with multi-modal sensing
Author :
Baek, Iljoo ; Stine, Taylor ; Dash, Denver ; Xiao, Fanyi ; Sheikh, Yaser ; Movshovitz-Attias, Yair ; Chen, Mei ; Hebert, Martial ; Kanade, Takeo
Author_Institution :
Robot. Inst., Carnegie Mellon Univ., Pittsburgh, PA, USA
fYear :
2014
fDate :
24-26 March 2014
Firstpage :
183
Lastpage :
190
Abstract :
We present Marvin, a system that can search for physical objects using a mobile or wearable device. It integrates HOG-based object recognition, SURF-based localization information, automatic speech recognition, and user feedback information with a probabilistic model to recognize the "object of interest" with high accuracy and at interactive speeds. Once the object of interest is recognized, the information that the user is querying, e.g. reviews or options, is displayed on the user's mobile or wearable device. We tested this prototype in a real-world retail store during business hours, with varying degrees of background noise and clutter. We show that this multi-modal approach achieves superior recognition accuracy compared to using a vision system alone, especially in cluttered scenes where a vision system would be unable to distinguish which object is of interest to the user without additional input. The system scales computationally to large numbers of objects by focusing compute-intensive resources on the objects most likely to be of interest, as inferred from user speech and implicit localization information. We present the system architecture, the probabilistic model that integrates the multi-modal information, and empirical results showing the benefits of multi-modal integration.
Keywords :
object recognition; speech recognition; user interfaces; HOG-based object recognition; Marvin; SURF-based localization information; automatic speech recognition; background noise; business hours; cluttered scenes; compute-intensive resources; interactive speeds; mobile device; multi-modal approach; multimodal information; multimodal integration; multimodal sensing; physical querying; probabilistic model; superior recognition accuracy; system architecture; user feedback; vision system; wearable device; Computer architecture; Context; Feature extraction; Object recognition; Speech; Speech recognition; Visualization;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Applications of Computer Vision (WACV), 2014 IEEE Winter Conference on
Conference_Location :
Steamboat Springs, CO
Type :
conf
DOI :
10.1109/WACV.2014.6836103
Filename :
6836103