Semantic structure from motion with object and point interactions

Author

Bao, Sid Yingze ; Bagra, Mohit ; Savarese, Silvio

Author_Institution

Univ. of Michigan at Ann Arbor, Ann Arbor, MI, USA

fYear

2011

fDate

6-13 Nov. 2011

Firstpage

982

Lastpage

989

Abstract

We propose a new method for jointly detecting objects and recovering the geometry of the scene (camera pose, object and scene point 3D locations) from multiple semi-calibrated images (camera internal parameters are known). To achieve this, our method models high-level semantics (i.e., object class labels and relevant characteristics such as location and pose) and the interactions (correlations) between objects and feature points within the same view and across views. We validate our algorithm against state-of-the-art baseline methods using two public datasets - the Ford Car dataset and the Kinect Office dataset [1] - and show that we: i) significantly improve camera pose estimation results compared to a point-based SFM algorithm; ii) achieve better 2D and 3D object detection accuracy than using single images separately. Our algorithm is critical in many application scenarios, including object manipulation and autonomous navigation.

Keywords

image motion analysis; image reconstruction; object detection; pose estimation; 2D object detection; 3D object detection; Ford Car dataset; Kinect Office dataset; camera pose estimation; multiple semicalibrated images; point-based SFM algorithm; scene geometry recovery; semantic structure; structure from motion; Cameras; Correlation; Feature extraction; Object detection; Semantics; Three dimensional displays; Training

fLanguage

English

Publisher

IEEE

Conference_Title

Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on

Conference_Location

Barcelona

Print_ISBN

978-1-4673-0062-9

Type

conf

DOI

10.1109/ICCVW.2011.6130358

Filename

6130358