Monocular Visual Scene Understanding: Understanding Multi-Object Traffic Scenes

Author

Wojek, Christian ; Walk, Stefan ; Roth, Stefan ; Schindler, Kaspar ; Schiele, Bernt

Author_Institution

Max Planck Inst. for Inf., Saarbrücken, Germany

Volume

35

Issue

4

fYear

2013

fDate

April 2013

Firstpage

882

Lastpage

897

Abstract

Following recent advances in detection, context modeling, and tracking, scene understanding has been the focus of renewed interest in computer vision research. This paper presents a novel probabilistic 3D scene model that integrates state-of-the-art multiclass object detection, object tracking, and scene labeling with geometric 3D reasoning. Our model is able to represent complex object interactions such as inter-object occlusion, physical exclusion between objects, and geometric context. Inference in this model allows us to jointly recover the 3D scene context and perform 3D multi-object tracking from a mobile observer, for objects of multiple categories, using only monocular video as input. In contrast to many other approaches, our system performs explicit occlusion reasoning and is therefore capable of tracking objects that are partially occluded for extended periods of time, or objects that have never been observed to their full extent. In addition, we show that a joint scene tracklet model for the evidence collected over multiple frames substantially improves performance. The approach is evaluated on different types of challenging onboard sequences. We first show a substantial improvement over the state of the art in 3D multi-people tracking. Moreover, a similar performance gain is achieved for multiclass 3D tracking of cars and trucks on a challenging dataset.

Keywords

automobiles; computer graphics; computer vision; image motion analysis; image representation; inference mechanisms; natural scenes; object detection; object recognition; object tracking; observers; traffic engineering computing; video surveillance; 3D multiclass object tracking; 3D multipeople tracking; cars; complex object interaction representation; context modeling; geometric 3D reasoning; inference mechanism; mobile observer; monocular video; monocular visual scene understanding; multiclass object detection; multiobject traffic scene understanding; occlusion reasoning; probabilistic 3D scene model; scene labeling; trucks; Cameras; Cognition; Computational modeling; Detectors; Hidden Markov models; Object detection; Solid modeling; MCMC; Scene understanding; scene tracklets; tracking; tracking-by-detection; Algorithms; Automobiles; Cluster Analysis; Databases, Factual; Human Activities; Humans; Image Processing, Computer-Assisted; Models, Theoretical; Pattern Recognition, Automated; Video Recording; Walking

fLanguage

English

Journal_Title

Pattern Analysis and Machine Intelligence, IEEE Transactions on

Publisher

IEEE

ISSN

0162-8828

Type

jour

DOI

10.1109/TPAMI.2012.174

Filename

6265058