DocumentCode
31575
Title
Explicit Modeling of Human-Object Interactions in Realistic Videos
Author
Prest, A. ; Ferrari, V. ; Schmid, Cordelia
Author_Institution
Comput. Vision Lab., ETH Zurich, Zurich, Switzerland
Volume
35
Issue
4
fYear
2013
fDate
Apr-13
Firstpage
835
Lastpage
848
Abstract
We introduce an approach for learning human actions as interactions between persons and objects in realistic videos. Previous work typically represents actions with low-level features such as image gradients or optical flow. In contrast, we explicitly localize in space and track over time both the object and the person, and represent an action as the trajectory of the object w.r.t. the person position. Our approach relies on state-of-the-art techniques for human detection [32], object detection [10], and tracking [39]. We show that this results in human and object tracks of sufficient quality to model and localize human-object interactions in realistic videos. Our human-object interaction features capture the relative trajectory of the object w.r.t. the human. Experimental results on the Coffee and Cigarettes dataset [25], the video dataset of [19], and the Rochester Daily Activities dataset [29] show that 1) our explicit human-object model is an informative cue for action recognition; 2) it is complementary to traditional low-level descriptors such as 3D-HOG [23] extracted over human tracks. We show that combining our human-object interaction features with 3D-HOG improves performance over either feature type alone as well as over the state of the art [23], [29].
Keywords
gesture recognition; human computer interaction; image sequences; learning (artificial intelligence); object detection; object tracking; realistic images; solid modelling; video signal processing; 3D-HOG; Rochester daily activities dataset; action recognition; coffee and cigarettes dataset; explicit human-object model; explicit modeling; human actions learning; human detection; human tracks; human-object interaction features; human-object interactions; image gradients; informative cue; low-level descriptors; low-level features; object detection; object tracking; object trajectory; object w.r.t; optical flow; person position; realistic videos; relative trajectory; state-of-the-art techniques; video dataset; Detectors; Feature extraction; Humans; Target tracking; Training; Videos; Action recognition; human-object interaction; video analysis; Algorithms; Artificial Intelligence; Databases, Factual; Human Activities; Humans; Image Processing, Computer-Assisted; Pattern Recognition, Automated; Video Recording;
fLanguage
English
Journal_Title
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher
IEEE
ISSN
0162-8828
Type
jour
DOI
10.1109/TPAMI.2012.175
Filename
6265059