What do 15,000 object categories tell us about classifying and localizing actions?

Author

Mihir Jain;Jan C. van Gemert;Cees G. M. Snoek

Author_Institution

University of Amsterdam, The Netherlands

fYear

2015

fDate

6/1/2015 12:00:00 AM

Firstpage

46

Lastpage

55

Abstract

This paper contributes to the automatic classification and localization of human actions in video. Whereas motion is the key ingredient in modern approaches, we assess the benefits of having objects in the video representation. Rather than considering a handful of carefully selected and localized objects, we conduct an empirical study on the benefit of encoding 15,000 object categories for action using 6 datasets totaling more than 200 hours of video and covering 180 action classes. Our key contributions are: i) the first in-depth study of encoding objects for actions; ii) we show that objects matter for actions, and are often semantically relevant as well; iii) we establish that actions have object preferences: rather than using all objects, selection is advantageous for action recognition; iv) we reveal that object-action relations are generic, which allows transferring these relationships from one domain to another; and v) objects, when combined with motion, improve the state of the art for both action classification and localization.

Keywords

"Encoding","Games","Accuracy","Cameras","Training","Visualization","Neural networks"

Publisher

IEEE

Conference_Titel

Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on

Electronic_ISBN

1063-6919

Type

conf

DOI

10.1109/CVPR.2015.7298599

Filename

7298599