Action snippets: How many frames does human action recognition require?

Author

Schindler, Konrad ; Van Gool, Luc

Author_Institution

BIWI, ETH Zurich, Zurich

fYear

2008

fDate

23-28 June 2008

Firstpage

1

Lastpage

8

Abstract

Visual recognition of human actions in video clips has been an active field of research in recent years. However, most published methods either analyse an entire video and assign it a single action label, or use relatively large look-ahead to classify each frame. Contrary to these strategies, human vision proves that simple actions can be recognised almost instantaneously. In this paper, we present a system for action recognition from very short sequences ("snippets") of 1-10 frames, and systematically evaluate it on standard data sets. It turns out that even local shape and optic flow for a single frame are enough to achieve approximately 90% correct recognitions, and snippets of 5-7 frames (0.3-0.5 seconds of video) are enough to achieve a performance similar to the one obtainable with the entire video sequence.

Keywords

image classification; image motion analysis; image recognition; image sequences; video signal processing; action recognition; action snippets; human action recognition; human vision; video clips; visual recognition; Feature extraction; Humans; Image motion analysis; Layout; Legged locomotion; Shape; Surveillance; Video sequences; Visual databases; Voting;

fLanguage

English

Publisher

ieee

Conference_Titel

Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on

Conference_Location

Anchorage, AK

ISSN

1063-6919

Print_ISBN

978-1-4244-2242-5

Electronic_ISBN

1063-6919

Type

conf

DOI

10.1109/CVPR.2008.4587730

Filename

4587730