Regional Information Center for Science and Technology - Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis

DocumentCode :

2916178

Title :

Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis

Author :

Le, Quoc V. ; Zou, Will Y. ; Yeung, Serena Y. ; Ng, Andrew Y.

Author_Institution :

Comput. Sci. Dept., Stanford Univ., Stanford, CA, USA

Year :

2011

Date :

20-25 June 2011

Firstpage :

3361

Lastpage :

3368

Abstract :

Previous work on action recognition has focused on adapting hand-designed local features, such as SIFT or HOG, from static images to the video domain. In this paper, we propose using unsupervised feature learning as a way to learn features directly from video data. More specifically, we present an extension of the Independent Subspace Analysis algorithm to learn invariant spatio-temporal features from unlabeled video data. We discovered that, despite its simplicity, this method performs surprisingly well when combined with deep learning techniques such as stacking and convolution to learn hierarchical representations. By replacing hand-designed features with our learned features, we achieve classification results superior to all previous published results on the Hollywood2, UCF, KTH and YouTube action recognition datasets. On the challenging Hollywood2 and YouTube action datasets we obtain 53.3% and 75.8% respectively, which are approximately 5% better than the current best published results. Further benefits of this method, such as the ease of training and the efficiency of training and prediction, will also be discussed. You can download our code and learned spatio-temporal features here: http://ai.stanford.edu/~wzou/.
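The abstract names Independent Subspace Analysis (ISA) as the core building block. As a rough illustration only, the following is a minimal sketch of the ISA response computation in its standard two-layer form: first-layer filter outputs are squared, summed within fixed groups (subspaces), and square-rooted. The function names, array shapes, and toy weights are assumptions made for this example; this is not the authors' released code, and it omits the learning of the filters, the stacking, and the convolution described in the paper.

```python
import numpy as np


def subspace_pooling_matrix(n_filters, subspace_size):
    """Build a fixed 0/1 matrix that groups adjacent filters into subspaces.

    Hypothetical helper: row i selects filters
    [i * subspace_size, (i + 1) * subspace_size).
    """
    n_subspaces = n_filters // subspace_size
    V = np.zeros((n_subspaces, n_filters))
    for i in range(n_subspaces):
        V[i, i * subspace_size:(i + 1) * subspace_size] = 1.0
    return V


def isa_features(x, W, V):
    """Compute pooled ISA responses for one flattened input patch.

    x: (n,) input vector (for video, a flattened spatio-temporal patch)
    W: (k, n) first-layer filter weights (learned in practice; given here)
    V: (m, k) fixed non-negative pooling matrix
    """
    squared = (W @ x) ** 2       # squared first-layer responses
    return np.sqrt(V @ squared)  # one pooled response per subspace


# Toy example: 4 identity filters pooled into 2 subspaces of size 2.
W = np.eye(4)
V = subspace_pooling_matrix(n_filters=4, subspace_size=2)
x = np.array([3.0, 4.0, 0.0, 0.0])
features = isa_features(x, W, V)
```

Because the responses are squared before pooling, `isa_features(-x, W, V)` equals `isa_features(x, W, V)`; this sign invariance within each subspace is one simple instance of the invariance the abstract refers to.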

Keywords :

approximation theory; feature extraction; image classification; image representation; spatiotemporal phenomena; unsupervised learning; video signal processing; HOG; KTH; SIFT; UCF; YouTube action recognition dataset; action recognition; hand-designed local feature; hierarchical invariant spatio-temporal feature learning technique; hierarchical representation; independent subspace analysis algorithm; static image; unsupervised feature learning; video data; video domain; Convolution; Detectors; Feature extraction; Image edge detection; Neurons; Training; Videos;

Language :

English

Publisher :

IEEE

Conference_Title :

Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on

Conference_Location :

Providence, RI

ISSN :

1063-6919

Print_ISBN :

978-1-4577-0394-2

Type :

conf

DOI :

10.1109/CVPR.2011.5995496

Filename :

5995496

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2916178