DocumentCode :
3005721
Title :
Hierarchical spatio-temporal context modeling for action recognition
Author :
Ju Sun ; Xiao Wu ; Shuicheng Yan ; Loong-Fah Cheong ; Tat-Seng Chua ; Jintao Li
Author_Institution :
Interactive & Digital Media Inst., Nat. Univ. of Singapore, Singapore, Singapore
fYear :
2009
fDate :
20-25 June 2009
Firstpage :
2004
Lastpage :
2011
Abstract :
The problem of recognizing actions in realistic videos is challenging yet compelling owing to its great potential in many practical applications. Most previous research is limited either by its use of simplified action databases recorded in controlled environments or by its focus on excessively localized features that do not sufficiently encapsulate the spatio-temporal context. In this paper, we propose to model spatio-temporal context information hierarchically, exploiting three levels of context in ascending order of abstraction: 1) point-level context (SIFT average descriptor), 2) intra-trajectory context (trajectory transition descriptor), and 3) inter-trajectory context (trajectory proximity descriptor). To obtain efficient and compact representations for the latter two levels, we encode the spatio-temporal context information into the transition matrix of a Markov process and then extract its stationary distribution as the final context descriptor. Using multichannel nonlinear SVMs, we validate the proposed hierarchical framework on the realistic action (HOHA) and event (LSCOM) recognition databases, achieving relative performance improvements of 27% and 66%, respectively, over the state-of-the-art results. We further propose to employ the Multiple Kernel Learning (MKL) technique to prune kernels and thereby speed up algorithm evaluation.
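The compact context descriptor described in the abstract, the stationary distribution of a Markov transition matrix, can be illustrated with a minimal sketch. It assumes a hypothetical state space: each frame-to-frame displacement of a single trajectory is quantized into one of n_bins direction sectors, and transitions between consecutive states are counted. The quantizer, the smoothing constant, and all function names are illustrative assumptions, not the paper's actual definitions of the trajectory transition or proximity descriptors.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def quantize_direction(dx: float, dy: float, n_bins: int) -> int:
    """Hypothetical quantizer: map a displacement to one of n_bins equal angular sectors."""
    angle = math.atan2(dy, dx) % (2 * math.pi)  # angle in [0, 2*pi)
    # The final modulo guards against rounding that would yield exactly n_bins.
    return int(angle / (2 * math.pi) * n_bins) % n_bins


def transition_matrix(trajectory: Sequence[Point], n_bins: int = 8,
                      smoothing: float = 1e-3) -> List[List[float]]:
    """Count transitions between consecutive quantized motion states, then row-normalize."""
    states = [quantize_direction(x1 - x0, y1 - y0, n_bins)
              for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:])]
    # Assumed additive smoothing keeps every entry positive, so the chain is
    # irreducible and aperiodic and its stationary distribution is unique.
    counts = [[smoothing] * n_bins for _ in range(n_bins)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def stationary_distribution(P: List[List[float]], tol: float = 1e-12,
                            max_iter: int = 10_000) -> List[float]:
    """Power iteration pi <- pi P, starting from the uniform distribution."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


if __name__ == "__main__":
    # A toy trajectory that mostly moves right, with occasional upward steps.
    traj = [(0, 0), (1, 0), (2, 0), (2, 1), (3, 1), (4, 1), (4, 2)]
    descriptor = stationary_distribution(transition_matrix(traj))
    print([round(p, 3) for p in descriptor])
```

Whatever the trajectory length, the result is an n_bins-dimensional probability vector. This fixed size is what makes a stationary distribution attractive as a compact descriptor.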
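For the classification stage, the abstract names multichannel nonlinear SVMs and MKL-based kernel pruning but does not give the kernel form. The sketch below assumes an exponentiated chi-square kernel per descriptor channel, a common choice for histogram features. Each channel is normalized by a constant A_c (passed as mean_dist, often the mean training distance), and the channel kernels are combined MKL-style as a weighted sum. The channel names and the weights beta are hypothetical placeholders, not the output of an MKL solver. The sketch only shows that a channel whose weight is zero never has to be computed at test time.

```python
import math
from typing import Dict, Sequence

Histogram = Sequence[float]


def chi2_distance(h1: Histogram, h2: Histogram, eps: float = 1e-12) -> float:
    """Chi-square distance 0.5 * sum_k (a_k - b_k)^2 / (a_k + b_k) between two histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def channel_kernel(h1: Histogram, h2: Histogram, mean_dist: float) -> float:
    """Assumed per-channel kernel exp(-D_c / A_c), with A_c given as mean_dist."""
    return math.exp(-chi2_distance(h1, h2) / mean_dist)


def combined_kernel(x: Dict[str, Histogram], y: Dict[str, Histogram],
                    mean_dist: Dict[str, float], beta: Dict[str, float]) -> float:
    """MKL-style combination K(x, y) = sum_c beta_c * K_c(x, y).

    Channels with beta_c == 0 are skipped, so their descriptors are never compared.
    """
    return sum(b * channel_kernel(x[c], y[c], mean_dist[c])
               for c, b in beta.items() if b != 0.0)


if __name__ == "__main__":
    # Hypothetical channel names and placeholder weights.
    beta = {"point": 0.6, "intra_traj": 0.4, "inter_traj": 0.0}
    mean_dist = {"point": 0.5, "intra_traj": 0.5, "inter_traj": 0.5}
    x = {"point": [0.5, 0.5], "intra_traj": [0.2, 0.8], "inter_traj": [1.0, 0.0]}
    y = {"point": [0.4, 0.6], "intra_traj": [0.3, 0.7], "inter_traj": [0.0, 1.0]}
    print(combined_kernel(x, y, mean_dist, beta))
```

Because the "inter_traj" weight is zero in this toy setting, changing that channel's histogram leaves the kernel value unchanged. This is the sense in which pruned kernels reduce evaluation cost.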
Keywords :
Markov processes; image motion analysis; image recognition; video signal processing; visual databases; Markov process; SIFT average descriptor; action databases; action recognition; event recognition databases; hierarchical spatio-temporal context modeling; inter-trajectory context; intra-trajectory context; multichannel nonlinear SVM; multiple kernel learning; point-level context; realistic videos; spatio-temporal context information; stationary distribution; trajectory proximity descriptor; trajectory transition descriptor; transition matrix; Application software; Computer vision; Context modeling; Kernel; Markov processes; Object detection; Spatial databases; Spatiotemporal phenomena; Videos; Visual databases;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on
Conference_Location :
Miami, FL
ISSN :
1063-6919
Print_ISBN :
978-1-4244-3992-8
Type :
conf
DOI :
10.1109/CVPR.2009.5206721
Filename :
5206721