DocumentCode
3000600
Title
Sparse spatio-temporal representation of joint shape-motion cues for human action recognition in depth sequences
Author
Tran, Quang D. ; Ly, Ngoc Q.
Author_Institution
Fac. of Inf. Technol., Univ. of Sci., Ho Chi Minh City, Vietnam
fYear
2013
fDate
10-13 Nov. 2013
Firstpage
253
Lastpage
258
Abstract
The availability of 3D sensors has recently made it possible to capture depth maps in real time, which simplifies a variety of visual recognition tasks, including object/action classification and 3D reconstruction. We address the problem of human action recognition in depth sequences. First, we present a new joint shape-motion descriptor, which we call the 3D Spherical Histogram of Oriented Normal Vectors (3DS-HONV), since it is a spatio-temporal extension of the original HONV quantized in a 3D spherical coordinate system. We further show that optical flow fields in depth sequences can be used in conjunction with the presented descriptor to better capture in-plane movements; the experiments show that this combination is more effective than the standalone 3DS-HONV. In addition, discriminative dictionary learning and feature representation via sparse coding are applied to the proposed descriptors to mitigate the intrinsic effects of noise and capture high-level patterns. By learning these sparse and distinctive representations, we demonstrate large improvements over the state of the art on two challenging benchmarks, with overall accuracies of 91.92% on the MSRAction3D dataset and 93.31% on the MSRGesture3D dataset.
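Illustrative note (not part of the original record): the abstract does not give the exact formulation of 3DS-HONV, so the following Python sketch only conveys the general idea of quantizing spatio-temporal depth gradients in spherical coordinates. The function name `spherical_normal_histogram`, the choice of the three angles, and the bin counts are assumptions made for illustration and should not be read as the authors' method.

```python
import numpy as np

def spherical_normal_histogram(depth, n_azimuth=8, n_elevation=4, n_zenith=4):
    """Histogram of spatio-temporal depth-gradient orientations.

    depth: array of shape (T, H, W) holding a depth sequence.
    Returns a flattened, L1-normalized histogram.
    NOTE: the angle definitions below are an assumption for illustration.
    """
    # Finite-difference gradients along time, rows (y), and columns (x).
    dt, dy, dx = np.gradient(depth.astype(np.float64))

    # In-plane orientation of the gradient, in [-pi, pi].
    azimuth = np.arctan2(dy, dx)
    # Angle of the temporal component against the image plane, in [-pi/2, pi/2].
    elevation = np.arctan2(dt, np.hypot(dx, dy))
    # Tilt of the surface normal, derived from the gradient magnitude, in [0, pi/2).
    zenith = np.arctan(np.sqrt(dx**2 + dy**2 + dt**2))

    samples = np.stack(
        [azimuth.ravel(), elevation.ravel(), zenith.ravel()], axis=1
    )
    hist, _ = np.histogramdd(
        samples,
        bins=(n_azimuth, n_elevation, n_zenith),
        range=[(-np.pi, np.pi), (-np.pi / 2, np.pi / 2), (0.0, np.pi / 2)],
    )
    # Every sample falls inside the ranges, so the sum is never zero.
    return (hist / hist.sum()).ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sequence = rng.random((4, 16, 16))  # synthetic stand-in for a depth clip
    descriptor = spherical_normal_histogram(sequence)
    print(descriptor.shape, descriptor.sum())
```

In a fuller pipeline of the kind the abstract outlines, histograms like this would be computed per spatio-temporal cell and then encoded against a learned dictionary via sparse coding; those stages are not shown here.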
Keywords
feature extraction; image motion analysis; image representation; image sequences; learning (artificial intelligence); object recognition; 3D sensors; 3D spherical coordinate; 3D spherical histogram of oriented normal vectors; 3DS-HONV; MSRAction3D dataset; MSRGesture3D dataset; depth sequences; discriminative dictionary learning; distinctive representation learning; feature representation; high-level pattern capture; human action recognition; joint shape-motion cues; joint shape-motion descriptor; noise intrinsic effects; optical flow field; sparse coding; sparse representation learning; sparse spatio-temporal representation; visual recognition tasks; Dictionaries; Feature extraction; Histograms; Joints; Quantization (signal); Three-dimensional displays; Vectors;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2013 IEEE RIVF International Conference on
Conference_Location
Hanoi
Print_ISBN
978-1-4799-1349-7
Type
conf
DOI
10.1109/RIVF.2013.6719903
Filename
6719903