Title :
Sparse spatio-temporal representation of joint shape-motion cues for human action recognition in depth sequences
Author :
Tran, Quang D. ; Ly, Ngoc Q.
Author_Institution :
Fac. of Inf. Technol., Univ. of Sci., Ho Chi Minh City, Vietnam
Abstract :
The availability of 3D sensors has recently made it possible to capture depth maps in real time, which simplifies a variety of visual recognition tasks, including object/action classification and 3D reconstruction. We address here the problem of human action recognition in depth sequences. First, we present a new joint shape-motion descriptor, which we call the 3D Spherical Histogram of Oriented Normal Vectors (3DS-HONV), since it is a spatio-temporal extension of the original HONV quantized in a 3D spherical coordinate system. We further show that optical flow fields in depth sequences can be used in conjunction with the presented descriptor to improve its ability to capture in-plane movements; the experiments show that this combination is more efficient than the standalone 3DS-HONV. In addition, discriminative dictionary learning and feature representation via sparse coding are applied to the proposed descriptors to mitigate the intrinsic effects of noise and capture high-level patterns. By learning these sparse and distinctive representations, we demonstrate large improvements over the state of the art on two challenging benchmarks, with overall accuracies of 91.92% on the MSRAction3D dataset and 93.31% on the MSRGesture3D dataset.
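As a rough illustration of the idea of quantizing spatio-temporal orientations in spherical coordinates, the sketch below builds a single global histogram from a depth sequence. It is not the authors' 3DS-HONV formulation: the function name `spherical_orientation_histogram`, the bin counts, the use of the raw gradient (d/dx, d/dy, d/dt) as the orientation vector, and the absence of any cell or block layout are all assumptions made for illustration only.

```python
import numpy as np

def spherical_orientation_histogram(depth, n_azimuth=8, n_zenith=4):
    """Histogram of spatio-temporal depth-gradient orientations.

    depth: array of shape (T, H, W), one depth map per frame.
    Returns a normalized vector of length n_azimuth * n_zenith.
    Hypothetical helper; bin counts are arbitrary illustrative choices.
    """
    depth = np.asarray(depth, dtype=np.float64)
    # Finite-difference derivatives along time, rows (y), and columns (x).
    gt, gy, gx = np.gradient(depth)
    # Spherical angles of the vector (gx, gy, gt):
    # azimuth in [-pi, pi] within the image plane,
    # zenith in [0, pi] measured from the temporal axis.
    azimuth = np.arctan2(gy, gx)
    zenith = np.arctan2(np.hypot(gx, gy), gt)
    # Joint quantization of the two angles into a 2D histogram.
    hist, _, _ = np.histogram2d(
        azimuth.ravel(),
        zenith.ravel(),
        bins=(n_azimuth, n_zenith),
        range=((-np.pi, np.pi), (0.0, np.pi)),
    )
    hist = hist.ravel()
    total = hist.sum()
    return hist / total if total > 0 else hist
```

For a sequence `seq` of shape (T, H, W), `spherical_orientation_histogram(seq)` returns a 32-dimensional vector with the default bin counts. In a pipeline like the one the abstract describes, such vectors would presumably be computed per local region and then encoded against a learned dictionary, but those stages are outside this sketch.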
Keywords :
feature extraction; image motion analysis; image representation; image sequences; learning (artificial intelligence); object recognition; 3D sensors; 3D spherical coordinate; 3D spherical histogram of oriented normal vectors; 3DS-HONV; MSRAction3D dataset; MSRGesture3D dataset; depth sequences; discriminative dictionary learning; distinctive representation learning; feature representation; high-level pattern capture; human action recognition; joint shape-motion cues; joint shape-motion descriptor; noise intrinsic effects; optical flow field; sparse coding; sparse representation learning; sparse spatio-temporal representation; visual recognition tasks; Dictionaries; Feature extraction; Histograms; Joints; Quantization (signal); Three-dimensional displays; Vectors;
Conference_Title :
Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2013 IEEE RIVF International Conference on
Conference_Location :
Hanoi
Print_ISBN :
978-1-4799-1349-7
DOI :
10.1109/RIVF.2013.6719903