Regional Information Center for Science and Technology - Sparse spatio-temporal representation of joint shape-motion cues for human action recognition in depth sequences

DocumentCode :

3000600

Title :

Sparse spatio-temporal representation of joint shape-motion cues for human action recognition in depth sequences

Author :

Tran, Quang D. ; Ly, Ngoc Q.

Author_Institution :

Fac. of Inf. Technol., Univ. of Sci., Ho Chi Minh City, Vietnam

fYear :

2013

fDate :

10-13 Nov. 2013

Firstpage :

253

Lastpage :

258

Abstract :

The availability of 3D sensors has recently made it possible to capture depth maps in real time, which simplifies a variety of visual recognition tasks, including object/action classification and 3D reconstruction. We address here the problem of human action recognition in depth sequences. First, we present a new joint shape-motion descriptor, which we call the 3D Spherical Histogram of Oriented Normal Vectors (3DS-HONV), since it is a spatio-temporal extension of the original HONV quantized in 3D spherical coordinates. We further show that the optical flow fields in depth sequences can be used in conjunction with the presented descriptor to improve the capture of in-plane movements; the experiments show that this combination is more efficient than the standalone 3DS-HONV. In addition, discriminative dictionary learning and feature representation via sparse coding are applied to the proposed descriptors to reduce the intrinsic effects of noise and capture high-level patterns. By learning these sparse and distinctive representations, we demonstrate large improvements over the state of the art on two challenging benchmarks, with overall accuracies of 91.92% on the MSRAction3D dataset and 93.31% on the MSRGesture3D dataset.
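The abstract only names the descriptor, so the following is a rough sketch of one way a spatio-temporal histogram of oriented normals could be computed, not the authors' implementation. It assumes the depth sequence is treated as a function z = f(x, y, t), that the surface normal is derived from its three partial derivatives, and that orientation is quantized over three angles. The function name `spherical_normal_histogram`, the particular angle definitions, and the bin counts are all hypothetical choices made for illustration.

```python
import numpy as np


def spherical_normal_histogram(depth, bins=(8, 4, 4)):
    """Histogram of spatio-temporal normal orientations (illustrative only).

    depth: array of shape (T, H, W) holding a depth sequence.
    bins:  assumed number of bins for (azimuth, zenith, temporal) angles.
    Returns a flattened, L1-normalized histogram.
    """
    depth = np.asarray(depth, dtype=np.float64)

    # Partial derivatives of depth along time, rows (y), and columns (x).
    gt, gy, gx = np.gradient(depth)

    # Assumed angle parameterization of the normal (-gx, -gy, -gt, 1):
    azimuth = np.arctan2(gy, gx)                  # in [-pi, pi]
    zenith = np.arctan(np.hypot(gx, gy))          # in [0, pi/2)
    temporal = np.arctan2(gt, np.sqrt(gx**2 + gy**2 + 1.0))  # in (-pi/2, pi/2)

    samples = np.stack(
        [azimuth.ravel(), zenith.ravel(), temporal.ravel()], axis=1
    )
    ranges = [(-np.pi, np.pi), (0.0, np.pi / 2), (-np.pi / 2, np.pi / 2)]

    # Joint quantization of the three angles into a 3D histogram.
    hist, _ = np.histogramdd(samples, bins=bins, range=ranges)

    total = hist.sum()
    if total > 0:
        hist = hist / total
    return hist.ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sequence = rng.random((4, 16, 16))  # synthetic stand-in for depth frames
    descriptor = spherical_normal_histogram(sequence)
    print(descriptor.shape)
```

In a full pipeline such histograms would typically be computed per spatio-temporal cell and concatenated before any dictionary learning or sparse coding step; that stage is not sketched here.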

Keywords :

feature extraction; image motion analysis; image representation; image sequences; learning (artificial intelligence); object recognition; 3D sensors; 3D spherical coordinate; 3D spherical histogram of oriented normal vectors; 3DS-HONV; MSRAction3D dataset; MSRGesture3D dataset; depth sequences; discriminative dictionary learning; distinctive representation learning; feature representation; high-level pattern capture; human action recognition; joint shape-motion cues; joint shape-motion descriptor; noise intrinsic effects; optical flow field; sparse coding; sparse representation learning; sparse spatio-temporal representation; visual recognition tasks; Dictionaries; Feature extraction; Histograms; Joints; Quantization (signal); Three-dimensional displays; Vectors;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2013 IEEE RIVF International Conference on

Conference_Location :

Hanoi

Print_ISBN :

978-1-4799-1349-7

Type :

conf

DOI :

10.1109/RIVF.2013.6719903

Filename :

6719903

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3000600