Hierarchical 3D kernel descriptors for action recognition using depth sequences

Author

Yu Kong; Behnam Satarboroujeni; Yun Fu

Author_Institution

Dept. of Electr. & Comput. Eng., Northeastern Univ., Boston, MA, USA

fYear

2015

fDate

4-8 May 2015

Firstpage

1

Lastpage

6

Abstract

Action recognition is a challenging task because of intra-class motion variation caused by the diverse styles and durations of performed actions in videos. Previous work on action recognition has focused mainly on hand-crafted features, treating different sources of information independently and simply combining them before classification. In this paper, we study action recognition from depth sequences captured by RGB-D cameras using kernel descriptors. Kernel descriptors provide an elegant way to combine a variety of information sources and can be easily applied to a hierarchical structure. We show that applying kernel descriptors to pixel-level attributes in video sequences performs well compared with state-of-the-art methods. Following the success of kernel descriptors [1] on object recognition tasks, we employ 3D kernel descriptors, which provide a unified framework for capturing pixel-level attributes and turning them into discriminative low-level features on individual 3D patches. We use the efficient match kernel (EMK) [2] as the next level of our hierarchical structure to abstract mid-level features for classification. Through extensive experiments, we demonstrate that using pixel-level attributes in the hierarchical architecture of our 3D kernel descriptors and EMK achieves superior performance on standard depth-sequence benchmarks.

Keywords

image classification; image matching; image motion analysis; image sequences; object recognition; video cameras; video signal processing; 3D patches; EMK; RGB-D cameras; action recognition task; action videos; depth sequences; discriminative low-level features; efficient match kernel; hand-crafted features; hierarchical 3D kernel descriptors; hierarchical structure; information sources; intra-class motion variation; object recognition tasks; pixel-level attributes; video sequences; Accuracy; Feature extraction; Kernel; Shape; Testing; Three-dimensional displays; Videos;

fLanguage

English

Publisher

IEEE

Conference_Titel

Automatic Face and Gesture Recognition (FG), 2015 11th IEEE International Conference and Workshops on

Conference_Location

Ljubljana

Type

conf

DOI

10.1109/FG.2015.7163084

Filename

7163084