Title :
Hierarchical 3D kernel descriptors for action recognition using depth sequences
Author :
Yu Kong; Behnam Satarboroujeni; Yun Fu
Author_Institution :
Dept. of Electr. & Comput. Eng., Northeastern Univ., Boston, MA, USA
Abstract :
Action recognition is a challenging task because of intra-class motion variation caused by the diverse styles and durations of performed actions in videos. Previous work on action recognition has focused mainly on hand-crafted features, treating different sources of information independently and simply combining them before classification. In this paper, we study action recognition from depth sequences captured by RGB-D cameras using kernel descriptors. Kernel descriptors provide an elegant way to combine a variety of information sources and can be easily applied to a hierarchical structure. We show that applying kernel descriptors to pixel-level attributes in video sequences compares favorably with state-of-the-art methods. Following the success of kernel descriptors [1] on object recognition tasks, we employ 3D kernel descriptors, a unified framework for capturing pixel-level attributes and turning them into discriminative low-level features on individual 3D patches. We use the efficient match kernel (EMK) [2] as the next level of our hierarchical structure to abstract mid-level features for classification. Through extensive experiments, we demonstrate that using pixel-level attributes in the hierarchical architecture of our 3D kernel descriptors and EMK achieves superior performance on standard depth-sequence benchmarks.
Keywords :
image classification; image matching; image motion analysis; image sequences; object recognition; video cameras; video signal processing; 3D patches; EMK; RGB-D cameras; action recognition task; action videos; depth sequences; discriminative low-level features; efficient match kernel; hand-crafted features; hierarchical 3D kernel descriptors; hierarchical structure; information sources; intra-class motion variation; object recognition tasks; pixel-level attributes; video sequences; Accuracy; Feature extraction; Kernel; Shape; Testing; Three-dimensional displays; Videos;
Conference_Title :
Automatic Face and Gesture Recognition (FG), 2015 11th IEEE International Conference and Workshops on
Conference_Location :
Ljubljana
DOI :
10.1109/FG.2015.7163084