One-shot learning gesture recognition based on improved 3D SMoSIFT feature descriptor from RGB-D videos

Author

Jia Lin; Xiaogang Ruan; Naigong Yu; Ruoyan Wei

Author_Institution

College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China

fYear

2015

fDate

23-25 May 2015

Firstpage

4911

Lastpage

4916

Abstract

To meet the need for distinctive features in one-shot learning gesture recognition for mobile robot control, an improved three-dimensional local sparse motion scale-invariant feature transform (3D SMoSIFT) feature descriptor is proposed that fuses the RGB and depth streams of RGB-D videos. First, a gray pyramid, a depth pyramid, and optical flow pyramids are built as the scale space for each gray frame (converted from the RGB frame) and each depth frame. Then, interest regions are extracted according to the variance of the optical flow, computed separately in the horizontal and vertical directions. Next, corners are extracted only within each interest region as interest points, and the gray and depth optical flow are used jointly to detect robust keypoints around the motion pattern in the scale space. Finally, SIFT descriptors are computed in 3D gradient space and 3D motion space. The improved descriptor has been evaluated with a bag-of-features model on the one-shot learning Chalearn Gesture Dataset. Experiments show that the proposed method clearly improves gesture recognition accuracy. The results also show that the improved 3D SMoSIFT descriptor outperforms other spatiotemporal feature descriptors and is comparable to state-of-the-art approaches.
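The interest-region step described in the abstract (selecting regions by the variance of optical flow in the horizontal and vertical directions) can be illustrated with a minimal sketch. The abstract does not state the exact criterion, so the fixed grid partition, the cell size, the threshold value, and the rule of accepting a cell when either directional variance exceeds the threshold are all assumptions, as is the hypothetical helper name `interest_cells`. The flow field is assumed to be already computed and passed in as two nested lists of horizontal (u) and vertical (v) components.

```python
from statistics import pvariance


def interest_cells(u, v, cell=4, threshold=0.5):
    """Return the (row, col) top-left corners of grid cells whose optical-flow
    variance exceeds `threshold` in the horizontal or vertical direction.

    Hypothetical sketch: `cell`, `threshold`, and the "either direction" rule
    are illustrative assumptions, not parameters taken from the paper.
    u, v: 2D lists (rows x cols) of horizontal / vertical flow components.
    """
    rows, cols = len(u), len(u[0])
    selected = []
    for r in range(0, rows - cell + 1, cell):
        for c in range(0, cols - cell + 1, cell):
            # Gather the flow components that fall inside this cell.
            us = [u[i][j] for i in range(r, r + cell) for j in range(c, c + cell)]
            vs = [v[i][j] for i in range(r, r + cell) for j in range(c, c + cell)]
            # Variance is computed separately for each flow direction.
            if pvariance(us) > threshold or pvariance(vs) > threshold:
                selected.append((r, c))
    return selected


# Usage: an 8x8 field that is still everywhere except the top-left 4x4 cell,
# where the horizontal flow alternates between 0.0 and 2.0 (variance 1.0).
u = [[0.0] * 8 for _ in range(8)]
for i in range(4):
    for j in range(4):
        u[i][j] = 2.0 if (i + j) % 2 else 0.0
v = [[0.0] * 8 for _ in range(8)]
print(interest_cells(u, v))  # [(0, 0)]
```

Because the sketch thresholds variance rather than flow magnitude, a field that moves uniformly (for example, constant horizontal flow of 3.0 everywhere) yields no interest cells, while spatially varying motion does; this is one plausible reason for a variance-based criterion, not a statement of the authors' rationale.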

Keywords

feature extraction; gesture recognition; image colour analysis; image fusion; image motion analysis; image sequences; learning (artificial intelligence); mobile robots; robot vision; video signal processing; 3D SMoSIFT feature descriptor; 3D gradient space; 3D motion space; RGB-D video fusion; RGB-D videos; SIFT descriptors; corner extraction; depth pyramid; feature extraction; gray pyramid; interest region extraction; mobile robot control; motion pattern; one-shot learning Chalearn gesture dataset; one-shot learning gesture recognition; optical flow pyramid; robust keypoint detection; three-dimensional local sparse motion scale invariant feature transform; Conferences; Gesture Recognition; One-shot Learning; RGB-D Data; Three dimensional Sparse Motion Scale-invariant Feature Transform (3D SMoSIFT); Variance of Optical Flow;

fLanguage

English

Publisher

IEEE

Conference_Titel

2015 27th Chinese Control and Decision Conference (CCDC)

Conference_Location

Qingdao

Print_ISBN

978-1-4799-7016-2

Type

conf

DOI

10.1109/CCDC.2015.7162803

Filename

7162803