Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of South Florida, Tampa, FL, USA
Abstract :
One of the dominant approaches to gesture recognition, especially when only one or a few samples per class are available, is to compute the time-warped distance between two sequences and perform nearest-neighbor classification. In this work, we show that we get much better results if instead we compare the patterns of frame-wise distances that these two sequences produce against a third (anchor) sequence from the model base. We refer to these distance-pattern vectors as warp vectors. If the warp vectors are similar, then so are the sequences; if not, the sequences are dissimilar. At the algorithmic core are two dynamic time warping processes: one to compute the warp vectors with respect to the anchor sequence, and the other to compare these warp vectors. We select the anchor sequence to be the one that minimizes the overall distance, i.e., the sequence with respect to which the two sequences are most similar. We present results on a large dataset of 1500 RGBD sequences spanning 150 gesture classes, such as traffic signals, sign language, and everyday actions, extracted from the ChaLearn Gesture Challenge dataset. We experimented with three different feature types: difference of frames, HOG, and relational distributions. We found improvements of 5%, 15%, and 7%, respectively, at a 20% false alarm rate, over the traditional two-sequence time-warped distance.
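The two-stage procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature distance (Euclidean here), the choice to read the warp vector off the DTW alignment path, and the absolute-difference cost used to compare warp vectors are all assumptions; only the overall structure (DTW to the anchor, then DTW between warp vectors, minimized over anchors) follows the abstract.

```python
# Sketch of anchor-based warp-vector comparison (assumed details noted above).
import numpy as np

def dtw(a, b, dist):
    """Classic dynamic time warping; returns (total cost, alignment path)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = dist(a[i - 1], b[j - 1]) + min(D[i - 1, j - 1],
                                                     D[i - 1, j],
                                                     D[i, j - 1])
    # Backtrack from (n, m) to (0, 0); inf padding keeps us inside the grid.
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda t: D[t])
    path.reverse()
    return D[n, m], path

def warp_vector(seq, anchor, dist):
    """Pattern of frame-wise distances of seq to the anchor, sampled along
    the DTW alignment path (the 'warp vector' of the abstract)."""
    _, path = dtw(seq, anchor, dist)
    return [dist(seq[i], anchor[j]) for i, j in path]

def anchored_distance(a, b, modelbase, dist):
    """Compare a and b through the anchor that makes them most similar:
    the minimum DTW distance between their warp vectors over all anchors."""
    best = np.inf
    for anchor in modelbase:
        wa = warp_vector(a, anchor, dist)
        wb = warp_vector(b, anchor, dist)
        d, _ = dtw(wa, wb, lambda x, y: abs(x - y))
        best = min(best, d)
    return best
```

With per-frame feature vectors and a Euclidean frame distance, `anchored_distance` returns 0 for a sequence and a time-warped copy of itself, and grows as the two distance patterns against the anchor diverge; nearest-neighbor classification would then use this score in place of the plain two-sequence DTW distance.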
Keywords :
gesture recognition; image classification; vectors; ChaLearn Gesture Challenge dataset; HOG; RGBD sequences; anchor sequence; distance pattern vectors; dynamic time warping; frame difference; frame-wise distances; nearest-neighbor classification; relational distributions; similarity measure; time-warped distance; triplets; warp vectors; Assistive technology; Computational modeling; Equations; Gesture recognition; Hidden Markov models; Mathematical model; Vectors