Title :
Recognizing Human Actions by Learning and Matching Shape-Motion Prototype Trees
Author :
Jiang, Zhuolin ; Lin, Zhe ; Davis, Larry S.
Author_Institution :
Inst. for Adv. Comput. Studies, Univ. of Maryland, College Park, MD, USA
fDate :
3/1/2012
Abstract :
A shape-motion prototype-based approach is introduced for action recognition. The approach represents an action as a sequence of prototypes for efficient and flexible action matching in long video sequences. During training, an action prototype tree is learned in a joint shape and motion space via hierarchical K-means clustering and each training sequence is represented as a labeled prototype sequence; then a look-up table of prototype-to-prototype distances is generated. During testing, based on a joint probability model of the actor location and action prototype, the actor is tracked while a frame-to-prototype correspondence is established by maximizing the joint probability, which is efficiently performed by searching the learned prototype tree; then actions are recognized using dynamic prototype sequence matching. Distance measures used for sequence matching are rapidly obtained by look-up table indexing, which is an order of magnitude faster than brute-force computation of frame-to-frame distances. Our approach enables robust action matching in challenging situations (such as moving cameras and dynamic backgrounds) and allows automatic alignment of action sequences. Experimental results demonstrate that our approach achieves recognition rates of 92.86 percent on a large gesture data set (with dynamic backgrounds), 100 percent on the Weizmann action data set, 95.77 percent on the KTH action data set, 88 percent on the UCF sports data set, and 87.27 percent on the CMU action data set.
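The look-up-table idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes prototypes are plain feature vectors compared by Euclidean distance, uses textbook dynamic time warping as a stand-in for the paper's dynamic prototype sequence matching, and classifies by nearest neighbor. The function names (build_distance_table, dtw_distance, classify) are hypothetical, and the prototype tree and actor tracking are omitted.

```python
import math


def build_distance_table(prototypes):
    """Precompute distances between every pair of prototypes.

    Assumption: prototypes are equal-length feature vectors and the
    distance is Euclidean; the paper's actual measure may differ.
    """
    k = len(prototypes)
    table = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d = math.dist(prototypes[i], prototypes[j])
            table[i][j] = table[j][i] = d
    return table


def dtw_distance(seq_a, seq_b, table):
    """Dynamic time warping cost between two prototype-index sequences.

    Each local cost is a table lookup rather than a fresh
    frame-to-frame distance computation.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = table[seq_a[i - 1]][seq_b[j - 1]]
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def classify(test_seq, labeled_train_seqs, table):
    """Return the label of the training sequence with the lowest DTW cost."""
    label, _ = min(
        labeled_train_seqs,
        key=lambda item: dtw_distance(test_seq, item[1], table),
    )
    return label
```

With toy prototypes [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)], the sequences [0, 1, 2] and [0, 1, 1, 2] align at zero cost because warping absorbs the repeated prototype. This shows how sequences of different lengths can be aligned automatically once each frame has been mapped to a prototype index.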
Keywords :
image matching; image recognition; image sequences; learning (artificial intelligence); pattern clustering; table lookup; video signal processing; CMU action data set; KTH action data set; UCF sports data set; Weizmann action data set; action prototype; actor location; brute-force computation; distance measures; dynamic backgrounds; dynamic prototype sequence matching; flexible action matching; frame-to-frame distances; frame-to-prototype correspondence; hierarchical k-means clustering; human action recognition; joint probability model; joint shape; large gesture data set; learning; look-up table indexing; motion space; moving cameras; prototype-to-prototype distances; shape-motion prototype-based approach; training sequence; video sequences; Feature extraction; Hidden Markov models; Humans; Joints; Prototypes; Shape; Training; Action recognition; dynamic time warping; hierarchical K-means clustering; joint probability; shape-motion prototype tree; Algorithms; Humans; Image Interpretation, Computer-Assisted; Imaging, Three-Dimensional; Pattern Recognition, Automated;
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
DOI :
10.1109/TPAMI.2011.147