Title :
Sparse representations of clustered video shots for action recognition
Author :
Yi Ding ; Hongtao Lu
Author_Institution :
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
Abstract :
In most action recognition tasks, the target videos vary in temporal length and an action may be repeated several times within a single video. Encoding into one video representation the distribution of feature points that lie far apart in time, or that belong to different occurrences of the action, is therefore inappropriate; it is preferable to detect the sub-shots containing the individual occurrences of the action and encode them separately. In this paper, we propose a novel video representation framework in which video shots are detected efficiently by clustering feature points according to their temporal locations in the video. Shot representations are then generated by a variant of spatio-temporal pyramid matching that applies sparse coding and max-pooling to the video features. The final video representation is obtained by max-pooling the shot representations once more and is classified with a linear SVM. The framework is evaluated on the KTH and UCF Sports human action datasets and achieves promising performance compared with state-of-the-art methods.
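As a rough illustration only, and not the authors' implementation, the Python sketch below mimics the pipeline described in the abstract: feature points are grouped into shots by k-means on their temporal locations, each shot's descriptors are sparse-coded against a dictionary and max-pooled, the shot representations are max-pooled again into a video-level vector, and a linear SVM is trained on these vectors. The spatio-temporal pyramid structure is omitted for brevity, and all names, parameter values, and the random data are hypothetical.

# Minimal sketch of the clustered-shot sparse-coding pipeline (illustrative only).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import SparseCoder
from sklearn.svm import LinearSVC

def encode_video(t, X, dictionary, n_shots=3, alpha=0.15):
    """t: temporal locations of feature points, X: their local descriptors.
    Cluster the points into shots by time, sparse-code and max-pool each shot,
    then max-pool across shots to obtain the video representation."""
    # 1. Detect shots by clustering the 1-D temporal locations.
    shot_ids = KMeans(n_clusters=n_shots, n_init=10).fit_predict(t.reshape(-1, 1))
    coder = SparseCoder(dictionary=dictionary,
                        transform_algorithm="lasso_lars",
                        transform_alpha=alpha)
    shot_reprs = []
    for s in range(n_shots):
        codes = coder.transform(X[shot_ids == s])      # sparse codes of this shot's features
        shot_reprs.append(np.abs(codes).max(axis=0))   # 2. max-pool within the shot
    # 3. Max-pool the shot representations into one video-level vector.
    return np.vstack(shot_reprs).max(axis=0)

# Toy usage with random data; a real system would extract spatio-temporal
# interest-point descriptors and learn the dictionary from training videos.
rng = np.random.default_rng(0)
D = rng.standard_normal((256, 64))                     # 256 atoms, 64-D descriptors
D /= np.linalg.norm(D, axis=1, keepdims=True)          # SparseCoder expects unit-norm atoms
videos = [(rng.uniform(0, 100, 200), rng.standard_normal((200, 64))) for _ in range(10)]
labels = np.array([0] * 5 + [1] * 5)
feats = np.array([encode_video(t, X, D) for t, X in videos])
clf = LinearSVC().fit(feats, labels)                   # 4. classify with a linear SVM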
Keywords :
image matching; pattern clustering; support vector machines; video coding; action recognition; clustered video shots; linear SVM; max-pooling; sparse coding; sparse representations; spatial-temporal pyramid matching method; temporal length; video representation; Computer vision; Conferences; Detectors; Encoding; Feature extraction; Pattern recognition; Vectors; clustering; max-pooling; sparse coding; spatio-temporal pyramid matching;
Conference_Title :
Computer Science and Network Technology (ICCSNT), 2013 3rd International Conference on
Conference_Location :
Dalian
DOI :
10.1109/ICCSNT.2013.6967162