Constructing Visual Vocabularies Using Sparse Coding for Action Recognition

Author

Liu, Changhong; Yang, Yang; Chen, Yong

Author_Institution

Sch. of Inf. Eng., Univ. of Sci. & Technol. Beijing, Beijing, China

fYear

2009

fDate

19-20 Dec. 2009

Firstpage

1

Lastpage

4

Abstract

Much recent action recognition research is based on a bag-of-words (BOW) representation obtained by quantizing 3D interest points extracted from videos. The k-means algorithm is commonly used to construct the visual vocabulary, but it has two major drawbacks. First, the resulting vocabulary is sensitive to the vocabulary size and to the initialization. Second, k-means cannot capture the salient properties of the videos, so the vocabulary may contain a large amount of redundant information. In this paper, we propose a novel action recognition approach that constructs a visual vocabulary and represents each video by sparse coding followed by max pooling. Unlike k-means, sparse coding can capture the salient properties of videos owing to its strong discriminative ability. Experiments on the KTH action dataset show that our approach performs better than the k-means vocabulary and outperforms most recently proposed methods.
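The abstract contrasts two ways of turning a video's interest-point descriptors into one feature vector: a k-means BOW histogram (hard assignment to the nearest codeword) versus sparse coding followed by max pooling. The sketch below illustrates that contrast under stated assumptions; it is not the authors' implementation. The abstract does not specify the dictionary-learning procedure, the sparse solver, the sparsity weight, or the exact pooling variant, so here the codebook/dictionary is assumed to be given, the l1-regularized codes are computed with ISTA (iterative soft-thresholding), and pooling takes the maximum absolute coefficient per atom. All function names (`bow_histogram`, `sparse_code`, `max_pool`) and the parameter `lam` are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(descriptors, codebook):
    """k-means-style BOW baseline: hard-assign each descriptor to its nearest
    codeword (the codebook is assumed to come from k-means) and return the
    normalized count histogram."""
    hist = [0.0] * len(codebook)
    for x in descriptors:
        k = min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))
        hist[k] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def soft_threshold(v, t):
    """Proximal operator of t*|v| (shrinks v toward zero by t)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def sparse_code(x, D, lam=0.1, n_iter=200):
    """ISTA for min_a 0.5*||x - sum_k a_k D[k]||^2 + lam*||a||_1.
    D is a list of K atoms, each a list of length d (assumed, not learned here)."""
    K, d = len(D), len(x)
    # Step size 1/L: the squared Frobenius norm of D upper-bounds the largest
    # eigenvalue of D^T D, which guarantees convergence.
    L = sum(v * v for atom in D for v in atom) or 1.0
    a = [0.0] * K
    for _ in range(n_iter):
        # Residual r = D a - x, with the atoms as columns of D.
        r = [sum(a[k] * D[k][i] for k in range(K)) - x[i] for i in range(d)]
        # Gradient of the quadratic term: D^T r.
        grad = [sum(D[k][i] * r[i] for i in range(d)) for k in range(K)]
        a = [soft_threshold(a[k] - grad[k] / L, lam / L) for k in range(K)]
    return a


def max_pool(codes):
    """Video-level feature: per-atom maximum absolute coefficient over all
    descriptors of the video."""
    return [max(abs(c[k]) for c in codes) for k in range(len(codes[0]))]


if __name__ == "__main__":
    # Toy 2D example for the BOW baseline.
    print(bow_histogram([[1, 1], [0, -1], [9, 9]], [[0, 0], [10, 10]]))
    # Toy 3D example with an identity dictionary for sparse coding + max pooling.
    D = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    video = [[2.0, 0.05, -1.0], [0.0, 0.5, 0.0]]
    pooled = max_pool([sparse_code(x, D, lam=0.1) for x in video])
    print([round(v, 3) for v in pooled])  # [1.9, 0.4, 0.9]
```

With an identity dictionary the sparse code is simply the soft-thresholded descriptor, which makes the toy output easy to check by hand: small components (such as 0.05) are zeroed out, and max pooling keeps the strongest response of each atom across the video, whereas the BOW histogram only records which single codeword each descriptor fell nearest to.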

Keywords

feature extraction; image motion analysis; object recognition; video coding; 3D interest points extraction; KTH action dataset; bag of words representation; human action recognition; k-means algorithm; sparse coding; visual vocabularies construction; Computer vision; Data mining; Detectors; Engineering management; Noise reduction; Prototypes; Signal processing algorithms; Technology management; Videos; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Title

Information Engineering and Computer Science, 2009. ICIECS 2009. International Conference on

Conference_Location

Wuhan

Print_ISBN

978-1-4244-4994-1

Type

conf

DOI

10.1109/ICIECS.2009.5366461

Filename

5366461