Towards Good Practices for Action Video Encoding

Author

Jianxin Wu ; Yu Zhang ; Weiyao Lin

Author_Institution

Nat. Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China

fYear

2014

fDate

23-28 June 2014

Firstpage

2577

Lastpage

2584

Abstract

High-dimensional representations such as VLAD or the Fisher vector (FV) have shown excellent accuracy in action recognition. This paper shows that a proper encoding built upon VLAD can achieve a further accuracy boost at only negligible computational cost. We empirically evaluate various VLAD improvement techniques to determine good practices in VLAD-based video encoding. Furthermore, we propose an interpretation of VLAD as a maximum entropy linear feature learning process. Combining this new perspective with observed properties of the VLAD data distribution, we propose a simple, lightweight, but powerful bimodal encoding method. Evaluated on three benchmark action recognition datasets (UCF101, HMDB51 and YouTube), the bimodal encoding improves VLAD by large margins in action recognition.

Keywords

feature extraction; image recognition; image representation; maximum entropy methods; video coding; FV encoding framework; VLAD data distribution properties; VLAD-based video encoding; action recognition; action video encoding; benchmark action recognition datasets; bimodal encoding method; Fisher vector; good practices; high dimensional representations; maximum entropy linear feature learning process; Accuracy; Encoding; Feature extraction; Gaussian distribution; Principal component analysis; Vectors; YouTube

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on

Conference_Location

Columbus, OH

Type

conf

DOI

10.1109/CVPR.2014.330

Filename

6909726