DocumentCode :
254100
Title :
Towards Good Practices for Action Video Encoding
Author :
Jianxin Wu ; Yu Zhang ; Weiyao Lin
Author_Institution :
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
fYear :
2014
fDate :
23-28 June 2014
Firstpage :
2577
Lastpage :
2584
Abstract :
High-dimensional representations such as VLAD or FV have shown excellent accuracy in action recognition. This paper shows that a proper encoding built upon VLAD can achieve a further accuracy boost with only negligible computational cost. We empirically evaluated various VLAD improvement techniques to determine good practices in VLAD-based video encoding. Furthermore, we propose an interpretation that VLAD is a maximum entropy linear feature learning process. Combining this new perspective with observed VLAD data distribution properties, we propose a simple, lightweight, but powerful bimodal encoding method. Evaluated on three benchmark action recognition datasets (UCF101, HMDB51 and YouTube), the bimodal encoding improves VLAD by large margins in action recognition.
Keywords :
feature extraction; image recognition; image representation; maximum entropy methods; video coding; FV encoding framework; VLAD data distribution properties; VLAD-based video encoding; action recognition; action video encoding; benchmark action recognition datasets; bimodal encoding method; Fisher vector; good practices; high dimensional representations; maximum entropy linear feature learning process; Accuracy; Encoding; Feature extraction; Gaussian distribution; Principal component analysis; Vectors; YouTube
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on
Conference_Location :
Columbus, OH
Type :
conf
DOI :
10.1109/CVPR.2014.330
Filename :
6909726