Title :
Towards Good Practices for Action Video Encoding
Author :
Jianxin Wu ; Yu Zhang ; Weiyao Lin
Author_Institution :
Nat. Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China
Abstract :
High-dimensional representations such as VLAD or FV have shown excellent accuracy in action recognition. This paper shows that a proper encoding built upon VLAD can achieve a further accuracy boost at only negligible computational cost. We empirically evaluated various VLAD improvement techniques to determine good practices in VLAD-based video encoding. Furthermore, we propose an interpretation of VLAD as a maximum entropy linear feature learning process. Combining this new perspective with observed VLAD data distribution properties, we propose a simple, lightweight, but powerful bimodal encoding method. Evaluated on three benchmark action recognition datasets (UCF101, HMDB51 and YouTube), the bimodal encoding improves VLAD by large margins in action recognition.
Keywords :
feature extraction; image recognition; image representation; maximum entropy methods; video coding; FV encoding framework; VLAD data distribution properties; VLAD-based video encoding; action recognition; action video encoding; benchmark action recognition datasets; bimodal encoding method; Fisher vector; good practices; high dimensional representations; maximum entropy linear feature learning process; Accuracy; Encoding; Feature extraction; Gaussian distribution; Principal component analysis; Vectors; YouTube
Conference_Title :
Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on
Conference_Location :
Columbus, OH
DOI :
10.1109/CVPR.2014.330