DocumentCode :
76646
Title :
Super Fast Event Recognition in Internet Videos
Author :
Yu-Gang Jiang ; Qi Dai ; Tao Mei ; Yong Rui ; Shih-Fu Chang
Author_Institution :
Sch. of Comput. Sci., Fudan Univ., Shanghai, China
Volume :
17
Issue :
8
fYear :
2015
fDate :
Aug. 2015
Firstpage :
1174
Lastpage :
1186
Abstract :
Techniques for recognizing high-level events in consumer videos on the Internet have many applications. Systems that produce state-of-the-art recognition performance usually contain modules requiring extensive computation, such as the extraction of temporal motion trajectories, which cannot be deployed on large-scale datasets. In this paper, we provide a comprehensive study of efficient methods in this area and identify technical options for super fast event recognition in Internet videos. We start by analyzing a multimodal baseline that has produced good performance on popular benchmarks, systematically evaluating each component in terms of both computational cost and contribution to recognition accuracy. We then identify alternative features, classifiers, and fusion strategies that can all be computed efficiently. In addition, we study the following question: for event recognition in Internet videos, what is the minimum number of visual and audio frames needed to obtain accuracy comparable to that of using all the frames? Results on two rigorously designed datasets indicate that similar results can be maintained by using only a small portion of the visual frames. We also find that, unlike the visual frames, the soundtracks contain little redundant information, and thus sampling them is always harmful. Integrating all the findings, our suggested recognition system is 2,350-fold faster than a baseline approach while achieving even higher recognition accuracy. It recognizes 20 classes on a 120-second video sequence in just 1.78 seconds, using a regular desktop computer.
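The abstract's frame-sampling finding (keep only a small portion of visual frames, but do not subsample the soundtrack) and its mention of fusion strategies can be illustrated with a minimal sketch. It assumes uniform temporal sampling of visual frame indices and a simple weighted late fusion of per-class scores from the two modalities. The function names `uniform_sample_indices` and `late_fusion`, the equal default weight, and the 30 fps frame rate in the usage note are hypothetical choices for illustration, not the paper's actual pipeline.

```python
import numpy as np


def uniform_sample_indices(num_frames: int, num_samples: int) -> np.ndarray:
    """Pick num_samples visual frame indices spread evenly across a video.

    Hypothetical helper: uniform sampling is an assumed scheme, not
    necessarily the one used in the paper.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if num_samples >= num_frames:
        # Nothing to drop: use every frame.
        return np.arange(num_frames)
    # Split the video into num_samples equal segments and take the
    # (floored) center frame of each segment.
    step = num_frames / num_samples
    return (np.arange(num_samples) * step + step / 2).astype(int)


def late_fusion(visual_scores, audio_scores, w_visual: float = 0.5) -> np.ndarray:
    """Weighted average of per-class scores from visual and audio classifiers.

    The weight w_visual is an assumed, hand-set parameter.
    """
    visual = np.asarray(visual_scores, dtype=float)
    audio = np.asarray(audio_scores, dtype=float)
    return w_visual * visual + (1.0 - w_visual) * audio
```

For example, assuming a 120-second video decoded at 30 fps (3,600 frames), `uniform_sample_indices(3600, 60)` keeps one visual frame every two seconds (indices 30, 90, ..., 3570), while the audio features would still be computed over the whole soundtrack before their scores are fused.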
Keywords :
Internet; computational complexity; feature extraction; image classification; image fusion; image recognition; image sequences; video signal processing; Internet videos; alternative feature identification; classifiers; computational cost; fusion strategies; soundtracks; super fast event recognition; video sequence; Feature extraction; Kernel; Quantization (signal); Support vector machines; Trajectory; Videos; Visualization; Consumer videos; Internet videos; efficiency; event recognition; real time;
fLanguage :
English
Journal_Title :
Multimedia, IEEE Transactions on
Publisher :
IEEE
ISSN :
1520-9210
Type :
jour
DOI :
10.1109/TMM.2015.2436813
Filename :
7112152