Regional Information Center for Science and Technology - Super Fast Event Recognition in Internet Videos

DocumentCode :

76646

Title :

Super Fast Event Recognition in Internet Videos

Author :

Yu-Gang Jiang ; Qi Dai ; Tao Mei ; Yong Rui ; Shih-Fu Chang

Author_Institution :

Sch. of Comput. Sci., Fudan Univ., Shanghai, China

Volume :

Issue :

fYear :

2015

fDate :

Aug. 2015

Firstpage :

1174

Lastpage :

1186

Abstract :

Techniques for recognizing high-level events in consumer videos on the Internet have many applications. Systems that produce state-of-the-art recognition performance usually contain modules requiring extensive computation, such as the extraction of temporal motion trajectories, and therefore cannot be deployed on large-scale datasets. In this paper, we provide a comprehensive study of efficient methods in this area and identify technical options for super fast event recognition in Internet videos. We start by analyzing a multimodal baseline that has produced good performance on popular benchmarks, systematically evaluating each component in terms of both computational cost and contribution to recognition accuracy. After that, we identify alternative features, classifiers, and fusion strategies that can all be efficiently computed. We also study the following question: for event recognition in Internet videos, what is the minimum number of visual and audio frames needed to obtain accuracy comparable to that obtained with all the frames? Results on two rigorously designed datasets indicate that similar accuracy can be maintained by using only a small portion of the visual frames. We also find that, unlike the visual frames, the soundtracks contain little redundant information, and thus sampling is always harmful. Integrating all the findings, our suggested recognition system is 2,350-fold faster than a baseline approach with even higher recognition accuracies. It recognizes 20 classes on a 120-second video sequence in just 1.78 seconds, using a regular desktop computer.
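
The closing figures of the abstract can be turned into a throughput number, and the frame-sampling finding can be illustrated with a small sketch. The Python below is only an illustration under stated assumptions: the abstract does not say how frames were selected, so the evenly spaced selection in `uniform_frame_indices` is an assumed strategy, and the 25 frames-per-second rate and the choice of 10 frames are hypothetical values picked for the example. Both function names are introduced here and do not come from the paper.

```python
from typing import List


def real_time_factor(video_seconds: float, processing_seconds: float) -> float:
    """Seconds of video handled per second of processing time."""
    return video_seconds / processing_seconds


def uniform_frame_indices(total_frames: int, keep: int) -> List[int]:
    """Pick `keep` evenly spaced frame indices out of `total_frames`.

    Assumed sampling strategy for illustration; the abstract only states
    that a small portion of the visual frames is sufficient.
    """
    if keep <= 0 or total_frames <= 0:
        return []
    keep = min(keep, total_frames)
    step = total_frames / keep
    return [int(i * step) for i in range(keep)]


# Figures quoted in the abstract: a 120-second video processed in 1.78 seconds.
rtf = real_time_factor(120.0, 1.78)
print(f"about {rtf:.1f}x faster than real time")

# Hypothetical clip: 120 seconds at an assumed 25 frames per second,
# reduced to 10 evenly spaced frames.
indices = uniform_frame_indices(total_frames=120 * 25, keep=10)
print(indices)
```

With the quoted figures, the system processes roughly 67 seconds of video per second of computation. The 2,350-fold speedup is a separate claim relative to the baseline and is not derived here, since the abstract does not give the baseline's running time.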

Keywords :

Internet; computational complexity; feature extraction; image classification; image fusion; image recognition; image sequences; video signal processing; Internet videos; alternative feature identification; classifiers; computational cost; fusion strategies; soundtracks; super fast event recognition; video sequence; Kernel; Quantization (signal); Support vector machines; Trajectory; Videos; Visualization; Consumer videos; efficiency; event recognition; real time

fLanguage :

English

Journal_Title :

Multimedia, IEEE Transactions on

Publisher :

IEEE

ISSN :

1520-9210

Type :

jour

DOI :

10.1109/TMM.2015.2436813

Filename :

7112152

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=76646