Author :
Duan, Xiaohua ; Lin, Liang ; Chao, Hongyang
Abstract :
Video shots are often treated as the basic units for retrieving information from videos. In recent years, video shot categorization has received increasing attention, but most existing methods rely on supervised learning, i.e., training a multi-class predictor (classifier) on labeled data. In this paper, we study a general framework for discovering video shot categories in an unsupervised manner. The contributions are three-fold, covering feature, representation, and inference: (1) A new feature is proposed to capture local information in videos, defined on small video patches (e.g., 11 × 11 × 5 pixels). A dictionary of video words, characterizing both appearance and motion dynamics, can thus be clustered offline. (2) We pose categorization as an automated graph partition task, in which each graph vertex represents a video shot and each partitioned sub-graph of connected vertices represents one discovered category. The model of each video shot category can be calculated analytically by a projection-pursuit-type learning process. (3) An MCMC-based cluster sampling algorithm, Swendsen-Wang cuts, is adopted to solve the graph partition efficiently. Unlike traditional graph partition techniques, this algorithm can explore nearly globally optimal solutions and eliminates the need for good initialization. We apply our method to 1,600 diverse video shots collected from the Internet as well as a subset of the TRECVID 2010 data, and adopt two benchmark metrics, Purity and Conditional Entropy, to evaluate performance. The experimental results demonstrate that our method outperforms other popular state-of-the-art methods.
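As a minimal sketch, the two evaluation metrics named in the abstract can be computed from ground-truth category labels and discovered cluster labels. This assumes the common clustering-evaluation definitions: purity as the fraction of shots that fall in their cluster's majority category (higher is better), and conditional entropy H(category | cluster) in bits (lower is better, 0 for perfectly pure clusters). The paper's exact normalization is not stated here and may differ; the function names and example labels are hypothetical.

```python
import math
from collections import Counter


def purity(true_labels, cluster_labels):
    """Fraction of shots belonging to the majority true category of their cluster."""
    if len(true_labels) != len(cluster_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(true_labels)
    # Count co-occurrences of (cluster, true category).
    joint = Counter(zip(cluster_labels, true_labels))
    # For each cluster, keep the size of its largest single-category subset.
    best = {}
    for (cluster, _category), count in joint.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / n


def conditional_entropy(true_labels, cluster_labels, base=2.0):
    """H(category | cluster): expected entropy of the true categories inside each cluster."""
    if len(true_labels) != len(cluster_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(true_labels)
    joint = Counter(zip(cluster_labels, true_labels))
    cluster_sizes = Counter(cluster_labels)
    h = 0.0
    for (cluster, _category), count in joint.items():
        # p(cluster, category) * log p(category | cluster), accumulated with a minus sign.
        h -= (count / n) * math.log(count / cluster_sizes[cluster], base)
    return h


# Hypothetical example: six shots, three true categories, three discovered clusters.
truth = ["news", "news", "sport", "sport", "sport", "ad"]
clusters = [0, 0, 1, 1, 0, 2]
print(purity(truth, clusters))               # 5/6: cluster 0 mixes two "news" and one "sport"
print(conditional_entropy(truth, clusters))  # about 0.459 bits, all from the mixed cluster 0
```

Purity alone rewards splitting the data into many tiny clusters, which is one reason it is usually reported together with an entropy-based measure.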
Keywords :
information retrieval; learning (artificial intelligence); stochastic processes; video retrieval; video signal processing; MCMC-based cluster sampling algorithm; Swendsen-Wang cuts; TRECVID 2010 data; automated graph partition task; benchmark metrics; conditional entropy; connected graph vertices; graph vertex; local information; motion dynamics; multiclass predictor; nearly global optimal solution; purity entropy; supervised learning; traditional graph partition techniques; unsupervised stochastic graph partition; video information retrieval; video patches; video shot categorization; video shot category discovery; video words; Clustering algorithms; Dictionaries; Dynamics; Image color analysis; Manifolds; Stochastic processes; Vectors; Category discovery; graph partition; unsupervised categorization; video shot;