Title :
View-invariant feature discovering for multi-camera human action recognition
Author :
Hong Lin ; Chaisorn, Lekha ; Yongkang Wong ; An-An Liu ; Yu-Ting Su ; Kankanhalli, Mohan S.
Author_Institution :
Sch. of Electron. Inf. Eng., Tianjin Univ., Tianjin, China
Abstract :
Intelligent video surveillance systems are built to automatically detect events of interest, with particular emphasis on object tracking and behavior understanding. In this paper, we focus on the task of human action recognition in a surveillance environment, specifically in a multi-camera monitoring scene. Although many approaches have achieved success in recognizing human actions from video sequences, they are designed for a single view and are generally not robust to viewpoint changes. Human action recognition across different views remains challenging due to the large variations from one view to another. We present a framework for transferring action models learned in one view (source view) to another view (target view). First, local space-time interest point features and a global shape-flow feature are extracted as low-level features, followed by building a hybrid Bag-of-Words model for each action sequence. The data distributions of relevant actions from the source view and the target view are linked via a cross-view discriminative dictionary learning method. Through the view-adaptive dictionary pair learned by this method, data from the source and target views can be mapped into a common, view-invariant space. Furthermore, we extend our framework to transfer action models from multiple source views to one target view when multiple source views are available. Experiments on the IXMAS human action dataset, which contains videos captured from five viewpoints, show the efficacy of our framework.
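For illustration, the following minimal Python sketch shows the general shape of the pipeline the abstract describes: encoding each action clip as a hybrid Bag-of-Words vector over local space-time interest point (STIP) descriptors plus a global shape-flow descriptor, then learning a dictionary per view whose sparse codes serve as a shared representation. This is an assumption-laden approximation with hypothetical names and toy data, not the authors' coupled discriminative dictionary learning formulation (here the two dictionaries are learned independently, whereas the paper links them across views).

```python
# Minimal illustrative sketch (hypothetical names, toy data), NOT the paper's exact method:
# (1) hybrid Bag-of-Words encoding of STIP + shape-flow features,
# (2) per-view dictionary learning whose sparse codes act as a shared space.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import DictionaryLearning

def hybrid_bow(stip_descriptors, shape_flow_descriptor, codebook):
    """Quantize local STIP descriptors against a codebook and append the
    global shape-flow descriptor to form one hybrid feature vector."""
    words = codebook.predict(stip_descriptors)                # nearest visual word per STIP
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    hist /= max(hist.sum(), 1.0)                              # L1-normalized BoW histogram
    return np.concatenate([hist, shape_flow_descriptor])

# Toy data standing in for real extracted features.
rng = np.random.default_rng(0)
codebook = KMeans(n_clusters=64, n_init=3, random_state=0).fit(rng.normal(size=(2000, 72)))
source_feats = np.stack([hybrid_bow(rng.normal(size=(120, 72)), rng.normal(size=10), codebook)
                         for _ in range(50)])
target_feats = np.stack([hybrid_bow(rng.normal(size=(120, 72)), rng.normal(size=10), codebook)
                         for _ in range(50)])

# View-specific dictionaries; the sparse codes are used as a common representation.
dict_src = DictionaryLearning(n_components=32, alpha=1.0, max_iter=200, random_state=0)
dict_tgt = DictionaryLearning(n_components=32, alpha=1.0, max_iter=200, random_state=0)
codes_src = dict_src.fit_transform(source_feats)   # common-space embedding of source view
codes_tgt = dict_tgt.fit_transform(target_feats)   # common-space embedding of target view

# A classifier trained on codes_src could then be applied to codes_tgt,
# which is the cross-view transfer scenario the framework addresses.
```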
Keywords :
feature extraction; image motion analysis; image sequences; learning (artificial intelligence); object detection; object tracking; video cameras; video surveillance; action models; action sequence; behavior understanding; cross-view discriminative dictionary learning method; data distribution; event detection; global shape-flow feature extraction; hybrid bag-of-words model; intelligent video surveillance system; local space-time interest point feature; low-level feature extraction; multicamera human action recognition; multicamera monitoring scene; object tracking; source view; surveillance environment; target view; video sequences; view-adaptive dictionary pair; view-invariant feature discovering; viewpoint invariance; Cameras; Context; Dictionaries; Feature extraction; Surveillance; Target recognition; Three-dimensional displays;
Conference_Title :
Multimedia Signal Processing (MMSP), 2014 IEEE 16th International Workshop on
Conference_Location :
Jakarta
DOI :
10.1109/MMSP.2014.6958807