Abstract :
This paper addresses the problem of recognizing human actions when training and testing are performed on distinct datasets and the test videos are neither labeled nor available during training. In this scenario, learning a joint vocabulary and domain transfer techniques are not applicable. We first explore the reasons for poor classifier performance on novel datasets, and quantify the effect of scene backgrounds on action representations and recognition. Using only background features and a partitioning of the gist feature space, we show that the background scenes in recent datasets are quite discriminative and can be used to classify an action with reasonable accuracy. We then propose a new process to obtain a measure of confidence that each pixel of the video belongs to a foreground region, using motion, appearance, and saliency together in a 3D MRF-based framework. We also propose multiple ways to exploit the foreground confidence: to improve the bag-of-words vocabulary, to improve the histogram representation of a video, and to build a novel histogram-decomposition-based representation and kernel. We use these foreground confidences to recognize actions with classifiers trained on one dataset and tested on a different dataset. Extensive experiments on several datasets show that the proposed methods improve cross-dataset recognition accuracy compared with baseline methods.
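As an illustration of how a foreground confidence could enter the histogram representation of a video, the following minimal sketch accumulates each local feature's visual-word vote weighted by its foreground confidence, instead of counting every feature equally. This is an assumed, simplified soft-weighting scheme and not necessarily the paper's formulation; the function name `foreground_weighted_histogram` and its parameters are hypothetical, and the per-feature confidences are assumed to be already computed (for example, read from a per-pixel confidence map at each feature location).

```python
from typing import List, Sequence


def foreground_weighted_histogram(
    word_ids: Sequence[int],
    fg_conf: Sequence[float],
    vocab_size: int,
) -> List[float]:
    """Build an L1-normalized bag-of-words histogram in which each feature
    contributes its foreground confidence rather than a unit count.

    word_ids   -- visual-word index assigned to each local feature (assumed given)
    fg_conf    -- foreground confidence in [0, 1] for each feature (assumed given)
    vocab_size -- number of visual words in the vocabulary
    """
    if len(word_ids) != len(fg_conf):
        raise ValueError("word_ids and fg_conf must have the same length")

    hist = [0.0] * vocab_size
    for word, conf in zip(word_ids, fg_conf):
        if not 0 <= word < vocab_size:
            raise ValueError(f"word id {word} is outside the vocabulary")
        # Features that are likely background add little to the histogram.
        hist[word] += conf

    total = sum(hist)
    if total > 0:
        # Normalize so videos with different feature counts are comparable.
        hist = [h / total for h in hist]
    return hist


# Toy usage: four features, three visual words (values are illustrative only).
example_hist = foreground_weighted_histogram(
    word_ids=[0, 2, 2, 1],
    fg_conf=[0.9, 0.1, 0.1, 0.8],
    vocab_size=3,
)
```

With uniform confidences this reduces to the ordinary normalized bag-of-words histogram, so the weighting only changes the representation where foreground and background features differ.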
Keywords :
feature extraction; image classification; image motion analysis; image representation; learning (artificial intelligence); object recognition; statistical analysis; video signal processing; action representation; appearance feature; bag-of-words vocabulary; classifier performance; cross dataset recognition accuracy; domain transfer techniques; foreground confidence; foreground-weighted histogram decomposition; histogram representation; human action recognition; joint vocabulary learning; motion feature; saliency feature; video pixel; Accuracy; Cameras; Color; Histograms; Training; Videos; Visualization; action recognition; cross dataset; dataset bias; foreground weighted representation;