Abstract :
Human action recognition is an active research area with applications in several domains, such as visual surveillance, video retrieval, and human-computer interaction. Current approaches assign action labels to video streams by treating the whole video as a single image sequence. Such approaches, albeit very refined, may fail on some samples because of large variability between frames, which suggests that description and classification models built from features extracted from training videos may better represent video portions than the whole stream. To this end, we propose a multiple subsequence combination method that divides the video into a number of consecutive subsequences and classifies each one by applying a part-based method in conjunction with the bag of visual words approach. We then classify the video by combining the subsequence labels according to rules inspired by the multiple expert system framework. We extensively tested our approach on the KTH, UCF sport, and YouTube datasets, showing, on the one hand, that it outperforms a method that classifies actions using the whole stream and, on the other hand, that its performance is robust and stable across all the datasets, since our best results are comparable with the best published ones.
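The overall pipeline (split the video into consecutive subsequences, label each one, combine the labels) can be illustrated with a minimal Python sketch. It is not the method evaluated in the paper: the abstract does not specify the combination rules, so plain majority voting is assumed here as one common multiple expert rule, and `split_into_subsequences`, `majority_vote`, `classify_video`, and the `classify_subsequence` callback are hypothetical names standing in for the part-based, bag of visual words classifier.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence


def split_into_subsequences(frames: Sequence, n_parts: int) -> List[Sequence]:
    """Split frames into consecutive, nearly equal-length chunks."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    if len(frames) == 0:
        raise ValueError("frames must not be empty")
    n_parts = min(n_parts, len(frames))  # never create empty chunks
    base, extra = divmod(len(frames), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks receive one additional frame each.
        size = base + (1 if i < extra else 0)
        chunks.append(frames[start:start + size])
        start += size
    return chunks


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label (assumed rule, not the paper's).

    Ties go to the label that appeared first among the tied ones.
    """
    counts = Counter(labels)
    return max(counts, key=counts.get)


def classify_video(
    frames: Sequence,
    classify_subsequence: Callable[[Sequence], Hashable],
    n_parts: int,
) -> Hashable:
    """Label each consecutive subsequence, then combine the labels."""
    chunks = split_into_subsequences(frames, n_parts)
    return majority_vote([classify_subsequence(c) for c in chunks])


def toy_classifier(chunk: Sequence[float]) -> str:
    """Hypothetical stand-in classifier: thresholds the chunk mean."""
    return "run" if sum(chunk) / len(chunk) > 3 else "walk"


video_label = classify_video(list(range(10)), toy_classifier, n_parts=3)
```

In this toy run the "frames" are just numbers and the three subsequences vote for the video label. In a real system each frame would be an image, and other combination rules (for example, weighting each vote by classifier confidence) could replace `majority_vote` without changing the rest of the sketch.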
Keywords :
expert systems; feature extraction; gait analysis; gesture recognition; image classification; image segmentation; image sequences; motion estimation; video signal processing; KTH dataset; UCF sport dataset; Youtube dataset; action label assignment; bag-of-visual word approach; human action recognition; human-computer interaction; image sequence; multiple expert system framework; multiple subsequence combination method; part-based method; training video streams; video combining subsequence label classification; video portion representation; video retrieval; visual surveillance; Humans; Streaming media; Vectors; Visualization; Vocabulary; YouTube