Automatic Segmentation and Recognition of Human Actions in Monocular Sequences

Author

Orrite, C. ; Rodriguez, M. ; Herrero, E. ; Rogez, G. ; Velastin, S.A.

Author_Institution

Aragon Inst. of Eng. Res., Univ. of Zaragoza, Zaragoza, Spain

fYear

2014

fDate

24-28 Aug. 2014

Firstpage

4218

Lastpage

4223

Abstract

This paper addresses the problem of silhouette-based human action segmentation and recognition in monocular sequences. Motion History Images (MHIs), used as 2D templates, capture motion information by encoding where and when motion occurred in the images. Inspired by codebook approaches for object and scene categorization, we first construct a codebook of temporal motion templates by clustering all the MHIs of each particular action. These MHIs capture different actors, speeds and a wide range of camera viewpoints. In this paper, we use a Kohonen Self-Organizing Map (SOM) to simultaneously cluster the MHI templates and represent them in lower-dimensional subspaces. To cope with temporal segmentation and concurrently carry out action recognition, a new architecture is proposed in which the observation MHIs are projected onto all these action-specific manifolds, and the Euclidean distance between each MHI and the nearest cluster within each action manifold constitutes the observation vector of a Markov model. To estimate the state/action at each time step, we introduce a new method based on Observable Markov Models (OMMs) in which the Markov model is augmented with a neutral state. The combination of our action-specific manifolds with the augmented OMM allows long sequences of consecutive actions to be segmented and recognized automatically, without any prior knowledge of the initial and ending frames of each action. Importantly, our method interpolates between training viewpoints and recognizes actions independently of the camera viewpoint, even from unseen viewpoints.
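
The two building blocks named in the abstract, the MHI template and the per-action distance vector, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly used MHI update rule (moving pixels are set to a duration `tau`, all others decay by one per frame), and it stands in small hand-written prototype arrays for the SOM-trained codebooks. The names `update_mhi`, `observation_vector`, `tau` and `codebooks` are hypothetical.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau):
    """One MHI step (assumed standard rule): moving pixels become tau,
    all other pixels decay by 1 and are clipped at 0."""
    decayed = np.maximum(mhi - 1, 0)
    return np.where(motion_mask, tau, decayed)

def observation_vector(mhi, codebooks):
    """Euclidean distance from the flattened MHI to the nearest prototype
    of each action-specific codebook (one entry per action)."""
    x = mhi.ravel().astype(float)
    return np.array([np.linalg.norm(cb - x, axis=1).min() for cb in codebooks])

# Toy 2x2 sequence: motion at the top-left pixel, then at the top-right pixel.
tau = 3
mhi = np.zeros((2, 2), dtype=int)
masks = [
    np.array([[True, False], [False, False]]),
    np.array([[False, True], [False, False]]),
]
for mask in masks:
    mhi = update_mhi(mhi, mask, tau)

# Hypothetical codebooks: one array of prototypes (rows) per action.
# In the paper these would come from a SOM trained on each action's MHIs.
codebooks = [
    np.array([[2.0, 3.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]),  # action 0
    np.array([[0.0, 0.0, 3.0, 3.0]]),                        # action 1
]
obs = observation_vector(mhi, codebooks)
closest_action = int(np.argmin(obs))
```

In this toy case the older motion has decayed by one step while the newer motion sits at `tau`, so the template encodes both where and when motion occurred. The resulting vector `obs` is what the abstract feeds to the Markov model; choosing the closest action frame by frame, as `closest_action` does here, ignores the temporal modelling and the neutral state that the OMM adds.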

Keywords

Markov processes; cameras; image coding; image recognition; image segmentation; image sequences; interpolation; motion estimation; self-organising feature maps; video signal processing; 2D templates; Euclidean distance; MHI template clustering; OMM; SOM; action estimation; action-specific manifolds; automatic silhouette-based human action recognition; automatic silhouette-based human action segmentation; camera viewpoint; information encoding; interpolation; low-dimensional subspace representation; monocular sequences; motion history images; motion information capture; nearest cluster; neutral state; observable Markov models; observation vector; self-organizing map; state estimation; temporal motion template codebook; temporal segmentation; training viewpoint; unseen viewpoints; Cameras; Hidden Markov models; Kernel; Manifolds; Markov processes; Motion segmentation; Vectors;

fLanguage

English

Publisher

IEEE

Conference_Titel

Pattern Recognition (ICPR), 2014 22nd International Conference on

Conference_Location

Stockholm

ISSN

1051-4651

Type

conf

DOI

10.1109/ICPR.2014.723

Filename

6977435