Title :
Unsupervised learning of human expressions, gestures, and actions
Author :
O'Hara, Stephen ; Lui, Yui Man ; Draper, Bruce A.
Author_Institution :
Dept. of Comput. Sci., Colorado State Univ., Fort Collins, CO, USA
Abstract :
This paper analyzes completely unsupervised clustering of human expressions, gestures, and actions in video. Lacking any supervision, there is nothing except the inherent biases of a given technique to guide grouping of video clips along semantically meaningful partitions. This paper evaluates two contemporary behavior recognition methods, Bag of Features (BOF) and Product Manifolds (PM), for clustering video clips of human facial expressions, hand gestures, and full-body actions. Our goal is to better understand how well these very different approaches to behavior recognition produce semantically useful clustering of relevant data. We show that PM yields superior results when measuring the alignment between the generated clusters over a range of K-values (number of clusters) and the nominal class labelling of the data set. A key result is that unsupervised clustering with PM yields accuracy comparable to state-of-the-art supervised classification methods on KTH Actions. At the same time, BOF experiences a substantial drop in performance between unsupervised and supervised implementations on the same data sets, indicating a greater reliance on supervision for achieving high performance. We also found that while gross motions were easily clustered by both methods, the lack of preservation of structural information inherent to the BOF representation leads to limitations that are not easily overcome without supervised training. This was evidenced by the poor separation of shape labels in the hand gestures data by BOF, and the overall poor performance on full-body actions.
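The abstract refers to measuring the alignment between generated clusters and the nominal class labels over a range of K-values, but does not name the metric. As a minimal sketch, assuming cluster purity (the fraction of clips whose label matches the majority label of their cluster) as a stand-in measure, the idea can be illustrated as follows. The function name `cluster_purity` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, class_labels):
    """Return the fraction of samples that carry the majority label of their cluster.

    Assumption: purity is used here only as an illustrative alignment measure;
    the abstract does not state which metric the authors used.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one sample is required")

    # Count how often each class label occurs inside each cluster.
    label_counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        label_counts[cluster][label] += 1

    # Each cluster contributes the size of its largest single-label group.
    majority_total = sum(
        counts.most_common(1)[0][1] for counts in label_counts.values()
    )
    return majority_total / len(class_labels)


# Hypothetical toy example: 8 video clips grouped into K = 3 clusters.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["wave", "wave", "clap", "clap", "clap", "walk", "walk", "wave"]
purity = cluster_purity(clusters, labels)
```

Purity reaches 1.0 when every clip sits in its own cluster, so under this assumed measure two methods are only comparable at the same K, which is consistent with evaluating over a range of K-values.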
Keywords :
face recognition; feature extraction; gesture recognition; human computer interaction; pattern clustering; unsupervised learning; video signal processing; KTH action; bag of features; behavior recognition; clustering video clips; contemporary behavior recognition; data set; full body action; hand gesture; human facial expression; product manifold; state-of-the-art supervised classification; unsupervised learning; Accuracy; Couplings; Feature extraction; Humans; Labeling; Manifolds; Shape;
Conference_Title :
Automatic Face & Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on
Conference_Location :
Santa Barbara, CA
Print_ISBN :
978-1-4244-9140-7
DOI :
10.1109/FG.2011.5771473