Title :
A method for dance motion recognition and scoring using two-layer classifier based on conditional random field and stochastic error-correcting context-free grammar
Author :
Heryadi, Y. ; Fanany, M.I. ; Arymurthy, A.M.
Author_Institution :
Comput. Sci. Program, Binus Univ.-Binus Int., Jakarta, Indonesia
Abstract :
This paper presents a unified framework for recognizing and scoring dance motion using a two-layer classifier, so that the computational complexity is distributed across two layers. The research examines the performance of a sliding window, a hidden Markov model (HMM), and a conditional random field (CRF) as the first-layer classifier, which segments the input video into a sequence of motion primitive labels. The second-layer classifier is a stochastic error-correcting context-free grammar, built from dance master knowledge, which parses the sequence of labels, builds a parse tree, and computes the accumulated dance score. The dataset is captured using one Kinect camera. The training dataset comprises 212 samples of 12 motion primitives and seven videos of Pendet dance performances. In 5-fold cross-validation, the accuracies of the sliding window, HMM, and CRF are 0.63, 0.79, and 0.86, respectively. This result shows that CRF achieves higher performance as a dance motion primitive recognizer than the HMM proposed by [1]. The CRF model achieves an accuracy of 0.88 when the motion feature is all skeleton joint angular coordinates, as proposed by [2], and this increases to 0.93 when the motion feature is only upper-body joint coordinates. A stochastic error-correcting context-free grammar is chosen as the dance choreography model. An experiment using synthetic label sequences with cost factor ci=1 and error rates in the label sequence of up to 50 percent shows that the grammar can tolerate input label sequence errors of up to 25 percent. An experiment using Pendet dance performances shows that the average dance score is 79.3. The low dance score is due to several factors, including dance skill variation, unstable basic gesture repetition, the high cost contributed by replacing deletion and substitution of local errors with insertion operations, duration variation due to the absence of timing guidelines for body part motions, and a training dataset too limited to capture the possible basic gesture variations.
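The error-correcting idea in the second layer can be illustrated with a much simpler stand-in. The sketch below is not the authors' stochastic context-free grammar: it only computes the minimum cost of insertions, deletions, and substitutions needed to turn an observed label sequence into one fixed reference sequence, with a uniform cost of 1 per operation (assumed here to mirror the cost factor ci=1 mentioned above). The function name `correction_cost` and the label names are hypothetical placeholders.

```python
from typing import Sequence


def correction_cost(observed: Sequence[str],
                    reference: Sequence[str],
                    cost: float = 1.0) -> float:
    """Minimum total cost of edits turning `observed` into `reference`.

    Assumption: every insertion, deletion, and substitution has the same
    cost. This is plain edit distance, not a grammar-based parser.
    """
    # prev[j] = cost of aligning the observed prefix seen so far
    # with reference[:j]
    prev = [j * cost for j in range(len(reference) + 1)]
    for i, obs in enumerate(observed, start=1):
        curr = [i * cost]
        for j, ref in enumerate(reference, start=1):
            substitute = prev[j - 1] + (0.0 if obs == ref else cost)
            delete = prev[j] + cost      # drop a spurious observed label
            insert = curr[j - 1] + cost  # supply a missing reference label
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


# Hypothetical motion primitive labels, for illustration only.
reference = ["A", "B", "C", "D"]
observed = ["A", "X", "C"]
print(correction_cost(observed, reference))
```

A lower correction cost means the observed sequence is closer to the reference. How such a cost would map to a dance score is not specified in the abstract and is left out of the sketch.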
Keywords :
cameras; computational complexity; context-free grammars; feature extraction; hidden Markov models; humanities; image classification; image motion analysis; image segmentation; object recognition; trees (mathematics); video signal processing; CRF; HMM; Kinect camera; Pendet dance performance; computation complexity; conditional random field; dance choreography model; dance master knowledge; dance motion primitive recognizer; dance motion recognition; dance motion scoring; dance skill variation; duration variation; error-sequence label; gesture variations; hidden Markov model; input video segmentation; insertion operation; label sequence parsing; motion feature; motion primitive label sequence; parse tree; skeleton joint angular coordinates; sliding window; stochastic error-correcting context-free grammar; synthetic sequence label; training dataset; two-layer classifier; unified framework; unstable basic gesture repetition; upper-body joint coordinates; Accuracy; Grammar; Hidden Markov models; Mathematical model; Motion segmentation; Stochastic processes; Training; dance motion recognition and scoring;
Conference_Title :
Consumer Electronics (GCCE), 2014 IEEE 3rd Global Conference on
Conference_Location :
Tokyo
DOI :
10.1109/GCCE.2014.7031294