Title :
Exploiting Visual Cues in Non-Scripted Lecture Videos for Multi-modal Action Recognition
Author :
Imran, Ali Shariq ; Moreno, Alexander ; Cheikh, Faouzi Alaya
Author_Institution :
Dept. of Comput. Sci. & Media Technol., Gjovik Univ. Coll., Gjovik, Norway
Abstract :
The use of non-scripted lecture videos as part of learning material is becoming an everyday activity in most higher-education institutions due to the growing interest in flexible and blended education. Generally, these videos are delivered as part of Learning Objects (LO) through various Learning Management Systems (LMS). Currently, creating these video learning objects (VLO) is a cumbersome process, because it requires thorough analysis of the lecture content for meta-data extraction and extraction of the structural information for indexing and retrieval purposes. Current e-learning systems and libraries (such as libSCORM) lack the functionality for exploiting semantic content for automatic segmentation. Without this additional meta-data and structural information, lecture videos do not provide the level of interactivity required for flexible education. As a result, they fail to hold students' attention for long, and their effective use remains a challenge. Exploiting the visual actions present in non-scripted lecture videos can be useful for automatically segmenting these videos and extracting their structure. Such visual cues help identify possible key frames, index points, key events and relevant meta-data useful for e-learning systems, video surrogates and video skims. We therefore propose a multi-modal action classification system for four predefined actions performed by the instructor in lecture videos: writing, erasing, speaking and being idle. The proposed approach is based on human shape and motion analysis using motion history images (MHI) at different temporal resolutions, allowing robust action classification. Additionally, it augments the visual feature classification with audio analysis, which is shown to improve the overall action classification performance. The initial experimental results using recorded lecture videos gave an overall classification accuracy of 89.06%.
We evaluated the performance of our approach against template matching using correlation and similitude, and found nearly 30% improvement over it. These encouraging results demonstrate the validity of the approach and its potential for extracting structural information from instructional videos.
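The core visual technique named in the abstract, motion history images (MHI) at different temporal resolutions, can be sketched in a few lines of NumPy. This is a minimal illustrative implementation, not the authors' code: the function name `update_mhi` and the parameters `tau` (history duration, i.e. temporal resolution) and `thresh` (motion threshold) are assumptions for the example.

```python
import numpy as np

def update_mhi(mhi, frame, prev_frame, tau=30, thresh=30):
    """Update a motion history image with one new grayscale frame.

    Pixels where inter-frame motion is detected are set to tau;
    all other pixels decay by 1 toward 0, so older motion fades.
    Running this with several values of tau yields MHIs at
    different temporal resolutions, as described in the abstract.
    """
    # Absolute frame difference (int16 avoids uint8 wrap-around).
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    motion = diff > thresh
    # Stamp moving pixels with tau; decay the rest by one step.
    return np.where(motion, tau, np.maximum(mhi - 1, 0))

# Toy usage: a single moving pixel between two 2x2 frames.
mhi = np.zeros((2, 2))
prev = np.zeros((2, 2), dtype=np.uint8)
cur = np.array([[255, 0], [0, 0]], dtype=np.uint8)
mhi = update_mhi(mhi, cur, prev)  # moving pixel -> tau, rest stay 0
```

In practice, shape descriptors computed from such MHIs (e.g. at short and long `tau`) would feed the action classifier for the four instructor actions; that classification stage is not shown here.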
Keywords :
computer aided instruction; educational aids; educational institutions; feature extraction; further education; image classification; image matching; image motion analysis; image recognition; image resolution; image segmentation; indexing; meta data; video retrieval; video signal processing; LMS; MHI; VLO; audio analysis; automatic video segmentation; blended education; e-learning systems; flexible education; higher education institutions; human shape; index points; instructional videos; key events; key frames; learning management systems; learning material; meta-data extraction; motion analysis; motion history images; multimodal action recognition; multimodal action classification system; nonscripted lecture videos; robust action classification; structural information extraction; structural information retrieval; template matching; temporal resolutions; video learning objects; visual cues; visual feature classification; Accuracy; Feature extraction; Humans; Motion segmentation; Shape; Videos; Writing; action classification; lecture videos; multi-modal analysis; recognition; visual actions;
Conference_Titel :
2012 Eighth International Conference on Signal Image Technology and Internet Based Systems (SITIS)
Conference_Location :
Naples
Print_ISBN :
978-1-4673-5152-2
DOI :
10.1109/SITIS.2012.12