Title :
Instructional Video Content Analysis Using Audio Information
Author :
Li, Ying ; Dorai, Chitra
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY
Abstract :
Automatic media content analysis and understanding for efficient topic searching and browsing are current challenges in the management of e-learning content repositories. This paper presents our current work on analyzing and structuring instructional videos using audio information alone. Specifically, an audio classification scheme is first developed to partition the sound-track of an instructional video into homogeneous audio segments, each of which has a single sound type such as speech or music. We then apply a statistical approach to extract discussion scenes in the video by modeling the instructor with a Gaussian mixture model (GMM) and updating it on the fly. Finally, we categorize the obtained discussion scenes into either two-speaker or multispeaker discussions using an adaptive mode-based clustering approach. Experiments carried out on four training videos and five IBM MicroMBA class videos have yielded encouraging results. We believe that detecting and identifying various types of discussions allows us to better understand and annotate the learning media content and subsequently facilitate its content access, browsing, and retrieval.
Keywords :
Gaussian processes; audio signal processing; audio systems; audio-visual systems; computer aided instruction; distance learning; educational aids; interactive video; Gaussian mixture model; adaptive mode-based clustering approach; audio classification scheme; audio information; automatic media content analysis; e-learning content repositories; instructional video content analysis; learning media content; multispeaker discussions; sound-track; statistical approach; training videos; Content management; Data mining; Electronic learning; Indexing; Information analysis; Layout; Music; Speech; Support vector machine classification; Support vector machines; Adaptive Gaussian mixture modeling; audio classification; discussion scene detection; e-learning; instructional video content analysis; speaker clustering; state transition machine; support vector machine (SVM)
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2006.872602