Title :
Robust Spatiotemporal Matching of Electronic Slides to Presentation Videos
Author :
Fan, Quanfu ; Barnard, Kobus ; Amir, Arnon ; Efrat, Alon
Author_Institution :
T. J. Watson Res. Center, IBM, Armonk, NY, USA
Abstract :
We describe a robust and efficient method for automatically matching and time-aligning electronic slides to videos of corresponding presentations. Matching electronic slides to videos provides new methods for indexing, searching, and browsing videos in distance-learning applications. However, robust automatic matching is challenging due to varied frame composition, slide distortion, camera movement, low-quality video capture, and arbitrary slide sequencing. Our fully automatic approach combines image-based matching of slides to video frames with a temporal model for slide changes and camera events. To address these challenges, we begin by extracting scale-invariant feature transform (SIFT) keypoints from both slides and video frames and matching them subject to a consistent projective transformation (homography) using random sample consensus (RANSAC). We use the initial set of matches to construct a background model and a binary classifier that separates video frames showing slides from those without. We then introduce a new matching scheme that exploits less distinctive SIFT keypoints, enabling us to tackle more difficult images. Finally, we improve upon the matching based on visual information by using estimated matching probabilities as part of a hidden Markov model (HMM) that integrates temporal information and detected camera operations. Detailed quantitative experiments characterize each part of our approach and demonstrate an average accuracy of over 95% on 13 presentation videos.
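Illustration (not from the article) : The abstract's image-based matching step, SIFT keypoints filtered by a RANSAC-estimated homography, can be sketched as follows in Python with OpenCV. File paths, the ratio-test threshold, the reprojection tolerance, and the minimum-match count are illustrative assumptions; the authors' actual implementation, background model, classifier, and HMM are not shown.

    # Minimal sketch: match a slide image to a video frame via SIFT keypoints
    # constrained by a consistent projective transformation (homography, RANSAC).
    # Thresholds and paths are assumptions for illustration only.
    import cv2
    import numpy as np

    def match_slide_to_frame(slide_path, frame_path, ratio=0.75, min_matches=10):
        slide = cv2.imread(slide_path, cv2.IMREAD_GRAYSCALE)
        frame = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)

        sift = cv2.SIFT_create()
        kp_s, des_s = sift.detectAndCompute(slide, None)
        kp_f, des_f = sift.detectAndCompute(frame, None)

        # Nearest-neighbor matching with Lowe's ratio test.
        matcher = cv2.BFMatcher()
        good = []
        for pair in matcher.knnMatch(des_s, des_f, k=2):
            if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
                good.append(pair[0])
        if len(good) < min_matches:
            return None, 0

        src = np.float32([kp_s[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst = np.float32([kp_f[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

        # RANSAC homography: keypoint matches must agree on one projective
        # transformation; the inlier count serves as a matching score.
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        inliers = int(mask.sum()) if mask is not None else 0
        return H, inliers

In the pipeline the abstract describes, scores of this kind would feed the background model, the slide/non-slide classifier, and ultimately the HMM over slide changes and camera events; those later stages are not reproduced here.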
Keywords :
distance learning; hidden Markov models; image matching; probability; random processes; technical presentation; transforms; video signal processing; HMM; RANSAC; SIFT keypoints; arbitrary slide sequencing; binary classifier; camera events; camera movement; detected camera operations; distance-learning applications; electronic slides; estimated matching probability; frame composition; hidden Markov model; homography; image-based matching; low-quality video capture; projective transformation; quantitative experiments; random sample consensus; robust automatic matching; robust spatiotemporal matching; scale-invariant feature transform keypoints; slide distortion; temporal information; temporal model; video browsing; video frames; video indexing; video presentation; video searching; visual information; Accuracy; Cameras; Hidden Markov models; Image color analysis; Robustness; Synchronization; Videos; Distance learning; homography constraint; matching slides to video frames; scale-invariant feature transform (SIFT) keypoints; video indexing and browsing;
Journal_Title :
Image Processing, IEEE Transactions on
DOI :
10.1109/TIP.2011.2109727