Title :
Fast unsupervised ego-action learning for first-person sports videos
Author :
Kitani, Kris M. ; Okabe, Takahiro ; Sato, Yoichi ; Sugimoto, Akihiro
Author_Institution :
UEC Tokyo, Tokyo, Japan
Abstract :
Portable high-quality sports cameras (e.g., head- or helmet-mounted) built for recording dynamic first-person video footage are becoming common among sports enthusiasts. We address the novel task of discovering first-person action categories (which we call ego-actions), which can be useful for tasks such as video indexing and retrieval. To learn ego-action categories, we investigate the use of motion-based histograms and unsupervised learning algorithms to quickly cluster video content. Our approach assumes a completely unsupervised scenario, where labeled training videos are not available, videos are not pre-segmented, and the number of ego-action categories is unknown. In our proposed framework, we show that a stacked Dirichlet process mixture model can be used to automatically learn a motion histogram codebook and the set of ego-action categories. We quantitatively evaluate our approach on both in-house and public YouTube videos and demonstrate robust ego-action categorization across several sports genres. Comparative analysis shows that our approach outperforms other state-of-the-art topic models with respect to both classification accuracy and computational speed. Preliminary results indicate that, on average, the categorical content of a 10-minute video sequence can be indexed in under 5 seconds.
Keywords :
gesture recognition; image classification; image sequences; sport; unsupervised learning; video cameras; video recording; YouTube videos; classification accuracy; first-person sports video recording; labeled training videos; motion histogram codebook; motion-based histograms; portable high-quality sports cameras; stacked Dirichlet process mixture model; unsupervised ego-action learning; video indexing; video retrieval; video sequence; Cameras; Feature extraction; Histograms; Humans; Robustness; Sensors; Videos
Conference_Title :
Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on
Conference_Location :
Providence, RI
Print_ISBN :
978-1-4577-0394-2
DOI :
10.1109/CVPR.2011.5995406