DocumentCode :
3333984
Title :
A Thousand Frames in Just a Few Words: Lingual Descriptions of Videos through Latent Topics and Sparse Object Stitching
Author :
Das, Pradipto ; Xu, Chenliang ; Doell, Richard F. ; Corso, Jason J.
Author_Institution :
Comput. Sci. & Eng., SUNY at Buffalo, Buffalo, NY, USA
fYear :
2013
fDate :
23-28 June 2013
Firstpage :
2634
Lastpage :
2641
Abstract :
The problem of describing images through natural language has gained importance in the computer vision community. Solutions to image description have either focused on a top-down approach of generating language through combinations of object detections and language models or bottom-up propagation of keyword tags from training images to test images through probabilistic or nearest neighbor techniques. In contrast, describing videos with natural language is a less studied problem. In this paper, we combine ideas from the bottom-up and top-down approaches to image description and propose a method for video description that captures the most relevant contents of a video in a natural language description. We propose a hybrid system consisting of a low level multimodal latent topic model for initial keyword annotation, a middle level of concept detectors and a high level module to produce final lingual descriptions. We compare the results of our system to human descriptions in both short and long forms on two datasets, and demonstrate that final system output has greater agreement with the human descriptions than any single level.
Keywords :
natural language processing; object detection; video signal processing; computer vision; concept detectors; human descriptions; image description; initial keyword annotation; language models; low level multimodal latent topic model; natural language description; sparse object stitching; video description; videos lingual description; Detectors; Natural languages; Predictive models; Semantics; Training; Videos; Visualization; multimodal topic model; natural language; video to text; video understanding
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on
Conference_Location :
Portland, OR
ISSN :
1063-6919
Type :
conf
DOI :
10.1109/CVPR.2013.340
Filename :
6619184