DocumentCode :
2954713
Title :
Automatic construction of an action video shot database using web videos
Author :
Nga, Do Hang ; Yanai, Keiji
Author_Institution :
Dept. of Inf., Univ. of Electro-Commun., Chofu, Japan
fYear :
2011
fDate :
6-13 Nov. 2011
Firstpage :
527
Lastpage :
534
Abstract :
A huge number of videos with text tags are now available on the Web. In this paper, we propose a method for automatically extracting, from Web videos, the video shots that correspond to specific actions, given only action keywords such as “walking” and “eating”. The proposed method consists of three steps: (1) tag-based video selection, (2) segmentation of videos into shots and extraction of features from the shots, and (3) visual-feature-based video shot selection that takes tag-based scores into account. First, we gather video IDs and tag lists for 1000 Web videos corresponding to the given keywords via a Web API, and calculate a tag relevance score for each video using a tag co-occurrence dictionary constructed in advance. Second, we fetch the top 200 videos from the Web in descending order of tag relevance score and segment each downloaded video into several shots. From each shot we extract spatio-temporal features, global motion features, and appearance features, and convert them into a bag-of-features representation. Finally, after calculating a similarity matrix between video shots, we apply the VisualRank method to select the video shots that best describe the actions corresponding to the given keywords. In the experiments, we achieved 49.5% precision at 100 shots over six kinds of human actions by providing only keywords, without any supervision. In addition, we conducted large-scale experiments on 100 kinds of action keywords.
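The final ranking step described in the abstract can be illustrated with a minimal sketch of a VisualRank-style iteration. This is not the authors' implementation: the function name `visual_rank`, the damping value 0.85, the toy similarity matrix, and the use of tag relevance scores as the bias vector are all assumptions made for illustration. The abstract says only that tag-based scores are taken into account.

```python
def visual_rank(similarity, bias, damping=0.85, iterations=100, tol=1e-9):
    """Rank items by iterating r = d * S_norm * r + (1 - d) * p.

    similarity: square list of lists of non-negative shot similarities.
    bias: non-negative per-shot scores (assumed here: tag relevance scores).
    """
    n = len(similarity)
    # Column sums, used to make each column of the matrix sum to 1.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    # Normalize the bias vector; fall back to uniform if it is all zeros.
    total = sum(bias)
    p = [b / total for b in bias] if total > 0 else [1.0 / n] * n
    r = [1.0 / n] * n
    for _ in range(iterations):
        new_r = []
        for i in range(n):
            flow = sum(
                similarity[i][j] / col_sums[j] * r[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            new_r.append(damping * flow + (1.0 - damping) * p[i])
        delta = sum(abs(a - b) for a, b in zip(new_r, r))
        r = new_r
        if delta < tol:
            break
    return r


# Toy data (hypothetical): shots 0-2 resemble each other, shot 3 is an outlier.
toy_similarity = [
    [0.00, 0.90, 0.80, 0.10],
    [0.90, 0.00, 0.85, 0.10],
    [0.80, 0.85, 0.00, 0.05],
    [0.10, 0.10, 0.05, 0.00],
]
toy_tag_scores = [0.9, 0.9, 0.8, 0.3]

ranks = visual_rank(toy_similarity, toy_tag_scores)
# Shot indices ordered from most to least relevant.
ordering = sorted(range(len(ranks)), key=lambda i: ranks[i], reverse=True)
print(ordering)
```

In this sketch, shots that are visually similar to many other shots accumulate rank, while the bias term pulls the result toward shots from videos with high tag relevance. Selecting the top 100 entries of `ordering` would correspond to the "precision at 100 shots" evaluation mentioned in the abstract.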
Keywords :
feature extraction; image motion analysis; image segmentation; matrix algebra; video retrieval; video signal processing; VisualRank method; Web API; Web videos; action keyword; action video shot database; appearance feature; bag-of-features representation; feature extraction; global motion feature; similarity matrix; spatio-temporal feature; tag-based score; tag-based video selection; tag-cooccurrence dictionary; video segmentation; visual-feature-based video shot selection; Databases; Dictionaries; Feature extraction; Humans; Vectors; Visualization; YouTube;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Vision (ICCV), 2011 IEEE International Conference on
Conference_Location :
Barcelona
ISSN :
1550-5499
Print_ISBN :
978-1-4577-1101-5
Type :
conf
DOI :
10.1109/ICCV.2011.6126284
Filename :
6126284