Regional Information Center for Science and Technology - Automatic construction of an action video shot database using web videos

DocumentCode :

2954713

Title :

Automatic construction of an action video shot database using web videos

Author :

Nga, Do Hang ; Yanai, Keiji

Author_Institution :

Dept. of Inf., Univ. of Electro-Commun., Chofu, Japan

fYear :

2011

fDate :

6-13 Nov. 2011

Firstpage :

527

Lastpage :

534

Abstract :

A huge number of videos with text tags are now available on the Web. In this paper, we propose a method for automatically extracting, from Web videos, the video shots that correspond to specific actions, given only action keywords such as “walking” and “eating”. The proposed method consists of three steps: (1) tag-based video selection, (2) segmentation of videos into shots and extraction of features from the shots, and (3) visual-feature-based video shot selection that takes tag-based scores into account. First, we gather video IDs and tag lists for 1000 Web videos corresponding to the given keywords via a Web API, and calculate tag relevance scores for each video using a tag co-occurrence dictionary constructed in advance. Second, we download the top 200 videos from the Web in descending order of tag relevance score and segment each downloaded video into several shots. From each shot we extract spatio-temporal features, global motion features, and appearance features, and convert them into a bag-of-features representation. Finally, after calculating a similarity matrix between video shots, we apply the VisualRank method to select the video shots that best depict the actions corresponding to the given keywords. In the experiments, we achieved 49.5% precision at 100 shots over six kinds of human actions by providing only keywords, without any supervision. In addition, we conducted large-scale experiments on 100 kinds of action keywords.
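The final ranking step can be illustrated with a minimal sketch. It assumes VisualRank in its usual PageRank-style form: a damped power iteration over a column-normalized shot similarity matrix, with an optional bias vector standing in for the tag-based scores. The function name `visual_rank`, the damping value 0.85, and the toy similarity values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def visual_rank(similarity, bias=None, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank items by damped power iteration over a similarity matrix.

    similarity: (n, n) array of non-negative pairwise shot similarities.
    bias: optional length-n array of non-negative prior scores
          (assumed here to play the role of tag-based scores).
    """
    s = np.asarray(similarity, dtype=float)
    n = s.shape[0]

    # Column-normalize so each column sums to 1 (all-zero columns are left as zeros).
    col_sums = s.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    s_norm = s / col_sums

    # Uniform bias when no prior scores are supplied.
    if bias is None:
        p = np.full(n, 1.0 / n)
    else:
        p = np.asarray(bias, dtype=float)
        p = p / p.sum()

    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = damping * (s_norm @ r) + (1.0 - damping) * p
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r


# Toy example: shots 0-2 are mutually similar, shot 3 is an outlier.
similarity = np.array([
    [1.00, 0.90, 0.80, 0.10],
    [0.90, 1.00, 0.85, 0.05],
    [0.80, 0.85, 1.00, 0.10],
    [0.10, 0.05, 0.10, 1.00],
])
scores = visual_rank(similarity)
ranking = np.argsort(-scores)  # shot indices, best first
```

In this toy case the outlier shot receives the lowest score, so selecting the top-ranked shots keeps the mutually similar ones. Passing per-shot prior scores as `bias` shifts the ranking toward shots from videos with higher tag relevance.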

Keywords :

feature extraction; image motion analysis; image segmentation; matrix algebra; video retrieval; video signal processing; VisualRank method; Web API; Web videos; action keyword; action video shot database; appearance feature; bag-of-features representation; global motion feature; similarity matrix; spatio-temporal feature; tag-based score; tag-based video selection; tag co-occurrence dictionary; video segmentation; visual-feature-based video shot selection; Databases; Dictionaries; Humans; Vectors; Visualization; YouTube

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer Vision (ICCV), 2011 IEEE International Conference on

Conference_Location :

Barcelona

ISSN :

1550-5499

Print_ISBN :

978-1-4577-1101-5

Type :

conf

DOI :

10.1109/ICCV.2011.6126284

Filename :

6126284

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2954713