  • DocumentCode
    2954713
  • Title
    Automatic construction of an action video shot database using web videos
  • Author
    Nga, Do Hang; Yanai, Keiji
  • Author_Institution
    Dept. of Inf., Univ. of Electro-Commun., Chofu, Japan
  • fYear
    2011
  • fDate
    6-13 Nov. 2011
  • Firstpage
    527
  • Lastpage
    534
  • Abstract
    A huge number of text-tagged videos are available on the Web today. In this paper, we propose a method that automatically extracts video shots corresponding to specific actions from Web videos, given only action keywords such as “walking” and “eating”. The proposed method consists of three steps: (1) tag-based video selection, (2) segmentation of videos into shots and feature extraction from the shots, and (3) visual-feature-based video shot selection that takes tag-based scores into account. First, we gather video IDs and tag lists for 1000 Web videos matching the given keywords via a Web API, and calculate a tag relevance score for each video using a tag co-occurrence dictionary constructed in advance. Second, we download the 200 videos with the highest tag relevance scores and segment each downloaded video into shots. From each shot, we extract spatio-temporal features, global motion features, and appearance features, and convert them into a bag-of-features representation. Finally, after computing a similarity matrix between video shots, we apply the VisualRank method to select the shots that best depict the actions corresponding to the given keywords. In our experiments, we achieved 49.5% precision at 100 shots over six kinds of human actions, given only keywords and no supervision. In addition, we conducted large-scale experiments with 100 action keywords.
  • Keywords
    feature extraction; image motion analysis; image segmentation; matrix algebra; video retrieval; video signal processing; VisualRank method; Web API; Web videos; action keyword; action video shot database; appearance feature; bag-of-features representation; feature extraction; global motion feature; similarity matrix; spatio-temporal feature; tag-based score; tag-based video selection; tag-cooccurrence dictionary; video segmentation; visual-feature-based video shot selection; Databases; Dictionaries; Feature extraction; Humans; Vectors; Visualization; YouTube;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer Vision (ICCV), 2011 IEEE International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1550-5499
  • Print_ISBN
    978-1-4577-1101-5
  • Type
    conf
  • DOI
    10.1109/ICCV.2011.6126284
  • Filename
    6126284
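The abstract's first step scores each candidate video by how well its tags co-occur with the action keyword, then keeps the 200 best of 1000. The abstract does not give the scoring formula, so the following is a minimal sketch under stated assumptions: the co-occurrence dictionary is simple pair counting over a tag corpus, and the relevance score is a hypothetical mean of P(tag | keyword) over the video's other tags. The function names `build_cooccurrence`, `tag_relevance`, and `select_top_videos` are illustrative, not from the paper.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(tag_lists):
    """Count single tags and unordered tag pairs over a corpus of per-video tag lists."""
    tag_counts, pair_counts = Counter(), Counter()
    for tags in tag_lists:
        unique = sorted(set(tags))               # sorted so each pair has a canonical order
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return tag_counts, pair_counts


def tag_relevance(keyword, tags, tag_counts, pair_counts):
    """Hypothetical score: mean of P(tag | keyword) over the video's tags other than keyword."""
    others = set(tags) - {keyword}
    if not others or tag_counts[keyword] == 0:
        return 0.0
    total = sum(pair_counts[tuple(sorted((keyword, t)))] for t in others)
    return total / (tag_counts[keyword] * len(others))


def select_top_videos(keyword, videos, tag_counts, pair_counts, top_n=200):
    """Return IDs of the top_n (video_id, tags) pairs in descending order of relevance."""
    scored = [(tag_relevance(keyword, tags, tag_counts, pair_counts), vid)
              for vid, tags in videos]
    scored.sort(key=lambda s: (-s[0], s[1]))     # ties broken by video ID for determinism
    return [vid for _, vid in scored[:top_n]]
```

With a toy corpus in which "walking" co-occurs with "street" twice but with "music" and "city" once each, a video tagged ["walking", "street"] outranks one tagged ["walking", "music"]. In the paper's setting, `videos` would hold the 1000 candidates returned by the Web API and `top_n=200`.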
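Each shot's local descriptors (spatio-temporal, global motion, appearance) are converted into a bag-of-features representation. The sketch below shows the standard quantization step for one descriptor type only, assuming a codebook has already been learned (for example by k-means over training descriptors). How the paper builds its codebooks and combines the three per-type histograms is not stated in the abstract, and `bag_of_features` is a hypothetical name.

```python
import numpy as np


def bag_of_features(descriptors, codebook):
    """Quantize local descriptors (m x d) to their nearest codeword (k x d) and
    return an L1-normalized k-bin histogram for one shot."""
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every descriptor to every codeword: shape (m, k).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                    # index of the nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Normalizing each histogram to sum to one makes shots with many and few detected features comparable before similarities are computed.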
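VisualRank ranks items by running a PageRank-style random walk over a visual-similarity graph, so shots that resemble many other shots score highly. The following is a minimal sketch under several assumptions not confirmed by the abstract: histogram intersection as the shot similarity, a damping factor of 0.85, and the tag-based scores used as the teleport (bias) distribution, which is one plausible way to have tag-based scores "taken into account". Both `histogram_intersection` and `visual_rank` are hypothetical names.

```python
import numpy as np


def histogram_intersection(H):
    """Pairwise histogram-intersection similarity between rows of H (one BoF histogram per shot)."""
    H = np.asarray(H, dtype=float)
    H = H / H.sum(axis=1, keepdims=True)         # L1-normalize each shot's histogram
    return np.minimum(H[:, None, :], H[None, :, :]).sum(axis=2)


def visual_rank(S, bias=None, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style power iteration over a shot similarity matrix S.

    bias: optional non-negative prior per shot (assumed here to be tag-based
    scores), used as the teleport distribution; uniform if omitted.
    """
    S = np.array(S, dtype=float)                 # copy so the caller's matrix is untouched
    n = S.shape[0]
    np.fill_diagonal(S, 0.0)                     # ignore self-similarity
    col_sums = S.sum(axis=0)
    dangling = col_sums == 0
    P = S / np.where(dangling, 1.0, col_sums)    # column-stochastic transition matrix
    P[:, dangling] = 1.0 / n                     # shots similar to nothing spread mass uniformly
    p = np.full(n, 1.0 / n) if bias is None else np.asarray(bias, float) / np.sum(bias)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = damping * (P @ r) + (1.0 - damping) * p
        if np.abs(r_next - r).sum() < tol:       # L1 change small enough: converged
            return r_next
        r = r_next
    return r
```

For three shots where two share similar histograms and the third shares none, the two similar shots receive equal, higher scores; raising the third shot's bias raises its score. Selecting the top-ranked shots by `np.argsort(-r)` then yields the candidates for the action database.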
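The reported result is 49.5% precision at 100 shots over six actions. Precision at k is the fraction of the top k ranked shots judged relevant. The sketch below assumes the per-action values are averaged with equal weight, which the abstract does not state explicitly; the function names are illustrative.

```python
def precision_at_k(ranked_ids, relevant_ids, k=100):
    """Fraction of the top-k ranked items that are relevant (the denominator is always k)."""
    relevant = set(relevant_ids)
    return sum(1 for item in ranked_ids[:k] if item in relevant) / k


def mean_precision_at_k(runs, k=100):
    """Equal-weight average of precision@k over queries; runs is a list of (ranked_ids, relevant_ids)."""
    return sum(precision_at_k(ranked, rel, k) for ranked, rel in runs) / len(runs)
```

Under this reading, 49.5% at 100 shots corresponds to an average of 49.5 relevant shots among the top 100 per action.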