• DocumentCode
    629089
  • Title
    Retina enhanced SIFT descriptors for video indexing
  • Author
    Strat, Sabin Tiberius ; Benoit, A. ; Lambert, Peter
  • Author_Institution
    LISTIC, Univ. de Savoie Annecy Le Vieux, Annecy, France
  • fYear
    2013
  • fDate
    17-19 June 2013
  • Firstpage
    201
  • Lastpage
    206
  • Abstract
    This paper investigates how the detection of diverse high-level semantic concepts (objects, actions, scene types, persons, etc.) in videos can be improved by applying a model of the human retina. A large part of the current approaches for Content-Based Image/Video Retrieval (CBIR/CBVR) relies on the Bag-of-Words (BoW) model, which has been shown to perform well, especially for object recognition in static images. Nevertheless, the current state-of-the-art framework shows its limits when applied to videos because of the added temporal information. In this paper, we enhance a BoW model based on the classical SIFT local spatial descriptor by preprocessing videos with a model of the human retina. This retinal preprocessing allows the SIFT descriptor to become aware of temporal information. Our proposed descriptors extend the SIFT genericity to spatio-temporal content, making them interesting for generic video indexing. They also benefit from the retinal spatio-temporal “robustness” to various disturbances such as noise, compression artifacts, luminance variations or shadows. The proposed approaches are evaluated on the TRECVID 2012 Semantic Indexing task dataset.
  • Keywords
    content-based retrieval; indexing; transforms; video retrieval; video signal processing; BoW model; CBIR-CBVR; SIFT genericity; TRECVID 2012 Semantic Indexing task dataset; bag-of-words model; classical SIFT local spatial descriptor; compression artifacts; content-based image retrieval; content-based video retrieval; generic video indexing; high-level semantic concept detection; human retina model; luminance variations; noise; retina enhanced SIFT descriptors; retinal preprocessing; retinal spatio-temporal robustness; shadows; spatio-temporal content; temporal information; video preprocessing; Feature extraction; Indexing; Noise; Retina; Semantics; Transient analysis; Visualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Content-Based Multimedia Indexing (CBMI), 2013 11th International Workshop on
  • Conference_Location
    Veszprem
  • ISSN
    1949-3983
  • Print_ISBN
    978-1-4799-0955-1
  • Type
    conf
  • DOI
    10.1109/CBMI.2013.6576582
  • Filename
    6576582