• DocumentCode
    3281237
  • Title
    Large-scale video event classification using dynamic temporal pyramid matching of visual semantics
  • Author
    Codella, Noel C. E.; Hua, Gang; Cao, Liangliang; Merler, Michele; Gong, Leiguang; Hill, Mark; Smith, J. R.
  • Author_Institution
    Multimedia Research Group, IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
  • fYear
    2013
  • fDate
    15-18 Sept. 2013
  • Firstpage
    2877
  • Lastpage
    2881
  • Abstract
    Video event classification and retrieval has recently emerged as a challenging research topic. In addition to the variation in appearance of visual content and the large scale of the collections to be analyzed, this domain presents new and unique challenges in the modeling of the explicit temporal structure and implicit temporal trends of content within the video events. In this study, we present a technique for video event classification that captures temporal information over semantics using a scalable and efficient modeling scheme. An architecture for partitioning videos into a linear temporal pyramid, using segments of equal length and segments determined by the patterns of the underlying data, is applied over a rich underlying semantic description at the frame level using a taxonomy of nearly 1000 concepts containing 500,000 training images. Forward model selection with data bagging is used to prune the space of temporal features and data for efficiency. The system is implemented in the Hadoop Map-Reduce environment for arbitrary scalability. Our method is applied to the TRECVID Multimedia Event Detection 2012 task. Results demonstrate a significant boost in performance of over 50%, in terms of mean average precision, compared to common max or average pooling, and 17.7% compared to more complex pooling strategies that ignore temporal content.
  • Keywords
    image classification; image matching; video signal processing; Hadoop Map-Reduce; TRECVID Multimedia Event Detection; dynamic temporal pyramid matching; large-scale video event classification; linear temporal pyramid; semantic description; visual content; visual semantics; event; pyramid; semantics; temporal; video
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2013 20th IEEE International Conference on Image Processing (ICIP)
  • Conference_Location
    Melbourne, VIC
  • Type
    conf
  • DOI
    10.1109/ICIP.2013.6738592
  • Filename
    6738592
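
The abstract describes partitioning a video into a linear temporal pyramid and pooling frame-level semantic scores within each segment. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes that pyramid level k splits the video into k equal-length segments, that each frame is a list of concept scores, and that segments are mean-pooled. The function names `split_equal`, `mean_pool`, and `temporal_pyramid` are hypothetical. The data-driven segments, forward model selection, and Hadoop deployment mentioned in the abstract are omitted.

```python
from typing import List, Sequence


def split_equal(n_frames: int, n_segments: int) -> List[range]:
    """Split frame indices into n_segments contiguous, near-equal ranges."""
    bounds = [i * n_frames // n_segments for i in range(n_segments + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(n_segments)]


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average each concept score over the frames of one segment."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def temporal_pyramid(frame_scores: Sequence[Sequence[float]],
                     levels: int = 3) -> List[float]:
    """Concatenate pooled concept scores for every segment at every level.

    Assumption: level k (1-based) uses k equal-length segments, so the
    output length is n_concepts * (1 + 2 + ... + levels).
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    if len(frame_scores) < levels:
        raise ValueError("need at least as many frames as pyramid levels")
    feature: List[float] = []
    for n_segments in range(1, levels + 1):
        for segment in split_equal(len(frame_scores), n_segments):
            feature.extend(mean_pool([frame_scores[i] for i in segment]))
    return feature


# Toy video: 4 frames, 2 concepts; concept 0 appears only in the second half.
video = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
pyramid_feature = temporal_pyramid(video, levels=2)
```

In this toy case, the level-1 segment averages both concepts to 0.5, which is all that plain average pooling would retain, while the two level-2 segments preserve the order in which the concepts occur.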