Title :
Online multimodal matrix factorization for human action video indexing
Author :
Paez, Fabian ; Vanegas, Jorge A. ; Gonzalez, Fabio A.
Author_Institution :
MindLAB Res. Group, Univ. Nac. de Colombia, Bogota, Colombia
Abstract :
This paper addresses the problem of searching for videos containing instances of specific human actions. The proposed strategy builds a multimodal latent space representation into which both visual content and annotations are mapped simultaneously. The hypothesis behind the method is that such a latent space yields better results when built from multiple data modalities. The semantic embedding is learned by matrix factorization using stochastic gradient descent, which makes the method suitable for large-scale collections. The method is evaluated on a large-scale human action video dataset with three modalities: action labels, action attributes and visual features. The evaluation follows a query-by-example strategy, in which a sample video is used as input to the system. A retrieved video is considered relevant if it contains an instance of the same human action present in the query. Experimental results show that the learned multimodal latent semantic representation performs better than an exclusively visual representation.
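The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the model form (a linear map W from visual features to the latent space, plus two linear maps P_v and P_t that reconstruct the visual and annotation vectors), the squared-error loss, the weights alpha and lam, the synthetic data, and all function names are assumptions chosen for illustration. Here the annotation vector is a one-hot action label; action attributes could be concatenated to it in the same way.

```python
import numpy as np


def make_toy_data(n=60, d_vis=8, n_actions=3, seed=0):
    """Synthetic stand-in for a video collection (hypothetical data)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_actions, size=n)
    prototypes = rng.normal(size=(n_actions, d_vis))
    visual = prototypes[labels] + 0.1 * rng.normal(size=(n, d_vis))
    annotations = np.eye(n_actions)[labels]  # one-hot action labels
    return visual, annotations, labels


def init_model(d_vis, d_ann, k=4, seed=0):
    """Small random factors; k is the assumed latent dimensionality."""
    rng = np.random.default_rng(seed)
    return {
        "W": 0.1 * rng.normal(size=(k, d_vis)),    # visual -> latent
        "P_v": 0.1 * rng.normal(size=(d_vis, k)),  # latent -> visual
        "P_t": 0.1 * rng.normal(size=(d_ann, k)),  # latent -> annotations
    }


def mean_loss(model, visual, annotations, alpha=1.0):
    """Mean reconstruction error over both modalities (no regularizer)."""
    H = visual @ model["W"].T
    r_v = H @ model["P_v"].T - visual
    r_t = H @ model["P_t"].T - annotations
    per_sample = np.sum(r_v**2, axis=1) + alpha * np.sum(r_t**2, axis=1)
    return 0.5 * np.mean(per_sample)


def sgd_epoch(model, visual, annotations, lr=0.01, alpha=1.0, lam=1e-4,
              rng=None):
    """One pass of stochastic gradient descent, one sample per update."""
    if rng is None:
        rng = np.random.default_rng(0)
    for i in rng.permutation(len(visual)):
        v, t = visual[i], annotations[i]
        W, P_v, P_t = model["W"], model["P_v"], model["P_t"]
        h = W @ v                    # latent code of this sample
        r_v = P_v @ h - v            # visual reconstruction residual
        r_t = P_t @ h - t            # annotation reconstruction residual
        grad_h = P_v.T @ r_v + alpha * (P_t.T @ r_t)
        # Gradients of the regularized squared error w.r.t. each factor.
        model["P_v"] = P_v - lr * (np.outer(r_v, h) + lam * P_v)
        model["P_t"] = P_t - lr * (alpha * np.outer(r_t, h) + lam * P_t)
        model["W"] = W - lr * (np.outer(grad_h, v) + lam * W)
    return model


def rank_by_example(model, query_visual, collection_visual):
    """Query by example: rank the collection by latent cosine similarity."""
    q = model["W"] @ query_visual
    H = collection_visual @ model["W"].T
    norms = np.linalg.norm(H, axis=1) * np.linalg.norm(q) + 1e-12
    return np.argsort(-(H @ q) / norms)
```

Because each update touches a single sample, the factors can be trained without holding the whole collection in memory, which is the property the abstract points to for large-scale use. At query time only visual features are needed: the query is projected with W and compared with the projected collection.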
Keywords :
gradient methods; image representation; indexing; learning (artificial intelligence); matrix decomposition; query processing; video signal processing; action attributes; action labels; annotations; data modalities; human action video dataset; human action video indexing; latent space; matrix factorization; multimodal latent semantic representation; multimodal latent space representation; online multimodal matrix factorization; query-by-example strategy; semantic embedding; stochastic gradient descent; visual content; visual features; visual representation; Histograms; Indexing; Semantics; Training; Trajectory; Visualization; Matrix factorization; human actions; information retrieval; latent space; multimodal data; query by example; video processing;
Conference_Title :
Content-Based Multimedia Indexing (CBMI), 2014 12th International Workshop on
Conference_Location :
Klagenfurt
DOI :
10.1109/CBMI.2014.6849823