Title : 
Less Is More: Video Trimming for Action Recognition

Author :
Antic, Borislav ; Milbich, Timo ; Ommer, Björn

Author_Institution :
Heidelberg Collaboratory for Image Processing, University of Heidelberg, Heidelberg, Germany

Abstract :
Action recognition is an important precursor for understanding human activities in videos. The current paradigm in action recognition is to classify a video sequence as a whole. However, an action usually occupies only part of a video sequence, so the rest of the video is irrelevant for recognizing it. In this paper, we propose a method for learning a subsequence classifier that detects and classifies the part of a video corresponding to the action. The subsequence classifier is trained from weakly labeled training videos: their subsequence labels are not provided and must be inferred during learning. We use the framework of multiple instance learning (MIL) to solve two problems jointly: i) finding the action subsequences in the training videos, and ii) training the subsequence classifier on the inferred action subsequences. To obtain a robust solution to the MIL problem, we propose a sequential algorithm that successively decreases the number of inferred action subsequences per video and trims their length until only one short subsequence per video is used as the action representative. We evaluate the combination of the automatically trained subsequence classifier and the full-sequence classifier on the challenging Hollywood2 benchmark and observe a significant performance gain over the baseline full-sequence classifier. Moreover, on two categories of the Hollywood2 dataset, the subsequence classifier shows favorable performance for the temporal localization of actions in videos.
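The abstract describes the learning procedure only at a high level. The Python sketch below illustrates the general idea of MIL with progressive trimming; it is not the authors' implementation, and several details are assumptions: each video is a list of per-frame feature vectors, a subsequence is represented by the mean of its frame features, a simple difference-of-means linear scorer stands in for the paper's classifier (the keywords mention support vector machines and kernels), training starts from a classifier on whole videos, and the `schedule` of `(k, length)` stages is a hypothetical parameter. All function names (`trim_mil`, `rank_windows`, `train_linear`, and so on) are illustrative.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Video = List[Vector]          # hypothetical: one feature vector per frame
Window = Tuple[int, int]      # (start frame, length)
Model = Tuple[Vector, float]  # (weights, bias) of a linear scorer


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def window_feature(video: Video, start: int, length: int) -> Vector:
    """Assumption: a subsequence is represented by its mean frame feature."""
    return mean(video[start:start + length])


def windows(video: Video, length: int) -> List[Window]:
    """All subsequences of the given length (clipped to the video length)."""
    length = min(length, len(video))
    return [(s, length) for s in range(len(video) - length + 1)]


def train_linear(pos: List[Vector], neg: List[Vector]) -> Model:
    """Stand-in for an SVM: w = mean(pos) - mean(neg), boundary at the midpoint."""
    mp, mn = mean(pos), mean(neg)
    w = [a - b for a, b in zip(mp, mn)]
    bias = -sum(wi * (a + b) / 2 for wi, a, b in zip(w, mp, mn))
    return w, bias


def score(model: Model, x: Vector) -> float:
    """Linear classifier score w . x + bias."""
    w, bias = model
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


def rank_windows(model: Model, video: Video, length: int) -> List[Window]:
    """Candidate subsequences of one video, highest classifier score first."""
    return sorted(windows(video, length),
                  key=lambda c: score(model, window_feature(video, *c)),
                  reverse=True)


def trim_mil(positives: List[Video], negatives: List[Video],
             schedule: List[Tuple[int, int]]) -> Tuple[Model, List[Window]]:
    """Alternate between selecting positive subsequences and retraining.

    `schedule` is a hypothetical list of (k, length) stages with k and length
    non-increasing and k == 1 in the last stage. Returns the final model and
    the top-ranked window per positive video from the last stage.
    """
    # Initialise with a full-sequence classifier (whole videos as instances).
    model = train_linear([mean(v) for v in positives], [mean(v) for v in negatives])
    selected: List[Window] = []
    for k, length in schedule:
        pos_feats: List[Vector] = []
        selected = []
        for video in positives:
            top = rank_windows(model, video, length)[:k]  # MIL step: infer instances
            selected.append(top[0])
            pos_feats.extend(window_feature(video, *c) for c in top)
        # Every window of a negative video is treated as a negative instance.
        neg_feats = [window_feature(v, *c) for v in negatives for c in windows(v, length)]
        model = train_linear(pos_feats, neg_feats)  # retrain on trimmed instances
    return model, selected


if __name__ == "__main__":
    bg, act, distract = [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]
    pos_videos = [[bg] * 3 + [act] * 3 + [bg] * 4, [bg] * 6 + [act] * 3 + [bg]]
    neg_videos = [[bg] * 10, [distract] * 10]
    model, sel = trim_mil(pos_videos, neg_videos, [(3, 6), (2, 4), (1, 3)])
    print(sel)  # [(3, 3), (6, 3)]
```

In this toy run, the action frames sit at indices 3 to 5 and 6 to 8 of the two positive videos, and the last stage (one window of length 3 per video) recovers exactly those spans. Each stage re-ranks windows with the classifier from the previous stage, so the shrinking schedule behaves like a coarse-to-fine search, and `rank_windows(model, video, length)[0]` doubles as a crude temporal localizer. Combining the subsequence score with a full-sequence classifier, as evaluated in the paper, is not shown.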

Keywords :
image classification; image motion analysis; image sequences; learning (artificial intelligence); object detection; object recognition; video signal processing; Hollywood2 benchmark set; MIL problem; action recognition; baseline full sequence classifier; human activities understanding; multiple instance learning framework; subsequence classifier learning; subsequence labels; video detection; video sequence classification; video trimming; weakly labeled training videos; Detectors; Histograms; Kernel; Spatiotemporal phenomena; Support vector machines; Training; Video sequences; multiple instance learning; video analysis

Conference_Title :
Computer Vision Workshops (ICCVW), 2013 IEEE International Conference on

Conference_Location :
Sydney, NSW

DOI :
10.1109/ICCVW.2013.73