Title of article :
Object Level Grouping for Video Shots
Author/Authors :
JOSEF SIVIC, FREDERIK SCHAFFALITZKY AND ANDREW ZISSERMAN
Issue Information :
Journal with serial issue number, year 2006
Abstract :
We describe a method for automatically obtaining object representations suitable for retrieval from
generic video shots. The object representation consists of an association of frame regions. These regions provide
exemplars of the object’s possible visual appearances.
Two ideas are developed: (i) associating regions within a single shot to represent a deforming object; (ii) associating
regions from the multiple visual aspects of a 3D object, thereby implicitly representing 3D structure. For the
association we exploit temporal continuity (tracking) and wide baseline matching of affine covariant regions.
In the implementation there are three areas of novelty: First, we describe a method to repair short gaps in tracks.
Second, we show how to join tracks across occlusions (where many tracks terminate simultaneously). Third, we
develop an affine factorization method that copes with motion degeneracy.
We obtain tracks that last throughout the shot, without requiring a 3D reconstruction. The factorization method is
used to associate tracks into object-level groups, with common motion. The outcome is that separate parts of an
object that are not simultaneously visible (such as the front and back of a car, or the front and side of a face) are
associated together. In turn this enables object-level matching and recognition throughout a video.
We illustrate the method on the feature film “Groundhog Day.” Examples are given for the retrieval of deforming
objects (heads, walking people) and rigid objects (vehicles, locations).
Keywords :
independent motion segmentation, 3D object retrieval in videos, tracking affine covariant regions, robust affine factorization
Journal title :
INTERNATIONAL JOURNAL OF COMPUTER VISION