  • DocumentCode
    1214115
  • Title
    A multiresolution manifold distance for invariant image similarity
  • Author
    Vasconcelos, Nuno; Lippman, Andrew
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of California, La Jolla, CA, USA
  • Volume
    7
  • Issue
    1
  • fYear
    2005
  • Firstpage
    127
  • Lastpage
    142
  • Abstract
    Accounting for spatial image transformations is a requirement for multimedia problems such as video classification and retrieval, face/object recognition, or the creation of image mosaics from video sequences. We analyze a transformation-invariant metric recently proposed in the machine learning literature to measure the distance between image manifolds - the tangent distance (TD) - and show that it is closely related to alignment techniques from the motion analysis literature. Exposing these relationships benefits both domains. On one hand, it allows the knowledge acquired in the alignment literature to be leveraged to build better classifiers. On the other, it provides a new interpretation of alignment techniques as one component of a decomposition that has interesting properties for the classification of video. In particular, we embed the TD into a multiresolution framework that makes it significantly less prone to local minima. The new metric - multiresolution tangent distance (MRTD) - can be easily combined with robust estimation procedures, and exhibits significantly higher invariance to image transformations than the TD and the Euclidean distance (ED). For classification, this translates into significant improvements in face recognition accuracy. For video characterization, it leads to a decomposition of image dissimilarity into "differences due to camera motion" plus "differences due to scene activity" that is useful for classification. Experimental results on a movie database indicate that the distance could be used as a basis for the extraction of semantic primitives such as action and romance.
  • Keywords
    computational geometry; estimation theory; face recognition; feature extraction; image classification; image motion analysis; image retrieval; image segmentation; image sequences; object recognition; video signal processing; Euclidean distance; affine transformations; image decomposition; image mosaics; invariant image similarity; machine learning; movie database; multiresolution manifold distance; multiresolution robust estimation procedures; semantic movie classification; spatial image transformations; tangent distance; video characterization; video classification; video sequences; Image analysis; Image resolution; Manifolds; Motion measurement; Spatial resolution
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Multimedia
  • Publisher
    IEEE
  • ISSN
    1520-9210
  • Type
    jour
  • DOI
    10.1109/TMM.2004.840596
  • Filename
    1386248