Regional Information Center for Science and Technology - Image Phylogeny by Minimal Spanning Trees

Abstract:

Nowadays, digital content is widespread and easily redistributable, whether lawfully or unlawfully. Images and other digital content can also mutate as they spread. For example, after images are posted on the Internet, other users can copy, resize, and/or re-encode them and then repost their versions, thereby generating similar but not identical copies. While it is straightforward to detect exact image duplicates, this is not the case for slightly modified versions. In the last decade, some researchers have successfully focused on the design and deployment of near-duplicate detection and recognition systems to identify the coexisting versions of a given document in the wild. Notwithstanding those efforts, only recently have there been first attempts to go beyond detecting near-duplicates and find the structure of evolution within a set of images. In this paper, we tackle and formally define the problem of identifying these relationships within a set of near-duplicate images, whose structure we call an Image Phylogeny Tree (IPT) due to its natural analogy with biological systems. Building an IPT aims at finding the structure of transformations among a set of near-duplicate images, along with their parameters if necessary, and has immediate applications in security and law enforcement, forensics, copyright enforcement, and news-tracking services. We devise a method for calculating an asymmetric dissimilarity matrix from a set of near-duplicate images and formally introduce an efficient algorithm to build IPTs from such a matrix. We validate our approach with more than 625,000 test cases, including both synthetic and real data, and show that, with an appropriate dissimilarity function, good IPT reconstruction is possible even when some pieces of information are missing. We also evaluate our solution when the pool of analysis contains more than one near-duplicate set, and compare it with other recent related approaches in the literature.
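The abstract does not spell out how the tree is built from the asymmetric dissimilarity matrix. The sketch below is a minimal, hypothetical illustration of one Kruskal-style, direction-aware minimum spanning tree construction, assuming the convention that `d[i][j]` is the cost of producing image `j` from image `i` (so `d[i][j]` need not equal `d[j][i]`). The function name `build_ipt` and the matrix convention are assumptions for illustration, not the authors' published code; computing `d` itself (estimating the transformations between images) is not shown.

```python
from typing import List, Optional


def build_ipt(d: List[List[float]]) -> List[Optional[int]]:
    """Hypothetical sketch: return parent[j] for each image j (None for the root).

    Assumes d[i][j] is the dissimilarity of generating image j from image i.
    """
    n = len(d)
    # Union-find over images, used to reject edges that would close a cycle.
    comp = list(range(n))

    def find(x: int) -> int:
        while comp[x] != x:
            comp[x] = comp[comp[x]]  # path halving
            x = comp[x]
        return x

    parent: List[Optional[int]] = [None] * n
    # Every ordered pair (i -> j) is a candidate edge; cheapest first.
    edges = sorted((d[i][j], i, j) for i in range(n) for j in range(n) if i != j)
    added = 0
    for _cost, i, j in edges:
        if added == n - 1:
            break  # a spanning tree on n nodes has n - 1 edges
        # j must not already have a parent (each image has one direct source),
        # and i and j must be in different partial trees (no cycles).
        if parent[j] is None and find(i) != find(j):
            parent[j] = i
            comp[find(j)] = find(i)
            added += 1
    return parent


# Toy matrix: image 0 looks like the original, 1 derives from 0, 2 from 1.
d = [
    [0, 1, 5],
    [9, 0, 2],
    [9, 9, 0],
]
print(build_ipt(d))
```

With the toy matrix above the sketch yields `[None, 0, 1]`, i.e. image 0 is the root, image 1 is its child, and image 2 is a child of image 1. The at-most-one-parent check is what turns an ordinary Kruskal spanning tree into a directed tree: each partial tree keeps exactly one parentless node, so merging always attaches one tree's root under a node of another. Sorting all ordered pairs makes this sketch O(n² log n) for n images.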

Keywords:

Internet; image processing; trees (mathematics); IPT reconstruction; biological systems; digital content; identical copies; image phylogeny tree; minimal spanning trees; History; Image edge detection; Phylogeny; Vegetation; Visualization; Watermarking; Image dependencies; image phylogeny; image's ancestry relationships; near-duplicate detection and recognition; near-duplicates kinship;