  • DocumentCode
    3530889
  • Title
    CoPhIR Image Collection under the Microscope
  • Author
    Batko, Michal ; Kohoutkova, Petra ; Novak, David
  • Author_Institution
    Fac. of Inf., Masaryk Univ., Brno, Czech Republic
  • fYear
    2009
  • fDate
    29-30 Aug. 2009
  • Firstpage
    47
  • Lastpage
    54
  • Abstract
    The content-based photo image retrieval (CoPhIR) dataset is the largest available database of digital images with corresponding visual descriptors. It contains five MPEG-7 global descriptors extracted from more than 106 million images from the Flickr photo-sharing system. In this paper, we analyze this dataset, focusing on 1) the efficiency of similarity-based indexing and searching and 2) the expressiveness of the combination of the descriptors with respect to the subjective perception of visual similarity. We treat the descriptors as metric spaces and then combine them into a multi-metric space. We analyze the distance distributions of individual descriptors, measure the intrinsic dimensionality of these datasets, and statistically evaluate the correlation between these descriptors. Further, we use two methods to assess the subjective accuracy of, and satisfaction with, similarity retrieval based on the combination of descriptors that is recommended for CoPhIR, and we compare these results on databases of 10 and 100 million CoPhIR images. Finally, we suggest, explore, and evaluate two approaches to improving the accuracy: 1) applying logarithms to weaken the influence of a single descriptor's contribution if it deviates from the rest, and 2) categorizing the dataset and identifying the visual characteristics important for individual categories.
  • Keywords
    content-based retrieval; database indexing; feature extraction; image retrieval; photography; visual databases; CoPhIR image collection; Flickr photo-sharing system; MPEG-7 global descriptor; content-based photo image retrieval; digital image database; feature extraction; similarity-based indexing; statistical analysis; visual characteristics; Content based retrieval; Data analysis; Digital images; Extraterrestrial measurements; Image databases; Image retrieval; Information retrieval; MPEG 7 Standard; Microscopy; Visual databases; CoPhIR dataset; MPEG-7; dataset analysis; metric space; visual descriptors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Similarity Search and Applications, 2009. SISAP '09. Second International Workshop on
  • Conference_Location
    Prague
  • Print_ISBN
    978-0-7695-3765-8
  • Type
    conf
  • DOI
    10.1109/SISAP.2009.25
  • Filename
    5271953