• DocumentCode
    3406601
  • Title
    ARISTA - image search to annotation on billions of web photos
  • Author
    Wang, Xin-Jing ; Zhang, Lei ; Liu, Ming ; Li, Yi ; Ma, Wei-Ying
  • Author_Institution
    Microsoft Res. Asia, Beijing, China
  • fYear
    2010
  • fDate
    13-18 June 2010
  • Firstpage
    2987
  • Lastpage
    2994
  • Abstract
    Despite decades of intensive research effort, object recognition remains a challenging problem. Traditional methods based on machine learning or computer vision can still handle only hundreds of object categories. In recent years, non-parametric approaches, which understand the content of an image by propagating labels from similar images in a large-scale dataset, have demonstrated great success. However, due to limited dataset size and imperfect image crawling strategies, previous work can address only a small, biased subset of image concepts. Here we introduce the Arista project, which aims to build a practical image annotation engine targeting popular concepts in the real world. In this project, we are particularly interested in understanding how many image concepts can be addressed by the data-driven annotation approach (coverage) and how good the performance is (precision). This paper reports the first stage of the work. Two billion web images were indexed, and based on simple yet effective near-duplicate detection, the system is capable of automatically generating accurate tags for popular web images that have near-duplicates in the database. We found that about 8.1% of web images have more than ten near-duplicates, and this number increases to 28.5% for top images in search results. Further, based on random samples in the latter case, we observed a precision of 57.9% at the point of highest recall, 28%, on ground-truth tags.
  • Keywords
    Internet; image retrieval; indexing; object recognition; very large databases; ARISTA; Arista project; Web images; Web photos; computer vision; data-driven annotation approach; image annotation engine; image concepts; image search; imperfect image crawling strategy; indexing; large-scale dataset; machine learning; near-duplicate detection; nonparametric approaches; object category; object recognition; Asia; Computer vision; Costs; Engines; Image databases; Image recognition; Large-scale systems; Machine learning; Object recognition; Surges;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
  • Conference_Location
    San Francisco, CA
  • ISSN
    1063-6919
  • Print_ISBN
    978-1-4244-6984-0
  • Type
    conf
  • DOI
    10.1109/CVPR.2010.5540046
  • Filename
    5540046