DocumentCode :
51621
Title :
Partial-Duplicate Clustering and Visual Pattern Discovery on Web Scale Image Database
Author :
Wei Li ; Changhu Wang ; Lei Zhang ; Yong Rui ; Bo Zhang
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Volume :
17
Issue :
7
fYear :
2015
fDate :
Jul-15
Firstpage :
967
Lastpage :
980
Abstract :
In this paper, we study the problem of discovering visual patterns and partial-duplicate images, which is fundamental to visual concept representation and image parsing, but very challenging when the database is extremely large, such as the billions of images indexed by a commercial search engine. Although extensive research with sophisticated algorithms has been conducted on both partial-duplicate clustering and visual pattern discovery, most of these algorithms cannot be easily extended to this scale, since both are clustering problems in nature and require pairwise comparisons. To tackle this computational challenge, we introduce a novel and highly parallelizable framework to discover partial-duplicate images and visual patterns in a unified way in distributed computing systems. We emphasize the nested property of local features, and propose the generalized nested feature (GNF) as a mid-level representation for regions and local patterns. Initial coarse clusters are then discovered using GNFs, upon which the n-gram GNF is defined to represent co-occurring visual patterns. After that, efficient merging and refining algorithms are used to obtain the partial-duplicate clusters, and logical combinations of probabilistic GNF models are leveraged to represent the visual patterns of partially duplicate images. Extensive experiments show the parallelizable property and effectiveness of the algorithms on both partial-duplicate clustering and visual pattern discovery. With 2000 machines, processing one million and 40 million images takes about 8 and 400 minutes, respectively, which is quite efficient compared to previous methods.
Keywords :
feature extraction; image representation; parallel processing; pattern clustering; visual databases; GNF; Web scale image database; distributed computing systems; generalized nested feature; image parsing; merging algorithm; parallelizable framework; parallelizable property; partial-duplicate image clustering; refining algorithm; search engine; visual concept representation; visual pattern discovery; Clustering algorithms; Databases; Feature extraction; Merging; Pattern matching; Semantics; Visualization; Local features; parallel algorithms; partial-duplicate images; visual patterns
fLanguage :
English
Journal_Title :
Multimedia, IEEE Transactions on
Publisher :
IEEE
ISSN :
1520-9210
Type :
jour
DOI :
10.1109/TMM.2015.2428996
Filename :
7100925