DocumentCode :
1433339
Title :
Finding Celebrities in Billions of Web Images
Author :
Zhang, Xiao ; Zhang, Lei ; Wang, Xin-Jing ; Shum, Heung-Yeung
Author_Institution :
Inst. of Adv. Study, Tsinghua Univ., Beijing, China
Volume :
14
Issue :
4
Year :
2012
Firstpage :
995
Lastpage :
1007
Abstract :
In this paper, we present a face annotation system to automatically collect and label celebrity faces from the web. With the proposed system, we have constructed a large-scale dataset called “Celebrities on the Web,” which contains 2.45 million distinct images of 421 436 celebrities and is orders of magnitude larger than previous datasets. Collecting and labeling such a large-scale dataset pose great challenges for current multimedia mining methods. In this work, a two-step face annotation approach is proposed to accomplish this task. In the first step, an image annotation system is proposed to label an input image with a list of celebrities. To utilize the noisy textual data, we construct a large-scale celebrity name vocabulary to identify candidate names from the surrounding text. Moreover, we expand the scope of analysis to the surrounding text of webpages hosting near-duplicates of the input image. In the second step, the celebrity names are assigned to the faces by label propagation on a facial similarity graph. To cope with the large variance in facial appearance, a context likelihood is proposed to constrain the name assignment process. In an evaluation on 21 735 faces, both the image annotation system and the name assignment algorithm significantly outperform previous techniques.
Keywords :
Internet; data mining; face recognition; graph theory; image retrieval; multimedia systems; text analysis; vocabulary; Celebrities on the Web dataset; Web images; Web pages; automatic celebrity face collection; automatic celebrity face labeling; candidate name identification; celebrity name assignment; context likelihood; facial appearances; facial similarity graph; image annotation system; large-scale celebrity name vocabulary; large-scale dataset; multimedia mining methods; name assignment process; near-duplicate input image; noisy textual data; text analysis; two-step face annotation approach; Databases; Electronic mail; Face; Face recognition; Labeling; Multimedia communication; Visualization; Face recognition; image annotation; image database; image retrieval;
Language :
English
Journal_Title :
Multimedia, IEEE Transactions on
Publisher :
IEEE
ISSN :
1520-9210
Type :
jour
DOI :
10.1109/TMM.2012.2186121
Filename :
6140979