DocumentCode :
2988317
Title :
Structural and visual similarity learning for Web page archiving
Author :
Law, Marc T. ; Gutierrez, C.S. ; Thome, Nicolas ; Gancarski, Stephane ; Cord, Matthieu
Author_Institution :
LIP6, Sorbonne Univ., Paris, France
fYear :
2012
fDate :
27-29 June 2012
Firstpage :
1
Lastpage :
6
Abstract :
In this paper, we present a Web page archiving approach that combines image and structural techniques. Our main goal is to learn a similarity measure between Web pages in order to detect whether successive versions of a page are similar. Our system is based on a visual similarity measure designed for Web pages. We also propose a supervised feature selection method adapted to Web archiving, combined with a structural analysis of Web page source code. Experiments on real Web archives are reported, including scalability issues.
Keywords :
Internet; information retrieval systems; Web page archiving; Web page source code; image technique; scalability issue; structural similarity learning; supervised feature selection method; visual similarity learning; Accuracy; Color; Feature extraction; Image color analysis; Rendering (computer graphics); Visualization; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Content-Based Multimedia Indexing (CBMI), 2012 10th International Workshop on
Conference_Location :
Annecy
ISSN :
1949-3983
Print_ISBN :
978-1-4673-2368-0
Electronic_ISBN :
1949-3983
Type :
conf
DOI :
10.1109/CBMI.2012.6269849
Filename :
6269849