DocumentCode
1992797
Title
Robust and Efficient Near-Duplicate Documents Detection on P2P Networks
Author
Zhang, Yan ; Dai, Xuebing ; Wang, Caojing ; Wei, Zhiqiang
Author_Institution
Ocean Univ. of China Qingdao, Qingdao, China
fYear
2012
fDate
27-30 May 2012
Firstpage
1
Lastpage
4
Abstract
Near-duplicate document detection has received much attention from both industry and the research community in recent years. Various methods have been proposed to identify near-duplicate documents, which share the same content except for a small fraction of differences. Most research targets centralized systems and focuses on the accuracy and efficiency of the detection method. However, techniques for detecting near-duplicate documents in a distributed system are largely lacking. In this paper, we propose RENDIC, a Robust and Efficient Near-duplICate document detection method for P2P networks, which improves scalability and performance by inheriting the properties of the P2P network. Experimental results demonstrate that RENDIC achieves higher effectiveness than existing methods.
Keywords
content management; document handling; peer-to-peer computing; P2P network inheritance; RENDIC; centralized system; content sharing; distributed system; efficient near-duplicate document detection; fraction difference; robust near-duplicate document detection; Accuracy; Bandwidth; Feature extraction; Robustness; Routing; Vectors; Web pages;
fLanguage
English
Publisher
IEEE
Conference_Titel
Engineering and Technology (S-CET), 2012 Spring Congress on
Conference_Location
Xian
Print_ISBN
978-1-4577-1965-3
Type
conf
DOI
10.1109/SCET.2012.6342140
Filename
6342140